On pre-Pascal architectures, using cudaMallocManaged() can be 2x slower than cudaMalloc().

On post-Pascal architectures, cudaMallocManaged() is faster. (Results TBD.)
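To check this on your own hardware, here is a minimal timing sketch rather than a rigorous benchmark. It runs the same workload once with cudaMalloc() plus explicit cudaMemcpy() calls and once with cudaMallocManaged(). The kernel add_one, the helpers time_explicit_ms and time_managed_ms, and the array size are all made up for illustration; the result will depend on the GPU, driver, and access pattern.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

using wall_clock = std::chrono::steady_clock;

static double ms_since(wall_clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(wall_clock::now() - t0).count();
}

// Hypothetical workload: add 1 to every element.
__global__ void add_one(float *x, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

// Explicit path: host buffer + cudaMalloc + cudaMemcpy in both directions.
double time_explicit_ms(size_t n, double *sum) {
    size_t bytes = n * sizeof(float);
    unsigned int blocks = (unsigned int)((n + 255) / 256);
    auto t0 = wall_clock::now();
    float *h = (float *)malloc(bytes);
    if (h == NULL) { fprintf(stderr, "malloc failed\n"); exit(EXIT_FAILURE); }
    float *d = NULL;
    CHECK(cudaMalloc((void **)&d, bytes));
    for (size_t i = 0; i < n; ++i) h[i] = 0.0f;
    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    add_one<<<blocks, 256>>>(d, n);
    CHECK(cudaGetLastError());
    // This blocking copy also waits for the kernel to finish.
    CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));
    *sum = 0.0;
    for (size_t i = 0; i < n; ++i) *sum += h[i];
    CHECK(cudaFree(d));
    free(h);
    return ms_since(t0);
}

// Managed path: one pointer shared by host and device, no explicit copies.
double time_managed_ms(size_t n, double *sum) {
    size_t bytes = n * sizeof(float);
    unsigned int blocks = (unsigned int)((n + 255) / 256);
    auto t0 = wall_clock::now();
    float *m = NULL;
    CHECK(cudaMallocManaged((void **)&m, bytes));
    for (size_t i = 0; i < n; ++i) m[i] = 0.0f;
    add_one<<<blocks, 256>>>(m, n);
    CHECK(cudaGetLastError());
    // Synchronize before the host touches managed memory again.
    CHECK(cudaDeviceSynchronize());
    *sum = 0.0;
    for (size_t i = 0; i < n; ++i) *sum += m[i];
    CHECK(cudaFree(m));
    return ms_since(t0);
}

int main() {
    const size_t n = (size_t)1 << 24;  // arbitrary size chosen for illustration
    CHECK(cudaFree(0));                // create the context before timing
    double sum_e = 0.0, sum_m = 0.0;
    double explicit_ms = time_explicit_ms(n, &sum_e);
    double managed_ms = time_managed_ms(n, &sum_m);
    printf("cudaMalloc + cudaMemcpy: %.3f ms (checksum %.0f)\n", explicit_ms, sum_e);
    printf("cudaMallocManaged:       %.3f ms (checksum %.0f)\n", managed_ms, sum_m);
    return 0;
}
```

Build with something like `nvcc -O2 bench.cu -o bench` (the file name is arbitrary). A single run is noisy, so repeat each path several times and compare medians before drawing conclusions.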
