Today's deep learning accelerators are blazing fast. Google's TPU and Nvidia's V100 are crushing benchmark after benchmark.

To saturate these beasts, we need ever-faster data pipelines to keep them fed. However, those pipelines are often bottlenecked by slow networks, remote storage (S3, HDFS), and CPU-bound data transformations.

There are really only a few ways to make your data pipeline fast enough to saturate your GPUs:

  • Do fewer transformations
  • Move the data closer to the GPUs
  • Parallelize your pipeline (see the sketch after this list)
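
As a rough illustration of the last two points, here is a minimal tf.data sketch, assuming a TFRecord dataset at a hypothetical path (`/data/train-*.tfrecord`) holding JPEG images and integer labels. Parallel reads and parallel CPU transforms spread the work across cores, while prefetching overlaps preprocessing with GPU compute:

```python
import tensorflow as tf

def parse_example(record):
    # Hypothetical schema: a serialized JPEG plus an integer label.
    features = tf.io.parse_single_example(record, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.io.decode_jpeg(features["image"], channels=3)
    image = tf.image.resize(image, [224, 224])
    return image, features["label"]

# Hypothetical location; point this at your own shards.
files = tf.data.Dataset.list_files("/data/train-*.tfrecord")

dataset = (
    files
    # Read several shards concurrently instead of one at a time.
    .interleave(tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE)
    # Run the CPU-side decode/resize across multiple cores.
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(256)
    # Prepare the next batch while the GPU trains on the current one.
    .prefetch(tf.data.AUTOTUNE)
)
```

Adding a `.cache()` step after the expensive transformations is one way to address the second point as well, keeping decoded data in local memory or on local disk instead of re-fetching it from remote storage every epoch.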

These optimizations are enabled by the following layer:
