Today's deep learning accelerators are blazing fast. Google's TPU and Nvidia's V100 are destroying benchmark tests.
To saturate these beasts, we need even faster data pipelines to keep them fed. However, these data pipelines are held back by slow networks, remote storage (S3, HDFS), and CPU-based data transformations.
There are really only a few ways to increase the performance of your data pipelines to saturate your GPUs:
- Do fewer transformations
- Move the data closer to the GPUs
- Parallelize your pipeline
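
To make the last bullet concrete, here is a minimal sketch of parallelizing slow reads with Python's standard-library thread pool. It is only an illustration under stated assumptions: `load_record` is a hypothetical stand-in for a read from remote storage, with the network latency simulated by `time.sleep`, and `load_sequential` and `load_parallel` are names made up for this example, not part of any framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_record(index, latency_s=0.01):
    """Hypothetical stand-in for one slow read from remote storage."""
    time.sleep(latency_s)  # simulated network latency (assumption)
    return index * 2       # simulated lightweight transformation


def load_sequential(indices):
    # Baseline: each read waits for the previous one to finish.
    return [load_record(i) for i in indices]


def load_parallel(indices, num_workers=8):
    # Overlap the waiting time of many reads across worker threads.
    # Executor.map returns results in input order, so batches stay deterministic.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(load_record, indices))


if __name__ == "__main__":
    start = time.perf_counter()
    load_sequential(range(16))
    print(f"sequential: {time.perf_counter() - start:.3f}s")

    start = time.perf_counter()
    load_parallel(range(16))
    print(f"parallel:   {time.perf_counter() - start:.3f}s")
```

Threads help here because the simulated work is I/O-bound waiting. For CPU-heavy transformations, a process pool or a framework-level parallel loader is usually the better fit.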
These optimizations are enabled by the following layer: