To train a machine learning model, you need training data. TensorFlow offers several ways to load a dataset. These are your three main options:
- Preload data into memory
- Feed data at request
- Set up a custom data pipeline
Option 1: Preload data into memory
Preload all your data into memory and pass it to TensorFlow
- Easiest and fastest approach
- All of the data must fit in your computer’s memory
- Uses regular Python code; no TensorFlow-specific loading code is needed
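A minimal sketch of the preloading approach, using NumPy arrays. The toy data generated here stands in for whatever you would read from disk (for example with np.load or np.loadtxt), and the model.fit call shown in the comment assumes a Keras model named model:

```python
import numpy as np

# Load the entire dataset into memory as NumPy arrays.
# (In a real project you would read from disk, e.g. np.load("data.npy");
# toy data keeps this sketch self-contained.)
features = np.random.rand(1000, 4).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

# The whole dataset now lives in RAM. Keras accepts NumPy arrays
# directly, so training would look like:
#   model.fit(features, labels, batch_size=32, epochs=5)
print(features.shape, labels.shape)
```

Because everything sits in memory up front, there is no loading code to write beyond this, which is why it is the easiest and fastest option for small datasets.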
Option 2: Feed data at request
Portions of your data are fed to TensorFlow as it requests them, instead of preloading everything at once
- Gives you more control, but also more manual responsibility
- Works with larger datasets, since only a portion is in memory at a time
- Normal Python code
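One common way to feed data on request is a plain Python generator that yields one batch at a time; Keras can consume such a generator in model.fit. This is a sketch with hypothetical toy batches standing in for data read from disk:

```python
import numpy as np

def batch_generator(batch_size=32, n_batches=10):
    """Yield one batch at a time instead of preloading everything.

    In a real project, each iteration would read the next chunk of
    data from disk; random toy data keeps the sketch self-contained.
    """
    for _ in range(n_batches):
        x = np.random.rand(batch_size, 4).astype(np.float32)
        y = np.random.randint(0, 2, size=batch_size)
        yield x, y

# A Keras model can train from the generator directly, e.g.:
#   model.fit(batch_generator(), steps_per_epoch=10, epochs=5)
first_x, first_y = next(batch_generator())
print(first_x.shape, first_y.shape)
```

You control exactly when and how each batch is produced, which is the extra flexibility (and the extra manual responsibility) this option gives you.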
Option 3: Set up a custom data pipeline
A data pipeline lets TensorFlow load data into memory by itself as it needs it
- Can hold as much data as you want
- Requires some TensorFlow code
- Supports parallel processing (makes training larger datasets more efficient)
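The pipeline option is what the tf.data API provides. A minimal sketch, assuming TensorFlow 2.x is installed; the doubling step in map is an illustrative stand-in for whatever preprocessing your data needs:

```python
import numpy as np
import tensorflow as tf

# Toy source data; in practice this could come from files,
# TFRecords, or another out-of-memory source.
features = np.random.rand(1000, 4).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

# Build a pipeline: TensorFlow shuffles, transforms, batches, and
# prefetches the data itself, overlapping loading with training.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)
    .map(lambda x, y: (x * 2.0, y),            # illustrative preprocessing
         num_parallel_calls=tf.data.AUTOTUNE)  # parallel processing
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # prepare next batches during training
)

# A Keras model consumes the pipeline directly:
#   model.fit(dataset, epochs=5)
for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape, batch_y.shape)
```

The num_parallel_calls and prefetch settings are where the parallel processing comes in: preprocessing runs on multiple threads while the next batches are staged ahead of the training step.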
Use the straightforward preloading method if your dataset is small enough to fit in memory. Use a data pipeline if your dataset is extremely large.