Loading Data in TensorFlow

To train a machine learning model, you need training data. TensorFlow offers several ways to load datasets. These are your three main options:

  1. Preload data into memory
  2. Feed data on request
  3. Set up a custom data pipeline

Option 1: Preload data into memory

Preload all your data into memory and pass it to TensorFlow

  • Easiest and fastest approach
  • All of the data must fit in your computer’s memory
  • Uses regular Python code, no TensorFlow
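As a minimal sketch of what preloading can look like, the snippet below reads a whole (tiny) CSV dataset into Python lists before any training starts. The CSV contents, the column names `x1`, `x2`, `label`, and the helper `preload` are all made up for illustration; in practice you would open a real file instead of the in-memory string.

```python
import csv
import io

# Hypothetical CSV contents standing in for a small dataset file on disk.
CSV_TEXT = """x1,x2,label
0.1,0.2,0
0.4,0.3,1
0.9,0.8,1
"""

def preload(csv_file):
    """Read every row into Python lists held entirely in memory."""
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        features.append([float(row["x1"]), float(row["x2"])])
        labels.append(int(row["label"]))
    return features, labels

# Everything is loaded up front; the lists can then be passed to TensorFlow.
features, labels = preload(io.StringIO(CSV_TEXT))
```

Because the full dataset lives in `features` and `labels`, this only works while the data fits in memory.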

Option 2: Feed data on request

Instead of preloading everything at once, you feed portions of your data to TensorFlow as it requests them

  • Gives you more control, but also more manual responsibility
  • Works with larger datasets
  • Uses regular Python code
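One way to picture feeding data on request is a plain Python generator that reads only one batch worth of lines at a time. This is a sketch under assumptions: the helper `iter_batches`, the comma-separated line format, and the in-memory `source` are hypothetical stand-ins for a large file on disk.

```python
import io
from itertools import islice

def iter_batches(lines, batch_size):
    """Yield lists of parsed rows, reading only batch_size lines at a time."""
    while True:
        chunk = list(islice(lines, batch_size))
        if not chunk:
            return  # the source is exhausted
        yield [[float(value) for value in line.split(",")] for line in chunk]

# Hypothetical data source; a real one would be an open file handle.
source = io.StringIO("1,2\n3,4\n5,6\n7,8\n9,10\n")

# In a training loop, each batch would be handed to TensorFlow in turn.
batches = list(iter_batches(source, batch_size=2))
```

Only one batch is held in memory at any moment, which is what lets this approach scale past the preloading method, at the cost of writing the batching logic yourself.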

Option 3: Set up a custom data pipeline

A data pipeline lets TensorFlow load data into memory by itself as it needs it

  • Can handle datasets far larger than your computer’s memory
  • Requires some TensorFlow code
  • Supports parallel processing (makes training larger datasets more efficient)
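The post does not name a specific API, so the following is only a sketch assuming TensorFlow 2.x (2.4 or later) and its `tf.data` module. The toy arrays and the scaling step are invented for illustration; a pipeline over a genuinely large dataset would usually read from files rather than from in-memory arrays as done here.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in data; a real pipeline would typically read from files on disk.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
labels = np.arange(10, dtype=np.int32) % 2

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    # Preprocess elements in parallel; the scaling itself is illustrative.
    .map(lambda x, y: (x / 10.0, y), num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    # Prepare upcoming batches while the current one is being consumed.
    .prefetch(tf.data.AUTOTUNE)
)

# TensorFlow produces batches lazily as the loop asks for them.
batch_shapes = [tuple(x.shape) for x, _ in dataset]
```

The `map` and `prefetch` steps are where the parallel processing mentioned above comes in: TensorFlow can prepare the next batches while the model is still training on the current one.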

Use the straightforward preloading method if your dataset is small enough to fit in memory. Use the data pipeline if your dataset is extremely large.
