To train a machine learning model, you need training data. TensorFlow offers several ways to load a dataset. These are your three main options:
- Preload data into memory
- Feed data at request
- Set up a custom data pipeline
Option 1: Preload data into memory
Preload all your data into memory and pass it to TensorFlow
- Easiest and fastest approach
- All of the data must fit in your computer’s memory
- Uses regular Python code; no TensorFlow-specific loading code is needed
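A minimal sketch of the preloading approach, using NumPy arrays. The toy data generated here stands in for whatever you would read from disk (for example with np.load or np.loadtxt), and the model.fit call shown in the comment assumes a Keras model named model:

```python
import numpy as np

# Load the entire dataset into memory as NumPy arrays.
# (In a real project you would read from disk, e.g. np.load("data.npy");
# toy data keeps this sketch self-contained.)
features = np.random.rand(1000, 4).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

# The whole dataset now lives in RAM. Keras accepts NumPy arrays
# directly, so training would look like:
#   model.fit(features, labels, batch_size=32, epochs=5)
print(features.shape, labels.shape)
```

Because everything sits in memory up front, there is no loading code to write beyond this, which is why it is the easiest and fastest option for small datasets.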
Option 2: Feed data at request
Portions of your data are fed to TensorFlow as it requests them, instead of preloading everything at once
- Gives you more control, but also more manual responsibility
- Works with larger datasets, since only a portion is in memory at a time
- Normal Python code
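One common way to feed data on request is a plain Python generator that yields one batch at a time; Keras can consume such a generator in model.fit. This is a sketch with hypothetical toy batches standing in for data read from disk:

```python
import numpy as np

def batch_generator(batch_size=32, n_batches=10):
    """Yield one batch at a time instead of preloading everything.

    In a real project, each iteration would read the next chunk of
    data from disk; random toy data keeps the sketch self-contained.
    """
    for _ in range(n_batches):
        x = np.random.rand(batch_size, 4).astype(np.float32)
        y = np.random.randint(0, 2, size=batch_size)
        yield x, y

# A Keras model can train from the generator directly, e.g.:
#   model.fit(batch_generator(), steps_per_epoch=10, epochs=5)
first_x, first_y = next(batch_generator())
print(first_x.shape, first_y.shape)
```

You control exactly when and how each batch is produced, which is the extra flexibility (and the extra manual responsibility) this option gives you.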
Option 3: Set up a custom data pipeline
A data pipeline lets TensorFlow load data into memory by itself as it needs it
- Can hold as much data as you want
- Requires some TensorFlow code
- Supports parallel processing (makes training larger datasets more efficient)
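The pipeline option is what the tf.data API provides. A minimal sketch, assuming TensorFlow 2.x is installed; the doubling step in map is an illustrative stand-in for whatever preprocessing your data needs:

```python
import numpy as np
import tensorflow as tf

# Toy source data; in practice this could come from files,
# TFRecords, or another out-of-memory source.
features = np.random.rand(1000, 4).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

# Build a pipeline: TensorFlow shuffles, transforms, batches, and
# prefetches the data itself, overlapping loading with training.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)
    .map(lambda x, y: (x * 2.0, y),            # illustrative preprocessing
         num_parallel_calls=tf.data.AUTOTUNE)  # parallel processing
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # prepare next batches during training
)

# A Keras model consumes the pipeline directly:
#   model.fit(dataset, epochs=5)
for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape, batch_y.shape)
```

The num_parallel_calls and prefetch settings are where the parallel processing comes in: preprocessing runs on multiple threads while the next batches are staged ahead of the training step.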
Use the straightforward preloading method if your dataset is small enough to fit in memory. Use a data pipeline if your dataset is extremely large.