Load a pandas DataFrame | TensorFlow Core

This tutorial provides examples of how to load pandas DataFrames into TensorFlow.
You will use a small heart disease dataset provided by the UCI Machine Learning Repository. There are several hundred rows in the CSV. Each row describes a patient, and each column describes an attribute. You will use this information to predict whether a patient has heart disease, which is a binary classification task.

Read data using pandas

import pandas as pd
import tensorflow as tf

SHUFFLE_BUFFER = 500
BATCH_SIZE = 2

Download the CSV file containing the heart disease dataset:

csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/heart.csv')
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/heart.csv
16384/13273 [=====================================] - 0s 0us/step
24576/13273 [=======================================================] - 0s 0us/step

Read the CSV file using pandas:

df = pd.read_csv(csv_file)

This is what the data looks like:

df.head()
df.dtypes
age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal         object
target        int64
dtype: object

You will build models to predict the label contained in the target column.

target = df.pop('target')
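
The target column holds a 0 or 1 label for each patient. As an optional sanity check (not required for the rest of the tutorial), you can look at the class balance with pandas:

target.value_counts()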

A DataFrame as an array

If your data has a uniform datatype, or dtype, it's possible to use a pandas DataFrame anywhere you could use a NumPy array. This works because the pandas.DataFrame class supports the __array__ protocol, and TensorFlow's tf.convert_to_tensor function accepts objects that support the protocol.
Take the numeric features from the dataset (skip the categorical features for now):

numeric_feature_names = ['age', 'thalach', 'trestbps',  'chol', 'oldpeak']
numeric_features = df[numeric_feature_names]
numeric_features.head()

The DataFrame can be converted to a NumPy array using the DataFrame.values property or numpy.array(df). To convert it to a tensor, use tf.convert_to_tensor:

tf.convert_to_tensor(numeric_features)

In general, if an object can be converted to a tensor with tf.convert_to_tensor it can be passed anywhere you can pass a tf.Tensor.
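
For example, a uniform-dtype DataFrame can be fed straight into a TensorFlow op and the conversion happens implicitly. A small illustrative sketch (any op would do; tf.reduce_mean is just an example):

# The DataFrame is converted to a tensor automatically by the op.
tf.reduce_mean(numeric_features, axis=0)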

With Model.fit

A DataFrame, interpreted as a single tensor, can be used directly as an argument to the Model.fit method.
Below is an example of training a model on the numeric features of the dataset.
The first step is to normalize the input ranges. Use a tf.keras.layers.Normalization layer for that.
To set the layer's mean and standard deviation before running it, be sure to call the Normalization.adapt method:

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(numeric_features)

Call the layer on the first three rows of the DataFrame to visualize an example of the output from this layer:

normalizer(numeric_features.iloc[:3])

Use the normalization layer as the first layer of a simple model:

def get_basic_model():
  model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)
  ])

  model.compile(optimizer='adam',
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model

When you pass the DataFrame as the x argument to Model.fit, Keras treats the DataFrame as it would a NumPy array:

model = get_basic_model()
model.fit(numeric_features, target, epochs=15, batch_size=BATCH_SIZE)
Epoch 1/15
152/152 [==============================] - 1s 2ms/step - loss: 0.5875 - accuracy: 0.7261
Epoch 2/15
152/152 [==============================] - 0s 2ms/step - loss: 0.5041 - accuracy: 0.7294
Epoch 3/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4691 - accuracy: 0.7360
Epoch 4/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4530 - accuracy: 0.7327
Epoch 5/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4418 - accuracy: 0.7624
Epoch 6/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4339 - accuracy: 0.7789
Epoch 7/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4284 - accuracy: 0.7789
Epoch 8/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4236 - accuracy: 0.7921
Epoch 9/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4198 - accuracy: 0.7888
Epoch 10/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4154 - accuracy: 0.7855
Epoch 11/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4121 - accuracy: 0.7888
Epoch 12/15
152/152 [==============================] - 0s 3ms/step - loss: 0.4092 - accuracy: 0.7822
Epoch 13/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4074 - accuracy: 0.7855
Epoch 14/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4061 - accuracy: 0.7921
Epoch 15/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4051 - accuracy: 0.7888

With tf.data

If you want to apply tf.data transformations to a DataFrame of a uniform dtype, the Dataset.from_tensor_slices method will create a dataset that iterates over the rows of the DataFrame. Each row is initially a vector of values. To train a model, you need (inputs, labels) pairs, so pass (features, labels) and Dataset.from_tensor_slices will return the needed pairs of slices:

numeric_dataset = tf.data.Dataset.from_tensor_slices((numeric_features, target))

for row in numeric_dataset.take(3):
  print(row)
numeric_batches = numeric_dataset.shuffle(1000).batch(BATCH_SIZE)

model = get_basic_model()
model.fit(numeric_batches, epochs=15)
Epoch 1/15
152/152 [==============================] - 1s 3ms/step - loss: 0.7083 - accuracy: 0.7129
Epoch 2/15
152/152 [==============================] - 0s 3ms/step - loss: 0.5811 - accuracy: 0.7393
Epoch 3/15
152/152 [==============================] - 0s 2ms/step - loss: 0.5061 - accuracy: 0.7426
Epoch 4/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4682 - accuracy: 0.7525
Epoch 5/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4481 - accuracy: 0.7624
Epoch 6/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4377 - accuracy: 0.7756
Epoch 7/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4312 - accuracy: 0.7822
Epoch 8/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4283 - accuracy: 0.7921
Epoch 9/15
152/152 [==============================] - 0s 3ms/step - loss: 0.4264 - accuracy: 0.7921
Epoch 10/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4237 - accuracy: 0.7987
Epoch 11/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4224 - accuracy: 0.7921
Epoch 12/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4210 - accuracy: 0.7921
Epoch 13/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4195 - accuracy: 0.7987
Epoch 14/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4191 - accuracy: 0.7921
Epoch 15/15
152/152 [==============================] - 0s 2ms/step - loss: 0.4170 - accuracy: 0.7954

A DataFrame as a dictionary

When you start dealing with heterogeneous data, it is no longer possible to treat the DataFrame as if it were a single array. TensorFlow tensors require that all elements have the same dtype.
So, in this case, you need to start treating it as a dictionary of columns, where each column has a uniform dtype. A DataFrame is a lot like a dictionary of arrays, so typically all you need to do is cast the DataFrame to a Python dict. Many important TensorFlow APIs support (nested) dictionaries of arrays as inputs.
tf.data input pipelines handle this quite well. All tf.data operations handle dictionaries and tuples automatically. So, to make a dataset of dictionary-examples from a DataFrame, just cast it to a dict before slicing it with Dataset.from_tensor_slices:

numeric_dict_ds = tf.data.Dataset.from_tensor_slices((dict(numeric_features), target))

Here are the first three examples from that dataset:

for row in numeric_dict_ds.take(3):
  print(row)
({'age': , 'thalach': , 'trestbps': , 'chol': , 'oldpeak': }, )
({'age': , 'thalach': , 'trestbps': , 'chol': , 'oldpeak': }, )
({'age': , 'thalach': , 'trestbps': , 'chol': , 'oldpeak': }, )

Dictionaries with Keras

Typically, Keras models and layers expect a single input tensor, but these classes can accept and return nested structures of dictionaries, tuples and tensors. These structures are known as "nests" (refer to the tf.nest module for details).
There are two equivalent ways you can write a Keras model that accepts a dictionary as input.

1. The Model-subclass style

You write a subclass of tf.keras.Model (or tf.keras.Layer). You directly handle the inputs, and create the outputs:

def stack_dict(inputs, fun=tf.stack):
    # Cast each column to float32 and combine the columns (in sorted key
    # order, so the feature order is deterministic) into a single tensor.
    values = []
    for key in sorted(inputs.keys()):
      values.append(tf.cast(inputs[key], tf.float32))

    return fun(values, axis=-1)
class MyModel(tf.keras.Model):
  def __init__(self):
    # Create all the internal layers in init.
    super().__init__()

    self.normalizer = tf.keras.layers.Normalization(axis=-1)

    self.seq = tf.keras.Sequential([
      self.normalizer,
      tf.keras.layers.Dense(10, activation='relu'),
      tf.keras.layers.Dense(10, activation='relu'),
      tf.keras.layers.Dense(1)
    ])

  def adapt(self, inputs):
    # Stack the inputs and `adapt` the normalization layer.
    inputs = stack_dict(inputs)
    self.normalizer.adapt(inputs)

  def call(self, inputs):
    # Stack the inputs
    inputs = stack_dict(inputs)
    # Run them through all the layers.
    result = self.seq(inputs)

    return result

model = MyModel()

model.adapt(dict(numeric_features))

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'],
              run_eagerly=True)

This model can accept either a dictionary of columns or a dataset of dictionary-elements for training:

model.fit(dict(numeric_features), target, epochs=5, batch_size=BATCH_SIZE)
Epoch 1/5
152/152 [==============================] - 3s 16ms/step - loss: 0.6437 - accuracy: 0.7261
Epoch 2/5
152/152 [==============================] - 2s 16ms/step - loss: 0.5672 - accuracy: 0.7261
Epoch 3/5
152/152 [==============================] - 2s 16ms/step - loss: 0.5162 - accuracy: 0.7261
Epoch 4/5
152/152 [==============================] - 2s 16ms/step - loss: 0.4815 - accuracy: 0.7360
Epoch 5/5
152/152 [==============================] - 2s 16ms/step - loss: 0.4577 - accuracy: 0.7492

numeric_dict_batches = numeric_dict_ds.shuffle(SHUFFLE_BUFFER).batch(BATCH_SIZE)
model.fit(numeric_dict_batches, epochs=5)
Epoch 1/5
152/152 [==============================] - 2s 14ms/step - loss: 0.4464 - accuracy: 0.7591
Epoch 2/5
152/152 [==============================] - 2s 14ms/step - loss: 0.4374 - accuracy: 0.7756
Epoch 3/5
152/152 [==============================] - 2s 14ms/step - loss: 0.4335 - accuracy: 0.7822
Epoch 4/5
152/152 [==============================] - 2s 14ms/step - loss: 0.4297 - accuracy: 0.7855
Epoch 5/5
152/152 [==============================] - 2s 14ms/step - loss: 0.4271 - accuracy: 0.7855

Here are the predictions for the first three examples:

model.predict(dict(numeric_features.iloc[:3]))
array([[[-0.34175086]],

       [[-0.13775246]],

       [[ 0.53759265]]], dtype=float32)

2. The Keras functional style

In the functional style, you create one tf.keras.Input for each column of the DataFrame, apply the same stacking and preprocessing to those symbolic inputs, and build the model from the result:
inputs = {}
for name, column in numeric_features.items():
  inputs[name] = tf.keras.Input(
      shape=(1,), name=name, dtype=tf.float32)

inputs
{'age': ,
 'thalach': ,
 'trestbps': ,
 'chol': ,
 'oldpeak': }
x = stack_dict(inputs, fun=tf.concat)

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(stack_dict(dict(numeric_features)))

x = normalizer(x)
x = tf.keras.layers.Dense(10, activation='relu')(x)
x = tf.keras.layers.Dense(10, activation='relu')(x)
x = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs, x)

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'],
              run_eagerly=True)
tf.keras.utils.plot_model(model, rankdir="LR", show_shapes=True)

You can train the functional model the same way as the model subclass:

model.fit(dict(numeric_features), target, epochs=5, batch_size=BATCH_SIZE)
Epoch 1/5
152/152 [==============================] - 2s 15ms/step - loss: 0.6568 - accuracy: 0.7261
Epoch 2/5
152/152 [==============================] - 2s 16ms/step - loss: 0.6069 - accuracy: 0.7261
Epoch 3/5
152/152 [==============================] - 2s 15ms/step - loss: 0.5605 - accuracy: 0.7261
Epoch 4/5
152/152 [==============================] - 2s 15ms/step - loss: 0.5224 - accuracy: 0.7261
Epoch 5/5
152/152 [==============================] - 2s 15ms/step - loss: 0.4906 - accuracy: 0.7261

numeric_dict_batches = numeric_dict_ds.shuffle(SHUFFLE_BUFFER).batch(BATCH_SIZE)
model.fit(numeric_dict_batches, epochs=5)
Epoch 1/5
152/152 [==============================] - 2s 15ms/step - loss: 0.4719 - accuracy: 0.7294
Epoch 2/5
152/152 [==============================] - 2s 15ms/step - loss: 0.4600 - accuracy: 0.7459
Epoch 3/5
152/152 [==============================] - 2s 15ms/step - loss: 0.4526 - accuracy: 0.7459
Epoch 4/5
152/152 [==============================] - 2s 15ms/step - loss: 0.4471 - accuracy: 0.7558
Epoch 5/5
152/152 [==============================] - 2s 15ms/step - loss: 0.4419 - accuracy: 0.7558

Full example

If you're passing a heterogeneous DataFrame to Keras, each column may need unique preprocessing. You could do this preprocessing directly in the DataFrame, but for a model to work correctly, inputs always need to be preprocessed the same way. So, the best approach is to build the preprocessing into the model. Keras preprocessing layers cover many common tasks.

Build the preprocessing head

In this dataset some of the "integer" features in the raw data are actually categorical indices. These indices are not really ordered numeric values (refer to the dataset description for details). Because these are unordered they are inappropriate to feed directly to the model; the model would interpret them as being ordered. To use these inputs you'll need to encode them, either as one-hot vectors or embedding vectors. The same applies to string-categorical features.
Note: If you have many features that need identical preprocessing, it's more efficient to concatenate them together before applying the preprocessing.
Binary features, on the other hand, do not generally need to be encoded or normalized.
Begin by creating a list of the features that fall into each group:

binary_feature_names = ['sex', 'fbs', 'exang']
categorical_feature_names = ['cp', 'restecg', 'slope', 'thal', 'ca']

The next step is to build a preprocessing model that will apply the appropriate preprocessing to each input and concatenate the results.
This section uses the Keras Functional API to implement the preprocessing. You start by creating one tf.keras.Input for each column of the dataframe:

inputs = {}
for name, column in df.items():
  if type(column[0]) == str:
    dtype = tf.string
  elif (name in categorical_feature_names or
        name in binary_feature_names):
    dtype = tf.int64
  else:
    dtype = tf.float32

  inputs[name] = tf.keras.Input(shape=(), name=name, dtype=dtype)
inputs
{'age': ,
 'sex': ,
 'cp': ,
 'trestbps': ,
 'chol': ,
 'fbs': ,
 'restecg': ,
 'thalach': ,
 'exang': ,
 'oldpeak': ,
 'slope': ,
 'ca': ,
 'thal': }

For each input you'll apply some transformations using Keras layers and TensorFlow ops. Each feature starts as a batch of scalars (shape=(batch,)). The output for each should be a batch of tf.float32 vectors (shape=(batch, n)). The last step will concatenate all those vectors together.

Binary inputs

Since the binary inputs don't need any preprocessing, just add the vector axis, cast them to float32 and add them to the list of preprocessed inputs:

preprocessed = []

for name in binary_feature_names:
  inp = inputs[name]
  inp = inp[:, tf.newaxis]
  float_value = tf.cast(inp, tf.float32)
  preprocessed.append(float_value)

preprocessed

Numeric inputs

As in the earlier section, you'll want to run these numeric inputs through a tf.keras.layers.Normalization layer before using them. The difference is that this time they're input as a dict. The code below collects the numeric features from the DataFrame, stacks them together and passes those to the Normalization.adapt method.

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(stack_dict(dict(numeric_features)))

The code below stacks the numeric features and runs them through the normalization layer.

numeric_inputs = {}
for name in numeric_feature_names:
  numeric_inputs[name]=inputs[name]

numeric_inputs = stack_dict(numeric_inputs)
numeric_normalized = normalizer(numeric_inputs)

preprocessed.append(numeric_normalized)

preprocessed

Categorical features

To use categorical features you'll first need to encode them into either binary vectors or embeddings. Since these features only contain a small number of categories, convert the inputs directly to one-hot vectors using the output_mode='one_hot' option, supported by both the tf.keras.layers.StringLookup and tf.keras.layers.IntegerLookup layers.
Here is an example of how these layers work:

vocab = ['a','b','c']
lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='one_hot')
# Index 0 is reserved for out-of-vocabulary values, so 'zzz' maps to the
# first one-hot position.
lookup(['c','a','a','b','zzz'])

vocab = [1,4,7,99]
lookup = tf.keras.layers.IntegerLookup(vocabulary=vocab, output_mode='one_hot')

# Unknown integers (such as -1 here) are also mapped to the out-of-vocabulary index.
lookup([-1,4,1])

To determine the vocabulary for each input, create a layer to convert that vocabulary to a one-hot vector:

for name in categorical_feature_names:
  vocab = sorted(set(df[name]))
  print(f'name: {name}')
  print(f'vocab: {vocab}\n')

  if type(vocab[0]) is str:
    lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='one_hot')
  else:
    lookup = tf.keras.layers.IntegerLookup(vocabulary=vocab, output_mode='one_hot')

  x = inputs[name][:, tf.newaxis]
  x = lookup(x)
  preprocessed.append(x)
name: cp
vocab: [0, 1, 2, 3, 4]

name: restecg
vocab: [0, 1, 2]

name: slope
vocab: [1, 2, 3]

name: thal
vocab: ['1', '2', 'fixed', 'normal', 'reversible']

name: ca
vocab: [0, 1, 2, 3]

Assemble the preprocessing head

At this point, preprocessed is just a Python list of all the preprocessing results, each with a shape of (batch_size, depth):

preprocessed

Concatenate all the preprocessed features along the depth axis, so each dictionary-example is converted into a single vector. The vector contains binary features, numeric features, and categorical one-hot features:

preprocessed_result = tf.concat(preprocessed, axis=-1)
preprocessed_result

Now create a model out of that calculation so it can be reused:

preprocessor = tf.keras.Model(inputs, preprocessed_result)
tf.keras.utils.plot_model(preprocessor, rankdir="LR", show_shapes=True)

To test the preprocessor, use the DataFrame.iloc accessor to slice the first example from the DataFrame. Then convert it to a dictionary and pass the dictionary to the preprocessor. The result is a single vector containing the binary features, normalized numeric features and the one-hot categorical features, in that order:

preprocessor(dict(df.iloc[:1]))

Create and train a model

Now build the main body of the model. Use the same structure as in the previous example: a couple of Dense rectified-linear layers and a Dense(1) output layer for the classification.

body = tf.keras.Sequential([
  tf.keras.layers.Dense(10, activation='relu'),
  tf.keras.layers.Dense(10, activation='relu'),
  tf.keras.layers.Dense(1)
])

Now put the two pieces together using the Keras functional API.

inputs
{'age': ,
 'sex': ,
 'cp': ,
 'trestbps': ,
 'chol': ,
 'fbs': ,
 'restecg': ,
 'thalach': ,
 'exang': ,
 'oldpeak': ,
 'slope': ,
 'ca': ,
 'thal': }
x = preprocessor(inputs)
x

result = body(x)
result

model = tf.keras.Model(inputs, result)

model.compile(optimizer='adam',
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])

This model expects a dictionary of inputs. The simplest way to pass it the data is to convert the DataFrame to a dict and pass that dict as the x argument to Model.fit:

history = model.fit(dict(df), target, epochs=5, batch_size=BATCH_SIZE)
Epoch 1/5
152/152 [==============================] - 1s 4ms/step - loss: 0.6572 - accuracy: 0.7294
Epoch 2/5
152/152 [==============================] - 1s 4ms/step - loss: 0.5407 - accuracy: 0.7261
Epoch 3/5
152/152 [==============================] - 1s 4ms/step - loss: 0.4357 - accuracy: 0.7261
Epoch 4/5
152/152 [==============================] - 1s 4ms/step - loss: 0.3783 - accuracy: 0.7261
Epoch 5/5
152/152 [==============================] - 1s 4ms/step - loss: 0.3467 - accuracy: 0.7591

Using tf.data works as well:

ds = tf.data.Dataset.from_tensor_slices((
    dict(df),
    target
))

ds = ds.batch(BATCH_SIZE)
import pprint

for x, y in ds.take(1):
  pprint.pprint(x)
  print()
  print(y)
{'age': ,
 'ca': ,
 'chol': ,
 'cp': ,
 'exang': ,
 'fbs': ,
 'oldpeak': ,
 'restecg': ,
 'sex': ,
 'slope': ,
 'thal': ,
 'thalach': ,
 'trestbps': }

tf.Tensor([0 1], shape=(2,), dtype=int64)
history = model.fit(ds, epochs=5)
Epoch 1/5
152/152 [==============================] - 1s 6ms/step - loss: 0.3249 - accuracy: 0.7822
Epoch 2/5
152/152 [==============================] - 1s 6ms/step - loss: 0.3092 - accuracy: 0.8053
Epoch 3/5
152/152 [==============================] - 1s 6ms/step - loss: 0.2961 - accuracy: 0.8449
Epoch 4/5
152/152 [==============================] - 1s 6ms/step - loss: 0.2844 - accuracy: 0.8416
Epoch 5/5
152/152 [==============================] - 1s 6ms/step - loss: 0.2757 - accuracy: 0.8548
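
The trained model outputs logits, since the loss was constructed with from_logits=True. To turn predictions into probabilities, apply a sigmoid yourself. A small optional sketch using the first three rows of the DataFrame:

# Predict on the first three patients and convert the logits to probabilities.
logits = model.predict(dict(df.iloc[:3]))
probabilities = tf.nn.sigmoid(logits)
print(probabilities)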