Keras FAQ

A list of frequently asked Keras questions.

General questions

Training-related questions

Modeling-related questions

General questions

How can I train a Keras model on multiple GPUs (on a single machine)?

There are two ways to run a single model on multiple GPUs: data parallelism and device parallelism. In most cases, what you need is most likely data parallelism.
1) Data parallelism
Data parallelism consists of replicating the target model once on each device, and using each replica to process a different fraction of the input data.

The best way to do data parallelism with Keras models is to use the tf.distribute API. Make sure to read our guide about using [tf.distribute](https://www.tensorflow.org/api_docs/python/tf/distribute) with Keras.
The gist of it is the following:
a) Instantiate a "distribution strategy" object, e.g. MirroredStrategy (which replicates your model on each available device and keeps the state of each model in sync):

strategy = tf.distribute.MirroredStrategy()

b) Create your model and compile it under the strategy's scope:

with strategy.scope():
    # This could be any kind of model -- Functional, subclass...
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.GlobalMaxPooling2D(),
        tf.keras.layers.Dense(10)
    ])
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

Note that it's important that all state variable creation should happen under the scope. So in case you create any additional variables, do that under the scope.
c) Call fit() with a tf.data.Dataset object as input. Distribution is broadly compatible with all callbacks, including custom callbacks. Note that this call does not need to be under the strategy scope, since it doesn't create new variables.

model.fit(train_dataset, epochs=12, callbacks=callbacks)

2) Model parallelism
Model parallelism consists of running different parts of the same model on different devices. It works best for models that have a parallel architecture, e.g. a model with two branches.
This can be achieved by using TensorFlow device scopes. Here is a quick example:

# Model where a shared LSTM is used to encode two different sequences in parallel
input_a = keras.Input(shape=(140, 256))
input_b = keras.Input(shape=(140, 256))

shared_lstm = keras.layers.LSTM(64)

# Process the first sequence on one GPU
with tf.device('/gpu:0'):
    encoded_a = shared_lstm(input_a)
# Process the next sequence on another GPU
with tf.device('/gpu:1'):
    encoded_b = shared_lstm(input_b)

# Concatenate results on CPU
with tf.device('/cpu:0'):
    merged_vector = keras.layers.concatenate(
        [encoded_a, encoded_b], axis=-1)

How can I distribute training across multiple machines?

TensorFlow 2 enables you to write code that is largely agnostic to how you will distribute it: any code that can run locally can be distributed to multiple workers and accelerators by only adding to it a distribution strategy (tf.distribute.Strategy) corresponding to your hardware of choice, without any other code changes.
This also applies to any Keras model: just add a tf.distribute distribution strategy scope enclosing the model building and compiling code, and the training will be distributed according to the tf.distribute distribution strategy.
For distributed training across multiple machines (as opposed to training that only leverages multiple devices on a single machine), there are two distribution strategies you could use: MultiWorkerMirroredStrategy and ParameterServerStrategy:

  • tf.distribute.MultiWorkerMirroredStrategy implements a synchronous CPU/GPU
    multi-worker solution to work with Keras-style model building and training loop,
    using synchronous reduction of gradients across the replicas.
  • tf.distribute.experimental.ParameterServerStrategy implements an asynchronous CPU/GPU
    multi-worker solution, where the parameters are stored on parameter servers, and
    workers update the gradients to parameter servers asynchronously.

Distributed training is somewhat more involved than single-machine multi-device training. With ParameterServerStrategy, you will need to launch a remote cluster of machines consisting of "workers" and "ps", each running a tf.distribute.Server, then run your Python program on a "chief" machine that holds a TF_CONFIG environment variable that specifies how to communicate with the other machines in the cluster. With MultiWorkerMirroredStrategy, you will run the same program on each of the chief and workers, again with a TF_CONFIG environment variable that specifies how to communicate with the cluster. From there, the workflow is similar to using single-machine training, with the main difference being that you will use ParameterServerStrategy or MultiWorkerMirroredStrategy as your distribution strategy.
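For illustration, a TF_CONFIG value for a two-worker setup might look like the following (host names and ports are placeholders; each machine sets its own task index):

import json
import os

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["host1.example.com:12345", "host2.example.com:12345"]
    },
    "task": {"type": "worker", "index": 0}  # this machine is worker 0
})
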
Importantly, you should:

  • Make sure your dataset is so configured that all workers in the cluster are able to
    efficiently pull data from it (e.g. if your cluster is running on Google Cloud,
    it’s a good idea to host your data on Google Cloud Storage).
  • Make sure your training is fault-tolerant
    (e.g. by configuring a keras.callbacks.BackupAndRestore callback).

Below, we provide a couple of code snippets that cover the basic workflow. For more information about CPU/GPU multi-worker training, see Multi-GPU and distributed training; for TPU training, see "How can I train a Keras model on TPU?".
With ParameterServerStrategy:

cluster_resolver = ...
if cluster_resolver.task_type in ("worker", "ps"):
  # Start a [`tf.distribute.Server`](https://www.tensorflow.org/api_docs/python/tf/distribute/Server) and wait.
  ...
elif cluster_resolver.task_type == "evaluator":
  # Run an (optional) side-car evaluation
  ...

# Otherwise, this is the coordinator that controls the training with the strategy.
strategy = tf.distribute.experimental.ParameterServerStrategy(
    cluster_resolver=...)
train_dataset = ...

with strategy.scope():
  model = tf.keras.Sequential([
      layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      layers.MaxPooling2D(),
      layers.Flatten(),
      layers.Dense(64, activation='relu'),
      layers.Dense(10, activation='softmax')
  ])
  model.compile(
      loss='sparse_categorical_crossentropy',
      optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
      metrics=['accuracy'],
      steps_per_execution=10)

model.fit(x=train_dataset, epochs=3, steps_per_epoch=100)

With MultiWorkerMirroredStrategy:

# By default `MultiWorkerMirroredStrategy` uses cluster information
# from `TF_CONFIG`, and "AUTO" collective op communication.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
train_dataset = get_training_dataset()
with strategy.scope():
  # Define and compile the model in the scope of the strategy. Doing so
  # ensures the variables created are distributed and initialized properly
  # according to the strategy.
  model = tf.keras.Sequential([
      layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      layers.MaxPooling2D(),
      layers.Flatten(),
      layers.Dense(64, activation='relu'),
      layers.Dense(10, activation='softmax')
  ])
  model.compile(
      loss='sparse_categorical_crossentropy',
      optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
      metrics=['accuracy'])
model.fit(x=train_dataset, epochs=3, steps_per_epoch=100)

How can I train a Keras model on TPU?

TPUs are a fast & efficient hardware accelerator for deep learning that is publicly available on Google Cloud. You can use TPUs via Colab, AI Platform (ML Engine), and Deep Learning VMs (provided the TPU_NAME environment variable is set on the VM).
Make sure to read the TPU usage guide first. Here's a quick summary:
After connecting to a TPU runtime (e.g. by selecting the TPU runtime in Colab), you will need to detect your TPU using a TPUClusterResolver, which automatically detects a linked TPU on all supported platforms:

tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
print('Running on TPU: ', tpu.cluster_spec().as_dict()['worker'])

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
strategy = tf.distribute.experimental.TPUStrategy(tpu)
print('Replicas: ', strategy.num_replicas_in_sync)

with strategy.scope():
    # Create your model here.
    ...

After the initial setup, the workflow is similar to using single-machine multi-GPU training, with the main difference being that you will use TPUStrategy as your distribution strategy.
Importantly, you should:

  • Make sure your dataset yields batches with a fixed static shape. A TPU graph can only process inputs with a constant shape.
  • Make sure you are able to read your data fast enough to keep the TPU utilized. Using the TFRecord format to store your data may be a good idea.
  • Consider running multiple steps of gradient descent per graph execution in order to keep the TPU utilized. You can do this via the experimental_steps_per_execution argument to compile(). It will yield a significant speedup for small models (see the sketch after this list).
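
For instance, when building the input pipeline with tf.data, you can drop the last incomplete batch so that every batch has the same static shape, and ask compile() to run several steps per execution. A small sketch (train_dataset and model are assumed to already exist; in older TF versions the compile() argument is named experimental_steps_per_execution):

batch_size = 128

# drop_remainder=True guarantees that every batch has exactly `batch_size`
# samples, which gives the TPU graph a constant input shape.
train_dataset = train_dataset.batch(batch_size, drop_remainder=True)

# Run several training steps per graph execution to keep the TPU busy.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              steps_per_execution=50)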

Where is the Keras configuration file stored?

The default directory where all Keras data is stored is:
$HOME/.keras/
For instance, for me, on a MacBook Pro, it's /Users/fchollet/.keras/.
Note that Windows users should replace $HOME with %USERPROFILE%.
In case Keras cannot create the above directory (e.g. due to permission issues), /tmp/.keras/ is used as a backup.
The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json. The default configuration file looks like this:

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

It contains the following fields:

  • The image data format to be used as default by image processing layers and utilities (either channels_last or channels_first).
  • The epsilon numerical fuzz factor to be used to prevent division by zero in some operations.
  • The default float data type.
  • The default backend. This is legacy; nowadays there is only TensorFlow.
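
These defaults can also be queried and overridden programmatically through the Keras backend utilities, for example:

import tensorflow as tf

print(tf.keras.backend.image_data_format())  # e.g. 'channels_last'
print(tf.keras.backend.floatx())             # e.g. 'float32'
print(tf.keras.backend.epsilon())            # e.g. 1e-07

# Override the defaults for the current program:
tf.keras.backend.set_image_data_format('channels_first')
tf.keras.backend.set_floatx('float32')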

Also, cached dataset files, such as those downloaded with get_file(), are stored by default in $HOME/.keras/datasets/, and cached model weights files from Keras Applications are stored by default in $HOME/.keras/models/.

How to do hyperparameter tuning with Keras?

We recommend using KerasTuner.
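As a rough sketch of what a KerasTuner search can look like (assuming the keras_tuner package is installed; x_train, y_train, x_val and y_val are placeholders for your own data):

import keras_tuner as kt
from tensorflow import keras


def build_model(hp):
    # The tuner will try different values of `units` within this range.
    units = hp.Int('units', min_value=32, max_value=256, step=32)
    model = keras.Sequential([
        keras.layers.Dense(units, activation='relu'),
        keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=10)
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))
best_model = tuner.get_best_models(num_models=1)[0]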

How can I obtain reproducible results using Keras during development?

During development of a model, sometimes it is useful to be able to obtain reproducible results from run to run in order to determine if a change in performance is due to an actual model or data change, or merely a result of a new random seed.
First, you need to set the PYTHONHASHSEED environment variable to 0 before the program starts (not within the program itself). This is necessary in Python 3.2.3 onwards to have reproducible behavior for certain hash-based operations (e.g., the item order in a set or a dict, see Python's documentation or issue #2280 for further details). One way to set the environment variable is when starting Python like this:

$ cat test_hash.py
print(hash("keras"))
$ python3 test_hash.py                   # non-reproducible hash (Python 3.2.3+)
8127205062320133199
$ python3 test_hash.py                   # non-reproducible hash (Python 3.2.3+)
3204480642156461591
$ PYTHONHASHSEED=0 python3 test_hash.py  # reproducible hash
4883664951434749476
$ PYTHONHASHSEED=0 python3 test_hash.py  # reproducible hash
4883664951434749476

Moreover, when running on a GPU, some operations have non-deterministic outputs, in particular tf.reduce_sum(). This is due to the fact that GPUs run many operations in parallel, so the order of execution is not always guaranteed. Due to the limited precision of floats, even adding several numbers together may give slightly different results depending on the order in which you add them. You can try to avoid the non-deterministic operations, but some may be created automatically by TensorFlow to compute the gradients, so it is much simpler to just run the code on the CPU. For this, you can set the CUDA_VISIBLE_DEVICES environment variable to an empty string, for example:

$ CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python your_program.py

The below snippet of code provides an example of how to obtain reproducible results:

import numpy as np
import tensorflow as tf
import random as python_random

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
python_random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

# Rest of code follows...

Note that you don't have to set seeds for individual initializers in your code if you do the steps above, because their seeds are determined by the combination of the seeds set above.

What are my options for saving models?

Note: it is not recommended to use pickle or cPickle to save a Keras model.
1) Whole-model saving (configuration + weights)
Whole-model saving means creating a file that will contain:

  • the architecture of the model, allowing you to re-create the model
  • the weights of the model
  • the training configuration (loss, optimizer)
  • the state of the optimizer, allowing you to resume training exactly where you left off.

The default and recommended format to use is the TensorFlow SavedModel format. In TensorFlow 2.0 and higher, you can just do: model.save(your_file_path).
For explicitness, you can also use model.save(your_file_path, save_format='tf').
Keras still supports its original HDF5-based saving format. To save a model in HDF5 format, use model.save(your_file_path, save_format='h5'). Note that this option is automatically used if your_file_path ends in .h5 or .keras. Please also see "How can I install HDF5 or h5py to save my models?" for instructions on how to install h5py.
After saving a model in either format, you can reinstantiate it via model = keras.models.load_model(your_file_path).
Example:

from tensorflow.keras.models import load_model

model.save('my_model')  # creates a SavedModel directory 'my_model'
del model  # deletes the existing model

# returns a compiled model
# identical to the previous one
model = load_model('my_model')

2) Weights-only saving
If you need to save the weights of a model, you can do so in HDF5 with the code below:

model.save_weights('my_model_weights.h5')

Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the same architecture:

model.load_weights('my_model_weights.h5')

If you need to load the weights into a different architecture (with some layers in common), for instance for fine-tuning or transfer learning, you can load them by layer name:

model.load_weights('my_model_weights.h5', by_name=True)

Example:

"""
Assuming the original model looks like this:

model = Sequential()
model.add(Dense(2, input_dim=3, name='dense_1'))
model.add(Dense(3, name='dense_2'))
...
model.save_weights(fname)
"""

# new model
model = Sequential()
model.add(Dense(2, input_dim=3, name='dense_1'))  # will be loaded
model.add(Dense(10, name='new_dense'))  # will not be loaded

# load weights from the first model; will only affect the first layer, dense_1.
model.load_weights(fname, by_name=True)

Please also see "How can I install HDF5 or h5py to save my models?" for instructions on how to install h5py.
3) Configuration-only saving (serialization)
If you only need to save the architecture of a model, and not its weights or its training configuration, you can do:

# save as JSON
json_string = model.to_json()

The generated JSON file is human-readable and can be manually edited if needed.
You can then build a fresh model from this data:

# model reconstruction from JSON:
from tensorflow.keras.models import model_from_json
model = model_from_json(json_string)

4) Handling custom layers (or other custom objects) in saved models
If the model you want to load includes custom layers or other custom classes or functions, you can pass them to the loading mechanism via the custom_objects argument:

from tensorflow.keras.models import load_model
# Assuming your model includes instance of an "AttentionLayer" class
model = load_model('my_model.h5', custom_objects={'AttentionLayer': AttentionLayer})

Alternatively, you can use a custom object scope:

from tensorflow.keras.utils import CustomObjectScope

with CustomObjectScope({'AttentionLayer': AttentionLayer}):
    model = load_model('my_model.h5')

Custom objects handling works the same way for load_model & model_from_json:

from tensorflow.keras.models import model_from_json
model = model_from_json(json_string, custom_objects={'AttentionLayer': AttentionLayer})

How can I install HDF5 or h5py to save my models?

In order to save your Keras models as HDF5 files, Keras uses the h5py Python package. It is a dependency of Keras and should be installed by default. On Debian-based distributions, you will have to additionally install libhdf5:

sudo apt-get install libhdf5-serial-dev

If you are unsure whether h5py is installed, you can open a Python shell and load the module via

import h5py

If it imports without error, it is installed; otherwise you can find detailed installation instructions here.

How should I cite Keras?

Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:

@misc{chollet2015keras,
  title={Keras},
  author={Chollet, Fran\c{c}ois and others},
  year={2015},
  howpublished={\url{https://keras.io}},
}

Training-related questions

What do “sample”, “batch”, and “epoch” mean?

Below are some common definitions that are necessary to know and understand to correctly utilize Keras fit():

  • Sample: one element of a dataset. For instance, one image is a sample in a convolutional network. One audio snippet is a sample for a speech recognition model.
  • Batch: a set of N samples. The samples in a batch are processed independently, in parallel. If training, a batch results in only one update to the model. A batch generally approximates the distribution of the input data better than a single input. The larger the batch, the better the approximation; however, it is also true that the batch will take longer to process and will still result in only one update. For inference (evaluate/predict), it is recommended to pick a batch size that is as large as you can afford without going out of memory (since larger batches will usually result in faster evaluation/prediction).
  • Epoch: an arbitrary cutoff, generally defined as “one pass over the entire dataset”, used to separate training into distinct phases, which is useful for logging and periodic evaluation.
    When using validation_data or validation_split with the fit method of Keras models, evaluation will be run at the end of every epoch.
    Within Keras, there is the ability to add callbacks specifically designed to be run at the end of an epoch. Examples of these are learning rate changes and model checkpointing (saving).
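
For example, with 1,000 training samples and a batch size of 32, one epoch consists of 32 weight updates (the last batch holding the remaining 8 samples). A minimal sketch:

import numpy as np
from tensorflow import keras

x = np.random.random((1000, 16))
y = np.random.randint(0, 10, size=(1000,))

model = keras.Sequential([keras.layers.Dense(10)])
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# 1,000 samples / batch size 32 -> 32 gradient updates per epoch, for 5 epochs.
model.fit(x, y, batch_size=32, epochs=5)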

Why is my training loss much higher than my testing loss?

A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time. They are reflected in the training-time loss but not in the test-time loss.
Besides, the training loss that Keras displays is the average of the losses for each batch of training data, over the current epoch. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. This can bring the epoch-wise average down. On the other hand, the test loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.

How can I use Keras with datasets that don’t fit in memory?

You should use the tf.data API to create tf.data.Dataset objects — an abstraction over a data pipeline that can pull data from local disk, from a distributed file system, from GCS, etc., as well as efficiently apply various data transformations.
For example, the utility tf.keras.preprocessing.image_dataset_from_directory will create a dataset that reads image data from a local directory. Likewise, the utility tf.keras.preprocessing.text_dataset_from_directory will create a dataset that reads text files from a local directory.
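For instance (a sketch; the directory path and image size are placeholders):

import tensorflow as tf

# Reads images from class subdirectories under 'path/to/main_directory'.
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    'path/to/main_directory', batch_size=64, image_size=(200, 200))

# Each element is a batch of images together with their integer labels.
for images, labels in dataset:
    print(images.shape)  # (64, 200, 200, 3)
    print(labels.shape)  # (64,)
    break
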
Dataset objects can be directly passed to fit(), or can be iterated over in a custom low-level training loop.

model.fit(dataset, epochs=10, validation_data=val_dataset)

How can I ensure my training run can recover from program interruptions?

To ensure the ability to recover from an interrupted training run at any time (fault tolerance), you should use a tf.keras.callbacks.experimental.BackupAndRestore callback that regularly saves your training progress, including the epoch number and weights, to disk, and loads it the next time you call Model.fit().

import tensorflow as tf
from tensorflow import keras

class InterruptingCallback(keras.callbacks.Callback):
  """A callback to intentionally introduce interruption to training."""
  def on_epoch_end(self, epoch, logs=None):
    if epoch == 15:
      raise RuntimeError('Interruption')

model = keras.Sequential([keras.layers.Dense(10)])
optimizer = keras.optimizers.SGD()
model.compile(optimizer, loss="mse")

x = tf.random.uniform((24, 10))
y = tf.random.uniform((24,))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).repeat().batch(2)

backup_callback = keras.callbacks.experimental.BackupAndRestore(
    backup_dir='/tmp/backup')
try:
  model.fit(dataset, epochs=20, steps_per_epoch=5,
            callbacks=[backup_callback, InterruptingCallback()])
except RuntimeError:
  print('***Handling interruption***')
  # This continues at the epoch where it left off.
  model.fit(dataset, epochs=20, steps_per_epoch=5,
            callbacks=[backup_callback])

Find out more in the callbacks documentation.

How can I interrupt training when the validation loss isn’t decreasing anymore?

You can use an EarlyStopping callback:

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x, y, validation_split=0.2, callbacks=[early_stopping])

Find out more in the callbacks documentation.

How can I freeze layers and do fine-tuning?

Setting the trainable attribute
All layers & models have a layer.trainable boolean attribute:

>>> layer = Dense(3)
>>> layer.trainable
True

On all layers & models, the trainable attribute can be set (to True or False). When set to False, the layer.trainable_weights attribute is empty:

>>> layer = Dense(3)
>>> layer.build(input_shape=(3, 3))  # Create the weights of the layer
>>> layer.trainable
True
>>> layer.trainable_weights
[<tf.Variable 'kernel:0' shape=(3, 3) dtype=float32, numpy=
array([[...]], dtype=float32)>, <tf.Variable 'bias:0' shape=(3,) dtype=float32, numpy=array([...], dtype=float32)>]
>>> layer.trainable = False
>>> layer.trainable_weights
[]

Setting the trainable attribute on a layer recursively sets it on all children layers (contents of self.layers).
1) When training with fit():
To do fine-tuning with fit(), you would:

  • Instantiate a base model and load pre-trained weights
  • Freeze that base model
  • Add trainable layers on top
  • Call compile() and fit()

Like this:

model = Sequential([
    ResNet50Base(input_shape=(32, 32, 3), weights='pretrained'),
    Dense(10),
])
model.layers[0].trainable = False  # Freeze ResNet50Base.

assert model.layers[0].trainable_weights == []  # ResNet50Base has no trainable weights.
assert len(model.trainable_weights) == 2  # Just the bias & kernel of the Dense layer.

model.compile(...)
model.fit(...)  # Train Dense while excluding ResNet50Base.

You can follow a similar workflow with the Functional API or the Model subclassing API. Make sure to call compile() after changing the value of trainable in order for your changes to be taken into account. Calling compile() will freeze the state of the training step of the model.
2) When using a custom training loop:
When writing a training loop, make sure to only update weights that are part of model.trainable_weights (and not all model.weights).

model = Sequential([
    ResNet50Base(input_shape=(32, 32, 3), weights='pretrained'),
    Dense(10),
])
model.layers[0].trainable = False  # Freeze ResNet50Base.

# Iterate over the batches of a dataset.
for inputs, targets in dataset:
    # Open a GradientTape.
    with tf.GradientTape() as tape:
        # Forward pass.
        predictions = model(inputs)
        # Compute the loss value for this batch.
        loss_value = loss_fn(targets, predictions)

    # Get gradients of loss wrt the *trainable* weights.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    # Update the weights of the model.
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

Interaction between trainable and compile()
Calling compile() on a model is meant to "freeze" the behavior of that model. This implies that the trainable attribute values at the time the model is compiled should be preserved throughout the lifetime of that model, until compile is called again. Hence, if you change any trainable value, make sure to call compile() again on your model for your changes to be taken into account.
For instance, if two models A & B share some layers, and:

  • Model A gets compiled
  • The trainable attribute value on the shared layers is changed
  • Model B is compiled

Then models A and B are using different trainable values for the shared layers. This mechanism is critical for most existing GAN implementations, which do:

discriminator.compile(...)  # the weights of `discriminator` should be updated when `discriminator` is trained
discriminator.trainable = False
gan.compile(...)  # `discriminator` is a submodel of `gan`, which should not be updated when `gan` is trained

What’s the difference between the training argument in call() and the trainable attribute?

training is a boolean argument in call that determines whether the call should be run in inference mode or training mode. For example, in training mode, a Dropout layer applies random dropout and rescales the output. In inference mode, the same layer does nothing. Example:

y = Dropout(0.5)(x, training=True)  # Applies dropout at training time *and* inference time

trainable is a boolean layer attribute that determines whether the trainable weights of the layer should be updated to minimize the loss during training. If layer.trainable is set to False, then layer.trainable_weights will always be an empty list. Example:

model = Sequential([
    ResNet50Base(input_shape=(32, 32, 3), weights='pretrained'),
    Dense(10),
])
model.layers[0].trainable = False  # Freeze ResNet50Base.

assert model.layers[0].trainable_weights == []  # ResNet50Base has no trainable weights.
assert len(model.trainable_weights) == 2  # Just the bias & kernel of the Dense layer.

model.compile(...)
model.fit(...)  # Train Dense while excluding ResNet50Base.

As you can see, "inference mode vs. training mode" and "layer weight trainability" are two very different concepts.
You could imagine the following: a dropout layer where the scaling factor is learned during training, via backpropagation. Let's name it AutoScaleDropout. This layer would simultaneously have a trainable state, and a different behavior in inference and training. Because the trainable attribute and the training call argument are independent, you can do the following:

layer = AutoScaleDropout(0.5)

# Applies dropout at training time *and* inference time
# *and* learns the scaling factor during training
y = layer(x, training=True)

assert len(layer.trainable_weights) == 1

# Applies dropout at training time *and* inference time
# with a *frozen* scaling factor

layer = AutoScaleDropout(0.5)
layer.trainable = False
y = layer(x, training=True)

Special case of the BatchNormalization layer
Consider a BatchNormalization layer in the frozen part of a model that's used for fine-tuning.
It has long been debated whether the moving statistics of the BatchNormalization layer should stay frozen or adapt to the new data. Historically, bn.trainable = False would only stop backprop but would not prevent the training-time statistics update. After extensive testing, we have found that it is usually better to freeze the moving statistics in fine-tuning use cases. Starting in TensorFlow 2.0, setting bn.trainable = False will also force the layer to run in inference mode.

This behavior only applies to BatchNormalization. For every other layer, weight trainability and "inference vs. training mode" remain independent.
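For example, a typical fine-tuning setup that keeps the BatchNormalization layers of a frozen base in inference mode could look like this (a sketch using keras.applications for illustration; the input size and classification head are placeholders):

from tensorflow import keras

base_model = keras.applications.ResNet50(weights='imagenet', include_top=False)
base_model.trainable = False  # Freeze all weights, including BatchNormalization.

inputs = keras.Input(shape=(150, 150, 3))
# Passing training=False keeps the frozen BatchNormalization layers in
# inference mode even while the outer model is training.
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(10)(x)
model = keras.Model(inputs, outputs)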

In fit(), how is the validation split computed?

If you set the validation_split argument in model.fit to e.g. 0.1, then the validation data used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn't shuffled before extracting the validation split, so the validation set is literally just the last x% of samples in the input you passed.
The same validation set is used for all epochs (within the same call to fit).
Note that the validation_split option is only available if your data is passed as NumPy arrays (not tf.data.Datasets, which are not indexable).
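In other words, passing validation_split=0.2 is roughly equivalent to slicing the arrays yourself (x, y and model are assumed to exist):

# Roughly what validation_split=0.2 does (no shuffling before the split):
num_val = int(0.2 * len(x))
x_train, y_train = x[:-num_val], y[:-num_val]
x_val, y_val = x[-num_val:], y[-num_val:]

model.fit(x_train, y_train, validation_data=(x_val, y_val))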

In fit(), is the data shuffled during training?

If you pass your data as NumPy arrays and if the shuffle argument in model.fit() is set to True (which is the default), the training data will be globally randomly shuffled at each epoch.
If you pass your data as a tf.data.Dataset object and if the shuffle argument in model.fit() is set to True, the dataset will be locally shuffled (buffered shuffling).
When using tf.data.Dataset objects, prefer shuffling your data beforehand (e.g. by calling dataset = dataset.shuffle(buffer_size)) so as to be in control of the buffer size.
Validation data is never shuffled.

What's the recommended way to monitor my metrics when training with fit()?

Loss values and metric values are reported via the default progress bar displayed by calls to fit(). However, staring at changing ASCII numbers in a console is not an optimal metric-monitoring experience. We recommend the use of TensorBoard, which will display nice-looking graphs of your training and validation metrics, regularly updated during training, which you can access from your browser.
You can use TensorBoard with fit() via the TensorBoard callback.
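A minimal sketch (model and dataset are assumed to exist):

from tensorflow import keras

# Write logs that TensorBoard can read; launch with `tensorboard --logdir=./logs`.
tensorboard_cb = keras.callbacks.TensorBoard(log_dir='./logs')
model.fit(dataset, epochs=10, callbacks=[tensorboard_cb])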

What if I need to customize what fit() does?

You have two options:
1) Subclass the Model class and override the train_step (and test_step) methods
This is a better option if you want to use custom update rules but still want to leverage the functionality provided by fit(), such as callbacks, efficient step fusing, etc.
Note that this pattern does not prevent you from building models with the Functional API, in which case you will use the class you created to instantiate the model with the inputs and outputs. Same goes for Sequential models, in which case you will subclass keras.Sequential and override its train_step instead of keras.Model.
The example below shows a Functional model with a custom train_step.

from tensorflow import keras
import tensorflow as tf
import numpy as np


class MyCustomModel(keras.Model):

    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred,
                                      regularization_losses=self.losses)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}


# Construct and compile an instance of MyCustomModel
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = MyCustomModel(inputs, outputs)
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])

# Just use `fit` as usual
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
model.fit(x, y, epochs=10)

You can also easily add support for sample weighting:

class MyCustomModel(keras.Model):

    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        if len(data) == 3:
            x, y, sample_weight = data
        else:
            sample_weight = None
            x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value.
            # The loss function is configured in `compile()`.
            loss = self.compiled_loss(y, y_pred,
                                      sample_weight=sample_weight,
                                      regularization_losses=self.losses)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update the metrics.
        # Metrics are configured in `compile()`.
        self.compiled_metrics.update_state(
            y, y_pred, sample_weight=sample_weight)

        # Return a dict mapping metric names to current value.
        # Note that it will include the loss (tracked in self.metrics).
        return {m.name: m.result() for m in self.metrics}


# Construct and compile an instance of MyCustomModel
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = MyCustomModel(inputs, outputs)
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])

# You can now use the sample_weight argument
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
sw = np.random.random((1000, 1))
model.fit(x, y, sample_weight=sw, epochs=10)

Similarly, you can also customize evaluation by overriding test_step:

class MyCustomModel(keras.Model):

    def test_step(self, data):
        # Unpack the data
        x, y = data
        # Compute predictions
        y_pred = self(x, training=False)
        # Update the metrics tracking the loss
        self.compiled_loss(
            y, y_pred, regularization_losses=self.losses)
        # Update the metrics.
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value.
        # Note that it will include the loss (tracked in self.metrics).
        return {m.name: m.result() for m in self.metrics}

2) Write a low-level custom training loop
This is a good option if you want to be in control of every last little detail. But it can be rather verbose. Example:

# Prepare an optimizer.
optimizer = tf.keras.optimizers.Adam()
# Prepare a loss function.
loss_fn = tf.keras.losses.kl_divergence

# Iterate over the batches of a dataset.
for inputs, targets in dataset:
    # Open a GradientTape.
    with tf.GradientTape() as tape:
        # Forward pass.
        predictions = model(inputs)
        # Compute the loss value for this batch.
        loss_value = loss_fn(targets, predictions)

    # Get gradients of loss wrt the weights.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    # Update the weights of the model.
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

This example does not include a lot of essential functionality like displaying a progress bar, calling callbacks, updating metrics, etc. You would have to do this yourself. It's not difficult at all, but it's a bit of work.

How can I train models in mixed precision?

Keras has built-in support for mixed precision training on GPU and TPU. See this extensive guide.
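As a quick sketch, in recent TF versions you can enable mixed precision globally before building your model (API location and names vary slightly across TF versions):

from tensorflow import keras

# Compute in float16 where possible, while keeping variables in float32.
keras.mixed_precision.set_global_policy('mixed_float16')

# Layers created after this point use the mixed precision policy.
model = keras.Sequential([
    keras.layers.Dense(256, activation='relu'),
    # Keep the final layer in float32 for numerical stability.
    keras.layers.Dense(10, dtype='float32'),
])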

What’s the difference between Model methods predict() and __call__()?

Let's answer with an excerpt from Deep Learning with Python, Second Edition:

Both y = model.predict(x) and y = model(x) (where x is an array of input data) mean "run the model on x and retrieve the output y." Yet they aren't exactly the same thing.
predict() loops over the data in batches (in fact, you can specify the batch size via predict(x, batch_size=64)), and it extracts the NumPy value of the outputs. It's schematically equivalent to this:

def predict(x):
    y_batches = []
    for x_batch in get_batches(x):
        y_batch = model(x_batch).numpy()
        y_batches.append(y_batch)
    return np.concatenate(y_batches)

This means that predict() calls can scale to very large arrays. Meanwhile, model(x) happens in-memory and doesn't scale. On the other hand, predict() is not differentiable: you cannot retrieve its gradient if you call it in a GradientTape scope.
You should use model(x) when you need to retrieve the gradients of the model call, and you should use predict() if you just need the output value. In other words, always use predict() unless you're in the middle of writing a low-level gradient descent loop (as we are now).

Modeling-related questions

How can I obtain the output of an intermediate layer (feature extraction)?

In the Functional API and Sequential API, if a layer has been called exactly once, you can retrieve its output via layer.output and its input via layer.input. This enables you to quickly instantiate feature-extraction models, like this one:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(32, 3, activation='relu'),
    layers.GlobalMaxPooling2D(),
    layers.Dense(10),
])
extractor = keras.Model(inputs=model.inputs,
                        outputs=[layer.output for layer in model.layers])
features = extractor(data)

Naturally, this is not possible with models that are subclasses of Model that override call.
Here's another example: instantiating a Model that returns the output of a specific named layer:

model = ...  # create the original model

layer_name = 'my_layer'
intermediate_layer_model = keras.Model(inputs=model.input,
                                       outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model(data)

How can I use pre-trained models in Keras?

You could leverage the models available in keras.applications, or the models available on TensorFlow Hub. TensorFlow Hub is well-integrated with Keras.
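For example, loading an ImageNet-pretrained model from keras.applications (a sketch):

from tensorflow import keras

# Full ImageNet classifier:
classifier = keras.applications.Xception(weights='imagenet')

# Or only the convolutional base, for feature extraction / fine-tuning:
base = keras.applications.Xception(weights='imagenet', include_top=False)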

How can I use stateful RNNs?

Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:

  • all batches have the same number of samples
  • If x1 and x2 are successive batches of samples, then x2[i] is the follow-up sequence to x1[i], for every i.

To use statefulness in RNNs, you need to:

  • explicitly specify the batch size you are using, by passing a batch_size argument to the first layer in your model. E.g. batch_size=32 for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
  • set stateful=True in your RNN layer(s).
  • specify shuffle=False when calling fit().

To reset the states accumulated:

  • use model.reset_states() to reset the states of all layers in the model
  • use layer.reset_states() to reset the states of a specific stateful RNN layer

Example:

from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

x = np.random.random((32, 21, 16))  # this is our input data, of shape (32, 21, 16)
# we will feed it to our model in sequences of length 10

model = keras.Sequential()
model.add(layers.LSTM(32, input_shape=(10, 16), batch_size=32, stateful=True))
model.add(layers.Dense(16, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# we train the network to predict the 11th timestep given the first 10:
model.train_on_batch(x[:, :10, :], np.reshape(x[:, 10, :], (32, 16)))

# the state of the network has changed. We can feed the follow-up sequences:
model.train_on_batch(x[:, 10:20, :], np.reshape(x[:, 20, :], (32, 16)))

# let's reset the states of the LSTM layer:
model.reset_states()

# another way to do it in this case:
model.layers[0].reset_states()

Note that the methods predict, fit, train_on_batch, etc. will all update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
