emloop_tensorflow

Main emloop-tensorflow module exposing emloop_tensorflow.BaseModel, which allows defining emloop trainable models (networks).

Additional hooks, ops and util functions are available in the respective sub-modules.

The main design goal is to allow focusing on the model architecture while most of the boilerplate code is hidden from the user. In fact, in most cases one will override only a single method, emloop_tensorflow.BaseModel._create_model().

Classes

class emloop_tensorflow.BaseModel(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, restore_fallback=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Bases: models.AbstractModel

Emloop AbstractModel implementation for TensorFlow models.

To define an emloop trainable model in TensorFlow, derive your class from BaseModel and override the _create_model() method.

See the method references for additional customization options.

Inheritance diagram of BaseModel

SIGNAL_MEAN_NAME = 'signal_mean'

Name of the monitored signal mean tensor/output.

SIGNAL_VAR_NAME = 'signal_variance'

Name of the monitored signal variance tensor/output.

TRAINING_FLAG_NAME = 'el_is_training'

Training flag variable name.

TRAIN_OP_NAME = 'train_op'

Expected train op tensor name prefix.

__init__(dataset, log_dir, inputs, outputs, session_config=None, n_gpus=0, restore_from=None, optimizer=None, freeze=False, loss_name='loss', monitor=None, restore_fallback=None, clip_gradient=None, profile=False, keep_profiles=5, **kwargs)[source]

Create new emloop trainable TensorFlow model.

TF Graph, train ops etc. are constructed with the following procedure:

  1. Create tf.Graph and tf.Session with _create_session()
  2. Either create or restore the model with _create_model() or _restore_model() respectively
  3. Find input/output tensors
  4. Create train ops with _create_train_ops() unless they are already restored
  5. Find the train ops
  6. Create tf.Saver

Note

In most cases, it is not required to re-define the __init__ method for your models.

Tip

It is often useful to monitor signal/weights/gradients ranges, means and/or variances during training. The emloop-tensorflow base model provides monitoring of the feed-forward signal through the net. Simply set the monitor parameter to the name of the layers to be monitored (e.g. Conv2D or Relu). Layer activation means and variances (named signal_mean and signal_variance) will be included in the output.
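For instance, signal monitoring may be enabled from the configuration alone; a sketch (the model class name here is hypothetical):

```yaml
model:
  class: my_project.MyModel   # hypothetical model class
  monitor: Relu               # monitor all tensors whose names contain `Relu`
```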

Parameters:
  • dataset (Optional[AbstractDataset]) – dataset to be trained with
  • log_dir (Optional[str]) – path to the logging directory (wherein models should be saved)
  • inputs (List[str]) – model input names
  • outputs (List[str]) – model output names
  • session_config (Optional[dict]) – TF session configuration dict, see _create_session()
  • n_gpus (int) – number of GPUs to use
  • restore_from (Optional[str]) – path to directory from which the model is restored
  • restore_model_name – model name to be restored (e.g. model.ckpt)
  • optimizer – TF optimizer configuration dict
  • freeze – freeze the graph after each save
  • loss_name (str) – loss tensor name
  • monitor (Optional[str]) – monitor signal mean and variance of the tensors which names contain the specified value
  • restore_fallback (Optional[str]) – ignored arg. (allows training from configs saved by emloop where it is added)
  • clip_gradient (Optional[float]) – limit the absolute value of the gradient; set to None for no clipping
  • profile (bool) – if true, profile the speed of model inference and save profiles to the specified log_dir
  • keep_profiles (int) – how many profiles are saved
  • kwargs – additional kwargs forwarded to _create_model()
_create_model(**kwargs)[source]

Create your TensorFlow model.

Every model has to define:

  • loss tensor named according to given loss_name
  • input placeholders and output tensors named according to the specified input and output names

Warning

To support multi-GPU training, all the variables must be created with tf.get_variable and appropriate variable scopes.

Parameters:kwargs – model configuration as specified in the model section of the configuration file
Return type:None
_create_session(session_config)[source]

Create and return TF Session for this model.

By default the session is configured with tf.ConfigProto created with the given session_config as **kwargs. Nested dictionaries such as gpu_options or graph_options are handled automatically.

Parameters:session_config (Optional[dict]) – session configuration dict as specified in the config yaml
Return type:Session
Returns:TensorFlow session
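For example, a nested session configuration might look as follows (the values shown are illustrative; allow_soft_placement and gpu_options map to the corresponding tf.ConfigProto fields):

```yaml
model:
  session_config:
    allow_soft_placement: true
    gpu_options:
      allow_growth: true
```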
_create_train_ops(dependencies, optimizer_config)[source]

Create the train ops for training. In order to handle incomplete batches, there must be one train op for each number of non-empty towers. E.g. for 2-GPU training, one must define 2 train ops for 1 and 2 towers respectively. The train ops must be named train_op_1, train_op_2 etc., wherein the suffixed number stands for the number of towers.

By default the train ops are constructed in the following way:
  • optimizer is created from the model.optimizer configuration dict
  • REGULARIZATION_LOSSES collection is summed to regularization_loss
  • gradients minimizing the respective tower losses and regularization_loss are computed
  • for each number of non-empty towers
    • gradients of the respective towers are averaged and applied

To implement a custom behavior, override this method and create your own op named as TRAIN_OP_NAME.

example optimizer config
model:
    optimizer:
        class: RMSPropOptimizer
        learning_rate: 0.001
Parameters:
  • dependencies (List[List[Operation]]) – a list of dependent operations (e.g. batch normalization updates) for each number of towers
  • optimizer_config (Optional[dict]) – optimizer configuration dict
Return type:None

_initialize_variables(**kwargs)[source]

Initialize variables of your TensorFlow model.

By default variables are initialized randomly.

Tip

Override this method to load variables from some check-point and fine-tune the model.

Parameters:kwargs – model configuration as specified in the model section of the configuration file
Return type:None
_restore_checkpoint(checkpoint_path)[source]

Restore model from the given checkpoint_path.

Parameters:checkpoint_path (str) – full path to the checkpoint, e.g. my_dir/model_3.ckpt.
Return type:None
_restore_model(restore_from)[source]

Restore TF model from the given restore_from path and restore_model_name.

The model name can be derived if restore_from is a directory containing exactly one checkpoint or if its base name specifies a checkpoint.

Parameters:restore_from (str) – path to directory from which the model is restored, optionally including model filename
Return type:None
graph

TF graph object.

Return type:Graph
input_names

List of TF input tensor (placeholder) names.

Return type:List[str]
is_training

Training flag tensor.

This is useful for determining whether to use certain ops such as dropout.

Return type:Tensor
output_names

List of TF output tensor names.

Return type:List[str]
restore_fallback

Return the fully-qualified name of the fallback restore class (e.g. module.submodule.BaseClass).

When restoring a model, emloop tries to use the fallback class if the construction of the model object specified in model configuration section fails.

Return type:str
Returns:fully-qualified name of the fallback restore class
run(batch, train=False, stream=None)[source]

Run the model with the given batch. Update the trainable variables only if train is true.

Fetch and return all the model outputs as a dict.

Parameters:
  • batch (Mapping[str, Sequence[Any]]) – batch dict {source_name: values}
  • train (bool) – flag whether parameters update (train_op) should be included in fetches
  • stream (Optional[StreamWrapper]) – stream wrapper (useful for precise buffer management)
Raises:ValueError – if an output is wrongly typed or its batch size differs from the input batch size
Return type:Mapping[str, object]
Returns:outputs dict

save(name_suffix='')[source]

Save the current TensorFlow graph to a checkpoint named with the given name suffix.

The checkpoint will be located in the self.log_dir directory.

Parameters:name_suffix (str) – saved checkpoint name suffix
Return type:str
Returns:path to the saved checkpoint

session

TF session object.

Return type:Session
class emloop_tensorflow.FrozenModel(inputs, outputs, restore_from, log_dir=None, session_config=None, n_gpus=0, profile=False, keep_profiles=5, **_)[source]

Bases: models.AbstractModel

FrozenModel is an emloop-compatible abstraction for loading and running frozen TensorFlow graphs (.pb files).

In order to use it, just change the model.class configuration and invoke any emloop command such as emloop eval ....

using frozen model
# ...
model:
  class: emloop_tensorflow.FrozenModel
  # ...
Inheritance diagram of FrozenModel

__init__(inputs, outputs, restore_from, log_dir=None, session_config=None, n_gpus=0, profile=False, keep_profiles=5, **_)[source]

Initialize new FrozenModel instance.

Parameters:
  • log_dir (Optional[str]) – output directory
  • inputs (List[str]) – model input names
  • outputs (List[str]) – model output names
  • restore_from (str) – restore model path (either a dir or a .pb file)
  • session_config (Optional[dict]) – TF session configuration dict
  • n_gpus (int) – number of GPUs to use (either 0 or 1)
  • profile (bool) – if true, profile the speed of model inference and save profiles to the specified log_dir
  • keep_profiles (int) – how many profiles are saved
input_names

List of TF input tensor (placeholder) names.

Return type:List[str]
output_names

List of TF output tensor names.

Return type:List[str]
restore_fallback

Return the fully-qualified name of the fallback restore class (e.g. module.submodule.BaseClass).

When restoring a model, emloop tries to use the fallback class if the construction of the model object specified in model configuration section fails.

Return type:str
Returns:fully-qualified name of the fallback restore class
static restore_frozen_model(restore_from)[source]

Restore frozen TF model from the given restore_from path.

The model file name can be derived if restore_from is a directory containing exactly one .pb file or if its base name specifies a .pb file.

Parameters:restore_from (str) – path to directory from which the model is restored, optionally including model filename
Return type:None
run(batch, train=False, stream=None)[source]

Run the model with the given batch.

Fetch and return all the model outputs as a dict.

Warning

FrozenModel can not be trained.

Parameters:
  • batch (Mapping[str, Sequence[Any]]) – batch dict {source_name: values}
  • train (bool) – flag whether parameters update (train_op) should be included in fetches
  • stream (Optional[StreamWrapper]) – stream wrapper (useful for precise buffer management)
Return type:Mapping[str, object]
Returns:outputs dict

save(name_suffix='')[source]

Save the model (not implemented).

Return type:str