Good news: the pull request fixing a bug with Space classes got merged, which means we’re now able to combine phone information with acoustic samples.

In this post, I’ll show you how it’s done. Note: make sure that you have the latest version of Pylearn2 and of the TIMIT dataset for Pylearn2.

### Data specs, how do they work?

A given dataset might offer multiple inputs and multiple targets. Multiple parts of the learning pipeline in Pylearn2 require data in order to work: Model, Cost and Monitor all need input data and, optionally, target data. Furthermore, it is possible that they all require their own formatting for the data.

To bridge the gap between what a dataset offers and what the pipeline needs, and to minimize the number of TensorVariables created, Pylearn2 uses so-called data_specs, which serve two purposes:

• Describe what the dataset has to offer, and in which format.
• Describe which portion of the data a part of the learning pipeline needs, and in which format.

data_specs are (space, source) tuples, so they contain two types of information: spaces and sources. Sources are strings uniquely identifying a data source (e.g. 'features', 'targets', 'phones'). Spaces specify how these sources are formatted (e.g. VectorSpace, IndexSpace), and their nested structure corresponds to the nested structure of the sources. For instance, one valid data_specs could nest its sources as (('features', 'phones'), 'targets'), with its spaces nested the same way, and would mean that a part of the model is requesting examples to be a tuple containing

• a tuple of batches, one of shape (batch_size, 100) containing features and one of shape (batch_size, 62) containing a one-hot encoded phone index for the next acoustic sample to predict
• a batch of shape (batch_size, 1) containing targets, i.e. the next acoustic sample that needs to be predicted
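
The request described by the two bullets above can be written out roughly as follows. This is a sketch rather than a verbatim listing: `VectorSpace` and `CompositeSpace` here are small stand-ins for the Pylearn2 classes of the same name (so that the snippet runs without Pylearn2 installed), and `describe` is a hypothetical helper that only exists to print the shapes.

```python
from collections import namedtuple

# Stand-ins for pylearn2.space.VectorSpace and CompositeSpace (assumption:
# only the nesting and the dimensions matter for this illustration).
VectorSpace = namedtuple("VectorSpace", ["dim"])
CompositeSpace = namedtuple("CompositeSpace", ["components"])

# The space and the source share the same nested structure.
space = CompositeSpace([
    CompositeSpace([VectorSpace(dim=100), VectorSpace(dim=62)]),
    VectorSpace(dim=1),
])
source = (("features", "phones"), "targets")
data_specs = (space, source)


def describe(space, source, batch_size):
    """Hypothetical helper: pair each source name with its batch shape."""
    if isinstance(space, CompositeSpace):
        return tuple(describe(sub_space, sub_source, batch_size)
                     for sub_space, sub_source in zip(space.components, source))
    return (source, (batch_size, space.dim))


shapes = describe(space, source, batch_size=32)
print(shapes)
```

With the real library, the same nesting would be built from `pylearn2.space.CompositeSpace` and `pylearn2.space.VectorSpace`.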

Pylearn2 is smart enough to aggregate data_specs from all parts of the pipeline and create one single, non-redundant and flat data_specs that’s the union of all data_specs and which is used to create TensorVariables used throughout the pipeline. It is able to map those variables back to the nested representations specified by individual data_specs so that every part of the pipeline receives exactly what it needs in the requested format.
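
To make the aggregation step more concrete, here is a minimal sketch of the idea in plain Python. It is not Pylearn2's implementation (as far as I know, that job is handled by `DataSpecsMapping` in `pylearn2.utils.data_specs`); spaces are represented by plain strings, nested tuples stand in for CompositeSpace, and the function names `flatten`, `union` and `nest` are chosen for illustration.

```python
def flatten(space, source):
    """Return the list of (space, source) leaf pairs of a nested data_specs."""
    if isinstance(source, tuple):
        pairs = []
        for sub_space, sub_source in zip(space, source):
            pairs.extend(flatten(sub_space, sub_source))
        return pairs
    return [(space, source)]


def union(all_data_specs):
    """Merge several data_specs into one flat list without duplicates."""
    flat = []
    for space, source in all_data_specs:
        for pair in flatten(space, source):
            if pair not in flat:
                flat.append(pair)
    return flat


def nest(space, source, batches):
    """Rebuild the nested structure requested by one part of the pipeline."""
    if isinstance(source, tuple):
        return tuple(nest(sub_space, sub_source, batches)
                     for sub_space, sub_source in zip(space, source))
    return batches[(space, source)]


# Two parts of the pipeline asking for overlapping data.
model_specs = ((("vector(100)", "vector(62)"), "vector(1)"),
               (("features", "phones"), "targets"))
monitor_specs = (("vector(100)", "vector(1)"), ("features", "targets"))

flat_specs = union([model_specs, monitor_specs])

# One placeholder per unique (space, source) pair; in Pylearn2 these would
# be the TensorVariables shared by the whole pipeline.
batches = dict((pair, "batch of %s as %s" % (pair[1], pair[0]))
               for pair in flat_specs)

model_inputs = nest(model_specs[0], model_specs[1], batches)
```

Here `flat_specs` holds three pairs even though five were requested in total, and `model_inputs` has the same nesting as the model's own data_specs.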

### Data specs applied to Dataset sub-classes

Datasets implement a get_data_specs method which returns a flat data_specs containing what the dataset has to offer, and in which format. For instance, TIMIT’s data_specs looks like this:

### Data specs applied to Model sub-classes

In order for your model to receive the correct data, it needs to implement the following methods:

• get_input_space
• get_output_space
• get_input_source
• get_target_source

(For those of you who are curious, it is the Cost’s responsibility to provide the requested data_specs, and it does so by calling those four methods on the Model.)
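
As a rough sketch of how a Cost might assemble its data_specs from those four methods (my understanding of the default behaviour, not a copy of Pylearn2's code): `ToyModel` and `cost_data_specs` are hypothetical names, spaces are plain strings, and a tuple stands in for CompositeSpace.

```python
class ToyModel(object):
    """Stand-in exposing the four methods listed above."""

    def __init__(self, input_space, output_space):
        self.input_space = input_space
        self.output_space = output_space

    def get_input_space(self):
        return self.input_space

    def get_output_space(self):
        return self.output_space

    def get_input_source(self):
        return 'features'

    def get_target_source(self):
        return 'targets'


def cost_data_specs(model, supervised=True):
    """Build the (space, source) pair a cost would request for `model`."""
    if supervised:
        # A tuple stands in for CompositeSpace([input_space, output_space]).
        space = (model.get_input_space(), model.get_output_space())
        source = (model.get_input_source(), model.get_target_source())
        return (space, source)
    # An unsupervised cost only needs the inputs.
    return (model.get_input_space(), model.get_input_source())


model = ToyModel(input_space="vector(100)", output_space="vector(1)")
supervised_specs = cost_data_specs(model)
unsupervised_specs = cost_data_specs(model, supervised=False)
```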

Luckily for us, both get_input_space and get_output_space are implemented in the Model base class and return self.input_space and self.output_space respectively, so all that is needed is to give self.input_space and self.output_space the desired values when instantiating the Model. However, in Pylearn2’s current state, get_input_source and get_target_source return 'features' and 'targets' respectively, so they need to be overridden if we want anything other than those two sources.

### Data specs for the MLP framework

The current state of the MLP framework does not allow changing the sources to something other than 'features' and 'targets', but the following sub-classes will do what we want:
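
A minimal sketch of what such a sub-class might look like is given here. To keep it runnable without Pylearn2 installed, `BaseMLP` is a stand-in for `pylearn2.models.mlp.MLP`; the class name `MLPWithSource` and the `input_source` / `target_source` keyword arguments are assumptions made for illustration.

```python
class BaseMLP(object):
    """Stand-in for pylearn2.models.mlp.MLP, with hard-coded sources."""

    def __init__(self, layers=None):
        self.layers = layers if layers is not None else []

    def get_input_source(self):
        return 'features'

    def get_target_source(self):
        return 'targets'


class MLPWithSource(BaseMLP):
    """MLP whose input and target sources are set at construction time."""

    def __init__(self, *args, **kwargs):
        # Pop the extra arguments before handing the rest to the base class.
        self.input_source = kwargs.pop('input_source', 'features')
        self.target_source = kwargs.pop('target_source', 'targets')
        super(MLPWithSource, self).__init__(*args, **kwargs)

    def get_input_source(self):
        return self.input_source

    def get_target_source(self):
        return self.target_source


# Request previous acoustic samples together with the phone information.
mlp = MLPWithSource(layers=[], input_source=('features', 'phones'),
                    target_source='targets')
default_mlp = MLPWithSource(layers=[])
```

A nested input source such as ('features', 'phones') also needs a matching composite input space, so the first layer has to accept a tuple of inputs rather than a single batch.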

Combined with the following YAML file, you should finally be able to train with previous acoustic samples and the phone associated with the acoustic sample to predict:

Try it out and tell me if it works for you!