Combining acoustic samples and phone information
Mar 03, 2014
How to use the Pylearn2 TIMIT class with multiple sources
Good news: the pull request fixing a bug with Space classes got merged, which means we're now able to combine phone information with acoustic samples.
In this post, I'll show you how it's done. Note: make sure that you have the latest versions of Pylearn2 and of the TIMIT dataset for Pylearn2.
Data specs, how do they work?
A given dataset might offer multiple inputs and multiple targets. Multiple parts of the learning pipeline in Pylearn2 require data in order to work: the Model, the Cost and the Monitor all need input data and, optionally, target data. Furthermore, each of them may require its own formatting for the data.
In order to bridge the gap between what a dataset offers and what the pipeline needs, while minimizing the number of TensorVariables created, Pylearn2 uses so-called data_specs, which serve two purposes:
- Describe what the dataset has to offer, and in which format.
- Describe which portion of the data a part of the learning pipeline needs, and in which format.
data_specs have the structure (space, source): they are tuples which contain two types of information, spaces and sources. Sources are strings uniquely identifying a data source (e.g. 'features', 'targets', 'phones', etc.). Spaces specify how these sources are formatted (e.g. VectorSpace, IndexSpace, etc.), and their nested structure corresponds to the nested structure of the sources. For instance, one data_specs could pair the nested sources (('features', 'phones'), 'targets') with spaces nested the same way, which would mean that a part of the model is requesting examples to be a tuple containing
- a tuple of batches, one of shape (batch_size, 100) containing features and one of shape (batch_size, 62) containing a one-hot encoded phone index for the next acoustic sample to predict
- a batch of shape (batch_size, 1) containing targets, i.e. the next acoustic sample that needs to be predicted
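To make the nesting rule concrete, here is a plain-Python sketch that does not import Pylearn2: Vec and Composite are hypothetical stand-ins for VectorSpace and CompositeSpace, and batch_shapes is an illustrative helper. It walks the spaces and the sources together, returns the batch shapes listed above with the same nesting, and raises an error when the two nestings disagree.

```python
# Plain-Python sketch, no Pylearn2 import: Vec and Composite are hypothetical
# stand-ins for Pylearn2's VectorSpace and CompositeSpace.
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Vec:
    """Stand-in for a flat vector space of a given dimension."""
    dim: int


@dataclass(frozen=True)
class Composite:
    """Stand-in for a space made of nested sub-spaces."""
    components: Tuple["Space", ...]


Space = Union[Vec, Composite]
Source = Union[str, tuple]


def batch_shapes(space: Space, source: Source, batch_size: int):
    """Walk space and source in lockstep and return, with the same nesting,
    the (batch_size, dim) shape of the batch each source would produce."""
    if isinstance(space, Composite):
        if not isinstance(source, tuple) or len(source) != len(space.components):
            raise ValueError("space and source nesting do not match")
        return tuple(batch_shapes(sub_space, sub_source, batch_size)
                     for sub_space, sub_source in zip(space.components, source))
    if not isinstance(source, str):
        raise ValueError("a leaf space must be paired with a single source name")
    return (batch_size, space.dim)


# The example from the text: ((features, phones), targets).
space = Composite((Composite((Vec(100), Vec(62))), Vec(1)))
source = (("features", "phones"), "targets")
print(batch_shapes(space, source, batch_size=32))
# (((32, 100), (32, 62)), (32, 1))
```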
Pylearn2 is smart enough to aggregate the data_specs from all parts of the pipeline and create a single, non-redundant and flat data_specs that is the union of all of them, and which is used to create the TensorVariables used throughout the pipeline. It is then able to map those variables back to the nested representations specified by the individual data_specs, so that every part of the pipeline receives exactly what it needs in the requested format.
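The aggregation step can be sketched in plain Python as follows. This illustrates the idea only and is not Pylearn2's implementation: spaces are reduced to hypothetical strings such as 'vec100', and flatten, union_specs and nest are made-up helpers. Each distinct (source, space) leaf appears once in the flat spec, and nest rebuilds one requester's nested view from a list holding one value per flat entry.

```python
# Illustrative sketch only, not Pylearn2's implementation. Spaces are
# hypothetical strings such as "vec100"; nesting is expressed with tuples.


def flatten(space, source):
    """Yield (source_name, leaf_space) pairs in left-to-right order."""
    if isinstance(source, tuple):
        for sub_space, sub_source in zip(space, source):
            yield from flatten(sub_space, sub_source)
    else:
        yield (source, space)


def union_specs(specs):
    """Merge nested (space, source) specs into one flat, duplicate-free list."""
    flat = []
    for space, source in specs:
        for leaf in flatten(space, source):
            if leaf not in flat:
                flat.append(leaf)
    return flat


def nest(values, flat, space, source):
    """Rebuild one requester's nested view from one value per flat entry."""
    if isinstance(source, tuple):
        return tuple(nest(values, flat, sub_space, sub_source)
                     for sub_space, sub_source in zip(space, source))
    return values[flat.index((source, space))]


# Three hypothetical requesters, each with its own nested data_specs.
model_spec = (("vec100", "vec62"), ("features", "phones"))
cost_spec = ((("vec100", "vec62"), "vec1"), (("features", "phones"), "targets"))
monitor_spec = ("vec1", "targets")

flat = union_specs([model_spec, cost_spec, monitor_spec])
print(flat)  # [('features', 'vec100'), ('phones', 'vec62'), ('targets', 'vec1')]

# One placeholder per flat entry, standing in for the variables created once.
values = [f"var_{name}" for name, _ in flat]
print(nest(values, flat, *cost_spec))  # (('var_features', 'var_phones'), 'var_targets')
```

Keying the union on (source, space) pairs rather than on source names alone is a choice made for this sketch: it keeps one entry per requested format, so the same source asked for in two formats is not collapsed into one.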
Data specs applied to datasets
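Datasets implement a get_data_specs method which returns a flat data_specs describing what the dataset has to offer, and in which format. For instance, the TIMIT dataset's data_specs pairs a flat CompositeSpace with a flat tuple of source names, one per source it provides (such as 'features', 'targets' and 'phones').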
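A flat spec only says how the dataset stores each source, while a requester may ask for a different format, like the one-hot (batch_size, 62) phone batch in the example above. The sketch below assumes a hypothetical ToyPhoneDataset that stores phones as integer indices (it is not the TIMIT class) and shows the index-to-one-hot conversion in plain Python; this is roughly the kind of conversion involved when a source stored in an IndexSpace is requested as a VectorSpace.

```python
# Hypothetical toy dataset (not the TIMIT class): it stores phones as integer
# indices, while a model may request them as one-hot vectors.
# Spaces are simplified to ("kind", size) tuples for illustration.


class ToyPhoneDataset:
    def __init__(self, phones, num_phones=62):
        self.phones = phones  # one integer phone label per example
        self.num_phones = num_phones

    def get_data_specs(self):
        # Flat specs: a flat tuple of spaces and a flat tuple of source names.
        return (("index", self.num_phones),), ("phones",)


def index_to_one_hot(indices, max_labels):
    """Turn a batch of integer labels into one-hot rows of length max_labels."""
    rows = []
    for label in indices:
        if not 0 <= label < max_labels:
            raise ValueError(f"label {label} outside [0, {max_labels})")
        row = [0] * max_labels
        row[label] = 1
        rows.append(row)
    return rows


dataset = ToyPhoneDataset(phones=[3, 0, 61])
(space,), (source,) = dataset.get_data_specs()
one_hot = index_to_one_hot(dataset.phones, max_labels=space[1])
print(source, len(one_hot), len(one_hot[0]))  # phones 3 62
```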
Data specs applied to models
In order for your model to receive the correct data, it needs to implement four methods: get_input_space, get_output_space, get_input_source and get_target_source.
(For those of you who are curious, it is the Cost's responsibility to provide the requested data_specs, and it does so by calling those four methods on the model.)
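The parenthetical above can be illustrated with a plain-Python sketch. ToyModel and supervised_data_specs are hypothetical, and a plain pair of spaces stands in for a CompositeSpace; the function mimics a supervised cost that calls the four methods and pairs inputs with targets into one nested data_specs.

```python
# Plain-Python sketch, not Pylearn2 code: ToyModel and supervised_data_specs
# are hypothetical. A tuple of spaces stands in for a CompositeSpace.


class ToyModel:
    def __init__(self, input_space, output_space):
        self.input_space = input_space
        self.output_space = output_space

    def get_input_space(self):
        return self.input_space

    def get_output_space(self):
        return self.output_space

    def get_input_source(self):
        return "features"

    def get_target_source(self):
        return "targets"


def supervised_data_specs(model):
    """Build a nested (space, source) pair for a cost needing inputs and targets."""
    space = (model.get_input_space(), model.get_output_space())
    source = (model.get_input_source(), model.get_target_source())
    return space, source


model = ToyModel(input_space=("vec", 100), output_space=("vec", 1))
print(supervised_data_specs(model))
# ((('vec', 100), ('vec', 1)), ('features', 'targets'))
```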
Luckily for us, both get_input_space and get_output_space are implemented in the Model base class and return self.input_space and self.output_space respectively, so all that is needed is to give self.input_space and self.output_space the desired values when instantiating the model. However, in Pylearn2's current state, get_input_source and get_target_source return 'features' and 'targets' respectively, so they need to be overridden if we want any sources other than those two.
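The override can be sketched without Pylearn2 as follows. BaseModel is a hypothetical stand-in imitating the defaults just described (spaces stored as attributes, sources hard-coded to 'features' and 'targets'), and SourcedModel is a hypothetical subclass that makes both sources configurable while keeping those defaults.

```python
# Plain-Python sketch: BaseModel imitates the defaults described above and
# SourcedModel shows one way to override them. Neither is Pylearn2's code.


class BaseModel:
    def __init__(self, input_space=None, output_space=None):
        self.input_space = input_space
        self.output_space = output_space

    def get_input_space(self):
        return self.input_space

    def get_output_space(self):
        return self.output_space

    def get_input_source(self):
        return "features"

    def get_target_source(self):
        return "targets"


class SourcedModel(BaseModel):
    """Lets the caller pick the sources; falls back to the base defaults."""

    def __init__(self, *args, input_source="features", target_source="targets",
                 **kwargs):
        super().__init__(*args, **kwargs)
        self.input_source = input_source
        self.target_source = target_source

    def get_input_source(self):
        return self.input_source

    def get_target_source(self):
        return self.target_source


model = SourcedModel(input_space=(("vec", 100), ("vec", 62)),
                     output_space=("vec", 1),
                     input_source=("features", "phones"))
print(model.get_input_source(), model.get_target_source())
# ('features', 'phones') targets
```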
Data specs for the MLP framework
The current state of the MLP framework does not allow changing the sources to anything other than 'features' and 'targets', but sub-classes that override get_input_source and get_target_source will do what we want.
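One piece such sub-classes might need, assuming the network has to consume the ('features', 'phones') tuple from the earlier example, is a composite layer that routes each input component to its own sub-layer and reports the tuple of its sub-layers' sources as its input source. The classes below are hypothetical plain-Python stand-ins, not Pylearn2's MLP layers.

```python
# Plain-Python sketch with hypothetical classes (not Pylearn2's MLP layers):
# a composite layer that routes each component of a tuple input to its own
# sub-layer and exposes the matching nested input source.


class ScaleLayer:
    """Toy sub-layer: multiplies every value of its input by a constant."""

    def __init__(self, source, factor):
        self.source = source
        self.factor = factor

    def get_input_source(self):
        return self.source

    def fprop(self, state_below):
        return [self.factor * x for x in state_below]


class CompositeRoutingLayer:
    """Applies layers[i] to component i of a tuple input."""

    def __init__(self, layers):
        self.layers = list(layers)

    def get_input_source(self):
        return tuple(layer.get_input_source() for layer in self.layers)

    def fprop(self, state_below):
        if len(state_below) != len(self.layers):
            raise ValueError("expected one input component per sub-layer")
        return tuple(layer.fprop(component)
                     for layer, component in zip(self.layers, state_below))


layer = CompositeRoutingLayer([ScaleLayer("features", 2.0),
                               ScaleLayer("phones", 1.0)])
print(layer.get_input_source())              # ('features', 'phones')
print(layer.fprop(([0.5, 1.0], [0, 1, 0])))  # ([1.0, 2.0], [0.0, 1.0, 0.0])
```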
Combined with a YAML configuration file that uses these sub-classes, you should finally be able to train a model that takes previous acoustic samples together with the phone associated with the acoustic sample to predict.
Try it out and tell me if it works for you!