Combining acoustic samples and phones information
Mar 03, 2014
How to use the Pylearn2 TIMIT class with multiple sources
Good news: the pull request fixing a bug with Space classes got merged, which means we’re now able to combine phones information with acoustic samples. In this post, I’ll show you how it’s done.

Note: make sure that you have the latest version of Pylearn2 and of the TIMIT dataset for Pylearn2.
Data specs, how do they work?
A given dataset might offer multiple inputs and multiple targets. Multiple parts of the learning pipeline in Pylearn2 require data in order to work: Model, Cost and Monitor all need input data and, optionally, target data. Furthermore, it is possible that they all require their own formatting for the data.

In order to bridge between what a dataset offers and what the pipeline needs, and to minimize the number of TensorVariables created, Pylearn2 uses so-called data_specs, which serve two purposes:
- Describe what the dataset has to offer, and in which format.
- Describe which portion of the data a part of the learning pipeline needs, and in which format.
data_specs have the following structure: they are tuples which contain two types of information, spaces and sources. Sources are strings uniquely identifying a data source (e.g. 'features', 'targets', 'phones', etc.). Spaces specify how these sources are formatted (e.g. VectorSpace, IndexSpace, etc.), and their nested structure corresponds to the nested structure of the sources. For instance, one valid data_specs could be the one sketched below, and would mean that a part of the model is requesting examples to be a tuple containing

- a tuple of batches, one of shape (batch_size, 100) containing features and one of shape (batch_size, 62) containing a one-hot encoded phone index for the next acoustic sample to predict
- a batch of shape (batch_size, 1) containing targets, i.e. the next acoustic sample that needs to be predicted
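Written out, it could look like this (the nesting and the dimensions follow directly from the description above):

```python
from pylearn2.space import CompositeSpace, VectorSpace

# ((features, one-hot phones), targets)
data_specs = (
    CompositeSpace([
        CompositeSpace([VectorSpace(dim=100),    # previous acoustic samples
                        VectorSpace(dim=62)]),   # one-hot phone for the sample to predict
        VectorSpace(dim=1),                      # target: the next acoustic sample
    ]),
    (('features', 'phones'), 'targets')
)
```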
Pylearn2 is smart enough to aggregate data_specs from all parts of the pipeline and create one single, non-redundant and flat data_specs that’s the union of all data_specs and which is used to create the TensorVariables used throughout the pipeline. It is able to map those variables back to the nested representations specified by individual data_specs, so that every part of the pipeline receives exactly what it needs in the requested format.
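To get a feel for this flattening, here’s a small sketch using DataSpecsMapping from pylearn2.utils.data_specs, which is, as far as I can tell, the utility Pylearn2 uses internally for this; treat the exact calls as illustrative:

```python
from pylearn2.space import CompositeSpace, VectorSpace
from pylearn2.utils.data_specs import DataSpecsMapping

space = CompositeSpace([
    CompositeSpace([VectorSpace(dim=100), VectorSpace(dim=62)]),
    VectorSpace(dim=1),
])
source = (('features', 'phones'), 'targets')

mapping = DataSpecsMapping((space, source))
# Flat, non-redundant view used to build the TensorVariables
flat_source = mapping.flatten(source, return_tuple=True)
print(flat_source)                # -> ('features', 'phones', 'targets')
# ... and back to the nested view each pipeline component asked for
print(mapping.nest(flat_source))  # -> (('features', 'phones'), 'targets')
```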
Data specs applied to Dataset sub-classes
Datasets implement a get_data_specs method which returns a flat data_specs describing what the dataset has to offer, and in which format. For instance, TIMIT’s data_specs looks like this:
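I won’t reproduce the exact class here, but the idea is roughly the following (the real TIMIT class parameterizes the dimensions and may expose additional sources such as phonemes or words; the numbers below simply match the example used throughout this post):

```python
from pylearn2.space import CompositeSpace, IndexSpace, VectorSpace

# The dataset offers flat sources only. Note that phones are stored as
# indices (IndexSpace); Pylearn2 converts them into the one-hot VectorSpace
# a model requests when it formats the batches.
space = CompositeSpace([
    VectorSpace(dim=100),              # 'features': previous acoustic samples
    VectorSpace(dim=1),                # 'targets': next acoustic sample
    IndexSpace(max_labels=62, dim=1),  # 'phones': phone index for that sample
])
source = ('features', 'targets', 'phones')
data_specs = (space, source)
```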
Data specs applied to Model sub-classes
In order for your model to receive the correct data, it needs to implement the following methods:

- get_input_space
- get_output_space
- get_input_source
- get_target_source
(For those of you who are curious, it is the Cost’s responsibility to provide the requested data_specs, and it does so by calling those four methods on the Model.)
Luckily for us, both get_input_space and get_output_space are implemented in the Model base class and return self.input_space and self.output_space respectively, so all that is needed is to give self.input_space and self.output_space the desired values when instantiating the Model. However, in Pylearn2’s current state, get_input_source and get_target_source return 'features' and 'targets' respectively, so they need to be overridden if we want anything other than those two sources.
Data specs for the MLP framework
The current state of the MLP framework does not allow changing sources to something other than 'features' and 'targets', but the following sub-classes will do what we want:
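A minimal sketch of such a sub-class (the name MLPWithSource and its input_source/target_source constructor arguments are my own naming):

```python
from pylearn2.models.mlp import MLP


class MLPWithSource(MLP):
    """An MLP whose input and target sources can be chosen at construction time."""

    def __init__(self, *args, **kwargs):
        input_source = kwargs.pop('input_source', 'features')
        target_source = kwargs.pop('target_source', 'targets')
        # YAML hands us lists, but data_specs sources must be strings or tuples
        if isinstance(input_source, list):
            input_source = tuple(input_source)
        self.input_source = input_source
        self.target_source = target_source
        super(MLPWithSource, self).__init__(*args, **kwargs)

    def get_input_source(self):
        return self.input_source

    def get_target_source(self):
        return self.target_source
```

If the input space is a CompositeSpace, the first layer of the MLP also has to know what to do with a composite input, which may require a small sub-class of its own; I’m glossing over that detail here.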
Combined with the following YAML file, you should finally be able to train with previous acoustic samples and the phone associated with the acoustic sample to predict:
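The part of the YAML file that matters for the sources looks roughly like this (module paths, batch size and dimensions are placeholders, and I’m omitting the layers and the rest of the usual Train configuration):

```yaml
model: !obj:mlp_with_source.MLPWithSource {
    batch_size: 512,
    layers: [
        # your usual MLP layers; the first one must accept the composite input
    ],
    input_space: !obj:pylearn2.space.CompositeSpace {
        components: [
            !obj:pylearn2.space.VectorSpace { dim: 100 },
            !obj:pylearn2.space.VectorSpace { dim: 62 },
        ],
    },
    input_source: ['features', 'phones'],
    target_source: 'targets',
},
```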
Try it out and tell me if it works for you!