Iterating over variable-length sequences
Mar 19, 2014
RNNs using the TIMIT dataset
Lately I’ve been working on enabling Pylearn2 to iterate over variable-length sequences. In this post, I’ll discuss my progress so far.
The problem
Some types of models (such as convolutional or recurrent neural nets) naturally deal with variable-length inputs. Unfortunately, for the moment, this type of input is not well supported in Pylearn2: all `Space` subclasses expect the data to be a tensor whose first dimension is the batch axis and whose other dimensions are of fixed size. This means a sequence of fixed-size elements cannot be stored in those spaces, because all time steps of the sequence would be considered separate examples.

Even more fundamentally, there is no straightforward way to represent data structures containing variable-length elements in Theano. This means that even if we solve the `Space` problem in Pylearn2, we're limited to batches of size 1 unless some `TypedList` data structure is implemented in Theano.
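To make the problem concrete, here is a small NumPy sketch (my own illustration, not Pylearn2 code) of why variable-length sequences don't fit in a single fixed-shape tensor:

```python
import numpy as np

# Two sequences with a fixed feature size (3) but different lengths.
seq_a = np.random.randn(5, 3)  # 5 time steps
seq_b = np.random.randn(8, 3)  # 8 time steps

# A Space expects one tensor whose first axis is the batch axis;
# stacking these sequences fails because their time axes disagree.
try:
    batch = np.stack([seq_a, seq_b])
except ValueError as e:
    print(e)  # all input arrays must have the same shape

# Concatenating along time does produce a tensor, but the 13 rows are
# then indistinguishable from 13 separate examples, which is exactly
# the confusion described above.
flattened = np.concatenate([seq_a, seq_b])
print(flattened.shape)  # (13, 3)
```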
New spaces
I wrote two new `Space` subclasses (`VectorSequenceSpace` and `IndexSequenceSpace`) to deal with variable-length sequences. They're very similar to the corresponding `VectorSpace` and `IndexSpace`, with a few key differences:

- Because of Theano restrictions, an object living in a `*SequenceSpace` is considered to represent a single example, unlike e.g. `VectorSpace`, which considers objects as batches of examples.
- A `*SequenceSpace` expects objects living in its space to be matrices whose first dimension is time and whose second dimension represents a fixed-size state, e.g. a feature vector.
- In order to enforce the fact that we're dealing with a single example, it is impossible to convert a `*SequenceSpace` into a `*Space`. Doing otherwise would give rise to confusing behaviour: by going from a `VectorSequenceSpace` to a `VectorSpace`, suddenly every time step of the sequence would be considered a separate example. The only conversion allowed is from an `IndexSequenceSpace` to a `VectorSequenceSpace` (see the sketch after this list).
- Some methods, such as `get_total_dimension()`, don't make sense when dealing with variable-length sequences and are not implemented.
New TIMIT wrapper
I also wrote a new TIMIT wrapper called `TIMITSequences`, which uses `VectorSequenceSpace` and `IndexSequenceSpace` to represent its data. Iterating over this dataset returns whole sequences. These sequences are segmented into frames of length `frame_length` and form matrices whose first dimension is time and whose second dimension is what a sliding window of this length sees as it passes through the sequence.
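As an illustration of that segmentation step (using a hypothetical `segment_sequence` helper, not the wrapper's actual code):

```python
import numpy as np

def segment_sequence(sequence, frame_length):
    """One frame per row: slide a window of `frame_length` samples
    over a 1-D sequence, advancing one sample at a time."""
    n_frames = len(sequence) - frame_length + 1
    return np.array([sequence[i:i + frame_length] for i in range(n_frames)])

waveform = np.arange(8, dtype='float32')  # stand-in for a TIMIT utterance
frames = segment_sequence(waveform, frame_length=3)
print(frames.shape)  # (6, 3): first dimension is time
print(frames[:2])    # [[0. 1. 2.], [1. 2. 3.]]
```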
As a proof of concept, I also wrote a toy RNN model (which you can find here) to train on this dataset. I haven't had time to play with it much yet, but I hope to do so over the next two weeks and present some results in another blog post.
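For a sense of how a model consumes such a sequence, here's a minimal Theano RNN sketch that processes a single (time, `frame_length`) example; the sizes and weights are made up for illustration, and this is not the toy model itself:

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
frame_length, hidden_dim = 3, 4  # hypothetical sizes
rng = np.random.RandomState(0)
W_in = theano.shared(rng.randn(frame_length, hidden_dim).astype(floatX))
W_rec = theano.shared(rng.randn(hidden_dim, hidden_dim).astype(floatX))

# A single example from a *SequenceSpace: shape (time, frame_length).
x = T.matrix('x')

def step(x_t, h_tm1):
    # One recurrent step: new state from the current frame and past state.
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_tm1, W_rec))

# scan iterates over the time axis, so sequence length can vary freely.
h, _ = theano.scan(step, sequences=x, outputs_info=T.zeros((hidden_dim,)))
f = theano.function([x], h)
print(f(rng.randn(6, frame_length).astype(floatX)).shape)  # (6, 4)
```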