Iterating over variable-length sequences
Mar 19, 2014
RNNs using the TIMIT dataset
Lately I’ve been working on enabling Pylearn2 to iterate over variable-length sequences. In this post, I’ll discuss my progress so far.
The problem
Some types of models (such as convolutional or recurrent neural nets) naturally
deal with variable-length inputs. Unfortunately, for the moment, this type of
input is not well supported in Pylearn2: all Space subclasses expect the data
to be a tensor whose first dimension is the batch axis and whose other
dimensions are of fixed size. This means a sequence of fixed-sized elements
cannot be stored in those spaces, because all time steps of the sequence would
be considered as separate examples.
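To make the shape problem concrete, here is a small NumPy sketch (the frame dimensionality of 10 and the sequence lengths are arbitrary choices for illustration): two sequences of equal-width frames but different lengths cannot be stacked into the single rectangular batch tensor a space like VectorSpace expects.

```python
import numpy as np

# Two sequences of 10-dimensional frames, with different lengths.
seq_a = np.zeros((5, 10))   # 5 time steps
seq_b = np.zeros((8, 10))   # 8 time steps

# There is no rectangular (batch, time, features) tensor holding both:
try:
    np.stack([seq_a, seq_b])              # needs equal shapes
except ValueError as e:
    print(e)

# Concatenating along time "works", but then every time step is treated
# as a separate example, which is exactly the problem described above.
print(np.vstack([seq_a, seq_b]).shape)    # (13, 10)
```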
Even more fundamentally, there is no straightforward way to represent data
structures containing variable-length elements in Theano. This means even if we
solve the Space problem in Pylearn2, we’re limited to batches of size 1 unless
some TypedList data structure is implemented in Theano.
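In the meantime, a single sequence can itself play the role of a batch. Here is a minimal Theano sketch of that workaround; the mean-over-time computation is just a stand-in for a real model:

```python
import numpy as np
import theano
import theano.tensor as T

# One example at a time: a single sequence is a matrix (time, features),
# so the whole "batch" is just that one matrix.
x = T.matrix('sequence')
mean_frame = x.mean(axis=0)          # average over the time axis
f = theano.function([x], mean_frame)

floatX = theano.config.floatX
print(f(np.random.randn(5, 10).astype(floatX)))
print(f(np.random.randn(8, 10).astype(floatX)))  # any length works
```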
New spaces
I wrote two new Space subclasses (VectorSequenceSpace and
IndexSequenceSpace) to deal with variable-length sequences. They’re very
similar to the corresponding VectorSpace and IndexSpace, with a few key
differences:
- Because of Theano restrictions, an object living in a *SequenceSpace is considered to represent a single example, unlike e.g. VectorSpace, which considers objects as batches of examples.
- A *SequenceSpace expects objects living in its space to be matrices whose first dimension is time and whose second dimension represents a fixed-sized state, e.g. a feature vector (see the sketch after this list).
- In order to enforce the fact that we're dealing with a single example, it is impossible to convert a *SequenceSpace into a *Space. Doing otherwise would give rise to confusing behaviour: by going from a VectorSequenceSpace to a VectorSpace, suddenly every time step of the sequence would be considered as a separate example. The only conversion allowed is from an IndexSequenceSpace to a VectorSequenceSpace.
- Some methods, such as get_total_dimension(), don't make sense when dealing with variable-length sequences and are not implemented.
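To illustrate the contract these spaces enforce, here is a deliberately stripped-down, hypothetical sketch. It is not the actual Pylearn2 implementation, just the core idea that one object in the space is a single (time, dim) matrix whose number of rows is free:

```python
import numpy as np

class VectorSequenceSpaceSketch(object):
    """Hypothetical simplification: one object = one sequence."""

    def __init__(self, dim):
        self.dim = dim

    def validate(self, sequence):
        # Any number of time steps is fine; only the per-step
        # dimensionality is fixed.
        assert sequence.ndim == 2 and sequence.shape[1] == self.dim

space = VectorSequenceSpaceSketch(dim=10)
space.validate(np.zeros((5, 10)))   # 5 time steps: OK
space.validate(np.zeros((8, 10)))   # 8 time steps: OK
```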
New TIMIT wrapper
I also wrote a new TIMIT wrapper called TIMITSequences, which uses
VectorSequenceSpace and IndexSequenceSpace to represent its data. Iterating
over this dataset returns whole sequences. Each sequence is segmented into
frames of length frame_length, forming a matrix whose first dimension is time
and whose second dimension holds what a sliding window of that length sees as
it passes through the sequence.
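As a concrete illustration of the framing, here is a small NumPy sketch. The frame_sequence helper is hypothetical, and I'm assuming a hop of one sample between windows:

```python
import numpy as np

def frame_sequence(signal, frame_length):
    # Hypothetical helper: turn a 1-D signal into a (num_frames,
    # frame_length) matrix by sliding a window one sample at a time.
    num_frames = len(signal) - frame_length + 1
    return np.array([signal[i:i + frame_length]
                     for i in range(num_frames)])

signal = np.arange(6.0)              # a toy "waveform"
print(frame_sequence(signal, 3))
# [[0. 1. 2.]
#  [1. 2. 3.]
#  [2. 3. 4.]
#  [3. 4. 5.]]
```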
As a proof-of-concept, I also wrote a toy RNN model (which you can find here) to train on this dataset. I haven't had much time to experiment with it yet, but I hope to do so this week and next and present some results in another blog post.
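For readers curious what such a model can look like, here is a minimal vanilla-RNN sketch built on theano.scan. It is not the actual toy model, and the layer sizes are arbitrary:

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
n_in, n_hid = 10, 20                  # arbitrary sizes

# Shared parameters of a vanilla RNN.
W_in = theano.shared((0.01 * np.random.randn(n_in, n_hid)).astype(floatX))
W_rec = theano.shared((0.01 * np.random.randn(n_hid, n_hid)).astype(floatX))
b = theano.shared(np.zeros(n_hid, dtype=floatX))

x = T.matrix('x')                     # one sequence: (time, n_in)

def step(x_t, h_tm1):
    # One recurrence step: new hidden state from input and previous state.
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_tm1, W_rec) + b)

h, _ = theano.scan(step, sequences=x,
                   outputs_info=T.zeros((n_hid,), dtype=floatX))

f = theano.function([x], h[-1])       # final hidden state
print(f(np.random.randn(7, n_in).astype(floatX)).shape)   # (20,)
```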