Iterating over variable-length sequences

Mar 19, 2014

RNNs using the TIMIT dataset

Lately I’ve been working on enabling Pylearn2 to iterate over variable-length sequences. In this post, I’ll discuss my progress so far.

The problem

Some types of models (such as convolutional or recurrent neural nets) naturally deal with variable-length inputs. Unfortunately, for the moment, this type of input is not well supported in Pylearn2: all Space subclasses expect the data to be a tensor whose first dimension is the batch axis and whose other dimensions are of fixed size. This means a variable-length sequence of fixed-size elements cannot be stored in those spaces: with time on the first axis, every time step of the sequence would be treated as a separate example.
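
To make the ambiguity concrete, here is a minimal NumPy sketch (not Pylearn2 code; the array and variable names are made up for illustration). A single sequence stored with time on the first axis is indistinguishable from a batch of independent examples under a batch-first convention.

```python
import numpy as np

# One hypothetical utterance: 5 time steps, each a 3-dimensional feature vector.
sequence = np.arange(15, dtype="float32").reshape(5, 3)

# Under a batch-first convention, axis 0 is read as the batch axis,
# so the 5 time steps look like 5 unrelated examples of dimension 3.
apparent_batch_size, feature_dim = sequence.shape

# Nothing in the array itself says "this is one example of length 5";
# that information has to come from the space describing it.
n_examples_intended = 1
```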

Even more fundamentally, there is no straightforward way to represent data structures containing variable-length elements in Theano. This means even if we solve the Space problem in Pylearn2, we’re limited to batches of size 1 unless some TypedList data structure is implemented in Theano.
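
The same constraint can be illustrated with NumPy, used here only as a stand-in for Theano tensors, which are likewise dense and rectangular. In this sketch (the helper `try_stack` is hypothetical), two sequences of different lengths cannot be stacked into one batch tensor, so they end up being processed one at a time.

```python
import numpy as np

# Two hypothetical sequences with the same feature size but different lengths.
short_seq = np.zeros((4, 3), dtype="float32")
long_seq = np.zeros((7, 3), dtype="float32")


def try_stack(sequences):
    """Return a (batch, time, features) array, or None if shapes disagree."""
    try:
        return np.stack(sequences)
    except ValueError:
        # np.stack requires all input arrays to have the same shape.
        return None


ragged_batch = try_stack([short_seq, long_seq])     # lengths differ
uniform_batch = try_stack([short_seq, short_seq])   # lengths agree

# Without a list-like container on the symbolic side, the fallback is to
# feed the sequences one by one, i.e. batches of size 1.
lengths_seen = [seq.shape[0] for seq in (short_seq, long_seq)]
```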

New spaces

I wrote two new Space subclasses (VectorSequenceSpace and IndexSequenceSpace) to deal with variable-length sequences. They’re very similar to the corresponding VectorSpace and IndexSpace, with a few key differences:

Because of Theano restrictions, an object living in a *SequenceSpace is considered to represent a single example, unlike e.g. VectorSpace, which considers objects as batches of examples.
A *SequenceSpace expects objects living in its space to be matrices whose first dimension is time and whose second dimension represents a fixed-size state, e.g. a feature vector.
To enforce the fact that we’re dealing with a single example, a *SequenceSpace cannot be converted into its non-sequence counterpart. Allowing that would give rise to confusing behaviour: going from a VectorSequenceSpace to a VectorSpace would suddenly make every time step of the sequence a separate example. The only conversion allowed is from an IndexSequenceSpace to a VectorSequenceSpace.
Some methods such as get_total_dimension() don’t make sense when dealing with variable-length sequences and are not implemented.
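
The rules above can be sketched with two toy classes. These are hypothetical stand-ins written in plain NumPy, not the actual Pylearn2 classes or their API: the names `ToyVectorSequenceSpace`, `ToyIndexSequenceSpace`, `validate` and `to_vector_sequence` are my own, and the sketch assumes one index per time step and a one-hot encoding for the index-to-vector conversion.

```python
import numpy as np


class ToyVectorSequenceSpace:
    """Hypothetical: one example = a (time, dim) matrix of any length."""

    def __init__(self, dim):
        self.dim = dim

    def validate(self, sequence):
        if sequence.ndim != 2:
            raise ValueError("expected a (time, dim) matrix")
        if sequence.shape[1] != self.dim:
            raise ValueError("second dimension must be %d" % self.dim)


class ToyIndexSequenceSpace:
    """Hypothetical: one example = a (time, 1) matrix of integer labels."""

    def __init__(self, max_labels):
        self.max_labels = max_labels

    def validate(self, sequence):
        if sequence.ndim != 2 or sequence.shape[1] != 1:
            raise ValueError("expected a (time, 1) matrix")
        if not np.issubdtype(sequence.dtype, np.integer):
            raise ValueError("indices must be integers")
        if sequence.size and (sequence.min() < 0
                              or sequence.max() >= self.max_labels):
            raise ValueError("index out of range")

    def to_vector_sequence(self, sequence):
        """The one allowed conversion: indices -> one-hot vector sequence."""
        self.validate(sequence)
        n_steps = sequence.shape[0]
        one_hot = np.zeros((n_steps, self.max_labels), dtype="float32")
        one_hot[np.arange(n_steps), sequence[:, 0]] = 1.0
        return one_hot


# Usage: a 3-step sequence of labels drawn from 4 classes.
labels = np.array([[2], [0], [3]])
vectors = ToyIndexSequenceSpace(max_labels=4).to_vector_sequence(labels)
ToyVectorSequenceSpace(dim=4).validate(vectors)  # passes: shape is (3, 4)
```

Note that neither toy class offers a conversion to a batch-style space or a total-dimension query, mirroring the restrictions listed above.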

New TIMIT wrapper

I also wrote a new TIMIT wrapper called TIMITSequences, which uses VectorSequenceSpace and IndexSequenceSpace to represent its data. Iterating over this dataset returns whole sequences. These sequences are segmented into frames of frame_length and form matrices whose first dimension is time and whose second dimension is what a sliding window of this length sees as it passes over the sequence.
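
As a rough illustration of that layout, here is a minimal NumPy sketch of the framing step. It is not the wrapper's actual code: the function name `frame_sequence` is hypothetical, and I'm assuming the window advances one sample at a time, which may differ from how the wrapper is configured.

```python
import numpy as np


def frame_sequence(samples, frame_length):
    """Turn a 1-D signal into a (time, frame_length) matrix of windows.

    Assumption for this sketch: the window moves by one sample per step.
    """
    samples = np.asarray(samples)
    if samples.ndim != 1:
        raise ValueError("expected a 1-D signal")
    n_frames = samples.shape[0] - frame_length + 1
    if n_frames < 1:
        raise ValueError("signal is shorter than frame_length")
    # Row t holds samples[t : t + frame_length].
    index = np.arange(n_frames)[:, None] + np.arange(frame_length)[None, :]
    return samples[index]


# Usage: a 6-sample toy signal and a window of length 3.
waveform = np.arange(6)
frames = frame_sequence(waveform, frame_length=3)
# frames has shape (4, 3); its first row is [0, 1, 2], its last is [3, 4, 5].
```

Signals of different lengths produce matrices with different numbers of rows but the same number of columns, which is exactly the shape a vector sequence space expects.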

As a proof-of-concept, I also wrote a toy RNN model (which you can find here) to train on this dataset. I haven’t had time to play with it a lot, but I hope to find time to do so this week and next week and present some results in another blog post.