Lately I’ve been working on enabling Pylearn2 to iterate over variable-length sequences. In this post, I’ll discuss my progress so far.
Some types of models (such as convolutional or recurrent neural nets) naturally
deal with variable-length inputs. Unfortunately, for the moment, this type of
input is not well supported in Pylearn2: all
Space subclasses expect the data
to be a tensor whose first dimension is the batch axis and whose other
dimensions are of fixed size. This means a sequence of fixed-sized elements
cannot be stored in those spaces, because all time steps of the sequence would
be considered as separate examples.
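To make the problem concrete, here is a minimal sketch in plain Python. It is not Pylearn2 code: the helper `count_examples` is hypothetical and only mimics the convention that the first axis of the data is the batch axis.

```python
# Hypothetical illustration, not Pylearn2 code: a batch-first convention
# treats the first axis of any 2-D input as the batch axis.

def count_examples(batch):
    """Return how many examples a batch-first reading would see."""
    return len(batch)


# One sequence of 4 time steps, each a 3-dimensional feature vector.
sequence = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

# Intended meaning: 1 example. Batch-first reading: one example per time step.
n_seen = count_examples(sequence)
```

Under this convention the single sequence is read as four independent examples, so the temporal structure is lost.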
Even more fundamentally, there is no straightforward way to represent data
structures containing variable-length elements in Theano. This means that even if we
solve the Space problem in Pylearn2, we’re limited to batches of size 1 unless a
TypedList data structure is implemented in Theano.
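The batch-size restriction can be sketched without Theano. The snippet below is plain Python, and `is_rectangular` is a hypothetical helper: it checks whether a batch of sequences could be packed into one fixed-shape tensor, which is only guaranteed when the batch holds a single sequence.

```python
def is_rectangular(batch):
    """True if every sequence in the batch has the same number of time steps."""
    lengths = {len(seq) for seq in batch}
    return len(lengths) <= 1


# Two sequences with the same feature size (2) but different lengths.
short_seq = [[0.0, 1.0], [2.0, 3.0]]              # 2 time steps
long_seq = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]   # 3 time steps

mixed_ok = is_rectangular([short_seq, long_seq])  # ragged batch
single_ok = is_rectangular([long_seq])            # batch of size 1
```

A batch of size 1 is always rectangular, whereas mixing lengths needs a list-like container rather than a single tensor.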
I wrote two new Space subclasses (one of them being IndexSequenceSpace) to deal with variable-length sequences. They’re very similar to their fixed-size counterparts such as IndexSpace, with a few key differences:
- Because of Theano restrictions, an object living in a *SequenceSpace is considered to represent a single example, unlike e.g. VectorSpace, which considers objects as batches of examples.
- A *SequenceSpace expects objects living in its space to be matrices whose first dimension is time and whose second dimension represents a fixed-size state, e.g. a feature vector.
- In order to enforce the fact that we’re dealing with a single example, it is impossible to convert a *SequenceSpace to a *Space. Doing otherwise would give rise to confusing behaviour: by going from a sequence space to a VectorSpace, suddenly every time step of the sequence would be considered a separate example. The only conversion allowed is from an
- Some methods such as get_total_dimension() don’t make sense when dealing with variable-length sequences and are not implemented.
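The differences listed above can be sketched with a toy class. This is a hypothetical stand-in written in plain Python, not the actual Pylearn2 implementation: the class name `ToySequenceSpace` and its `validate` method are assumptions made for illustration, and only the `get_total_dimension` name comes from the post.

```python
class ToySequenceSpace:
    """Hypothetical stand-in for a sequence space; not the Pylearn2 class."""

    def __init__(self, dim):
        self.dim = dim  # fixed size of each time step's feature vector

    def validate(self, sequence):
        """Check that `sequence` is one example shaped (time, dim).

        Returns the number of time steps, which is free to vary.
        """
        for step in sequence:
            if len(step) != self.dim:
                raise ValueError(
                    "expected %d features per time step, got %d"
                    % (self.dim, len(step))
                )
        return len(sequence)

    def get_total_dimension(self):
        # The total size depends on the sequence length, so there is no
        # single answer for the space as a whole.
        raise NotImplementedError(
            "total dimension is undefined for variable-length sequences"
        )


space = ToySequenceSpace(dim=3)
n_steps_a = space.validate([[1, 2, 3], [4, 5, 6]])  # 2 time steps
n_steps_b = space.validate([[1, 2, 3]] * 5)         # 5 time steps
```

Both inputs are accepted as single examples of different lengths, while asking for a total dimension raises `NotImplementedError`.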
New TIMIT wrapper
I also wrote a new TIMIT wrapper called
TIMITSequences, which uses
IndexSequenceSpace to represent its data. Iterating
over this dataset returns whole sequences. These sequences are segmented into
frames of length frame_length and form matrices whose first dimension is time and
whose second dimension is what a sliding window of this length sees as it
passes through the sequence.
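As a rough illustration of this layout, the sketch below builds such a matrix in plain Python. The function `sliding_frames` is hypothetical and is not the wrapper's code, and it assumes the window advances one sample at a time, which the wrapper may or may not do.

```python
def sliding_frames(samples, frame_length):
    """Return a list of overlapping frames, one per time step.

    Row t holds samples[t : t + frame_length], so the result has
    len(samples) - frame_length + 1 rows and frame_length columns.
    Assumes a stride of one sample between consecutive frames.
    """
    if frame_length < 1:
        raise ValueError("frame_length must be positive")
    n_frames = len(samples) - frame_length + 1
    return [samples[t:t + frame_length] for t in range(max(n_frames, 0))]


frames = sliding_frames([10, 11, 12, 13, 14], frame_length=3)
# frames == [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
```

The number of rows depends on the length of the input sequence, while the number of columns is always frame_length.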
As a proof-of-concept, I also wrote a toy RNN model (which you can find here) to train on this dataset. I haven’t had time to play with it a lot, but I hope to find time to do so this week and next week and present some results in another blog post.