Remember my last post talking about improvements to the TIMIT dataset? Well here’s another big improvement: thanks to Laurent’s and David’s help, I was able to massively reduce memory footprint, which was the main thing on my to-do list for this week.
A memory-time trade-off
In order to quickly map example indexes to their actual location in the data, an array storing this information was computed and kept in memory upon instantiation. At first, this seems like the right thing to do: thanks to this, no matter which example you request, you’ll be able to get it in constant time.
The problem is that the number of possible training examples is huge: the validation set by itself roughly contains 24 million examples if you consider an example to be 100 consecutive audio samples followed by one target audio sample. This means even the array mapping example indexes to data location was big. The problem was particularly apparent when the frame length was small. Given that we are to predict the next acoustic sample based on the k previous ones plus the current phoneme (meaning our frame size is 1) as a first step, something had to be done.
The solution David, Laurent and I agreed on is to trade memory performance with time performance by computing the locations on-the-fly, and it turned out to work pretty well: now even working with a frame size of 1 is doable in terms of memory. Even better, the changes do not seem to impact performance significantly.
I encourage you to try the dataset (see here) and tell me if it works for you.Share