<h1 id="who-should-read-this">Who should read this</h1>
<p>This tutorial is designed for anyone working with Theano who’s tired
of writing the same old boilerplate code over and over again. Do you have SGD
implementations scattered across every experiment file? Does Pylearn2 look
attractive, but does porting your Theano code to it seem like too much of an
investment? This tutorial is for you.</p>
<p>Some prior experience with Pylearn2 and its tutorials is strongly
recommended. If you’re completely new to Pylearn2, have a look at the
<a href="http://nbviewer.ipython.org/github/lisa-lab/pylearn2/blob/master/pylearn2/scripts/tutorials/softmax_regression/softmax_regression.ipynb">softmax regression tutorial</a>.</p>
<p>In my opinion, Pylearn2 is great for two things:</p>
<ul>
<li>It allows you to experiment with new ideas without much implementation
overhead. The library was built to be modular, and it aims to be usable
without an extensive knowledge of the codebase. Writing a new model from
scratch is usually pretty fast once you know what to do and where to look.</li>
<li>It has an interface (YAML) that decouples implementation from
experimental choices, allowing experiments to be written in a light
and readable fashion.</li>
</ul>
<p>Obviously, there is always a trade-off between being user-friendly and being
flexible, and Pylearn2 is no exception. For instance, users looking for a way to
work with sequential data might have a harder time getting started (although
this is something that’s being worked on).</p>
<p>In this post, I’ll assume that you have built a regression or classification
model with Theano and that the data it is trained on can be cast into two
matrices, one for training examples and one for training targets. People with
other use cases may need to work a little more (e.g. by figuring out how to put
their data inside Pylearn2), but I think the use case discussed here contains
useful information for anyone interested in porting a model to Pylearn2.</p>
<h1 id="how-i-work-with-pylearn2">How I work with Pylearn2</h1>
<p>I do my research exclusively using Pylearn2, but that doesn’t mean I use
or know everything in Pylearn2. In fact, I prototype new models in a very
Theano-like fashion: I write my model as a big monolithic block of hard-coded
Theano expressions, and I wrap that up in the minimal amount of code necessary
to be able to plug my model in Pylearn2. <strong>This bare minimum is what I intend to
teach here.</strong></p>
<p>Sure, every little change to the model is a pain, but it works, right? As I
explore new ideas and change the code, I gradually make it more flexible:
a hard-coded input dimension gets factored out as a constructor argument,
functions being composed are separated into layers, etc.</p>
<p>The <a href="https://vdumoulin.github.io/articles/introducing-vae">VAE framework</a> didn’t start out
like it is now: all I did was port what Joost van Amersfoort wrote in Theano
(see his code <a href="https://github.com/y0ast/Variational-Autoencoder/blob/master/Theano/VariationalAutoencoder.py">here</a>)
to Pylearn2 in order to reproduce the experiments in
<a href="http://arxiv.org/abs/1312.6114">(Kingma and Welling)</a>. Over time, I made the
code more modular and started reusing elements of the MLP framework, and at some
point it got to a state where I felt that it could be useful for other people.</p>
<p>I guess what I’m trying to convey here is that <strong>it’s alright to stick to the
bare minimum when developing a model for Pylearn2</strong>. Your code probably won’t
satisfy any other use cases than yours, but this is something that you can
change gradually as you go. There’s no need to make things any more complicated
than they should be when you start.</p>
<h1 id="the-bare-minimum">The bare minimum</h1>
<p>Let’s look at that <em>bare minimum</em>. It involves writing exactly two subclasses:</p>
<ul>
<li>One subclass of <code class="language-plaintext highlighter-rouge">pylearn2.costs.cost.Cost</code></li>
<li>One subclass of <code class="language-plaintext highlighter-rouge">pylearn2.models.model.Model</code></li>
</ul>
<p>No more than that? Nope. That’s it! Let’s have a look.</p>
<h2 id="it-all-starts-with-a-cost-expression">It all starts with a cost expression</h2>
<p>In the scenario I’m describing, your model maps an input to an output, the
output is compared with some ground truth using some measure of dissimilarity,
and the parameters of the model are changed to reduce this measure using
gradient information.</p>
<p>It is therefore natural that the object that interfaces between the model and
the training algorithm represents a cost. The base class for this object is
<code class="language-plaintext highlighter-rouge">pylearn2.costs.cost.Cost</code> and does three main things:</p>
<ul>
<li>It describes what data it needs to perform its duty and how it should be
formatted.</li>
<li>It computes the cost expression by feeding the input to the model and
receiving its output.</li>
<li>It differentiates the cost expression with respect to the model parameters
and returns the gradients to the training algorithm.</li>
</ul>
<p>What’s nice about <code class="language-plaintext highlighter-rouge">Cost</code> is that if you follow the guidelines I’m about
to describe, you only have to worry about the cost expression; the gradient part is all
handled by the <code class="language-plaintext highlighter-rouge">Cost</code> base class, and a very useful <code class="language-plaintext highlighter-rouge">DefaultDataSpecsMixin</code>
mixin class is provided to handle the data description part (more about that
when we look at the <code class="language-plaintext highlighter-rouge">Model</code> subclass).</p>
<p>Here’s what the subclass should look like:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pylearn2.costs.cost</span> <span class="kn">import</span> <span class="n">Cost</span><span class="p">,</span> <span class="n">DefaultDataSpecsMixin</span>
<span class="k">class</span> <span class="nc">MyCostSubclass</span><span class="p">(</span><span class="n">DefaultDataSpecsMixin</span><span class="p">,</span> <span class="n">Cost</span><span class="p">):</span>
<span class="c1"># Here it is assumed that we are doing supervised learning
</span> <span class="n">supervised</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">space</span><span class="p">,</span> <span class="n">source</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_data_specs</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
<span class="n">space</span><span class="p">.</span><span class="n">validate</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span> <span class="o">=</span> <span class="n">data</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">some_method_for_outputs</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="c1"># some loss measure involving outputs and targets
</span> <span class="k">return</span> <span class="n">loss</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">supervised</code> class attribute is used by <code class="language-plaintext highlighter-rouge">DefaultDataSpecsMixin</code> to know how
to specify the data requirements. If it is set to <code class="language-plaintext highlighter-rouge">True</code>, the cost will expect
to receive inputs and targets, and if it is set to <code class="language-plaintext highlighter-rouge">False</code>, the cost will expect
to receive inputs only. In the example, it is assumed that we are doing
supervised learning, so we set <code class="language-plaintext highlighter-rouge">supervised</code> to <code class="language-plaintext highlighter-rouge">True</code>.</p>
<p>The first two lines of <code class="language-plaintext highlighter-rouge">expr</code> do some basic input checking and should always be
included at the beginning of your <code class="language-plaintext highlighter-rouge">expr</code> method. Without going too much into
details, <code class="language-plaintext highlighter-rouge">space.validate(data)</code> will make sure that the data you get is the data
you requested (e.g. if you do supervised learning you need an input tensor
variable and a target tensor variable). How “what you need” is decided will be
covered when we look at the <code class="language-plaintext highlighter-rouge">Model</code> subclass.</p>
<p>In this case, <code class="language-plaintext highlighter-rouge">data</code> is a tuple containing the inputs as the first element and
the targets as the second element (once again, bear with me if everything isn’t
completely clear for the moment, you’ll understand soon enough).</p>
<p>We then get the model output by calling its <code class="language-plaintext highlighter-rouge">some_method_for_outputs</code> method,
whose name and behaviour are really for you to decide, as long as your <code class="language-plaintext highlighter-rouge">Cost</code>
subclass knows which method to call on the model.</p>
<p>Finally, we compute some loss measure on <code class="language-plaintext highlighter-rouge">outputs</code> and <code class="language-plaintext highlighter-rouge">targets</code> and return that
as the cost expression.</p>
<p>Note that things don’t have to be <em>exactly</em> like this. For instance, you could
want the model to have a method that takes inputs and targets as arguments and
returns the loss directly, and that would be perfectly fine, as sketched below.
All you need is some way to make your <code class="language-plaintext highlighter-rouge">Model</code> and <code class="language-plaintext highlighter-rouge">Cost</code> subclasses work together to produce
a cost expression in the end.</p>
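<p>Here’s a minimal sketch of that variant. The method name <code class="language-plaintext highlighter-rouge">loss_from_data</code> is
hypothetical; any name works as long as the model defines it and returns a
scalar Theano expression:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin

class DelegatingCost(DefaultDataSpecsMixin, Cost):
    # Variant in which the model computes the loss expression itself
    supervised = True

    def expr(self, model, data, **kwargs):
        space, source = self.get_data_specs(model)
        space.validate(data)
        inputs, targets = data
        # `loss_from_data` is a hypothetical method name; the model just
        # has to define it and return a scalar Theano expression
        return model.loss_from_data(inputs, targets)</code></pre></figure>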
<h2 id="defining-the-model">Defining the model</h2>
<p>Now it’s time to make things more concrete by writing the model itself. The
model will be a subclass of <code class="language-plaintext highlighter-rouge">pylearn2.models.model.Model</code>, which is responsible
for the following:</p>
<ul>
<li>Defining what its parameters are</li>
<li>Defining what its data requirements are</li>
<li>Doing something with the input to produce an output</li>
</ul>
<p>Like for <code class="language-plaintext highlighter-rouge">Cost</code>, the <code class="language-plaintext highlighter-rouge">Model</code> base class does lots of useful things on its own,
provided you set the appropriate instance attributes. Let’s have a look at a
subclass example:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pylearn2.models.model</span> <span class="kn">import</span> <span class="n">Model</span>
<span class="k">class</span> <span class="nc">MyModelSubclass</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">MyModelSubclass</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
<span class="c1"># Some parameter initialization using *args and **kwargs
</span> <span class="c1"># ...
</span> <span class="bp">self</span><span class="p">.</span><span class="n">_params</span> <span class="o">=</span> <span class="p">[</span>
<span class="c1"># List of all the model parameters
</span> <span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">input_space</span> <span class="o">=</span> <span class="c1"># Some `pylearn2.space.Space` subclass
</span> <span class="c1"># This one is necessary only for supervised learning
</span> <span class="bp">self</span><span class="p">.</span><span class="n">output_space</span> <span class="o">=</span> <span class="c1"># Some `pylearn2.space.Space` subclass
</span>
<span class="k">def</span> <span class="nf">some_method_for_outputs</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
<span class="c1"># Some computation involving the inputs</span></code></pre></figure>
<p>The first thing you should do if you’re overriding the constructor is call
the superclass’ constructor. Pylearn2 checks for that and will scold you if you
don’t.</p>
<p>You should then initialize your model parameters <strong>as shared variables</strong>:
Pylearn2 will build an updates dictionary for your model variables using
gradients returned by your cost. <em><strong>Protip: the <code class="language-plaintext highlighter-rouge">pylearn2.utils.sharedX</code> method
initializes a shared variable with the value and an optional name you provide.
This allows your code to be GPU-compatible without putting too much thought into
it.</strong></em> For instance, a weights matrix can be initialized this way:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span>
<span class="kn">from</span> <span class="nn">pylearn2.utils</span> <span class="kn">import</span> <span class="n">sharedX</span>
<span class="bp">self</span><span class="p">.</span><span class="n">W</span> <span class="o">=</span> <span class="n">sharedX</span><span class="p">(</span><span class="n">numpy</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">size1</span><span class="p">,</span> <span class="n">size2</span><span class="p">)),</span> <span class="s">'W'</span><span class="p">)</span></code></pre></figure>
<p>Put all your parameters in a list as the <code class="language-plaintext highlighter-rouge">_params</code> instance attribute. The
<code class="language-plaintext highlighter-rouge">Model</code> superclass defines a <code class="language-plaintext highlighter-rouge">get_params</code> method which returns <code class="language-plaintext highlighter-rouge">self._params</code>
for you, and that is the method called to get the model parameters when
<code class="language-plaintext highlighter-rouge">Cost</code> is computing the gradients.</p>
<p>Your <code class="language-plaintext highlighter-rouge">Model</code> subclass should also describe the data format it expects as input
(<code class="language-plaintext highlighter-rouge">self.input_space</code>) and the data format of the model’s output
(<code class="language-plaintext highlighter-rouge">self.output_space</code>, which is required only if you’re doing supervised
learning). These attributes should be instances of <code class="language-plaintext highlighter-rouge">pylearn2.space.Space</code> (and
generally are instances of <code class="language-plaintext highlighter-rouge">pylearn2.space.VectorSpace</code>, a subclass of
<code class="language-plaintext highlighter-rouge">pylearn2.space.Space</code> used to represent batches of vectors). Without getting
too much into details, this mechanism allows for automatic conversion between
different data formats (e.g. if your targets are stored as integer indexes in
the dataset but are required to be one-hot encoded by the model).</p>
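<p>To make the conversion mechanism concrete, here’s a small sketch using
<code class="language-plaintext highlighter-rouge">np_format_as</code> (I’m assuming targets stored as a single integer index per
example, as in the MNIST example below):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy
from pylearn2.space import IndexSpace, VectorSpace

# Targets stored as one integer index per example...
index_space = IndexSpace(max_labels=10, dim=1)
targets = numpy.array([[3], [7]])
# ...get converted to one-hot vectors when a VectorSpace is requested
one_hot_space = VectorSpace(dim=10)
one_hot_targets = index_space.np_format_as(targets, one_hot_space)
# one_hot_targets[0] is all zeros except for a one at index 3</code></pre></figure>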
<p>The <code class="language-plaintext highlighter-rouge">some_method_for_outputs</code> method is really where all the magic happens. Like
I said before, the name of the method doesn’t really matter, as long as your
<code class="language-plaintext highlighter-rouge">Cost</code> subclass knows that it’s the one it has to call. This method expects a
tensor variable as input and returns a symbolic expression involving the input
and its parameters. What happens in between is up to you, and this is where you
can put all the Theano code you could possibly hope for, just like you would do
in pure Theano scripts.</p>
<h1 id="show-me-examples">Show me examples</h1>
<p>So far we’ve only been hand-waving. Let’s put these ideas to use by writing two
models, one which does supervised learning and one which does unsupervised
learning.</p>
<p>The data you train these models on is up to you, as long as it’s represented in
a matrix of features (each row being an example) and a matrix of targets (each
row being a target for an example, obviously only required if you’re doing
supervised learning). Note that this isn’t the only way to get data into
Pylearn2, but it’s the one we’ll be using, as it’s likely to be most people’s
use case.</p>
<p>For the purpose of this tutorial, we’ll be training models on the venerable
MNIST dataset, which you can download as follows:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">wget http://deeplearning.net/data/mnist/mnist.pkl.gz</code></pre></figure>
<p>To make things easier to manipulate, we’ll decompress that file (e.g. via
<code class="language-plaintext highlighter-rouge">gunzip mnist.pkl.gz</code>) and split it into six different files:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">python <span class="nt">-c</span> <span class="s2">"from pylearn2.utils import serial; </span><span class="se">\</span><span class="s2">
data = serial.load('mnist.pkl'); </span><span class="se">\</span><span class="s2">
serial.save('mnist_train_X.pkl', data[0][0]); </span><span class="se">\</span><span class="s2">
serial.save('mnist_train_y.pkl', data[0][1].reshape((-1, 1))); </span><span class="se">\</span><span class="s2">
serial.save('mnist_valid_X.pkl', data[1][0]); </span><span class="se">\</span><span class="s2">
serial.save('mnist_valid_y.pkl', data[1][1].reshape((-1, 1))); </span><span class="se">\</span><span class="s2">
serial.save('mnist_test_X.pkl', data[2][0]); </span><span class="se">\</span><span class="s2">
serial.save('mnist_test_y.pkl', data[2][1].reshape((-1, 1)))"</span></code></pre></figure>
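<p>As a quick sanity check, you can verify the shapes of the resulting arrays
(the standard <code class="language-plaintext highlighter-rouge">mnist.pkl</code> split has 50,000 training, 10,000 validation and
10,000 test examples):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from pylearn2.utils import serial

X = serial.load('mnist_train_X.pkl')
y = serial.load('mnist_train_y.pkl')
# Expected output: (50000, 784) (50000, 1)
print X.shape, y.shape</code></pre></figure>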
<h2 id="supervised-learning-using-logistic-regression">Supervised learning using logistic regression</h2>
<p>Let’s keep things simple by porting to Pylearn2 what’s pretty much the <em>Hello
World!</em> of supervised learning: logistic regression. If you haven’t already, go
read the <a href="http://www.deeplearning.net/tutorial/logreg.html#logreg">deeplearning.net tutorial</a>
on logistic regression. Here’s what we have to do:</p>
<ul>
<li>Implement the negative log-likelihood (NLL) loss in our <code class="language-plaintext highlighter-rouge">Cost</code> subclass</li>
<li>Initialize the model parameters W and b</li>
<li>Implement the model’s logistic regression output</li>
</ul>
<p>Let’s start by the <code class="language-plaintext highlighter-rouge">Cost</code> subclass:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">theano.tensor</span> <span class="k">as</span> <span class="n">T</span>
<span class="kn">from</span> <span class="nn">pylearn2.costs.cost</span> <span class="kn">import</span> <span class="n">Cost</span><span class="p">,</span> <span class="n">DefaultDataSpecsMixin</span>
<span class="k">class</span> <span class="nc">LogisticRegressionCost</span><span class="p">(</span><span class="n">DefaultDataSpecsMixin</span><span class="p">,</span> <span class="n">Cost</span><span class="p">):</span>
<span class="n">supervised</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">space</span><span class="p">,</span> <span class="n">source</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_data_specs</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
<span class="n">space</span><span class="p">.</span><span class="n">validate</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">inputs</span><span class="p">,</span> <span class="n">targets</span> <span class="o">=</span> <span class="n">data</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">logistic_regression</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">targets</span> <span class="o">*</span> <span class="n">T</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">outputs</span><span class="p">)).</span><span class="nb">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">loss</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span></code></pre></figure>
<p>Easy enough. We assumed our model has a <code class="language-plaintext highlighter-rouge">logistic_regression</code> method which
accepts a batch of examples and computes the logistic regression output. We will
implement that method in just a moment. We also computed the loss as the average
negative log-likelihood of the targets given the logistic regression output, as
described in the deeplearning.net tutorial. Also, notice how we set <code class="language-plaintext highlighter-rouge">supervised</code>
to <code class="language-plaintext highlighter-rouge">True</code>.</p>
<p>Now for the <code class="language-plaintext highlighter-rouge">Model</code> subclass:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span>
<span class="kn">import</span> <span class="nn">theano.tensor</span> <span class="k">as</span> <span class="n">T</span>
<span class="kn">from</span> <span class="nn">pylearn2.models.model</span> <span class="kn">import</span> <span class="n">Model</span>
<span class="kn">from</span> <span class="nn">pylearn2.space</span> <span class="kn">import</span> <span class="n">VectorSpace</span>
<span class="kn">from</span> <span class="nn">pylearn2.utils</span> <span class="kn">import</span> <span class="n">sharedX</span>
<span class="k">class</span> <span class="nc">LogisticRegression</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nvis</span><span class="p">,</span> <span class="n">nclasses</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">LogisticRegression</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">nvis</span> <span class="o">=</span> <span class="n">nvis</span>
<span class="bp">self</span><span class="p">.</span><span class="n">nclasses</span> <span class="o">=</span> <span class="n">nclasses</span>
<span class="n">W_value</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">nvis</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">nclasses</span><span class="p">))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">W</span> <span class="o">=</span> <span class="n">sharedX</span><span class="p">(</span><span class="n">W_value</span><span class="p">,</span> <span class="s">'W'</span><span class="p">)</span>
<span class="n">b_value</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">nclasses</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="n">sharedX</span><span class="p">(</span><span class="n">b_value</span><span class="p">,</span> <span class="s">'b'</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">_params</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">W</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">input_space</span> <span class="o">=</span> <span class="n">VectorSpace</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">nvis</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">output_space</span> <span class="o">=</span> <span class="n">VectorSpace</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">nclasses</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">logistic_regression</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
<span class="k">return</span> <span class="n">T</span><span class="p">.</span><span class="n">nnet</span><span class="p">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">T</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">W</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span><span class="p">)</span></code></pre></figure>
<p>The model’s constructor receives the dimensionality of the input and the number
of classes. It initializes the weights matrix and the bias vector with
<code class="language-plaintext highlighter-rouge">sharedX</code>. It also sets its input space to an instance of <code class="language-plaintext highlighter-rouge">VectorSpace</code> of
the dimensionality of the input (meaning it expects the input to be a batch of
examples which are all vectors of size <code class="language-plaintext highlighter-rouge">nvis</code>) and its output space to an
instance of <code class="language-plaintext highlighter-rouge">VectorSpace</code> of dimension <code class="language-plaintext highlighter-rouge">nclasses</code> (meaning it produces an output
corresponding to a batch of probability vectors, one element for each possible
class).</p>
<p>The <code class="language-plaintext highlighter-rouge">logistic_regression</code> method does pretty much what you would expect: it
returns a linear transformation of the input followed by a softmax
non-linearity.</p>
<p>How about we give it a try? Save those two code snippets in a single file
(e.g. <code class="language-plaintext highlighter-rouge">log_reg.py</code>) and save the following in <code class="language-plaintext highlighter-rouge">log_reg.yaml</code>:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">!obj:pylearn2.train.Train {
dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_train_X.pkl',
y: !pkl: 'mnist_train_y.pkl',
y_labels: 10,
},
model: !obj:log_reg.LogisticRegression {
nvis: 784,
nclasses: 10,
},
algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
batch_size: 200,
learning_rate: 1e-3,
monitoring_dataset: {
'train' : *train,
'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_valid_X.pkl',
y: !pkl: 'mnist_valid_y.pkl',
y_labels: 10,
},
'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_test_X.pkl',
y: !pkl: 'mnist_test_y.pkl',
y_labels: 10,
},
},
cost: !obj:log_reg.LogisticRegressionCost {},
termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
max_epochs: 15
},
},
}</code></pre></figure>
<p>Run the following command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">python <span class="nt">-c</span> <span class="s2">"from pylearn2.utils import serial; </span><span class="se">\</span><span class="s2">
train_obj = serial.load_train_file('log_reg.yaml'); </span><span class="se">\</span><span class="s2">
train_obj.main_loop()"</span></code></pre></figure>
<p>Congratulations, you just implemented your first model in Pylearn2!</p>
<p><em>(By the way, the targets you used to initialize <code class="language-plaintext highlighter-rouge">DenseDesignMatrix</code> instances
were column matrices, yet your model expects to receive one-hot encoded vectors.
You can do that because Pylearn2 does the conversion for you
via the <code class="language-plaintext highlighter-rouge">data_specs</code> mechanism, as sketched earlier. That’s why specifying the model’s <code class="language-plaintext highlighter-rouge">input_space</code>
and <code class="language-plaintext highlighter-rouge">output_space</code> is important.)</em></p>
<h2 id="unsupervised-learning-using-an-autoencoder">Unsupervised learning using an autoencoder</h2>
<p>Let’s now have a look at an unsupervised learning example: an autoencoder with
tied weights. Once again, having read the <a href="http://www.deeplearning.net/tutorial/dA.html">deeplearning.net tutorial</a>
on the subject is recommended. Here’s what we’ll do:</p>
<ul>
<li>Implement the binary cross-entropy reconstruction loss in our <code class="language-plaintext highlighter-rouge">Cost</code> subclass</li>
<li>Initialize the model parameters W and b</li>
<li>Implement the model’s reconstruction logic</li>
</ul>
<p>Let’s start again by the <code class="language-plaintext highlighter-rouge">Cost</code> subclass:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">theano.tensor</span> <span class="k">as</span> <span class="n">T</span>
<span class="kn">from</span> <span class="nn">pylearn2.costs.cost</span> <span class="kn">import</span> <span class="n">Cost</span><span class="p">,</span> <span class="n">DefaultDataSpecsMixin</span>
<span class="k">class</span> <span class="nc">AutoencoderCost</span><span class="p">(</span><span class="n">DefaultDataSpecsMixin</span><span class="p">,</span> <span class="n">Cost</span><span class="p">):</span>
<span class="n">supervised</span> <span class="o">=</span> <span class="bp">False</span>
<span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="n">space</span><span class="p">,</span> <span class="n">source</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_data_specs</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
<span class="n">space</span><span class="p">.</span><span class="n">validate</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">data</span>
<span class="n">X_hat</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">reconstruct</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">X</span> <span class="o">*</span> <span class="n">T</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">X_hat</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">X</span><span class="p">)</span> <span class="o">*</span> <span class="n">T</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">X_hat</span><span class="p">)).</span><span class="nb">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">loss</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span></code></pre></figure>
<p>We assumed our model has a <code class="language-plaintext highlighter-rouge">reconstruct</code> method which encodes and decodes its
input. We also computed the loss as the average binary cross-entropy between the
input and its reconstruction. This time, however, we set <code class="language-plaintext highlighter-rouge">supervised</code> to
<code class="language-plaintext highlighter-rouge">False</code>.</p>
<p>Now for the <code class="language-plaintext highlighter-rouge">Model</code> subclass:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span>
<span class="kn">import</span> <span class="nn">theano.tensor</span> <span class="k">as</span> <span class="n">T</span>
<span class="kn">from</span> <span class="nn">pylearn2.models.model</span> <span class="kn">import</span> <span class="n">Model</span>
<span class="kn">from</span> <span class="nn">pylearn2.space</span> <span class="kn">import</span> <span class="n">VectorSpace</span>
<span class="kn">from</span> <span class="nn">pylearn2.utils</span> <span class="kn">import</span> <span class="n">sharedX</span>
<span class="k">class</span> <span class="nc">Autoencoder</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nvis</span><span class="p">,</span> <span class="n">nhid</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">Autoencoder</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
<span class="bp">self</span><span class="p">.</span><span class="n">nvis</span> <span class="o">=</span> <span class="n">nvis</span>
<span class="bp">self</span><span class="p">.</span><span class="n">nhid</span> <span class="o">=</span> <span class="n">nhid</span>
<span class="n">W_value</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">nvis</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">nhid</span><span class="p">))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">W</span> <span class="o">=</span> <span class="n">sharedX</span><span class="p">(</span><span class="n">W_value</span><span class="p">,</span> <span class="s">'W'</span><span class="p">)</span>
<span class="n">b_value</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">nhid</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="n">sharedX</span><span class="p">(</span><span class="n">b_value</span><span class="p">,</span> <span class="s">'b'</span><span class="p">)</span>
<span class="n">c_value</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">nvis</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">c</span> <span class="o">=</span> <span class="n">sharedX</span><span class="p">(</span><span class="n">c_value</span><span class="p">,</span> <span class="s">'c'</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">_params</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">W</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">c</span><span class="p">]</span>
<span class="bp">self</span><span class="p">.</span><span class="n">input_space</span> <span class="o">=</span> <span class="n">VectorSpace</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">nvis</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">reconstruct</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">T</span><span class="p">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">T</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">W</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span><span class="p">)</span>
<span class="k">return</span> <span class="n">T</span><span class="p">.</span><span class="n">nnet</span><span class="p">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">T</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">W</span><span class="p">.</span><span class="n">T</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">c</span><span class="p">)</span></code></pre></figure>
<p>The constructor looks a lot like the one from the logistic regression
example, except that this time we don’t need to specify the model’s output
space.</p>
<p>The <code class="language-plaintext highlighter-rouge">reconstruct</code> method simply encodes and decodes its input.</p>
<p>Let’s try to train it. Save the two code snippets in a single file (e.g.
<code class="language-plaintext highlighter-rouge">autoencoder.py</code>) and save the following in <code class="language-plaintext highlighter-rouge">autoencoder.yaml</code>:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">!obj:pylearn2.train.Train {
dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_train_X.pkl',
},
model: !obj:autoencoder.Autoencoder {
nvis: 784,
nhid: 200,
},
algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
batch_size: 200,
learning_rate: 1e-3,
monitoring_dataset: {
'train' : *train,
'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_valid_X.pkl',
},
'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_test_X.pkl',
},
},
cost: !obj:autoencoder.AutoencoderCost {},
termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
max_epochs: 15
},
},
}</code></pre></figure>
<p>Run the following command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">python <span class="nt">-c</span> <span class="s2">"from pylearn2.utils import serial; </span><span class="se">\</span><span class="s2">
train_obj = serial.load_train_file('autoencoder.yaml'); </span><span class="se">\</span><span class="s2">
train_obj.main_loop()"</span></code></pre></figure>
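<p>Once training finishes, you can inspect the reconstructions. Here’s a minimal
sketch, assuming you configured the <code class="language-plaintext highlighter-rouge">Train</code> object (e.g. via its <code class="language-plaintext highlighter-rouge">save_path</code>
argument) to pickle the model to <code class="language-plaintext highlighter-rouge">autoencoder.pkl</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import theano
from pylearn2.utils import serial

# 'autoencoder.pkl' is an assumed save path; adjust it to wherever your
# Train object serialized the model
model = serial.load('autoencoder.pkl')
# Build a symbolic batch matching the model's input space and compile a
# reconstruction function
X = model.get_input_space().make_theano_batch()
reconstruct = theano.function([X], model.reconstruct(X))</code></pre></figure>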
<h1 id="what-have-we-gained">What have we gained?</h1>
<p>At this point you might be thinking <em>“There’s still boilerplate code to write;
what have we gained?”</em></p>
<p>The answer is we gained access to a plethora of scripts, model parts, costs and
training algorithms all built into Pylearn2. You don’t have to re-invent the
wheel anymore when you wish to train using SGD and momentum. Want to switch
from SGD to BGD? In Pylearn2 this is as simple as changing the training
algorithm description in your YAML file, as sketched below.</p>
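<p>Here’s a hedged sketch of what that swap could look like (BGD’s constructor
arguments differ from SGD’s, so check
<code class="language-plaintext highlighter-rouge">pylearn2.training_algorithms.bgd.BGD</code>’s docstring before relying on this):</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">algorithm: !obj:pylearn2.training_algorithms.bgd.BGD {
    batch_size: 200,
    monitoring_dataset: { 'train': *train },
    termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
        max_epochs: 15
    },
},</code></pre></figure>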
<p>Like I said earlier, what I’m showing is the <strong>bare minimum</strong> needed to
implement a model in Pylearn2. Nothing prevents you from digging deeper in the
codebase and overriding some methods to gain new functionalities.</p>
<p>Here’s an example of how a few more lines of code can do a lot for you in
Pylearn2.</p>
<h2 id="monitoring-various-quantities-during-training">Monitoring various quantities during training</h2>
<p>Let’s monitor the classification error of our logistic regression classifier.</p>
<p>To do so, you’ll have to override <code class="language-plaintext highlighter-rouge">Model</code>’s <code class="language-plaintext highlighter-rouge">get_monitoring_data_specs</code> and
<code class="language-plaintext highlighter-rouge">get_monitoring_channels</code> methods. The former specifies what data the model needs for
its monitoring, and in which format it should be provided. The latter does the
actual monitoring by returning an <code class="language-plaintext highlighter-rouge">OrderedDict</code> mapping string identifiers to
Theano expressions for the monitored quantities.</p>
<p>Let’s look at how it’s done. Add the following to <code class="language-plaintext highlighter-rouge">LogisticRegression</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Keeps things compatible for Python 2.6
</span><span class="kn">from</span> <span class="nn">theano.compat.python2x</span> <span class="kn">import</span> <span class="n">OrderedDict</span>
<span class="kn">from</span> <span class="nn">pylearn2.space</span> <span class="kn">import</span> <span class="n">CompositeSpace</span>
<span class="k">class</span> <span class="nc">LogisticRegression</span><span class="p">(</span><span class="n">Model</span><span class="p">):</span>
<span class="c1"># (Your previous code)
</span>
<span class="k">def</span> <span class="nf">get_monitoring_data_specs</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">space</span> <span class="o">=</span> <span class="n">CompositeSpace</span><span class="p">([</span><span class="bp">self</span><span class="p">.</span><span class="n">get_input_space</span><span class="p">(),</span>
<span class="bp">self</span><span class="p">.</span><span class="n">get_target_space</span><span class="p">()])</span>
<span class="n">source</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">get_input_source</span><span class="p">(),</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_target_source</span><span class="p">())</span>
<span class="k">return</span> <span class="p">(</span><span class="n">space</span><span class="p">,</span> <span class="n">source</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_monitoring_channels</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="n">space</span><span class="p">,</span> <span class="n">source</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_monitoring_data_specs</span><span class="p">()</span>
<span class="n">space</span><span class="p">.</span><span class="n">validate</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">data</span>
<span class="n">y_hat</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">logistic_regression</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">error</span> <span class="o">=</span> <span class="n">T</span><span class="p">.</span><span class="n">neq</span><span class="p">(</span><span class="n">y</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">),</span> <span class="n">y_hat</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)).</span><span class="n">mean</span><span class="p">()</span>
<span class="k">return</span> <span class="n">OrderedDict</span><span class="p">([(</span><span class="s">'error'</span><span class="p">,</span> <span class="n">error</span><span class="p">)])</span></code></pre></figure>
<p>The content of <code class="language-plaintext highlighter-rouge">get_monitoring_data_specs</code> may look cryptic at first.
Documentation for data specs can be found
<a href="http://deeplearning.net/software/pylearn2/internal/data_specs.html">here</a>, but
all you have to know is that this is the standard way in Pylearn2 to request a
tuple whose first element represents features and whose second element
represents targets.</p>
<p>The content of <code class="language-plaintext highlighter-rouge">get_monitoring_channels</code> should look more familiar. We start by
checking <code class="language-plaintext highlighter-rouge">data</code> just as in <code class="language-plaintext highlighter-rouge">Cost</code> subclasses’ implementation of <code class="language-plaintext highlighter-rouge">expr</code>, and we
separate <code class="language-plaintext highlighter-rouge">data</code> into features and targets. We then get predictions by
calling <code class="language-plaintext highlighter-rouge">logistic_regression</code> and compute the average error the standard way.
We return an <code class="language-plaintext highlighter-rouge">OrderedDict</code> mapping <code class="language-plaintext highlighter-rouge">'error'</code> to the Theano expression for the
classification error.</p>
<p>Launch training again using</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">python <span class="nt">-c</span> <span class="s2">"from pylearn2.utils import serial; </span><span class="se">\</span><span class="s2">
train_obj = serial.load_train_file('log_reg.yaml'); </span><span class="se">\</span><span class="s2">
train_obj.main_loop()"</span></code></pre></figure>
<p>and you’ll see the classification error being displayed with other monitored
quantities.</p>
<h1 id="whats-next">What’s next?</h1>
<p>The examples given in this tutorial are obviously very simplistic and could be
easily replaced by existing parts of Pylearn2. They do, however, show the path
one needs to take to implement arbitrary ideas in Pylearn2.</p>
<p>In order not to reinvent the wheel, it is oftentimes useful to dig into
Pylearn2’s codebase to see what’s implemented. For instance, the VAE framework
I wrote relies on the MLP framework to represent the mapping from inputs to
conditional distribution parameters.</p>
<p>Although code reuse is desirable, the ease with which it can be accomplished
depends a lot on your level of familiarity with Pylearn2 and how
different your model is from what’s already in there. You should never feel
ashamed to dump a bunch of Theano code inside a <code class="language-plaintext highlighter-rouge">Model</code> subclass’ method like I
showed here if that’s what works for you. Modularity and code reuse can be
brought to your code gradually and at your own pace, and in the meantime you can
still benefit from Pylearn2’s features, like human-readable experiment
descriptions, automatic monitoring of various quantities, easily-interchangeable
training algorithms and so on.</p>
<p><a href="https://vdumoulin.github.io/articles/extending-pylearn2">Your models in Pylearn2</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on October 10, 2014.</p>
<p><em>After quite some time spent on the pull request, I’m proud to announce that
the VAE model is now integrated in Pylearn2. In this post, I’ll go over the
main features of the VAE framework and how to extend it. I will assume the
reader is familiar with the VAE model. If not, have a look at my <a href="https://vdumoulin.github.io/articles/vae-demo">VAE demo
webpage</a> as well as the
<a href="http://arxiv.org/abs/1312.6114">(Kingma and Welling)</a> and <a href="http://arxiv.org/abs/1401.4082">(Rezende et
al.)</a> papers.</em></p>
<h1 id="the-model">The model</h1>
<p>A VAE comes with three moving parts:</p>
<ul>
<li>the prior distribution \(p_\theta(\mathbf{z})\) on latent vector
\(\mathbf{z}\)</li>
<li>the conditional distribution \(p_\theta(\mathbf{x} \mid \mathbf{z})\)
on observed vector \(\mathbf{x}\)</li>
<li>the approximate posterior distribution \(q_\phi(\mathbf{z} \mid
\mathbf{x})\) on latent vector \(\mathbf{z}\)</li>
</ul>
<p>The parameters \(\phi\) and \(\theta\) are arbitrary functions of
\(\mathbf{x}\) and \(\mathbf{z}\) respectively.</p>
<p>The model is trained to minimize the expected reconstruction loss of
\(\mathbf{x}\) under \(q_\phi(\mathbf{z} \mid \mathbf{x})\) and the
KL-divergence between the prior and posterior distributions on \(\mathbf{z}\)
at the same time.</p>
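<p>Written out, this is the variational lower bound from <a href="http://arxiv.org/abs/1312.6114">(Kingma and Welling)</a>, which training maximizes:</p>
<p>\[
\mathcal{L}(\theta, \phi; \mathbf{x}) =
- \mathrm{D}_{\mathrm{KL}}\left(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p_\theta(\mathbf{z})\right)
+ \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\left[\log p_\theta(\mathbf{x} \mid \mathbf{z})\right]
\]</p>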
<p>In order to backpropagate the gradient of the reconstruction loss through the
function mapping \(\mathbf{x}\) to parameters \(\phi\), the
reparametrization trick is used, which allows sampling \(\mathbf{z}\) by
treating it as a deterministic function of \(\mathbf{x}\) and some noise
\(\mathbf{\epsilon}\), as sketched below.</p>
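<p>Here’s a minimal Theano sketch of the trick for a gaussian posterior with
diagonal covariance (<code class="language-plaintext highlighter-rouge">mu</code> and <code class="language-plaintext highlighter-rouge">log_sigma</code> stand in for the outputs of the
encoder network):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

# mu and log_sigma stand in for encoder outputs computed from x
mu = T.matrix('mu')
log_sigma = T.matrix('log_sigma')

rng = MRG_RandomStreams(seed=1234)
epsilon = rng.normal(size=mu.shape)
# z is a deterministic function of (mu, log_sigma) and the noise epsilon,
# so gradients flow back into the encoder parameters phi
z = mu + T.exp(log_sigma) * epsilon</code></pre></figure>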
<h1 id="the-vae-framework">The VAE framework</h1>
<h2 id="overview">Overview</h2>
<h3 id="pylearn2modelsvaevae">pylearn2.models.vae.VAE</h3>
<p>The VAE model is represented in Pylearn2 by the <code class="language-plaintext highlighter-rouge">VAE</code> class. It is responsible
for high-level computation, such as computing the log-likelihood lower bound
or an importance sampling estimate of the log-likelihood, and acts as the
interface between the model and other parts of Pylearn2.</p>
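<p>Concretely, once you have a trained <code class="language-plaintext highlighter-rouge">VAE</code> instance you can compile these
quantities yourself. A sketch (the method names match the description above;
<code class="language-plaintext highlighter-rouge">'vae_best.pkl'</code> is a hypothetical save path):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import theano
from pylearn2.utils import serial

model = serial.load('vae_best.pkl')  # hypothetical save path
X = model.get_input_space().make_theano_batch()
# Lower bound on log p(x) and an importance sampling estimate of it
lower_bound = model.log_likelihood_lower_bound(X, num_samples=10)
approximation = model.log_likelihood_approximation(X, num_samples=10)
f = theano.function([X], [lower_bound, approximation])</code></pre></figure>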
<p>It delegates much of its functionality to three objects:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">pylearn2.models.vae.conditional.Conditional</code></li>
<li><code class="language-plaintext highlighter-rouge">pylearn2.models.vae.prior.Prior</code></li>
<li><code class="language-plaintext highlighter-rouge">pylearn2.models.vae.kl.KLIntegrator</code></li>
</ul>
<h3 id="pylearn2modelsvaeconditionalconditional">pylearn2.models.vae.conditional.Conditional</h3>
<p><code class="language-plaintext highlighter-rouge">Conditional</code> is used to represent conditional distributions in the VAE
framework (namely the approximate posterior on \(\mathbf{z}\) and the
conditional on \(\mathbf{x}\)). It is responsible for mapping its input to
parameters of the conditional distribution it represents, sampling from the
conditional distribution with or without the reparametrization trick and
computing the conditional log-likelihood of the distribution it represents given
some samples.</p>
<p>Internally, the mapping from input to parameters of the conditional distribution
is done via an <code class="language-plaintext highlighter-rouge">MLP</code> instance. This allows users familiar with the MLP framework
to easily switch between different architectures for the encoding and
decoding networks.</p>
<h3 id="pylearn2modelsvaepriorprior">pylearn2.models.vae.prior.Prior</h3>
<p><code class="language-plaintext highlighter-rouge">Prior</code> is used to represent the prior distribution on \(\mathbf{z}\) in the
VAE framework. It is responsible for sampling from the prior distribution and
computing the log-likelihood of the distribution it represents given some
samples.</p>
<h3 id="pylearn2modelsvaeklklintegrator">pylearn2.models.vae.kl.KLIntegrator</h3>
<p>Some combinations of prior and posterior distributions (e.g. a gaussian prior
with diagonal covariance matrix and a gaussian posterior with diagonal
covariance matrix) allow the analytic integration of the KL term in the VAE
criterion. <code class="language-plaintext highlighter-rouge">KLIntegrator</code> is responsible for representing this analytic
expression and optionally representing it as a sum of elementwise KL terms, when
such decomposition is allowed by the choice of prior and posterior
distributions.</p>
<p>This allows the VAE framework to be more modular: otherwise, the analytical
computation of the KL term would require that the prior and the posterior
distributions are defined in the same class.</p>
<p>Subclasses of <code class="language-plaintext highlighter-rouge">KLIntegrator</code> define one subclass of <code class="language-plaintext highlighter-rouge">Prior</code> and one subclass of
<code class="language-plaintext highlighter-rouge">Conditional</code> as class attributes and can carry out the analytic computation of
the KL term <strong>for these two subclasses only</strong>. The <code class="language-plaintext highlighter-rouge">pylearn2.models.vae.kl</code>
module also contains a method which can automatically infer which subclass of
<code class="language-plaintext highlighter-rouge">KLIntegrator</code> is compatible with the current choice of prior and posterior, and
<code class="language-plaintext highlighter-rouge">VAE</code> automatically falls back to a stochastic approximation of the KL term when
the analytical computation is not possible.</p>
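<p>For the common case of gaussian prior and posterior distributions with
diagonal covariance matrices, the KL term has a well-known closed form. A
sketch of what such a <code class="language-plaintext highlighter-rouge">KLIntegrator</code> computes:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import theano.tensor as T

def diagonal_gaussian_kl(mu, log_sigma):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent
    # dimensions; this is the closed form a KLIntegrator for that
    # prior/posterior pair implements
    return -0.5 * T.sum(1 + 2 * log_sigma - mu ** 2
                        - T.exp(2 * log_sigma), axis=1)</code></pre></figure>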
<h3 id="pylearn2costsvaevaeimportancesamplingcriterion">pylearn2.costs.vae.{VAE,ImportanceSampling}Criterion</h3>
<p>Two <code class="language-plaintext highlighter-rouge">Cost</code> objects are compatible with the VAE framework: <code class="language-plaintext highlighter-rouge">VAECriterion</code> and
<code class="language-plaintext highlighter-rouge">ImportanceSamplingCriterion</code>. <code class="language-plaintext highlighter-rouge">VAECriterion</code> represents the VAE criterion as
defined in <a href="http://arxiv.org/abs/1312.6114">(Kingma and Welling)</a>, while
<code class="language-plaintext highlighter-rouge">ImportanceSamplingCriterion</code> defines a cost based on the importance sampling
approximation of the marginal log-likelihood which allows backpropagation
through \(q_\phi(\mathbf{z} \mid \mathbf{x})\) via the
reparametrization trick.</p>
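<p>For reference, the importance sampling approximation in question estimates the marginal log-likelihood with samples drawn from the approximate posterior (a standard estimator, with \(N\) playing the role of the cost’s number-of-samples parameter):</p>
<p>\[ \log p_\theta(\mathbf{x}) \approx \log \frac{1}{N} \sum_{i=1}^{N} \frac{p_\theta(\mathbf{x} \mid \mathbf{z}^{(i)}) \, p_\theta(\mathbf{z}^{(i)})}{q_\phi(\mathbf{z}^{(i)} \mid \mathbf{x})}, \qquad \mathbf{z}^{(i)} \sim q_\phi(\mathbf{z} \mid \mathbf{x}). \]</p>
<p>Because each \(\mathbf{z}^{(i)}\) is drawn via the reparametrization trick, the estimator is differentiable with respect to \(\phi\), which is what makes backpropagation through \(q_\phi(\mathbf{z} \mid \mathbf{x})\) possible.</p>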
<h2 id="using-the-framework">Using the framework</h2>
<h3 id="training-the-example-model">Training the example model</h3>
<p>Let’s go over a small example on how to train a VAE on MNIST digits.</p>
<p>In this example I’ll be using
<a href="http://www.mit.edu/~rsalakhu/papers/dbn_ais.pdf">Salakhutdinov and Murray</a>’s
binarized version of the MNIST dataset. Make sure the <code class="language-plaintext highlighter-rouge">PYLEARN2_DATA_PATH</code>
environment variable is set properly, and download the data using</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">python pylearn2/scripts/datasets/download_binarized_mnist.py</code></pre></figure>
<p>Here’s the YAML file we’ll be using for the example:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">!obj:pylearn2.train.Train {
dataset: &train !obj:pylearn2.datasets.binarized_mnist.BinarizedMNIST {
which_set: 'train',
},
model: !obj:pylearn2.models.vae.VAE {
nvis: &nvis 784,
nhid: &nhid 100,
prior: !obj:pylearn2.models.vae.prior.DiagonalGaussianPrior {},
conditional: !obj:pylearn2.models.vae.conditional.BernoulliVector {
name: 'conditional',
mlp: !obj:pylearn2.models.mlp.MLP {
layers: [
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h_1',
dim: 200,
irange: 0.001,
},
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h_2',
dim: 200,
irange: 0.001,
},
],
},
},
posterior: !obj:pylearn2.models.vae.conditional.DiagonalGaussian {
name: 'posterior',
mlp: !obj:pylearn2.models.mlp.MLP {
layers: [
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h_1',
dim: 200,
irange: 0.001,
},
],
},
},
},
algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
batch_size: 200,
learning_rate: 1e-3,
learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
init_momentum: 0.05,
},
monitoring_dataset: {
'train' : *train,
'valid' : !obj:pylearn2.datasets.binarized_mnist.BinarizedMNIST {
which_set: 'valid',
},
'test' : !obj:pylearn2.datasets.binarized_mnist.BinarizedMNIST {
which_set: 'test',
},
},
cost: !obj:pylearn2.costs.vae.VAECriterion {
num_samples: 1,
},
termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
max_epochs: 150
},
update_callbacks: [
!obj:pylearn2.training_algorithms.sgd.ExponentialDecay {
decay_factor: 1.00005,
min_lr: 0.00001
},
],
},
extensions: [
!obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
channel_name: 'valid_objective',
save_path: "${PYLEARN2_TRAIN_FILE_FULL_STEM}_best.pkl",
},
!obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
final_momentum: .95,
start: 5,
saturate: 6
},
],
}</code></pre></figure>
<p>Give it a try:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Assuming your YAML file is called ${YOUR_FILE_NAME}.yaml</span>
python pylearn2/scripts/train.py <span class="k">${</span><span class="nv">YOUR_FILE_NAME</span><span class="k">}</span>.yaml</code></pre></figure>
<p>This might take a while, but you can speed things up considerably by setting
the appropriate Theano flags and training on a GPU.</p>
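<p>For instance, with a CUDA-capable GPU and a Theano installation from this era, something along these lines should work:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Run the same training script on the GPU in single precision
THEANO_FLAGS='device=gpu,floatX=float32' python pylearn2/scripts/train.py ${YOUR_FILE_NAME}.yaml</code></pre></figure>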
<p>You’ll see a couple things being monitored while the model learns:</p>
<ul>
<li><strong>{train,valid,test}_objective</strong> tracks the value of the VAE criterion for
the training, validation and test sets.</li>
<li><strong>{train,valid,test}_expectation_term</strong> tracks the expected reconstruction
log-likelihood of the input under the posterior distribution, averaged across the training,
validation and test sets.</li>
<li><strong>{train,valid,test}_kl_divergence_term</strong> tracks the KL-divergence between
the posterior and the prior distributions averaged across the training,
validation and test sets.</li>
</ul>
<h3 id="evaluating-the-trained-model">Evaluating the trained model</h3>
<p><strong>N.B.: At the time of writing, there are no scripts in Pylearn2 to
evaluate trained models by looking at samples or computing an approximate NLL.
This is definitely something that will be included in the future, but for the
moment here are some workarounds taken from my personal scripts.</strong></p>
<p>When training is complete, you can look at samples from the model by running the
following bit of Python code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">numpy</span>
<span class="kn">import</span> <span class="nn">theano</span>
<span class="kn">from</span> <span class="nn">pylearn2.config</span> <span class="kn">import</span> <span class="n">yaml_parse</span>
<span class="kn">from</span> <span class="nn">pylearn2.gui.patch_viewer</span> <span class="kn">import</span> <span class="n">PatchViewer</span>
<span class="kn">from</span> <span class="nn">pylearn2.utils</span> <span class="kn">import</span> <span class="n">serial</span>
<span class="k">def</span> <span class="nf">show</span><span class="p">(</span><span class="n">vis_batch</span><span class="p">,</span> <span class="n">dataset</span><span class="p">,</span> <span class="n">mapback</span><span class="p">,</span> <span class="n">pv</span><span class="p">,</span> <span class="n">rows</span><span class="p">,</span> <span class="n">cols</span><span class="p">):</span>
<span class="c1"># Random selection of a subset of vis_batch to display
</span> <span class="n">index_array</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">vis_batch</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">vis_batch_subset</span> <span class="o">=</span> <span class="n">vis_batch</span><span class="p">[</span><span class="n">index_array</span><span class="p">[:</span><span class="n">rows</span> <span class="o">*</span> <span class="n">cols</span><span class="p">]]</span>
<span class="n">display_batch</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">adjust_for_viewer</span><span class="p">(</span><span class="n">vis_batch_subset</span><span class="p">)</span>
<span class="k">if</span> <span class="n">display_batch</span><span class="p">.</span><span class="n">ndim</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="n">display_batch</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">get_topological_view</span><span class="p">(</span><span class="n">display_batch</span><span class="p">)</span>
<span class="n">display_batch</span> <span class="o">=</span> <span class="n">display_batch</span><span class="p">.</span><span class="n">transpose</span><span class="p">(</span><span class="nb">tuple</span><span class="p">(</span>
<span class="n">dataset</span><span class="p">.</span><span class="n">X_topo_space</span><span class="p">.</span><span class="n">axes</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">axis</span><span class="p">)</span> <span class="k">for</span> <span class="n">axis</span> <span class="ow">in</span> <span class="p">(</span><span class="s">'b'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s">'c'</span><span class="p">)</span>
<span class="p">))</span>
<span class="k">if</span> <span class="n">mapback</span><span class="p">:</span>
<span class="n">design_vis_batch</span> <span class="o">=</span> <span class="n">vis_batch_subset</span>
<span class="k">if</span> <span class="n">design_vis_batch</span><span class="p">.</span><span class="n">ndim</span> <span class="o">!=</span> <span class="mi">2</span><span class="p">:</span>
<span class="n">design_vis_batch</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">get_design_matrix</span><span class="p">(</span><span class="n">design_vis_batch</span><span class="p">)</span>
<span class="n">mapped_batch_design</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">mapback_for_viewer</span><span class="p">(</span><span class="n">design_vis_batch</span><span class="p">)</span>
<span class="n">mapped_batch</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">get_topological_view</span><span class="p">(</span><span class="n">mapped_batch_design</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">rows</span><span class="p">):</span>
<span class="n">row_start</span> <span class="o">=</span> <span class="n">cols</span> <span class="o">*</span> <span class="n">i</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">cols</span><span class="p">):</span>
<span class="n">pv</span><span class="p">.</span><span class="n">add_patch</span><span class="p">(</span><span class="n">display_batch</span><span class="p">[</span><span class="n">row_start</span><span class="o">+</span><span class="n">j</span><span class="p">,</span> <span class="p">:,</span> <span class="p">:,</span> <span class="p">:],</span>
<span class="n">rescale</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">if</span> <span class="n">mapback</span><span class="p">:</span>
<span class="n">pv</span><span class="p">.</span><span class="n">add_patch</span><span class="p">(</span><span class="n">mapped_batch</span><span class="p">[</span><span class="n">row_start</span><span class="o">+</span><span class="n">j</span><span class="p">,</span> <span class="p">:,</span> <span class="p">:,</span> <span class="p">:],</span>
<span class="n">rescale</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">pv</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">show_samples</span><span class="p">(</span><span class="n">model</span><span class="p">):</span>
<span class="n">num_samples</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">rows</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">cols</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">dataset_yaml_src</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">dataset_yaml_src</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">yaml_parse</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">dataset_yaml_src</span><span class="p">)</span>
<span class="n">vis_batch</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">get_batch_topo</span><span class="p">(</span><span class="n">num_samples</span><span class="p">)</span>
<span class="n">rval</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">vis_batch</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="n">dataset</span><span class="p">.</span><span class="n">X_topo_space</span><span class="p">.</span><span class="n">axes</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">axis</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">axis</span> <span class="ow">in</span> <span class="p">(</span><span class="s">'b'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s">'c'</span><span class="p">))</span>
<span class="n">_</span><span class="p">,</span> <span class="n">patch_rows</span><span class="p">,</span> <span class="n">patch_cols</span><span class="p">,</span> <span class="n">channels</span> <span class="o">=</span> <span class="n">rval</span>
<span class="n">mapback</span> <span class="o">=</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="s">'mapback_for_viewer'</span><span class="p">)</span>
<span class="n">pv</span> <span class="o">=</span> <span class="n">PatchViewer</span><span class="p">((</span><span class="n">rows</span><span class="p">,</span> <span class="n">cols</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">+</span><span class="n">mapback</span><span class="p">)),</span>
<span class="p">(</span><span class="n">patch_rows</span><span class="p">,</span> <span class="n">patch_cols</span><span class="p">),</span>
<span class="n">is_color</span><span class="o">=</span><span class="p">(</span><span class="n">channels</span> <span class="o">==</span> <span class="mi">3</span><span class="p">))</span>
<span class="n">samples</span><span class="p">,</span> <span class="n">expectations</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="n">num_samples</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">theano</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">[],</span> <span class="n">outputs</span><span class="o">=</span><span class="n">expectations</span><span class="p">)</span>
<span class="n">samples_batch</span> <span class="o">=</span> <span class="n">f</span><span class="p">()</span>
<span class="n">show</span><span class="p">(</span><span class="n">samples_batch</span><span class="p">,</span> <span class="n">dataset</span><span class="p">,</span> <span class="n">mapback</span><span class="p">,</span> <span class="n">pv</span><span class="p">,</span> <span class="n">rows</span><span class="p">,</span> <span class="n">cols</span><span class="p">)</span>
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="p">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
<span class="n">parser</span><span class="p">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">"model_path"</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s">"path to the pickled model"</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="p">.</span><span class="n">parse_args</span><span class="p">()</span>
<span class="n">model_path</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">model_path</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">serial</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">model_path</span><span class="p">)</span>
<span class="n">show_samples</span><span class="p">(</span><span class="n">model</span><span class="p">)</span></code></pre></figure>
<p>Look at samples by typing</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Assuming your YAML file is called ${YOUR_FILE_NAME}.yaml and your sampling</span>
<span class="c"># script is named ${SAMPLING_SCRIPT}.py</span>
python <span class="k">${</span><span class="nv">SAMPLING_SCRIPT</span><span class="k">}</span>.py <span class="k">${</span><span class="nv">YOUR_FILE_NAME</span><span class="k">}</span>.pkl</code></pre></figure>
<p>You can also make use of <code class="language-plaintext highlighter-rouge">VAE.log_likelihood_approximation</code> to compute
approximate NLL performance measures of the trained model:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">numpy</span>
<span class="kn">import</span> <span class="nn">theano</span>
<span class="kn">import</span> <span class="nn">theano.tensor</span> <span class="k">as</span> <span class="n">T</span>
<span class="kn">from</span> <span class="nn">pylearn2.config</span> <span class="kn">import</span> <span class="n">yaml_parse</span>
<span class="kn">from</span> <span class="nn">pylearn2.utils</span> <span class="kn">import</span> <span class="n">as_floatX</span><span class="p">,</span> <span class="n">serial</span>
<span class="k">def</span> <span class="nf">print_nll</span><span class="p">(</span><span class="n">model</span><span class="p">):</span>
<span class="n">dataset_yaml_src</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">dataset_yaml_src</span>
<span class="n">train_set</span> <span class="o">=</span> <span class="n">yaml_parse</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">dataset_yaml_src</span><span class="p">)</span>
<span class="n">valid_set</span> <span class="o">=</span> <span class="n">yaml_parse</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">dataset_yaml_src</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"train"</span><span class="p">,</span> <span class="s">"valid"</span><span class="p">))</span>
<span class="n">test_set</span> <span class="o">=</span> <span class="n">yaml_parse</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">dataset_yaml_src</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"train"</span><span class="p">,</span> <span class="s">"test"</span><span class="p">))</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">T</span><span class="p">.</span><span class="n">matrix</span><span class="p">(</span><span class="s">'X'</span><span class="p">)</span>
<span class="n">importance_sampling_nll</span> <span class="o">=</span> <span class="o">-</span><span class="n">model</span><span class="p">.</span><span class="n">log_likelihood_approximation</span><span class="p">(</span>
<span class="n">X</span><span class="o">=</span><span class="n">X</span><span class="p">,</span>
<span class="n">num_samples</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
<span class="p">).</span><span class="n">mean</span><span class="p">()</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">theano</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">X</span><span class="p">],</span> <span class="n">outputs</span><span class="o">=</span><span class="n">importance_sampling_nll</span><span class="p">)</span>
<span class="n">batch_size</span> <span class="o">=</span> <span class="mi">100</span>
<span class="c1"># Train
</span> <span class="n">train_importance_sampling_nll_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">numpy_X</span> <span class="o">=</span> <span class="n">as_floatX</span><span class="p">(</span><span class="n">train_set</span><span class="p">.</span><span class="n">get_design_matrix</span><span class="p">())</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">numpy_X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="n">batch_size</span><span class="p">):</span>
<span class="n">numpy_X_batch</span> <span class="o">=</span> <span class="n">numpy_X</span><span class="p">[</span><span class="n">batch_size</span> <span class="o">*</span> <span class="n">i</span><span class="p">:</span> <span class="n">batch_size</span> <span class="o">*</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)]</span>
<span class="n">train_importance_sampling_nll_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="p">(</span><span class="n">numpy_X_batch</span><span class="p">))</span>
<span class="c1"># Valid
</span> <span class="n">valid_importance_sampling_nll_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">numpy_X</span> <span class="o">=</span> <span class="n">as_floatX</span><span class="p">(</span><span class="n">valid_set</span><span class="p">.</span><span class="n">get_design_matrix</span><span class="p">())</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">numpy_X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="n">batch_size</span><span class="p">):</span>
<span class="n">numpy_X_batch</span> <span class="o">=</span> <span class="n">numpy_X</span><span class="p">[</span><span class="n">batch_size</span> <span class="o">*</span> <span class="n">i</span><span class="p">:</span> <span class="n">batch_size</span> <span class="o">*</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)]</span>
<span class="n">valid_importance_sampling_nll_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="p">(</span><span class="n">numpy_X_batch</span><span class="p">))</span>
<span class="c1"># Test
</span> <span class="n">test_importance_sampling_nll_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">numpy_X</span> <span class="o">=</span> <span class="n">as_floatX</span><span class="p">(</span><span class="n">test_set</span><span class="p">.</span><span class="n">get_design_matrix</span><span class="p">())</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">numpy_X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="n">batch_size</span><span class="p">):</span>
<span class="n">numpy_X_batch</span> <span class="o">=</span> <span class="n">numpy_X</span><span class="p">[</span><span class="n">batch_size</span> <span class="o">*</span> <span class="n">i</span><span class="p">:</span> <span class="n">batch_size</span> <span class="o">*</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)]</span>
<span class="n">test_importance_sampling_nll_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="p">(</span><span class="n">numpy_X_batch</span><span class="p">))</span>
<span class="k">print</span> <span class="s">"Train NLL approximation: "</span> <span class="o">+</span> \
<span class="nb">str</span><span class="p">(</span><span class="n">numpy</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">train_importance_sampling_nll_list</span><span class="p">))</span>
<span class="k">print</span> <span class="s">"Valid NLL approximation: "</span> <span class="o">+</span> \
<span class="nb">str</span><span class="p">(</span><span class="n">numpy</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">valid_importance_sampling_nll_list</span><span class="p">))</span>
<span class="k">print</span> <span class="s">" Test NLL approximation: "</span> <span class="o">+</span> \
<span class="nb">str</span><span class="p">(</span><span class="n">numpy</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">test_importance_sampling_nll_list</span><span class="p">))</span>
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="p">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
<span class="n">parser</span><span class="p">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">"model_path"</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s">"path to the pickled model"</span><span class="p">)</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="p">.</span><span class="n">parse_args</span><span class="p">()</span>
<span class="n">model_path</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">model_path</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">serial</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">model_path</span><span class="p">)</span>
<span class="n">print_nll</span><span class="p">(</span><span class="n">model</span><span class="p">)</span></code></pre></figure>
<p>All you have to do is type</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Assuming your YAML file is called ${YOUR_FILE_NAME}.yaml and your NLL</span>
<span class="c"># script is named ${NLL_SCRIPT}.py</span>
python <span class="k">${</span><span class="nv">NLL_SCRIPT</span><span class="k">}</span>.py <span class="k">${</span><span class="nv">YOUR_FILE_NAME</span><span class="k">}</span>.pkl</code></pre></figure>
<h3 id="more-details">More details</h3>
<p>Let’s concentrate on this part of the YAML file:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">model: !obj:pylearn2.models.vae.VAE {
nvis: &nvis 784,
nhid: &nhid 100,
prior: !obj:pylearn2.models.vae.prior.DiagonalGaussianPrior {},
conditional: !obj:pylearn2.models.vae.conditional.BernoulliVector {
name: 'conditional',
mlp: !obj:pylearn2.models.mlp.MLP {
layers: [
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h_1',
dim: 200,
irange: 0.001,
},
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h_2',
dim: 200,
irange: 0.001,
},
],
},
},
posterior: !obj:pylearn2.models.vae.conditional.DiagonalGaussian {
name: 'posterior',
mlp: !obj:pylearn2.models.mlp.MLP {
layers: [
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h_1',
dim: 200,
irange: 0.001,
},
],
},
},
}</code></pre></figure>
<p>We define the dimensionality of \(\mathbf{x}\) through <code class="language-plaintext highlighter-rouge">nvis</code> and the
dimensionality of \(\mathbf{z}\) through <code class="language-plaintext highlighter-rouge">nhid</code>.</p>
<p>At a high level, the form of the prior, posterior and conditional distributions
is selected through the choice of which subclasses to instantiate. Here we chose
a gaussian prior with a diagonal covariance matrix, a gaussian posterior with a
diagonal covariance matrix, and a product of independent bernoulli
distributions as the conditional for \(\mathbf{x}\).</p>
<p>Note that we did not explicitly tell the model how to integrate the KL: it was
able to find it on its own by calling
<code class="language-plaintext highlighter-rouge">pylearn2.models.vae.kl.find_integrator_for</code>, which searched
<code class="language-plaintext highlighter-rouge">pylearn2.models.vae.kl</code> for a match and returned an instance of
<code class="language-plaintext highlighter-rouge">DiagonalGaussianPriorPosteriorKL</code>. If you were to explicitly tell the model how
to integrate the KL term (for instance, if you have defined a new prior and a
new <code class="language-plaintext highlighter-rouge">KLIntegrator</code> subclass to go with it), you would need to add</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">kl_integrator: !obj:pylearn2.models.vae.kl.DiagonalGaussianPriorPosteriorKL {}</code></pre></figure>
<p>as a parameter to <code class="language-plaintext highlighter-rouge">VAE</code>’s constructor.</p>
<p><code class="language-plaintext highlighter-rouge">Conditional</code> instances (passed as <code class="language-plaintext highlighter-rouge">conditional</code> and <code class="language-plaintext highlighter-rouge">posterior</code> parameters)
need a name upon instantiation. This is to avoid key collisions in the
monitoring channels.</p>
<p>They’re also given a <strong>nested</strong> <code class="language-plaintext highlighter-rouge">MLP</code> instance; why this is needed will become
clear soon. Notice how the last layers’ dimensionality matches neither
<code class="language-plaintext highlighter-rouge">nhid</code> nor <code class="language-plaintext highlighter-rouge">nvis</code>. This is because they represent the last
hidden representation from which the conditional parameters will be computed.
You did not have to specify the layer mapping the last hidden representation to
the conditional parameters because it was automatically inferred: after
everything is instantiated, <code class="language-plaintext highlighter-rouge">VAE</code> calls <code class="language-plaintext highlighter-rouge">initialize_parameters</code> on <code class="language-plaintext highlighter-rouge">prior</code>,
<code class="language-plaintext highlighter-rouge">conditional</code> and <code class="language-plaintext highlighter-rouge">posterior</code> and gives them relevant information about their
input and output spaces. At that point, <code class="language-plaintext highlighter-rouge">Conditional</code> has enough information to
infer what the last layer should look like. It calls its private
<code class="language-plaintext highlighter-rouge">_get_default_output_layer</code> method, which returns a sane default output layer,
and adds it to its MLP’s list of layers. This is why a nested MLP is required:
this allows <code class="language-plaintext highlighter-rouge">Conditional</code> to delay the initialization of the MLP’s input space
in order to add a layer to it in a clean fashion.</p>
<p>Naturally, you may want to decide on your own how parameters should be computed
based on the last hidden representation. This can be done through
<code class="language-plaintext highlighter-rouge">Conditional</code>’s <code class="language-plaintext highlighter-rouge">output_layer_required</code> constructor parameter. It is set to
<code class="language-plaintext highlighter-rouge">True</code> by default, but you can switch it off and explicitly put the last layer
in the MLP. For instance, you could decide that the gaussian posterior’s
\(\log \sigma\) should not be too big or too small and want to force it to
be between -1 and 1 by using a <em>tanh</em> non-linearity. It can be done like so:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">posterior: !obj:pylearn2.models.vae.conditional.DiagonalGaussian {
name: 'posterior',
output_layer_required: 0,
mlp: !obj:pylearn2.models.mlp.MLP {
layers: [
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h_1',
dim: 200,
irange: 0.001,
},
!obj:pylearn2.models.mlp.CompositeLayer {
layer_name: 'phi',
layers: [
!obj:pylearn2.models.mlp.Linear {
layer_name: 'mu',
dim: *nhid,
irange: 0.001,
},
!obj:pylearn2.models.mlp.Tanh {
layer_name: 'log_sigma',
dim: *nhid,
irange: 0.001,
},
],
},
],
},
}</code></pre></figure>
<p>There are safeguards in place to make sure your code won’t crash without
explanation if you make a mistake: <code class="language-plaintext highlighter-rouge">Conditional</code> will verify that the custom
output layer you put in the MLP has the same output space as what it expects,
and will raise an exception otherwise. Every <code class="language-plaintext highlighter-rouge">Conditional</code> subclass needs to
define what the conditional parameters should look like through a private
<code class="language-plaintext highlighter-rouge">_get_required_mlp_output_space</code> method, and you should make sure that your
custom output layer has the right output space by looking at the code. Moreover,
you should have a look at the subclass’ <code class="language-plaintext highlighter-rouge">_get_default_output_layer</code>
implementation to see the nature and order of the conditional parameters
being computed.</p>
<h2 id="extending-the-vae-framework">Extending the VAE framework</h2>
<p><em>This post will be updated soon with more information on how to write your own
subclasses of <code class="language-plaintext highlighter-rouge">Prior</code>, <code class="language-plaintext highlighter-rouge">Conditional</code> and <code class="language-plaintext highlighter-rouge">KLIntegrator</code>.</em></p>
<p><a href="https://vdumoulin.github.io/articles/introducing-vae">Introducing the VAE framework in Pylearn2</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on October 08, 2014.</p>https://vdumoulin.github.io/articles/vae-demo2014-08-28T00:00:00-04:002014-08-28T00:00:00-04:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>This is a tiny post to advertise the demo (available
<a href="http://vdumoulin.github.io/morphing_faces">here</a>) I built using a variational
autoencoder trained on images of faces.</p>
<p>There is an online version, but if you have the required Python dependencies
installed (numpy and matplotlib), I <em>strongly</em> recommend you check out the
offline demo, which is smoother and more interactive.</p>
<p>Have fun!</p>
<p><a href="https://vdumoulin.github.io/articles/vae-demo">Variational Autoencoder Demo</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on August 28, 2014.</p>https://vdumoulin.github.io/articles/rnn-part-22014-04-30T00:00:00-04:002014-04-30T00:00:00-04:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>Building on Jung-Hyung’s
<a href="http://jychung.wordpress.com/2014/04/05/how-should-we-shrink-alpha/">encouraging results</a>,
I tried going smaller and training an RNN to overfit a single phone.</p>
<p>I implemented gradient clipping (my version rescales the gradient when its
norm exceeds a certain threshold) and tried increasing the depth of the
hidden-to-hidden transition, as suggested in
<a href="http://arxiv.org/pdf/1312.6026v4.pdf">Razvan’s paper</a>.</p>
<p>The resulting model has the following properties:</p>
<ul>
<li>Input consists of the 240 previous acoustic samples</li>
<li>Hidden state has 100 dimensions</li>
<li>Input-to-hidden function is linear</li>
<li>Hidden-to-hidden transition is a 3-layer convolutional network (two
convolutional rectified linear layers and a linear layer)</li>
<li>Hidden non-linearity is the hyperbolic tangent</li>
<li>Hidden-to-output function is linear</li>
</ul>
<p>It was trained to predict the next acoustic sample given a ground truth of 240
previous samples on a single ‘aa’ phone for 250 epochs, yielding an MSE of
0.009.</p>
<p>Here are the audio files:</p>
<p>Original:</p>
<audio src="https://vdumoulin.github.io/sounds/original_phone.wav" controls=""> </audio>
<p>Ground-truth-based reconstruction:</p>
<audio src="https://vdumoulin.github.io/sounds/prediction_phone.wav" controls=""> </audio>
<p>Prediction-based reconstruction:</p>
<audio src="https://vdumoulin.github.io/sounds/reconstruction_phone.wav" controls=""> </audio>
<p>And here’s a visual representation of the files (red is the original, blue is
using ground truth and green is the prediction-based reconstruction):</p>
<p><img src="https://vdumoulin.github.io/images/phone_audio.png" alt="Phone audio reconstruction" /></p>
<p>Unfortunately, as you can see (and hear), it’s not on par yet with Jung-Hyung’s
results, even with the extensions to the original model.</p>
<p><a href="https://vdumoulin.github.io/articles/rnn-part-2">RNNs Part Two</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on April 30, 2014.</p>https://vdumoulin.github.io/articles/rnn-part-12014-03-28T00:00:00-04:002014-03-28T00:00:00-04:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>This week I focused on training an RNN to solve our task. The RNN’s structure
is really simple: it maps the <em>k</em> previous samples, together with the phone of
the sample to be predicted, to a recurrent hidden layer, which itself maps linearly to the
output. The input is a sliding window of fixed length over the sequence and the
phones information.</p>
<p><img src="https://vdumoulin.github.io/images/rnn_figure.png" alt="RNN model" /></p>
<p>For starters, I’m interested in overfitting a <em>single</em> utterance, i.e. given the
first <em>k</em> samples of the sequence and a sequence of phone information, I’d like
to be able to perfectly reconstruct the whole sequence. I trained my <a href="https://github.com/vdumoulin/research/blob/master/code/pylearn2/models/rnn.py">toy RNN
model</a>
using <a href="https://github.com/vdumoulin/research/blob/master/experiments/timit/rnn.yaml">this script</a>
and then compared the original sequence with two types of reconstruction (both
procedures are sketched in code after the list):</p>
<ol>
<li>the reconstruction you get when sequentially predicting the next sample
using the ground truth as the <em>k</em> previous samples and the phone information</li>
<li>the reconstruction you get when sequentially predicting the next sample
using the previously-predicted samples as the <em>k</em> previous samples and the
phone information</li>
</ol>
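<p>In numpy-like pseudocode, the two procedures differ only in where the input window comes from; <code class="language-plaintext highlighter-rouge">predict_next</code> stands in for the trained network and is hypothetical:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy

def reconstruct(ground_truth, phones, predict_next, k, free_running):
    # Seed the reconstruction with the k first ground truth samples
    generated = list(ground_truth[:k])
    for t in range(len(ground_truth) - k):
        if free_running:
            # Prediction-based: feed the model its own previous outputs
            window = generated[-k:]
        else:
            # Ground-truth-based: feed the true previous samples
            window = ground_truth[t:t + k]
        generated.append(predict_next(numpy.asarray(window), phones[t]))
    return numpy.asarray(generated)</code></pre></figure>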
<p>Here are the audio files:</p>
<p>Original:</p>
<audio src="https://vdumoulin.github.io/sounds/original.wav" controls=""> </audio>
<p>Ground-truth-based reconstruction:</p>
<audio src="https://vdumoulin.github.io/sounds/prediction.wav" controls=""> </audio>
<p>Prediction-based reconstruction:</p>
<audio src="https://vdumoulin.github.io/sounds/reconstruction.wav" controls=""> </audio>
<p>For reference, the model converges to a 0.426 mean squared error, although this
number cannot be compared with other experiments. As you can see, although the
model isn’t that bad for ground-truth-based reconstruction, it performs <em>very</em>
poorly when the only information available is the <em>k</em> first samples of the
sequence and the phone information.</p>
<p>Note that I haven’t tried to apply the good practice recommendations for RNNs
(i.e. gradient clipping and regularization) yet; for now I was interested in
running a quick experiment and making sure my code and scripts were working
properly.</p>
<p>One interesting thing I noticed was that I had to keep the number of recurrent
hidden units quite low (on the order of 100 units), otherwise the error would
start to go up during training (is there an exploding gradient effect at play
when increasing the number of hidden units?).</p>
<p>Next week I’d like to implement regularization and gradient clipping techniques
in my toy RNN and see if it improves results.</p>
<p><a href="https://vdumoulin.github.io/articles/rnn-part-1">Starting on RNNs</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on March 28, 2014.</p>https://vdumoulin.github.io/articles/timit-part-62014-03-19T00:00:00-04:002014-03-19T00:00:00-04:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>Lately I’ve been working on enabling Pylearn2 to iterate over variable-length
sequences. In this post, I’ll discuss my progress so far.</p>
<h3 id="the-problem">The problem</h3>
<p>Some types of models (such as convolutional or recurrent neural nets) naturally
deal with variable-length inputs. Unfortunately, for the moment, this type of
input is not well supported in Pylearn2: all <code class="language-plaintext highlighter-rouge">Space</code> subclasses expect the data
to be a tensor whose first dimension is the batch axis and whose other
dimensions are of fixed size. This means a sequence of fixed-sized elements
cannot be stored in those spaces, because all time steps of the sequence would
be considered as separate examples.</p>
<p>Even more fundamentally, there is no straightforward way to represent data
structures containing variable-length elements in Theano. This means even if we
solve the <code class="language-plaintext highlighter-rouge">Space</code> problem in Pylearn2, we’re limited to batches of size 1 unless
some <code class="language-plaintext highlighter-rouge">TypedList</code> data structure is implemented in Theano.</p>
<h3 id="new-spaces">New spaces</h3>
<p>I wrote two new <code class="language-plaintext highlighter-rouge">Space</code> subclasses (<code class="language-plaintext highlighter-rouge">VectorSequenceSpace</code> and
<code class="language-plaintext highlighter-rouge">IndexSequenceSpace</code>) to deal with variable-length sequences. They’re very
similar to the corresponding <code class="language-plaintext highlighter-rouge">VectorSpace</code> and <code class="language-plaintext highlighter-rouge">IndexSpace</code>, with a few key
differences:</p>
<ul>
<li>Because of Theano restrictions, an object living in a <code class="language-plaintext highlighter-rouge">*SequenceSpace</code> is
considered to represent a <em>single</em> example, unlike e.g. <code class="language-plaintext highlighter-rouge">VectorSpace</code>, which
considers objects as batches of examples.</li>
<li>A <code class="language-plaintext highlighter-rouge">*SequenceSpace</code> expects objects living in its space to be matrices whose
first dimension is time and whose second dimension represent a fixed-sized
state, e.g. a features vector.</li>
<li>In order to enforce the fact that we’re dealing with a <em>single</em> example, it
is impossible to convert a <code class="language-plaintext highlighter-rouge">*SequenceSpace</code> into a <code class="language-plaintext highlighter-rouge">*Space</code>. Doing otherwise
would give rise to confusing behaviour: by going from a <code class="language-plaintext highlighter-rouge">VectorSequenceSpace</code>
to a <code class="language-plaintext highlighter-rouge">VectorSpace</code>, suddenly every time step of the sequence is considered as
a separate example. The only conversion allowed is from an
<code class="language-plaintext highlighter-rouge">IndexSequenceSpace</code> to a <code class="language-plaintext highlighter-rouge">VectorSequenceSpace</code>.</li>
<li>Some methods such as <code class="language-plaintext highlighter-rouge">get_total_dimension()</code> don’t make sense when dealing
with variable-length sequences and are not implemented.</li>
</ul>
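<p>To make the batch-semantics difference concrete, here is a minimal numpy illustration (assuming the constructor argument is called <code class="language-plaintext highlighter-rouge">dim</code>, as in <code class="language-plaintext highlighter-rouge">VectorSpace</code>):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy

# In VectorSpace(dim=3), axis 0 is the batch axis:
# five separate 3-dimensional examples.
batch = numpy.zeros((5, 3))

# In VectorSequenceSpace(dim=3), axis 0 is time: the same array shape
# is ONE example, a sequence of five 3-dimensional states.
sequence_example = numpy.zeros((5, 3))</code></pre></figure>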
<h3 id="new-timit-wrapper">New TIMIT wrapper</h3>
<p>I also wrote a new TIMIT wrapper called <code class="language-plaintext highlighter-rouge">TIMITSequences</code>, which uses
<code class="language-plaintext highlighter-rouge">VectorSequenceSpace</code> and <code class="language-plaintext highlighter-rouge">IndexSequenceSpace</code> to represent its data. Iterating
over this dataset returns whole sequences. These sequences are segmented into
frames of length <code class="language-plaintext highlighter-rouge">frame_length</code> and form matrices whose first dimension is time
and whose second dimension is what a sliding window of that length sees as it
passes through the sequence (sketched below).</p>
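<p>The segmentation can be pictured with a small numpy sketch (a simplified stand-in, not the wrapper’s actual code):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy

def segment_sequence(sequence, frame_length):
    # Turn a 1-D acoustic sequence into a (time, frame_length) matrix
    # where row t is what the sliding window sees at step t
    num_frames = len(sequence) - frame_length + 1
    return numpy.asarray([sequence[t:t + frame_length]
                          for t in range(num_frames)])

# A length-6 sequence with frame_length=3 yields a (4, 3) matrix
print(segment_sequence(numpy.arange(6), 3))</code></pre></figure>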
<p>As a proof-of-concept, I also wrote a toy RNN model (which you can find
<a href="https://github.com/vdumoulin/research/blob/master/code/pylearn2/models/rnn.py">here</a>)
to train on this dataset. I haven’t had time to play with it a lot, but I hope
to find time to do so this week and next week and present some results in
another blog post.</p>
<p><a href="https://vdumoulin.github.io/articles/timit-part-6">Iterating over variable-length sequences</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on March 19, 2014.</p>https://vdumoulin.github.io/articles/timit-part-52014-03-03T00:00:00-05:002014-03-03T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>Good news: the pull request fixing a bug with <code class="language-plaintext highlighter-rouge">Space</code> classes got merged, which
means we’re now able to combine phones information with acoustic samples.</p>
<p>In this post, I’ll show you how it’s done. <strong>Note: make sure that you have the
latest version of Pylearn2 and of the TIMIT dataset for Pylearn2</strong></p>
<h3 id="data-specs-how-do-they-work">Data specs, how do they work?</h3>
<p>A given dataset might offer multiple inputs and multiple targets. Multiple parts
of the learning pipeline in Pylearn2 require data in order to work: <code class="language-plaintext highlighter-rouge">Model</code>,
<code class="language-plaintext highlighter-rouge">Cost</code> and <code class="language-plaintext highlighter-rouge">Monitor</code> all need input data and, optionally, target data.
Furthermore, it is possible that they all require their own formatting for the
data.</p>
<p>In order to bridge the gap between what a dataset offers and what the pipeline
needs, and to minimize the number of <code class="language-plaintext highlighter-rouge">TensorVariable</code>s created, Pylearn2 uses so-called
<code class="language-plaintext highlighter-rouge">data_specs</code>, which serve two purposes:</p>
<ul>
<li>Describe what the dataset has to offer, and in which format.</li>
<li>Describe which portion of the data a part of the learning pipeline needs, and
in which format.</li>
</ul>
<p><code class="language-plaintext highlighter-rouge">data_specs</code> have the following structure:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">(Space, str or nested tuples of str)</code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">data_specs</code> are tuples which contain two types of information: spaces and
sources. Sources are strings uniquely identifying a data source (e.g.
<code class="language-plaintext highlighter-rouge">'features'</code>, <code class="language-plaintext highlighter-rouge">'targets'</code>, <code class="language-plaintext highlighter-rouge">'phones'</code>, etc.) Spaces specify how these sources
are formatted (e.g. <code class="language-plaintext highlighter-rouge">VectorSpace</code>, <code class="language-plaintext highlighter-rouge">IndexSpace</code>, etc.) and their nested
structure corresponds to the nested structure of the sources. For instance, one
valid <code class="language-plaintext highlighter-rouge">data_specs</code> could be</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">data_specs = (CompositeSpace([CompositeSpace([VectorSpace(dim=100),
VectorSpace(dim=62)),
VectorSpace(dim=1)]),
(('features', 'phones'), 'targets'))</code></pre></figure>
<p>and would mean that a part of the pipeline is requesting examples to be a tuple
containing</p>
<ul>
<li>a tuple of batches, one of shape <code class="language-plaintext highlighter-rouge">(batch_size, 100)</code> containing features
and one of shape <code class="language-plaintext highlighter-rouge">(batch_size, 62)</code> containing a one-hot encoded phone index
for the next acoustic sample to predict</li>
<li>a batch of shape <code class="language-plaintext highlighter-rouge">(batch_size, 1)</code> containing targets, i.e. the next acoustic
sample that needs to be predicted</li>
</ul>
<p>Pylearn2 is smart enough to aggregate <code class="language-plaintext highlighter-rouge">data_specs</code> from all parts of the
pipeline and create one single, non-redundant and flat <code class="language-plaintext highlighter-rouge">data_specs</code> that’s the
union of all <code class="language-plaintext highlighter-rouge">data_specs</code> and which is used to create <code class="language-plaintext highlighter-rouge">TensorVariable</code>s used
throughout the pipeline. It is able to map those variables back to the nested
representations specified by individual <code class="language-plaintext highlighter-rouge">data_specs</code> so that every part of the
pipeline receives exactly what it needs in the requested format.</p>
<h3 id="data-specs-applied-to-dataset-sub-classes">Data specs applied to <code class="language-plaintext highlighter-rouge">Dataset</code> sub-classes</h3>
<p>Datasets implement a <code class="language-plaintext highlighter-rouge">get_data_specs</code> method which returns a flat <code class="language-plaintext highlighter-rouge">data_specs</code>
containing what the dataset has to offer, and in which format. For instance,
TIMIT’s <code class="language-plaintext highlighter-rouge">data_specs</code> looks like this:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">(CompositeSpace([VectorSpace(dim=frame_length * frames_per_example),
VectorSpace(dim=frame_length),
IndexSpace(dim=1, max_labels=num_phones),
IndexSpace(dim=1, max_labels=num_phonemes),
IndexSpace(dim=1, max_labels=num_words)],
('features', 'targets', 'phones', 'phonemes', 'words'))</code></pre></figure>
<h3 id="data-specs-applied-to-model-sub-classes">Data specs applied to <code class="language-plaintext highlighter-rouge">Model</code> sub-classes</h3>
<p>In order for your model to receive the correct data, it needs to implement the
following methods:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">get_input_space</code></li>
<li><code class="language-plaintext highlighter-rouge">get_output_space</code></li>
<li><code class="language-plaintext highlighter-rouge">get_input_source</code></li>
<li><code class="language-plaintext highlighter-rouge">get_target_source</code></li>
</ul>
<p><em>(For those of you who are curious, it is the <code class="language-plaintext highlighter-rouge">Cost</code>’s responsibility to
provide the requested <code class="language-plaintext highlighter-rouge">data_specs</code>, and it does so by calling those four methods
on the <code class="language-plaintext highlighter-rouge">Model</code>)</em></p>
<p>Luckily for us, both <code class="language-plaintext highlighter-rouge">get_input_space</code> and <code class="language-plaintext highlighter-rouge">get_output_space</code> are implemented in
the <code class="language-plaintext highlighter-rouge">Model</code> base class and return <code class="language-plaintext highlighter-rouge">self.input_space</code> and <code class="language-plaintext highlighter-rouge">self.output_space</code>
respectively, so all that is needed is to give <code class="language-plaintext highlighter-rouge">self.input_space</code> and
<code class="language-plaintext highlighter-rouge">self.output_space</code> the desired values when instantiating the <code class="language-plaintext highlighter-rouge">Model</code>. However,
in Pylearn2’s current state, <code class="language-plaintext highlighter-rouge">get_input_source</code> and <code class="language-plaintext highlighter-rouge">get_target_source</code> return
<code class="language-plaintext highlighter-rouge">'features'</code> and <code class="language-plaintext highlighter-rouge">'targets'</code> respectively, so they need to be overridden if we
want anything other than those two sources.</p>
<h3 id="data-specs-for-the-mlp-framework">Data specs for the MLP framework</h3>
<p>The current state of the MLP framework does not allow changing the sources to
something other than <code class="language-plaintext highlighter-rouge">'features'</code> and <code class="language-plaintext highlighter-rouge">'targets'</code>, but the following sub-classes
will do what we want:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pylearn2.models.mlp</span> <span class="kn">import</span> <span class="n">MLP</span><span class="p">,</span> <span class="n">CompositeLayer</span>
<span class="kn">from</span> <span class="nn">pylearn2.space</span> <span class="kn">import</span> <span class="n">CompositeSpace</span>
<span class="kn">from</span> <span class="nn">theano.compat.python2x</span> <span class="kn">import</span> <span class="n">OrderedDict</span>
<span class="k">class</span> <span class="nc">MLPWithSource</span><span class="p">(</span><span class="n">MLP</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">input_source</span> <span class="o">=</span> <span class="n">kwargs</span><span class="p">.</span><span class="n">pop</span><span class="p">(</span><span class="s">'input_source'</span><span class="p">,</span> <span class="s">'features'</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">target_source</span> <span class="o">=</span> <span class="n">kwargs</span><span class="p">.</span><span class="n">pop</span><span class="p">(</span><span class="s">'target_source'</span><span class="p">,</span> <span class="s">'targets'</span><span class="p">)</span>
<span class="nb">super</span><span class="p">(</span><span class="n">MLPWithSource</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">get_input_source</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">input_source</span>
<span class="k">def</span> <span class="nf">get_target_source</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">target_source</span>
<span class="k">class</span> <span class="nc">CompositeLayerWithSource</span><span class="p">(</span><span class="n">CompositeLayer</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">get_input_source</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">tuple</span><span class="p">([</span><span class="n">layer</span><span class="p">.</span><span class="n">get_input_source</span><span class="p">()</span> <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">get_target_source</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">tuple</span><span class="p">([</span><span class="n">layer</span><span class="p">.</span><span class="n">get_target_source</span><span class="p">()</span> <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">set_input_space</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">space</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">input_space</span> <span class="o">=</span> <span class="n">space</span>
<span class="k">for</span> <span class="n">layer</span><span class="p">,</span> <span class="n">component</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">,</span> <span class="n">space</span><span class="p">.</span><span class="n">components</span><span class="p">):</span>
<span class="n">layer</span><span class="p">.</span><span class="n">set_input_space</span><span class="p">(</span><span class="n">component</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">output_space</span> <span class="o">=</span> <span class="n">CompositeSpace</span><span class="p">(</span><span class="nb">tuple</span><span class="p">(</span><span class="n">layer</span><span class="p">.</span><span class="n">get_output_space</span><span class="p">()</span>
<span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">fprop</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">state_below</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">layer</span><span class="p">.</span><span class="n">fprop</span><span class="p">(</span><span class="n">component_state</span><span class="p">)</span> <span class="k">for</span>
<span class="n">layer</span><span class="p">,</span> <span class="n">component_state</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">,</span> <span class="n">state_below</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">get_monitoring_channels</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="n">OrderedDict</span><span class="p">()</span></code></pre></figure>
<p>Combined with the following YAML file, you should finally be able to train a
model that predicts the next acoustic sample from the previous acoustic samples
and from the phone associated with the sample to be predicted:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">!obj:pylearn2.train.Train {
dataset: &train !obj:research.code.pylearn2.datasets.timit.TIMIT {
which_set: 'train',
frame_length: 1,
frames_per_example: &fpe 100,
start: 0,
stop: 100,
},
model: !obj:mlp_with_source.MLPWithSource {
batch_size: 512,
layers: [
!obj:mlp_with_source.CompositeLayerWithSource {
layer_name: 'c',
layers: [
!obj:pylearn2.models.mlp.Linear {
layer_name: 'h1',
dim: 100,
irange: 0.05,
},
!obj:pylearn2.models.mlp.Linear {
layer_name: 'h2',
dim: 62,
irange: 0.05,
},
],
},
!obj:pylearn2.models.mlp.Linear {
layer_name: 'o',
dim: 1,
irange: 0.05,
},
],
input_space: !obj:pylearn2.space.CompositeSpace {
components: [
!obj:pylearn2.space.VectorSpace {
dim: 100,
},
!obj:pylearn2.space.VectorSpace {
dim: 62,
},
],
},
input_source: ['features', 'phones'],
},
algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
learning_rate: .01,
monitoring_dataset: {
'train': *train,
},
cost: !obj:pylearn2.costs.mlp.Default {},
termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
max_epochs: 10,
},
},
}</code></pre></figure>
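<p>To run it, you can use Pylearn2’s <code class="language-plaintext highlighter-rouge">train</code> script, or drive it from Python directly.
Here’s a minimal sketch of the latter, assuming the YAML above is saved as
<code class="language-plaintext highlighter-rouge">mlp_with_source.yaml</code> (the filename is mine) and that <code class="language-plaintext highlighter-rouge">mlp_with_source.py</code> is
importable:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from pylearn2.config import yaml_parse

# Build the Train object described by the YAML file and run the main loop;
# this is what Pylearn2's train script does under the hood.
with open('mlp_with_source.yaml') as f:
    train_obj = yaml_parse.load(f.read())
train_obj.main_loop()</code></pre></figure>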
<p>Try it out and tell me if it works for you!</p>
<p><a href="https://vdumoulin.github.io/articles/timit-part-5">Combining acoustic samples and phones information</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on March 03, 2014.</p>https://vdumoulin.github.io/articles/nade2014-02-25T00:00:00-05:002014-02-25T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>I might use neural autoregressive distribution estimators (NADEs) for the speech
synthesis project; this has to do with an idea both Guillaume Desjardins and
Yoshua Bengio talked about in the past couple days, and which I’ll detail later
on. For now, I’d like to test my understanding of NADEs by introducing them in a
blog post. As they say,</p>
<blockquote>
<p>If you want to learn something, read. If you want to understand something,
write. If you want to master something, teach.</p>
</blockquote>
<h3 id="the-idea">The idea</h3>
<p>RBMs are able to model complex distributions and work very well as generative
models, but they’re not well suited for density estimation because they present
an intractable partition function:
\[
p_{RBM}(\mathbf{v}) = \sum_{\mathbf{h}} p(\mathbf{v}, \mathbf{h})
= \sum_{\mathbf{h}} \frac{
\exp(-E(\mathbf{v}, \mathbf{h}))
}{
\sum_{\tilde{\mathbf{v}}, \tilde{\mathbf{h}}}
\exp(-E(\tilde{\mathbf{v}}, \tilde{\mathbf{h}}))
}
= \sum_{\mathbf{h}} \frac{\exp(-E(\mathbf{v}, \mathbf{h}))}{Z}
\]
We see that \(Z\) is intractable because it contains a number of terms
that’s exponential in the dimensionality of \(\mathbf{v}\) and
\(\mathbf{h}\).</p>
<p>NADE (<a href="http://jmlr.org/proceedings/papers/v15/larochelle11a/larochelle11a.pdf">original
paper</a>)
is a model proposed by <a href="http://www.dmi.usherb.ca/~larocheh/index_en.html">Hugo Larochelle</a>
and <a href="http://homepages.inf.ed.ac.uk/imurray2/">Iain Murray</a> as a way to
circumvent this difficulty by decomposing the joint distribution
\(p(\mathbf{v})\) into tractable conditional distributions. It is inspired by
an attempt to convert an RBM into a Bayesian network.</p>
<p><img src="https://vdumoulin.github.io/images/nade.png" alt="NADE" /></p>
<p>The joint probability distribution \(p(\mathbf{v})\) over observed variables
is expressed as
\[
p(\mathbf{v}) = \prod_{i=1}^D p(v_i \mid \mathbf{v}_{<i})
\]
where
\[
\begin{split}
p(v_i \mid \mathbf{v}_{<i}) &=
\text{sigm}(b_i + \mathbf{V}_{i}\cdot\mathbf{h}_i), \\
\mathbf{h}_i &=
\text{sigm}(\mathbf{c} + \mathbf{W}_{<i}\cdot\mathbf{v}_{<i})
\end{split}
\]</p>
<p>As you can see both in the graph and in the joint probability, given a specific
ordering, each observed variable only depends on prior variables in the
ordering. By abusing notation a little, we can consider \(\mathbf{h}_i\) to
be a random vector whose conditional distribution is a point mass,
\(
p(\mathbf{h}_i \mid \mathbf{v}_{<i})
= \delta\left(\mathbf{h}_i - \text{sigm}(\mathbf{c} + \mathbf{W}_{<i}\cdot\mathbf{v}_{<i})\right)
\).</p>
<p>The distribution modeled by NADEs has the great advantage of being tractable,
since all of its conditional probability distributions are themselves tractable.
This means that, contrary to an RBM, performance can be measured directly via the
negative log-likelihood (NLL) of the dataset.</p>
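<p>To make the recursion concrete, here is a minimal numpy sketch of the NLL
computation for a single binary vector. It follows the equations above; the
parameter shapes are my own convention, and this is not the Pylearn2 port
discussed in the next section:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def nade_nll(v, W, V, b, c):
    """Negative log-likelihood of a binary vector v under a NADE.

    Assumed shapes: W is (n_hidden, D), V is (D, n_hidden),
    b is (D,), c is (n_hidden,).
    """
    D = v.shape[0]
    a = c.copy()  # running activation: c + W[:, :i].dot(v[:i])
    log_p = 0.0
    for i in range(D):
        h_i = sigm(a)
        p_i = sigm(b[i] + V[i].dot(h_i))
        log_p += v[i] * np.log(p_i) + (1 - v[i]) * np.log(1 - p_i)
        a += W[:, i] * v[i]  # fold v_i in for the next conditional
    return -log_p</code></pre></figure>
<p>Note how the running activation is reused from one conditional to the next:
evaluating all \(D\) conditionals costs \(O(DH)\) operations for \(H\) hidden
units, which is what makes measuring the NLL cheap.</p>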
<p>In (Larochelle & Murray, 2011), NADEs are shown to outperform common models with
tractable distributions and to have a performance comparable to large
intractable RBMs.</p>
<h3 id="implementation-and-results">Implementation and results</h3>
<p>I ported Jörg Bornschein’s NADE Theano implementation to Pylearn2 and used it to
reproduce Larochelle & Murray’s results on MNIST. I intend to turn it into a pull
request so it gets integrated into Pylearn2.</p>
<p>The trained model scores a <strong>-85.8 test log-likelihood</strong>, which is slightly
better than what is reported in the paper. To be fair, I made a mistake while
training: I binarized training examples by sampling from a Bernoulli
distribution every time they were presented, which explains the better
results.</p>
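<p>Sampling from the trained model is straightforward given the conditionals
above. Here is a sketch in the same spirit (reusing the shape conventions of
the previous snippet, not my actual code): ancestral sampling, one dimension at
a time, keeping the Bernoulli parameters around.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def nade_sample(W, V, b, c, rng=np.random):
    # Ancestral sampling: v_i ~ Bernoulli(p_i), with p_i computed from the
    # previously sampled dimensions exactly as in nade_nll above.
    D = b.shape[0]
    a = c.copy()
    v = np.zeros(D)
    means = np.zeros(D)  # the Bernoulli parameters shown below
    for i in range(D):
        h_i = sigm(a)
        means[i] = sigm(b[i] + V[i].dot(h_i))
        v[i] = rng.binomial(1, means[i])
        a += W[:, i] * v[i]
    return v, means</code></pre></figure>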
<p>Below are samples taken from the trained model (more precisely, the parameters
of the Bernoulli distributions that were output before the actual pixels were
sampled) and weight filters.</p>
<p><img src="https://vdumoulin.github.io/images/nade_samples.png" alt="NADE samples" /></p>
<p><img src="https://vdumoulin.github.io/images/nade_filters.png" alt="NADE filters" /></p>
<p><a href="https://vdumoulin.github.io/articles/nade">NADEs: an introduction</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on February 25, 2014.</p>https://vdumoulin.github.io/articles/timit-part-42014-02-19T00:00:00-05:002014-02-19T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>Remember my last post talking about improvements to the TIMIT dataset? Well
here’s another big improvement: thanks to Laurent’s and David’s help, I was able
to <em>massively</em> reduce memory footprint, which was the main thing on my to-do
list for this week.</p>
<h3 id="a-memory-time-trade-off">A memory-time trade-off</h3>
<p>In order to quickly map example indexes to their actual location in the data,
an array storing this information was computed and kept in memory upon
instantiation. At first, this seems like the right thing to do: thanks to this,
no matter which example you request, you’ll be able to get it in constant time.</p>
<p>The problem is that the number of possible training examples is huge: the
validation set by itself contains roughly 24 <em>million</em> examples if you consider
an example to be 100 consecutive audio samples followed by one target audio
sample. This means that even the array mapping example indexes to data
locations was big, and the problem was particularly apparent when the frame
length was small. Given that, as a first step, we are to predict the next
acoustic sample based on the <em>k</em> previous ones plus the current phoneme (meaning
our frame size is 1), something had to be done.</p>
<p>The solution David, Laurent and I agreed on is to trade a little time
performance for memory performance by computing the locations on the fly, and
it turned out to work pretty well: even working with a frame size of 1 is now
doable in terms of memory. Even better, the changes do not seem to impact
speed significantly.</p>
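<p>The idea can be summarized with a short sketch (names and the exact
example-counting convention are my own): store only one cumulative example
count per sequence, and recover an example’s location with a binary search
instead of a giant lookup table.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def build_cumulative_counts(sequence_lengths, samples_per_example):
    # One entry per sequence: how many examples exist up to and including it.
    counts = [max(length - samples_per_example, 0)
              for length in sequence_lengths]
    return np.cumsum(counts)

def locate_example(index, cumulative_counts):
    # Binary search replaces the big index-to-location array.
    seq = int(np.searchsorted(cumulative_counts, index, side='right'))
    offset = index if seq == 0 else index - cumulative_counts[seq - 1]
    return seq, offset</code></pre></figure>
<p>This keeps the memory cost proportional to the number of sequences rather than
the number of examples, at the price of a logarithmic search per example.</p>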
<p>I encourage you to try the dataset (see
<a href="https://github.com/vdumoulin/research/blob/master/code/pylearn2/datasets/timit.py">here</a>)
and tell me if it works for you.</p>
<p><a href="https://vdumoulin.github.io/articles/timit-part-4">(Yet) another update on the state of TIMIT dataset in Pylearn2</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on February 19, 2014.</p>https://vdumoulin.github.io/articles/timit-part-32014-02-18T00:00:00-05:002014-02-18T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>Last week I continued working on the Pylearn2 implementation of the TIMIT
dataset, so I figured now would be the time to write a quick progress report.</p>
<h3 id="more-data-integration">More data integration</h3>
<p>Thanks to Laurent Dinh’s precious help, more data is available:</p>
<ul>
<li>Phones</li>
<li>Phonemes</li>
<li>Words</li>
</ul>
<p>Later this week I’d like to make a blog post to show how this information can be
used.</p>
<h3 id="data-standardization">Data standardization</h3>
<p>Audio sequences are now normalized, with mean and standard deviation being
computed across all sequences of all sets (train, valid and test). Those
values are saved to help with generative tasks.</p>
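<p>In code, the standardization step looks roughly like this (a sketch with
assumed names, not the actual implementation):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def standardize(sequence_sets):
    """Normalize variable-length sequences with statistics pooled over all
    sets (train, valid, test); return the statistics so that generated
    samples can be mapped back to the original scale."""
    all_samples = np.concatenate([seq for sequences in sequence_sets
                                  for seq in sequences])
    mean, std = all_samples.mean(), all_samples.std()
    normalized = [[(seq - mean) / std for seq in sequences]
                  for sequences in sequence_sets]
    return normalized, mean, std</code></pre></figure>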
<h3 id="better-memory-footprint">Better memory footprint</h3>
<p>With Jean-Philippe Raymond’s help, the number of arrays needed to store
information necessary to generate batches of examples on the fly has been
reduced.</p>
<p>The batches returned by the iterator are now stored in-place, in a buffer, to
reduce the number of memory allocations during the lifetime of the dataset.</p>
<h3 id="what-remains-to-be-done">What remains to be done</h3>
<p>There’s still room for improvement in terms of memory usage. For instance, the
array which maps example indexes to their location in data arrays can get quite
big, especially if the length of a frame is very small.</p>
<p><a href="https://vdumoulin.github.io/articles/timit-part-3">Another update on the state of TIMIT dataset in Pylearn2</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on February 18, 2014.</p>https://vdumoulin.github.io/articles/timit-part-22014-02-12T00:00:00-05:002014-02-12T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>I’m almost done implementing a first version the TIMIT dataset in <code class="language-plaintext highlighter-rouge">Pylearn2</code>.
You can find the code in my
<a href="https://github.com/vdumoulin/research">public research repository</a>. Let’s look
at what the problem was and how I solved it.</p>
<h3 id="the-challenge">The challenge</h3>
<p>My goal was to implement a dataset whose training examples have a sequence of
<em>k</em> audio frames as features and the following frame as target. The frames have
a fixed length and can overlap by a fixed number of samples.</p>
<p>A naive approach would be to allocate a new 2D numpy array and fill it with
every example you can generate from your audio sequence. This approach does not
scale, and here’s why: say you have 200 acoustic samples that you need to
separate into 20-samples-long frames overlapping by 5 samples. This
segmentation yields 13 distinct frames: frame 1 gets samples 1 through 20,
frame 2 gets samples 16 through 35, …, and frame 13 gets samples 181 through
200. Already, you can see that enumerating the frames explicitly duplicates 60
samples (13 frames of 20 samples each make 260 samples, versus the original
200). It gets worse, though: say you want to predict the next frame based on
the two previous frames. Then your training set would have 11 examples: the
first example gets frames 1 and 2 as features and frame 3 as target, the second
example gets frames 2 and 3 as features and frame 4 as target, …, and the
eleventh example gets frames 11 and 12 as features and frame 13 as target. If
you were to list all examples explicitly, you would have 660 acoustic samples,
more than <em>three times</em> the length of your original audio sequence. When
dealing with thousands of audio sequences of thousands of acoustic samples
each, this quickly becomes impractical.</p>
<h3 id="the-solution">The solution</h3>
<p>Obviously, any practical solution would involve keeping a compact representation
of the data in memory and having some sort of mapping to the training examples.</p>
<p>One nice thing about <code class="language-plaintext highlighter-rouge">numpy</code> is that it gives you the ability to manipulate the
<a href="http://en.wikipedia.org/wiki/Stride_of_an_array">strides</a> of your arrays. This
makes it possible to create a view of a numpy array in which data is segmented
into overlapping frames without touching the actual array (see
<a href="http://wiki.scipy.org/Cookbook/SegmentAxis?action=AttachFile&do=get&target=segmentaxis.py">this script</a>).</p>
<p>If you have a numpy array of numpy arrays (all of your audio sequences), you
can segment each sequence by calling the <code class="language-plaintext highlighter-rouge">segment_axis</code> function on it and then
build two additional numpy arrays whose rows represent training examples: the
first one maps to a sequence index and the starting (inclusive) and ending
(exclusive) frames of the example’s features, and the second one maps to a
sequence index and the example’s target frame. You can then write a <code class="language-plaintext highlighter-rouge">get()</code>
method which takes a list of example indexes and builds the example batch by
using the two “mapping” arrays and the array of sequences.</p>
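<p>A sketch of what such a <code class="language-plaintext highlighter-rouge">get()</code> method could look like (the mapping conventions
here are my own, not necessarily those of the actual implementation):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def get(indexes, features_map, targets_map, segmented_sequences):
    # features_map[i] = (sequence, first frame, last frame exclusive);
    # targets_map[i] = (sequence, target frame). Both arrays stay tiny
    # compared to an explicit list of examples.
    features, targets = [], []
    for i in indexes:
        seq, start, stop = features_map[i]
        features.append(segmented_sequences[seq][start:stop].ravel())
        seq, frame = targets_map[i]
        targets.append(segmented_sequences[seq][frame])
    return np.asarray(features), np.asarray(targets)</code></pre></figure>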
<p>This way, you only have to change a small part of the iterator: instead of
acting directly upon a reference pointing to the raw data of your dataset, it
calls the dataset’s <code class="language-plaintext highlighter-rouge">get()</code> method, which builds and returns the batch of
examples needed.</p>
<h3 id="a-caveat">A caveat</h3>
<p>For now the dataset only manages acoustic samples; this means no phones /
auxiliary information. I’m working on this with Laurent Dinh, and I’ll keep you
informed of our progress.</p>
<h3 id="example-yaml-file">Example YAML file</h3>
<p>You can look
<a href="https://github.com/vdumoulin/research/blob/master/experiments/timit/mlp.yaml">here</a>
for a (completely unrealistic) example on how to use the dataset in a YAML file.</p>
<p><a href="https://vdumoulin.github.io/articles/timit-part-2">An update on the state of TIMIT dataset in Pylearn2</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on February 12, 2014.</p>https://vdumoulin.github.io/articles/timit-in-pylearn22014-02-09T00:00:00-05:002014-02-09T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p>This is a small post just to let you know the current state of the TIMIT dataset
in Pylearn2. You can find the source code
<a href="https://github.com/vdumoulin/research/blob/master/code/pylearn2/datasets/timit.py">here</a>.</p>
<p>I’m mostly done working on the initialization, thanks to Laurent Dinh’s
<a href="https://github.com/laurent-dinh/mumbler/blob/master/dataset/timit.py">code</a>.</p>
<p>The dataset is able to load all relevant files, but only the acoustic samples
are used. For now I won’t bother including phones/phonemes and auxiliary speaker
information, as I already have plenty to manage with the acoustic samples
alone.</p>
<p>The biggest problem I’m facing is the lack of support for variable-length
sequences in Pylearn2. The library is mostly built around the assumption that
your data will be a matrix of training examples (with examples being stored in
the matrix’s rows) and a matrix of training targets.</p>
<p>One way to circumvent that is to transform the dataset into a matrix of training
examples each containing a sequence of k frames and a matrix of training targets
each containing the next frame after its corresponding sequence. The problem is
that it causes a lot of duplication in memory.</p>
<p>Another solution would be to keep the dataset as an array of variable-length
sequences and maintain a <em>visiting order</em> list of tuples containing the index
of a sequence and the index of the starting frame in the sequence. This is where
I’m currently headed. One problem with this solution is that no iterator built
in Pylearn2 is suited to working with the <em>visiting order</em> list. I’ll have to
write one on my own, which might take some time, as I’m not fully fluent with
the whole <em>data specs</em> framework used in Pylearn2.</p>
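<p>To illustrate, a <em>visiting order</em> list could be built like this (a sketch under
my own naming, ignoring frame overlap for simplicity):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def build_visiting_order(sequence_lengths, frames_per_example):
    # One (sequence index, starting frame index) tuple per training example.
    visiting_order = []
    for seq_index, length in enumerate(sequence_lengths):
        # Leave room for the feature frames plus the target frame.
        for start in range(length - frames_per_example):
            visiting_order.append((seq_index, start))
    return visiting_order</code></pre></figure>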
<p>Conclusion: if you’re waiting for me to finish the TIMIT dataset implementation
in Pylearn2, this might take some time; you’d be better off working directly in
Theano with Laurent’s TIMIT class for now.</p>
<p><a href="https://vdumoulin.github.io/articles/timit-in-pylearn2">State of TIMIT dataset in Pylearn2</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on February 09, 2014.</p>https://vdumoulin.github.io/articles/drbm2014-02-02T00:00:00-05:002014-02-02T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p><em>This semester I’m taking Yoshua Bengio’s representation learning class
(IFT6266). In addition to formal evaluation, we’re also evaluated in the context
of a big class project, in which we compete against each other to find the best
solution to a machine learning problem. We’re to maintain a blog detailing our
progress, and we can cite or be cited by other students, in analogy to what’s
done in actual research.</em></p>
<p>Suppose you have a good, real-valued representation of audio frames and you wish
to learn the distribution of the next audio frame conditioned on the previous
one. The following DBM can achieve that:</p>
<p><img src="https://vdumoulin.github.io/images/gaussian_dbm.png" alt="Gaussian DBM" /></p>
<p>It is a three-layered DBM whose first and last layers are Gaussian and whose
hidden layer is binary.</p>
<p>Let’s start with the energy function:</p>
<p>\[
E(\mathbf{x}^t, \mathbf{h}, \mathbf{x}^{t+1}) =
E_{bias}(\mathbf{x}^t, \mathbf{h}, \mathbf{x}^{t+1}) +
E_{interact}(\mathbf{x}^t, \mathbf{h}, \mathbf{x}^{t+1})
\]</p>
<p>with</p>
<p>\[
E_{bias}(\mathbf{x}^t, \mathbf{h}, \mathbf{x}^{t+1}) =
\frac{1}{2}(\mathbf{x}^t - \mathbf{b})^T(\mathbf{x}^t - \mathbf{b})
- \mathbf{c}^T\mathbf{h}
+ \frac{1}{2}(\mathbf{x}^{t+1} - \mathbf{d})^T(\mathbf{x}^{t+1} - \mathbf{d})
\]</p>
<p>and</p>
<p>\[
E_{interact}(\mathbf{x}^t, \mathbf{h}, \mathbf{x}^{t+1}) =
- \mathbf{h}^T\mathbf{W}\mathbf{x}^t
- (\mathbf{x}^{t+1})^T\mathbf{U}\mathbf{h}
\]</p>
<p>Sparing you the algebraic details, conditional probabilities for this model are
given by</p>
<p>\[
\begin{split}
p(\mathbf{x}^t \mid \mathbf{h}, \mathbf{x}^{t+1}) &=
\mathcal{N}(\mathbf{x}^t \mid \mathbf{b} + \mathbf{W}^T\mathbf{h},
\mathbf{I}), \\
p(\mathbf{h} \mid \mathbf{x}^t, \mathbf{x}^{t+1}) &=
\text{sigmoid}(\mathbf{c} + \mathbf{W}\mathbf{x}^t
+ \mathbf{U}^T\mathbf{x}^{t+1}), \\
p(\mathbf{x}^{t+1} \mid \mathbf{x}^t, \mathbf{h}) &=
\mathcal{N}(\mathbf{x}^{t+1} \mid \mathbf{d} + \mathbf{U}\mathbf{h},
\mathbf{I})
\end{split}
\]</p>
<p>and the gradient of the negative log-likelihood (NLL) of
\(p(\mathbf{x}^{t+1} \mid \mathbf{x}^t)\) is given by</p>
<p>\[
\begin{split}
\frac{\partial}{\partial \theta} -\log p(\mathbf{x}^{t+1} \mid \mathbf{x}^t) =
&\mathbb{E}_{p(\mathbf{h} \mid \mathbf{x}^t, \mathbf{x}^{t+1})} \left[
\frac{\partial}{\partial \theta} E(\mathbf{x}^t, \mathbf{h}, \mathbf{x}^{t+1})
\right] \\
-&\mathbb{E}_{p(\mathbf{h}, \mathbf{x}^{t+1} \mid \mathbf{x}^t)} \left[
\frac{\partial}{\partial \theta} E(\mathbf{x}^t, \mathbf{h}, \mathbf{x}^{t+1})
\right]
\end{split}
\]</p>
<p>For the positive phase, you sample \(\mathbf{h}\) given
\( \mathbf{x}^t \) and \( \mathbf{x}^{t+1} \), and for the negative phase,
you sample \( \mathbf{h} \) and \( \mathbf{x}^{t+1} \) given
\( \mathbf{x}^t \).</p>
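<p>In numpy, the two phases could look as follows (a sketch under the
conditionals above; the names, the number of Gibbs steps and the initialization
of \( \mathbf{x}^{t+1} \) are my own choices):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def positive_phase(x_t, x_tp1, W, U, c, rng=np.random):
    # Sample h from p(h | x^t, x^{t+1}).
    return rng.binomial(1, sigmoid(c + W.dot(x_t) + U.T.dot(x_tp1)))

def negative_phase(x_t, W, U, c, d, rng=np.random, n_steps=1):
    # Block Gibbs sampling from p(h, x^{t+1} | x^t): alternate between
    # h | x^t, x^{t+1} and x^{t+1} | h, both given above.
    x_tp1 = d.copy()
    for _ in range(n_steps):
        h = rng.binomial(1, sigmoid(c + W.dot(x_t) + U.T.dot(x_tp1)))
        x_tp1 = d + U.dot(h) + rng.standard_normal(d.shape)
    return h, x_tp1</code></pre></figure>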
<p><a href="https://vdumoulin.github.io/articles/drbm">Speech Synthesis: Gaussian DBMs</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on February 02, 2014.</p>https://vdumoulin.github.io/articles/ift6266-project2014-01-23T00:00:00-05:002014-01-23T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p><em>This semester I’m taking Yoshua Bengio’s representation learning class
(IFT6266). In addition to formal evaluation, we’re also evaluated in the context
of a big class project, in which we compete against each other to find the best
solution to a machine learning problem. We’re to maintain a blog detailing our
progress, and we can cite or be cited by other students, in analogy to what’s
done in actual research.</em></p>
<p><em>This year’s problem is <strong>speech synthesis</strong>, and I thought I’d launch my
blogging effort with an overview of the problem.</em></p>
<h2 id="definition">Definition</h2>
<p><strong>Speech synthesis</strong> is loosely defined as the task of producing human speech
from text, <em>i.e. making a computer read text out loud</em>. Traditionally, this task
is split in two independent subtasks: <strong>text analysis</strong> and <strong>waveform
generation</strong>. The former is interested in processing text to extract the
<a href="http://en.wikipedia.org/wiki/Phoneme">phonemes</a> to be pronounced and determine
<a href="http://en.wikipedia.org/wiki/Prosody_(linguistics)">prosody</a>. The latter is
interested in converting phonemes and prosody to actual sounds.</p>
<p>One caveat of segmenting the task this way is that prosody cannot be learned
based on audio samples, since it is not part of the waveform generation task;
we rely on labeled datasets and/or heuristics instead. This means we’re throwing
away lots of information coming from audio samples.</p>
<h2 id="improving-state-of-the-art">Improving state-of-the-art</h2>
<p>One way we could improve speech synthesis is to make learning prosody part of
the waveform generation task; information about prosody would be richer because
it would come directly from audio clips instead of labeled data.</p>
<p>However, this is much more involved because prosody is context-dependent, i.e.
it depends on the meaning of what is being said. For this reason, good
representation learning algorithms, and deep learning algorithms in general,
could be of great help in extracting high-level features from the text.</p>
<p>In order to facilitate things a bit, we’ll assume text has already been
processed. The idea, then, is to build a learning algorithm which, given a
sequence of phonemes, generates a good audio representation.</p>
<p>The dataset we’ll use for this task is the <a href="http://catalog.ldc.upenn.edu/LDC93S1">TIMIT Speech
Corpus</a>, a dataset containing audio samples of
many people reading phonetically-rich sentences. The samples are accompanied by
time-aligned phonetic transcriptions, which will be our training targets: our
models should be able to predict how each phoneme will sound and when it starts
in the audio clip.</p>
<p><a href="https://vdumoulin.github.io/articles/ift6266-project">Speech Synthesis: Introduction</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on January 23, 2014.</p>https://vdumoulin.github.io/articles/pylearn2-jobman2014-01-13T00:00:00-05:002014-01-13T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<p><em>This post is adapted from an iPython Notebook I wrote which is part of a pull
request to be added to the Pylearn2 documentation. I assume the reader is
familiar with Pylearn2 (mostly its YAML file framework for describing
experiments) and with <a href="http://deeplearning.net/software/jobman/">Jobman</a>, a tool
to launch and manage experiments.</em></p>
<h2 id="the-problem">The problem</h2>
<p>Suppose you have a YAML file describing an experiment which looks like that:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">!obj:pylearn2.train.Train {
dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
which_set: 'train',
one_hot: 1,
start: 0,
stop: 50000
},
model: !obj:pylearn2.models.mlp.MLP {
layers: [
!obj:pylearn2.models.mlp.Sigmoid {
layer_name: 'h0',
dim: 500,
sparse_init: 15,
}, !obj:pylearn2.models.mlp.Softmax {
layer_name: 'y',
n_classes: 10,
irange: 0.
}
],
nvis: 784,
},
algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
batch_size: 100,
learning_rate: 1e-3,
learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
init_momentum: 0.5,
},
monitoring_batches: 10,
monitoring_dataset : *train,
termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
max_epochs: 50
},
},
save_path: "mlp.pkl",
save_freq : 5
}</code></pre></figure>
<p>You’re not sure if the learning rate and the momentum coefficient are optimal,
though, and you’d like to try different hyperparameter values to see if you can
come up with something better.</p>
<p>One (painful) way to do it would be to create multiple copies of the YAML file
and, for each copy, manually change the value of the learning rate and the
momentum coefficient. You’d then call the <code class="language-plaintext highlighter-rouge">train</code> script on each of these
copies. This solution is not satisfying for multiple reasons:</p>
<ul>
<li>This is long and tedious</li>
<li>There’s lot of code duplication going on</li>
<li>You’d better be sure there are no errors in the original YAML file, or else
you’re in for a nice editing ride (been there)</li>
</ul>
<p>Ideally, the solution should involve a single YAML file and some way of
specifying how hyperparameter should be handled. One such solution exists,
thanks to Pylearn2 and Jobman.</p>
<h2 id="solution-overview">Solution overview</h2>
<p>Pylearn2 can instantiate a <code class="language-plaintext highlighter-rouge">Train</code> object specified by a YAML string via the
<code class="language-plaintext highlighter-rouge">pylearn2.config.yaml_parse.load</code> method; using this method and Python’s string
substitution syntax, we can “fill the blanks” of a template YAML string based
on our original YAML file and run the experiment described by that string.</p>
<p>In order to do that, we’ll need a dictionary mapping hyperparameter names to
their values. This is where Jobman will prove useful: Jobman accepts
configuration files describing a job’s parameters, and its syntax allows
parameters to be initialized by calling an external Python method. This way, we
can randomly sample hyperparameters for our experiment.</p>
<p>To summarize it all, we will</p>
<ol>
<li>Adapt the YAML file by replacing hyperparameter values with string
substitution statements</li>
<li>Write a configuration file specifying how to initialize the hyperparameter
dictionary</li>
<li>Read the YAML file into a string</li>
<li>Fill in hyperparameter values using string substitution with the
hyperparameter dictionary</li>
<li>Instantiate a <code class="language-plaintext highlighter-rouge">Train</code> object with the YAML string by calling
<code class="language-plaintext highlighter-rouge">pylearn2.config.yaml_parse.load</code></li>
<li>Call the <code class="language-plaintext highlighter-rouge">Train</code> object’s <code class="language-plaintext highlighter-rouge">main_loop</code> method</li>
<li>Extract results from the trained model</li>
</ol>
<p>Let’s break it down.</p>
<h2 id="adapting-the-yaml-file">Adapting the YAML file</h2>
<p>This step is pretty straightforward. Looking back to our example, the only lines
we have to replace are</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"> learning_rate: 1e-3,
learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
init_momentum: 0.5,
},</code></pre></figure>
<p>Using string substitution syntax, they become</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"> learning_rate: %(learning_rate)f,
learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
init_momentum: %(init_momentum)f,
},</code></pre></figure>
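<p>If you want to convince yourself of what the substitution does, the mechanism
is just Python’s <code class="language-plaintext highlighter-rouge">%</code>-formatting with a dictionary:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">yaml_snippet = """
    learning_rate: %(learning_rate)f,
    learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
        init_momentum: %(init_momentum)f,
    },
"""
# Each %(name)f placeholder is replaced by the dictionary entry of the
# same name, formatted as a float.
print(yaml_snippet % {'learning_rate': 1e-3, 'init_momentum': 0.5})</code></pre></figure>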
<h2 id="string-substitution-and-training-logic">String substitution and training logic</h2>
<p>The next step, assuming we already have a dictionary mapping hyperparameters
to their values, would be to build a method which</p>
<ol>
<li>takes the YAML string and the hyperparameter dictionary as inputs,</li>
<li>does string substitution on the YAML string,</li>
<li>calls the <code class="language-plaintext highlighter-rouge">pylearn2.config.yaml_parse.load</code> method to instantiate a <code class="language-plaintext highlighter-rouge">Train</code>
object and calls its <code class="language-plaintext highlighter-rouge">main_loop</code> method and</li>
<li>extracts and returns results after the model is trained.</li>
</ol>
<p>Luckily for us, one such method already exists:
<code class="language-plaintext highlighter-rouge">pylearn2.scripts.jobman.experiment.train_experiment</code>.</p>
<p>This method integrates with Jobman: it expects <code class="language-plaintext highlighter-rouge">state</code> and <code class="language-plaintext highlighter-rouge">channel</code>
arguments as input and returns <code class="language-plaintext highlighter-rouge">channel.COMPLETE</code> at the end of training.
Here’s the method’s full implementation:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">train_experiment</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">channel</span><span class="p">):</span>
<span class="s">"""
Train a model specified in state, and extract required results.
This function builds a YAML string from ``state.yaml_template``, taking
the values of hyper-parameters from ``state.hyper_parameters``, creates
the corresponding object and trains it (like train.py), then run the
function in ``state.extract_results`` on it, and store the returned values
into ``state.results``.
To know how to use this function, you can check the example in tester.py
(in the same directory).
"""</span>
<span class="n">yaml_template</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">yaml_template</span>
<span class="c1"># Convert nested DD into nested ydict.
</span> <span class="n">hyper_parameters</span> <span class="o">=</span> <span class="n">expand</span><span class="p">(</span><span class="n">flatten</span><span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">hyper_parameters</span><span class="p">),</span> <span class="n">dict_type</span><span class="o">=</span><span class="n">ydict</span><span class="p">)</span>
<span class="c1"># This will be the complete yaml string that should be executed
</span> <span class="n">final_yaml_str</span> <span class="o">=</span> <span class="n">yaml_template</span> <span class="o">%</span> <span class="n">hyper_parameters</span>
<span class="c1"># Instantiate an object from YAML string
</span> <span class="n">train_obj</span> <span class="o">=</span> <span class="n">pylearn2</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="n">yaml_parse</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">final_yaml_str</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="nb">iter</span><span class="p">(</span><span class="n">train_obj</span><span class="p">)</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">except</span> <span class="nb">TypeError</span><span class="p">:</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="bp">False</span>
<span class="k">if</span> <span class="n">iterable</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">NotImplementedError</span><span class="p">(</span>
<span class="p">(</span><span class="s">'Current implementation does not support running multiple '</span>
<span class="s">'models in one yaml string. Please change the yaml template '</span>
<span class="s">'and parameters to contain only one single model.'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="c1"># print "Executing the model."
</span> <span class="n">train_obj</span><span class="p">.</span><span class="n">main_loop</span><span class="p">()</span>
<span class="c1"># This line will call a function defined by the user and pass train_obj
</span> <span class="c1"># to it.
</span> <span class="n">state</span><span class="p">.</span><span class="n">results</span> <span class="o">=</span> <span class="n">jobman</span><span class="p">.</span><span class="n">tools</span><span class="p">.</span><span class="n">resolve</span><span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">extract_results</span><span class="p">)(</span><span class="n">train_obj</span><span class="p">)</span>
<span class="k">return</span> <span class="n">channel</span><span class="p">.</span><span class="n">COMPLETE</span></code></pre></figure>
<p>As you can see, it builds a dictionary out of <code class="language-plaintext highlighter-rouge">state.hyper_parameters</code> and uses
it to do string substitution on <code class="language-plaintext highlighter-rouge">state.yaml_template</code>.</p>
<p>It then instantiates the <code class="language-plaintext highlighter-rouge">Train</code> object as described in the YAML string and
calls its <code class="language-plaintext highlighter-rouge">main_loop</code> method.</p>
<p>Finally, when <code class="language-plaintext highlighter-rouge">main_loop</code> returns, it calls the method referenced in the
<code class="language-plaintext highlighter-rouge">state.extract_results</code> string, passing it the <code class="language-plaintext highlighter-rouge">Train</code> object as argument.
This method is responsible for extracting any relevant results from the <code class="language-plaintext highlighter-rouge">Train</code>
object and returning them, either as is or in a <code class="language-plaintext highlighter-rouge">DD</code> object. The return value
is stored in <code class="language-plaintext highlighter-rouge">state.results</code>.</p>
<h2 id="writing-the-extraction-method">Writing the extraction method</h2>
<p>Your extraction method should accept a <code class="language-plaintext highlighter-rouge">Train</code> object instance and return
either a single value (<code class="language-plaintext highlighter-rouge">float</code>, <code class="language-plaintext highlighter-rouge">int</code>, <code class="language-plaintext highlighter-rouge">str</code>, etc.) or a <code class="language-plaintext highlighter-rouge">DD</code> object containing
your values.</p>
<p>For the purpose of this tutorial, let’s write a simple method which extracts
the misclassification rate and the NLL from the model’s monitor:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">jobman.tools</span> <span class="kn">import</span> <span class="n">DD</span>
<span class="k">def</span> <span class="nf">results_extractor</span><span class="p">(</span><span class="n">train_obj</span><span class="p">):</span>
<span class="n">channels</span> <span class="o">=</span> <span class="n">train_obj</span><span class="p">.</span><span class="n">model</span><span class="p">.</span><span class="n">monitor</span><span class="p">.</span><span class="n">channels</span>
<span class="n">train_y_misclass</span> <span class="o">=</span> <span class="n">channels</span><span class="p">[</span><span class="s">'y_misclass'</span><span class="p">].</span><span class="n">val_record</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">train_y_nll</span> <span class="o">=</span> <span class="n">channels</span><span class="p">[</span><span class="s">'y_nll'</span><span class="p">].</span><span class="n">val_record</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="k">return</span> <span class="n">DD</span><span class="p">(</span><span class="n">train_y_misclass</span><span class="o">=</span><span class="n">train_y_misclass</span><span class="p">,</span> <span class="n">train_y_nll</span><span class="o">=</span><span class="n">train_y_nll</span><span class="p">)</span></code></pre></figure>
<p>Here we extract misclassification rate and NLL values at the last training
epoch from their respective channels of the model’s monitor and return a <code class="language-plaintext highlighter-rouge">DD</code>
object containing those values.</p>
<h2 id="building-the-hyperparameter-dictionary">Building the hyperparameter dictionary</h2>
<p>Let’s now focus on the last piece of the puzzle: the Jobman configuration file.
Your configuration file should contain</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">yaml_template</code>: a YAML string representing your experiment</li>
<li><code class="language-plaintext highlighter-rouge">hyper_parameters.[name]</code>: the value of the <code class="language-plaintext highlighter-rouge">[name]</code> hyperparameter.
You must have at least one such item, but you can have as many as you want.</li>
<li><code class="language-plaintext highlighter-rouge">extract_results</code>: a string written in <code class="language-plaintext highlighter-rouge">module.method</code> form representing the
result extraction method which is to be used</li>
</ul>
<p>Here’s how a configuration file could look for our experiment:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">yaml_template:=@__builtin__.open('mlp.yaml').read()
hyper_parameters.learning_rate:=@utils.log_uniform(1e-5, 1e-1)
hyper_parameters.init_momentum:=@utils.log_uniform(0.5, 1.0)
extract_results = "utils.results_extractor"</code></pre></figure>
<p>Notice how we’re using the <code class="language-plaintext highlighter-rouge">key:=@method</code> statement. This serves two purposes:</p>
<ol>
<li>We don’t have to copy the yaml file to the configuration file as a long,
hard to edit string.</li>
<li>We don’t have to hard-code hyperparameter values, which means every time
Jobman is called with this configuration file, it’ll receive different
hyperparameters.</li>
</ol>
<p>For reference, here’s <code class="language-plaintext highlighter-rouge">utils.log_uniform</code>’s implementation:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">log_uniform</span><span class="p">(</span><span class="n">low</span><span class="p">,</span> <span class="n">high</span><span class="p">):</span>
<span class="s">"""
Generates a number that's uniformly distributed in the log-space between
`low` and `high`
Parameters
----------
low : float
Lower bound of the randomly generated number
high : float
Upper bound of the randomly generated number
Returns
-------
rval : float
Random number uniformly distributed in the log-space specified by `low`
and `high`
"""</span>
<span class="n">log_low</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">low</span><span class="p">)</span>
<span class="n">log_high</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">high</span><span class="p">)</span>
<span class="n">log_rval</span> <span class="o">=</span> <span class="n">numpy</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">log_low</span><span class="p">,</span> <span class="n">log_high</span><span class="p">)</span>
<span class="n">rval</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">numpy</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">log_rval</span><span class="p">))</span>
<span class="k">return</span> <span class="n">rval</span></code></pre></figure>
<h2 id="running-the-whole-thing">Running the whole thing</h2>
<p>Here’s how you would train your model:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>jobman cmdline pylearn2.scripts.jobman.experiment.train_experiment mlp.conf</code></pre></figure>
<p>Alternatively, you can chain jobs using <code class="language-plaintext highlighter-rouge">jobdispatch</code>:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>jobdispatch <span class="nt">--local</span> <span class="nt">--repeat_jobs</span><span class="o">=</span>10 jobman cmdline <span class="se">\</span>
pylearn2.scripts.jobman.experiment.train_experiment mlp.conf</code></pre></figure>
<p><a href="https://vdumoulin.github.io/articles/pylearn2-jobman">Integrating Pylearn2 and Jobman</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on January 13, 2014.</p>https://vdumoulin.github.io/articles/first-post2014-01-09T00:00:00-05:002014-01-09T00:00:00-05:00Vincent Dumoulinhttps://vdumoulin.github.iovincent.dumoulin@umontreal.ca<h1 id="hi">Hi!</h1>
<p>Just a rubbish post used to test code highlighting features.</p>
<p>This function builds a YAML string from <code class="language-plaintext highlighter-rouge">state.yaml_template</code>, taking the
values of hyper-parameters from <code class="language-plaintext highlighter-rouge">state.hyper_parameters</code>, creates the
corresponding object and trains it (like train.py), then runs the function in
<code class="language-plaintext highlighter-rouge">state.extract_results</code> on it, and stores the returned values into
<code class="language-plaintext highlighter-rouge">state.results</code>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">train_experiment</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">channel</span><span class="p">):</span>
<span class="s">"""
Train a model specified in state, and extract required results.
This function builds a YAML string from ``state.yaml_template``, taking
the values of hyper-parameters from ``state.hyper_parameters``, creates
the corresponding object and trains it (like train.py), then run the
function in ``state.extract_results`` on it, and store the returned values
into ``state.results``.
To know how to use this function, you can check the example in tester.py
(in the same directory).
"""</span>
<span class="n">yaml_template</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">yaml_template</span>
<span class="c1"># Convert nested DD into nested ydict.
</span> <span class="n">hyper_parameters</span> <span class="o">=</span> <span class="n">expand</span><span class="p">(</span><span class="n">flatten</span><span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">hyper_parameters</span><span class="p">),</span> <span class="n">dict_type</span><span class="o">=</span><span class="n">ydict</span><span class="p">)</span>
<span class="c1"># This will be the complete yaml string that should be executed
</span> <span class="n">final_yaml_str</span> <span class="o">=</span> <span class="n">yaml_template</span> <span class="o">%</span> <span class="n">hyper_parameters</span>
<span class="c1"># Instantiate an object from YAML string
</span> <span class="n">train_obj</span> <span class="o">=</span> <span class="n">pylearn2</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="n">yaml_parse</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">final_yaml_str</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="nb">iter</span><span class="p">(</span><span class="n">train_obj</span><span class="p">)</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">except</span> <span class="nb">TypeError</span><span class="p">:</span>
<span class="n">iterable</span> <span class="o">=</span> <span class="bp">False</span>
<span class="k">if</span> <span class="n">iterable</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">NotImplementedError</span><span class="p">(</span>
<span class="p">(</span><span class="s">'Current implementation does not support running multiple '</span>
<span class="s">'models in one yaml string. Please change the yaml template '</span>
<span class="s">'and parameters to contain only one single model.'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="c1"># print "Executing the model."
</span> <span class="n">train_obj</span><span class="p">.</span><span class="n">main_loop</span><span class="p">()</span>
<span class="c1"># This line will call a function defined by the user and pass train_obj
</span> <span class="c1"># to it.
</span> <span class="n">state</span><span class="p">.</span><span class="n">results</span> <span class="o">=</span> <span class="n">jobman</span><span class="p">.</span><span class="n">tools</span><span class="p">.</span><span class="n">resolve</span><span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">extract_results</span><span class="p">)(</span><span class="n">train_obj</span><span class="p">)</span>
<span class="k">return</span> <span class="n">channel</span><span class="p">.</span><span class="n">COMPLETE</span></code></pre></figure>
<p><a href="https://vdumoulin.github.io/articles/first-post">First Post</a> was originally published by Vincent Dumoulin at <a href="https://vdumoulin.github.io">Vincent Dumoulin</a> on January 09, 2014.</p>