fTheanoNNclassCORE module¶

Inheritance diagram of fTheanoNNclassCORE

class fTheanoNNclassCORE.FunctionModel¶

Bases: object

Collection of activation functions we support.

static LReLU(z, *args)¶

Leaky Rectified Linear Unit.

\[\begin{split}activation= \begin{cases} z, & \text{if } z > 0\\ 0.01z, & \text{otherwise} \end{cases}\end{split}\]

Parameters:

z – array, raw activation, usually calculated as \(z=W^Tx\), used for further calculation.
args – array, additional parameters. Currently used only by MaxOut.
Returns:	array, same size as z.
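
A minimal NumPy sketch of the piecewise definition above. The name `leaky_relu` and the `slope` argument are illustrative and not part of this module, whose own implementation works on Theano tensors.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Return z where z > 0 and slope * z elsewhere (element-wise)."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, z, slope * z)

out = leaky_relu([-2.0, 0.0, 3.0])
```

Setting `slope=0.0` in this sketch gives the plain ReLU, \(\max(0, z)\).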

static Linear(z, *args)¶

Linear activation function. Returns input as-is.

Parameters:

z – array, raw activation, usually calculated as \(z=W^Tx\), used for further calculation.
args – array, additional parameters. Currently used only by MaxOut.
Returns:	array, same size as z.

static MaxOut(z, *args)¶

MaxOut activation function.

Original paper: http://arxiv.org/pdf/1302.4389.pdf

\[activation_{i} = \max_{j \in [1,k]} z_{i,j}\]

Parameters:

z – array, raw activation, usually calculated as \(z=W^Tx\), used for further calculation.
args – args[0] is the number of “lines” used to emulate MaxOut in each pool. For example, with 3, each output neuron is emulated by 3 linear functions.
Returns:	array, with the size along axis 0 reduced by a factor of “lines”.
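
The reduction along axis 0 can be sketched in NumPy as follows. This assumes that the “lines” of one output neuron occupy consecutive rows of z. The documentation does not state the actual grouping, and `maxout` is an illustrative name, not the module's function.

```python
import numpy as np

def maxout(z, lines):
    """Max over groups of `lines` consecutive rows along axis 0."""
    z = np.asarray(z, dtype=float)
    units = z.shape[0] // lines
    # (units * lines, batch) -> (units, lines, batch), then max over the lines
    return z.reshape(units, lines, -1).max(axis=1)

# 6 raw rows, 3 lines per neuron -> 2 output neurons, batch of 2
z = np.array([[1.0, -1.0],
              [4.0,  0.0],
              [2.0,  5.0],
              [0.5,  0.5],
              [0.1,  0.9],
              [0.3,  0.2]])
out = maxout(z, lines=3)
```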

static ReLU(z, *args)¶

Rectified Linear Unit.

\[activation = \max(0, z)\]

Parameters:

z – array, raw activation, usually calculated as \(z=W^Tx\), used for further calculation.
args – array, additional parameters. Currently used only by MaxOut.
Returns:	array, same size as z.

static Sigmoid(z, *args)¶

Standard sigmoid.

\[activation = \frac{1}{1 + e^{-z}}\]

Parameters:

z – array, raw activation, usually calculated as \(z=W^Tx\), used for further calculation.
args – array, additional parameters. Currently used only by MaxOut.
Returns:	array, same size as z.

static SoftMax(z, *args)¶

SoftMax activation function with several adjustments to avoid NaN.

It is useful for the output layer only.

\[\begin{split}activation = \frac{1}{\sum\limits_{j=1}^k e^{\theta_j^T x^{(i)}}} \left[\begin{aligned} e&^{\theta_1^Tx^{(i)}}\\ e&^{\theta_2^Tx^{(i)}}\\ &\vdots\\ e&^{\theta_k^Tx^{(i)}} \end{aligned}\right]\end{split}\]

Some hacks for working around float32 problems on the GPU:

a = T.clip(a, float(np.finfo(np.float32).tiny), float(np.finfo(np.float32).max))
a = T.clip(a, 1e-20, 1e20)

Parameters:

z – array, raw activation, usually calculated as \(z=W^Tx\), used for further calculation.
args – array, additional parameters. Currently used only by MaxOut.
Returns:	array, same size as z.
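
One common way to keep softmax free of NaN in float32 is to subtract the per-column maximum before exponentiating and then clip the result away from zero. The NumPy sketch below illustrates that idea; it assumes classes lie along axis 0 and is not necessarily what this module does internally.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax with max-subtraction for numerical stability."""
    z = np.asarray(z, dtype=np.float32)
    shifted = z - z.max(axis=0, keepdims=True)   # largest exponent becomes 0
    e = np.exp(shifted)
    a = e / e.sum(axis=0, keepdims=True)
    # Same idea as the clip quoted above: keep values strictly positive
    # so that a later log(a) cannot produce -inf or NaN.
    tiny = float(np.finfo(np.float32).tiny)
    return np.clip(a, tiny, 1.0)

# The first column would overflow float32 with a naive exp(1000)
z = np.array([[1000.0, 0.0],
              [1000.0, 0.0],
              [-1000.0, 0.0]])
a = softmax(z)
```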

static Tanh(z, *args)¶

Hyperbolic tangent.

\[activation = \frac{e^z - e^{-z}}{e^z + e^{-z}}\]

Parameters:

z – array, raw activation, usually calculated as \(z=W^Tx\), used for further calculation.
args – array, additional parameters. Currently used only by MaxOut.
Returns:	array, same size as z.

class fTheanoNNclassCORE.LayerCNN(kernel_shape=None, stride=1, pooling=False, pooling_shape=None, optimized=False, validConvolution=True, **kwargs)¶

Bases: fTheanoNNclassCORE.LayerNN

Layer class that extends the standard LayerNN class and implements a CNN (convolutional, not fully connected) type of network. Among NN algorithms, this is the most useful type of network for image processing. It applies the same weights to small parts of the input data, which makes it the most brain-like way to process data.

Parameters:

kernel_shape – tuple of int, kernels to use (number of kernels, colors, shape X, shape Y)
stride – int, step between windows in pixels
pooling – boolean, whether to use pooling after convolution or not
pooling_shape – int, pooling window’s shape. Stride will be the same, so only standard non-overlapping pooling is available.
optimized – boolean, whether to use the highly optimized version or not. If True, it can run only on the GPU.
validConvolution – boolean, whether to use valid (convolve fully overlapped parts) or full (convolve partially overlapped parts) convolution.
kwargs – other parameters are inherited from LayerNN.__init__()
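
The difference between valid and full convolution is easiest to see in the output size. The sketch below uses the standard formulas for one spatial axis. `conv_output_size` is a hypothetical helper, and how this layer combines stride with full convolution internally is an assumption.

```python
def conv_output_size(input_size, kernel_size, stride=1, valid=True):
    """Output length along one spatial axis of a 2-D convolution."""
    if valid:
        # the kernel must lie fully inside the input
        span = input_size - kernel_size
    else:
        # "full": every position with at least one overlapping pixel
        span = input_size + kernel_size - 2
    return span // stride + 1

valid_size = conv_output_size(28, 5)               # 28 - 5 + 1
full_size = conv_output_size(28, 5, valid=False)   # 28 + 5 - 1
strided_size = conv_output_size(28, 5, stride=2)
```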

Note

If optimized = True, there are a number of restrictions you have to take into account:

The number of channels must be even, or less than or equal to 3. If you want to compute the gradient, it should be divisible by 4. Valid numbers of input channels are: 1, 2, 3, 4, 8, 12, 16, ...
Filters must be square.
The number of filters must be a multiple of 16.
All minibatch sizes are supported, but the best performance is achieved when the minibatch size is a multiple of 128.
Only “valid” convolutions are supported. If you want to perform a “full” convolution, you will need to use zero-padding.
Only works on the GPU. You cannot run your Theano code on the CPU if you use it. However, it is still possible to train on the GPU and then load and run the model on the CPU.
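
The restrictions above can be checked before building a layer. The helper below is a hypothetical sketch, not part of this module; it only encodes the rules listed in the note and assumes the documented kernel_shape layout.

```python
def optimized_conv_problems(channels, kernel_shape, valid=True, need_gradient=True):
    """List the restrictions above that a configuration would violate.

    kernel_shape follows the documented layout:
    (number of kernels, colors, shape X, shape Y).
    """
    n_kernels, _, size_x, size_y = kernel_shape
    problems = []
    if not (channels <= 3 or channels % 2 == 0):
        problems.append("channels must be even or <= 3")
    if need_gradient and channels > 3 and channels % 4 != 0:
        problems.append("channels must be divisible by 4 to compute the gradient")
    if size_x != size_y:
        problems.append("filters must be square")
    if n_kernels % 16 != 0:
        problems.append("number of filters must be a multiple of 16")
    if not valid:
        problems.append("only valid convolutions are supported")
    return problems

ok = optimized_conv_problems(3, (32, 3, 5, 5))
bad = optimized_conv_problems(6, (20, 6, 5, 3), valid=False)
```

Here `ok` is empty, while `bad` reports the gradient, square-filter, filter-count and valid-convolution restrictions.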

compileActivation(net, layerNum)¶

compileDropout(net, R)¶

Compile necessary mask matrix for dropout regularisation.

Parameters:

net – TheanoNNclass object
R – Theano’s RandomGenerator object

compilePredictActivation(net, layerNum)¶

compileSparsity(net, layerNum, num)¶

In general, this method does the same as LayerNN.compileSparsity(). It can be used only in combination with Sigmoid().

For CNN it is slightly modified so that it can calculate average activations from the bc01 format.

Note

bc01 means: batch x color x size_X x size_Y

Parameters:

net – TheanoNNclass object
layerNum – int, layer’s index
num – batch size
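
As an illustration of averaging over the bc01 layout, the NumPy sketch below reduces over the batch and both spatial axes and keeps one value per feature map. Whether the layer averages exactly over these axes is an assumption; the array `a` is random stand-in data.

```python
import numpy as np

rng = np.random.RandomState(0)
# bc01: batch x color x size_X x size_Y
a = rng.uniform(size=(8, 4, 6, 6))

# One average activation per feature map: reduce over batch and both
# spatial axes, keep the color/channel axis.
rho_hat = a.mean(axis=(0, 2, 3))
```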

compileWeight(net, layerNum)¶

Allocates weights to be used as shared variables in Theano. MaxOut cannot be used as the activation function yet. If you experience training issues, try changing the initial random values.

Parameters:

net – TheanoNNclass object
layerNum – layer’s index.

class fTheanoNNclassCORE.LayerNN(size_in=1, size_out=1, activation=<function Sigmoid>, weightDecay=False, sparsity=False, beta=False, dropout=False, dropConnect=False, pool_size=False)¶

Bases: object

Basic layer class. By default, a standard fully connected neural network layer.

Parameters:

size_in – int, number of neurons on input
size_out – int, number of neurons on output
activation – FunctionModel, activation function to use
weightDecay – float or False, weight decay regularization and its coefficient
sparsity – float or False, sparsity constraint. Makes sense only with the Sigmoid activation function
beta – float, sparse weight coefficient
dropout – float or False, dropout regularisation with defined coefficient
dropConnect – TBD
pool_size – int, should be specified only for the MaxOut activation function. Number of lines used to emulate each neuron.

Returns:

layer object.

Printer()¶

Prints layer properties.

compileActivation(net, layerNum)¶

Compile layer’s activation taking into account dropout and specified activation function. Used during network’s training to calculate activations.

Parameters:

net – TheanoNNclass object
layerNum – int, layer’s index
Returns:

compileDropout(net, R)¶

Compile necessary mask matrix for dropout regularisation.

Parameters:

net – TheanoNNclass object
R – Theano’s RandomGenerator object
Returns:
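
A dropout mask is a binary matrix with the same shape as the activations it multiplies. The NumPy sketch below draws one from a binomial distribution. Treating the coefficient as the probability of dropping a unit is an assumption, and `dropout_mask` is an illustrative name.

```python
import numpy as np

def dropout_mask(shape, drop_prob, rng):
    """Binary mask: 0 with probability drop_prob, 1 otherwise."""
    return rng.binomial(n=1, p=1.0 - drop_prob, size=shape)

rng = np.random.RandomState(42)
mask = dropout_mask((4, 1000), drop_prob=0.5, rng=rng)
activations = np.ones((4, 1000))
dropped = activations * mask   # dropped units contribute exactly 0
```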

compilePredictActivation(net, layerNum)¶

Compile layer’s activation taking into account dropout and specified activation function. Used to calculate predictions without training.

Parameters:

net – TheanoNNclass object
layerNum – int, layer’s index
Returns:

compileSparsity(net, layerNum, num)¶

Compile necessary sparsity constraint calculations.

Average activation of hidden unit j (averaged over the training set):

\[\hat{\rho}_j = \frac{1}{m}\sum\limits_{i=1}^{m}\left[a_j(x^{(i)})\right]\]

Then penalty (using Kullback-Leibler):

\[penalty = \sum\limits_{j=1}^{hiddenUnits}\rho\log\frac{\rho}{\hat\rho_{j}} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat\rho_{j}}\]

where \(\rho\) is the sparsity parameter, that is, the level of average activation we want to achieve.

Parameters:

net – TheanoNNclass object
layerNum – int, layer’s index
num – batch size
Returns:
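
The two formulas above translate directly into NumPy. This sketch assumes activations are stored as (hidden units, examples); the function name is illustrative, and any scaling by the layer’s beta coefficient is left out.

```python
import numpy as np

def kl_sparsity_penalty(a, rho):
    """a: activations, shape (hidden units, m examples); rho: target level."""
    rho_hat = a.mean(axis=1)   # average activation of each hidden unit
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

on_target = np.full((3, 10), 0.05)    # units already at the desired level
too_active = np.full((3, 10), 0.5)    # units far more active than desired
p0 = kl_sparsity_penalty(on_target, rho=0.05)
p1 = kl_sparsity_penalty(too_active, rho=0.05)
```

The penalty is (numerically) zero when every unit’s average activation equals \(\rho\) and grows as the averages move away from it.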

compileWeight(net, layerNum)¶

Allocates weights to be used as shared variable in Theano

Parameters:

net – TheanoNNclass object
layerNum – layer’s index.
Returns:

compileWeightDecayPenalty(net, layerNum)¶

Adds the weight decay penalty to the network’s error. Useful for decreasing the absolute values of the weights.

\[\begin{split}penalty = \frac{1}{2}\sum W_{target\>layer}^2\end{split}\]

Parameters:

net – TheanoNNclass object
layerNum – int, layer’s index
Returns:
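
A NumPy sketch of the penalty for a single weight matrix. Multiplying by the layer’s weightDecay coefficient is an assumption about how the coefficient is applied; the function name is illustrative.

```python
import numpy as np

def weight_decay_penalty(W, coefficient):
    """0.5 * sum(W ** 2), scaled by the layer's weight-decay coefficient."""
    return coefficient * 0.5 * np.sum(W ** 2)

W = np.array([[1.0, -2.0],
              [0.5,  0.0]])
penalty = weight_decay_penalty(W, coefficient=0.001)
```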

class fTheanoNNclassCORE.LayerRNN(blocks=1, peeholes=False, **kwargs)¶

Bases: fTheanoNNclassCORE.LayerNN

Layer class that extends standard LayerNN class and implements RNN (recurrent) type of network. Particularly, here we implement LSTM (Long Short-Term Memory).

You can find more info about it on:

Wiki
Original paper
More about traditional LSTM vs peepholed

Parameters:

blocks – int, number of blocks to create. Should be equal to size_out.
peeholes – boolean, whether to use peepholes or not (send Acc to input gate).
kwargs – needed for compatibility.
Returns:	LayerRNN object

compileActivation(net, layerNum)¶

Compile layer’s activation taking into account dropout. It makes sense to use the Sigmoid activation function (or possibly the hyperbolic tangent).

Activation calculated as follows:

\(Input\>activation\)
\(Input\>gate\)
\(Forget\>gate\)
\(Output\>gate\)

Note

All of the above are calculated in one step.

\(Pi = {Input\>activation} \times {Input\>gate}\)
\(Pr = {Forget\>gate} \times {Cell\>state}\)
\({Cell\>state} = Pi + Pr\)
\(output = {Output\>gate} \times {Cell\>state}\)

Parameters:

net – TheanoNNclass object
layerNum – layer’s index.
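
The steps above can be sketched in NumPy as a single time step. This is a simplified illustration: it assumes Sigmoid for all four pre-activations, stacks them in one weight matrix so they come out of one product, and leaves out peepholes, dropout and the recurrent input. All names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, cell_state, W, b):
    """One step for `blocks` LSTM blocks.

    W has shape (4 * blocks, inputs) and b has shape (4 * blocks,), so all
    four pre-activations come out of a single matrix product.
    """
    z = W.dot(x) + b
    in_act, in_gate, forget_gate, out_gate = np.split(sigmoid(z), 4)
    pi = in_act * in_gate              # Pi = input activation x input gate
    pr = forget_gate * cell_state      # Pr = forget gate x cell state
    new_state = pi + pr
    output = out_gate * new_state
    return output, new_state

blocks, inputs = 2, 3
rng = np.random.RandomState(0)
W = rng.normal(scale=0.1, size=(4 * blocks, inputs))
b = np.zeros(4 * blocks)
out, state = lstm_step(np.ones(inputs), np.zeros(blocks), W, b)
```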

compilePredictActivation(net, layerNum)¶

Compile layer’s activation taking into account dropout and specified activation function. Used to calculate predictions without training. Uses separate Accumulator to store cell’s state independently from training.

Parameters:

net – TheanoNNclass object
layerNum – layer’s index

compileWeight(net, layerNum)¶

Allocates weights to be used as shared variable in Theano.

To initialise the biases we use the following advised values:

Input gate: 0.0
Forget gate: -2.0
Output gate: +2.0

Parameters:

net – TheanoNNclass object
layerNum – layer’s index.
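
To see what these biases imply, the short calculation below evaluates a sigmoid gate when its weighted input is close to zero, so that only the initial bias matters. Using a sigmoid for the gates is an assumption here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Gate activation at the first step if the weighted input is close to zero,
# so that only the initial bias matters.
initial_gate = {
    "input": sigmoid(0.0),
    "forget": sigmoid(-2.0),
    "output": sigmoid(2.0),
}
```

Under that assumption the input gate starts half open (0.5), the forget gate starts at roughly 0.12 and the output gate at roughly 0.88.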

class fTheanoNNclassCORE.NNsupport¶

Bases: object

static crossV(number, y, x, modelObj)¶

static errorG(errorArray, folder, plotsize=50)¶

class fTheanoNNclassCORE.OptionsStore(learnStep=0.01, rmsProp=False, mmsmin=1e-10, rProp=False, minibatch_size=1, CV_size=1)¶

Bases: object

Container for global network’s options.

Parameters:

learnStep – float, learning step to use in gradient descent or RMSprop
rmsProp – False or float, whether to use RMSprop or not. If used, the rate of the root mean square, usually 0.9.
mmsmin – float, lower bound used to clip the root mean square to avoid NaN. Default: 1e-10. Reasonable: down to 1e-20
rProp – False or float, use only for full batch. If used, the rate by which the next weight change is increased.
minibatch_size – int, size of the batch you use. Can’t be changed after compiling.
CV_size – int, size of the cross-validation set. Can’t be changed after compiling.

Returns:

OptionsStore object.
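
To show how learnStep, rmsProp and mmsmin interact, here is a scalar sketch of a common RMSprop formulation. It is an assumption that this module uses exactly this update; in particular, clipping the root rather than the mean square is a guess based on the parameter description.

```python
import math

def rmsprop_step(w, grad, ms, learn_step=0.01, rms_rate=0.9, mmsmin=1e-10):
    """Update one scalar weight; returns (new weight, new running mean square)."""
    ms = rms_rate * ms + (1.0 - rms_rate) * grad ** 2
    rms = max(math.sqrt(ms), mmsmin)   # clip so the division below stays finite
    return w - learn_step * grad / rms, ms

w, ms = 1.0, 0.0
for _ in range(3):
    grad = 2.0 * w              # gradient of the toy loss w ** 2
    w, ms = rmsprop_step(w, grad, ms)
```

Without the clip, a zero gradient on the very first step would divide by zero; with it, the weight is simply left unchanged.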

Printer()¶

Prints the current options to stdout. Useful for debugging.

Returns:	nothing

class fTheanoNNclassCORE.TheanoNNclass(opt, architecture)¶

Bases: object

The most important class, where everything is combined.

Compiles a network object from the options defined in OptionsStore and the list of layers.

Parameters:

opt – OptionsStore, general network options.
architecture – list, layers to build the network from.
Returns:	TheanoNNclass object

getStatus()¶

modelLoader(folder)¶

modelSaver(folder)¶

paramGetter()¶

paramSetter(loaded)¶

predictCalc(X, debug=False)¶

predictCompile(layerNum=-1)¶

roll(a)¶

trainCalc(X, Y, iteration=10, debug=False, errorCollect=False)¶

Standard method to train network using labeled data.

Parameters:

X – array, data to train the network on.
Y – array, labels for the data.
iteration – int, number of cycles to train the network on the current X.
debug – boolean, whether to print some useful info.
errorCollect – boolean, whether to collect the network’s error in the self.errorArray field.
Returns:

trainCalcExternal(model, X, Y)¶

Call this method in case you want to use external optimizer.

Parameters:

model – vector, new weights for the network.
X – array, data to train on.
Y – array, labels for the data.
Returns:	(float, array), network’s error and weight’s gradients
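
The documented contract, weights in and (error, gradient) out, is what most external optimizers expect. The sketch below drives a hypothetical stand-in with the same call shape using plain gradient descent. `toy_train_calc_external` is not part of this module and only mimics the return values on a least-squares problem.

```python
import numpy as np

def toy_train_calc_external(model, X, Y):
    """Hypothetical stand-in: least-squares error and gradient for weights `model`."""
    residual = X.dot(model) - Y
    error = 0.5 * np.mean(residual ** 2)
    grad = X.T.dot(residual) / len(Y)
    return error, grad

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
Y = X.dot(true_w)

model = np.zeros(3)
first_error, _ = toy_train_calc_external(model, X, Y)
for _ in range(200):   # the "external optimizer": plain gradient descent
    error, grad = toy_train_calc_external(model, X, Y)
    model = model - 0.1 * grad
```

With a real network, the stand-in would presumably be replaced by the network’s own trainCalcExternal after calling trainCompileExternal().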

trainCompile()¶

Using OptionsStore and the layers, creates the shared variables and the Theano function used to train the network. Usually, it should be called only once for each network.

Returns:	links self.train with the appropriate Theano function

trainCompileExternal()¶

It is possible to use external optimisation.

If you decide to use something external, this method will prepare the necessary functions. Afterwards you should be able to use the returned gradient and load the updated weights.

Returns:

unroll()¶

weightsVisualizer(folder, size=(100, 100), color='L', second=False, name='weights')¶