fTheanoNNclassCORE module¶
-
class
fTheanoNNclassCORE.FunctionModel¶ Bases: object
Collection of supported activation functions.
-
static
LReLU(z, *args)¶ Leaky Rectified Linear Unit. More info.
\[\begin{split}activation= \begin{cases} z, & \text{if } z > 0\\ 0.01z, & \text{otherwise} \end{cases}\end{split}\]
Parameters: - z – array, raw activation, usually calculated as \(z=W^Tx\) that will be used for further calculation.
- args – array, additional parameters. Currently used only by MaxOut.
Returns: array, same size as z.
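The piecewise definition above can be sketched in plain NumPy. This is an illustration only, not the library's Theano implementation; the function name `leaky_relu` and the `slope` argument are assumptions introduced here.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Return z where z > 0 and slope * z elsewhere (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, z, slope * z)

out = leaky_relu([-2.0, 0.0, 3.0])
```

The result has the same shape as the input, matching the documented return value.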
-
static
Linear(z, *args)¶ Linear activation function. Returns input as-is.
Parameters: - z – array, raw activation, usually calculated as \(z=W^Tx\) that will be used for further calculation.
- args – array, additional parameters. Currently used only by MaxOut.
Returns: array, same size as z.
-
static
MaxOut(z, *args)¶ MaxOut activation function.
Original paper: http://arxiv.org/pdf/1302.4389.pdf
\[activation_{i} = \max_{j \in [1,k]} z_{i,j}\]
Parameters: - z – array, raw activation, usually calculated as \(z=W^Tx\) that will be used for further calculation.
- args – [0] is the number of “lines” used to emulate MaxOut in each pool. For example, with 3 here, each output neuron is emulated by 3 linear functions.
Returns: array, size along axis [0] reduced by a factor of “lines”.
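A minimal NumPy sketch of the pooling described above. It assumes that the k “lines” belonging to one output neuron occupy consecutive rows along axis 0; the library may group them differently, and the name `maxout` is hypothetical.

```python
import numpy as np

def maxout(z, k):
    """Take the maximum over groups of k consecutive rows (assumed layout)."""
    z = np.asarray(z)
    if z.shape[0] % k != 0:
        raise ValueError("size along axis 0 must be a multiple of k")
    # Reshape (units * k, ...) into (units, k, ...) and reduce over the k lines.
    pooled = z.reshape(z.shape[0] // k, k, *z.shape[1:])
    return pooled.max(axis=1)

# 6 linear outputs for a batch of 2 samples, pooled into 2 neurons of 3 lines each.
z = np.array([[1.0, -1.0], [4.0, 0.0], [2.0, 5.0],
              [-3.0, 2.0], [0.0, 0.0], [-1.0, 7.0]])
out = maxout(z, 3)
```

The output has 2 rows instead of 6, that is, the size along axis 0 is reduced by the number of lines.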
-
static
ReLU(z, *args)¶ Rectified Linear Unit. More info.
\[activation = \max(0, z)\]
Parameters: - z – array, raw activation, usually calculated as \(z=W^Tx\) that will be used for further calculation.
- args – array, additional parameters. Currently used only by MaxOut.
Returns: array, same size as z.
-
static
Sigmoid(z, *args)¶ Standard sigmoid.
\[activation = \frac{1}{1 + e^{-z}}\]
Parameters: - z – array, raw activation, usually calculated as \(z=W^Tx\) that will be used for further calculation.
- args – array, additional parameters. Currently used only by MaxOut.
Returns: array, same size as z.
-
static
SoftMax(z, *args)¶ SoftMax activation function with several updates to avoid NaN.
It is useful for output layer only.
\[\begin{split}activation = \frac{1}{\sum\limits_{j=1}^k e^{\theta_j^T x^{(i)}}} \left[\begin{aligned} e&^{\theta_1^Tx^{(i)}}\\ e&^{\theta_2^Tx^{(i)}}\\ &\vdots\\ e&^{\theta_k^Tx^{(i)}} \end{aligned}\right]\end{split}\]
Some hacks for fixing the float32 GPU problem:
a = T.clip(a, float(np.finfo(np.float32).tiny), float(np.finfo(np.float32).max))
a = T.clip(a, 1e-20, 1e20)
Proof links:
- http://www.velocityreviews.com/forums/t714189-max-min-smallest-float-value-on-python-2-5-a.html
- http://docs.scipy.org/doc/numpy/reference/generated/numpy.finfo.html
Links about possible approaches to fix NaN:
- http://blog.csdn.net/xceman1997/article/details/9974569
- https://github.com/Theano/Theano/issues/1563
Parameters: - z – array, raw activation, usually calculated as \(z=W^Tx\) that will be used for further calculation.
- args – array, additional parameters. Currently used only by MaxOut.
Returns: array, same size as z.
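The NaN problem mentioned above can be illustrated with a NumPy sketch. It combines the clipping idea from the listed hacks with the common max-subtraction trick, which is a swapped-in technique and not necessarily what the library does. The `axis=0` convention (classes along axis 0) and the name `softmax` are assumptions.

```python
import numpy as np

def softmax(z, axis=0):
    """Softmax in float32 that avoids overflow and exact zeros (sketch)."""
    z = np.asarray(z, dtype=np.float32)
    # Subtracting the maximum leaves the result unchanged but keeps exp() finite.
    shifted = z - z.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    p = e / e.sum(axis=axis, keepdims=True)
    # Clip away exact zeros so that a later log(p) does not produce -inf or NaN.
    tiny = float(np.finfo(np.float32).tiny)
    return np.clip(p, tiny, 1.0)

# Column 0 would overflow a naive exp(); column 1 underflows to zero for one class.
z = np.array([[1000.0, 0.0], [1001.0, 0.0], [1002.0, -200.0]], dtype=np.float32)
p = softmax(z)
```

With these inputs every probability stays finite and strictly positive, so a cross-entropy term such as `np.log(p)` remains finite as well.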
-
static
Tanh(z, *args)¶ Hyperbolic tangent.
\[activation = \frac{e^z - e^{-z}}{e^z + e^{-z}}\]
Parameters: - z – array, raw activation, usually calculated as \(z=W^Tx\) that will be used for further calculation.
- args – array, additional parameters. Currently used only by MaxOut.
Returns: array, same size as z.
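As a quick check of the formula above, the sketch below evaluates it directly in NumPy and compares it with the well-known identity tanh(z) = 2·sigmoid(2z) − 1. The helper names are hypothetical and are not part of the module.

```python
import numpy as np

def tanh_from_formula(z):
    """Evaluate (e^z - e^-z) / (e^z + e^-z) directly."""
    z = np.asarray(z, dtype=float)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def sigmoid(z):
    """Standard sigmoid 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

z = np.linspace(-3.0, 3.0, 7)
direct = tanh_from_formula(z)
via_sigmoid = 2.0 * sigmoid(2.0 * z) - 1.0
```

Both arrays agree with `np.tanh(z)` up to floating-point rounding. For large |z| the direct formula overflows, which is why library implementations are preferable in practice.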
-
static
-
class
fTheanoNNclassCORE.LayerCNN(kernel_shape=None, stride=1, pooling=False, pooling_shape=None, optimized=False, validConvolution=True, **kwargs)¶ Bases: fTheanoNNclassCORE.LayerNN
Layer class that extends the standard LayerNN class and implements a CNN (convolutional, not fully connected) type of network. Among NN algorithms, this is the most useful type of network for image processing. It processes data in a brain-like way by applying the same weights to small parts of the input data. Read more about convolution here:
- http://deeplearning.net/tutorial/lenet.html
- http://en.wikipedia.org/wiki/Convolutional_neural_network
Parameters: - kernel_shape – tuple of int, kernels to use (number of kernels, colors, shape X, shape Y)
- stride – int, step between windows in pixels
- pooling – boolean, whether to use pooling after convolution or not
- pooling_shape – int, pooling window’s shape. Stride will be the same, so only standard non-overlapping pooling is available.
- optimized – boolean, whether to use the highly optimized version or not. If True, the layer can run only on the GPU.
- validConvolution – boolean, whether to use valid (convolve only fully overlapped parts) or full (also convolve partially overlapped parts) convolution.
- kwargs – other parameters are inherited from LayerNN.__init__()
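The effect of stride, validConvolution and pooling_shape on the output size can be sketched with the standard convolution arithmetic. The helper names are hypothetical, and dropping incomplete pooling windows at the border is an assumption about the library's behaviour.

```python
def conv_output_size(size, kernel, stride=1, valid=True):
    """Output length along one spatial axis for valid or full convolution."""
    # Valid: the kernel must fit entirely inside the input.
    # Full: every position with at least one overlapping pixel is used.
    span = size - kernel if valid else size + kernel - 2
    if span < 0:
        raise ValueError("kernel is larger than the input")
    return span // stride + 1

def pooled_size(size, pool):
    """Length after non-overlapping pooling (incomplete windows dropped)."""
    return size // pool

valid_size = conv_output_size(28, 5)               # 24
full_size = conv_output_size(28, 5, valid=False)   # 32
strided = conv_output_size(28, 5, stride=2)        # 12
after_pool = pooled_size(valid_size, 2)            # 12
```

For a 28 x 28 input and a 5 x 5 kernel, valid convolution shrinks each side to 24 while full convolution grows it to 32.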
Note
If optimized = True, there are a number of restrictions you have to take into account:
- The number of channels must be even, or less than or equal to 3. If you want to compute the gradient, it should be divisible by 4. Valid numbers of input channels are: 1, 2, 3, 4, 8, 12, 16, ...
- Filters must be square.
- The number of filters must be a multiple of 16.
- All minibatch sizes are supported, but the best performance is achieved when the minibatch size is a multiple of 128.
- Only “valid” convolutions are supported. If you want to perform a “full” convolution, you will need to use zero-padding (more on this later).
- Only works on the GPU. You cannot run your Theano code on the CPU if you use it. It is still possible to train on the GPU and then load and run the model on the CPU.
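The shape-related restrictions listed above can be checked before building a layer. The function below is a hypothetical helper, not part of the module; it assumes kernel_shape follows the documented order (number of kernels, colors, shape X, shape Y).

```python
def check_optimized_conv(kernel_shape, need_gradient=True):
    """Return a list of violated restrictions for optimized=True (sketch)."""
    n_filters, channels, rows, cols = kernel_shape
    problems = []
    if need_gradient:
        # Valid channel counts for training: 1, 2, 3, 4, 8, 12, 16, ...
        if not (channels <= 3 or channels % 4 == 0):
            problems.append("channels must be <= 3 or divisible by 4")
    elif not (channels <= 3 or channels % 2 == 0):
        problems.append("channels must be <= 3 or even")
    if rows != cols:
        problems.append("filters must be square")
    if n_filters % 16 != 0:
        problems.append("number of filters must be a multiple of 16")
    return problems

ok = check_optimized_conv((32, 3, 5, 5))
bad = check_optimized_conv((20, 6, 5, 3))
forward_only = check_optimized_conv((16, 6, 3, 3), need_gradient=False)
```

An empty list means the kernel shape satisfies the listed constraints; the minibatch-size recommendation and the GPU requirement are not covered by this check.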
-
compileActivation(net, layerNum)¶
-
compileDropout(net, R)¶ Compile necessary mask matrix for dropout regularisation.
Parameters: - net – TheanoNNclass object
- R – Theano’s RandomGenerator object
-
compilePredictActivation(net, layerNum)¶
-
compileSparsity(net, layerNum, num)¶ In general, this method does the same as LayerNN.compileSparsity(). Can be used in combination with Sigmoid() only.
For CNN it was slightly modified so that it can calculate average activations from the bc01 format.
Note
bc01 means: batch x color x size_X x size_Y
Parameters: - net – TheanoNNclass object
- layerNum – int, layer’s index
- num – batch size
-
compileWeight(net, layerNum)¶ Allocates weights to be used as shared variables in Theano. MaxOut cannot be used as the activation function yet. If you experience training issues, try changing the random initial values.
Parameters: - net – TheanoNNclass object
- layerNum – layer’s index.
-
class
fTheanoNNclassCORE.LayerNN(size_in=1, size_out=1, activation=<function Sigmoid>, weightDecay=False, sparsity=False, beta=False, dropout=False, dropConnect=False, pool_size=False)¶ Bases: object
Basic layer class. By default, a standard fully-connected neural network layer.
Parameters: - size_in – int, number of neurons on input
- size_out – int, number of neurons on output
- activation – FunctionModel, activation function to use
- weightDecay – float or False, weight decay regularization and its coefficient
- sparsity – float or False, sparsity constraint. Makes sense only with the Sigmoid activation function
- beta – float, sparse weight coefficient
- dropout – float or False, dropout regularisation with defined coefficient
- dropConnect – TBD
- pool_size – int, should be specified only for the MaxOut activation function. Number of lines used to emulate each neuron.
Returns: layer object.
-
Printer()¶ Prints layer properties.
-
compileActivation(net, layerNum)¶ Compile layer’s activation taking into account dropout and specified activation function. Used during network’s training to calculate activations.
Parameters: - net – TheanoNNclass object
- layerNum – int, layer’s index
Returns:
-
compileDropout(net, R)¶ Compile necessary mask matrix for dropout regularisation.
Parameters: - net – TheanoNNclass object
- R – Theano’s RandomGenerator object
Returns:
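A dropout mask of the kind described above can be sketched in NumPy. The sketch assumes that the coefficient is the probability of keeping a unit and that predictions are scaled by the same value; the library may define the coefficient differently. The name `dropout_mask` is hypothetical, and the library itself uses Theano's random generator rather than NumPy's.

```python
import numpy as np

def dropout_mask(shape, keep_prob, rng):
    """Binary mask: 1 keeps a unit, 0 drops it (keep_prob is an assumption)."""
    return rng.binomial(1, keep_prob, size=shape).astype(np.float32)

rng = np.random.default_rng(0)
activations = np.ones((100, 100), dtype=np.float32)

mask = dropout_mask(activations.shape, 0.8, rng)
train_out = activations * mask     # training: randomly silence units
predict_out = activations * 0.8    # prediction: scale by the keep probability
```

Scaling at prediction time keeps the expected activation the same as during training.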
-
compilePredictActivation(net, layerNum)¶ Compile layer’s activation taking into account dropout and specified activation function. Used to calculate predictions without training.
Parameters: - net – TheanoNNclass object
- layerNum – int, layer’s index
Returns:
-
compileSparsity(net, layerNum, num)¶ Compile necessary sparsity constraint calculations.
Average activation of hidden unit j (averaged over the training set):
\[\hat{\rho}_j = \frac{1}{m}\sum\limits_{i=1}^{m}\left[a_j(x^{(i)})\right]\]
Then the penalty (using Kullback-Leibler divergence):
\[penalty = \sum\limits_{j=1}^{hiddenUnits}\left[\rho\log\frac{\rho}{\hat\rho_{j}} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat\rho_{j}}\right]\]
where \(\rho\) is the sparsity parameter, that is, the level of average activation we want to achieve.
Parameters: - net – TheanoNNclass object
- layerNum – int, layer’s index
- num – batch size
Returns:
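The two formulas above translate into a short NumPy sketch. It assumes activations are stored as (hidden units, samples) and adds a small clip to keep the logarithms finite; both choices, and the name `kl_sparsity_penalty`, are assumptions rather than the library's implementation.

```python
import numpy as np

def kl_sparsity_penalty(activations, rho):
    """KL-divergence sparsity penalty summed over hidden units (sketch)."""
    a = np.asarray(activations, dtype=float)
    # Average activation of each hidden unit over the m samples.
    rho_hat = a.mean(axis=1)
    # Keep rho_hat strictly inside (0, 1) so the logs stay finite.
    rho_hat = np.clip(rho_hat, 1e-10, 1.0 - 1e-10)
    terms = (rho * np.log(rho / rho_hat)
             + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return float(terms.sum())

on_target = kl_sparsity_penalty([[0.05, 0.05], [0.05, 0.05]], rho=0.05)
too_active = kl_sparsity_penalty([[0.05, 0.05], [0.5, 0.9]], rho=0.05)
```

The penalty is zero when every unit's average activation equals the sparsity parameter and grows as units become more active than desired.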
-
compileWeight(net, layerNum)¶ Allocates weights to be used as shared variable in Theano
Parameters: - net – TheanoNNclass object
- layerNum – layer’s index.
Returns:
-
compileWeightDecayPenalty(net, layerNum)¶ Adds weight decay penalty to network’s error. Useful to decrease absolute weight’s values.
\[\begin{split}penalty = \frac{1}{2}\sum W_{target\>layer}^2\end{split}\]
Parameters: - net – TheanoNNclass object
- layerNum – int, layer’s index
Returns:
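The penalty formula above can be written in one line of NumPy. Multiplying the sum by the weightDecay coefficient is an assumption about how the coefficient enters the error, and the function name is hypothetical.

```python
import numpy as np

def weight_decay_penalty(W, coefficient=1.0):
    """Half the sum of squared weights, scaled by an assumed coefficient."""
    W = np.asarray(W, dtype=float)
    return coefficient * 0.5 * float(np.sum(W ** 2))

W = np.array([[1.0, -2.0], [3.0, 0.0]])
plain = weight_decay_penalty(W)          # 0.5 * (1 + 4 + 9 + 0)
scaled = weight_decay_penalty(W, 0.1)
```

Because the gradient of this penalty with respect to each weight is proportional to the weight itself, adding it to the error pulls large weights toward zero.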
-
class
fTheanoNNclassCORE.LayerRNN(blocks=1, peeholes=False, **kwargs)¶ Bases: fTheanoNNclassCORE.LayerNN
Layer class that extends the standard LayerNN class and implements an RNN (recurrent) type of network. Specifically, it implements LSTM (Long Short-Term Memory).
You can find more info about it on:
- Wiki
- Original paper
- More about traditional LSTM vs peepholed
Parameters: - blocks – int, number of blocks to create. Should be equal to size_out
- peeholes – boolean, whether to use peepholes or not (send Acc to input gate).
- kwargs – needed for compatibility.
Returns: LayerRNN object
-
compileActivation(net, layerNum)¶ Compile layer’s activation taking into account dropout. It is meaningful to use the Sigmoid activation function (or possibly hyperbolic tangent).
Activation calculated as follows:
- \(Input\>activation\)
- \(Input\>gate\)
- \(Forget\>gate\)
- \(Output\>gate\)
Note
All of the above are calculated in one step
- \(Pi = {Input\>activation} \times {Input\>gate}\)
- \(Pr = {Forget\>gate} \times {Cell\>state}\)
- \({Cell\>state} = Pi + Pr\)
- \(output = {Output\>gate} \times {Cell\>state}\)
Parameters: - net – TheanoNNclass object
- layerNum – layer’s index.
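The activation steps listed above can be sketched as a single NumPy time step. The stacked weight layout, the feedback of the previous output, the bias term and the name `lstm_step` are all assumptions made for illustration; peepholes and dropout are left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step following the listed order of operations (sketch)."""
    blocks = c_prev.shape[0]
    # Input, previous output and a constant 1 for the bias (assumed layout).
    stacked = np.concatenate([x, h_prev, np.ones(1)])
    # Input activation and all three gates are computed in one matrix product.
    z = sigmoid(W @ stacked)
    input_act, input_gate, forget_gate, output_gate = z.reshape(4, blocks)
    pi = input_act * input_gate      # Pi = Input activation x Input gate
    pr = forget_gate * c_prev        # Pr = Forget gate x Cell state
    c_new = pi + pr                  # Cell state = Pi + Pr
    h_new = output_gate * c_new      # output = Output gate x Cell state
    return h_new, c_new

n_in, blocks = 3, 2
W = np.zeros((4 * blocks, n_in + blocks + 1))   # all gates evaluate to 0.5
h, c = lstm_step(np.ones(n_in), np.zeros(blocks), np.ones(blocks), W)
```

With zero weights every gate equals 0.5, so the new cell state is 0.5 · 0.5 + 0.5 · 1 = 0.75 and the output is 0.5 · 0.75 = 0.375.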
-
compilePredictActivation(net, layerNum)¶ Compile layer’s activation taking into account dropout and specified activation function. Used to calculate predictions without training. Uses separate Accumulator to store cell’s state independently from training.
Parameters: - net – TheanoNNclass object
- layerNum – layer’s index
-
class
fTheanoNNclassCORE.NNsupport¶ Bases: object
-
static
crossV(number, y, x, modelObj)¶
-
static
errorG(errorArray, folder, plotsize=50)¶
-
static
-
class
fTheanoNNclassCORE.OptionsStore(learnStep=0.01, rmsProp=False, mmsmin=1e-10, rProp=False, minibatch_size=1, CV_size=1)¶ Bases: object
Container for the network’s global options.
Parameters: - learnStep – float, learning step to use in gradient descent or RMSprop
- rmsProp – False or float, whether to use RMSprop or not. If used, the decay rate of the RootMeanSquare. Usually 0.9
- mmsmin – float, lower bound used to clip the RootMeanSquare to avoid NaN. Default: 1e-10. Reasonable: down to 1e-20
- rProp – False or float, use only for full batch. If used, the rate by which the next weight change is increased.
- minibatch_size – int, size of the batch you use. Can’t be changed after compiling.
- CV_size – int, size of the cross validation set. Can’t be changed after compiling.
Returns: OptionsStore object.
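To show how learnStep, rmsProp and mmsmin might interact, here is a generic RMSprop update in NumPy. It is the textbook rule, not code taken from the library; in particular, clipping the running mean square from below by mmsmin is an assumption about where the clip is applied.

```python
import numpy as np

def rmsprop_step(w, grad, mean_square, learn_step=0.01, rate=0.9, mmsmin=1e-10):
    """One RMSprop update with a lower bound on the running mean square."""
    mean_square = rate * mean_square + (1.0 - rate) * grad ** 2
    # Without the lower bound, a zero gradient would give 0 / 0 = NaN below.
    mean_square = np.maximum(mean_square, mmsmin)
    w = w - learn_step * grad / np.sqrt(mean_square)
    return w, mean_square

w = np.array([1.0, -1.0])
grad = np.array([2.0, 0.0])
w_new, ms = rmsprop_step(w, grad, np.zeros(2))
```

The second weight has a zero gradient, so its mean square is clipped to mmsmin and the weight stays unchanged instead of turning into NaN.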
-
Printer()¶ Print out to stdout current options. Useful for debug.
Returns: nothing
-
class
fTheanoNNclassCORE.TheanoNNclass(opt, architecture)¶ Bases: object
The most important class, where everything comes together.
Compiles the network object using the settings defined in OptionsStore and the layers.
Parameters: - opt – OptionsStore, general network options.
- architecture – list, list of layers to build the network from.
Returns: TheanoNNclass object
-
getStatus()¶
-
modelLoader(folder)¶
-
modelSaver(folder)¶
-
paramGetter()¶
-
paramSetter(loaded)¶
-
predictCalc(X, debug=False)¶
-
predictCompile(layerNum=-1)¶
-
roll(a)¶
-
trainCalc(X, Y, iteration=10, debug=False, errorCollect=False)¶ Standard method to train network using labeled data.
Parameters: - X – array, data to train network on.
- Y – array, data’s labels.
- iteration – number of training cycles to run on the current X
- debug – boolean, whether to print some useful info.
- errorCollect – boolean, whether to collect network’s error in self.errorArray field
Returns:
-
trainCalcExternal(model, X, Y)¶ Call this method in case you want to use external optimizer.
Parameters: - model – vector, new weights for network.
- X – array, data to train on.
- Y – array, labels for data.
Returns: (float, array), network’s error and weight’s gradients
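The documented contract, weights in and (error, gradient) out, is what most external optimizers expect. The sketch below drives a plain gradient-descent loop against a stand-in objective with that same contract; `stand_in_train_calc_external` is a hypothetical least-squares function, not the library method, and would be replaced by the network's own method in real use.

```python
import numpy as np

def stand_in_train_calc_external(model, X, Y):
    """Hypothetical stand-in returning (error, gradient) for least squares."""
    residual = X @ model - Y
    error = 0.5 * float(np.mean(residual ** 2))
    gradient = X.T @ residual / len(Y)
    return error, gradient

def gradient_descent(objective, model, X, Y, step=0.5, iterations=200):
    """Minimal external optimizer: repeatedly step against the gradient."""
    for _ in range(iterations):
        _, gradient = objective(model, X, Y)
        model = model - step * gradient
    return model

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([2.0, -1.0, 1.0])
start = np.zeros(2)
fitted = gradient_descent(stand_in_train_calc_external, start, X, Y)
start_error, _ = stand_in_train_calc_external(start, X, Y)
final_error, _ = stand_in_train_calc_external(fitted, X, Y)
```

Any optimizer that accepts a function returning both the value and the gradient could take the place of `gradient_descent` here.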
-
trainCompile()¶ Using OptionsStore and the layers, creates the shared variables and Theano’s function to train the network. Usually, it should be called only once for each network.
Returns: links self.train with the appropriate Theano function
-
trainCompileExternal()¶ It is possible to use external optimisation.
If you decide to use an external optimizer, this method prepares the necessary functions, so that afterwards you can use the returned gradient and load the updated weights.
Returns:
-
unroll()¶
-
weightsVisualizer(folder, size=(100, 100), color='L', second=False, name='weights')¶