# Deep Learning Practicals

Foreword: This page is written in reStructuredText format and generated with the rst2html.py script from the docutils package. The stylesheet main.css is adapted from the one used in the book From Python to Numpy.

Latest revision - November 2017

# 1.   First steps in Keras : classifying handwritten digits (MNIST)


## 1.1.   Objectives

In this practical, we will take our first steps with Keras and train our first models for classifying the handwritten digits of the MNIST dataset. It should be noted that this dataset is no longer considered a challenging problem, but for an introduction to Keras and our first trained architectures, it does the job.

The models you will train are :
• a linear classifier
• a fully connected neural network with two hidden layers
• a vanilla convolutional neural network (i.e. a LeNet like convnet)
• some fancier architectures (e.g. ConvNets without fully connected layers)

The point here is to introduce the various syntactic elements of Keras to:

• load the datasets,
• define the architecture, loss, optimizer,
• save/load a model and evaluate its performances
• monitor the training progress by interfacing with tensorboard

## 1.2.   A Linear classifier

Before training deep neural networks, it is good to get an idea of the performance of a simple linear classifier. So we will define and train a linear classifier and see together how this is written in Python/Keras.

Important: Below, we go step by step through setting up our training script. While reading the following lines, edit and fill in a file named train_mnist_linear.py. We also introduce the modules to be imported only when they are required but, obviously, it is clearer to put these imports at the beginning of your script. So the following Python code snippets should not be strictly copy-pasted on the fly.

### 1.2.1.   Loading and basic preprocessing of the dataset

The first step is to load the dataset as numpy arrays. Keras already provides functions to import some datasets. The MNIST dataset is made of gray scale images of size $$28 \times 28$$, with values in the range $$[0; 255]$$. The training set consists of $$60000$$ images and the test set of $$10000$$ images. Every image represents a handwritten digit in $$[0, 9]$$. Below, we show the first 10 samples of the training set.

To import the MNIST dataset in Keras, you can do the following:

from keras.datasets import mnist
from keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = mnist.load_data()


X_train and y_train are numpy arrays of respective shapes $$(60000, 28, 28)$$ and $$(60000,)$$. X_test and y_test are numpy arrays of respective shapes $$(10000, 28, 28)$$ and $$(10000,)$$. X_train and X_test contain the images, while y_train and y_test contain the labels. A linear classifier (and in fact Dense layers in general, as we shall see in a moment; this differs from the convolutional layers we will see later on) expects vectors as input, not images, i.e. 1-dimensional and not 2-dimensional objects. So we need to reshape the loaded numpy arrays:

num_train  = X_train.shape[0]
num_test   = X_test.shape[0]

img_height = X_train.shape[1]
img_width  = X_train.shape[2]
X_train = X_train.reshape((num_train, img_width * img_height))
X_test  = X_test.reshape((num_test, img_width * img_height))


Now, the shape of X_train and X_test are $$(60000, 784)$$ and $$(10000, 784)$$ respectively.

The networks we are going to train output, for each input image, a probability distribution over the labels. We therefore need to convert the labels into their one-hot encoding. Keras provides the to_categorical function to do that.

y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)


Now, the shapes of y_train and y_test are $$(60000, 10)$$ and $$(10000, 10)$$ respectively, and the first 10 entries of y_train are:

[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
[ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]


For now, this is all we do to load the dataset. Preprocessing the input actually influences the performance of training, but we will come back to this later, in the section Normalizing the input.

### 1.2.2.   Building the network

We consider a linear classifier, i.e. we perform logistic regression. As a reminder, in logistic regression, given a (flattened) input image $$x_i \in \mathbb{R}^{784}$$, we compute a score for each class as $$w_k^T x_i$$ (in this notation, the input is supposed to be extended with a constant dimension equal to 1 to account for the bias), which we pass through the softmax transfer function to get probabilities over the classes :

\begin{equation*} P(y=k / x_i) = \frac{e^{w_k^T x_i}}{\sum_{j=0}^{9} e^{w_j^T x_i}} \end{equation*}
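To make this formula concrete outside of Keras, here is a minimal numpy sketch of the softmax (the score values are made up for illustration):

```python
import numpy as np

def softmax(scores):
    # Subtracting the max improves numerical stability
    # without changing the result
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Made-up scores w_k^T x_i for the 10 classes of one input image
scores = np.array([1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, -0.5, 0.2, 0.8])
probs = softmax(scores)
# probs lie in [0, 1] and sum to 1; the predicted class is the argmax
print(probs.argmax())  # 5
```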

To define this model with Keras, we need an Input layer, a Dense layer and an Activation layer

Note

In Keras, a model can be specified with the Sequential or Functional API. We here make use of the Functional API, which is more flexible than the Sequential API. Also, while one could incorporate the activation within the dense layers, we use linear dense layers followed by separate activation layers so that we can visualize what is going on after each operation.

In Python, this can be written as :

from keras.layers import Input, Dense, Activation
from keras.models import Model

num_classes = 10
xi      = Input(shape=(img_height*img_width,))
xo      = Dense(num_classes)(xi)
yo      = Activation('softmax')(xo)
model   = Model(inputs=[xi], outputs=[yo])

model.summary()


In the input layer, we just specify the dimension of a single sample, here 784 pixels per image. The dense layer has num_classes=10 units. The output of the dense layer feeds the softmax transfer function. By default, in Keras, a dense layer is linear and has a bias, so we do not need to extend the input with a constant dimension. The outputs yo are therefore all in the range $$[0, 1]$$ and sum up to 1. Finally, we define our model by specifying its input and output layers. The call to model.summary() will display a summary of the architecture in the terminal.
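As a side check (plain arithmetic, not a Keras call), the parameter count that model.summary() reports for this linear model can be predicted by hand: one weight per (pixel, class) pair plus one bias per class.

```python
# 784 pixels feeding 10 classes, plus one bias per class
n_inputs, n_classes = 28 * 28, 10
n_params = n_inputs * n_classes + n_classes
print(n_params)  # 7850
```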

### 1.2.3.   Compiling and training

The next step is, in the terminology of Keras, to compile the model by providing the loss function to be minimized, the optimizer and the metrics to monitor. For this classification problem, an appropriate loss is the crossentropy loss. In Keras, among all the Losses, we will use the categorical_crossentropy loss. For the optimizer, several Optimizers are available and we will use adam. For the metrics, you can use some predefined metrics or define your own. Here, we will use the accuracy which makes sense because the dataset is balanced.

In Python, this can be written as

model.compile(loss='categorical_crossentropy', optimizer='adam',  metrics=['accuracy'])


Note

Metrics, losses and optimizers can be specified in two ways in the call of the compile function. It can be either specified by a string or by passing a function for the loss, a list of functions for the metrics and an Optimizer object for the optimizer.
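For instance, here is a sketch of the object-based form of the same compile call (the model is rebuilt so the snippet is self-contained; lr=0.001 is Adam's default and is only shown to illustrate where you would tune it):

```python
from keras.layers import Input, Dense, Activation
from keras.models import Model
from keras.optimizers import Adam

# Same linear model as above, rebuilt so this snippet is self-contained
xi = Input(shape=(784,))
yo = Activation('softmax')(Dense(10)(xi))
model = Model(inputs=[xi], outputs=[yo])

# Passing an Optimizer object instead of the string 'adam' lets us
# set its hyperparameters explicitly
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])
```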

We are now ready for training our linear classifier, by calling the fit function.

model.fit(X_train, y_train,
          batch_size=128,
          epochs=20,
          verbose=1,
          validation_split=0.1)

score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


We here used $$10\%$$ of the training set for validation purposes. It is based on the validation loss that we should select our best model (why should we do it on the validation loss rather than the validation accuracy?). At the end, we evaluate the performance on the test set.

You are now ready to execute your first training. You first need to log on a GPU node, as described in the section Using the GPU cluster of CentraleSupelec.

# First you log to the cluster
...
# And then :
python3 train_mnist_linear.py


This is your first trained classifier with Keras. Depending on how lucky you are, you may reach a test accuracy of somewhere between $$50\%$$ and $$85\%$$. If you repeat the experiment, you will end up with varying accuracies; the training does not appear to be very consistent.

### 1.2.4.   Callbacks for saving the best model and monitoring the training

So far, we only get some information in the terminal, but Keras allows you to define Callbacks. We will define two callbacks :

• a TensorBoard callback, which allows you to monitor the training progress with tensorboard
• a ModelCheckpoint callback which saves the best model with respect to a provided metric

For both of these callbacks, we need to specify a path to which data will be logged. I propose the following utility function, which generates a unique path:

import os
...
def generate_unique_logpath(logdir, raw_run_name):
    i = 0
    while True:
        run_name = raw_run_name + "-" + str(i)
        log_path = os.path.join(logdir, run_name)
        if not os.path.isdir(log_path):
            return log_path
        i = i + 1
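To see what this function does, here is a self-contained check (the function is repeated for completeness and exercised in a temporary directory; the run name "linear" is just an example):

```python
import os
import tempfile

def generate_unique_logpath(logdir, raw_run_name):
    i = 0
    while True:
        run_name = raw_run_name + "-" + str(i)
        log_path = os.path.join(logdir, run_name)
        if not os.path.isdir(log_path):
            return log_path
        i = i + 1

with tempfile.TemporaryDirectory() as logdir:
    # The first call returns "<logdir>/linear-0" since nothing exists yet
    first = generate_unique_logpath(logdir, "linear")
    os.makedirs(first)
    # Now that linear-0 exists, the next call skips to linear-1
    second = generate_unique_logpath(logdir, "linear")
    print(os.path.basename(first), os.path.basename(second))  # linear-0 linear-1
```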


To define a TensorBoard callback, you need to add its import, instantiate it by specifying the directory in which the callback will log the progress, and then modify the call to fit to pass the callback:

from keras.callbacks import TensorBoard
...
run_name = "linear"
logpath = generate_unique_logpath("./logs_linear", run_name)
tbcb = TensorBoard(log_dir=logpath)
...
model.fit(X_train, y_train,
          batch_size=128,
          epochs=20,
          verbose=1,
          validation_split=0.1,
          callbacks=[tbcb])


Once this is done, you have to start tensorboard on the GPU node and run port_forward.sh to get local access to the remote tensorboard.

[In one terminal]
mymachine:~:mylogin$ ./log.sh mylogin {0, 1}
sh11:~:mylogin$ tensorboard --logdir ./logs_linear
Starting TensorBoard b'47' at http://0.0.0.0:6006
(Press CTRL+C to quit)

[In a second terminal]
mymachine:~:mylogin$ ./port_forward.sh mylogin {0, 1}


Then start a browser and go to http://localhost:6006 . Once this is done, you will be able to monitor your metrics in the browser while the trainings are running.

The second callback we define is a ModelCheckpoint callback, which saves the best model with respect to a provided metric, e.g. the validation loss.

from keras.callbacks import ModelCheckpoint
...
run_name = "linear"
logpath = generate_unique_logpath("./logs_linear", run_name)
checkpoint_filepath = os.path.join(logpath, "best_model.h5")
checkpoint_cb = ModelCheckpoint(checkpoint_filepath, save_best_only=True)

model.fit(X_train, y_train,
          batch_size=128,
          epochs=20,
          verbose=1,
          validation_split=0.1,
          callbacks=[tbcb, checkpoint_cb])


You can now run several experiments, monitor them and keep a copy of the best models. A handy bash command to run several experiments is given below :

for iter in $(seq 1 10); do echo ">>>> Run $iter" && python3 train_mnist_linear.py ; done;


[Figure: Logistic regression without normalization of the input. Two metrics are displayed for several runs: on the left the training accuracy, on the right the validation accuracy.]

You should reach a validation and test accuracy between $$55\%$$ and $$85\%$$. The curves above seem to suggest that there might be several local minima... but do not be misled: the optimization problem is convex, so these results are just indications that we are not solving our optimization problem the right way.

### 1.2.5.   Loading a model

In the previous paragraph, we regularly saved the best model (with the ModelCheckpoint callback) with respect to the validation loss. You may want to load this best model, for example to estimate its test set performance at the end of your training script. At the time of writing this practical, there is an issue in the "h5" file saved by Keras and some of its elements must be removed before the model can be reloaded :

import h5py
from keras.models import load_model
...
with h5py.File(checkpoint_filepath, 'a') as f:
    if 'optimizer_weights' in f.keys():
        del f['optimizer_weights']
model = load_model(checkpoint_filepath)

score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


### 1.2.6.   Normalizing the input

So far, we used the raw data, i.e. images with pixels in the range $$[0, 255]$$. It is usually a good idea to normalize the input because it makes training faster (the loss becomes more circularly symmetric) and allows using a consistent learning rate for all the parameters of the architecture. There are various ways to normalize the data and various ways to translate this into Keras. The point of normalization is to equalize the relative importance of the dimensions of the input.

One normalization is min-max scaling, which just scales the input by a constant factor: e.g., given an image $$I$$, you feed the network with $$I/255. - 0.5$$. Another normalization is standardization. Here, you compute the mean and variance of the training vectors and normalize every input vector (even those of the test set) with them. Given a set of training images $$X_i \in \mathbb{R}^{784}, i \in [0, N-1]$$, and a vector $$X \in \mathbb{R}^{784}$$, you feed the network with $$\hat{X} \in \mathbb{R}^{784}$$ given by

\begin{equation*} X_\mu = \frac{1}{N} \sum_{i=0}^{N-1} X_i \end{equation*}

\begin{equation*} X_\sigma = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} (X_i - X_\mu)^T (X_i - X_\mu)} + 10^{-5} \end{equation*}

\begin{equation*} \hat{X} = (X - X_\mu)/X_\sigma \end{equation*}

How do we introduce normalization in a Keras model ? One way is to create a normalized copy of the dataset and use it for training and testing. Another possibility is to embed normalization in the network by introducing a Lambda layer right after the Input layer. For example, introducing standardization in our linear model could be done the following way :

from keras.layers import Lambda
...
xi = Input(shape=input_shape, name="input")
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-5
xl = Lambda(lambda image, mu, std: (image - mu) / std,
            arguments={'mu': mean, 'std': std})(xi)
xo = Dense(num_classes, name="y")(xl)
yo = Activation('softmax', name="y_act")(xo)
model = Model(inputs=[xi], outputs=[yo])
return model


[Figure: Logistic regression with input standardization. Two metrics are displayed for several runs: on the left the training accuracy, on the right the validation accuracy.]

You should reach a validation accuracy of around $$93.6\%$$ and a test accuracy of around $$92.4\%$$.

## 1.3.   A fully connected 2 hidden layer classifier

### 1.3.1.   Basics

Let us change the network to build a 2 hidden layer perceptron. This is simply a matter of adding dense layers, with appropriate activations, between the input and the output layer. Basically, the only thing you need to change compared to the linear model is how you build up the model. A simple 2 layer MLP would be defined as :

xi = Input(shape=input_shape)
# the normalization Lambda layer producing xl is skipped for brevity
x = Dense(nhidden1)(xl)
x = Activation('relu')(x)
x = Dense(nhidden2)(x)
x = Activation('relu')(x)
x = Dense(num_classes)(x)
y = Activation('softmax')(x)


where nhidden1 and nhidden2 are the respective sizes of the first and second hidden layers. We here used a ReLU activation function, but you are free to experiment with other activation functions, e.g. PReLU or ELU.

As for minor adaptations, you may want to save the logs in a different root directory than for the linear experiment, or keep them in the same directory for comparison. You may also want to change the run_name variable so that it contains the sizes of the hidden layers, to easily distinguish the architectures in tensorboard.
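Putting these pieces together, here is a minimal self-contained sketch of such an MLP (taking, as an assumption, hidden sizes of 256 and 256; the normalization layer is omitted here):

```python
from keras.layers import Input, Dense, Activation
from keras.models import Model

num_classes = 10
nhidden1, nhidden2 = 256, 256  # assumed hidden layer sizes

xi = Input(shape=(28 * 28,))
x = Dense(nhidden1)(xi)
x = Activation('relu')(x)
x = Dense(nhidden2)(x)
x = Activation('relu')(x)
x = Dense(num_classes)(x)
yo = Activation('softmax')(x)
model = Model(inputs=[xi], outputs=[yo])
# 784*256+256 + 256*256+256 + 256*10+10 = 269322 trainable parameters
model.summary()
```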
For example, training an Input-256(Relu)-256(Relu)-10(Softmax) network (about 270,000 trainable parameters), the inputs being standardized, the training accuracy gets around $$99.6\%$$, the validation accuracy around $$97.6\%$$ and, evaluating it on the test set, we get around $$97.37\%$$. This model is slightly overfitting. We can try to improve the generalization performance by introducing some regularization, which is addressed in the next paragraph.

### 1.3.2.   Regularization

There are various Regularizers provided by Keras. Some typical regularizers are L1/L2 penalties and also Dropout layers.

L2 regularization (or weight decay) is usually applied to the kernel only, not to the bias. It adds a term of the form $$\lambda \sum_i w_i^2$$ to the loss function being minimized. The parameter $$\lambda$$ has to be determined experimentally (by monitoring the performance on the validation set) and is usually quite small, e.g. values around $$10^{-5}$$ or so. If you wish to experiment with an L2 penalty, you can specify it directly when you create the Dense layer :

from keras import regularizers
...
# Creating a dense layer with L2 penalty on the weights (not biases)
# l2_reg is a floating point value to be determined
x = Dense(hidden1, kernel_regularizer=regularizers.l2(l2_reg))(x)


Another, more recently introduced, regularization technique is called Dropout [Srivastava2014]. It consists in setting to 0 the activations of a certain fraction of the units in a layer. In their original paper, the authors of dropout suggest for example that dropping out 20% of the input units and 50% of the hidden units was often found to be optimal. A dropout mask is generated for every training sample. At test time, an ensemble of dropped-out networks is combined to compute the output (see also http://cs231n.github.io/neural-networks-2/#reg). In Keras, we simply need to introduce Dropout layers, specifying the rate at which to drop units.
Below, we create a Dense(relu) layer followed by dropout where $$50\%$$ of the outputs are set to 0. Learning a neural network with dropout is usually slower than without dropout, so you may need to consider increasing the number of epochs.

from keras.layers import Dropout
...
x = Dense(hidden1)(x)
x = Activation('relu')(x)
if(args.dropout):
    x = Dropout(0.5)(x)


[Srivastava2014] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15, (2014)

## 1.4.   A vanilla convolutional neural network

The MultiLayer Perceptron does not take any benefit from the intrinsic structure of the input space, here images. In this paragraph, we propose to explore the performance of Convolutional Neural Networks (CNN), which exploit that structure. Our first experiment with CNNs considers a vanilla CNN, i.e. a stack of conv-relu-maxpooling layers followed by some dense layers.

In order to write our script for training a CNN, compared to the script for training a linear or MLP model, we need to change the input_shape and also introduce new layers : Convolutional layers, Pooling layers and a Flatten layer.

The tensors for convolutional layers are 4D; they include the batch size, image width, image height and number of channels. The ordering is framework dependent : it depends on the parameter image_data_format.
It can be :

• channels_last (tensorflow convention), in which case the tensors are (batch_size, image height, image width, number of channels)
• channels_first (theano convention), in which case the tensors are (batch_size, number of channels, image height, image width)

To check which ordering is currently used, you can run :

python3 -c 'import keras; print(keras.backend.image_data_format())'


I will assume that the output is "channels_last"; otherwise you should change your "~/.keras/keras.json" file (see https://keras.io/backend/#kerasjson-details) or adapt the code below. Therefore, in your python script, the input_shape must be set to :

input_shape = (img_rows, img_cols, 1)


The next step is to define our model. We here consider stacking Conv-Relu-MaxPool layers. One block with 64 5x5 stride 1 filters and 2x2 stride 2 max pooling would be defined with the following syntax :

from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
...
x = Conv2D(filters=64, kernel_size=5, strides=1, padding='same')(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=2, strides=2)(x)


The padding='same' parameter means that the convolutional layer does not decrease the size of the representation; we leave that job to the max-pooling operation. Indeed, the max pooling layer has a stride of 2, which effectively downscales the representation by a factor of 2. How do you set up the architecture ? the size of the filters ? the number of blocks ? the stride ? the padding ? Well, this is all the magic. We here begin to see a small part of the large number of degrees of freedom we can play with to define a convolutional neural network.

The last thing we need to speak about is the Flatten layer. Usually (but not always), there are some final fully connected (dense) layers at the end of the architecture. When you go from the Conv/MaxPooling layers to the final fully connected layers, you need to flatten your feature maps.
This means converting the 4D tensors to 2D tensors. For example, the code below illustrates the connection between some convolutional/max-pooling layers and the output layer of a 10-class classification :

x = Conv2D(filters=64, kernel_size=5, strides=1, padding='same')(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=2, strides=2)(x)
x = Flatten()(x)
y = Dense(10, activation='softmax')(x)


We have now introduced all the required blocks. As a first try, I propose you to code a network with :

• The input layer
• The standardization Lambda layer
• 3 consecutive blocks with Conv(5x5, strides=1, padding=same)-Relu-MaxPooling(2x2, strides=2). Take 16 filters for the first Conv layer, 32 filters for the second and 64 for the third
• Two dense layers of size 128 and 64, with ReLU activations
• One dense layer with 10 units and a softmax activation

Training this architecture, you should end up with almost $$100\%$$ training accuracy, $$99.2\%$$ validation accuracy and around $$98.9\%$$ test accuracy. Introducing Dropout (after the standardizing layer and before the dense layers), the test accuracy should rise to around $$99.2\%$$. This means that 80 images out of the 10000 in the test set are misclassified.

## 1.5.   Small kernels, no fully connected layers, Dataset Augmentation, Model averaging

### 1.5.1.   Architecture

It has recently been suggested to use small convolutional kernels (typically 3x3, and sometimes 5x5). The rationale is that two stacked 3x3 convolutional layers give you a 5x5 receptive field with fewer parameters (and more non-linearity). This is for example the guideline used in VGG : use mostly 3x3 kernels, stack two of them, follow with a maxpool and then double the number of filters. The number of filters is usually increased as we go deeper in the network (because we expect the low level layers to extract basic features that are combined in the deeper layers).
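As an illustration of this stacking pattern (a sketch with illustrative filter counts and input shape, not the exact code of the practical), one such VGG-style stage can be written as :

```python
from keras.layers import Input, Activation
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
from keras.models import Model

def vgg_block(x, num_filters):
    # Two stacked 3x3 convolutions: a 5x5 receptive field with fewer
    # parameters and one extra non-linearity
    for _ in range(2):
        x = Conv2D(filters=num_filters, kernel_size=3,
                   strides=1, padding='same')(x)
        x = Activation('relu')(x)
    # The max pooling halves the spatial resolution
    return MaxPooling2D(pool_size=2, strides=2)(x)

xi = Input(shape=(28, 28, 1))
x = vgg_block(xi, 16)   # 28x28 -> 14x14
x = vgg_block(x, 32)    # 14x14 -> 7x7, with the filter count doubled
model = Model(inputs=[xi], outputs=[x])
```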
Finally, in [Lin2014], it is also suggested that we can completely remove the final fully connected layers and replace them by GlobalAveragePooling layers. It appears that when removing the fully connected layers, the network is less likely to overfit and you end up with far fewer parameters for a network of a given depth (it is the fully connected layers that usually contain most of your parameters). Therefore, I suggest you give a try to the following architecture :

• InputLayer
• Standardizing lambda layer
• 16C3s1-BN-Relu-16C3s1-BN-Relu - MaxPool2s2
• 32C3s1-BN-Relu-32C3s1-BN-Relu - MaxPool2s2
• 64C3s1-BN-Relu-64C3s1-BN-Relu - GlobalAverage
• Dense(10), Softmax

where 16C3s1 denotes a convolutional layer with 16 kernels of size 3x3 and stride 1, zero-padded so that the input and output have the same size. BN is a BatchNormalization layer. MaxPool2s2 is a max-pooling layer with receptive field size 2x2 and stride 2. GlobalAverage is an averaging layer computing an average over a whole feature map. This should bring you a test accuracy around $$99.2\%$$ with 72,890 trainable parameters.

### 1.5.2.   Dataset Augmentation and model averaging

One process which can bring you improvements is Dataset Augmentation. The basic idea is to apply transformations to your input images that keep your label invariant. For example, slightly rotating, zooming or shearing an image of, say, a 5 still gives an image of a 5, as shown below :

Now, the idea is to produce a stream (actually infinite, if you allow continuous perturbations) of training samples generated from your finite set of training samples. With Keras, you can augment your image datasets using an ImageDataGenerator. Here is a snippet to create the generator that produced the images above :

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(shear_range=0.3,
                             zoom_range=0.1,
                             rotation_range=10.)
As a note, for some of the transformations ImageDataGenerator performs, it is necessary to invoke its fit method (e.g. for featurewise_std_normalization). Now, in order to make use of the generator, we need to slightly adjust our split into training/validation sets, as well as the optimization of the model, which we did so far by calling the model.fit method.

def split(X, y, test_size):
    idx = np.arange(X.shape[0])
    np.random.shuffle(idx)
    # apply the shuffled indexing before splitting
    X, y = X[idx], y[idx]
    nb_test = int(test_size * X.shape[0])
    return X[nb_test:, :, :, :], y[nb_test:],\
           X[:nb_test, :, :, :], y[:nb_test]

X_train = X_train.reshape(num_train, img_rows, img_cols, 1)
X_train, y_train, X_val, y_val = split(X_train, y_train,
                                       test_size=0.1)

y_train = to_categorical(y_train, num_classes)
y_val = to_categorical(y_val, num_classes)

datagen = ImageDataGenerator(....)
train_flow = datagen.flow(X_train, y_train, batch_size=128)
model.fit_generator(train_flow,
                    steps_per_epoch=X_train.shape[0]/128,
                    validation_data=(X_val, y_val),
                    .....)


Fitting the same architecture as before but with dataset augmentation, you should reach an accuracy around $$99.5\%$$.

Now, as a final step in our beginner tutorial on Keras, you can train several models and average their probability predictions over the test set. An average of 4 models might eventually lead you to a loss of 0.0105 and an accuracy of $$99.68\%$$.

[Lin2014] M. Lin, Q. Chen, S. Yan, Network In Network, International Conference on Learning Representations (2014)

# 2.   A more ambitious image classification dataset : CIFAR-100

## 2.1.   Objectives

We now turn to the more difficult problem of classifying RGB images belonging to one of 100 classes with the CIFAR-100 dataset. The CIFAR-100 dataset consists of 60000 32x32 colour images in 100 classes, with 600 images per class. There are 50000 training images and 10000 test images. The 100 classes of CIFAR-100 are grouped into 20 superclasses.
Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). To give you an idea, the coarse labels include fruits, fish, aquatic mammals, vehicles, ... and the fine labels are for example seal, whale, orchids, bicycle, bus, ... Keras provides functions to automatically get the CIFAR-100 dataset.

Classical dataset augmentations for CIFAR-100 include :

• feature-wise standardization
• horizontal flip
• zero padding of 4 pixels on each side, with random crops of 32x32

For the last augmentation, you can make use of width_shift_range, height_shift_range, fill_mode="constant" and cval=0.

I now propose you a list of recent papers published on arxiv and suggest that you try reimplementing their architecture and training setup :

The following papers are trickier to implement :

If you wish to get an idea of the state of the art in 2015 on CIFAR-100, I invite you to visit the classification scores website.

# 3.   Using the GPU cluster of CentraleSupelec

Allocation of the GPU machines is handled by a resource manager called OAR. It can be annoying to remember the command lines to reserve a machine and log in to it. We therefore provide the scripts :

After getting these scripts, please make them executable :

mymachine:~:mylogin$ chmod u+x book.sh kill_reservation.sh log.sh port_forward.sh


These scripts help you make a reservation and log in to the reserved machine. They must all be in the same directory. The book.sh script handles only one reservation at a time, i.e. running it twice will simply kill the first reservation.

## 3.1.   From within the campus

Get the scripts and run book.sh as below. We also show a typical output from the execution of the script.

mymachine:~:mylogin$ ./book.sh mylogin 0
Booking a node
Reservation successfull
Booking requested : OAR_JOB_ID = 99785
Waiting for the reservation to be running, might last few seconds
The reservation is not yet running
The reservation is not yet running
[...]
The reservation is not yet running
The reservation is running
mymachine:~:mylogin$


If the reservation is successful, you can then log in to the booked GPU:

mymachine:~:mylogin$ ./log.sh mylogin 0
The file job_id exists. I am checking the reservation is still valid
The reservation is still running
Logging to the booked node
Connect to OAR job 99785 via the node sh11
sh11:~:mylogin$


You end up with a terminal logged on the GPU machine, where you can execute your code. Your reservation will run for 24 hours. If you need more time, you may have to tweak the bash script a little bit.

You can log any terminal you wish to the booked machine.

To get access to tensorboard, you need to log in to the GPU node, start tensorboard and activate port forwarding:

[ In a first terminal ]
mymachine:~:mylogin$ ./log.sh mylogin 0
...
sh11:~:mylogin$ tensorboard --logdir path_to_the_logs
[ In a second terminal ]
mymachine:~:mylogin$ ./port_forward.sh mylogin 0
...


You can now open a browser on your machine on port 6006 and you should get access to tensorboard.

Once your work is finished, just log out from the machine and run kill_reservation.sh:

sh11:~:mylogin$ logout
Connection to sh11 closed.
Disconnected from OAR job 99785
Connection to term2.grid closed.
Unlogged
mymachine:/home/mylogin:mylogin$ ./kill_reservation.sh mylogin 0
The file job_id exists. I will kill the previous reservation in case it is running
Deleting the job = 99785
...REGISTERED.
The job(s) [ 99785 ] will be deleted in a near future.
Waiting for the previous job to be killed
Done
mymachine:~:mylogin$


## 3.2.   From outside the campus

Get the scripts and run book.sh as below. We also show a typical output from the execution of the script. The only difference with the calls in the previous paragraph is the last parameter of the script: 1 instead of 0.

mymachine:~:mylogin$ ./book.sh mylogin 1
Booking a node
Reservation successfull
Booking requested : OAR_JOB_ID = 99785
Waiting for the reservation to be running, might last few seconds
The reservation is not yet running
The reservation is not yet running
[...]
The reservation is not yet running
The reservation is running
mymachine:~:mylogin$


If the reservation is successful, you can then log in to the booked GPU:

mymachine:~:mylogin$ ./log.sh mylogin 1
The file job_id exists. I am checking the reservation is still valid
The reservation is still running
Logging to the booked node
Connect to OAR job 99785 via the node sh11
sh11:~:mylogin$


You end up with a terminal logged on the GPU machine, where you can execute your code. Your reservation will run for 24 hours. If you need more time, you may have to tweak the bash script a little bit.

You can log any terminal you wish to the booked machine.

To get access to tensorboard, you need to log in to the GPU node, start tensorboard and activate port forwarding:

[ In a first terminal ]
mymachine:~:mylogin$ ./log.sh mylogin 1
...
sh11:~:mylogin$ tensorboard --logdir path_to_the_logs
[ In a second terminal ]
mymachine:~:mylogin$ ./port_forward.sh mylogin 1
...


You can now open a browser on your machine on port 6006 and you should get access to tensorboard.

Once your work is finished, just log out from the machine and run kill_reservation.sh:

sh11:~:mylogin$ logout
Connection to sh11 closed.
Disconnected from OAR job 99785
Connection to term2.grid closed.
Unlogged
mymachine:/home/mylogin:mylogin$ ./kill_reservation.sh mylogin 1
The file job_id exists. I will kill the previous reservation in case it is running
Deleting the job = 99785
...REGISTERED.
The job(s) [ 99785 ] will be deleted in a near future.
Waiting for the previous job to be killed
Done
mymachine:~:mylogin$


# 4.   Parametrizing a script with argparse


When you want to test a family of architectures, understood in a broad sense covering the model, its initialization, the loss, the optimization function, possibly preprocessing steps, and so on, it is certainly easier to write a single script that takes optional command line arguments to parameterize it.

The argparse python module is particularly well suited for defining a parameterized script. A basic usage of argparse is to build an ArgumentParser object and add arguments to it. Arguments can be optional or mandatory, and of type int, string, bool (a flag), etc.

The code below is an example of python script with some elements on how to use it.

import argparse

parser = argparse.ArgumentParser()

# Argument definition
parser.add_argument(
    '--normalize',
    choices=['None', 'minmax', 'std'],
    default='None',
    help='Which normalization to apply to the input data',
    action='store'
)
parser.add_argument(
    '--logdir',
    type=str,
    default="./logs",
    help='The directory in which to store the logs'
)
parser.add_argument(
    '--h1',
    type=int,
    required=True,
    help='The size of the hidden layer'
)
group_reg = parser.add_mutually_exclusive_group()
group_reg.add_argument(
    '--L2',
    type=float,
    help='Activate L2 regularization with the provided penalty'
)
group_reg.add_argument(
    '--dropout',
    type=float,
    help='Activate Dropout with the specified rate'
)

# Actual parsing
args = parser.parse_args()

print(args)


And now some usage examples:

mymachine:~:mylogin$ python3 argpase_ex.py -h
usage: argpase_ex.py [-h] [--normalize {None,minmax,std}] [--logdir LOGDIR]
                     --h1 H1 [--L2 L2 | --dropout DROPOUT]

optional arguments:
  -h, --help            show this help message and exit
  --normalize {None,minmax,std}
                        Which normalization to apply to the input data
  --logdir LOGDIR       The directory in which to store the logs
  --h1 H1               The size of the hidden layer
  --L2 L2               Activate L2 regularization with the provided penalty
  --dropout DROPOUT     Activate Dropout with the specified rate

mymachine:~:mylogin$ python3 argpase_ex.py
usage: argpase_ex.py [-h] [--normalize {None,minmax,std}] [--logdir LOGDIR]
--h1 H1 [--L2 L2 | --dropout DROPOUT]
argpase_ex.py: error: the following arguments are required: --h1

mymachine:~:mylogin$ python3 argpase_ex.py --h1 10
Namespace(L2=None, dropout=None, h1=10, logdir='./logs', normalize='None')

mymachine:~:mylogin$ python3 argpase_ex.py --h1 10 --dropout 0.5 --normalize std
Namespace(L2=None, dropout=0.5, h1=10, logdir='./logs', normalize='std')
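The parsed namespace is then used in the training script to parameterize the model. Here is a hedged, self-contained sketch of such a dispatch on the mutually exclusive regularization options (the regularization strings below are hypothetical placeholders, not part of the practical's code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--h1', type=int, required=True)
group_reg = parser.add_mutually_exclusive_group()
group_reg.add_argument('--L2', type=float)
group_reg.add_argument('--dropout', type=float)

# Parse an example command line instead of sys.argv
args = parser.parse_args(['--h1', '256', '--dropout', '0.5'])

# Dispatch on the mutually exclusive options: at most one is not None
if args.L2 is not None:
    regularization = 'l2({})'.format(args.L2)
elif args.dropout is not None:
    regularization = 'dropout({})'.format(args.dropout)
else:
    regularization = 'none'
print(args.h1, regularization)  # 256 dropout(0.5)
```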

