(Part2) MNIST classifier training
Last updated
Select the Notebooks icon upon logging in.
After you log in to PrimeHub, open Notebook in a new tab first.
Select an instance type with at least one virtual CPU and 2 GB of RAM.
Select an image with TensorFlow 1.14.
Start the notebook.
Example using our demo group
Follow the steps below to write code in the group volume:
Double-click your group volume folder.
Click Text File on the right-hand side, under the 'Other' section.
Right-click the new untitled.txt and rename it to train_mnist.py.
Write the code on the right-hand side.
This is the example code for training an MNIST classifier.
Please note: we are saving the model file to a relative path.
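As a rough illustration, train_mnist.py could look like the minimal sketch below. It is not necessarily the exact code used in this tutorial: the network architecture, the five epochs, the default dropout of 0.2, and the helper names are assumptions. Only the --dropout flag and the params_dropout-<value>/my_model output path follow this guide. The sketch assumes the TensorFlow 1.14 image, where tf.keras `save_weights` accepts a path without a file extension.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train an MNIST classifier.")
    # --dropout is the parameter the job command passes in (default is an assumption).
    parser.add_argument("--dropout", type=float, default=0.2)
    return parser.parse_args(argv)


def export_path(dropout):
    # Relative path: it is resolved against the current working directory,
    # so the job has to cd into the group volume before running the script.
    return os.path.join("params_dropout-{}".format(dropout), "my_model")


def build_model(tf, dropout):
    # Assumed architecture: one hidden layer followed by dropout.
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


def main(argv=None):
    # Imported lazily so the helpers above can be used without TensorFlow.
    import tensorflow as tf

    args = parse_args(argv)
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = build_model(tf, args.dropout)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

    path = export_path(args.dropout)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    model.save_weights(path)
    print("Saved weights to", os.path.abspath(path))
```

To run this as a script, end the file with a call to main() under the usual `if __name__ == "__main__":` guard.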
Congratulations! You have prepared everything you need to submit a job.
Confirm that the current group is the one you want; to switch groups, use the Group: dropdown at the top right.
Select the Jobs icon and click Create Job.
To submit a job, open Job Submission in a new tab and create a job by clicking the button at the top right.
Select your instance type and image on the left panel; ensure that these are identical to the ones you are using in JupyterLab.
In the right panel, name the job training mnist in the Job Name input field, or use another name that you prefer.
Since our code is under the group volume, which is mounted at /home/jovyan/<group name> -> /project/<group name>, type the following into the Command input field, replacing <group name>:
You need to cd into the group volume first, because we save the model to a relative path.
<group name> is case-sensitive.
You may notice the -u in the python command. Python buffers its output by default when it is not attached to a terminal, as is the case in Job Submission. Adding -u tells Python not to buffer the output, so that we can see the log in real time.
--dropout is a parameter that we specified in our code.
Job Submission executes the Command field as a shell script, so you can write multiple lines just as you would in a shell script.
If you hover your mouse over the question mark next to Command, you can see more hints.
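The notes above describe a two-line command: change into the group volume, then run the script unbuffered with a dropout value. A minimal sketch that assembles that text is shown below. The helper name `build_job_command` and the group name demo are hypothetical; the dropout value 0.2 matches the output folder used later in this guide.

```python
import shlex


def build_job_command(group_name, dropout, script="train_mnist.py"):
    """Return the multi-line text to paste into the Command field."""
    workdir = "/project/{}".format(group_name)
    lines = [
        # cd first: the script saves its model relative to the working directory.
        "cd {}".format(shlex.quote(workdir)),
        # -u disables output buffering so the log shows up in real time.
        "python -u {} --dropout {}".format(shlex.quote(script), dropout),
    ]
    return "\n".join(lines)


# "demo" is a placeholder; use your own (case-sensitive) group name.
print(build_job_command("demo", 0.2))
# prints:
# cd /project/demo
# python -u train_mnist.py --dropout 0.2
```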
Jobs list page and refresh button
After you submit a job, it appears in the Jobs list in a Pending state. Click the Refresh button in the top right corner to get the latest status of your jobs.
Logs tab
To view the job's log, select the job name and then click the Logs tab to see that the job is running. Wait for the job to finish. You don't need to stay on the Job Submission page while waiting; you can keep writing code and running analyses while the job runs!
When the job is complete, it will output a model file my_model that you can find in JupyterLab.
It's time to verify that our job really trained an MNIST classifier.
Output model file located in JupyterLab
In your JupyterLab tab, you can find the output model file params_dropout-0.2/my_model in the group volume.
If the file does not exist, check whether your job succeeded. If the job failed, select the job name and check the log for error messages, then fix the errors and submit again.
Follow the steps below to test the MNIST classifier:
Ensure that you are in the group volume.
Click Python 3 on the right-hand side, under the 'Notebook' section.
Write code in your newly created Jupyter notebook on the right-hand side:
Example Code
Press Shift + Enter to execute your code.
Example code for testing an MNIST classifier. Warning messages (highlighted in red above) can be ignored since the model's accuracy is ~0.98.
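A notebook cell for this test could look like the minimal sketch below. It is an illustration under stated assumptions, not necessarily the tutorial's own test code: it assumes the weights were written with tf.keras `save_weights` to params_dropout-0.2/my_model, and that the model architecture here matches the one used in training. The helper names are hypothetical.

```python
import os


def checkpoint_exists(prefix):
    """True if weights saved under `prefix` are present.

    Covers both a single file and a TensorFlow checkpoint,
    which stores its index in `<prefix>.index`.
    """
    return os.path.exists(prefix) or os.path.exists(prefix + ".index")


def evaluate(weights_prefix, dropout=0.2):
    """Rebuild the model, load the saved weights, and return test accuracy."""
    # Imported lazily so checkpoint_exists() works without TensorFlow.
    import tensorflow as tf

    (_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_test = x_test / 255.0

    # Assumed architecture: it must match the one used during training.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.load_weights(weights_prefix)
    _, accuracy = model.evaluate(x_test, y_test)
    return accuracy
```

In the notebook, check `checkpoint_exists("params_dropout-0.2/my_model")` first, then call `evaluate("params_dropout-0.2/my_model")` from the group volume folder.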
Hooray! You have trained an MNIST classifier through the Job Submission feature!
You may have noticed that we used a notebook for testing and a Python file for training. We suggest that you write experimental code in a notebook. Once you confirm that the code works, you can convert it to a regular Python file and fully utilize the power of Job Submission. From now on, you can submit many jobs with different dropout rates and check the results.
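When several jobs with different dropout rates have finished, their outputs can be collected from the group volume. The sketch below assumes each job writes into a folder named params_dropout-<value>, as in the params_dropout-0.2 example above; the helper name `find_runs` is hypothetical.

```python
import os
import re

# Folder names such as "params_dropout-0.2" (naming pattern assumed from this guide).
RUN_DIR = re.compile(r"^params_dropout-(\d+(?:\.\d+)?)$")


def find_runs(root="."):
    """Map each dropout rate to its output folder under `root`."""
    runs = {}
    for name in sorted(os.listdir(root)):
        match = RUN_DIR.match(name)
        path = os.path.join(root, name)
        if match and os.path.isdir(path):
            runs[float(match.group(1))] = path
    return runs
```

Calling `find_runs()` from the group volume folder in a notebook lists the finished runs, whose models can then be tested one by one.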