PrimeHub
v4.1

Label Dataset by Label Studio

This tutorial covers the basic flow to help you get started with Label Studio in PrimeHub.

Install Label Studio

First, install Label Studio from the Apps tab. See the Overview section to learn how to install an app.

During installation, you can change the environment variables.

DEFAULT_USERNAME and DEFAULT_PASSWORD are the login credentials. You can change them, and then use them to log in to Label Studio once it is installed.

If you are unsure what the other environment variables mean, keep the default values, or check the Label Studio official documentation or the tooltip next to each environment variable for details.

Label Studio UI

PrimeHub shows the app's state in the Apps tab. You can open the Label Studio UI by clicking Open after the state becomes Ready.

The Label Studio UI opens in a new window. To find your login information, click Manage in the Apps tab and then click the eye icon. $(PRIMEHUB_GROUP) stands for the group name.

Label Dataset

What We Need

  • The dataset in PrimeHub you want to label (we use /datasets/dog-demo in this tutorial)

  • The directory in the group volume where you want to save the labeled results (we use /project/<group_name>/dog-demo-labeled in this tutorial)

Before you start, make sure you have access to the data volume and the group volume, or ask your administrators for assistance.

Steps

  1. After logging in, click the Create button.

  2. Enter your Project Name and skip the Data Import step. Then choose the Labeling Setup; here we choose Semantic Segmentation with Polygons.

  3. Delete the original Labels settings and add our own label names.

  4. Sync the data folder with Label Studio.

    1. Click Settings in the upper-right corner.

    2. Click Cloud Storage and then Add Source Storage to sync the data volume for labeling.

    3. Configure the source storage settings:

      Setting                                    | Value
      Storage type                               | Local path
      Absolute local path                        | /datasets/dog-demo/
      File Filter Regex                          | .*jpeg
      Treat every bucket object as a source file | Enabled

    4. Click Sync Storage to sync the data volume.

  5. Click Add Target Storage to sync the labeled results to /project/<group_name>/dog-demo-labeled. Set Local path to /project/<group_name>/dog-demo-labeled.

  6. Go back to the project in Label Studio. The data from the data volume now appears in the UI, and you can click each row to label it.

  7. After you submit a labeled result, the labeled JSON file is saved under /project/<group_name>/dog-demo-labeled.
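Because this project uses Semantic Segmentation with Polygons, each saved file holds polygon regions rather than a single class. The sketch below shows one way to turn those regions into pixel coordinates. It assumes a simplified layout: a top-level result list whose polygon entries carry type, original_width, original_height, and value.points expressed as percentages (0 to 100) of the image size. Inspect one of your own files before relying on it; the sample content and the dog label are hypothetical.

```python
import json


def polygons_in_pixels(annotation: dict) -> list[tuple[str, list[tuple[float, float]]]]:
    """Return (label, [(x_px, y_px), ...]) for every polygon region in one annotation."""
    polygons = []
    for region in annotation.get("result", []):
        if region.get("type") != "polygonlabels":
            continue  # skip regions that are not polygons
        width = region["original_width"]
        height = region["original_height"]
        label = region["value"]["polygonlabels"][0]
        # Assumption: points are stored as percentages of the image width/height.
        points = [(x * width / 100, y * height / 100) for x, y in region["value"]["points"]]
        polygons.append((label, points))
    return polygons


# Hypothetical, trimmed-down content of one labeled JSON file.
sample = json.loads("""
{"result": [{"type": "polygonlabels",
             "original_width": 800, "original_height": 600,
             "value": {"points": [[10, 20], [50, 20], [30, 60]],
                       "polygonlabels": ["dog"]}}]}
""")

for label, points in polygons_in_pixels(sample):
    print(label, points)
```

To process a whole folder, you could json.load each file under /project/<group_name>/dog-demo-labeled and pass the result to polygons_in_pixels.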

That covers the basics of labeling a dataset with Label Studio in PrimeHub. Enjoy!
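The File Filter Regex (.*jpeg for the dog demo, .*png for the screw example below) decides which files the source storage imports. The sketch below approximates that check with Python's re module, assuming each file name must match the whole pattern case-sensitively. Label Studio's exact matching rule may differ, so treat this as a quick way to sanity-check a pattern rather than a faithful reimplementation. The file names are hypothetical.

```python
import re


def matches_filter(filename: str, pattern: str) -> bool:
    # Assumption: the entire file name must match (re.fullmatch), case-sensitively.
    return re.fullmatch(pattern, filename) is not None


# Hypothetical file names in a data volume.
files = ["dog1.jpeg", "dog2.JPEG", "dog3.jpg", "notes.txt", "screw_001.png"]

# Names ending in lower-case "jpeg" only.
print([f for f in files if matches_filter(f, r".*jpeg")])
# .jpeg and .jpg in any letter case, via the inline (?i) flag.
print([f for f in files if matches_filter(f, r"(?i).*\.jpe?g")])
```

Under this assumption, .*jpeg skips .jpg and upper-case .JPEG files, so widen the pattern if your dataset mixes extensions.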

How to Use Labeled Data to Train a Model

In the previous section, we showed you how to label a dataset. Now we demonstrate how to use the labeled data to train a model.

For simplicity, the model is a classification model, so you only need to label the class of each image. The model classifies whether a screw is good or bad.

Here are examples of good and bad screws. The first image shows a good screw. The second image shows a bad screw; you can see that its front has been manipulated.

What We Need

  • Create a directory /project/<group_name>/screw-labeled in the group volume to save the labeled results

  • The image infuseai/docker-stacks:pytorch-notebook-v1-7-0-04b2c51f

  • An instance type that meets the minimum requirement (CPU=1, GPU=0, Mem=2G)

  • The example Python script app_tutorial_labelstudio_screw_prepare.py, uploaded to ~/screw_train through the notebook

  • The example notebook app_tutorial_labelstudio_screw_train.ipynb, uploaded to ~/screw_train through the notebook

Before you start, make sure you have access to the data volume and the group volume, or ask your administrators for assistance.

If you create a new data volume for this tutorial, create the Label Studio app after the data volume has been created so that the app can use it.

Steps

  1. Follow the previous Label Dataset section to set up Label Studio. This time, choose Image Classification in Labeling Setup.

  2. Delete the original Labels settings and add our own label classes: bad, good.

  3. Click Settings in the upper-right corner. Click Cloud Storage and then Add Source Storage to sync the /datasets/screw data volume for labeling. Set Local path to /datasets/screw, set File Filter Regex to .*png, and turn on Treat every bucket object as a source file. After adding the storage, click Sync Storage.

  4. Click Add Target Storage to sync the labeled results to /project/<group_name>/screw-labeled. Set Local path to /project/<group_name>/screw-labeled.

  5. Go back to the project in Label Studio. The data from the data volume now appears in the UI, and you can click Label to start labeling. (Tip: you can press a number key to select a class.)

    After you have labeled all images, you may see the following message. This is a known issue. Click OK, click your project name, and refresh the page.

  6. You have now labeled all the data in Label Studio. Go back to your notebook to train the model.

  7. Open a terminal and run:

      cd ~/screw_train
      python app_tutorial_labelstudio_screw_prepare.py --path /project/<group_name>/screw-labeled/

    Once the script finishes, it creates a folder named data and places each labeled image into the corresponding folder inside data.
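The prepared script ships with this tutorial, so its exact logic is not reproduced here. As a rough idea of what such a step involves, the sketch below reads each annotation file, takes the chosen class from an Image Classification result, and copies the image into data/<class>/. It is a hypothetical reimplementation, not the contents of app_tutorial_labelstudio_screw_prepare.py. It assumes each file contains a top-level result list with choices entries and a task.data.image URL of the form /data/local-files/?d=<path>; check these assumptions against your own labeled files. The function names are hypothetical.

```python
import json
import shutil
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def image_path_from_url(image_url: str, root: str = "/") -> Path:
    # Assumption: local-storage tasks reference images as "/data/local-files/?d=<relative path>".
    relative = parse_qs(urlparse(image_url).query)["d"][0]
    return Path(root) / relative


def sort_labeled_images(labeled_dir: str, out_dir: str = "data", root: str = "/") -> int:
    """Copy each labeled image into <out_dir>/<class>/ and return how many were copied."""
    copied = 0
    for ann_file in sorted(Path(labeled_dir).iterdir()):
        if not ann_file.is_file():
            continue
        try:
            annotation = json.loads(ann_file.read_text())
        except ValueError:
            continue  # not a readable JSON annotation file
        classes = [r["value"]["choices"][0]
                   for r in annotation.get("result", [])
                   if r.get("type") == "choices"]
        if not classes:
            continue  # task without a class label
        src = image_path_from_url(annotation["task"]["data"]["image"], root)
        dst_dir = Path(out_dir) / classes[0]
        dst_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst_dir / src.name)
        copied += 1
    return copied
```

Under these assumptions, sort_labeled_images("/project/<group_name>/screw-labeled") would fill data/bad and data/good, one folder per class from step 2, which is the kind of layout a folder-per-class image loader expects.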

Next, open the notebook app_tutorial_labelstudio_screw_train.ipynb and execute all cells. In the last cell, you will see a result similar to the following image.

We have successfully used our labeled data to train a model that classifies whether a screw is good or bad!

Note: this section uses a data volume named screw. To prepare it, create a data volume in PrimeHub called screw and grant your group read/write permission. Then download app_tutorial_labelstudio_screw_dataset.zip, unzip it, and upload the images to the ~/datasets/screw folder through the notebook.
