PrimeHub
v4.1

Volume Upload

Provides an upload server for uploading data to PV-type volumes.

Configuration

Prerequisite

PRIMEHUB_FEATURE_USER_PORTAL must be set to true, and PRIMEHUB_DOMAIN must be set.

Settings

Add these variables to the .env file:

Name                               Value
PRIMEHUB_FEATURE_DATASET_UPLOAD    true

Install

make release-install-primehub

Migration

Set the PRIMEHUB_STORAGE_CLASS environment variable to the correct storage class.

Troubleshooting

  • Check the PrimeHub Console Containers' Environment Variables

The environment variables should be added automatically. PRIMEHUB_FEATURE_DATASET_UPLOAD will be added to graphql and ui containers when PRIMEHUB_FEATURE_DATASET_UPLOAD is true in your cluster's .env file.

CMS_APP_PREFIX will be added to the graphql container.

PRIMEHUB_GROUP_SC will be added to the graphql container. Its value is taken from:

groupvolume:
  storageClass: {value}

If you did not specify a value in the YAML, it is set from the PRIMEHUB_STORAGE_CLASS environment variable.
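The fallback described above can be sketched as follows. This is an illustration only, not the chart's actual logic: the function name resolve_group_sc and the dictionary shapes are hypothetical, and the sketch assumes that an empty or missing storageClass counts as "not specified".

```python
def resolve_group_sc(values, env):
    """Return the storage class that would end up in PRIMEHUB_GROUP_SC.

    `values` stands in for the parsed YAML values and `env` for the .env
    settings; both shapes are assumptions made for this sketch.
    """
    group_volume = values.get("groupvolume") or {}
    storage_class = group_volume.get("storageClass")
    if storage_class:
        return storage_class
    # Nothing specified in the YAML: fall back to PRIMEHUB_STORAGE_CLASS.
    return env.get("PRIMEHUB_STORAGE_CLASS")
```

For example, resolve_group_sc({}, {"PRIMEHUB_STORAGE_CLASS": "standard"}) returns "standard" because no groupvolume value is present.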

  • Check Issuer

If you are using the letsencrypt-prod-dns issuer, your volume upload ingress annotations should contain:

certmanager.k8s.io/acme-challenge-type: "dns01"
certmanager.k8s.io/acme-dns01-provider: "clouddns"
certmanager.k8s.io/cluster-issuer: "letsencrypt-prod-dns"

Design

Metacontroller is used to automatically create desired resources based on our settings.

Application code is under modules/primehub-dataset-upload. K8s and metacontroller related code is under modules/charts/primehub.

Start/Stop Volume Upload Server

When a volume has the annotation dataset.primehub.io/uploadServer: "true", a volume upload server is started for it.

Otherwise, the server is stopped.

Currently, the volume upload URL is https://<primehub domain>/admin/dataset/<namespace>/<dataset name>/browse/.
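A minimal sketch of the two rules above: the annotation check and the URL pattern. The helper names are hypothetical, and the sketch assumes that only the exact string "true" enables the server.

```python
UPLOAD_ANNOTATION = "dataset.primehub.io/uploadServer"


def upload_server_enabled(annotations):
    """Return True when the annotation asks for an upload server.

    Assumption: any value other than the string "true" means stopped.
    """
    return annotations.get(UPLOAD_ANNOTATION) == "true"


def browse_url(domain, namespace, dataset_name):
    """Build the browse URL using the pattern given in this section."""
    return f"https://{domain}/admin/dataset/{namespace}/{dataset_name}/browse/"
```

With the placeholder values primehub.example.com, hub and mnist, browse_url yields https://primehub.example.com/admin/dataset/hub/mnist/browse/.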

Enable HTTP Auth for the Volume Upload Server

First, create a secret from a password file generated by htpasswd. For example:

htpasswd -c auth <name>
kubectl -n hub create secret generic dataset-upload-<name> --from-file=auth

Then add the annotation dataset.primehub.io/uploadServerAuthSecretName: dataset-upload-<name> to the volume to enable HTTP auth.

The username is <name>.
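To show what the kubectl command above produces, here is a sketch that builds an equivalent Secret manifest as a Python dictionary, together with the matching annotation. The function name is hypothetical, and the sketch relies on the standard behaviour of kubectl create secret generic --from-file=auth: an Opaque secret whose auth key holds the base64-encoded file content.

```python
import base64


def auth_secret_manifest(name, htpasswd_content, namespace="hub"):
    """Return (secret manifest, annotation) for HTTP auth on the upload server.

    `htpasswd_content` is the raw bytes of the file written by htpasswd.
    """
    secret_name = f"dataset-upload-{name}"
    manifest = {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": secret_name, "namespace": namespace},
        # Secret data values are base64-encoded; the key matches the file name.
        "data": {"auth": base64.b64encode(htpasswd_content).decode("ascii")},
    }
    annotation = {"dataset.primehub.io/uploadServerAuthSecretName": secret_name}
    return manifest, annotation
```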

Current Post-Finish Hook in the Tus Server (tusd)

  • Create the target directory if needed

  • Rename the .bin file to its real file name

  • Remove the .info file generated by tusd

  • If the uploaded file is a zip file, unzip it
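The four hook steps above can be sketched in Python as follows. This is not the project's hook; it is a minimal illustration under stated assumptions: the function name and its parameters are hypothetical, the upload is assumed to be stored as <id>.bin next to <id>.info, and the zip archive is kept after extraction because the list does not say otherwise.

```python
import os
import shutil
import zipfile


def post_finish(upload_dir, upload_id, filename, target_dir):
    """Apply the post-finish steps to one completed upload."""
    # 1. Create the target directory if needed.
    os.makedirs(target_dir, exist_ok=True)

    # 2. Move <id>.bin to its real file name. basename() drops any
    #    directory part of the client-supplied name (a precaution added
    #    for this sketch).
    destination = os.path.join(target_dir, os.path.basename(filename))
    shutil.move(os.path.join(upload_dir, upload_id + ".bin"), destination)

    # 3. Remove the <id>.info file generated by tusd.
    info_path = os.path.join(upload_dir, upload_id + ".info")
    if os.path.exists(info_path):
        os.remove(info_path)

    # 4. If the file is a zip archive, extract it next to the archive.
    if zipfile.is_zipfile(destination):
        with zipfile.ZipFile(destination) as archive:
            archive.extractall(target_dir)

    return destination
```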

Other Notes

  • The CLI's resume capability currently handles only bad network situations. It does not handle the case where a user cancels an upload job. (The web and CLI clients cannot resume each other's uploads.) (https://github.com/tus/tus-js-client/issues/62)

  • Mechanism to clean up temporary state files.

CLI

  • Download from https://github.com/avvertix/tus-client-cli/releases/tag/v0.3.0

  • ./tus-client-macos upload <filepath> https://<primehub domain>/admin/dataset/<name>/upload/files/


We use the tus protocol for resumable file uploads. The backend is tusd and the frontend package is uppy. To let users view and edit uploaded files, the deployment also runs a Flask server; the package used to view files is Flask-AutoIndex. The volume upload deployment therefore contains two containers, both of which mount the PV volume.
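As background on how resumable uploads work in tus 1.0.0, the sketch below builds the request headers a client sends when it creates an upload and when it sends a chunk. It performs no network calls, and the helper names are hypothetical. The header names come from the tus core protocol and its creation extension; which metadata keys the PrimeHub upload server expects is not stated here, so filename is an assumption.

```python
import base64

TUS_VERSION = "1.0.0"


def creation_headers(file_size, filename):
    """Headers for the POST request that creates a new upload."""
    # Upload-Metadata values are base64-encoded per the creation extension.
    encoded = base64.b64encode(filename.encode("utf-8")).decode("ascii")
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Length": str(file_size),
        "Upload-Metadata": f"filename {encoded}",
    }


def patch_headers(offset):
    """Headers for a PATCH request that sends bytes starting at `offset`."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }
```

To resume after a dropped connection, a client sends a HEAD request to the upload URL, reads the Upload-Offset response header, and continues with a PATCH request from that offset.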