PrimeHub
v4.1

Persistence

PrimeHub provides several types of persistent data stores. This document describes the characteristics of each of them, and the conditions in which they perform the best.

Volume Types

User Volume

A user's private storage.

A User Volume is good for:

  • Storing personal data (e.g. datasets, code)

A User Volume has the following limitations:

  • It can only be accessed via notebooks.

Shared Volume

A volume shared among group members. All members can read and write data to this volume. It is like an NFS server for a group.

A Shared Volume is good for:

  • Storing shared data among members in a group

  • Exchanging data among notebooks, jobs, and apps

A Shared Volume cannot be used for:

  • Downloading and uploading data through an API/CLI/SDK

PHFS Storage

Data stored in PHFS can be found under the subpath /groups/<group> of an object storage bucket.

There are several ways to access data stored in PHFS:

  • PHFS can be mounted in notebooks, apps, and jobs.

PHFS is good for:

  • Uploading and downloading files via the Shared Files UI

  • The source of model files for model deployment

Even though PHFS can be accessed through the filesystem, its access mode is not fully POSIX-compatible. It does not allow random access or append writes. It is only suitable for sequential read and sequential write operations.

Due to this limitation PHFS cannot be used for:

  • Uploading a file larger than 1MB from the notebook UI (i.e. the JupyterLab upload feature). An error will occur and the uploaded file will be truncated to 1MB. To upload files larger than 1MB, please use the Shared Files UI.

  • The output of training. Some ML frameworks cannot write training results to PHFS successfully. For example, in TensorFlow, writing model files in HDF5 format to PHFS causes the error Problems closing file (file write failed: ...) because HDF5 uses seek while writing. To store training results in PHFS, first write them to a User Volume or Shared Volume, and then copy them to PHFS.

  • The input of training. PHFS has the lowest performance of all the storage types. For a dataset that is read over multiple training runs, we recommend putting it in a User Volume or Shared Volume.
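The "write to a POSIX volume first, then copy to PHFS" workaround can be sketched in Python as follows. This is a minimal illustration, not part of PrimeHub: the helper names copy_sequentially and save_via_staging are hypothetical, and the sketch assumes that a plain forward-only copy is acceptable to PHFS, as the sequential read/write description above suggests.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # copy in 1 MiB pieces


def copy_sequentially(src_path, dest_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dest_path using only forward reads and forward writes."""
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)  # never seeks, never appends to an existing file
    return dest_path


def save_via_staging(write_output, filename, dest_dir, staging_dir=None):
    """Run write_output(path) against a staging file, then copy the result to dest_dir.

    write_output is any callable that writes a file at the given path; it may
    seek or append freely because the staging file lives on a POSIX volume.
    staging_dir is assumed to be on a User Volume or Shared Volume.
    """
    with tempfile.TemporaryDirectory(dir=staging_dir) as tmp:
        staged = os.path.join(tmp, filename)
        write_output(staged)
        os.makedirs(dest_dir, exist_ok=True)
        return copy_sequentially(staged, os.path.join(dest_dir, filename))
```

In a notebook, a call might look like save_via_staging(lambda p: model.save(p), "model.h5", "/phfs/models", staging_dir="/home/jovyan"), where model is a Keras model and /phfs/models is a hypothetical directory chosen for illustration.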

Data Volume

A Data Volume is a storage type that can be shared among multiple groups. The following permission settings can be configured:

  • Read-only on a global or per-group basis

  • Writable on a per-group basis

There are several kinds of Data Volumes we can create:

  • Persistent volume (PV): Like a Shared Volume, but can be shared among multiple groups rather than just a single group.

  • NFS: A volume that connects to an external NFS server.

  • Host Path: A special kind of volume that mounts the host filesystem.

  • Git: A special kind of volume which syncs the upstream git repository periodically. The actual data is stored on the host filesystem.

  • Env: Technically, this is not a volume, but a method to configure environment variables to be used in notebooks and jobs.

A Data Volume is good for:

  • Sharing among groups. In an education environment, for example, datasets could be shared among multiple teams (groups) of students with read-only permissions, while the teaching assistants could be in another group with write permissions.

  • Special storage destination (e.g. external NFS server, host path, git sync)

A Data Volume has the following limitations:

  • Data cannot be downloaded or uploaded through an API/CLI/SDK

  • If the volume is to be used by only one group, a Shared Volume is preferred because it is easier to use
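Because a Data Volume may be read-only for one group and writable for another, a notebook or job may want to check what it is allowed to do before writing. The following is a minimal sketch under assumptions: the function name volume_access is hypothetical, and os.access only reports what the current process may do, which may not reflect every permission setting configured by the administrator.

```python
import os


def volume_access(path):
    """Return 'missing', 'read-only', or 'writable' for a mounted volume path."""
    if not os.path.isdir(path):
        return "missing"
    if os.access(path, os.W_OK):
        return "writable"
    return "read-only"
```

For example, volume_access("/datasets/my-volume") would report the access level for a Data Volume named my-volume, a name used here only for illustration.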

Comparison

| Type | Shared by | API/UI Access | Use case |
| --- | --- | --- | --- |
| User Volume | No | No | Private data |
| Shared Volume | Members of a group | No | Shared data in a group |
| PHFS | Members of a group | Yes | Data import/export |
| Data Volume | Multiple groups | No | Shared data among groups |

All four storage options can be accessed via the file system. The following table describes the mount points and characteristics:

| Type | Available in | Mount point | Characteristic |
| --- | --- | --- | --- |
| User Volume | Notebooks | /home/jovyan | Best performance (like block device) |
| Shared Volume | Notebooks, Apps, Jobs | /project/<group> | Good performance (like NFS) |
| PHFS | Notebooks, Apps, Jobs | /phfs | Limited access mode: sequential read/write (like object storage) |
| Data Volume | Notebooks, Apps, Jobs | /datasets/<volume> | Good performance (like NFS) |
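Since not every storage type is enabled in every environment, it can help to check which mount points are present before relying on them. The sketch below uses the mount points from the table above. The names MOUNT_ROOTS, available_storage, and list_mounts are hypothetical, and the sketch assumes that the parent directories /project and /datasets exist whenever a group or data volume is mounted beneath them.

```python
import os

# Mount roots from the mount-point table. <group> and <volume> vary per
# deployment, so only the parent directories are listed here.
MOUNT_ROOTS = {
    "User Volume": "/home/jovyan",
    "Shared Volume": "/project",  # also called Group Volume
    "PHFS": "/phfs",
    "Data Volume": "/datasets",
}


def available_storage(roots=MOUNT_ROOTS):
    """Map each storage type to True if its mount root exists as a directory."""
    return {name: os.path.isdir(path) for name, path in roots.items()}


def list_mounts(root):
    """Return the sorted names of directories found directly under root."""
    if not os.path.isdir(root):
        return []
    return sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )
```

Calling available_storage() in a notebook shows which storage types are mounted, and list_mounts("/datasets") lists the Data Volumes visible to the current group.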


Shared Volume: A group's Shared Volume is not enabled by default. Please contact the system administrator to enable it. For more information, see Group Management.

PHFS Storage: PHFS (PrimeHub File System) is shared among group members, like a Shared Volume. However, PHFS has the added benefit of being object storage, similar to S3. Due to the characteristics of object storage, PHFS provides the best accessibility of all the storage types.

Besides being mounted in notebooks, apps, and jobs, data stored in PHFS can be accessed in these ways:

  • Users can download/upload content from the Shared Files UI in the User Portal.

  • Users can list and download files from the PrimeHub SDK/CLI.

PHFS is also good for:

  • Data exchange through the PrimeHub SDK/CLI

  • Storing the artifacts of a job's output

PHFS is not installed by default. To enable it, see the document on how to configure PrimeHub store and PHFS.

Data Volume: A Data Volume is configured by the system administrator. For more information, see Volume Management. For some types of volume, we can also configure an upload server to upload data to the Data Volume.