
PrimeHub Usage


PrimeHub Usage gives administrators an overall view of resource usage in PrimeHub.

Usage measures allocated resources, not actual utilization. For example, when a user opens a Jupyter notebook, the allocated resources are recorded in the usage data even if the user never runs anything on it. Each record includes the lifetime of a pod and the CPU, GPU, and memory allocated to that pod.
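The idea of allocated usage can be sketched in a few lines of Python. This is only an illustration: it assumes that resource-hours are the allocated amount multiplied by the pod lifetime, and the UsageRecord class and its field names are hypothetical, not PrimeHub's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class UsageRecord:
    """Hypothetical shape of one usage record; field names are illustrative."""
    start: datetime    # when the pod's lifetime began
    end: datetime      # when the pod's lifetime ended
    cpu: float         # allocated CPU cores
    gpu: float         # allocated GPUs
    memory_gb: float   # allocated memory in GB


def allocated_hours(record: UsageRecord) -> dict:
    """Return allocated resource-hours, regardless of actual utilization."""
    hours = (record.end - record.start).total_seconds() / 3600
    return {
        "total_hours": hours,
        "cpu_hours": record.cpu * hours,
        "gpu_hours": record.gpu * hours,
        "memory_gb_hours": record.memory_gb * hours,
    }


# A notebook pod that held 2 CPUs and 4 GB of memory for 12 hours.
record = UsageRecord(
    start=datetime(2020, 9, 1, 0, 0, tzinfo=timezone.utc),
    end=datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc),
    cpu=2,
    gpu=0,
    memory_gb=4,
)
usage = allocated_hours(record)
print(usage)
```

The pod counts as 24 CPU-hours whether or not any code ran on it, which is the distinction between allocation and utilization.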

Features

  • A PrimeHub administrator can download a monthly usage report (CSV format).

Non-Goal

  • Actual utilization is not covered in this scope.

Utilization is the ratio of resources actually used to resources allocated.

Configuration

To enable PrimeHub Usage, set usage.enabled to true.

Path           | Description                       | Default Value
usage.enabled  | Whether PrimeHub Usage is enabled | false

Design

PrimeHub Usage is made of five components:

  • Usage

    • API: a REST API to query data from the usage database

    • Prober: a watcher that saves pod events to the usage database

  • Database: a PostgreSQL database that stores its data on a PVC created by a StatefulSet

  • Reporting: a CronJob that runs daily and generates the monthly reports. Each run generates two reports: one for the current month and one for the previous month.

  • Monitor: similar to Prober, but it only checks pods whose records have not been updated recently. If a pod hasn't changed for a while, Monitor marks it as finished. Normally the Prober handles the finished state; Monitor performs a final check in a separate process to cover edge cases.
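Two pieces of the design above can be sketched in Python: choosing which monthly reports a daily Reporting run produces, and the Monitor's staleness check. Both function names are hypothetical, and the one-hour threshold is an assumption, since the text does not say how long a pod must be unchanged before it is marked finished.

```python
from datetime import date, datetime, timedelta, timezone


def report_months(today: date) -> list:
    """Return (year, month) pairs for the previous month and the current month."""
    if today.month == 1:
        previous = (today.year - 1, 12)
    else:
        previous = (today.year, today.month - 1)
    return [previous, (today.year, today.month)]


# Assumed threshold; the actual value used by the Monitor is not documented here.
STALE_AFTER = timedelta(hours=1)


def is_stale(last_updated: datetime, now: datetime,
             stale_after: timedelta = STALE_AFTER) -> bool:
    """Return True if a pod record has not been updated within the threshold."""
    return now - last_updated > stale_after


months = report_months(date(2021, 1, 15))
print(months)

now = datetime(2020, 9, 1, 2, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2020, 9, 1, 0, 0, tzinfo=timezone.utc), now))
print(is_stale(datetime(2020, 9, 1, 1, 30, tzinfo=timezone.utc), now))
```

Regenerating the previous month on every run means a report keeps being refreshed after its month has ended, so records that arrive or are corrected late can still be included.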

Prober

apiVersion: v1
kind: Pod
metadata:
  annotations:
    primehub.io/usage: '{"component": "jupyter", "component_name": "jupyter-foo",
      "group": "phusers", "user": "foo", "instance_type": "cpu-1"}'
  creationTimestamp: "2020-08-31T17:51:31Z"
  labels:
    app: jupyterhub
    chart: jupyterhub-0.9-dev
...

The Prober watches pod events and keeps only those from pods in a specific namespace (e.g. hub) that carry the primehub.io/usage annotation. It defines the lifetime of a pod as the interval between

  • The time the pod was scheduled

  • The termination time of its last container
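The lifetime rule above can be sketched against a pod represented as a plain dictionary. This is not the Prober's actual code. It assumes the scheduled time is read from the standard PodScheduled condition and the end time from each container's state.terminated.finishedAt; the function names and the finishedAt values in the sample are illustrative.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple

USAGE_ANNOTATION = "primehub.io/usage"


def parse_time(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as 2020-08-31T17:51:31Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def usage_info(pod: dict) -> Optional[dict]:
    """Return the decoded usage annotation, or None if the pod is not tracked."""
    raw = pod.get("metadata", {}).get("annotations", {}).get(USAGE_ANNOTATION)
    return json.loads(raw) if raw else None


def pod_lifetime(pod: dict) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (start, end); end is None while any container is still running."""
    status = pod.get("status", {})
    scheduled = [
        parse_time(c["lastTransitionTime"])
        for c in status.get("conditions", [])
        if c.get("type") == "PodScheduled" and c.get("status") == "True"
    ]
    containers = status.get("containerStatuses", [])
    finished = [
        parse_time(s["state"]["terminated"]["finishedAt"])
        for s in containers
        if "terminated" in s.get("state", {})
    ]
    start = scheduled[0] if scheduled else None
    all_done = bool(containers) and len(finished) == len(containers)
    end = max(finished) if all_done else None
    return start, end


# Sample pod; the termination times are made up for illustration.
pod = {
    "metadata": {"annotations": {USAGE_ANNOTATION: json.dumps({
        "component": "jupyter", "component_name": "jupyter-foo",
        "group": "phusers", "user": "foo", "instance_type": "cpu-1"})}},
    "status": {
        "conditions": [{"type": "PodScheduled", "status": "True",
                        "lastTransitionTime": "2020-08-31T17:51:31Z"}],
        "containerStatuses": [
            {"state": {"terminated": {"finishedAt": "2020-08-31T19:51:31Z"}}},
            {"state": {"terminated": {"finishedAt": "2020-08-31T19:51:40Z"}}},
        ],
    },
}
start, end = pod_lifetime(pod)
print(usage_info(pod)["user"], (end - start).total_seconds())
```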

API

Usage API is an internal API consumed by GraphQL.

Available months

curl http://primehub-usage-api/report/monthly
["2020/8","2020/9"]

Get the report for a month

curl http://primehub-usage-api/report/monthly/2020/9
component,group_name,user_name,gpu_hours,cpu_hours,memory_gb_hours,total_hours,report_date
jupyter,phusers,foo,0.00,4320.00,2160.00,2160.00,202009
deployment,phusers,,0.00,720.00,720.00,720.00,202009
jupyter,phusers,phadmin,0.00,4320.00,2160.00,2160.00,202009
jupyter,phusers,foo,0.00,720.00,720.00,720.00,202009
jupyter,phusers,phadmin,0.00,720.00,720.00,720.00,202009
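Because the report is plain CSV, it can be post-processed with standard tooling. The sketch below parses the sample report shown above and sums CPU-hours per group and component. The function name is hypothetical, and the report text is embedded as a string rather than fetched, since the Usage API is internal.

```python
import csv
import io
from collections import defaultdict

# The sample monthly report from above, embedded for a self-contained example.
REPORT = """\
component,group_name,user_name,gpu_hours,cpu_hours,memory_gb_hours,total_hours,report_date
jupyter,phusers,foo,0.00,4320.00,2160.00,2160.00,202009
deployment,phusers,,0.00,720.00,720.00,720.00,202009
jupyter,phusers,phadmin,0.00,4320.00,2160.00,2160.00,202009
jupyter,phusers,foo,0.00,720.00,720.00,720.00,202009
jupyter,phusers,phadmin,0.00,720.00,720.00,720.00,202009
"""


def cpu_hours_by_group_and_component(report_text: str) -> dict:
    """Sum the cpu_hours column for each (group_name, component) pair."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report_text)):
        totals[(row["group_name"], row["component"])] += float(row["cpu_hours"])
    return dict(totals)


totals = cpu_hours_by_group_and_component(REPORT)
for (group, component), hours in sorted(totals.items()):
    print(f"{group} {component} {hours:.2f}")
```

Note that the deployment row has an empty user_name, so any per-user aggregation needs to decide how to treat rows without a user.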

Legacy resources migration

A cluster might have many resources that were created before PrimeHub Usage was enabled. A tool is provided to migrate these legacy resources by patching their primehub.io/usage annotation.

The prober pod contains a script, primehub-usage-legacy-pods-helper.py:

kubectl -n hub exec -it primehub-usage-prober-it-is-an-example -- primehub-usage-legacy-pods-helper.py

After execution, it prints patch commands if any resources need to be patched:

Please review the commands before applying them.

# patch jupyter pod: jupyter-foo
kubectl -n hub patch pod jupyter-foo --type='json' -p '[{"op": "add", "path": "/metadata/annotations/primehub.io~1usage", "value": "{\"component\": \"jupyter\", \"component_name\": \"jupyter-foo\", \"group\": \"phusers\", \"user\": \"foo\", \"instance_type\": \"cpu-1\"}"}]'

# patch phdeployment deployment: tmp-1gawm, it might restart pods if anything have changed
kubectl -n hub patch deployment tmp-1gawm --type='json' -p '[{"op": "add", "path": "/spec/template/metadata/annotations/primehub.io~1usage", "value": "{\"component\": \"deployment\", \"component_name\": \"tmp-1gawm\", \"group\": \"model-deployment-test-group\", \"user\": \"ericy\", \"instance_type\": \"cpu-tiny\"}"}]'
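The primehub.io~1usage segment in the patch paths comes from JSON Pointer escaping (RFC 6901): a "/" inside a key is written as "~1" and a "~" as "~0". The sketch below shows how a patch like the ones above could be built. It is not the helper script's actual code, and the function names are hypothetical.

```python
import json

USAGE_ANNOTATION = "primehub.io/usage"


def escape_pointer_token(token: str) -> str:
    """Escape one JSON Pointer token: '~' becomes '~0', then '/' becomes '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def usage_patch(component: str, component_name: str, group: str, user: str,
                instance_type: str,
                base_path: str = "/metadata/annotations") -> list:
    """Build a JSON Patch that adds the usage annotation under base_path."""
    # The annotation value is itself a JSON document stored as a string.
    value = json.dumps({
        "component": component,
        "component_name": component_name,
        "group": group,
        "user": user,
        "instance_type": instance_type,
    })
    path = f"{base_path}/{escape_pointer_token(USAGE_ANNOTATION)}"
    return [{"op": "add", "path": path, "value": value}]


patch = usage_patch("jupyter", "jupyter-foo", "phusers", "foo", "cpu-1")
print(json.dumps(patch))
```

For a Deployment, base_path would be /spec/template/metadata/annotations, as in the second command above. That patch changes the pod template, which is why it may restart pods.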