
Log Persistence


Last updated 2 years ago

Log persistence allows users to persist the job submission logs. By default, a job's log is retrieved from the underlying pod; once the pod is deleted, the log is no longer accessible to the user.

Prerequisites

The PrimeHub Store feature must be enabled.

Features

  • The job log remains accessible even after the underlying pod is deleted.

  • Supports storing logs on S3 or GCS.

  • Supports configuring the flush interval and the max buffer size.

  • Supports txt and gzip formats.

Configuration

To enable log persistence, set store.enabled and store.logPersistence.enabled to true.

  • store.enabled: Whether the PrimeHub store is enabled. Default: false

  • store.logPersistence.enabled: Whether log persistence is enabled. Default: true

  • fluentd.flushAtShutdown: Whether to flush when fluentd shuts down. See the flush_at_shutdown setting in the fluentd buffer document. Default: false

  • fluentd.flushInterval: The flush interval. See the flush_interval setting in the fluentd buffer document. Default: 3600s

  • fluentd.chunkLimitSize: The max size of each chunk. See the chunk_limit_size setting in the fluentd buffer document. Default: "256m"

  • fluentd.storeAs: The format logs are stored in; txt and gzip are supported. See the store_as setting in the fluentd s3 plugin document. Default: txt

  • fluentd.*: Other fluentd settings. See the chart configuration.
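For example, a minimal Helm values override to enable log persistence might look like this sketch (the fluentd overrides are optional; the values shown are illustrative, not recommendations):

```yaml
store:
  enabled: true            # log persistence depends on the PrimeHub store
  logPersistence:
    enabled: true
fluentd:
  flushInterval: 600s      # optional: flush more often than the 3600s default
  storeAs: gzip            # optional: compress the persisted logs
```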

Design

  • Fluentd: the log collector that ships pod logs to the PrimeHub store.

  • GraphQL server: the log endpoint retrieves the log from the PrimeHub store when the pod no longer exists.

  • Console: gets the log from the GraphQL server.

Fluentd

Fluentd is based on the fluentd kubernetes daemonset. It:

  • Reads the logs from /var/log/containers

  • Gets the pod metadata from the Kubernetes API

  • Filters the logs by label

  • Flushes the logs to MinIO via the s3 plugin
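The steps above correspond to a fluentd pipeline ending in an s3 output. A simplified sketch of such a match section follows; the bucket, endpoint, and paths are illustrative placeholders, not the chart's actual values:

```
<match kubernetes.var.log.containers.**>
  @type s3
  s3_bucket primehub-logs        # placeholder bucket name
  s3_endpoint http://minio:9000  # placeholder MinIO endpoint
  path logs/
  store_as txt                   # maps to fluentd.storeAs
  <buffer time>
    timekey 3600
    flush_interval 3600s         # maps to fluentd.flushInterval
    chunk_limit_size 256m        # maps to fluentd.chunkLimitSize
    flush_at_shutdown false      # maps to fluentd.flushAtShutdown
  </buffer>
</match>
```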

GraphQL

  • Enhances the original log endpoint.

  • Adds a new query parameter, persist=true. When it is set to true, the log is retrieved from the persistent log store.

Console

  • The log UI first tries to get the log from the pod (persist=false).

  • If the response has code 404, it retries and gets the persistent log with persist=true.
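The fallback behavior above can be sketched as follows (a minimal sketch; get_job_log and fetch_log are hypothetical names, and fetch_log is assumed to return a (status_code, body) pair):

```python
def get_job_log(fetch_log, log_url):
    """Try the live pod log first; fall back to the persisted log on 404."""
    status, body = fetch_log(log_url, persist=False)
    if status == 404:
        # The pod is gone; retrieve the log persisted in the PrimeHub store.
        status, body = fetch_log(log_url, persist=True)
    return status, body
```

With a stubbed fetch_log that returns 404 for the live pod, the helper transparently returns the persisted log instead.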

Prefix in PrimeHub store

  • The prefix of log persistence is /logs

  • The logs of one job are stored under /logs/phjob/<phjob>/<date>/ (e.g., /logs/hub/job-202006030120-gxpavy/2020-06-03/log-*.txt)

Limitation

The default flush interval of fluentd is 1 hour, so the persisted log may lag the live log by up to 1 hour. The flush interval can be shortened in the configuration; however, a shorter interval generates more files in the storage and leads to more query overhead.
