# Log Persistence
Allows users to persist the job submission logs. By default, the job log is retrieved from the underlying pod; once the pod is deleted, the log is no longer accessible to the user.
## Prerequisites
- The PrimeHub Store feature must be enabled.
## Features
- The job log is still accessible even after the underlying pod is deleted.
- Supports storing logs on S3 or GCS.
- Supports a configurable flush interval and maximum buffer size.
- Supports `txt` and `gzip` formats.
## Configuration

To enable log persistence, set `store.enabled` and `store.logPersistence.enabled` to `true`.
| Key | Description | Default |
| --- | --- | --- |
| `store.enabled` | Whether the PrimeHub store is enabled | `false` |
| `store.logPersistence.enabled` | Whether log persistence is enabled | `true` |
| `fluentd.flushAtShutdown` | Whether to flush the buffer when fluentd shuts down | `false` |
| `fluentd.flushInterval` | How often buffered logs are flushed to the store | `3600s` |
| `fluentd.chunkLimitSize` | Maximum size of each buffer chunk | `"256m"` |
| `fluentd.storeAs` | Stored file format (`txt` or `gzip`) | `txt` |
| `fluentd.*` | The other fluentd settings | |
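
For reference, a minimal Helm values sketch that enables log persistence and tunes the fluentd settings above; the nesting of the keys is assumed from the table and may differ across chart versions:

```yaml
# values.yaml (sketch; key nesting assumed from the table above)
store:
  enabled: true            # PrimeHub store is a prerequisite
  logPersistence:
    enabled: true          # persist job logs to the store

fluentd:
  flushAtShutdown: false   # do not force a flush when fluentd shuts down
  flushInterval: 3600s     # how often buffered logs are flushed
  chunkLimitSize: "256m"   # flush early once a chunk reaches this size
  storeAs: txt             # stored file format: txt or gzip
```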
## Design
- Fluentd: the log collector that ships pod logs to the PrimeHub store
- GraphQL server: the log endpoint that retrieves the log from the PrimeHub store when the pod no longer exists
- Console: gets the log from the GraphQL server
### Fluentd
Fluentd is based on the fluentd kubernetes daemonset. The behavior is:

- Get the logs from `/var/log/containers`
- Get the pod metadata from the Kubernetes API
- Filter the logs by label
- Flush the logs to minio with the s3 plugin (see the sketch below)
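
A minimal fluentd configuration sketch of this pipeline; the bucket, endpoint, credentials, and label key are assumptions for illustration, not PrimeHub's actual values:

```
# fluent.conf (sketch)

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

# Enrich each record with pod metadata from the Kubernetes API
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

# Keep only records from pods carrying a job label (key is an assumption)
<filter kubernetes.**>
  @type grep
  <regexp>
    key $.kubernetes.labels.phjob
    pattern /.+/
  </regexp>
</filter>

# Flush buffered logs to the PrimeHub store (minio) with the s3 plugin
<match kubernetes.**>
  @type s3
  aws_key_id MINIO_ACCESS_KEY             # assumption
  aws_sec_key MINIO_SECRET_KEY            # assumption
  s3_endpoint http://primehub-minio:9000  # assumption: in-cluster minio
  s3_bucket primehub                      # assumption
  force_path_style true
  store_as text                           # produces .txt objects; gzip also works
  path logs/phjob/
  time_slice_format %Y-%m-%d
  <buffer time>
    @type file
    path /var/log/fluentd-buffers/s3.buffer
    timekey 3600            # hourly flush (fluentd.flushInterval: 3600s)
    timekey_use_utc true
    chunk_limit_size 256m   # fluentd.chunkLimitSize
    flush_at_shutdown false # fluentd.flushAtShutdown
  </buffer>
</match>
```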
### GraphQL
- Enhance the original log endpoint.
- Add a new query parameter `persist=true`. If it is set to `true`, the log is retrieved from the persistent log (see the sketch below).
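
A minimal sketch of the endpoint's dispatch logic in TypeScript; the route path and helper names are hypothetical, not PrimeHub's actual API:

```typescript
import express from "express";

// Hypothetical helpers: resolve the live log from the pod, or the
// persisted log objects from the PrimeHub store. Names are illustrative.
async function getPodLog(jobId: string): Promise<string | null> {
  return null; // ... read the pod log via the Kubernetes API
}
async function getPersistentLog(jobId: string): Promise<string | null> {
  return null; // ... read the flushed log files from the store
}

const app = express();

app.get("/logs/:jobId", async (req, res) => {
  // persist=true switches the source from the live pod to the store
  const persist = req.query.persist === "true";
  const log = persist
    ? await getPersistentLog(req.params.jobId)
    : await getPodLog(req.params.jobId);
  if (log === null) {
    res.status(404).send("log not found"); // pod gone, or nothing persisted yet
    return;
  }
  res.type("text/plain").send(log);
});

app.listen(3000);
```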
### Console
The log UI first tries to get the log from the pod (`persist=false`). If the response has code 404, it falls back to fetching the persistent log with `persist=true`, as sketched below.
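
In TypeScript, the fallback might look like this (the endpoint path is illustrative):

```typescript
// Try the live pod log first, then fall back to the persisted log on 404.
async function fetchJobLog(jobId: string): Promise<string> {
  const live = await fetch(`/api/logs/${jobId}?persist=false`);
  if (live.ok) return live.text();
  if (live.status === 404) {
    const persisted = await fetch(`/api/logs/${jobId}?persist=true`);
    if (persisted.ok) return persisted.text();
  }
  throw new Error(`unable to fetch log for job ${jobId}`);
}
```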
### Prefix in PrimeHub store
The prefix of log persistence is `/logs`. The output of one job is `/logs/phjob/<phjob>/<date>/` (e.g. `/logs/phjob/job-202006030120-gxpavy/2020-06-03/log-*.txt`).
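
A tiny sketch of how a client could derive that prefix for one job and date:

```typescript
// Build the store prefix for one job's logs on a given date,
// following the /logs/phjob/<phjob>/<date>/ layout above.
function jobLogPrefix(phjob: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `/logs/phjob/${phjob}/${day}/`;
}

// jobLogPrefix("job-202006030120-gxpavy", new Date("2020-06-03"))
//   => "/logs/phjob/job-202006030120-gxpavy/2020-06-03/"
```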
## Limitation
The default flush interval of fluentd is 1 hour, so the persistent log may lag the live log by up to 1 hour. The flush interval can be shortened in the configuration; however, a shorter interval generates more files in the storage and leads to more query overhead.