PrimeHub Apps
Allows third-party applications to be integrated into the PrimeHub platform.
Features
Shared domain: The installed application can be accessed under a sub-path of PrimeHub's domain, so no additional domain is needed for the application.
Authorization: Access to an application can be restricted to group members, to PrimeHub logged-in users, or opened to public users.
Data Persistence: Applications can persist data in the group volume and access data in other persistent storage, such as data volumes and PHFS.
Resource constraint: The CPU, memory, and GPU quota limits of a group are enforced.
Concepts
Application
We introduce a new concept in PrimeHub: the Application. An application is an instance of an integrated third-party application (e.g. MLflow). It is installed as a group resource, and a group can install multiple instances of the same kind of application.
Application Template
An application template describes how an application is installed. It contains the following information:
podTemplate: used to create the deployment of the application
Service Ports: used to create the service
HTTP Port: The HTTP port if the application has a web interface
defaultEnvs: The default environment variables used when creating the application. When an application is created, the values are put in the environment variables of the target application. Each entry has:
ENV Name
Description
Default value
Optional
Create an Application
Users can create an application from an application template.
Select an application template
Fill in the default envs provided by the template
Select the instance type
Choose the scope (Public / PrimeHub users only / Group members only). The scope only affects the web interface of the application.
Create
Preset Environment Variables
The preset environment variables can be used in the value field of the environment variables. Here is the list of preset environment variables:
PRIMEHUB_APP_ID: The PhApplication k8s resource name, <app-id>
PRIMEHUB_APP_ROOT: The root of persistent storage for the application. It is <group-volume>/phapplications/<app-id> if a group volume is available, or /phapplications/<app-id> if no group volume is available
PRIMEHUB_APP_BASE_URL: The URL prefix for the application, /console/apps/<app-id>
PRIMEHUB_URL: The external URL of PrimeHub
PRIMEHUB_GROUP: The group name
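As a rough illustration of the list above, the following Python sketch computes the preset values for one application and substitutes them into an env value. The function names, the example group volume path, and the use of the Kubernetes-style $(NAME) reference syntax are assumptions made for this sketch; in a real deployment the substitution would normally be left to Kubernetes itself.

```python
from typing import Dict, Optional


def preset_env(app_id: str, group: str, primehub_url: str,
               group_volume: Optional[str] = None) -> Dict[str, str]:
    """Build the preset variables for one application (hypothetical helper)."""
    # With a group volume, the app root lives inside that volume; without
    # one, it is rooted directly at /phapplications/<app-id>.
    prefix = group_volume.rstrip("/") if group_volume else ""
    return {
        "PRIMEHUB_APP_ID": app_id,
        "PRIMEHUB_APP_ROOT": f"{prefix}/phapplications/{app_id}",
        "PRIMEHUB_APP_BASE_URL": f"/console/apps/{app_id}",
        "PRIMEHUB_URL": primehub_url,
        "PRIMEHUB_GROUP": group,
    }


def expand(value: str, presets: Dict[str, str]) -> str:
    """Replace every $(NAME) reference to a preset variable in value."""
    for name, preset in presets.items():
        value = value.replace(f"$({name})", preset)
    return value
```

For example, with a group volume assumed to be mounted at /project/my-group, a value such as $(PRIMEHUB_APP_ROOT)/mlruns would resolve to a directory under /project/my-group/phapplications/<app-id>.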
Connect to Application
There are two ways to connect to the application:
Connect to a web interface from a sub-path of PrimeHub: Most applications are web-based. Users can access this kind of application from the browser at https://<primehub>/console/apps/<app-id>.
Connect to a TCP endpoint by host name and port: Some applications provide a non-HTTP service. It can be accessed through the service endpoint <my-app-svc>:<my-app-port>. The endpoint can only be accessed from inside the PrimeHub cluster (for example, from notebooks and jobs).
Application Management
Users can start/stop an application
Users can get the basic information of the application
Users can get the application log
Users can delete an application
Implementation
Principles
Introduce a new CRD, PhApplication, that represents the app to install. A deployment and a service are derived from it for this app.
We use PhAppTemplate to create a PhApplication. However, a PhApplication can work standalone without a PhAppTemplate. The controller of PhApplication should not know about PhAppTemplate.
GraphQL uses PhAppTemplate to create the PhApplication. The template and the template data are stored in the PhApplication's annotations for use in later updates.
CRDs
PhApplication
annotations:
phapplication.primehub.io/template: the template content used to create this PhApplication
phapplication.primehub.io/template-data: the template data used to create this PhApplication. It is the POST data of the GraphQL PhApplication create call.
spec:
podTemplate: the template of pods
svcTemplate: the template of service
scope: "group", "primehub", "public"
httpPort: the backend service port that the proxy should forward to
status:
phase: "Starting", "Ready", "Updating", "Stopping", "Stopped", "Error"
message: human readable message
serviceName: name of the service (used for the GraphQL serviceName field)
PhAppTemplate
spec:
version: the version string of the template
description: free-form description of this application
docLink: the documentation URL
defaultEnvs: used to create the additional envs
name: the name of the environment variable
description: description of the variable
defaultValue: the default value of the variable
optional: whether the environment variable is optional
template: the content of the PhApplication. See PhApplication.
NOTE:
We copy the content of PhAppTemplate into PhApplication, instead of referencing the template by name, because we want to decouple the created app from the template.
The phapplication.primehub.io/template annotation is for GraphQL's use. We keep the template so that it is available later when updating the app.
Control Plane
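GraphQL creates a PhApplication resource from a PhAppTemplate.
The controller of PhApplication reconciles the PhApplication. The hierarchy is the PhApplication at the top, with the Deployment, Service, and NetworkPolicy described in the Controller section derived from it.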
Create
Console gets the template list from GraphQL
Console selects one template and lists its defaultEnvs as the variables in the UI
Console calls GraphQL to create the PhApplication
Get the PhAppTemplate content
Append the env variables to the end of the container's env
Set the scope
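The GraphQL side of the create flow can be sketched roughly as follows. This is a minimal Python illustration, not the actual resolver: it assumes the template embeds the PhApplication spec under spec.template.spec, that the annotations hold JSON, and that the POST data has the hypothetical fields id, scope, and env.

```python
import copy
import json
from typing import Any, Dict

TEMPLATE_KEY = "phapplication.primehub.io/template"
TEMPLATE_DATA_KEY = "phapplication.primehub.io/template-data"


def create_phapplication(template: Dict[str, Any],
                         post_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build a PhApplication manifest from a PhAppTemplate and user input."""
    # Assumed layout: the PhApplication spec is embedded in the template.
    # Deep-copy it so the created app is decoupled from the template.
    spec = copy.deepcopy(template["spec"]["template"]["spec"])

    # Append the user's env variables to the end of the container's env.
    container = spec["podTemplate"]["spec"]["containers"][0]
    container.setdefault("env", []).extend(copy.deepcopy(post_data["env"]))

    # Set the scope chosen by the user.
    spec["scope"] = post_data["scope"]

    return {
        "kind": "PhApplication",
        "metadata": {
            "name": post_data["id"],
            # Keep the template and the POST data for later updates.
            "annotations": {
                TEMPLATE_KEY: json.dumps(template),
                TEMPLATE_DATA_KEY: json.dumps(post_data),
            },
        },
        "spec": spec,
    }
```

Because the template content is copied rather than referenced, later changes to the PhAppTemplate do not affect an application that was already created.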
Update
Console gets the PhApplication from GraphQL.
GraphQL returns the PhApplication user data (phapplication.primehub.io/template-data) and the PhApplication default envs from phapplication.primehub.io/template
Console can reset the variables
Console can add/remove/update the env vars
Console calls update on GraphQL
GraphQL gets the current PhApplication and modifies the container env, scope, and instance type.
GraphQL cannot change the appId or the template
Controller
Deployment
name: app-<app-id>
Use spec.podTemplate.spec
Volumes
Add group, data volumes
Add empty dir if no group volume available
Init Container
run as root
mkdir -p $(PRIMEHUB_APP_ROOT)
Container
Keep only the first container
Set resources from instanceType
Prepend (not append) the PrimeHub required envs: PRIMEHUB_APP_ID, PRIMEHUB_APP_ROOT, and PRIMEHUB_APP_BASE_URL
Mount group, data volumes
Mount empty dir if no group volume available (/phapplications/<app-id>)
The created pod should have the labels app=primehub-app, primehub.io/phapplication=<appid>, and primehub.io/group=<escaped group>
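The container and label rules above can be sketched in Python as below. This is only an illustration of the derivation, with volumes and the init container left out for brevity; the function name and its parameters are hypothetical, and the resources and the escaped group name are assumed to be resolved by the caller. One likely reason for prepending rather than appending is that Kubernetes only expands a $(VAR) reference in an env value when the referenced variable is defined earlier in the same env list.

```python
import copy
from typing import Any, Dict, List


def build_deployment(app: Dict[str, Any],
                     preset_envs: List[Dict[str, str]],
                     resources: Dict[str, Any],
                     escaped_group: str) -> Dict[str, Any]:
    """Derive a Deployment manifest from a PhApplication (sketch only)."""
    app_id = app["metadata"]["name"]
    pod_spec = copy.deepcopy(app["spec"]["podTemplate"]["spec"])

    # Keep only the first container.
    container = pod_spec["containers"][0]
    pod_spec["containers"] = [container]

    # Resources come from the instance type (resolved by the caller here).
    container["resources"] = resources

    # Prepend the PrimeHub envs so user-defined values can reference them.
    container["env"] = copy.deepcopy(preset_envs) + container.get("env", [])

    labels = {
        "app": "primehub-app",
        "primehub.io/phapplication": app_id,
        "primehub.io/group": escaped_group,
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"app-{app_id}"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"primehub.io/phapplication": app_id}},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }
```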
Service
name: app-<app-id>
Use spec.svcTemplate.spec
NetworkPolicy
Allows ingress traffic from:
pods with the label primehub.io/group=<escaped group>
primehub-console (for proxy)
status.Phase
The phase of PhApplication
Starting: App is starting; there is no ready pod and the service is not yet available.
deployment.status.readyReplicas==0
Ready: App is ready to use.
deployment.status.readyReplicas==1 and deployment.status.replicas==1
Updating: App is updating; the old pod is still ready to use, but the new version of the app is starting.
deployment.status.readyReplicas==1 and deployment.status.replicas>1
Stopping: App is stopping; the pod is terminating but its resources have not been freed.
spec.stop=true and deployment.status.replicas>0
Stopped: App is stopped and the pod is deleted. No resource is used.
spec.stop=true and deployment.status.replicas==0
We can distinguish Starting and Updating by the deployment status.
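The phase rules above can be written as a small decision function. The Python sketch below assumes that spec.stop is checked first, since a stopped deployment would otherwise also match the Starting condition. The Error phase is not covered because this document does not say when it is set.

```python
def compute_phase(stop: bool, replicas: int, ready_replicas: int) -> str:
    """Map spec.stop and the deployment status to a PhApplication phase."""
    if stop:
        # Pods may still be terminating after a stop was requested.
        return "Stopping" if replicas > 0 else "Stopped"
    if ready_replicas == 0:
        return "Starting"
    if replicas > 1:
        # The old pod is still ready while the new one is coming up.
        return "Updating"
    return "Ready"
```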
Data Plane
Http Proxy
The app path is under https://<primehub>/console/apps/<app-id>. We validate whether the user can access the app by a server-side session. The first solution we came up with was to validate the traffic by access token. The flow is as follows:
Get the app information of <app-id> in the path
If the app has the scope group only, check whether the owner of the access token has permission for the group
If yes, accept the request and proxy it to the upstream service
Performance issue:
The access token expires in about 5 minutes, which is too short to cache.
To refresh the access token, we need to call the Keycloak token endpoint for a new access token.
If the refresh token is expired, we need to go through the OIDC process.
The cache miss rate would be high because we keep changing the access token.
The solution is to implement a "session" concept:
For a new connection to the app, authorize the request by the access token. If the traffic is accepted, create a session.
When the session is created, the console sets a cookie with the key phapplication-session-id under the path /console/apps/<app>, expiring in 30 minutes, and maintains the session cache on the server side.
If the request contains the session cookie and the session is found in the session cache, allow the request through to the backend and extend the expiration time to 30 minutes.
If the session id is not found on the server, authorize the request as in step 1.
Performance issue:
Because the session can be easily extended, there would be much fewer Keycloak token endpoint requests.
The cache miss rate would be low because a session id only expires after it has been unused for 30 minutes.
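A minimal in-memory version of the session idea is sketched below in Python. It is an illustration under assumptions, not the console's implementation: the class name, the authorize callback (standing in for the access-token check), and the injectable clock are all hypothetical. Only the 30-minute sliding expiration comes from the description above.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 30 * 60  # sessions expire after 30 idle minutes


class AppSessionCache:
    """Server-side session cache keyed by the session cookie value."""

    def __init__(self, authorize: Callable[[str, str], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._authorize = authorize  # (access_token, app_id) -> allowed?
        self._clock = clock
        self._sessions: Dict[str, Tuple[str, float]] = {}

    def check(self, app_id: str, session_id: Optional[str],
              access_token: Optional[str]) -> Optional[str]:
        """Return the session id to set in the cookie, or None to reject."""
        now = self._clock()
        entry = self._sessions.get(session_id) if session_id else None
        if entry is not None:
            cached_app, expires_at = entry
            if expires_at <= now:
                del self._sessions[session_id]  # drop the expired session
            elif cached_app == app_id:
                # Cache hit: extend the expiration and skip the token check.
                self._sessions[session_id] = (app_id, now + SESSION_TTL_SECONDS)
                return session_id
        # Cache miss: fall back to authorizing by access token (step 1).
        if access_token and self._authorize(access_token, app_id):
            new_id = secrets.token_urlsafe(32)
            self._sessions[new_id] = (app_id, now + SESSION_TTL_SECONDS)
            return new_id
        return None
```

In this sketch the token check runs only when no valid session exists, so a client that keeps using the app within each 30-minute window triggers the check once.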
Log Traffic
Log API
Add a new endpoint (generic log endpoints authorized by group label): /api/logs/pods/<pod>
GraphQL gets the pod and checks whether it has the primehub.io/group label, then unescapes the group. If the label is not found, the request is rejected.
Check that the user of the token is a member of the app's group
Get the pod log from the k8s API
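The authorization steps of the log endpoint can be sketched as a single check. In the Python illustration below, the function name is hypothetical and the unescape function is passed in as a parameter, because the group name escaping scheme is not described in this document.

```python
from typing import Callable, Dict, Iterable

GROUP_LABEL = "primehub.io/group"


def can_read_pod_log(pod_labels: Dict[str, str],
                     user_groups: Iterable[str],
                     unescape: Callable[[str], str]) -> bool:
    """Decide whether a user may read the log of a pod (sketch only)."""
    escaped_group = pod_labels.get(GROUP_LABEL)
    if escaped_group is None:
        # Pods without a group label are never served by this endpoint.
        return False
    # The user must be a member of the group that owns the pod.
    return unescape(escaped_group) in set(user_groups)
```

Only when this check passes would the pod log be fetched from the k8s API.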
Console & GraphQL
Get the pod list from the GraphQL API (reference PhDeployment)
GraphQL gets the pods by the label primehub.io/phapplication=<appid>