Volume Upload

Provides an upload server for uploading data to PV-type volumes.

Configuration

Prerequisite

PRIMEHUB_FEATURE_USER_PORTAL must be set to true, and PRIMEHUB_DOMAIN must be set.
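A minimal sketch of a pre-install check for these prerequisites. It assumes the .env file uses plain KEY=VALUE lines without shell expansion; parse_env and missing_prerequisites are hypothetical helpers, not part of PrimeHub.

```python
from __future__ import annotations


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no shell expansion)."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip optional surrounding quotes from the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_prerequisites(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the prerequisites hold."""
    problems = []
    if env.get("PRIMEHUB_FEATURE_USER_PORTAL", "").lower() != "true":
        problems.append("PRIMEHUB_FEATURE_USER_PORTAL must be true")
    if not env.get("PRIMEHUB_DOMAIN"):
        problems.append("PRIMEHUB_DOMAIN must be set")
    return problems
```

For example, parsing a file containing PRIMEHUB_DOMAIN=hub.example.com and PRIMEHUB_FEATURE_USER_PORTAL=true yields an empty problem list (hub.example.com is a placeholder domain).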

Settings

Please add these variables to the .env file

Install

make release-install-primehub

Migration

Set the PRIMEHUB_STORAGE_CLASS environment variable to the correct storage class.

Troubleshooting

  • Check Primehub Console Containers' Environment Variables

These environment variables should be added automatically. PRIMEHUB_FEATURE_DATASET_UPLOAD is added to the graphql and ui containers when PRIMEHUB_FEATURE_DATASET_UPLOAD is set to true in your cluster's .env file.

CMS_APP_PREFIX is added to the graphql container.

PRIMEHUB_GROUP_SC is added to the graphql container. Its value comes from this setting:

groupvolume:
  storageClass: {value}

If this value is not specified in the YAML, the PRIMEHUB_STORAGE_CLASS environment variable is used instead.
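The fallback rule for PRIMEHUB_GROUP_SC can be sketched as follows. This is a hypothetical illustration of the precedence described above, assuming the chart values are available as a plain dict and that an empty storageClass counts as unspecified; it is not the chart's actual template logic.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def resolve_group_storage_class(
    values: dict, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Pick the value that would end up in PRIMEHUB_GROUP_SC.

    Prefer groupvolume.storageClass from the YAML values; fall back to the
    PRIMEHUB_STORAGE_CLASS environment variable when it is missing or empty.
    """
    env = os.environ if env is None else env
    configured = (values.get("groupvolume") or {}).get("storageClass")
    if configured:
        return configured
    return env.get("PRIMEHUB_STORAGE_CLASS")
```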

  • Check Issuer

If you are using the letsencrypt-prod-dns issuer, the volume upload ingress annotations should contain:

certmanager.k8s.io/acme-challenge-type: "dns01"
certmanager.k8s.io/acme-dns01-provider: "clouddns"
certmanager.k8s.io/cluster-issuer: "letsencrypt-prod-dns"
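A small sketch for comparing an ingress against the three annotations above. It assumes you have already loaded the ingress's metadata.annotations into a dict (for example from `kubectl get ingress <ingress name> -o json`); the helper name is hypothetical.

```python
from __future__ import annotations

# The annotations listed above for the letsencrypt-prod-dns issuer.
REQUIRED_DNS01_ANNOTATIONS = {
    "certmanager.k8s.io/acme-challenge-type": "dns01",
    "certmanager.k8s.io/acme-dns01-provider": "clouddns",
    "certmanager.k8s.io/cluster-issuer": "letsencrypt-prod-dns",
}


def missing_issuer_annotations(annotations: dict[str, str]) -> dict[str, str]:
    """Return the required annotations that are absent or set to a different value."""
    return {
        key: expected
        for key, expected in REQUIRED_DNS01_ANNOTATIONS.items()
        if annotations.get(key) != expected
    }
```

An empty result means the ingress carries all three annotations with the expected values.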

Design

We use the tus protocol for resumable file uploads. The backend is tusd and the frontend package is uppy. To let users view and edit uploaded files, the deployment also runs a Flask server that lists files with Flask-AutoIndex. The volume upload deployment therefore contains two containers, and both mount the PV volume.
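To illustrate what the tus protocol involves, here is a sketch of the headers a client sends when creating an upload (tus 1.0.0 creation extension). Upload-Metadata is a comma-separated list of "key base64(value)" pairs; the "filename" key used in the example is an assumption about what the uppy frontend sends, not something this page confirms.

```python
from __future__ import annotations

import base64


def tus_creation_headers(size: int, metadata: dict[str, str]) -> dict[str, str]:
    """Build headers for a tus 1.0.0 creation request (POST to the upload endpoint)."""
    pairs = []
    for key, value in metadata.items():
        # tus metadata keys must be non-empty and contain no spaces or commas.
        if not key or " " in key or "," in key:
            raise ValueError(f"invalid metadata key: {key!r}")
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        pairs.append(f"{key} {encoded}")
    return {
        "Tus-Resumable": "1.0.0",
        "Upload-Length": str(size),
        "Upload-Metadata": ",".join(pairs),
    }
```

For example, `tus_creation_headers(5, {"filename": "a.txt"})` sets Upload-Metadata to `filename YS50eHQ=`. The server answers with a Location header naming the upload, which later PATCH requests target.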

Metacontroller is used to create the required resources automatically based on these settings.

The application code lives under modules/primehub-dataset-upload. The Kubernetes and Metacontroller-related code lives under modules/charts/primehub.

Start/Stop Volume Upload Server

When a volume has the annotation dataset.primehub.io/uploadServer: "true", a volume upload server is started for it.

Otherwise, the server is stopped.
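The start/stop rule reduces to a single annotation lookup. This sketch assumes only the exact string "true" enables the server (Kubernetes annotation values are always strings, which is why the value is quoted); the real controller may be more lenient.

```python
from __future__ import annotations

from typing import Optional

UPLOAD_SERVER_ANNOTATION = "dataset.primehub.io/uploadServer"


def upload_server_enabled(annotations: Optional[dict[str, str]]) -> bool:
    """True only when the annotation is present with the string value "true"."""
    return (annotations or {}).get(UPLOAD_SERVER_ANNOTATION) == "true"
```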

Currently, the volume upload URL is https://<primehub domain>/admin/dataset/<namespace>/<dataset name>/browse/.
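A sketch of building that URL from its three parts. The helper is hypothetical; percent-encoding each segment is a defensive choice, since Kubernetes namespace and resource names are normally URL-safe already.

```python
from urllib.parse import quote


def volume_browse_url(domain: str, namespace: str, dataset: str) -> str:
    """Build https://<domain>/admin/dataset/<namespace>/<dataset>/browse/."""
    return (
        f"https://{domain}/admin/dataset/"
        f"{quote(namespace, safe='')}/{quote(dataset, safe='')}/browse/"
    )
```

For example, `volume_browse_url("hub.example.com", "hub", "mnist")` returns `https://hub.example.com/admin/dataset/hub/mnist/browse/` (placeholder domain and names).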

Enable HTTP Auth for the Volume Upload Server

First, create a secret from an htpasswd file. For example:

htpasswd -c auth <name>
kubectl -n hub create secret generic dataset-upload-<name> --from-file=auth

Then add the annotation dataset.primehub.io/uploadServerAuthSecretName: dataset-upload-<name> to the volume to enable HTTP auth.

The username is <name>.
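The two pieces a script would need can be sketched as below: a Kubernetes merge-patch body that adds the auth annotation, and the HTTP Basic header a client sends once auth is on (RFC 7617). Both helpers are hypothetical, and how you apply the patch to the volume resource is left open here.

```python
from __future__ import annotations

import base64

AUTH_SECRET_ANNOTATION = "dataset.primehub.io/uploadServerAuthSecretName"


def auth_annotation_patch(name: str) -> dict:
    """Merge-patch body pointing the volume at the dataset-upload-<name> secret."""
    return {"metadata": {"annotations": {AUTH_SECRET_ANNOTATION: f"dataset-upload-{name}"}}}


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """HTTP Basic credentials for the htpasswd user; the username is <name>."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The password is whatever you entered when running htpasswd.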

Current Post-Finish Hook in Tus Server (Tusd)

  • Create the target directory if needed

  • Rename the .bin file to its real file name

  • Remove the .info file generated by tusd

  • If it is a zip file, unzip it
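The hook steps above can be sketched in Python as follows. This is not the actual hook script: it assumes tusd's file store left <upload_id>.bin and <upload_id>.info in upload_dir, that the real filename was taken from the upload metadata, that only files named *.zip are extracted, and that the archive itself is kept after extraction.

```python
from __future__ import annotations

import os
import shutil
import zipfile


def post_finish(upload_dir: str, upload_id: str, dest_dir: str, filename: str) -> str:
    """Hypothetical sketch of the post-finish steps; returns the final file path."""
    # 1. Create the target directory if needed.
    os.makedirs(dest_dir, exist_ok=True)
    # 2. Rename the .bin file to its real file name (basename blocks path tricks).
    target = os.path.join(dest_dir, os.path.basename(filename))
    shutil.move(os.path.join(upload_dir, upload_id + ".bin"), target)
    # 3. Remove the .info file generated by tusd.
    info = os.path.join(upload_dir, upload_id + ".info")
    if os.path.exists(info):
        os.remove(info)
    # 4. If it is a zip file, unzip it next to the archive. The extension check
    #    avoids unpacking other zip-based formats such as .docx or .jar.
    if target.lower().endswith(".zip") and zipfile.is_zipfile(target):
        with zipfile.ZipFile(target) as archive:
            # extractall strips absolute paths and ".." components from member names.
            archive.extractall(dest_dir)
    return target
```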

Other Notes

  • The CLI's resume ability currently handles only unreliable network conditions. It does not handle the case where the user cancels an upload job, and an upload cannot be resumed interchangeably between the web UI and the CLI (https://github.com/tus/tus-js-client/issues/62).

  • Mechanism to clean up temporary state files.

CLI

  • Download the client from https://github.com/avvertix/tus-client-cli/releases/tag/v0.3.0

  • ./tus-client-macos upload <filepath> https://<primehub domain>/admin/dataset/<name>/upload/files/
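Resuming after a network failure works at the protocol level roughly as sketched below: ask the server how many bytes it already holds (HEAD returns Upload-Offset), then PATCH the rest from that offset. This is a sketch of the tus 1.0.0 core flow, not the tus-client-cli implementation. upload_url is assumed to be the per-upload URL from the creation response's Location header, and the optional auth dict could come from an HTTP Basic header if auth is enabled. Reading the remainder into memory is a simplification.

```python
from __future__ import annotations

import urllib.request
from typing import Optional


def resume_patch_headers(offset: int) -> dict[str, str]:
    """Headers for a tus PATCH that continues an upload from `offset`."""
    return {
        "Tus-Resumable": "1.0.0",
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }


def resume_upload(upload_url: str, path: str, auth: Optional[dict[str, str]] = None) -> None:
    """Query the server's offset with HEAD, then send the remaining bytes with PATCH."""
    base = {"Tus-Resumable": "1.0.0", **(auth or {})}
    head = urllib.request.Request(upload_url, method="HEAD", headers=base)
    with urllib.request.urlopen(head) as resp:
        offset = int(resp.headers["Upload-Offset"])
    with open(path, "rb") as f:
        f.seek(offset)  # skip the bytes the server already stored
        remaining = f.read()
    patch = urllib.request.Request(
        upload_url,
        data=remaining,
        method="PATCH",
        headers={**base, **resume_patch_headers(offset)},
    )
    with urllib.request.urlopen(patch) as resp:
        # tus servers answer a successful PATCH with 204 No Content.
        if resp.status != 204:
            raise RuntimeError(f"unexpected status {resp.status}")
```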
