Datashim

Our Framework introduces the Dataset CRD which is a pointer to existing S3 and NFS data sources. It includes the necessary logic to map these Datasets into Persistent Volume Claims and ConfigMaps which users can reference in their pods, letting them focus on the workload development and not on configuring/mounting/tuning the data access. Thanks to Container Storage Interface it is extensible to support additional data sources in the future.

A Kubernetes Framework to provide easy access to S3 and NFS Datasets within pods. Orchestrates the provisioning of Persistent Volume Claims and ConfigMaps needed for each Dataset. Find more details in our FAQ

Quickstart

In order to quickly deploy DLF, based on your environment execute one of the following commands:

Kubernetes/Minikube/kind

kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf.yaml

Kubernetes on IBM Cloud

kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf-ibm-k8s.yaml

Openshift

kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf-oc.yaml

Openshift on IBM Cloud

kubectl apply -f https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf-ibm-oc.yaml

Wait for all the pods to be ready :)

kubectl wait --for=condition=ready pods -l app.kubernetes.io/name=dlf -n dlf

As an optional step, label the namespace(or namespaces) you want in order have the pods labelling functionality (see below).

kubectl label namespace default monitor-pods-datasets=enabled

In case don't have an existing S3 Bucket follow our wiki to deploy an Object Store and populate it with data.

We will create now a Dataset named example-dataset pointing to your S3 bucket.

cat <<EOF | kubectl apply -f -
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "{AWS_ACCESS_KEY_ID}"
    secretAccessKey: "{AWS_SECRET_ACCESS_KEY}"
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"
    readonly: "true" #OPTIONAL, default is false  
    region: "" #OPTIONAL
EOF

If everything worked okay, you should see a PVC and a ConfigMap named example-dataset which you can mount in your pods. As an easier way to use the Dataset in your pod, you can instead label the pod as follows:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    dataset.0.id: "example-dataset"
    dataset.0.useas: "mount"
spec:
  containers:
    - name: nginx
      image: nginx

As a convention the Dataset will be mounted in /mnt/datasets/example-dataset. If instead you wish to pass the connection details as environment variables, change the useas line to dataset.0.useas: "configmap"

Note: We recommend using secrets to pass your S3/Object Storage Service credentials to Datashim, as shown in this example.

Feel free to explore our other examples

Questions

The wiki and Frequently Asked Questions documents are a bit out of date. We recommend browsing the issues for previously answered questions. Please open an issue if you are not able to find the answers to your questions, or if you have discovered a bug.

Contributing

We welcome all contributions to Datashim. Please read this document for setting up a Git workflow for contributing to Datashim. This project uses DCO (Developer Certificate of Origin) to certify code ownership and contribution rights.

If you use VSCode, then we have recommendations for setting it up for development.

If you have an idea for a feature request, please open an issue. Let us know in the issue description the problem or the pain point, and how the proposed feature would help solve it. If you are looking to contribute but you don't know where to start, we recommend looking at the open issues first. Thanks!

Name		Name	Last commit message	Last commit date
Latest commit History 370 Commits
.github/workflows		.github/workflows
build-tools		build-tools
chart		chart
ci		ci
doc		doc
docs		docs
examples		examples
plugins/ceph-cache-plugin		plugins/ceph-cache-plugin
release-tools		release-tools
src		src
.gitignore		.gitignore
.gitmodules		.gitmodules
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
mkdocs.yml		mkdocs.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Datashim

Quickstart

Questions

Contributing

About

Releases

Packages

Languages

License

christian-pinto/datashim

Folders and files

Latest commit

History

Repository files navigation

Datashim

Quickstart

Questions

Contributing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages