diff --git a/docs/awards-recognition.md b/docs/awards-recognition.md new file mode 100644 index 0000000..1cc3d6a --- /dev/null +++ b/docs/awards-recognition.md @@ -0,0 +1,22 @@ +# Awards and Recognition +### 2023 +#### SK TECH SUMMIT 2023 IMAGE를 제대로 이해하는 ‘AI’는 없나요? (Vision-Language Model을 활용한 SKT만의 Vision Data Asset 구축기) +- Speaker: Sungho Kim, Jiwon Ryu(SK Telecom) + + +### 2022 +#### NVIDIA GTC 22 Vision data warehouse and accelerating the analytics for massive vision data +- Speaker: Sungho Kim(SK Telecom), Allen Xu(NVIDIA) + + +### 2021 +#### NVIDIA GTC 21 Deep-Learning Data-Pipeline Optimization for Network Data Analysis in SK Telecom by Employing Spark Rapids for Custom Data Source +- Speaker: Dooyoung Hwang(SK Telecom), Thomas Graves(NVIDIA) + +### 2020 +#### Spark AI Summit 2020 Vectorized Deep Learning Acceleration from Preprocessing to Inference and Training on Apache Spark in SK Telecom +- Speaker: Hongchan Roh(SK Telecom) + +### 2019 +#### Spark AI Summit Europe 2019 Spark AI Usecase in Telco: Network Quality Analysis and Prediction with Geospatial Visualization +- Speaker: Hongchan Roh, Dooyoung Hwang(SK Telecom) \ No newline at end of file diff --git a/docs/build-lightningdb-on-k8s.md b/docs/build-lightningdb-on-k8s.md index 707257d..80a1b82 100644 --- a/docs/build-lightningdb-on-k8s.md +++ b/docs/build-lightningdb-on-k8s.md @@ -1,3 +1,183 @@ -# Build LightningDB(Admin Only) +# Build 'LightningDB' (Admin Only) -todo... \ No newline at end of file +## 1. LightningDB Source Code(Private Repository) +``` +$ git clone https://github.com/mnms/LightningDB +``` + +## 2. Build +### - v1 +- Branch: release/flashbase_v1.4.3 +- Commands: + +``` +$ ./build.sh compile +``` +``` +$ cd nvkvs +$ docker build . -t harbor.k8s.lightningdb/ltdb/nvkvs:v1.4.3 +$ docker push harbor.k8s.lightningdb/ltdb/nvkvs:v1.4.3 +``` + +### - v2 +- Branch: release/flashbase_v2.0.0 +- Commands: +``` +$ ./build.sh compile debug +``` +``` +$ cd nvkvs +$ docker build . 
-t harbor.k8s.lightningdb/ltdb/nvkvs:v2.0.0 +$ docker push harbor.k8s.lightningdb/ltdb/nvkvs:v2.0.0 +``` + +### - v2 CXL-CMS +- Branch: cms-integration +- Prerequisite(install daxctl): + +``` +$ yum install -y kmod-devel rubygem-asciidoctor.noarch iniparser-devel.x86_64 meson.noarch +``` + +``` +// json-c (version: json-c-0.14-20200419) + +$ git clone https://github.com/json-c/json-c.git +$ cd json-c +$ git checkout json-c-0.14-20200419 -b json-c-0.14-20200419 +$ mkdir json-c-build +$ cd json-c-build/ +$ cmake ../ +$ make -j48 +``` + +``` +//Required Min Version: v75 (latest version: v78) + +$ git clone https://github.com/pmem/ndctl && cd ndctl +$ git checkout v75 -b v75 +$ meson setup build; +$ meson compile -C build; +$ meson install -C build; +``` + +- Commands: + +``` +$ ./build.sh compile debug +``` +``` +// daxctl must be installed under the image base path +// Assumes the build output directory is "/root/ldb/LightningDB_v2_cxl/nvkvs/debug/output" +// Move to the directory where the ndctl GitHub repository was built + +$ cd ndctl +$ rm -rf build +$ meson -Drootprefix=/root/ldb/LightningDB_v2_cxl/nvkvs/debug/output -Dlibdir=/root/ldb/LightningDB_v2_cxl/nvkvs/debug/output/lib build -Dprefix=/root/ldb/LightningDB_v2_cxl/nvkvs/debug/output +$ meson compile -C build; +$ meson install -C build; +``` +``` +$ cd nvkvs +$ docker build . -t harbor.k8s.lightningdb/ltdb/nvkvs:v2-cms-integration +$ docker push harbor.k8s.lightningdb/ltdb/nvkvs:v2-cms-integration +``` + +!!! Tip + How to use maximum cores to compile (e.g. max cpu core:56) + + In 'build.sh', use `cmake --build . --target install -- -j56` and `mvn clean install -DskipTests -P $RELEASE_MODE $MAVEN_OPTS -T 56` + + +# Build 'ltdb-http API Server' (Admin Only) + +## 1. ltdb-http Source Code(Private Repository) +``` +$ git clone https://github.com/mnms/ltdb-http +``` + +## 2. 
Build +### - v1 +- Branch: develop +- Commands: + +``` +$ mvn clean package -DskipTests -P release-k8s,dist-k8s,tgz -Dsite-spec=k8s -Dk8s.namespace=metavision +$ cd target-k8s +$ tar xzvf ltdb-http-1.0-k8s-xxx_xxx.tar.gz +$ cd ltdb-http +$ docker build . -t harbor.k8s.lightningdb/ltdb/ltdb-http:develop +$ docker push harbor.k8s.lightningdb/ltdb/ltdb-http:develop +``` + +### - v2 / v2 CXL-CMS +- Branch: develop-v2 +- Commands: +``` +$ mvn clean package -DskipTests -P release-k8s,dist-k8s,tgz -Dsite-spec=k8s -Dk8s.namespace=metavision +$ cd target-k8s +$ tar xzvf ltdb-http-1.0-k8s-xxx_xxx.tar.gz +$ cd ltdb-http +$ docker build . -t harbor.k8s.lightningdb/ltdb/ltdb-http:develop-v2 +$ docker push harbor.k8s.lightningdb/ltdb/ltdb-http:develop-v2 +``` + + +# Build 'Thunderquery API Server' (Admin Only) + +## 1. Thunderquery Source Code(Private Repository) +``` +$ git clone https://github.com/mnms/thunderquery_api +$ git clone https://github.com/mnms/thunderquery-cli +``` + +## 2. Build +- Branch: develop +- Prerequisite(install musl-gcc): + +``` +$ yum install -y kmod-devel rubygem-asciidoctor.noarch iniparser-devel.x86_64 meson.noarch +``` + +``` +$ vi /etc/yum.repos.d/cert-forensics-tools.repo + +[cert-forensics-tools] +name=Cert Forensics Tools Repository +baseurl=https://forensics.cert.org/centos/cert/8/x86_64/ +enabled=1 +gpgcheck=1 +gpgkey=https://forensics.cert.org/forensics.asc + +$ yum clean all +$ yum makecache +$ yum install musl-gcc.x86_64 +``` + +- Register public key to github +``` +$ cat ~/.ssh/id_rsa.pub +``` + +- Command: +``` +$ vi ~/.cargo/config.toml + +[net] +git-fetch-with-cli = true + +$ cd thunderquery_api +$ cargo install --path . --target=x86_64-unknown-linux-musl +$ cd thunderquery-cli +$ cargo install --path . 
--target=x86_64-unknown-linux-musl + +``` +``` +$ cd thunderquery_api + +## Copy the thunderquery-cli binary into the api directory ## +$ cp ../thunderquery-cli/target/x86_64-unknown-linux-musl/release/thunderquery-cli target/x86_64-unknown-linux-musl/release + +$ docker build . -t harbor.k8s.lightningdb/ltdb/thunderquery_api:develop +$ docker push harbor.k8s.lightningdb/ltdb/thunderquery_api:develop +``` \ No newline at end of file diff --git a/docs/deploy-lightningdb-on-k8s.md b/docs/deploy-lightningdb-on-k8s.md index 206ca25..0309aa2 100644 --- a/docs/deploy-lightningdb-on-k8s.md +++ b/docs/deploy-lightningdb-on-k8s.md @@ -1,3 +1,140 @@ -# Deploy LightningDB +# Deploy LightningDB and API Server -todo ... \ No newline at end of file +## 1. Kubernetes manifest github +``` +$ git clone https://github.com/mnms/metavision2_k8s_manifests +``` + +## 2. Install LightningDB v1 +- Install ltdb-operator +``` +$ cd ltdb-operator +$ kubectl create -f ltdb-operator-controller-manager.yaml +``` + +- Deploy LightningDB with CRD +``` +$ cd ltdb +$ kubectl create -f ltdb.yaml -n {namespace} +``` + +- Notes + - Adjust the cluster configuration through size / storageClass / maxMemory, etc. + - AOF/RDB are on by default + - redis/rocksdb mount locations in the filesystem + - /tmp-redis_rocksdb_integration_0: redis's aof/rdb, conf + - /tmp-redis_rocksdb_integration_1: rocksdb's sst/wal + - Explicit rdb command: bash flashbase cli-all bgsave + - On-premise deployments apply system tuning as described at the URL below. Whether the same tuning is needed when operating on k8s still has to be checked + - https://docs.lightningdb.io/get-started-with-scratch/ + - Deletion + - A StatefulSet (STS) does not delete its PVCs automatically, so delete the PVCs manually for a complete removal +``` +$ kubectl delete -f ltdb.yaml +or +$ kubectl delete ltdb ltdb -n metavision +$ for i in {0..39}; do kubectl delete pvc "ltdb-data-logging-ltdb-$i" -n metavision; done +$ for i in {0..39}; do kubectl delete pvc "ltdb-data-ltdb-$i" -n metavision; done +``` + + +## 3. 
Install LightningDB v2 / Thunderquery + +``` +$ cd ltdbv2 +$ kubectl create -f ltdbv2-all-in-one.yaml +$ kubectl -n metavision exec -it ltdbv2-0 -- redis-cli --cluster-yes --cluster create `kubectl -n metavision get po -o wide -l app=ltdbv2 | grep ltdbv2 | awk '{print $6":6379"}' | tr '\n' ' '` +``` + +- Notes + - Installed manually, without the operator + - The namespace is explicitly set to metavision. To use a different namespace, edit that part + - With the latest version, even when using ann there is no need to change maxmemory-policy to noeviction; it can be used with the eviction rule working normally + - AOF/RDB are on by default + - redis/rocksdb mount locations in the filesystem + - /tmp/redis: redis's aof/rdb, conf, rocksdb's sst/wal + - Explicit rdb command: flashbase cli-all bgrewriteaof + - Deletion + - A StatefulSet (STS) does not delete its PVCs automatically, so delete the PVCs manually for a complete removal +``` +$ kubectl delete -f ltdbv2-all-in-one.yaml +$ for i in {0..99}; do kubectl delete pvc "ltdbv2-pvc-ltdbv2-$i" -n metavision; done +``` + + +## 4. Install ltdb-http v1 +``` +$ cd ltdb-http +$ ls -alh +total 32 +drwxr-xr-x 6 1111462 1437349805 192B 8 31 17:53 . +drwxr-xr-x 11 1111462 1437349805 352B 8 31 17:54 .. +-rw-r--r-- 1 1111462 1437349805 1.3K 8 31 17:53 ltdb-http-configmap.yaml +-rw-r--r-- 1 1111462 1437349805 1.5K 8 31 17:53 ltdb-http.yaml +-rw-r--r-- 1 1111462 1437349805 259B 8 31 17:53 pvc.yaml +-rw-r--r-- 1 1111462 1437349805 342B 8 31 17:53 spark-rbac.yaml +``` + +- Apply ltdb-http.yaml last, after the other manifests +``` +kubectl -n metavision apply -f ltdb-http-configmap.yaml +kubectl -n metavision apply -f spark-rbac.yaml +kubectl -n metavision apply -f pvc.yaml + +kubectl -n metavision apply -f ltdb-http.yaml # apply this one last +``` + +## 5. Install ltdb-http v2 +- Reference: https://www.notion.so/ltdb/LTDB-HTTP-V2-0-K8S-b47ad5741e9a43668c7bee4d40e1616e?pvs=4 +- If Iceberg is not used, ltdb-postgresql.yaml can be omitted +- The namespace is explicitly set to metavision. 
To use a different namespace, edit that part +- To use the s3 feature, app/s3-secret.yaml must be installed (already installed on the TB on the 9th floor in Bundang) +- The s3 region defaults to ap-northeast-2 + +``` +$ cd ltdbv2-http +$ kubectl create -f ltdb-http-configmap.yaml +$ kubectl create -f ltdb-http.yaml +$ kubectl create -f ltdbv2-http-vs.yaml +``` + +- Deletion +``` +$ kubectl delete -f ltdbv2-http-vs.yaml +$ kubectl delete -f ltdb-http.yaml +$ kubectl delete -f ltdb-http-configmap.yaml +``` + + +## 6. Install ltdb-http v2 CXL-CMS +``` +$ cd hynix +$ kubectl create -f ltdbv2.yaml +$ kubectl -n hynix exec -it ltdbv2-0 -- redis-cli --cluster-yes --cluster create `kubectl -n hynix get po -o wide -l app=ltdbv2 | grep ltdbv2 | awk '{print $6":6379"}' | tr '\n' ' '` +$ kubectl create -f thunderquery.yaml +$ kubectl create -f ltdbv2-http.yaml +$ kubectl create -f istio-ingress.yaml +``` + +- Notes + - The config values added for cxl-cms are listed below. Because there is no CSI driver for cxl-cms dev, testing requires manually setting the pod count and Node Affinity in the STS + - Values take the form dax-device-name /dev/xxx, cms-device-name /dev/yyy + +``` +$ vi ltdbv2.yaml +... +cms-enabled no +dax-device-name no +cms-device-name no +``` + +- Deletion + - A StatefulSet (STS) does not delete its PVCs automatically, so delete the PVCs manually for a complete removal +``` +$ cd hynix +$ kubectl delete -f ltdbv2-http.yaml +$ kubectl delete -f thunderquery.yaml +$ kubectl delete -f ltdbv2.yaml +$ for i in {0..9}; do kubectl delete pvc "ltdbv2-pvc-ltdbv2-$i" -n hynix; done +$ kubectl delete -f istio-ingress.yaml +``` \ No newline at end of file diff --git a/docs/how-to-use-lightningdb-on-k8s.md b/docs/how-to-use-lightningdb-on-k8s.md new file mode 100644 index 0000000..a982126 --- /dev/null +++ b/docs/how-to-use-lightningdb-on-k8s.md @@ -0,0 +1,166 @@ +# How to use LightningDB on Kubernetes + +### 1. 
ltdb-http v2 - thrift beeline +``` +kubectl -n metavision exec -it pod/ltdbv2-beeline-857f578cd9-d7kc4 -- beeline.sh + +0: jdbc:hive2://ltdbv2-http-svc:13000> select * from files limit 3; + +``` + +- Create table (without ANN) +``` +CREATE TABLE IF NOT EXISTS ltdb.metavision.img_feats_noann( +id BIGINT, +is_s3 BOOLEAN, +bucket STRING, +obj_key STRING, +features ARRAY, +meta STRING +) USING lightning +LOCATION '127.0.0.1:18500' +TBLPROPERTIES ('partition.size'=2, 'partition.0'='bucket','partition.1'='id'); +``` + +- Create table for ANN +``` +CREATE TABLE IF NOT EXISTS ltdb.metavision.img_feats( +id BIGINT, +is_s3 BOOLEAN, +bucket STRING, +obj_key STRING, +features ARRAY, +meta STRING +) USING lightning +LOCATION 'ltdbv2:6379' +TBLPROPERTIES ('partition.size'=2, 'partition.0'='bucket','partition.1'='id', 'feature_idx'='4', 'ann_type'='1', 'feature_dim'='1024', 'ef_construction'='500', 'ann_max_elem'='10000', 'ann_m'='20'); +``` + +- Ingest ANN data (from parquet) + - Create a temporary table to receive the parquet data +``` +CREATE TABLE IF NOT EXISTS ltdb.parquet.temptable( +id BIGINT, +is_s3 BOOLEAN, +bucket STRING, +obj_key STRING, +features ARRAY, +meta STRING +) USING parquet LOCATION 's3a://upload-data/real/vision-ai-private-data_6.csv.ViT-H-14.laion2b_s32b_b79k.975.parquet'; +``` + +- Insert data +``` +INSERT INTO ltdb.metavision.img_feats +SELECT +(CAST(RANDOM() * 1000000 AS INTEGER) % 400) AS id, +is_s3, +CONCAT('metavision-', bucket) AS bucket, +obj_key, +features, +meta +FROM +ltdb.parquet.temptable +LIMIT 100; +``` + +- Query data +``` +SELECT * FROM ltdb.metavision.img_feats; +SELECT count(obj_key) FROM ltdb.metavision.img_feats; +``` + +- Describe table +``` +DESCRIBE formatted ltdb.metavision.img_feats; +``` + +- Drop table +``` +DROP TABLE IF EXISTS ltdb.parquet.temptable; +DROP TABLE IF EXISTS ltdb.metavision.img_feats; +``` + + +### 2. 
Thunderquery CLI tool +``` +kubectl -n metavision exec -it thunderquery-68544ff5f7-9shjv -- thunderquery-cli ltdbv2-0.ltdbv2 +``` + +- ANN command +``` +select bucket, obj_key, ann(features, [-0.009953999, -0.0006904541, -0.006250763, -0.009839512, 0.012631393, 0.024262842, -0.029540457, -0.01707404, 0.0061618676, 0.029112583, ... , -0.011023628]) as ann_result from ltdb.metavision.img_feats limit 2; +``` + +- KNN command +``` +select bucket, obj_key, euclideandistance(features, [-0.009953999, -0.0006904541, -0.006250763, -0.009839512, 0.012631393, 0.024262842, -0.029540457, -0.01707404, 0.0061618676, 0.029112583, ... , -0.011023628]) as knn_result from ltdb.metavision.img_feats limit 2; +``` + + +### 3. REST API +- Create table +``` +$ curl --location --request POST http://metavision.k8s.lightningdb/ltdbv2-http/ingest/table \ +--header "Content-Type: text/plain" \ +--data "{ +'table': 'ltdb.metavision.img_feats', +'schema': [{'name': 'id', 'typ': 'BIGINT'}, +{'name': 'is_s3', 'typ': 'BOOLEAN'}, +{'name': 'bucket', 'typ': 'STRING'}, +{'name': 'obj_key', 'typ': 'STRING'}, +{'name': 'features', 'typ': 'ARRAY'}, +{'name': 'meta', 'typ': 'STRING'}], +'loc': 'ltdbv2:6379', +'props': [{'key': 'partition.size', 'val': '2'}, +{'key': 'partition.0', 'val': 'bucket'}, +{'key': 'partition.1', 'val': 'id'}, +{'key': 'feature_idx', 'val': '4'}, +{'key': 'ann_type', 'val': '1'}, +{'key': 'feature_dim', 'val': '1024'}, +{'key': 'ef_construction', 'val': '500'}, +{'key': 'ann_max_elem', 'val': '10000'}, +{'key': 'ann_m', 'val': '20'}] +}" +``` + +- Ingest ANN data (from parquet) +``` +$ curl --location --request POST http://metavision.k8s.lightningdb/ltdbv2-http/ingest/data \ +--header "Content-Type: text/plain" \ +--data "{ +'src_format': 'parquet', +'src_loc': 's3a://upload-data/real/vision-ai-private-data_6.csv.ViT-H-14.laion2b_s32b_b79k.975.parquet', +'dest_table': 'ltdb.metavision.img_feats', +'limit': 100, +'src_cols_with_random': [{'name': 'id', 'range': 400}], 
+'src_cols_to_modify': [{'name': 'bucket', 'prefix': 'metavision-'}] +}" +``` + +- Query data +``` +$ curl --location --request POST http://metavision.k8s.lightningdb/ltdbv2-http/query \ +--header "Content-Type: text/plain" \ +--data "SELECT count(obj_key) FROM ltdb.metavision.img_feats" +``` + +- Describe table +``` +$ curl --location --request GET http://metavision.k8s.lightningdb/ltdbv2-http/ingest/table/ltdb.metavision.img_feats +``` + +- Drop table +``` +$ curl --location --request DELETE http://metavision.k8s.lightningdb/ltdbv2-http/ingest/table/ltdb.metavision.img_feats +``` + +- ANN command +``` +$ curl -d 'select bucket, obj_key, ann(features, [-0.009953999, -0.0006904541, -0.006250763, -0.009839512, 0.012631393, 0.024262842, -0.029540457, -0.01707404, 0.0061618676, 0.029112583, ... , -0.011023628]) as ann_result from ltdb.metavision.img_feats limit 2;' http://metavision.k8s.lightningdb/thunderquery/sql +``` + +- KNN command +``` +$ curl -d 'select bucket, obj_key, euclideandistance(features, [-0.009953999, -0.0006904541, -0.006250763, -0.009839512, 0.012631393, 0.024262842, -0.029540457, -0.01707404, 0.0061618676, ... , -0.011023628]) as knn_result from ltdb.metavision.img_feats limit 2;' http://metavision.k8s.lightningdb/thunderquery/sql +``` \ No newline at end of file diff --git a/docs/images/curr-build-env.png b/docs/images/curr-build-env.png new file mode 100644 index 0000000..a84047d Binary files /dev/null and b/docs/images/curr-build-env.png differ diff --git a/docs/images/ltdb-deployment-env.png b/docs/images/ltdb-deployment-env.png new file mode 100644 index 0000000..62b9b05 Binary files /dev/null and b/docs/images/ltdb-deployment-env.png differ diff --git a/docs/why-k8s.md b/docs/why-k8s.md new file mode 100644 index 0000000..743e5df --- /dev/null +++ b/docs/why-k8s.md @@ -0,0 +1,12 @@ +# Why Kubernetes? 
+ +### Legacy system +- As we work with multiple companies and organizations, our deployment environments have diversified across cloud, Kubernetes, and on-premise, and OS requirements have diversified as well, so we need to respond quickly to each of them. + +![curr-build-env](./images/curr-build-env.png) + + +### On Kubernetes +- We built these environments so that you can build versions in one consistent, optimized setup regardless of the target environment. + +![deployment-env](./images/ltdb-deployment-env.png) \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index f4bb742..2b42e8b 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -12,6 +12,7 @@ nav: - Introduction: - Overview: index.md - Release Notes: release-note.md + - Awards and Recognition: awards-recognition.md - Getting Started: - Prerequisite: prerequisite.md @@ -36,8 +37,10 @@ nav: - 'KNN SEARCH': zeppelin-example-face-data.md - Build and deploy on Kubernetes: + - Why Kubernetes?: why-k8s.md - Build LightningDB(Admin Only): build-lightningdb-on-k8s.md - Deploy LightningDB: deploy-lightningdb-on-k8s.md + - How to use LightningDB: how-to-use-lightningdb-on-k8s.md - Additional Resources: - Manual Installation: get-started-with-scratch.md