Merge pull request #17 from shruthi-raghuraman/add-initial-load-trigger-script

Added initial load script
gagansk authored Nov 20, 2020
2 parents ce2fdd6 + 5d415a2 commit f2b1d1d
Showing 2 changed files with 59 additions and 0 deletions.
45 changes: 45 additions & 0 deletions Documentation/UsingBulkDataTrigger.md
@@ -0,0 +1,45 @@
# Bulk data trigger into OpenShift Metering backend

#### Explore database and data structure within OCP
- Set up CodeReady Containers and OpenShift Metering
- Connect to Hive with the following command:
```
oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) -c hiveserver2 -- beeline -u 'jdbc:hive2://127.0.0.1:10000/default;auth=noSasl'
```
- View tables and table content within the metering database, for example:
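
Once connected with beeline, queries like the following can be used to inspect the metering database. This is a minimal sketch; the report table shown is the one used as an example later in this document, and it only exists after the corresponding report has been created.
```
-- Switch to the metering database and list its tables
USE metering;
SHOW TABLES;

-- Inspect the schema and a few rows of one report table
DESCRIBE report_openshift_metering_namespace_cpu_request_daily;
SELECT * FROM report_openshift_metering_namespace_cpu_request_daily LIMIT 10;
```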

#### Modify load script to incorporate new data
- Obtain bulk data that matches the schema of the data within OCP
- Create an INSERT INTO statement for each data point and save them in a .sql file
- Obtain the raw GitHub link for the .sql file (example GitHub link: https://gist.githubusercontent.com/shruthi-raghuraman/eecda49f0741bf0ea12ce2f766f3a0bd/raw/ffe9b3f8b49cc77d57dc058d4372a494d266d135/namespace_cpu_request_hourly.sql)
- Within load.sh, add the following lines:
```sh
# Obtain the bulk data file
oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) -c hiveserver2 -- curl -o <YOUR_INSERT_FILE.sql> <RAW GITHUB LINK>

# Load data into the database from within the container
oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) -c hiveserver2 -- beeline -u 'jdbc:hive2://127.0.0.1:10000/default;auth=noSasl' -f <YOUR_INSERT_FILE.sql>
```
#### Run from within the CRC container
- Using the primary README, set up CodeReady Containers with OpenShift Metering
- Create the targeted data report using the provided YAML files (custom YAML files can also be created). For example, for namespace-cpu-request-daily.yaml, run the following command:
```
oc create -f openshift-metering-templates/reports-templates/namespace-cpu-request-daily.yaml
```
- Run the load script to access the database and load the bulk INSERT statements into the table:
```
./load.sh
```
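To sanity-check that the bulk data was loaded, a row count can be queried from the same beeline session described above. This is a minimal check, assuming the namespace-cpu-request-daily report was created in the previous step:
```
-- Should return the number of rows inserted by the bulk .sql file
SELECT COUNT(*) FROM metering.report_openshift_metering_namespace_cpu_request_daily;
```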

#### Creating .sql INSERT INTO statements for bulk data
This can be done in many different ways. The following is a quick method that minimizes the manual effort of converting data points into INSERT statements.
- Set up DataGrip (these steps follow DataGrip, but any other database management tool works): https://www.jetbrains.com/help/datagrip/quick-start-with-datagrip.html#step-3-write-your-code
- Ensure the bulk data collection is in a CSV file (if it is in JSON, convert it to CSV, e.g. with https://json-csv.com/)
- Once the local database connection is set up, right-click on the schema -> Import Data From File -> select the .csv file -> OK
- In the next menu, ensure the column names and types match how they appear within the OpenShift Metering database. Use the Data Preview shown for this and click Import.
- Use the built-in extractor to export the data as INSERT INTO statements (https://www.jetbrains.com/help/datagrip/export-data-in-ide.html#built-in-extractors)
- Relabel the table name within each INSERT statement to reflect the OCP metering tables. For example, for namespace-cpu-request-daily one INSERT statement should look like this:
```
INSERT INTO metering.`report_openshift_metering_namespace_cpu_request_daily` (namespace, period_end, period_start, pod_request_cpu_core_seconds) VALUES ('openshift-apiserver', '2020-10-15 14:00:00', '2020-10-15 13:00:00', 6);
```
- This method outputs individual INSERT INTO statements, which can be combined into one INSERT statement for all points, as sketched below.
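
As a sketch of that combined form, the per-row statements can be merged into a single multi-row INSERT (multi-row VALUES requires Hive 0.14 or later; the second row below is illustrative):
```
INSERT INTO metering.`report_openshift_metering_namespace_cpu_request_daily`
  (namespace, period_end, period_start, pod_request_cpu_core_seconds)
VALUES
  ('openshift-apiserver', '2020-10-15 14:00:00', '2020-10-15 13:00:00', 6),
  -- illustrative second data point
  ('openshift-monitoring', '2020-10-15 14:00:00', '2020-10-15 13:00:00', 12);
```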
14 changes: 14 additions & 0 deletions load.sh
@@ -0,0 +1,14 @@
#!/usr/bin/env bash


# Obtain bulk data file: namespace_cpu_request_daily
oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) -c hiveserver2 -- curl -o namespace_cpu_request_daily.sql https://gist.githubusercontent.com/shruthi-raghuraman/2af7b6892f3fec1d030ff11777ac2cea/raw/fad4da0880c9812d813a3dbe8641121f7edee594/namespace_cpu_request_daily.sql

# Obtain bulk data file: namespace_cpu_request_hourly
oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) -c hiveserver2 -- curl -o namespace_cpu_request_hourly.sql https://gist.githubusercontent.com/shruthi-raghuraman/eecda49f0741bf0ea12ce2f766f3a0bd/raw/ffe9b3f8b49cc77d57dc058d4372a494d266d135/namespace_cpu_request_hourly.sql

# Load the daily data into the database from within the container
oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) -c hiveserver2 -- beeline -u 'jdbc:hive2://127.0.0.1:10000/default;auth=noSasl' -f namespace_cpu_request_daily.sql

# Load the hourly data into the database from within the container
oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) -c hiveserver2 -- beeline -u 'jdbc:hive2://127.0.0.1:10000/default;auth=noSasl' -f namespace_cpu_request_hourly.sql
