Commit
Create script to produce schema yaml file.
* Use s3cmd to walk ceph files
* Determine partitions by key structure (contains "=")
* Download a parquet file from each leaf directory
* Load parquet file to obtain columns
* Produce table definition yaml file from extracted data
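The partition step in the commit message can be illustrated with a short sketch. This is not the code from `gen_table_defs.py` (which is not shown on this page); the function names `split_partitions` and `group_tables` and the example keys are hypothetical, and the sketch assumes Hive-style keys where every directory segment containing `=` is a partition. For brevity it keeps one sample parquet key per table rather than one per leaf directory.

```python
def split_partitions(key, prefix=""):
    """Split an object key into (table_path, partition_columns, file_name)."""
    # Drop the bucket prefix (for example "data/") if the key starts with it.
    relative = key[len(prefix):] if prefix and key.startswith(prefix) else key
    parts = [p for p in relative.split("/") if p]
    *dirs, file_name = parts
    # Segments such as "year=2021" are partitions; the rest name the table.
    table_dirs = [d for d in dirs if "=" not in d]
    partitions = [d.split("=", 1)[0] for d in dirs if "=" in d]
    return "/".join(table_dirs), partitions, file_name


def group_tables(keys, prefix=""):
    """Map each table path to its partition columns and one sample parquet key."""
    tables = {}
    for key in keys:
        if not key.endswith(".parquet"):
            continue  # ignore non-parquet objects
        table, partitions, _ = split_partitions(key, prefix)
        # Keep only the first parquet key seen for each table as the sample.
        tables.setdefault(table, {"partitions": partitions, "sample": key})
    return tables
```

The sample key for each table is the file that would then be downloaded and opened to read its column names and types.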
1 parent 6657a10, commit c5bac39
Showing 6 changed files with 414 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
S3_ENDPOINT=endpoint | ||
AWS_ACCESS_KEY=AWS_ACCESS_KEY | ||
AWS_SECRET_KEY=AWS_SECRET_KEY | ||
S3_BUCKET=bucket | ||
S3_BUCKET_PREFIX=data | ||
SCHEMA_NAME=myschema | ||
OUTPUT_FILE=out.yaml |
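As a rough illustration of how a script could consume these settings, the sketch below reads them from the environment (pipenv loads a `.env` file automatically for `pipenv shell` and `pipenv run`). The helper name `load_settings`, the split into required and optional values, and the use of the example values as defaults are assumptions made for this example, not taken from `gen_table_defs.py`.

```python
import os

# Assumption: connection details must be set explicitly, the rest may default.
REQUIRED = ("S3_ENDPOINT", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", "S3_BUCKET")
DEFAULTS = {
    "S3_BUCKET_PREFIX": "data",
    "SCHEMA_NAME": "myschema",
    "OUTPUT_FILE": "out.yaml",
}


def load_settings(env=None):
    """Return the settings as a dict, raising KeyError if a required one is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name, default)
    return settings
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.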
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -127,3 +127,6 @@ dmypy.json | |
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# output file | ||
cost-management.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
[[source]] | ||
url = "https://pypi.org/simple" | ||
verify_ssl = true | ||
name = "pypi" | ||
|
||
[packages] | ||
minio = "*" | ||
pandas = "*" | ||
pyarrow = "*" | ||
s3cmd = "*" | ||
pyyaml = "*" | ||
|
||
[dev-packages] | ||
|
||
[requires] | ||
python_version = "3.9" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,52 @@ | ||
# ceph_trino_schema_gen | ||
Generate a Trino schema from a Ceph S3 bucket | ||
|
||
|
||
# Getting Started | ||
|
||
Start by cloning the repository: | ||
``` | ||
git clone https://github.com/chambridge/ceph_trino_schema_gen.git | ||
``` | ||
|
||
Switch to the new directory: | ||
``` | ||
cd ceph_trino_schema_gen | ||
``` | ||
|
||
Create a Python 3.9 virtual environment: | ||
``` | ||
pipenv --python 3.9 | ||
pipenv install | ||
``` | ||
|
||
Copy the example environment file, then edit `.env` to configure the connection to your Ceph bucket: | ||
``` | ||
cp .env.example .env | ||
``` | ||
|
||
Enter the virtual env: | ||
``` | ||
pipenv shell | ||
``` | ||
|
||
Execute the python script: | ||
``` | ||
python gen_table_defs.py | ||
``` | ||
|
||
_Note:_ You may encounter the following error with Python 3.9 if the dependency has not been fixed yet: | ||
``` | ||
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getchildren' | ||
``` | ||
|
||
To resolve the problem, remove the `.getchildren()` method calls from your local copy of *s3cmd*. | ||
To do this, find the location of `s3cmd` in your virtual environment: | ||
``` | ||
which s3cmd | ||
``` | ||
Open a terminal in the Python virtual environment directory that contains the listed path, then change to the S3 site-packages directory: | ||
``` | ||
cd lib/python3.9/site-packages/S3/ | ||
``` | ||
Remove all occurrences of `.getchildren()` from the code. Now the python script should run properly. |
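The manual edit described in the README could also be scripted. The sketch below is one possible approach, not part of the commit: the helper name `strip_getchildren` is hypothetical, and it assumes that dropping the call is safe because iterating an `Element` directly yields its children (`Element.getchildren()` was removed in Python 3.9). It rewrites files in place, so review the changes afterwards.

```python
from pathlib import Path


def strip_getchildren(package_dir):
    """Remove `.getchildren()` calls from every .py file under package_dir.

    Returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        # "node.getchildren()" becomes "node", which iterates over the children.
        patched = source.replace(".getchildren()", "")
        if patched != source:
            path.write_text(patched, encoding="utf-8")
            changed.append(path)
    return changed
```

It would be called with the site-packages path from the step above, for example `strip_getchildren("lib/python3.9/site-packages/S3")` from the virtual environment root.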