Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add section on how to setup testing workspace #12

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

surister
Copy link
Collaborator

Summary of the changes / Why this is an improvement

Checklist

  • Link to issue this PR refers to (if applicable): Fixes #???

@surister surister marked this pull request as ready for review February 10, 2025 16:01
@surister surister requested review from amotl and kneth February 10, 2025 16:01
Copy link
Member

@amotl amotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 👍

NB: I've added a few comments. As always please take them as suggestions when applicable, and skip those which are silly. Just on the __name__ == "__main__" (see below), I do have very strong opinions ;].

Copy link
Member

@amotl amotl Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the previous patch, the pyproject.toml has been removed, while I thought it would have been a good idea to keep it, in order to bundle all dependencies. I only argued against using Poetry, not for removing the pyproject.toml file ;].

In this spirit, I am proposing to reintroduce it, or alternatively a basic requirements.txt, which includes all the needed dependencies. Considering the code in sight, the list of requirements seem to be those:

polars
requests
sqlalchemy-cratedb
tdvt @ git+https://github.com/tableau/connector-plugin-sdk/#subdirectory=tdvt

Now that the environment is set up, we need to also set up the tableau test files and data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Now that the environment is set up, we need to also set up the tableau test files and data.
Now that the environment is set up, we need to also set up the Tableau test files and data.

can follow the official tdvt docs or continue reading and do step by step.

Note: There is also a [50 minute video by the tableau team ](https://www.youtube.com/watch?v=rAgnnByJIJA) .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note: There is also a [50 minute video by the tableau team ](https://www.youtube.com/watch?v=rAgnnByJIJA) .
Note: There is also a [50 minute video by the tableau team](https://www.youtube.com/watch?v=rAgnnByJIJA).

## Set up CrateDB.

The quickest way to setup CrateDB is with docker https://cratedb.com/docs/guide/install/container/index.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The quickest way to setup CrateDB is with docker https://cratedb.com/docs/guide/install/container/index.html
The quickest way to set up CrateDB is by [using Docker](https://cratedb.com/docs/guide/install/container/).


1. Go to `./tests/config/tdvt/tdvt_override.ini` and put the path of your installation's `tabquerytool.exe`, for example
for my in windows it is: `C:\Program Files\Tableau\Tableau 2024.3\bin\tabquerytool.exe`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for my in windows it is: `C:\Program Files\Tableau\Tableau 2024.3\bin\tabquerytool.exe`
on a Windows machine it is: `C:\Program Files\Tableau\Tableau 2024.3\bin\tabquerytool.exe`

1. Go to `./tests/config/tdvt/tdvt_override.ini` and put the path of your installation's `tabquerytool.exe`, for example
for my in windows it is: `C:\Program Files\Tableau\Tableau 2024.3\bin\tabquerytool.exe`
2. [Download](https://github.com/tableau/connector-plugin-sdk/tree/master/tests/datasets/TestV1) and Load the table `calcs` and `staples` to your CrateDB instance. You can use the script in `./data/setup_data.py`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. [Download](https://github.com/tableau/connector-plugin-sdk/tree/master/tests/datasets/TestV1) and Load the table `calcs` and `staples` to your CrateDB instance. You can use the script in `./data/setup_data.py`,
2. [Download datasets](https://github.com/tableau/connector-plugin-sdk/tree/master/tests/datasets/TestV1) and load the table `calcs` and `staples` into your CrateDB instance. You can use the script in `./data/setup_data.py`,

Comment on lines +1 to +49
import pathlib
import logging

import polars
import requests

logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)

CRATE_DB = 'HOST:PORT' # Examples: localhost:4200, 192.168.88.21:4200
CRATEDB_HTTP = f'http://{CRATE_DB}/_sql'
CRATEDB_SQLALCHEMY = f'crate://{CRATE_DB}'
DATA_URL = pathlib.Path(__file__).parent.parent / 'data'

FILES_URL = [
{'name': 'Calcs',
'url': 'https://raw.githubusercontent.com/tableau/connector-plugin-sdk/'
'refs/heads/master/tests/datasets/TestV1/Calcs_headers.csv'},
{'name': 'Staples',
'url': 'https://raw.githubusercontent.com/tableau/connector-plugin-sdk/'
'refs/heads/master/tests/datasets/TestV1/Staples_utf8_headers.csv'}
]

# Create tables.
for file in DATA_URL.iterdir():
if file.suffix == '.sql':
statement = file.read_text()
response = requests.post(CRATEDB_HTTP,
json={'stmt': statement})
logging.info(f'Tried creating {file.name}, response from CrateDB: {response.text}')

# Download data and load it to CrateDB.
for file in FILES_URL:
result = requests.get(file.get('url'))
logging.info(f'Downloading {file.get('url')}')

df = polars.read_csv(result.content, use_pyarrow=True)

# There is a bug where an empty space is added to 'Market Segment' column name.
# Easiest solution is just to try to rename it if it exists.
try:
df.get_column('Market Segment ')
df = df.rename({'Market Segment ': 'Market Segment'})
except polars.exceptions.ColumnNotFoundError:
pass

logging.info(f'Loading to CrateDB {file.get('name')!r}')
df.write_database(file.get('name'),
connection=CRATEDB_SQLALCHEMY,
if_table_exists='append')
Copy link
Member

@amotl amotl Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please never put Python code into a file like this, unless a framework demands you to do so. Instead, always use the canonical Python incantation idiom:

if __name__ == "__main__":
    main()

Copy link

@kneth kneth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amotl mentions it all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants