Implement deployment API for pandas and polars #5
Comments
Hi @johnnylarner, as I mentioned yesterday, I have already started on this task. I will prepare a first version and open a draft PR. Based on that, we can discuss whether it is what you originally had in mind and what is missing. :-)
Hey @johnnylarner, one question: in "3. Read parquet and write CSV can be called", is there a typo? Do we need a
While implementing this, we found two possible approaches.

Opp. 1: Simple wrapper functions in polars.py and pandas.py

For each method we need, we could add a wrapper in ./src/ppp/polars.py and ./src/ppp/pandas.py and call it from our feature_engineering.py script, e.g.:

```python
# % pandas.py
import pandas
def read_parquet(file_path: str) -> pandas.DataFrame:
    return pandas.read_parquet(file_path)

# % polars.py
import polars
def read_parquet(file_path: str) -> polars.DataFrame:
    return polars.read_parquet(file_path)

# % feature_engineering.py
from importlib import import_module
mod = import_module("ppp." + mod_name)  # mod_name selects "pandas" or "polars" at runtime
df = mod.read_parquet(parquet_path)
```

Upsides

Downsides
Opp. 2: More complex wrappers that are reusable for several use cases

We write fewer but more complex wrappers. API-specific parameters are set in the config file, which steers the behavior of the wrapper, e.g. which public API should be used. The example below sketches the idea:

```python
# % common.py
import importlib

def read_file(config):
    reader_settings = config["reader_settings"]
    # Import the configured library (e.g. "pandas" or "polars") by name
    module = importlib.import_module(reader_settings["module_name"])
    # Look up the configured reader function on that module
    reader_method = getattr(module, reader_settings["method_name"])
    return reader_method(**reader_settings["read_kwargs"])

# % feature_engineering.py
config = load_config(CONFIG_PATH)
df = read_file(config)
```

Upsides

Downsides
Decision

We go with Opp. 2 and will see how it works.
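As a rough illustration of where Opp. 2 might go, here is a minimal sketch that reuses the `reader_settings` layout from the `common.py` example (`module_name`, `method_name`, `read_kwargs`). The `writer_settings` and `write_kwargs` keys, the `write_file` helper and the file names are hypothetical. The sketch also shows one difference the wrapper has to handle: reading Parquet is a module-level function (`pandas.read_parquet`, `polars.read_parquet`), while writing CSV is a method on the DataFrame instance (`DataFrame.to_csv` in pandas, `DataFrame.write_csv` in polars). The file keyword also differs per library (`path` vs `source` when reading, `path_or_buf` vs `file` when writing), which is one reason to keep it in the config.

```python
import importlib
from typing import Any


def read_file(config: dict) -> Any:
    """Call a module-level reader (e.g. pandas.read_parquet) chosen by the config."""
    settings = config["reader_settings"]
    module = importlib.import_module(settings["module_name"])
    reader = getattr(module, settings["method_name"])
    return reader(**settings["read_kwargs"])


def write_file(df: Any, config: dict) -> Any:
    """Hypothetical helper: call a writer *method* on the data frame instance.

    Writing is an instance method in both libraries, so the method is looked
    up on the object itself rather than on the imported module.
    """
    settings = config["writer_settings"]
    writer = getattr(df, settings["method_name"])
    return writer(**settings["write_kwargs"])


# Hypothetical configs: "writer_settings", "write_kwargs" and the file names
# are assumptions for illustration, not part of the existing code.
PANDAS_CONFIG = {
    "reader_settings": {
        "module_name": "pandas",
        "method_name": "read_parquet",
        "read_kwargs": {"path": "input.parquet"},
    },
    "writer_settings": {
        "method_name": "to_csv",
        "write_kwargs": {"path_or_buf": "output.csv", "index": False},
    },
}

POLARS_CONFIG = {
    "reader_settings": {
        "module_name": "polars",
        "method_name": "read_parquet",
        "read_kwargs": {"source": "input.parquet"},
    },
    "writer_settings": {
        "method_name": "write_csv",
        "write_kwargs": {"file": "output.csv"},
    },
}
```

With one of the libraries installed, the script could then run `write_file(read_file(PANDAS_CONFIG), PANDAS_CONFIG)` and switch backends by swapping only the config.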
Description
The implementation details of our business logic are located in two modules, one for `polars` and one for `pandas`. These modules contain the same set of public-like functions, which can be imported into a script to run data transformations. This common API means we can parameterise our imports at runtime, so that one script works with both modules.

However, in the current implementation the destination script also calls static and instance methods of the two public APIs of `polars` and `pandas` directly. We need to design a wrapper API which:

Acceptance criteria
`ppp` module

Out of scope