Create database updating function #655

Open
gregpawin opened this issue May 13, 2024 · 3 comments

Comments

@gregpawin
Member

gregpawin commented May 13, 2024

Create a function that uses the city data API to update a local database of parking citations. It will be used to update the citation database daily instead of downloading the whole database every time.

@parcheesime
Member

Working with a small sample, and using $order and ticket_number, I implemented a function that integrates new ticket data with existing records. It first identifies the most recent ticket number in the current sample, then fetches the ticket records issued after that ticket, so the new records do not overlap with the existing ones.
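A minimal sketch of the "find the latest ticket, then fetch only newer ones" step, for illustration only. It assumes the city data API accepts SoQL-style query parameters: `$order` is mentioned above, while `$where` and `$limit` are assumptions here. It also assumes ticket numbers are numeric strings. The function names and the `endpoint` argument are hypothetical and are not taken from the actual notebook.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def latest_ticket_number(records):
    """Return the highest ticket_number in the local sample, as an int.

    Assumes ticket numbers are numeric strings, which may not hold
    for every record in the real dataset.
    """
    return max(int(r["ticket_number"]) for r in records)


def build_query_params(last_ticket, limit=5):
    """Build query parameters asking for tickets newer than last_ticket.

    $where and $limit are assumed to be supported by the API; whether the
    comparison is numeric or textual depends on the column type.
    """
    return {
        "$where": f"ticket_number > '{last_ticket}'",
        "$order": "ticket_number ASC",
        "$limit": str(limit),
    }


def fetch_new_tickets(endpoint, last_ticket, limit=5):
    """Fetch up to `limit` tickets after last_ticket from a hypothetical endpoint URL."""
    url = f"{endpoint}?{urlencode(build_query_params(last_ticket, limit))}"
    with urlopen(url) as response:
        return json.load(response)
```

Ordering ascending by `ticket_number` and filtering on the last known value is what should keep the fetched batch from overlapping with the existing sample, provided ticket numbers only ever increase.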

Integration uses a predefined schema to standardize incoming data so that new entries match the structure of the existing dataset. The schema specifies the expected fields and sets default values for any missing data. Once the new data is fetched and normalized against the schema, it is merged into the existing dataset, adding a specified number of new records (for example, the next five new tickets).
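The normalize-then-merge step described above could look roughly like the sketch below. The schema fields other than `ticket_number`, their defaults, and the function names are hypothetical placeholders, not the ones used in the real function.

```python
# Hypothetical schema: field name -> default used when the field is missing.
SCHEMA = {
    "ticket_number": None,
    "issue_date": None,
    "violation_description": "UNKNOWN",
    "fine_amount": 0,
}


def normalize(record, schema=SCHEMA):
    """Keep only the schema fields, filling missing ones with the schema default."""
    return {field: record.get(field, default) for field, default in schema.items()}


def merge_new_tickets(existing, incoming, max_new=5, schema=SCHEMA):
    """Return existing records plus up to max_new normalized, unseen records."""
    seen = {r["ticket_number"] for r in existing}
    merged = list(existing)  # copy so the caller's list is not mutated
    added = 0
    for record in incoming:
        if added >= max_new:
            break
        row = normalize(record, schema)
        if row["ticket_number"] is None or row["ticket_number"] in seen:
            continue  # skip records without a key, and duplicates
        seen.add(row["ticket_number"])
        merged.append(row)
        added += 1
    return merged
```

Tracking seen ticket numbers in a set makes the merge safe to re-run on an overlapping batch, which is a second guard against duplicates in addition to filtering at fetch time.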

I'll try this function with an existing database this coming week.

@parcheesime
Member

Using the function, I increased the sample size. No duplicates so far. I began transferring the initial tickets into DuckDB and testing incremental loading of data into an in-memory DuckDB database. I will do EDA on the data, check for duplicates, and give an update next week.

@parcheesime
Member

parcheesime commented Aug 20, 2024

I updated the merge function for adding more distinct parking tickets to an existing dataset.
https://github.com/parcheesime/parking-tickets-app/blob/main/test_merge.ipynb
@gregpawin please code review
