Kevin-Prichard / csv2db Public

Some things to work on #1

Open

5 of 20 tasks

Kevin-Prichard opened this issue Aug 17, 2023 · 0 comments

2023 Aug 16

CSVScanner has needs:
- provide delimiter params so that csv.reader can do its job correctly
  - extend those parameters to the argparse cli options
- should know nothing about any db or DDL dialect: a function or method reference should be passed in for handling the result of CSVScanner.scan()
- break out the progress indicator by shoving it into a progress_fn: Callable callback parameter [8/20]
- provide a progress_interval: int parameter to control the frequency of the progress indicator [8/20]
- row iteration: provide a sample_pct: number which specifies the percentage of rows to check for type or length
  - maybe use self._csv_fh.seek(n) to skip to the next apparent sample row; this might necessitate reinstantiating csv.reader to begin after the next newline
  - or, use an io text wrapper (e.g. io.TextIOWrapper) to skip rows behind the scenes so that the reader instance isn't affected
- if possible, abstract away the CSV/TSV specifics so that scanning any data source is feasible: pass in a class that handles opening, iterating over, and breaking down the data source, and that CSVScanner invokes to produce a row or a block of rows to be processed in .scan(). Maybe too Java-like tho; alternatively, make CSVScanner a subclass of an ABC DataScanner.
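A minimal sketch of how several of the CSVScanner items above might fit together. Everything here is an assumption for illustration, not the project's actual code: the DataScanner base class, the rows() method, the result_fn parameter name, the per-column max-length result, and the every-nth-row reading of sample_pct are all hypothetical.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, Optional, Sequence


class DataScanner(ABC):
    """Hypothetical base class: subclasses only know how to produce rows."""

    def __init__(
        self,
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
        sample_pct: float = 100.0,
        result_fn: Optional[Callable[[Dict[int, int]], None]] = None,
    ) -> None:
        if not 0 < sample_pct <= 100:
            raise ValueError("sample_pct must be in (0, 100]")
        self.progress_fn = progress_fn
        self.progress_interval = progress_interval
        self.sample_pct = sample_pct
        self.result_fn = result_fn  # the caller decides what to do with results

    @abstractmethod
    def rows(self) -> Iterator[Sequence[str]]:
        """Yield one row (a sequence of field strings) at a time."""

    def scan(self) -> Dict[int, int]:
        """Record the longest field seen per column index, on sampled rows only."""
        max_len: Dict[int, int] = {}
        # Systematic sampling: inspect every step-th row (assumed meaning of sample_pct).
        step = max(1, round(100 / self.sample_pct))
        for count, row in enumerate(self.rows(), start=1):
            if self.progress_fn and count % self.progress_interval == 0:
                self.progress_fn(count)
            if (count - 1) % step:
                continue  # row is read but not inspected
            for col, value in enumerate(row):
                max_len[col] = max(max_len.get(col, 0), len(value))
        if self.result_fn:
            self.result_fn(max_len)
        return max_len


class CSVScanner(DataScanner):
    """CSV/TSV flavour: forwards formatting parameters to csv.reader."""

    def __init__(self, fh, *, delimiter: str = ",", quotechar: str = '"', **kwargs) -> None:
        super().__init__(**kwargs)
        self._csv_fh = fh
        self._fmt = {"delimiter": delimiter, "quotechar": quotechar}

    def rows(self) -> Iterator[Sequence[str]]:
        yield from csv.reader(self._csv_fh, **self._fmt)


# Usage: scan a tab-separated stream, sampling every second row.
data = io.StringIO("a\tbb\nccc\td\neeee\tf\nxxxxx\ty\n")
seen = []
scanner = CSVScanner(
    data, delimiter="\t", progress_fn=seen.append, progress_interval=2, sample_pct=50
)
lengths = scanner.scan()
```

Skipping rows by counting keeps the csv.reader instance untouched, at the cost of still parsing every row; the seek-based idea above would avoid that parsing but needs the reader to be rebuilt.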
csv2db.py
- zip_walker needs to be a class ZipCollection with a base called CSVCollection or something
  - it's the head interface for instantiating CSVScanner, and outputting CSVScanner.result(), so csv.reader parameters need to go here
- create_import_sqlite does a lot of heavy lifting by interfacing with the given DBMS, issuing create table xyz and then inserting rows. Abstracting some of this would be healthy:
  - sql dialect
  - separate out the create and insert into at least separate methods, but probably separate classes
  - provide the same type of progress_fn: Callable callback and progress_interval: int that csv2db.py provides. progress_interval: number could be a percent, or an every-n-rows sort of event criterion [8/20]
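One possible shape for the create/insert split described above, sketched against the standard-library sqlite3 module. The class names SQLiteDialect, TableCreator and RowInserter are hypothetical, as are their method names; progress_interval is treated here as an every-n-rows count.

```python
import sqlite3
from typing import Callable, Iterable, Optional, Sequence, Tuple


class SQLiteDialect:
    """Hypothetical dialect object: the only place that knows SQL syntax."""

    def quote(self, name: str) -> str:
        # Double any embedded quotes so identifiers are safe to interpolate.
        return '"' + name.replace('"', '""') + '"'

    def create_table(self, table: str, columns: Sequence[Tuple[str, str]]) -> str:
        cols = ", ".join(f"{self.quote(name)} {sql_type}" for name, sql_type in columns)
        return f"CREATE TABLE {self.quote(table)} ({cols})"

    def insert(self, table: str, n_columns: int) -> str:
        marks = ", ".join("?" * n_columns)
        return f"INSERT INTO {self.quote(table)} VALUES ({marks})"


class TableCreator:
    """Issues the CREATE TABLE statement and nothing else."""

    def __init__(self, conn: sqlite3.Connection, dialect: SQLiteDialect) -> None:
        self.conn = conn
        self.dialect = dialect

    def create(self, table: str, columns: Sequence[Tuple[str, str]]) -> None:
        self.conn.execute(self.dialect.create_table(table, columns))


class RowInserter:
    """Inserts rows and reports progress through a callback."""

    def __init__(
        self,
        conn: sqlite3.Connection,
        dialect: SQLiteDialect,
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
    ) -> None:
        self.conn = conn
        self.dialect = dialect
        self.progress_fn = progress_fn
        self.progress_interval = progress_interval

    def insert(self, table: str, n_columns: int, rows: Iterable[Sequence]) -> int:
        sql = self.dialect.insert(table, n_columns)
        count = 0
        for count, row in enumerate(rows, start=1):
            self.conn.execute(sql, tuple(row))
            if self.progress_fn and count % self.progress_interval == 0:
                self.progress_fn(count)
        self.conn.commit()
        return count


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
dialect = SQLiteDialect()
TableCreator(conn, dialect).create("people", [("name", "TEXT"), ("age", "INTEGER")])
ticks = []
inserter = RowInserter(conn, dialect, progress_fn=ticks.append, progress_interval=2)
inserted = inserter.insert("people", 2, [("Ann", 31), ("Bob", 45), ("Cy", 27)])
stored = conn.execute('SELECT COUNT(*) FROM "people"').fetchone()[0]
```

Supporting another DBMS would then mean supplying a different dialect object and connection, leaving the creator and inserter unchanged, assuming that DBMS accepts the same placeholder style.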
filter file pathnames and extensions by regex from the cli [8/20]
logging: offer a log level setting via argparse
all stdout should be routed to a callback: a caller using only the lib should be responsible for any console or gui output [8/20]
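The three CLI-facing items above could look roughly like this. The flag names (--delimiter, --quotechar, --path-regex, --log-level), the select_paths helper and its emit parameter are assumptions made for this sketch, not options the tool is known to have.

```python
import argparse
import logging
import re
from typing import Callable, Iterable, List, Optional, Pattern


def regex(text: str) -> Pattern[str]:
    """argparse type: compile a regex, turning bad patterns into a usage error."""
    try:
        return re.compile(text)
    except re.error as exc:
        raise argparse.ArgumentTypeError(f"invalid regex: {exc}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="csv2db")
    # Passed through to csv.reader by whatever instantiates the scanner.
    parser.add_argument("--delimiter", default=",")
    parser.add_argument("--quotechar", default='"')
    # Only paths matching this pattern are processed.
    parser.add_argument("--path-regex", type=regex, default=None)
    parser.add_argument(
        "--log-level",
        default="WARNING",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
    )
    return parser


def select_paths(
    paths: Iterable[str],
    pattern: Optional[Pattern[str]],
    emit: Callable[[str], None] = print,
) -> List[str]:
    """Filter paths by regex; all user-facing output goes through emit."""
    chosen = [p for p in paths if pattern is None or pattern.search(p)]
    emit(f"selected {len(chosen)} file(s)")
    return chosen


# Usage: parse a sample command line and route output to a list instead of stdout.
args = build_parser().parse_args(
    ["--delimiter", "\t", "--path-regex", r"\.tsv$", "--log-level", "INFO"]
)
logger = logging.getLogger("csv2db")
logger.setLevel(getattr(logging, args.log_level))

messages = []
selected = select_paths(
    ["a/data.tsv", "a/notes.txt", "b/more.tsv"], args.path_regex, emit=messages.append
)
```

Defaulting emit to print keeps the command-line behaviour as it is, while a library caller can pass its own callback and decide where messages go.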


Kevin-Prichard added the enhancement New feature or request label

Kevin-Prichard self-assigned this
