Some things to work on #1

Open
5 of 20 tasks
Kevin-Prichard opened this issue Aug 17, 2023 · 0 comments
Labels
enhancement New feature or request

@Kevin-Prichard
Owner

Kevin-Prichard commented Aug 17, 2023

2023 Aug 16

  • CSVScanner needs:
    • accept delimiter parameters so that csv.reader can parse the input correctly
      • expose those parameters as argparse CLI options
    • should know nothing about any DB or DDL dialect; instead, a function or method reference should be passed in to handle the result of CSVScanner.scan()
    • break out the progress indicator by shoving it into a progress_fn: Callable callback parameter [8/20]
    • provide a progress_interval: int parameter to control the frequency of the progress indicator [8/20]
    • row iteration: provide a sample_pct: number which specifies the percentage of rows to check for type or length
      • maybe use self._csv_fh.seek(n) to skip to the next apparent sample row; this might require re-instantiating csv.reader so that it begins after the next newline
      • or, feed csv.reader through an intermediate line iterator or io.TextIOBase wrapper that skips rows behind the scenes, so that the reader instance isn't affected
    • if possible, abstract away the fact that the source is CSV or TSV so that any data source can be scanned: pass in a class that handles opening, iterating over, and breaking down the data source, which CSVScanner invokes to produce a row or a block of rows for processing in .scan(). That may be too Java-like, though; alternatively, make CSVScanner a subclass of an ABC DataScanner.
  • csv2db.py
    • zip_walker should become a class ZipCollection, with a base class called CSVCollection or similar
      • it's the top-level interface for instantiating CSVScanner and outputting CSVScanner.result(), so the csv.reader parameters need to go here
    • create_import_sqlite does a lot of heavy lifting: it interfaces with the given DBMS, issues CREATE TABLE xyz, and then inserts rows. Abstracting some of this would be healthy:
      • SQL dialect
      • split the CREATE and INSERT steps into at least separate methods, but probably separate classes
      • provide the same kind of progress_fn: Callable callback and progress_interval: int that csv2db.py provides; the progress_interval value could be a percentage or an every-n-rows event criterion [8/20]
  • CLI option to filter file pathnames and extensions by regex [8/20]
  • logging: offer a log level setting via argparse
  • all stdout output should be routed to a callback: a caller using only the library should be responsible for any console or GUI output [8/20]
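One way the first few CSVScanner items could fit together, assuming a constructor that takes csv.reader's `delimiter`/`quotechar`, a `progress_fn` callback with a `progress_interval`, and an `on_result` callable passed to `scan()`. These names and the max-length bookkeeping are illustrative assumptions, not the project's current API. Because the scanner never prints, the caller decides where progress goes, which is the same pattern the stdout item asks for.

```python
import csv
import io
from typing import Callable, Dict, List, Optional, TextIO


# Hypothetical, DB-agnostic CSVScanner shape; the project's real class may differ.
class CSVScanner:
    def __init__(
        self,
        fh: TextIO,
        delimiter: str = ",",
        quotechar: str = '"',
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
    ) -> None:
        # Dialect parameters go straight to csv.reader.
        self._reader = csv.reader(fh, delimiter=delimiter, quotechar=quotechar)
        self._progress_fn = progress_fn
        self._progress_interval = progress_interval

    def scan(self, on_result: Callable[[List[str], Dict[str, int]], None]) -> None:
        """Track the longest value per column, then hand off to on_result.

        The scanner knows nothing about databases or DDL; on_result does.
        """
        header = next(self._reader, [])
        max_len = {name: 0 for name in header}
        for n, row in enumerate(self._reader, start=1):
            for name, value in zip(header, row):
                max_len[name] = max(max_len[name], len(value))
            # Report progress every progress_interval rows, if a callback was given.
            if self._progress_fn is not None and n % self._progress_interval == 0:
                self._progress_fn(n)
        on_result(header, max_len)


# Usage: a tab-delimited source, with progress collected in a list instead of printed.
ticks: List[int] = []
lengths: Dict[str, int] = {}
source = io.StringIO("id\tname\n1\tAda\n2\tGrace\n3\tLinus\n")
scanner = CSVScanner(source, delimiter="\t", progress_fn=ticks.append, progress_interval=2)
scanner.scan(lambda header, max_len: lengths.update(max_len))
```

A CLI front end would pass `print` (or a GUI hook) as `progress_fn`, while a library caller can pass nothing at all.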
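For the `sample_pct` item, a sketch that sidesteps `seek()` entirely: let csv.reader parse every row (it has to, because a quoted field may contain newlines, so skipping by byte offset can land mid-record) and pass only every k-th parsed row on to the type/length checks. This is a swapped-in, deterministic stride sampler rather than the seek or buffer approach in the note, and `sample_rows` is a hypothetical helper name.

```python
import csv
import io
import itertools
from typing import Iterable, Iterator, List


def sample_rows(rows: Iterable[List[str]], sample_pct: float) -> Iterator[List[str]]:
    """Yield roughly sample_pct percent of rows by keeping every k-th row.

    Rows are still parsed by csv.reader, so quoted newlines are handled,
    but only the sampled rows reach the (more expensive) type/length checks.
    """
    if not 0 < sample_pct <= 100:
        raise ValueError("sample_pct must be in the range (0, 100]")
    step = max(1, round(100 / sample_pct))
    return itertools.islice(rows, 0, None, step)


# Usage: a header plus ten data rows, sampling about 25 percent of them.
text = "n,label\n" + "".join(f"{i},row{i}\n" for i in range(10))
reader = csv.reader(io.StringIO(text))
header = next(reader)
sampled = list(sample_rows(reader, 25))
```

The stride makes results reproducible; a `random.random() < sample_pct / 100` test would give an unbiased random sample instead, at the cost of run-to-run variation.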
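The DataScanner idea could look roughly like this, assuming an ABC whose subclasses only know how to produce a header and rows, while the shared `scan()` logic lives in the base class. `DataScanner` and `CSVScanner` come from the note; the `header()`/`rows()` methods and the in-memory `ListScanner` are hypothetical, included to show that a non-CSV source plugs in the same way.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Sequence, TextIO


class DataScanner(ABC):
    """Source-agnostic scanner: subclasses supply rows, the base class scans them."""

    @abstractmethod
    def header(self) -> List[str]:
        """Column names for the source."""

    @abstractmethod
    def rows(self) -> Iterator[Sequence[str]]:
        """Data rows, excluding the header."""

    def scan(self) -> Dict[str, int]:
        # Shared logic: longest value seen in each column, whatever the source.
        names = self.header()
        max_len = {name: 0 for name in names}
        for row in self.rows():
            for name, value in zip(names, row):
                max_len[name] = max(max_len[name], len(value))
        return max_len


class CSVScanner(DataScanner):
    """CSV/TSV source: opening and splitting are delegated to csv.reader."""

    def __init__(self, fh: TextIO, delimiter: str = ",") -> None:
        self._reader = csv.reader(fh, delimiter=delimiter)
        self._header = next(self._reader, [])

    def header(self) -> List[str]:
        return self._header

    def rows(self) -> Iterator[Sequence[str]]:
        return iter(self._reader)


class ListScanner(DataScanner):
    """Hypothetical in-memory source, e.g. rows already fetched from an API."""

    def __init__(self, names: List[str], data: List[List[str]]) -> None:
        self._names = names
        self._data = data

    def header(self) -> List[str]:
        return self._names

    def rows(self) -> Iterator[Sequence[str]]:
        return iter(self._data)


csv_result = CSVScanner(io.StringIO("a,b\nxx,y\nz,yyy\n")).scan()
list_result = ListScanner(["a", "b"], [["xx", "y"], ["z", "yyy"]]).scan()
```

Two small methods per source keeps this lighter than a separate reader-factory class, which may address the "too Java-like" worry.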
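For the zip_walker refactor, a sketch of a `CSVCollection` base that owns the csv.reader parameters (since it is the head interface) and a `ZipCollection` that only knows how to enumerate archive members. The class names come from the note; `members()`, `readers()` and the constructor arguments are assumptions. Each reader is only valid inside the loop body, because a member's stream is closed when the generator advances.

```python
import csv
import io
import zipfile
from abc import ABC, abstractmethod
from typing import IO, Iterator, List, TextIO, Tuple, Union


class CSVCollection(ABC):
    """Hypothetical base: any container of delimited files."""

    def __init__(self, delimiter: str = ",", quotechar: str = '"') -> None:
        # csv.reader parameters live at the head interface.
        self.delimiter = delimiter
        self.quotechar = quotechar

    @abstractmethod
    def members(self) -> Iterator[Tuple[str, TextIO]]:
        """Yield (member name, open text stream) pairs."""

    def readers(self) -> Iterator[Tuple[str, Iterator[List[str]]]]:
        for name, fh in self.members():
            yield name, csv.reader(fh, delimiter=self.delimiter, quotechar=self.quotechar)


class ZipCollection(CSVCollection):
    """Walks the file members of a zip archive, decoding each as text."""

    def __init__(self, source: Union[str, IO[bytes]], encoding: str = "utf-8", **csv_kwargs: str) -> None:
        super().__init__(**csv_kwargs)
        self.source = source
        self.encoding = encoding

    def members(self) -> Iterator[Tuple[str, TextIO]]:
        with zipfile.ZipFile(self.source) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                with zf.open(info) as raw:
                    # newline="" is what the csv module expects from text streams.
                    yield info.filename, io.TextIOWrapper(raw, encoding=self.encoding, newline="")


# Usage: build a small archive in memory, then consume each reader inside the loop.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/people.tsv", "id\tname\n1\tAda\n")
    zf.writestr("data/notes.tsv", "id\tnote\n7\thello\n")
buf.seek(0)

rows_by_member = {}
for name, reader in ZipCollection(buf, delimiter="\t").readers():
    rows_by_member[name] = list(reader)
```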
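For the create_import_sqlite refactor, a sketch that splits the work into a dialect object (how to spell the SQL), a `TableCreator` (the CREATE TABLE step) and a `RowInserter` (the INSERT step, with the same `progress_fn`/`progress_interval` pair). All three class names are hypothetical, and every column is created as TEXT purely to keep the example short; a real version would presumably derive column types from the scan result. Only the standard-library `sqlite3` module is used.

```python
import sqlite3
from typing import Callable, Iterable, List, Optional, Sequence


class SQLiteDialect:
    """Hypothetical dialect object: knows only how to spell SQL for SQLite."""

    placeholder = "?"

    def quote_ident(self, name: str) -> str:
        # Double any embedded quotes, then wrap the identifier in double quotes.
        return '"' + name.replace('"', '""') + '"'

    def create_table(self, table: str, columns: Sequence[str]) -> str:
        cols = ", ".join(f"{self.quote_ident(c)} TEXT" for c in columns)
        return f"CREATE TABLE {self.quote_ident(table)} ({cols})"

    def insert(self, table: str, columns: Sequence[str]) -> str:
        names = ", ".join(self.quote_ident(c) for c in columns)
        marks = ", ".join(self.placeholder for _ in columns)
        return f"INSERT INTO {self.quote_ident(table)} ({names}) VALUES ({marks})"


class TableCreator:
    def __init__(self, conn: sqlite3.Connection, dialect: SQLiteDialect) -> None:
        self.conn = conn
        self.dialect = dialect

    def create(self, table: str, columns: Sequence[str]) -> None:
        self.conn.execute(self.dialect.create_table(table, columns))


class RowInserter:
    def __init__(
        self,
        conn: sqlite3.Connection,
        dialect: SQLiteDialect,
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
    ) -> None:
        self.conn = conn
        self.dialect = dialect
        self.progress_fn = progress_fn
        self.progress_interval = progress_interval

    def insert(self, table: str, columns: Sequence[str], rows: Iterable[Sequence[str]]) -> int:
        sql = self.dialect.insert(table, columns)
        count = 0
        for row in rows:
            self.conn.execute(sql, row)
            count += 1
            # Same every-n-rows progress criterion as the scanner side.
            if self.progress_fn is not None and count % self.progress_interval == 0:
                self.progress_fn(count)
        self.conn.commit()
        return count


# Usage: an in-memory database, with progress reported every two rows.
conn = sqlite3.connect(":memory:")
dialect = SQLiteDialect()
columns = ["id", "name"]
TableCreator(conn, dialect).create("people", columns)
ticks: List[int] = []
inserted = RowInserter(conn, dialect, progress_fn=ticks.append, progress_interval=2).insert(
    "people", columns, [("1", "Ada"), ("2", "Grace"), ("3", "Linus")]
)
stored = conn.execute('SELECT COUNT(*) FROM "people"').fetchone()[0]
```

Supporting another DBMS would then mostly mean another dialect class (different placeholder and quoting), while the creator and inserter stay the same.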
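The CLI items (delimiter options, a pathname/extension regex filter, and a log level setting) could be wired up with argparse roughly as follows. The option names, defaults and the `parse_delimiter`/`filter_paths` helpers are assumptions, not existing csv2db.py flags. Accepting a literal `\t` for the delimiter is one convenience choice, since typing a real tab on a shell command line is awkward.

```python
import argparse
import logging
import re
from typing import Iterable, List

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def parse_delimiter(value: str) -> str:
    # Let users type \t or "tab" instead of a literal tab character.
    return "\t" if value in ("\\t", "tab") else value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Import delimited files into a database.")
    parser.add_argument("paths", nargs="*", help="files or archives to import")
    parser.add_argument("--delimiter", type=parse_delimiter, default=",",
                        help="field delimiter passed to csv.reader")
    parser.add_argument("--quotechar", default='"', help="quote character passed to csv.reader")
    parser.add_argument("--include", default=r"\.(csv|tsv)$",
                        help="regex a file pathname must match to be imported")
    # type=str.upper runs before the choices check, so "debug" is accepted.
    parser.add_argument("--log-level", type=str.upper, choices=LOG_LEVELS, default="WARNING",
                        help="logging verbosity")
    return parser


def filter_paths(paths: Iterable[str], pattern: str) -> List[str]:
    """Keep only pathnames matching the regex (re.search, so it can match anywhere)."""
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(p)]


# Usage: parse an example command line and apply its settings.
args = build_parser().parse_args(
    ["--delimiter", "\\t", "--include", r"\.tsv$", "--log-level", "debug", "data.zip"]
)
logging.basicConfig(level=getattr(logging, args.log_level))
selected = filter_paths(["a/x.tsv", "a/y.csv", "b/readme.txt"], args.include)
```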
Kevin-Prichard added the enhancement (New feature or request) label Aug 17, 2023
Kevin-Prichard self-assigned this Aug 17, 2023