provide delimiter params so that csv.reader can do its job correctly
extend those parameters to the argparse cli options
CSVScanner should know nothing about any db or DDL dialect: a function or method reference should be passed in to handle the result of CSVScanner.scan()
break out the progress indicator by shoving it into a progress_fn: Callable callback parameter [8/20]
provide a progress_interval: int parameter to control the frequency of the progress indicator [8/20]
row iteration: provide a sample_pct: number which specifies the percentage of rows to check for type or length
maybe use self._csv_fh.seek(n) to skip to the next apparent sample row; this might necessitate reinstantiating csv.reader to begin after the next newline
or, use an io text stream wrapper (e.g. io.TextIOWrapper) to skip rows behind the scenes so that the reader instance isn't affected
if possible, abstract away the fact that this is about CSV or TSV and make scanning any data source feasible: pass in a class that handles opening, iterating over, and breaking down the data source, and that CSVScanner invokes to produce a row or a block of rows to be processed in .scan(). This may be too Java-like, though; alternatively, make CSVScanner a subclass of an ABC DataScanner.
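As a rough illustration of how the scanner items above could fit together, here is a minimal sketch. The name `scan_csv` and its parameters (`on_result`, `delimiter`, `quotechar`, `sample_pct`, `progress_fn`, `progress_interval`) are hypothetical and not taken from the existing code. It passes dialect parameters to `csv.reader`, reports progress through a callback, and hands the result to a caller-supplied handler. For simplicity it samples by checking every n-th row rather than seeking, so every row is still read.

```python
import csv
import io
from typing import Callable, Dict, Optional


def scan_csv(
    fh,
    on_result: Callable[[Dict[str, int]], None],
    delimiter: str = ",",
    quotechar: str = '"',
    sample_pct: float = 100.0,
    progress_fn: Optional[Callable[[int], None]] = None,
    progress_interval: int = 1000,
) -> None:
    """Record the longest value seen per column and pass it to on_result."""
    if not 0 < sample_pct <= 100:
        raise ValueError("sample_pct must be in the range (0, 100]")

    reader = csv.reader(fh, delimiter=delimiter, quotechar=quotechar)
    header = next(reader, None)
    if header is None:  # empty input: nothing to scan
        on_result({})
        return

    max_len = {name: 0 for name in header}
    # Assumption: sampling means "check every step-th row" (50% -> every 2nd row).
    step = max(1, round(100 / sample_pct))

    for row_num, row in enumerate(reader, start=1):
        if progress_fn is not None and row_num % progress_interval == 0:
            progress_fn(row_num)  # the caller decides how to display progress
        if (row_num - 1) % step != 0:
            continue  # row is read but not sampled
        for name, value in zip(header, row):
            max_len[name] = max(max_len[name], len(value))

    # The scanner knows nothing about any db; the handler does.
    on_result(max_len)


# Usage: a tab-separated source, with results and progress collected in lists.
results, ticks = [], []
scan_csv(
    io.StringIO("a\tb\nx\tyy\nzzz\tw\nq\tvvvv\n"),
    on_result=results.append,
    delimiter="\t",
    progress_fn=ticks.append,
    progress_interval=2,
)
```

Because the result handler and the progress callback are both injected, the same function could feed a DDL generator, a logger, or a GUI without the scanner changing.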
csv2db.py
zip_walker needs to be a class ZipCollection with a base called CSVCollection or something
it's the head interface for instantiating CSVScanner, and outputting CSVScanner.result(), so csv.reader parameters need to go here
create_import_sqlite does a lot of heavy lifting by interfacing with the given DBMS, issuing create table xyz and then inserting rows. Abstractifying some of this would be healthy:
sql dialect
separate out the create and insert into at least separate methods, but probably separate classes
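One possible shape for that separation is sketched below. The class and function names (`Dialect`, `SQLiteDialect`, `create_table`, `insert_rows`) are hypothetical and do not describe how create_import_sqlite is currently written. The dialect only builds SQL strings, and creating the table and inserting rows are separate steps.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Sequence


class Dialect(ABC):
    """Builds SQL text only; it never touches a connection."""

    @abstractmethod
    def create_table_sql(self, table: str, columns: Dict[str, str]) -> str: ...

    @abstractmethod
    def insert_sql(self, table: str, columns: Sequence[str]) -> str: ...


class SQLiteDialect(Dialect):
    def create_table_sql(self, table, columns):
        cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
        return f'CREATE TABLE "{table}" ({cols})'

    def insert_sql(self, table, columns):
        names = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)  # sqlite3 uses qmark placeholders
        return f'INSERT INTO "{table}" ({names}) VALUES ({marks})'


def create_table(conn, dialect: Dialect, table: str, columns: Dict[str, str]) -> None:
    conn.execute(dialect.create_table_sql(table, columns))


def insert_rows(conn, dialect: Dialect, table: str,
                columns: Sequence[str], rows: Iterable[Sequence]) -> None:
    conn.executemany(dialect.insert_sql(table, columns), rows)


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
dialect = SQLiteDialect()
create_table(conn, dialect, "t", {"a": "TEXT", "b": "INTEGER"})
insert_rows(conn, dialect, "t", ["a", "b"], [("x", 1), ("y", 2)])
row_count = conn.execute('SELECT COUNT(*) FROM "t"').fetchone()[0]
```

Supporting another DBMS would then mean adding another `Dialect` subclass, although placeholder style and identifier quoting differ between drivers and would need handling there.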
provide the same type of progress_fn: Callable callback and progress_interval: int that csv2db.py provides. progress_interval: number could be a percent, or an every-n-rows sort of event criteria [8/20]
regex filter file pathname & extension from cli [8/20]
logging: offer a log level setting via argparse
all stdout should be routed to a callback: a caller using only the lib should be responsible for any console or gui output [8/20]
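The cli-facing items above might look something like the following sketch. The option names (`--delimiter`, `--quotechar`, `--include`, `--log-level`, `--progress-interval`) and the helpers `build_parser`, `select_paths`, and `main` are assumptions for illustration, not the script's current interface. Output goes through an `emit` callback that defaults to `print`, so a caller using only the lib can redirect it.

```python
import argparse
import logging
import re
from typing import Callable, Iterable, List, Optional


def regex_type(text: str) -> "re.Pattern[str]":
    """Compile a cli regex, turning bad patterns into a normal argparse error."""
    try:
        return re.compile(text)
    except re.error as exc:
        raise argparse.ArgumentTypeError(f"invalid regex {text!r}: {exc}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Scan CSV files (sketch).")
    parser.add_argument("--delimiter", default=",", help="passed to csv.reader")
    parser.add_argument("--quotechar", default='"', help="passed to csv.reader")
    parser.add_argument("--include", type=regex_type, default=None,
                        metavar="REGEX", help="only process matching pathnames")
    parser.add_argument("--log-level", default="WARNING",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    parser.add_argument("--progress-interval", type=int, default=1000)
    return parser


def select_paths(paths: Iterable[str], pattern: Optional["re.Pattern[str]"]) -> List[str]:
    """Keep only the paths whose name matches the cli regex, if one was given."""
    if pattern is None:
        return list(paths)
    return [p for p in paths if pattern.search(p)]


def main(argv: Optional[List[str]] = None,
         emit: Callable[[str], None] = print) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=getattr(logging, args.log_level))
    emit(f"delimiter={args.delimiter!r} progress_interval={args.progress_interval}")
    return args


# Usage: capture output in a list instead of writing to stdout.
messages: List[str] = []
args = main(["--delimiter", "\t", "--include", r"\.csv$", "--log-level", "DEBUG"],
            emit=messages.append)
kept = select_paths(["data/a.csv", "data/b.txt"], args.include)
```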
2023 Aug 16