-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(#102) Create script to migrate data to S3 bucket. #106
Conversation
Expects CSV file as exported by MS SQL Server Management Studio. (See https://github.com/NCIOCPL/clinical-trials-search-print/wiki/Deployment#data-migration) Closes #102
raise RuntimeError("Please provide the input file name as a command line argument.") | ||
|
||
|
||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can condense to:
not os.environ.get("CTS_BUCKET_NAME", "").strip()
With the caveat that a None
os.environ object (I don't think that's possible?) will throw an error.
loadedCount = 0 | ||
errorCount = 0 | ||
totalCount = 0 | ||
with open(sys.argv[1], encoding="utf-8-sig") as datafile: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity I would put sys.argv[1]
into a variable filename
up at line 32.
errorCount = 0 | ||
totalCount = 0 | ||
with open(sys.argv[1], encoding="utf-8-sig") as datafile: | ||
csv.field_size_limit(sys.maxsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious--why is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sys.maxsize
specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is too small for some of the larger multi-trial pages.
|
||
for row in reader: | ||
|
||
key = row[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Purely a "if you want to minimize your code" option. I believe this works:
key, cacheDate, trialIDs, searchParams, content = row[0:4]
searchParams = row[3] | ||
content = row[4] | ||
|
||
metadata = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find this cleaner:
metadata = {
"migrated-data": "True",
"originally-generated": cacheDate,
"search-criteria": searchParams,
"trial-id-list": trialIDs
}
for row in reader: | ||
|
||
key = row[0] | ||
cacheDate = row[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically python would do cache_date
instead of cacheDate
, per PEP8.
|
||
loadedCount = 0 | ||
errorCount = 0 | ||
totalCount = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically totalCount
isn't needed and could just be len(reader)
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter has comments:
************* Module migrate
migrate.py:43:0: C0103: Constant name "loadedCount" doesn't conform to UPPER_CASE naming style (invalid-name)
migrate.py:44:0: C0103: Constant name "errorCount" doesn't conform to UPPER_CASE naming style (invalid-name)
migrate.py:45:0: C0103: Constant name "totalCount" doesn't conform to UPPER_CASE naming style (invalid-name)
migrate.py:85:15: W0718: Catching too general exception Exception (broad-exception-caught)
The constants aren't actually constants, so they could be ignored. Generally, if you were to use a default function __main__
then this wouldn't pop.
I see the other is a non-AWS catch-all-problems, which is fair.
So for both you could add an exclusion comment for black
to make those go away.
Expects CSV file as exported by MS SQL Server Management Studio. (See https://github.com/NCIOCPL/clinical-trials-search-print/wiki/Deployment#data-migration)
Closes #102