-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EVA-3753: load ENA project to EVAPRO #241
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Mostly high-level comments since obviously there are still a few things left to do, but overall it looks good and I'm excited to start testing it out 👍
return f"{ena_ftp_file_prefix_path}/{accession_id[0:6]}/{accession_id}/{filename}" | ||
|
||
|
||
class EvaProjectLoader(AppLogger): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some way of clarifying the entry points of this class would be useful here, since it's quite large and I assume will be the main way we integrate with the rest of eva-submission. Some combination of:
- having the main public methods at the top of the class - basically organise all the methods "top down" (methods above everything called in their body) (you can also do bottom-up, but personally I prefer top-down)
- put docstrings on the main methods - the method names are pretty self-descriptive (which is already much more readable than the original perl script!), but docstrings help me find the important bits of code and can go into some of the details of what is going on
The same could apply to EnaProjectFinder
, though it's less likely to be reused elsewhere it would still be helpful for future development.
return assembly_set_obj | ||
|
||
def insert_custom_assembly_set(self, taxonomy_obj, custom_assembly): | ||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is a TODO
?
Incidentally I guess we might have to think about how to combine this with anything we re-implemented in eva-submission, e.g. here
@@ -0,0 +1,191 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there's anything in any of this code that we (or ENA) might not want to make public? Just mentioning as the previous script was in a private repo...
$sth_eva->execute($eload,6); | ||
$sth_eva = $dbh_eva->prepare("insert into PROJECT_EVA_SUBMISSION (PROJECT_ACCESSION,old_ticket_id,ELOAD_ID) values (?,?,?)"); | ||
$sth_eva->execute($project_accession,$eload,$eload); | ||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this and any other perl code, unless there are any bits you haven't implemented yet and need to keep track of
return None | ||
|
||
def eva_study_accession_from_project_accession_code(self): | ||
# Is it to really necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I'm kind of leaning towards the strategy of only porting what we know we'll need, then doing really thorough testing to make sure things aren't broken. Maybe it's slightly safer to port everything and gradually remove pieces, but the testing is a necessity either way...
|
||
|
||
def test_find_project_from_ena_database(self): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
def test_find_ena_submission(self): | ||
project = 'PRJEB66443' | ||
# for project in all_projects[700:800]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# for project in all_projects[700:800]: |
Relatively large PR that provide a replacement for
load_from_ena_postgres_or_file.pl
It currently relies on access to the ERA database bt could be converted to using the API in the future.
Only change to the schema is for the Sample table that should match the new design.
To load information about the Sample we need additional information that is only available in the brokering and the initial VCF file.