Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVA-3753: load ENA project to EVAPRO #241

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

tcezard
Copy link
Member

@tcezard tcezard commented Feb 25, 2025

Relatively large PR that provide a replacement for load_from_ena_postgres_or_file.pl
It currently relies on access to the ERA database bt could be converted to using the API in the future.
Only change to the schema is for the Sample table that should match the new design.
To load information about the Sample we need additional information that is only available in the brokering and the initial VCF file.

Copy link
Contributor

@apriltuesday apriltuesday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Mostly high-level comments since obviously there are still a few things left to do, but overall it looks good and I'm excited to start testing it out 👍

return f"{ena_ftp_file_prefix_path}/{accession_id[0:6]}/{accession_id}/{filename}"


class EvaProjectLoader(AppLogger):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some way of clarifying the entry points of this class would be useful here, since it's quite large and I assume will be the main way we integrate with the rest of eva-submission. Some combination of:

  • having the main public methods at the top of the class - basically organise all the methods "top down" (methods above everything called in their body) (you can also do bottom-up, but personally I prefer top-down)
  • put docstrings on the main methods - the method names are pretty self-descriptive (which is already much more readable than the original perl script!), but docstrings help me find the important bits of code and can go into some of the details of what is going on

The same could apply to EnaProjectFinder, though it's less likely to be reused elsewhere it would still be helpful for future development.

return assembly_set_obj

def insert_custom_assembly_set(self, taxonomy_obj, custom_assembly):
'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a TODO?
Incidentally I guess we might have to think about how to combine this with anything we re-implemented in eva-submission, e.g. here

@@ -0,0 +1,191 @@
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there's anything in any of this code that we (or ENA) might not want to make public? Just mentioning as the previous script was in a private repo...

$sth_eva->execute($eload,6);
$sth_eva = $dbh_eva->prepare("insert into PROJECT_EVA_SUBMISSION (PROJECT_ACCESSION,old_ticket_id,ELOAD_ID) values (?,?,?)");
$sth_eva->execute($project_accession,$eload,$eload);
'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this and any other perl code, unless there are any bits you haven't implemented yet and need to keep track of

return None

def eva_study_accession_from_project_accession_code(self):
# Is it to really necessary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I'm kind of leaning towards the strategy of only porting what we know we'll need, then doing really thorough testing to make sure things aren't broken. Maybe it's slightly safer to port everything and gradually remove pieces, but the testing is a necessity either way...



def test_find_project_from_ena_database(self):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change


def test_find_ena_submission(self):
project = 'PRJEB66443'
# for project in all_projects[700:800]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# for project in all_projects[700:800]:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants