Add extraction code for processing fusion caller data #229

Closed
jarbesfeld opened this issue Jan 16, 2025 · 0 comments · Fixed by #230
Labels
enhancement New feature or request priority:medium Medium priority

Comments

@jarbesfeld
Contributor

Feature description

The translators are able to standardize fusion caller output to AssayedFusion objects. In #228 I created pydantic classes for the relevant callers (I plan on ultimately dropping support for FusionMap and MapSplice, since there is no online documentation for them).

We should develop a series of extraction methods that convert fusion caller output to the pydantic classes, to enable downstream standardization. I realized that @jsstevenson implemented a similar feature in the MAVE work for processing score set records, so I think a similar approach can be used here. For example, if we had a CSV file containing 100 detected fusions from JAFFA, we could create a list of 100 JAFFA objects using the following code:

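import csv
from pathlib import Path

# JAFFA is the pydantic class added in #228; import it from wherever it lives in the package
path = Path("../jaffa_results.csv")
fusions_list: list[JAFFA] = []
# Map CSV headers that contain spaces to the model's attribute names
column_rename = {
    "fusion genes": "fusion_genes",
    "spanning reads": "spanning_reads",
    "spanning pairs": "spanning_pairs",
}
with path.open(newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Rename keys without shadowing the loop variable
        renamed = {column_rename.get(key, key): value for key, value in row.items()}
        fusions_list.append(JAFFA(**renamed))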

Example output for the first item in the list:

{'type': <Caller.JAFFA: 'JAFFA'>,
 'fusion_genes': 'RP4-777O23.3:AC005154.6',
 'chrom1': 'chr7',
 'base1': 30550636,
 'chrom2': 'chr7',
 'base2': 30574881,
 'rearrangement': True,
 'classification': 'HighConfidence',
 'inframe': 'FALSE',
 'spanning_reads': 7,
 'spanning_pairs': 1602}

We could then iterate through this list, using from_jaffa to create the standardized AssayedFusion objects.
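
Since each caller would need the same read-rename-construct loop, the JAFFA snippet above could be generalized into one helper shared by all callers. The following is only a sketch of that idea, not the implementation from the linked PR: `extract_records` and `JaffaRow` are hypothetical names, and `JaffaRow` is a plain dataclass standing in for the pydantic JAFFA class so the example runs without the project installed.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def extract_records(
    path: Path,
    model: Callable[..., T],
    column_rename: Optional[dict[str, str]] = None,
) -> list[T]:
    """Read a CSV file and build one ``model`` instance per data row.

    ``extract_records`` is a hypothetical helper name, not part of the project.
    """
    column_rename = column_rename or {}
    records: list[T] = []
    with path.open(newline="") as csvfile:
        for row in csv.DictReader(csvfile):
            # Translate CSV headers to attribute names; unknown headers pass through
            renamed = {column_rename.get(key, key): value for key, value in row.items()}
            records.append(model(**renamed))
    return records


@dataclass
class JaffaRow:
    """Hypothetical stand-in for the pydantic JAFFA class (subset of fields).

    Values stay as strings here; a pydantic model would coerce and validate them.
    """

    fusion_genes: str
    chrom1: str
    base1: str
    spanning_reads: str
```

With the real class, the call would look like `extract_records(Path("../jaffa_results.csv"), JAFFA, column_rename)`, and each caller would only need to supply its own rename mapping.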

Use case

This will make the standardization of the fusion data more efficient and allow for validation checks on the caller output.
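
One way the validation checks could surface during extraction is to collect rows that fail to parse instead of aborting on the first bad row. The sketch below assumes that behavior is wanted; `SpanningCounts` and `extract_with_errors` are hypothetical names, and the dataclass only mimics validation by converting counts with `int()`. With the real pydantic classes, a failed validation raises `ValidationError`, which to my knowledge subclasses `ValueError` and so would be caught by the same `except` clause.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SpanningCounts:
    """Hypothetical stand-in for a caller model with integer read counts."""

    fusion_genes: str
    spanning_reads: int
    spanning_pairs: int

    def __post_init__(self) -> None:
        # int() raises ValueError on non-numeric text, mimicking a failed validation
        self.spanning_reads = int(self.spanning_reads)
        self.spanning_pairs = int(self.spanning_pairs)


def extract_with_errors(
    path: Path,
) -> tuple[list[SpanningCounts], list[tuple[int, str]]]:
    """Return the valid records plus (line number, message) for each failed row."""
    valid: list[SpanningCounts] = []
    errors: list[tuple[int, str]] = []
    with path.open(newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            try:
                valid.append(SpanningCounts(**row))
            except (TypeError, ValueError) as exc:
                # reader.line_num counts lines read so far, header included
                errors.append((reader.line_num, str(exc)))
    return valid, errors
```

Reporting the line number alongside the message makes it easier to trace a rejected fusion back to the caller's output file.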

Acceptance Criteria

Extraction methods have been created for each pydantic class, and the attributes in the pydantic classes have been updated appropriately.

Proposed solution

No response

Alternatives considered

No response

Implementation details

No response

Potential Impact

No response

Additional context

No response

Contribution

Yes, I can create a PR for this feature.
