Set up a pipeline to extract SGD gene names only for a list of test papers #326

valearna · 2025-01-13T18:51:30Z

Run this pipeline on caltech-curation-dev to evaluate how well the extraction works. Select a list of 20 test papers that are in corpus for both WB and SGD and have C. elegans and S. cerevisiae species tags. Store the results in a CSV file with the paper ID and the list of SGD gene IDs and names.

valearna · 2025-01-18T05:02:10Z

test_sgd_genes_extraction_results.csv

valearna · 2025-01-18T05:03:16Z

@draciti @vanaukenk Here are the results of the SGD gene extraction on 20 papers that are in corpus for both WB and SGD.

draciti · 2025-01-27T18:28:28Z

Thanks @valearna!
@vanaukenk and I will check these and get back to you.
@draciti will take the first half, @vanaukenk the second half.
We will evaluate whether the genes extracted for SGD are correct and whether any were missed, including in the papers that had no genes extracted.

draciti · 2025-01-27T20:51:22Z

Moved the doc to Drive: https://docs.google.com/spreadsheets/d/1-f5nHKNLPwRc0ymi-xhPdy3eQPa_LGYtTjQqodf43dE/edit?gid=572566362#gid=572566362

draciti · 2025-01-29T19:35:07Z

@valearna -- Are we also applying TF-IDF with a threshold of 10 for the list of extracted genes for yeast?

valearna · 2025-01-29T19:44:21Z

Yes, we are also using TF-IDF with a threshold of 10 for SGD.

draciti · 2025-01-31T17:31:13Z

@vanaukenk FYI, I am done with the first half.

vanaukenk · 2025-02-01T23:30:43Z

I've finished checking my S. cerevisiae papers.

Generally, I thought the ACKnowledge pipeline did quite well, but one major thing I saw was that we will need to account for protein names that are different from gene names and decide how we want to find those. We missed some SGD gene associations because the mention was actually a protein, for example Puf4p instead of PUF4.
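
One possible way to catch mentions like Puf4p is sketched below. This is only an illustration, not how the ACKnowledge pipeline works: the regular expression and the function name `protein_mentions_to_gene_names` are assumptions, and it covers only standard names of the form three letters plus a number followed by "p". Any candidate it returns would still need to be checked against the actual SGD gene list.

```python
import re

# Assumed shape of a yeast protein mention: three letters (first one
# capitalized, the other two lowercase), a number, and a trailing "p",
# e.g. "Puf4p". Systematic names are deliberately not handled here.
PROTEIN_PATTERN = re.compile(r"\b([A-Z][a-z]{2})(\d+)p\b")


def protein_mentions_to_gene_names(text):
    """Return candidate gene names for protein-style mentions in text.

    The trailing "p" is dropped and the letters are upper-cased, so
    "Puf4p" becomes "PUF4". Results are candidates only and should be
    validated against the real gene list before being recorded.
    """
    return [
        f"{match.group(1).upper()}{match.group(2)}"
        for match in PROTEIN_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    print(protein_mentions_to_gene_names("Puf4p binds the target mRNA."))
```

Mentions already written as gene names (PUF4) are not matched by this pattern, so it could run alongside the existing gene-name matching rather than replace it.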

valearna added the SGD label Jan 13, 2025

valearna assigned draciti and vanaukenk Jan 18, 2025
