Open datasets for evaluation #4

Open
fritzo opened this issue Mar 22, 2022 · 3 comments
Comments

@fritzo
Member

fritzo commented Mar 22, 2022

What are some open datasets for evaluation? These will be needed to answer #3 about hyperparameters and algorithms.

cc @andrenguyen

@fritzo
Member Author

fritzo commented Mar 27, 2022

Moss et al. (2020) (section 5.2 and appendix E) evaluate their algorithm using minimum free-folding energy as the objective function when optimizing short proteins, deferring to ViennaRNA to compute the objective in their experiments. Here is an example where they call the RNAfold utility as a subprocess.

> We acknowledge that [minimizing minimum free-fold energy] may not be biologically meaningful on its own, however, as free-folding energy is of critical importance to other down-stream genetic prediction tasks, we believe it to be a reasonable proxy for wet-lab-based genetic design loops.
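
For anyone wanting to reproduce this kind of objective, here is a minimal sketch of calling RNAfold as a subprocess. It is not the code from Moss et al.; the helper names `parse_mfe` and `rnafold_mfe` are hypothetical, and it assumes the ViennaRNA `RNAfold` executable is on `PATH` and prints the structure line followed by the energy in parentheses.

```python
import re
import subprocess

# Matches a trailing energy such as "( -1.20)" or "(-10.50)" at the end of a line.
_ENERGY = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_mfe(rnafold_output: str) -> float:
    """Extract the minimum free energy (kcal/mol) from RNAfold's text output.

    Assumes the last non-empty line is the dot-bracket structure followed by
    the energy in parentheses.
    """
    lines = [line for line in rnafold_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty RNAfold output")
    match = _ENERGY.search(lines[-1])
    if match is None:
        raise ValueError(f"no energy found in line: {lines[-1]!r}")
    return float(match.group(1))


def rnafold_mfe(sequence: str, executable: str = "RNAfold") -> float:
    """Run RNAfold on one sequence via stdin and return its minimum free energy.

    --noPS suppresses the PostScript structure drawing RNAfold writes by default.
    """
    result = subprocess.run(
        [executable, "--noPS"],
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_mfe(result.stdout)
```

Calling `rnafold_mfe("GGGAAACCC")` requires ViennaRNA to be installed; `parse_mfe` can be checked on its own against a captured output string. To use this as a minimization objective, the returned energy can be passed to the optimizer directly.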

@fritzo
Member Author

fritzo commented Mar 30, 2022

Angermueller et al. (2020) (section 5) provide a number of in-silico benchmarking problems, including tfbind8 and tfbind10.
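
To make the evaluation setup concrete, here is a rough sketch of how a baseline could be scored against an in-silico oracle under a fixed query budget. Everything in it is an assumption for illustration: `toy_oracle` is a made-up stand-in (GC fraction), not tfbind8 binding data, and the length of 8 over the DNA alphabet is only what the name tfbind8 suggests.

```python
import random

ALPHABET = "ACGT"


def toy_oracle(seq: str) -> float:
    """Hypothetical stand-in objective: fraction of G/C bases.

    This is NOT real binding data; a real benchmark would look the
    sequence up in a measured table or query a fitted model.
    """
    return sum(base in "GC" for base in seq) / len(seq)


def random_search(oracle, length: int = 8, budget: int = 100, seed: int = 0):
    """Query `budget` uniformly random sequences and return the best one seen."""
    rng = random.Random(seed)
    best_seq, best_score = None, float("-inf")
    for _ in range(budget):
        seq = "".join(rng.choice(ALPHABET) for _ in range(length))
        score = oracle(seq)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score


best_seq, best_score = random_search(toy_oracle, budget=50)
```

Swapping `toy_oracle` for a real benchmark oracle and `random_search` for the algorithm under test gives a like-for-like comparison at the same budget. A length-8 DNA space has only 4**8 = 65,536 candidates, so an exhaustive search is also feasible as a reference point.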

@EWeinstein
Collaborator

I've worked with Tcellmatch (Fischer et al. 2020) before; it makes predictions based on short sequences (CDR3s), including variable-length sequences. I believe @andrenguyen also has some recent experience with this model.
