
New paper: CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex #40

Open
maykcaldas opened this issue Sep 12, 2024 · 0 comments

Paper: CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex

Authors: Rong Han, Xiaohong Liu, Tong Pan, Jing Xu, Xiaoyu Wang, Wuyang Lan,

Abstract: Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features, unable to capture the binding mechanisms comprehensively. The recently emerging pre-trained language models trained on massive unsupervised sequences of protein and RNA have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying different-domain language models collaboratively for complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that cross-biological modal language models can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information and a bi-scope pre-training strategy for improving Co-Former's interaction understanding. Meanwhile, we build the largest protein-RNA binding affinity dataset PRA310 for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.

Link: https://arxiv.org/abs/2409.03773

Reasoning: We start by examining the title and abstract for any mention of language models. The title mentions "pretrained sequence models," which often refers to language models. The abstract further elaborates on the use of "pre-trained language models trained on massive unsupervised sequences of protein and RNA" and discusses their application in binding affinity prediction. This indicates that the paper involves the use of language models, specifically in the context of biological sequences.
