Paper: CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction
Authors: Rong Han, Xiaohong Liu, Tong Pan, Jing Xu, Xiaoyu Wang, Wuyang Lan,
Abstract: Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features, unable to capture the binding mechanisms comprehensively. The recent emerging pre-trained language models trained on massive unsupervised sequences of protein and RNA have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying different-domain language models collaboratively for complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that cross-biological modal language models can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information and a bi-scope pre-training strategy for improving Co-Former's interaction understanding. Meanwhile, we build the largest protein-RNA binding affinity dataset PRA310 for performance evaluation. We also test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.
Link: https://arxiv.org/abs/2409.03773
Reasoning: We start by examining the title and abstract for any mention of language models. The title mentions "pretrained sequence models," which often refers to language models. The abstract further describes "pre-trained language models trained on massive unsupervised sequences of protein and RNA" and their application to binding affinity prediction. This indicates that the paper involves language models, specifically in the context of biological sequences.
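To make the screening step above concrete, here is a minimal, hypothetical keyword check over a paper's title and abstract. The keyword list, function name, and threshold-free match logic are illustrative assumptions, not part of the paper or any existing tool.

```python
# Hypothetical helper sketching the screening described in the Reasoning above:
# scan a paper's title and abstract for language-model-related phrases.

LM_KEYWORDS = (
    "language model",
    "pre-trained language model",
    "pretrained sequence model",
)

def mentions_language_models(title: str, abstract: str) -> bool:
    """Return True if the title or abstract mentions language models."""
    text = f"{title} {abstract}".lower()
    return any(keyword in text for keyword in LM_KEYWORDS)

# Example: the CoPRA title alone is enough to trigger a match.
print(mentions_language_models(
    "CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures",
    "The recent emerging pre-trained language models trained on massive "
    "unsupervised sequences of protein and RNA...",
))  # True
```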
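The abstract describes a Co-Former that fuses embeddings from a protein language model and an RNA language model, conditioned on complex structure, to predict binding affinity. The sketch below only illustrates the general idea of cross-modal fusion (cross-attention between two pre-computed embedding sets followed by a regression head); the module names, embedding dimensions, and pooling choice are assumptions and do not reproduce the paper's actual Co-Former or its bi-scope pre-training.

```python
import torch
import torch.nn as nn

class CrossModalAffinityHead(nn.Module):
    """Toy fusion module: cross-attend protein and RNA token embeddings,
    then pool and regress a single binding-affinity value.
    Illustrative only; not CoPRA's Co-Former."""

    def __init__(self, protein_dim: int = 1280, rna_dim: int = 640, hidden: int = 256):
        super().__init__()
        self.proj_protein = nn.Linear(protein_dim, hidden)  # map protein-LM embeddings
        self.proj_rna = nn.Linear(rna_dim, hidden)          # map RNA-LM embeddings
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.regressor = nn.Sequential(
            nn.LayerNorm(hidden), nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, protein_emb: torch.Tensor, rna_emb: torch.Tensor) -> torch.Tensor:
        # protein_emb: (batch, Lp, protein_dim); rna_emb: (batch, Lr, rna_dim)
        p = self.proj_protein(protein_emb)
        r = self.proj_rna(rna_emb)
        # protein tokens attend to RNA tokens (one direction shown for brevity)
        fused, _ = self.cross_attn(query=p, key=r, value=r)
        pooled = fused.mean(dim=1)                 # mean-pool over protein positions
        return self.regressor(pooled).squeeze(-1)  # predicted affinity per complex

# Usage with random tensors standing in for language-model outputs:
model = CrossModalAffinityHead()
affinity = model(torch.randn(2, 120, 1280), torch.randn(2, 40, 640))
print(affinity.shape)  # torch.Size([2])
```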