BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations 🔥
🎉July 18 2024: Happy to share that our enhanced version of BioT5+ ranked 1st place in the Text-based Molecule Generation track and 2nd place in the Molecular Captioning track at the Language + Molecule @ ACL2024 Competition!
🔥July 11 2024: Data, code, and pre-trained models for BioT5+ are released.
🔥May 16 2024: BioT5+ is accepted by ACL 2024 (Findings).
🔥Mar 03 2024: We have published a survey paper Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey and the related GitHub repository Awesome-Biomolecule-Language-Cross-Modeling. Feel free to check them out if you are interested in this field!
🔥Feb 29 2024: Updated BioT5 to BioT5+ with IUPAC integration and multi-task learning!
🔥Nov 06 2023: Updated example usage for molecule captioning, text-based molecule generation, and drug-target interaction prediction!
🔥Oct 20 2023: The data for fine-tuning is released!
🔥Oct 19 2023: The pre-trained and fine-tuned models are released!
🔥Oct 11 2023: Initial commits. More code, pre-trained models, and data are coming soon.
This repository contains the source code for
- EMNLP 2023 paper "BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations", by Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan. BioT5 achieves superior performance on various biological tasks.
- ACL 2024 (Findings) paper "BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning", by Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, and Rui Yan. BioT5+ is pre-trained and fine-tuned across a large set of experiments covering 3 problem types (classification, regression, generation), 15 kinds of tasks, and 21 benchmark datasets in total, achieving state-of-the-art results in most cases.
- If you have questions, don't hesitate to open an issue or contact me via [email protected] or Lijun Wu via [email protected]. We are happy to hear from you!
Please refer to the biot5 or biot5_plus folder for detailed instructions.
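For a quick look at how a released checkpoint can be used, below is a minimal sketch of text-based molecule generation with the Hugging Face transformers API. The checkpoint name QizhiPei/biot5-base-text2mol, the instruction-style prompt, and the SELFIES post-processing are assumptions for illustration; please check the biot5 / biot5_plus folders for the actual model identifiers and prompt formats.

```python
# Minimal sketch: text-based molecule generation with a fine-tuned BioT5 checkpoint.
# The checkpoint name and prompt template below are assumptions; see the
# biot5 / biot5_plus folders for the released model names and exact formats.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "QizhiPei/biot5-base-text2mol"  # hypothetical Hugging Face Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# BioT5 is instruction-tuned, so the input combines a task definition with the query text.
task_definition = (
    "Definition: You are given a molecule description in English. "
    "Your job is to generate the molecule SELFIES that fits the description.\n\n"
)
description = "The molecule is a monocarboxylic acid anion obtained by deprotonation of acetic acid."
prompt = f"{task_definition}Now complete the following example -\nInput: {description}\nOutput: "

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=512, num_beams=1)

# BioT5 represents molecules as SELFIES tokens separated by spaces; join them back together.
selfies_output = tokenizer.decode(outputs[0], skip_special_tokens=True).replace(" ", "")
print(selfies_output)
```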
@inproceedings{pei2023biot5,
  title={BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations},
  author={Pei, Qizhi and Zhang, Wei and Zhu, Jinhua and Wu, Kehan and Gao, Kaiyuan and Wu, Lijun and Xia, Yingce and Yan, Rui},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  month=dec,
  year={2023},
  publisher={Association for Computational Linguistics},
  url={https://aclanthology.org/2023.emnlp-main.70},
  pages={1102--1123}
}
@article{pei2024biot5+,
  title={BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning},
  author={Pei, Qizhi and Wu, Lijun and Gao, Kaiyuan and Liang, Xiaozhuan and Fang, Yin and Zhu, Jinhua and Xie, Shufang and Qin, Tao and Yan, Rui},
  journal={arXiv preprint arXiv:2402.17810},
  year={2024}
}
The code is based on nanoT5.