
Releases: open-compass/opencompass

0.3.6

19 Nov 03:54
ff831b1

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.6!

🌟 Highlights
✨ This release brings several updates and new features that enhance the functionality and performance of OpenCompass. Notable additions include support for long-context evaluations, the introduction of the BABILong dataset, and enhancements to the MuSR dataset. We are also excited to welcome new contributors to our community.

🚀 New Features
🔥 Added long-context evaluation for base models, expanding the scope of model assessments.
🔥 Introduced the BABILong dataset, enriching the resources available for research and development.
🔥 Added MuSR dataset evaluation, which assesses language models on multistep soft reasoning tasks.

📖 Documentation
📚 Updated documentation to reflect the latest changes and features, ensuring that users can easily integrate these updates into their workflows.

🐛 Bug Fixes
🛠 Fixed issues with first_option_postprocess to improve reliability.
🛠 Addressed bugs in the PR testing process to ensure smoother contributions from the community.

⚙ Enhancements and Refactors
🔧 Implemented auto-download for FollowBench, streamlining the setup process for new users.
🔧 Refined the CI/CD pipeline, including daily tests and baseline updates, to maintain high standards of quality and performance.

🎉 Welcome New Contributors
👏 We are delighted to welcome three new contributors who have made valuable contributions to this release:

  1. @MCplayerFromPRC for pushing InternTrain evaluation differences.
  2. @DespairL for adding single LoRA adapter support for vLLM inference.
  3. @abrohamLee for contributing MuSR Dataset Evaluation.

We hope you enjoy this new release and find it useful for your projects. Your feedback is always welcome and helps us improve OpenCompass continuously. Thank you for being part of our community! 🌟

Full Changelog: 0.3.5...0.3.6

0.3.5

04 Nov 02:56
db258eb

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.5!

🌟 Highlights

  • 🚀 Introduction of two new datasets: CMO&AIME, expanding our evaluation capabilities.
  • 📖 Several updates to our documentation, ensuring clearer guidance for all users.
  • ⚙ Several enhancements and refactoring efforts to make our codebase more robust and maintainable.

🚀 New Features

  • 🆕 Added support for the CMO&AIME datasets, broadening the scope of models we can evaluate. (#1610)
  • 🆕 Introduced the CompassArenaSubjectiveBench, a new benchmark for subjective evaluations. (#1645)
  • 🆕 Added configurations for the lmdeploy DeepSeek model, enhancing compatibility with cutting-edge technologies. (#1656)

📖 Documentation

  • 📚 Updated the documentation to reflect the latest changes and improvements, making it easier than ever to navigate and understand. (#1655)

🐛 Bug Fixes

  • 🔧 Fixed issues with the ruler_16k_gen component, ensuring more accurate and reliable results. (#1643)
  • 🔧 Resolved an error in the get_loglikelihood function when using lmdeploy as the accelerator. (#1659)
  • 🔧 Addressed problems with automatic downloads for certain datasets, streamlining the user experience. (#1652)

⚙ Enhancements and Refactors

  • 💪 Enhanced the summarizer configurations for models, improving the efficiency and effectiveness of summarization tasks. (#1600)
  • 💪 Added new model configurations, keeping up with the latest advancements in machine learning. (#1653)
  • 💪 Updated the WildBench maximum sequence length, allowing for better handling of longer input sequences. (#1648)
  • 💪 Updated the Needlebench OSS path, ensuring smoother data access and processing. (#1651)
  • 💪 Improved the mmmlu_lite dataloader, optimizing data loading processes. (#1658)

🎉 Welcome New Contributors

  • 👏 A warm welcome to @jnanliu, who has made their first contribution by adding the CMO&AIME datasets! (#1610)

For a complete overview of all changes, please refer to the full changelog: 0.3.4...0.3.5

0.3.4

25 Oct 12:25
9c39cb6

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.4!

🎉 OpenCompass v0.3.4 brings major enhancements including new benchmarks, improved documentation, and numerous bug fixes.
🌈 Notable features include support for new datasets and the integration of lmdeploy pipeline API.

🔧 Support for New Datasets:

  • Addition of GaoKaoMath Dataset for Evaluation.
  • Support for MMMLU & MMMLU-lite Benchmark.
  • Integration of Judgerbench and reorganization of subeval.
  • Support for LiveCodeBench.

📝 Output Format Enhancements:

  • Support for printing and saving results as markdown format tables.
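As a rough illustration of what markdown-table output looks like, here is a minimal, self-contained sketch of rendering a `{model: {dataset: score}}` results mapping as a markdown table. This is not OpenCompass's actual summarizer code; the function name and result shape are assumptions for illustration.

```python
# Minimal sketch (not OpenCompass's implementation) of rendering
# evaluation results as a markdown table.
def results_to_markdown(results: dict) -> str:
    """Render {model: {dataset: score}} as a markdown table string."""
    # Collect the union of dataset names across all models, sorted for stable columns.
    datasets = sorted({d for scores in results.values() for d in scores})
    header = "| model | " + " | ".join(datasets) + " |"
    divider = "|" + " --- |" * (len(datasets) + 1)
    rows = [
        "| " + model + " | "
        + " | ".join(f"{scores.get(d, float('nan')):.2f}" for d in datasets)
        + " |"
        for model, scores in results.items()
    ]
    return "\n".join([header, divider, *rows])
```

A table built this way renders directly in GitHub-flavored markdown, which is what makes saved results easy to paste into issues and reports.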

🔧 Pipeline and Integration Improvements:

  • Integration of lmdeploy pipeline API.
  • Update of TurboMindModel through integration of lmdeploy pipeline API.
  • Removal of prefix bos_token from messages when using lmdeploy as the accelerator.

🛠️ Miscellaneous Enhancements:

  • Updates to the common summarizer regex extraction.
  • Internal humaneval postprocess addition and updates.

📖 Documentation Updates

🐛 Bug Fixes

🎉 Welcome New Contributors
👋 New Contributors Joined the Team:

@BobTsang1995 - Contributed support for MMMLU & MMMLU-lite Benchmark.
@noemotiovon - Provided NPU support fixes.
@changlan - Fixed RULER datasets.
@BIGWangYuDong - Added support for printing and saving results as markdown format tables.
Thank you to all contributors who have made this release possible. For a complete list of changes, please see the full changelog linked below.

Full Changelog: 0.3.3...0.3.4

0.3.3

30 Sep 08:58
22a4e76

🌟 OpenCompass v0.3.3 Release Log
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.3!

🚀 New Features

  • 🔧 Added support for the SciCode summarizer configuration.
  • 🛠 Introduced support for internal Followbench.
  • 🔧 Updated models and configurations for MathBench & WikiBench under FullBench.
  • 🛠 Enhanced support for OpenAI O1 models and Qwen2.5 Instruct.
  • 🔧 Included a postprocess function for custom models.
  • 🛠 Added InternTrain feature for broader model training scenarios.

📖 Documentation

  • 📚 Updated the README with the latest information on how to use OpenCompass effectively.

🐛 Bug Fixes

  • 🔧 Fixed issues with the link-check workflow and wildbench.
  • 🛠 Resolved errors in partitioning and corrected typos throughout the codebase.
  • 🔧 Addressed compatibility issues with lmdeploy interface type changes.
  • 🛠 Fixed the followbench dataset configuration and token settings.

⚙ Enhancements and Refactors

  • 🛠 Enhanced support for verbose output in OpenAI API interactions.
  • 🔧 Updated maximum output length configurations for multiple models.
  • 🛠 Improved handling of the "begin section" in meta_template for better parsing.
  • 🔧 Added a common summarizer for qabench and expanded test coverage for various models.

🎉 Welcome New Contributors
👋 We'd like to extend a warm welcome to the new contributors who made their first contributions to OpenCompass in this release.

Thank you to all our contributors for making this release possible!

Full Changelog: 0.3.2.post1...0.3.3

0.3.2.post1

06 Sep 10:48
b5f8afb

What's Changed

Full Changelog: 0.3.2...0.3.2.post1

0.3.2

06 Sep 08:21
ff18545

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.2!

🚀 New Features

  • 🛠 Added extra_body support for OpenAISDK and introduced proxy URL support when connecting to OpenAI's API.
  • 🗂 Included auto-download functionality for MMLU-Pro, NeedleBench, LongBench, and other datasets.
  • 🤝 Integrated support for the Rendu API.
  • 🧪 Added a model postprocess function.
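To give a sense of how the new OpenAISDK options might be used, here is a hedged sketch of a model config dict. The field names (`openai_proxy_url`, `key="ENV"`) are illustrative assumptions, not the exact OpenCompass signature; check the model's docstring for the real parameters.

```python
# Hypothetical sketch of an OpenAISDK model config using the new
# extra_body and proxy-URL options added in this release. Field names
# are assumptions for illustration; verify against opencompass.models.OpenAISDK.
openai_model = dict(
    abbr="gpt-4o-mini-sdk",
    type="OpenAISDK",                              # opencompass.models.OpenAISDK
    path="gpt-4o-mini",
    key="ENV",                                     # read the key from OPENAI_API_KEY
    openai_proxy_url="http://proxy.example:8080",  # assumed name for the proxy option
    extra_body={"user": "opencompass-eval"},       # forwarded verbatim to the API
)
```

The `extra_body` dict is the escape hatch the OpenAI Python SDK provides for request fields that are not first-class parameters, so passing it through lets evaluations target OpenAI-compatible endpoints with vendor-specific options.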

📖 Documentation

  • 📜 Updated the README file for better clarity and guidance.

🐛 Bug Fixes

  • 🛠 Fixed CLI evaluation for multiple models.
  • 🛠 Updated requirements to resolve dependency issues.
  • 🛠 Corrected configurations for the Llama model series.
  • 🛠 Addressed bad cases and added environment information to improve testing.

⚙ Enhancements and Refactors

  • 🛠 Made OPENAI_API_BASE compatible with OpenAI's default environment settings.
  • 🛠 Optimized SciCode for improved performance.
  • 🛠 Added an api_key attribute to TurboMindAPIModel.
  • 🛠 Implemented fixes and improvements to the CI test environment, including baselines for vllm.

🎉 Welcome New Contributors

  • 👋 @cpa2001 contributed with the addition of icl_sliding_k_retriever.py and updates to __init__.py.
  • 👋 @gyin94 made the OPENAI_API_BASE compatible with OpenAI's default environment.
  • 👋 @chengyingshe added an attribute api_key into TurboMindAPIModel.
  • 👋 @yanzeyu supported the integration of Rendu API.

Full Changelog: 0.3.1...0.3.2

OpenCompass v0.3.1

23 Aug 03:00
5485207

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.1!


🌟 Highlights

  • 🚀 Support pip installation; updated the README and evaluation demo.
  • 🐛 Fixed various dataset loading issues.
  • ⚙️ Enhanced auto-download features for datasets.

🚀 New Features

  • 🆕 Introduced support for Ruler datasets.
  • 🆕 Enhanced model compatibility.
  • 🆕 Improved dataset handling, with auto-download support for various datasets.

📖 Documentation

  • 📚 Updated README to reflect the latest changes.
  • 📚 Improved documentation for dataset loading procedures.

🐛 Bug Fixes

  • 🐞 Resolved modelscope dataset load issues.
  • 🐞 Corrected evaluation scores for the Lawbench dataset.
  • 🐞 Fixed dataset bugs for CommonsenseQA and Longbench.

⚙ Enhancements and Refactors

  • 🔧 Retained first and last halves of prompts to avoid max_seq_len issues.
  • 🔧 Updated Compassbench to v1.3.
  • 🔧 Switched to Python runner for single GPU operations.

🎉 Welcome New Contributors

  • 🙌 @Yunnglin for fixing modelscope dataset load problem.
  • 🙌 @changyeyu for addressing max_seq_len issues with prompt handling.
  • 🙌 @seetimee for updates to openai_api.py.
  • 🙌 @HariSeldon0 for adding the scicode dataset.


Full Changelog: 0.3.0...0.3.1


Thank you for your continued support and contributions to OpenCompass!

OpenCompass v0.3.0

06 Aug 17:34
264fd23

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.0! This release brings a variety of new features, enhancements, and bug fixes to improve your experience.

🌟 Highlights

  1. Support for OpenAI ChatCompletion
  2. Updated Model Support List
  3. Support Dataset Automatic Download
  4. Support pip install opencompass
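Dataset auto-download is typically switched on through environment variables set before an evaluation run. A minimal sketch, assuming the `DATASET_SOURCE` and `COMPASS_DATA_CACHE` variable names described in the OpenCompass docs (treat the exact names and values as assumptions to verify there):

```python
import os

# Sketch of enabling automatic dataset download before launching a run.
# DATASET_SOURCE selects where datasets are fetched from on first use;
# COMPASS_DATA_CACHE controls where they are stored locally.
os.environ["DATASET_SOURCE"] = "ModelScope"  # pull missing datasets from ModelScope
os.environ["COMPASS_DATA_CACHE"] = os.path.expanduser("~/.cache/opencompass")
```

With these set, a `pip install opencompass` followed by a demo evaluation can fetch its datasets on demand instead of requiring a manual download step.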

🚀 New Features

  1. Support for CompassBench Checklist Evaluation
  2. Adding support for Doubao API
  3. Support for ModelScope Datasets

📖 Documentation

  1. Update NeedleBench Docs
  2. Update Documentation

🐛 Bug Fixes

  1. Fix Typing and Typo
  2. Fix Lint Issues
  3. Fix Summary Error in subjective.py

⚙ Enhancements and Refactors

  1. Upgrade Default Math pred_postprocessor
  2. Fix Path and Folder Updates
  3. Update Get Data Path for LCBench and HumanEval

🎉 Welcome New Contributors

Full Changelog: 0.2.6...0.3.0

OpenCompass v0.2.6

05 Jul 16:36
a62c613

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.6!

🌟 Highlights

  • No noteworthy highlights.

🚀 New Features

  1. #1215 #1224 #1266 Add Datasets MT-Bench-101, Fofo, wildbench
  2. #1286 Add Models InternLM2.5-7B

📖 Documentation

  1. #1252 Add doc for accelerator function
  2. #1263 Update quick start guide

🐛 Bug Fixes

  1. #1221 Resolve release version installation and import issues
  2. #1228 Fix pip version issues
  3. #1282 Update MathBench summarizer & fix cot setting

⚙ Enhancements and Refactors

  1. #1284 Reorganize subjective eval

🎉 Welcome New Contributors

Full Changelog: 0.2.5...0.2.6

OpenCompass v0.2.5

29 May 16:35
a77b8a5

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.5!

🌟 Highlights

  • Simplified the HuggingFace / vLLM / LMDeploy model wrappers; meta_template no longer needs to be hand-crafted in model configs.
  • Introduced evaluation-results READMEs in ~20 dataset config folders.

🚀 New Features

  1. #1065 Add LLaMA-3 Series Configs
  2. #1048 Add TheoremQA with 5-shot
  3. #1094 Support Math evaluation via judgemodel
  4. #1080 Add gpqa prompt from simple_evals, openai
  5. #1074 Add mmlu prompt from simple_evals, openai
  6. #1123 Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs

📖 Documentation

  1. #1053 Update readme
  2. #1102 Update NeedleInAHaystack Docs
  3. #1110 Update README.md
  4. #1205 Remove --no-batch-padding and Use --hf-num-gpus

🐛 Bug Fixes

  1. #1036 Update setup.py install_requires
  2. #1051 Fixed the issue caused
  3. #1043 fix multiround
  4. #1070 Fix sequential runner
  5. #1079 Fix Llama-3 meta template

⚙ Enhancements and Refactors

  1. #1163 Enable HuggingFacewithChatTemplate with --accelerator via CLI
  2. #1104 fix prompt template
  3. #1109 Update performance of common benchmarks

🎉 Welcome New Contributors
