0.3.6
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.6!
Highlights
This release brings several updates and new features to OpenCompass. Notable additions include long context evaluation for base models, the BABILong dataset, and MuSR dataset evaluation. We also welcome three new contributors to our community.
New Features
- Added long context evaluation for base models, expanding the scope of model assessments.
- Introduced the BABILong dataset, enriching the resources available for research and development.
- Added MuSR dataset evaluation, which evaluates language models on multistep soft reasoning tasks.
Documentation
- Updated documentation to reflect the latest changes and features, so that users can easily integrate these updates into their workflows.
Bug Fixes
- Fixed issues with first_option_postprocess to improve reliability.
- Addressed bugs in the PR testing process to ensure smoother contributions from the community.
Enhancements and Refactors
- Implemented auto-download for FollowBench, streamlining the setup process for new users.
- Refined the CI/CD pipeline, including daily tests and baseline updates, to maintain high standards of quality and performance.
Welcome New Contributors
We are delighted to welcome three new contributors who have made valuable contributions to this release:
- @MCplayerFromPRC for pushing InternTrain evaluation differences.
- @DespairL for adding single LoRA adapter support for vLLM inference.
- @abrohamLee for contributing MuSR Dataset Evaluation.
We hope you enjoy this new release and find it useful for your projects. Your feedback is always welcome and helps us improve OpenCompass continuously. Thank you for being part of our community!
Full Changelog: 0.3.5...0.3.6