
New paper: Elsevier Arena: Human Evaluation of Chemistry/Biology/Health #38

Open
maykcaldas opened this issue Sep 12, 2024 · 0 comments

@maykcaldas (Collaborator)
Paper: Elsevier Arena: Human Evaluation of Chemistry/Biology/Health

Authors: Camilo Thorne, Christian Druckenbrodt, Kinga Szarkowska, Deepika

Abstract: The quality and capabilities of large language models cannot currently be fully assessed with automated, benchmark evaluations. Instead, human evaluations that expand on traditional qualitative techniques from the natural language generation literature are required. One recent best practice consists in using A/B-testing frameworks, which capture the preferences of human evaluators for specific models. In this paper we describe a human evaluation experiment focused on the biomedical domain (health, biology, chemistry/pharmacology) carried out at Elsevier. In it, a large but not massive (8.8B parameter) decoder-only foundational transformer, trained on a relatively small (135B tokens) but highly curated collection of Elsevier datasets, is compared against multiple criteria to OpenAI's GPT-3.5-turbo and Meta's foundational 7B parameter Llama 2 model. Results indicate, even if IRR scores were generally low, a preference towards GPT-3.5-turbo, and hence towards models that possess conversational abilities, are very large, and were trained on very large datasets. At the same time, they indicate that for less massive models, training on smaller but well-curated training sets can potentially give rise to viable alternatives in the biomedical domain.
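In concrete terms, the A/B-testing setup described in the abstract reduces to counting which model each rater prefers per prompt, and the reported IRR can be measured with a statistic such as Cohen's kappa. Here is a minimal, self-contained sketch of both computations; the rater votes and model names below are hypothetical, not data from the paper:

```python
from collections import Counter

def preference_share(votes):
    """Fraction of A/B votes won by each model (ties excluded)."""
    counts = Counter(v for v in votes if v != "tie")
    total = sum(counts.values())
    return {model: n / total for model, n in counts.items()}

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently
    # according to their own marginal label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical votes from two raters on the same five prompts.
r1 = ["gpt-3.5-turbo", "elsevier-8.8b", "gpt-3.5-turbo", "tie", "gpt-3.5-turbo"]
r2 = ["gpt-3.5-turbo", "gpt-3.5-turbo", "gpt-3.5-turbo", "tie", "elsevier-8.8b"]

print(preference_share(r1 + r2))  # pooled preference shares
print(cohens_kappa(r1, r2))       # low kappa signals weak inter-rater agreement
```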

Link: https://arxiv.org/abs/2409.05486

Reasoning: Let's think step by step in order to produce the answer. We need to determine if the paper is about a language model. The abstract mentions the evaluation of large language models, specifically comparing a custom model to GPT-3.5-turbo and Llama 2. It discusses the performance and preferences of these models in the biomedical domain. Since the focus is on evaluating and comparing language models, it is clear that the paper is about language models.
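The "Let's think step by step in order to produce the answer" phrasing matches a chain-of-thought rationale template, suggesting the label above came from an automated relevance filter. A minimal sketch of such a filter using the OpenAI Python client follows; the actual pipeline, model choice, and prompt used by this repository are not shown in the issue, so everything here is an assumption:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the repository's real template may differ.
PROMPT = (
    "Decide whether the paper below is about language models.\n"
    "Think step by step, then end with a final line reading "
    "'Answer: yes' or 'Answer: no'.\n\n"
    "Title: {title}\n\nAbstract: {abstract}"
)

def is_about_language_models(title: str, abstract: str) -> bool:
    """Chain-of-thought relevance classification via an LLM (sketch)."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; any chat model works
        messages=[
            {"role": "user", "content": PROMPT.format(title=title, abstract=abstract)}
        ],
        temperature=0,
    )
    text = resp.choices[0].message.content
    # Keep only the final verdict; the intermediate reasoning is discarded.
    return text.strip().lower().endswith("yes")
```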
