From 92a1235126c27d47bdea98381e3ff76537d4f253 Mon Sep 17 00:00:00 2001
From: Jon Craton
Date: Fri, 23 Feb 2024 19:11:10 -0500
Subject: [PATCH] Update paper

---
 makefile  |  2 +-
 paper.bib | 31 +++++++++++++++++++++++++++++++
 paper.md  | 17 ++++++++---------
 3 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/makefile b/makefile
index cf42794..d667cac 100644
--- a/makefile
+++ b/makefile
@@ -32,7 +32,7 @@ doc:
 	python3 -m pdoc -o doc languagemodels
 
 paper.pdf: paper.md paper.bib
-	pandoc $< --citeproc -o $@
+	pandoc $< --citeproc --pdf-engine=xelatex -o $@
 
 spellcheck:
 	aspell -c --dont-backup readme.md
diff --git a/paper.bib b/paper.bib
index 46b2174..a548dbb 100644
--- a/paper.bib
+++ b/paper.bib
@@ -192,3 +192,34 @@ @article{zhao2023survey
   journal={arXiv preprint arXiv:2303.18223},
   year={2023}
 }
+
+@inproceedings{ctranslate2,
+  title={The OpenNMT neural machine translation toolkit: 2020 edition},
+  author={Klein, Guillaume and Hernandez, Fran{\c{c}}ois and Nguyen, Vincent and Senellart, Jean},
+  booktitle={Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)},
+  pages={102--109},
+  year={2020}
+}
+
+@article{lamini-lm,
+  author = {Minghao Wu and
+            Abdul Waheed and
+            Chiyu Zhang and
+            Muhammad Abdul-Mageed and
+            Alham Fikri Aji
+  },
+  title = {LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
+  journal = {CoRR},
+  volume = {abs/2304.14402},
+  year = {2023},
+  url = {https://arxiv.org/abs/2304.14402},
+  eprinttype = {arXiv},
+  eprint = {2304.14402}
+}
+
+@article{openchat,
+  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},
+  author={Wang, Guan and Cheng, Sijie and Zhan, Xianyuan and Li, Xiangang and Song, Sen and Liu, Yang},
+  journal={arXiv preprint arXiv:2309.11235},
+  year={2023}
+}
\ No newline at end of file
diff --git a/paper.md b/paper.md
index 1594fde..3d64224 100644
--- a/paper.md
+++ b/paper.md
@@ -22,17 +22,17 @@ bibliography: paper.bib
 
 # Statement of Need
 
-Large language models are starting to change the way software is designed [@mialon2023augmented]. The development of the transformer [@vaswani2017attention] has led to rapid progress in many NLP and generative tasks [@zhao2023survey; @bert; @gpt2; @gpt3; @t5; @palm; @flan-t5; @bubeck2023sparks]. These models are becoming more powerful as they scale in both parameters [@kaplan2020scaling] and training data [@hoffmann2022training].
+Large language models are having an impact on the way software is designed [@mialon2023augmented]. The development of the transformer [@vaswani2017attention] has led to rapid progress in many NLP and generative tasks [@zhao2023survey; @bert; @gpt2; @gpt3; @t5; @palm; @flan-t5; @bubeck2023sparks]. These models are becoming more powerful as they scale in both parameters [@kaplan2020scaling] and training data [@hoffmann2022training].
 
 Early research suggests that there are many tasks performed by humans that can be transformed by LLMs [@eloundou2023gpts]. For example, large language models trained on code [@codex] are already being used as capable pair programmers via tools such as Microsoft's Copilot. To build with these technologies, students need to understand their capabilities and begin to learn new paradigms for programming.
 
-There are many software tools already available for working with large language models [@hftransformers; @pytorch; @tensorflow; @langchain; @llamacpp; @gpt4all]. While these options serve the needs of software engineers, researchers, and hobbyists, they may not be simple enough for new learners. This package aims to radically lower the barriers to entry for using these tools to solve problems.
+There are many software tools already available for working with large language models [@hftransformers; @pytorch; @tensorflow; @langchain; @llamacpp; @gpt4all]. While these options serve the needs of software engineers, researchers, and hobbyists, they may not be simple enough for new learners. This package aims to lower the barriers to entry for using these tools in an educational context.
 
 \newpage
 
 # Example Usage
 
-This package eliminates boilerplate and configuration options that are meaningless to new learners, and uses basic types and simple functions. Here's an example from a Python REPL session:
+This package eliminates boilerplate and configuration options that create noise for new learners, and uses basic types and simple functions. Here's an example from a Python REPL session:
 
 ```python
 >>> import languagemodels as lm
@@ -44,9 +44,9 @@ This package eliminates boilerplate and configuration options that are meaningle
 'Hello, world!'
 
 >>> lm.do("What is the capital of France?")
-'paris'
+'Paris.'
 
->>> lm.classify("Language models are useful", "positive", "negative")
+>>> lm.do("Classify as positive or negative: I like games", choices=["positive", "negative"])
 'positive'
 
 >>> lm.extract_answer("What color is the ball?", "There is a green ball and a red box")
@@ -58,7 +58,7 @@ This package eliminates boilerplate and configuration options that are meaningle
 >>> lm.store_doc(lm.get_wiki("Python"), "Python")
 >>> lm.store_doc(lm.get_wiki("Javascript"), "Javascript")
 >>> lm.get_doc_context("What does it mean for batteries to be included in a language?")
-'Python: It is often described as a "batteries included" language due to its comprehensive standard library...
+'From Python document: It is often described as a "batteries included" language due to its comprehensive standard library...
 ```
 
 # Features
@@ -67,8 +67,7 @@ Despite its simplicity, this package provides a number of building blocks that c
 
 - Text generation via the `complete` function
 - Instruction following with the `do` function
-- Chat-style inference using `chat` function
-- Zero-shot classification with the `classify` function
+- Zero-shot classification with the `do` function and `choices` parameter
 - Semantic search via a document store using the `store_doc` and `get_doc_context` functions
 - Extractive question answering using the `extract_answer` function
 - Basic web retrieval using the `get_wiki` function
@@ -85,7 +84,7 @@ The package includes the following features under the hood
 
 The design of this software package allows its internals to be loosely coupled to the models and inference engines it uses. At the time of writing, rapid progress is being made to speed up inference on consumer hardware, but much of this software is difficult to install and may not work well for all learners.
 
-This package currently uses the Hugging Face Transformers library [@hftransformers], which internally uses PyTorch [@pytorch] for inference. The main model used is a variant of the T5 base model [@t5] that has been fine-tuned to better follow instructions [@flan-t5]. Models that focus on inference efficiency are starting to become available [@llama]. It will be possible to replace the internals of this package with more powerful and efficient models in the future. In addition to simple local inference, it is also possible to provide API keys to the package to allow access to more powerful hosted inference services.
+This package currently uses CTranslate2 [@ctranslate2] for efficient inference on CPU and GPU. The main models used include Flan-T5 [@flan-t5], LaMini-LM [@lamini-lm], and OpenChat [@openchat]. The default models used by this package can be swapped out in future versions to provide improved generation quality.
 
 # Future work
 