Update paper
jncraton committed Feb 24, 2024
1 parent 8cd62fc commit 80ecd6a
Showing 3 changed files with 41 additions and 11 deletions.
2 changes: 1 addition & 1 deletion makefile
@@ -32,7 +32,7 @@ doc:
python3 -m pdoc -o doc languagemodels

paper.pdf: paper.md paper.bib
-pandoc $< --citeproc -o $@
+pandoc $< --citeproc --pdf-engine=xelatex -o $@

spellcheck:
aspell -c --dont-backup readme.md
31 changes: 31 additions & 0 deletions paper.bib
@@ -192,3 +192,34 @@ @article{zhao2023survey
journal={arXiv preprint arXiv:2303.18223},
year={2023}
}
+
+@inproceedings{ctranslate2,
+title={The OpenNMT neural machine translation toolkit: 2020 edition},
+author={Klein, Guillaume and Hernandez, Fran{\c{c}}ois and Nguyen, Vincent and Senellart, Jean},
+booktitle={Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)},
+pages={102--109},
+year={2020}
+}
+
+@article{lamini-lm,
+author = {Minghao Wu and
+Abdul Waheed and
+Chiyu Zhang and
+Muhammad Abdul-Mageed and
+Alham Fikri Aji
+},
+title = {LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
+journal = {CoRR},
+volume = {abs/2304.14402},
+year = {2023},
+url = {https://arxiv.org/abs/2304.14402},
+eprinttype = {arXiv},
+eprint = {2304.14402}
+}
+
+@article{openchat,
+title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},
+author={Wang, Guan and Cheng, Sijie and Zhan, Xianyuan and Li, Xiangang and Song, Sen and Liu, Yang},
+journal={arXiv preprint arXiv:2309.11235},
+year={2023}
+}
19 changes: 9 additions & 10 deletions paper.md
@@ -22,17 +22,17 @@ bibliography: paper.bib

# Statement of Need

-Large language models are starting to change the way software is designed [@mialon2023augmented]. The development of the transformer [@vaswani2017attention] has led to rapid progress in many NLP and generative tasks [@zhao2023survey; @bert; @gpt2; @gpt3; @t5; @palm; @flan-t5; @bubeck2023sparks]. These models are becoming more powerful as they scale in both parameters [@kaplan2020scaling] and training data [@hoffmann2022training].
+Large language models are having an impact on the way software is designed [@mialon2023augmented]. The development of the transformer [@vaswani2017attention] has led to rapid progress in many NLP and generative tasks [@zhao2023survey; @bert; @gpt2; @gpt3; @t5; @palm; @flan-t5; @bubeck2023sparks]. These models are becoming more powerful as they scale in both parameters [@kaplan2020scaling] and training data [@hoffmann2022training].

Early research suggests that there are many tasks performed by humans that can be transformed by LLMs [@eloundou2023gpts]. For example, large language models trained on code [@codex] are already being used as capable pair programmers via tools such as Microsoft's Copilot. To build with these technologies, students need to understand their capabilities and begin to learn new paradigms for programming.

-There are many software tools already available for working with large language models [@hftransformers; @pytorch; @tensorflow; @langchain; @llamacpp; @gpt4all]. While these options serve the needs of software engineers, researchers, and hobbyists, they may not be simple enough for new learners. This package aims to radically lower the barriers to entry for using these tools to solve problems.
+There are many software tools already available for working with large language models [@hftransformers; @pytorch; @tensorflow; @langchain; @llamacpp; @gpt4all]. While these options serve the needs of software engineers, researchers, and hobbyists, they may not be simple enough for new learners. This package aims to lower the barriers to entry for using these tools in an educational context.

\newpage

# Example Usage

-This package eliminates boilerplate and configuration options that are meaningless to new learners, and uses basic types and simple functions. Here's an example from a Python REPL session:
+This package eliminates boilerplate and configuration options that create noise for new learners, and uses basic types and simple functions. Here's an example from a Python REPL session:

```python
>>> import languagemodels as lm
@@ -44,9 +44,9 @@ This package eliminates boilerplate and configuration options that are meaningle
'Hello, world!'

>>> lm.do("What is the capital of France?")
-'paris'
+'Paris.'

->>> lm.classify("Language models are useful", "positive", "negative")
+>>> lm.do("Classify as positive or negative: I like games", choices=["positive", "negative"])
'positive'

>>> lm.extract_answer("What color is the ball?", "There is a green ball and a red box")
@@ -58,7 +58,7 @@ This package eliminates boilerplate and configuration options that are meaningle
>>> lm.store_doc(lm.get_wiki("Python"), "Python")
>>> lm.store_doc(lm.get_wiki("Javascript"), "Javascript")
>>> lm.get_doc_context("What does it mean for batteries to be included in a language?")
-'Python: It is often described as a "batteries included" language due to its comprehensive standard library...
+'From Python document: It is often described as a "batteries included" language due to its comprehensive standard library...
```

# Features
@@ -67,8 +67,7 @@ Despite its simplicity, this package provides a number of building blocks that c

- Text generation via the `complete` function
- Instruction following with the `do` function
-- Chat-style inference using `chat` function
-- Zero-shot classification with the `classify` function
+- Zero-shot classification with the `do` function and `choices` parameter
- Semantic search via a document store using the `store_doc` and `get_doc_context` functions
- Extractive question answering using the `extract_answer` function
- Basic web retrieval using the `get_wiki` function
@@ -83,9 +82,9 @@ The package includes the following features under the hood

# Implementation

-The design of this software package allows its internals to be loosely coupled to the models and inference engines it uses. At the time of writing, rapid progress is being made to speed up inference on consumer hardware, but much of this software is difficult to install and may not work well for all learners.
+The design of this software package allows its interface to be loosely coupled to the models and inference engines it uses. Progress is being made to speed up inference on consumer hardware, and this package seeks to find a balance between inference efficiency, software stability, and broad hardware support.

-This package currently uses the Hugging Face Transformers library [@hftransformers], which internally uses PyTorch [@pytorch] for inference. The main model used is a variant of the T5 base model [@t5] that has been fine-tuned to better follow instructions [@flan-t5]. Models that focus on inference efficiency are starting to become available [@llama]. It will be possible to replace the internals of this package with more powerful and efficient models in the future. In addition to simple local inference, it is also possible to provide API keys to the package to allow access to more powerful hosted inference services.
+This package currently uses CTranslate2 [@ctranslate2] for efficient inference on CPU and GPU. The main models used include Flan-T5 [@flan-t5], LaMini-LM [@lamini-lm], and OpenChat [@openchat]. The default models used by this package can be swapped out in future versions to provide improved generation quality.

# Future work

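In this commit, `lm.classify` gives way to zero-shot classification through `lm.do` with a `choices` parameter. One common way to build such a parameter is to score every allowed answer and return the best one. The sketch below assumes that approach; it is not the package's implementation, and `do_with_choices`, `lexicon_score`, and the word lists are hypothetical. The toy lexicon scorer stands in for a language model's likelihood of each choice so the example runs offline.

```python
from typing import Callable, Sequence

# A scorer rates how well `choice` answers `prompt`; higher is better.
Scorer = Callable[[str, str], float]

# Hypothetical word lists used only by the toy scorer below.
POSITIVE_WORDS = {"like", "love", "great", "useful"}
NEGATIVE_WORDS = {"hate", "dislike", "bad", "useless"}


def lexicon_score(prompt: str, choice: str) -> float:
    """Toy stand-in for a model's likelihood of `choice` given `prompt`."""
    words = set(prompt.lower().replace(":", " ").split())
    positive = len(words & POSITIVE_WORDS)
    negative = len(words & NEGATIVE_WORDS)
    if choice == "positive":
        return float(positive - negative)
    if choice == "negative":
        return float(negative - positive)
    return float("-inf")  # labels this toy scorer does not know


def do_with_choices(prompt: str, choices: Sequence[str], score: Scorer) -> str:
    """Return the highest-scoring choice, so the answer is always a valid label."""
    if not choices:
        raise ValueError("choices must not be empty")
    return max(choices, key=lambda choice: score(prompt, choice))


print(do_with_choices(
    "Classify as positive or negative: I like games",
    ["positive", "negative"],
    lexicon_score,
))  # positive
```

Because the result is always one of the supplied strings, callers can compare it directly instead of parsing free-form model output.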

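The revised Implementation section says the package interface is loosely coupled to its models and inference engines, so default models can be swapped in future versions. A minimal sketch of that idea, assuming a simple registry of backends behind one stable public function: `Backend`, `EchoBackend`, `UpperBackend`, `set_default_backend`, and `run_prompt` are all hypothetical, and a real backend would wrap an inference engine such as CTranslate2 rather than manipulate strings.

```python
from typing import Dict, Protocol


class Backend(Protocol):
    """Structural interface: anything with a generate(prompt) method."""

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Hypothetical toy backend; a real one would run a model."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperBackend:
    """Second hypothetical toy backend, used to show swapping."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


# Registry of available backends, keyed by name.
_BACKENDS: Dict[str, Backend] = {
    "echo": EchoBackend(),
    "upper": UpperBackend(),
}
_default_backend = "echo"


def set_default_backend(name: str) -> None:
    """Swap the engine used by run_prompt without touching any caller."""
    global _default_backend
    if name not in _BACKENDS:
        raise KeyError(f"unknown backend: {name}")
    _default_backend = name


def run_prompt(prompt: str) -> str:
    """Stable public entry point; its signature never names a backend."""
    return _BACKENDS[_default_backend].generate(prompt)


print(run_prompt("hello"))  # echo: hello
set_default_backend("upper")
print(run_prompt("hello"))  # HELLO
```

Learner-facing code calls only `run_prompt`, so changing the default backend does not change any call site.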