A significantly faster implementation of my novel 'needle in a haystack' methodology for SLMs.
SLMs struggle to respond effectively in a chat-modelling setting because they cannot properly utilise longer context windows. To address this, I propose two key changes to how logits are sampled for Chat SLMs (both are sketched in code below the list):
- All tokens the user has not yet written should be masked to negative infinity, so they can never be sampled
- The agent's response should be generated by sampling the highest logit across all batches of previous messages
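A minimal sketch of the first change, assuming a Hugging Face-style tokenizer and raw PyTorch logits; the names `build_allowed_ids` and `mask_unseen_tokens` are illustrative, not part of this repo's API:

```python
import torch

def build_allowed_ids(chat_history: list[str], tokenizer) -> torch.Tensor:
    """Collect every token id that has appeared in the user's messages so far."""
    allowed = set()
    for message in chat_history:
        allowed.update(tokenizer.encode(message, add_special_tokens=False))
    return torch.tensor(sorted(allowed), dtype=torch.long)

def mask_unseen_tokens(logits: torch.Tensor, allowed_ids: torch.Tensor) -> torch.Tensor:
    """Set every token the user has never written to -inf so it cannot be sampled."""
    masked = torch.full_like(logits, float("-inf"))
    masked[..., allowed_ids] = logits[..., allowed_ids]
    return masked
```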
Together, these changes create an interesting interaction experience: the user acts as the sole source of vocabulary, so the agent evolves to speak in a similar way. In addition, the model considers each previous message in its own concentrated, small context window, which ensures all context can be attended to properly by the SLM, allowing it to consistently remember birthdays, events, and so on. A minimal sketch of this batched sampling follows.
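The sketch below reuses `mask_unseen_tokens` from above. It assumes a causal LM whose forward pass returns `.logits` of shape `(batch, seq_len, vocab)`; for readability the per-message contexts are looped over rather than padded into one batch, and `generate_reply` / `max_new_tokens` are illustrative names:

```python
@torch.no_grad()
def generate_reply(model, tokenizer, chat_history: list[str],
                   allowed_ids: torch.Tensor, max_new_tokens: int = 25) -> str:
    # One short context per previous message, so each fits comfortably in the SLM's window.
    contexts = [tokenizer.encode(m, add_special_tokens=False) for m in chat_history]
    reply_ids: list[int] = []
    for _ in range(max_new_tokens):
        best_logit, best_id = float("-inf"), None
        for ctx in contexts:
            ids = torch.tensor([ctx + reply_ids])
            next_logits = model(ids).logits[0, -1]          # next-token logits for this context
            next_logits = mask_unseen_tokens(next_logits, allowed_ids)
            top_logit, top_id = next_logits.max(dim=-1)
            if top_logit.item() > best_logit:               # keep the single highest logit across all contexts
                best_logit, best_id = top_logit.item(), top_id.item()
        reply_ids.append(best_id)
    return tokenizer.decode(reply_ids)
```

Padding the contexts into one true batch and taking the max over the batch dimension would recover the batched speedup reported in the table below; the loop here is only for clarity.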
| Aspect | Batched Multi-Contextual Token Sampling | Linear Multi-Contextual Token Sampling |
|---|---|---|
| Tokens considered | 30K | 20K |
| Response length | 25 tokens | 25 tokens |
| Time constraint | 10 seconds | 10 seconds |
| Performance | 50% more tokens considered within the time constraint | Baseline |
| Hardware | RTX 3090 (24 GB) | RTX 3090 (24 GB) |