Batched Multi-Contextual Token Sampling

A significantly faster implementation of my novel 'needle in a haystack' methodology for SLMs.


Explanation

SLMs (small language models) struggle to respond effectively in a chat-modelling setting because they cannot make good use of longer context windows. To address this, I propose two key changes to how logits are sampled for chat SLMs:

  1. All unseen tokens should be masked with negative infinity.
  2. The agent's response should be generated by selecting, at each step, the highest logit across all previous messages, each processed in its own small context window (batch row).

Together these changes create an interesting interaction experience: the user acts as the sole source of vocabulary, so the agent evolves to speak in a similar way. In addition, the model considers each previous message in its own concentrated, small context window. This ensures that all context can be attended to properly by the SLM, allowing it to consistently remember birthdays, events, and so on.
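
A minimal sketch of the idea in PyTorch with Hugging Face Transformers, assuming a generic causal LM (gpt2 here as a stand-in) and greedy decoding; the function name sample_multi_contextual and the chunking of history into one batch row per message are illustrative, not the repository's actual implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model; the repository may use a different SLM.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # keep the last position meaningful in every row
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()


def sample_multi_contextual(previous_messages, max_new_tokens=25):
    """Sketch of batched multi-contextual token sampling.

    Each previous message sits in its own small context window (one batch row),
    unseen tokens are masked to -inf, and at every step the next token is the
    highest logit found across all rows.
    """
    # Vocabulary mask: only tokens the user has already used stay unmasked.
    seen = set()
    for msg in previous_messages:
        seen.update(tokenizer.encode(msg))
    vocab_size = model.config.vocab_size
    vocab_mask = torch.full((vocab_size,), float("-inf"))
    vocab_mask[list(seen)] = 0.0

    # One batch row per previous message (its own concentrated context window).
    batch = tokenizer(previous_messages, return_tensors="pt", padding=True)
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]

    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids=input_ids,
                           attention_mask=attention_mask).logits[:, -1, :]
        logits = logits + vocab_mask              # mask tokens the user never typed
        next_token = int(logits.view(-1).argmax()) % vocab_size  # best logit over all rows
        generated.append(next_token)

        # Append the chosen token to every context window.
        col = torch.full((input_ids.shape[0], 1), next_token, dtype=torch.long)
        input_ids = torch.cat([input_ids, col], dim=1)
        attention_mask = torch.cat([attention_mask, torch.ones_like(col)], dim=1)

    return tokenizer.decode(generated)
```

For example, sample_multi_contextual(["My birthday is on 12 June", "I went hiking yesterday"]) would reply using only tokens the user has already typed, drawing on whichever past message scores highest at each step.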

Performance Improvements over Linear MCTS

| Aspect | Batched Multi-Contextual Token Sampling | Linear Multi-Contextual Token Sampling |
|---|---|---|
| Tokens considered | 30K | 20K |
| Response length | 25 tokens | 25 tokens |
| Time constraint | 10 seconds | 10 seconds |
| Performance | 50% increase | Baseline |
| Hardware | RTX 3090 (24 GB) | RTX 3090 (24 GB) |
