Add a first example for the new Dlib layers to build a transform-type network #3041
This example demonstrates a minimal implementation of a Very Small Language Model (VSLM) using Dlib's Transformer architecture.
The code showcases key features of the new Transformer layers, including attention, positional embeddings, and a classification head, while keeping tokenization simple: each character is treated as a token (sketched below).
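To make the data path concrete, here is a minimal sketch of that character-based tokenization. The `sequence_len` value and the `tokenize` function name are illustrative placeholders, not the identifiers used in the actual example:

```cpp
#include <string>
#include <vector>
#include <dlib/matrix.h>

// Illustrative context window size; not necessarily the value the example uses.
const long sequence_len = 64;

// Slice the corpus into fixed-length windows of byte-level token ids.
std::vector<dlib::matrix<int, 0, 1>> tokenize(const std::string& text)
{
    std::vector<dlib::matrix<int, 0, 1>> windows;
    for (size_t i = 0; i + sequence_len <= text.size(); ++i)
    {
        dlib::matrix<int, 0, 1> w(sequence_len);
        for (long j = 0; j < sequence_len; ++j)
            w(j) = static_cast<unsigned char>(text[i + j]);
        windows.push_back(w);
    }
    return windows;
}
```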
Using Shakespeare's text as training data, the example illustrates both the training process and text generation capabilities, making it an excellent educational tool for understanding Transformer architecture basics.
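As a rough sketch of the training side, the snippet below uses dlib's standard `dnn_trainer`. Here `net_type` stands in for the transformer network defined in the example, and the hyperparameter values are placeholders rather than the ones the example actually uses:

```cpp
#include <dlib/dnn.h>
#include <vector>

// Train on (window, next-character) pairs until the learning rate bottoms out.
template <typename net_type>
void train_model(net_type& net,
                 const std::vector<dlib::matrix<int, 0, 1>>& samples,
                 const std::vector<unsigned long>& labels)
{
    dlib::dnn_trainer<net_type> trainer(net);
    trainer.set_learning_rate(1e-3);     // placeholder hyperparameters
    trainer.set_min_learning_rate(1e-6);
    trainer.set_mini_batch_size(32);
    trainer.be_verbose();
    trainer.train(samples, labels);
}
```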
The implementation is intentionally lightweight, with a small parameter count, so training and generation stay fast while the model still achieves perfect memorization of the training sequences. That behavior demonstrates the effectiveness of attention mechanisms in sequence learning tasks.
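Because the model memorizes its training sequences, greedy decoding is enough to reproduce the source text. A hypothetical generation loop, assuming the network returns the arg-max next-character label when invoked on a single window (as dlib networks with a multiclass classification loss do):

```cpp
#include <string>
#include <dlib/dnn.h>

// Greedy generation: predict the next character, append it, slide the window.
template <typename net_type>
std::string generate(net_type& net, dlib::matrix<int, 0, 1> window, size_t n_chars)
{
    std::string out;
    for (size_t i = 0; i < n_chars; ++i)
    {
        const unsigned long next = net(window);   // predicted next byte class
        out.push_back(static_cast<char>(next));
        for (long j = 0; j + 1 < window.size(); ++j)
            window(j) = window(j + 1);            // shift context left by one
        window(window.size() - 1) = static_cast<int>(next);
    }
    return out;
}
```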