This repository has been archived by the owner on Feb 12, 2018. It is now read-only.

The Great Endockening 1
Lots and lots of doc comments
trishume committed Jun 16, 2016
1 parent e841371 commit 93d4ce5
Showing 11 changed files with 158 additions and 5 deletions.
51 changes: 48 additions & 3 deletions src/highlighting/highlighter.rs
@@ -1,23 +1,52 @@
/// Code based on https://github.com/defuz/sublimate/blob/master/src/core/syntax/highlighter.rs
/// released under the MIT license by @defuz
//! Iterators and data structures for transforming parsing information into styled text.
// Code based on https://github.com/defuz/sublimate/blob/master/src/core/syntax/highlighter.rs
// released under the MIT license by @defuz

use std::iter::Iterator;

use parsing::{Scope, ScopeStack, ScopeStackOp};
use super::theme::Theme;
use super::style::{Style, StyleModifier, FontStyle, BLACK, WHITE};

/// Basically a wrapper around a `Theme` preparing it to be used for highlighting.
/// This is part of the API to preserve the possibility of caching
/// matches of the selectors of the theme on various scope paths
/// or setting up some kind of accelerator structure.
///
/// So for now this does very little but eventually if you keep it around between
/// highlighting runs it will preserve its cache.
#[derive(Debug)]
pub struct Highlighter<'a> {
theme: &'a Theme, // TODO add caching or accelerator structure
}

#[derive(Debug, Clone)]
/// Keeps a stack of scopes and styles as state between highlighting different lines.
/// If you are highlighting an entire file you create one of these at the start and use it
/// all the way to the end.
///
/// # Caching
///
/// One reason this is exposed is that since it implements `Clone` you can actually cache
/// these (probably along with a `ParseState`) and only re-start highlighting from the point of a change.
/// You could also do something fancy like only highlight a bit past the end of a user's screen and resume
/// highlighting when they scroll down on large files.
///
/// Alternatively you can save space by caching only the `path` field of this struct,
/// then re-creating the `HighlightState` when needed by passing that stack as the `initial_stack`
/// parameter to the `new` method. This takes less space but costs a small amount of time to re-create the style stack.
///
/// **Note:** Caching is for advanced users who have tons of time to maximize performance or want to do so eventually.
/// It is not recommended that you try caching the first time you implement highlighting.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HighlightState {
styles: Vec<Style>,
pub path: ScopeStack,
}

/// Highlights a line of parsed code given a `HighlightState`
/// and line of changes from the parser.
///
/// It splits a line of text into different pieces, each with a `Style`.
#[derive(Debug)]
pub struct HighlightIterator<'a, 'b> {
index: usize,
@@ -29,6 +58,9 @@ pub struct HighlightIterator<'a, 'b> {
}

impl HighlightState {
/// Note that the `Highlighter` is not stored, it is used to construct the initial
/// stack of styles. Most of the time you'll want to pass an empty stack as `initial_stack`
/// but see the docs for `HighlightState` for discussion of advanced caching use cases.
pub fn new(highlighter: &Highlighter, initial_stack: ScopeStack) -> HighlightState {
let mut initial_styles = vec![highlighter.get_default()];
for i in 0..initial_stack.len() {
@@ -64,6 +96,8 @@ impl<'a, 'b> HighlightIterator<'a, 'b> {
impl<'a, 'b> Iterator for HighlightIterator<'a, 'b> {
type Item = (Style, &'b str);

/// Yields the next token of text and the associated `Style` to render that text with.
/// The concatenation of the strings in each token will make the original string.
fn next(&mut self) -> Option<(Style, &'b str)> {
if self.pos == self.text.len() && self.index >= self.changes.len() {
return None;
@@ -106,6 +140,8 @@ impl<'a> Highlighter<'a> {
Highlighter { theme: theme }
}

/// The default style in the absence of any matched rules.
/// Basically what plain text gets highlighted as.
pub fn get_default(&self) -> Style {
Style {
foreground: self.theme.settings.foreground.unwrap_or(WHITE),
@@ -114,6 +150,15 @@ impl<'a> Highlighter<'a> {
}
}

/// Figures out which scope selector in the theme best matches this scope stack.
/// It only returns any changes to the style that should be applied when the top element
/// is pushed on to the stack. These actually aren't guaranteed to be different than the current
/// style. Basically what this means is that you have to gradually apply styles starting with the
/// default and working your way up the stack in order to get the correct style.
///
/// Don't worry if this sounds complex; you shouldn't need to use this method.
/// It's only public because I default to making things public for power users unless
/// I have a good argument nobody will ever need to use the method.
pub fn get_style(&self, path: &[Scope]) -> StyleModifier {
let max_item = self.theme
.scopes
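The doc comment on `get_style` says styles have to be applied gradually, starting from the default and working up the stack. The following is a minimal, self-contained sketch of that idea, not the crate's code: `Style`, `StyleModifier` and `get_style` here are simplified stand-ins (a numeric colour and a bold flag, and a made-up matching rule that only looks at the top scope), and `build_style_stack` is a hypothetical name mirroring the loop visible in `HighlightState::new`.

```rust
// Simplified stand-ins for the crate's types (assumptions for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Style {
    pub color: u32,
    pub bold: bool,
}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct StyleModifier {
    pub color: Option<u32>,
    pub bold: Option<bool>,
}

impl Style {
    // A modifier only overrides the fields it actually sets.
    pub fn apply(&self, m: StyleModifier) -> Style {
        Style {
            color: m.color.unwrap_or(self.color),
            bold: m.bold.unwrap_or(self.bold),
        }
    }
}

// Hypothetical stand-in for `Highlighter::get_style`: returns only the change
// caused by the top element of `path`. The matching rule here is invented.
pub fn get_style(path: &[&str]) -> StyleModifier {
    match path.last() {
        Some(top) if top.starts_with("string") => StyleModifier { color: Some(0x00aa00), bold: None },
        Some(top) if top.starts_with("keyword") => StyleModifier { color: None, bold: Some(true) },
        _ => StyleModifier::default(),
    }
}

// Builds one style per stack depth: the default first, then each prefix of the
// path applied on top of the style below it.
pub fn build_style_stack(default: Style, path: &[&str]) -> Vec<Style> {
    let mut styles = vec![default];
    for i in 0..path.len() {
        let prev = styles[i];
        styles.push(prev.apply(get_style(&path[..i + 1])));
    }
    styles
}
```

The last element of the returned vector is the style for text at the top of the stack; popping a scope simply means popping the matching style, which is presumably why the state keeps the whole stack of styles rather than a single one.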
4 changes: 4 additions & 0 deletions src/highlighting/mod.rs
@@ -1,3 +1,7 @@
//! Everything having to do with turning parsed text into styled text.
//! You might want to check out `Theme` for its handy text-editor related
//! settings like selection colour, `ThemeSet` for loading themes,
//! as well as things starting with `Highlight` for how to highlight text.
mod selector;
mod settings;
mod style;
5 changes: 5 additions & 0 deletions src/highlighting/selector.rs
@@ -4,12 +4,17 @@
use parsing::{Scope, ScopeStack, MatchPower, ParseScopeError};
use std::str::FromStr;

/// A single selector consisting of a stack to match and a possible stack to exclude from being matched.
/// You probably want `ScopeSelectors` which is this but with union support.
#[derive(Debug, Clone, PartialEq, Eq, Default, RustcEncodable, RustcDecodable)]
pub struct ScopeSelector {
path: ScopeStack,
exclude: Option<ScopeStack>,
}

/// A selector set that matches anything matched by any of its component selectors.
/// See [The TextMate Docs](https://manual.macromates.com/en/scope_selectors) for how these
/// work.
#[derive(Debug, Clone, PartialEq, Eq, Default, RustcEncodable, RustcDecodable)]
pub struct ScopeSelectors {
pub selectors: Vec<ScopeSelector>,
9 changes: 7 additions & 2 deletions src/highlighting/style.rs
@@ -1,5 +1,7 @@
/// Code based on https://github.com/defuz/sublimate/blob/master/src/core/syntax/style.rs
/// released under the MIT license by @defuz
// Code based on https://github.com/defuz/sublimate/blob/master/src/core/syntax/style.rs
// released under the MIT license by @defuz

/// The foreground, background and font style
#[derive(Debug, Clone, Copy, PartialEq, Eq, RustcEncodable, RustcDecodable)]
pub struct Style {
/// Foreground color.
@@ -10,6 +12,7 @@ pub struct Style {
pub font_style: FontStyle,
}

/// A change to a `Style` applied incrementally by a theme rule.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, RustcEncodable, RustcDecodable)]
pub struct StyleModifier {
/// Foreground color.
@@ -42,6 +45,7 @@ pub struct Color {
}

bitflags! {
/// This can be a combination of `FONT_STYLE_BOLD`, `FONT_STYLE_UNDERLINE` and `FONT_STYLE_ITALIC`
#[derive(RustcEncodable, RustcDecodable)]
flags FontStyle: u8 {
const FONT_STYLE_BOLD = 1,
@@ -51,6 +55,7 @@ bitflags! {
}

impl Style {
/// Applies a change to this style, yielding a new changed style
pub fn apply(&self, modifier: StyleModifier) -> Style {
Style {
foreground: modifier.foreground.unwrap_or(self.foreground),
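As a rough illustration of what "applies a change to this style" means, here is a self-contained sketch. The types are simplified stand-ins rather than the crate's definitions: the `r`/`g`/`b`/`a` layout of `Color` is an assumption, font style is left out, and the `background` line is written by analogy with the `foreground` line visible in the hunk above.

```rust
// Simplified stand-ins for `Color`, `Style` and `StyleModifier` (assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub a: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Style {
    pub foreground: Color,
    pub background: Color,
}

// Every field is optional: `None` means "leave this part of the style alone".
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct StyleModifier {
    pub foreground: Option<Color>,
    pub background: Option<Color>,
}

impl Style {
    // Returns a new style; `self` is not mutated.
    pub fn apply(&self, modifier: StyleModifier) -> Style {
        Style {
            foreground: modifier.foreground.unwrap_or(self.foreground),
            background: modifier.background.unwrap_or(self.background),
        }
    }
}
```

Because an empty modifier leaves the style unchanged, a theme rule that only sets a foreground colour can be layered over any background without knowing what it is.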
7 changes: 7 additions & 0 deletions src/highlighting/theme.rs
@@ -10,6 +10,8 @@ use parsing::ParseScopeError;

use self::ParseThemeError::*;

/// A theme parsed from a `.tmTheme` file.
/// Contains fields useful for a theme list as well as `settings` for styling your editor.
#[derive(Debug, Default, RustcEncodable, RustcDecodable)]
pub struct Theme {
pub name: Option<String>,
@@ -18,6 +20,9 @@ pub struct Theme {
pub scopes: Vec<ThemeItem>,
}

/// Various properties meant to be used to style a text editor.
/// Basically all the styles that aren't directly applied to text, like selection colour.
/// Use this to make your editor UI match the highlighted text.
#[derive(Debug, Default, RustcEncodable, RustcDecodable)]
pub struct ThemeSettings {
/// Foreground color for the view.
@@ -89,6 +94,8 @@ pub struct ThemeSettings {
pub highlight_foreground: Option<Color>,
}

/// A component of a theme meant to highlight a specific thing (e.g. string literals)
/// in a certain way.
#[derive(Debug, Default, RustcEncodable, RustcDecodable)]
pub struct ThemeItem {
/// Target scope name.
15 changes: 15 additions & 0 deletions src/lib.rs
@@ -1,3 +1,17 @@
//! Welcome to the syntect docs.
//! These are still a work in progress but a lot of the important things have
//! been documented already.
//!
//! May I suggest that you start by reading the `Readme.md` file in the main repo.
//! Once you're done with that you can look at the docs for `parsing::SyntaxSet`
//! and for the `easy` module.
//!
//! Almost everything in syntect is divided up into either the `parsing` module
//! for turning text into text annotated with scopes, or the `highlighting` module
//! for turning annotated text into styled/coloured text.
//!
//! Some docs have example code but a good place to look is the `syncat` example as well as the source code
//! for the `easy` module in `easy.rs` as that shows how to plug the various parts together for common use cases.
extern crate yaml_rust;
extern crate onig;
extern crate walkdir;
@@ -20,6 +34,7 @@ use std::io::Error as IoError;
use parsing::ParseSyntaxError;
use highlighting::{ParseThemeError, SettingsError};

/// Common error type used by syntax and theme loading
#[derive(Debug)]
pub enum LoadingError {
WalkDir(walkdir::Error),
2 changes: 2 additions & 0 deletions src/parsing/mod.rs
@@ -1,3 +1,5 @@
//! Everything about parsing text into text annotated with scopes.
//! The most important struct here is `SyntaxSet`; check out the docs for that.
pub mod syntax_definition;
mod yaml_load;
mod syntax_set;
26 changes: 26 additions & 0 deletions src/parsing/parser.rs
@@ -4,6 +4,22 @@ use onig::{self, Region};
use std::usize;
use std::i32;

/// Keeps the current parser state (the internal syntax interpreter stack) between lines of parsing.
/// If you are parsing an entire file you create one of these at the start and use it
/// all the way to the end.
///
/// # Caching
///
/// One reason this is exposed is that since it implements `Clone` you can actually cache
/// these (probably along with a `HighlightState`) and only re-start parsing from the point of a change.
/// See the docs for `HighlightState` for more in-depth discussion of caching.
///
/// This state doesn't keep track of the current scope stack and parsing only returns changes to this stack
/// so if you want to construct scope stacks you'll need to keep track of that as well.
/// Note that `HighlightState` contains exactly this as a public field that you can use.
///
/// **Note:** Caching is for advanced users who have tons of time to maximize performance or want to do so eventually.
/// It is not recommended that you try caching the first time you implement highlighting.
#[derive(Debug, Clone)]
pub struct ParseState {
stack: Vec<StateLevel>,
@@ -40,6 +56,16 @@ impl ParseState {
}
}

/// Parses a single line of the file. Because of the way regex engines work you unfortunately
/// have to pass in a single line contiguous in memory. This can be bad for really long lines.
/// Sublime Text avoids this by just not highlighting lines that are too long (thousands of characters).
///
/// For efficiency reasons this returns only the changes to the current scope at each point in the line.
/// You can use `ScopeStack::apply` on each operation in succession to get the stack for a given point.
/// Look at the code in `highlighter.rs` for an example of doing this for highlighting purposes.
///
/// The vector is in order both by index to apply at (the `usize`) and also by order to apply them at a
/// given index (e.g. popping old scopes before pushing new scopes).
pub fn parse_line(&mut self, line: &str) -> Vec<(usize, ScopeStackOp)> {
assert!(self.stack.len() > 0,
"Somehow main context was popped from the stack");
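To make the return value of `parse_line` more concrete, here is a hedged, self-contained sketch of consuming an ordered list of `(usize, op)` pairs to recover the scope stack for each piece of a line. It does not use the crate: `Op` is a stand-in for `ScopeStackOp` with scopes as plain strings, the `Pop(usize)` variant is an assumption (only `Push` and `Noop` are visible in this diff), and `split_by_scope` is a hypothetical helper name.

```rust
// Stand-in for `ScopeStackOp`; `Pop(n)` popping `n` scopes is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    Push(String),
    Pop(usize),
    Noop,
}

// Applies one change to a scope stack.
pub fn apply(stack: &mut Vec<String>, op: &Op) {
    match *op {
        Op::Push(ref scope) => stack.push(scope.clone()),
        Op::Pop(n) => {
            for _ in 0..n {
                stack.pop();
            }
        }
        Op::Noop => {}
    }
}

// Walks the ordered changes and emits each run of text together with the
// stack that was in effect for it. Indices are byte offsets into `line`.
pub fn split_by_scope<'a>(line: &'a str, ops: &[(usize, Op)]) -> Vec<(Vec<String>, &'a str)> {
    let mut stack = Vec::new();
    let mut pieces = Vec::new();
    let mut pos = 0;
    for &(index, ref op) in ops {
        if index > pos {
            // Text between the previous change and this one keeps the old stack.
            pieces.push((stack.clone(), &line[pos..index]));
            pos = index;
        }
        apply(&mut stack, op);
    }
    if pos < line.len() {
        pieces.push((stack.clone(), &line[pos..]));
    }
    pieces
}
```

Since the pieces are consecutive slices of the input, concatenating them gives back the original line, which is the same property the `HighlightIterator` docs promise for styled tokens.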
24 changes: 24 additions & 0 deletions src/parsing/scope.rs
@@ -12,6 +12,20 @@ lazy_static! {
pub static ref SCOPE_REPO: Mutex<ScopeRepository> = Mutex::new(ScopeRepository::new());
}

/// A hierarchy of atoms with semi-standardized names
/// used to accord semantic information to a specific piece of text.
/// Generally written with the atoms separated by dots.
/// By convention atoms are all lowercase alphanumeric.
///
/// Example scopes: `text.plain`, `punctuation.definition.string.begin.ruby`,
/// `meta.function.parameters.rust`
///
/// `syntect` uses an optimized format for storing these that allows super fast comparison
/// and determining if one scope is a prefix of another. It also always takes 16 bytes of space.
/// It accomplishes this by using a global repository to store string values and using bit-packed
/// 16-bit numbers to represent and compare atoms, like "atoms" or "symbols" in other languages.
/// This means that while comparison and prefix checks are fast, extracting a string is relatively slower
/// but ideally should be very rare.
#[derive(Clone, PartialEq, Eq, Copy, Default)]
pub struct Scope {
a: u64,
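One way a 16-byte, bit-packed scope could work is sketched below. This is an illustration of the idea in the doc comment, not the crate's actual layout: the second word `b`, the choice to store atom numbers starting at 1 so that 0 marks an empty slot, the high-bits-first ordering, and the names `Repo`, `PackedScope`, `build` and `is_prefix_of` are all assumptions. With 16 bits per atom, two `u64` words hold at most 8 atoms.

```rust
use std::collections::HashMap;

// Hypothetical repository mapping atom strings to small numbers.
#[derive(Default)]
pub struct Repo {
    atoms: Vec<String>,
    index: HashMap<String, usize>,
}

// Eight 16-bit slots packed into two words; a zero slot means "no atom".
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct PackedScope {
    a: u64,
    b: u64,
}

impl Repo {
    // Returns the 1-based number for an atom, registering it if needed.
    fn atom_number(&mut self, atom: &str) -> u64 {
        if let Some(&i) = self.index.get(atom) {
            return (i + 1) as u64;
        }
        self.atoms.push(atom.to_string());
        let i = self.atoms.len() - 1;
        self.index.insert(atom.to_string(), i);
        (i + 1) as u64
    }

    // Packs a dotted scope name; `None` if it has more than 8 atoms or the
    // repository has run out of 16-bit numbers.
    pub fn build(&mut self, name: &str) -> Option<PackedScope> {
        let mut scope = PackedScope::default();
        for (i, atom) in name.split('.').enumerate() {
            if i >= 8 {
                return None;
            }
            let n = self.atom_number(atom);
            if n > 0xffff {
                return None;
            }
            // The first atom of each word goes in the highest 16 bits.
            let shift = 48 - 16 * (i % 4);
            if i < 4 {
                scope.a |= n << shift;
            } else {
                scope.b |= n << shift;
            }
        }
        Some(scope)
    }
}

// Mask covering exactly the non-empty slots of one word.
fn mask(word: u64) -> u64 {
    let mut m = 0u64;
    for slot in 0..4u32 {
        let shift = 48 - 16 * slot;
        if (word >> shift) & 0xffff != 0 {
            m |= 0xffffu64 << shift;
        }
    }
    m
}

impl PackedScope {
    // Prefix test using only masks and integer comparisons, no string work.
    pub fn is_prefix_of(&self, other: PackedScope) -> bool {
        (other.a & mask(self.a)) == self.a && (other.b & mask(self.b)) == self.b
    }
}
```

Equality is a plain comparison of two integers, and the prefix test never touches the repository, which matches the trade-off described above: comparisons are cheap, while turning a scope back into a string would need a lookup for every atom.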
@@ -34,11 +48,21 @@ pub struct ScopeRepository {
atom_index_map: HashMap<String, usize>,
}

/// A stack/sequence of scopes. This is used both to represent hierarchies for a given
/// token of text, as well as in `ScopeSelectors`. Press `ctrl+shift+p` in Sublime Text
/// to see the scope stack at a given point.
/// Also see [the TextMate docs](https://manual.macromates.com/en/scope_selectors).
///
/// Example for a JS string inside a script tag in a Rails `ERB` file:
/// `text.html.ruby text.html.basic source.js.embedded.html string.quoted.double.js`
#[derive(Debug, Clone, PartialEq, Eq, Default, RustcEncodable, RustcDecodable)]
pub struct ScopeStack {
scopes: Vec<Scope>,
}

/// A change to a scope stack. Generally `Noop` is only used internally and you don't have
/// to worry about ever getting one back from a public function.
/// Use `ScopeStack::apply` to apply this change.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ScopeStackOp {
Push(Scope),
17 changes: 17 additions & 0 deletions src/parsing/syntax_definition.rs
@@ -1,3 +1,8 @@
//! This module contains data structures for representing syntax definitions.
//! Everything is public because I want this library to be useful in super
//! integrated cases like text editors and I have no idea what kind of monkeying
//! you might want to do with the data. Perhaps parsing your own syntax format
//! into this data structure?
use std::collections::HashMap;
use onig::{self, Regex, Region, Syntax};
use std::rc::{Rc, Weak};
@@ -9,6 +14,13 @@ use rustc_serialize::{Encodable, Encoder, Decodable, Decoder};
pub type CaptureMapping = HashMap<usize, Vec<Scope>>;
pub type ContextPtr = Rc<RefCell<Context>>;

/// The main data structure representing a syntax definition loaded from a
/// `.sublime-syntax` file. You'll probably only need these as references
/// to be passed around to parsing code.
///
/// Some useful public fields are the `name` field, which is a human readable
/// name to display in syntax lists, and the `hidden` field, which means the syntax
/// should be hidden from any lists because it is for internal use.
#[derive(Debug, RustcEncodable, RustcDecodable)]
pub struct SyntaxDefinition {
pub name: String,
@@ -37,6 +49,9 @@ pub enum Pattern {
Include(ContextReference),
}

/// Used to iterate over all the match patterns in a context.
/// Basically walks the tree of patterns and include directives
/// in the correct order.
#[derive(Debug, RustcEncodable, RustcDecodable)]
pub struct MatchIter {
ctx_stack: Vec<ContextPtr>,
@@ -54,6 +69,8 @@ pub struct MatchPattern {
pub with_prototype: Option<ContextPtr>,
}

/// This wrapper only exists so that I can implement a serialization
/// trait that crashes if you try and serialize this.
#[derive(Debug)]
pub struct LinkerLink {
pub link: Weak<RefCell<Context>>,
3 changes: 3 additions & 0 deletions src/parsing/syntax_set.rs
@@ -89,6 +89,9 @@ impl SyntaxSet {
self.syntaxes.push(syn);
}

/// Finds a syntax by its default scope, for example `source.regexp` finds the regex syntax.
/// This and all similar methods below do a linear search of syntaxes. This should be fast
/// because there aren't many syntaxes, but don't think you can call it a bajillion times per second.
pub fn find_syntax_by_scope<'a>(&'a self, scope: Scope) -> Option<&'a SyntaxDefinition> {
self.syntaxes.iter().find(|&s| s.scope == scope)
}