diff --git a/_pkgdown.yml b/_pkgdown.yml
index 210132cd..a4a73e0c 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -89,3 +89,5 @@ articles:
navbar: ~
contents:
- articles/validation
+ - intro-xml
+ - intro-episode
diff --git a/vignettes/intro-episode.Rmd b/vignettes/intro-episode.Rmd
new file mode 100644
index 00000000..c7e969a6
--- /dev/null
+++ b/vignettes/intro-episode.Rmd
@@ -0,0 +1,461 @@
+---
+title: "Introduction to the Episode Object"
+output: rmarkdown::html_vignette
+vignette: >
+ %\VignetteIndexEntry{Introduction to the Episode Object}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+ collapse = TRUE,
+ comment = "#>"
+)
+```
+
+## Introduction
+
+The {pegboard} package facilitates the analysis and manipulation of Markdown and
+R Markdown files by translating them to XML and back again. This extends the
+{tinkr} package by providing additional methods that are specific to
+Carpentries-style lessons. There are two `R6` classes defined in {pegboard}:
+
+ - `Episode` objects that contain the XML data, YAML metadata and extra fields
+ that define the child and parent files for a particular episode
+ - `Lesson` objects that contain lists of `Episode` objects categorised as
+ "episodes", "extra", or "children"
+
+This vignette discusses the structure of `Episode` objects, how to
+query the contents with the {xml2} package, and how to use the methods and
+active bindings to get information about, extract, and manipulate anything
+inside of a Markdown or R Markdown document.
+
+## Reading Markdown Content
+
+Each `Episode` object starts from a Markdown file. In particular for {pegboard},
+we assume that this Markdown file is written using
+[Pandoc](https://pandoc.org/MANUAL.html) syntax (a superset of
+[CommonMark](https://commonmark.org/)). It can be any markdown file, but for us
+to explore what the `Episode` object has to offer us, let's take an example R
+Markdown file that is present in a fragment of a Carpentries Workbench lesson
+that we have in this package. We will be using the {xml2} package to explore
+the object and the {fs} package to help with constructing file paths.
+
+```{r setup}
+library("pegboard")
+library("xml2")
+library("fs")
+```
+
+This is what our lesson fragment looks like. It is a fragment because its main
+purpose is to be used for examples and tests, but it contains the basic structure
+of a lesson that we want.
+
+```{r intro-read-noshow, echo = FALSE}
+dir_tree(lesson_fragment("sandpaper-fragment"), recurse = 1, regex = "site/[^R].*", invert = TRUE)
+```
+
+We can retrieve it with the `lesson_fragment()` function, which loads example
+data from pegboard. Here we will take that lesson fragment and read in the first
+episode with the initialization method, `Episode$new()`, followed by
+`$confirm_sandpaper()`, a confirmation that the episode was created to work
+with [{sandpaper}], the user interface and build engine of The Carpentries
+Workbench (for information on non-workbench content, see the section on [Jekyll
+Lesson Markdown Content](#jekyll-lesson-markdown-content)) and `$protect_math()`
+which will prevent special characters in LaTeX math from being escaped.
+
+[{sandpaper}]: https://carpentries.github.io/sandpaper/
+
+```{r intro-read}
+lsn <- lesson_fragment("sandpaper-fragment")
+# Read in the intro.Rmd document as an `Episode` object
+intro_path <- path(lsn, "episodes", "intro.Rmd")
+intro <- Episode$new(intro_path)$confirm_sandpaper()$protect_math()
+```
+
+If we print out the `Episode` object, we get a long list of methods,
+fields and active bindings (functions that act like fields) printed:
+
+```{r intro-print}
+intro
+```
+
+The actual XML content is in the `$body` field. This contains all the data from
+the markdown document, but in XML form.
+
+```{r intro-body}
+intro$body
+```
+
+If you want to see what the contents look like, you can use the `$show()`,
+`$head()`, or `$tail()` methods (note: the `$show()` method will print out the
+entire markdown document).
+
+```{r intro-show}
+intro$head(10)
+intro$tail(10)
+intro$show()
+```
+
+## File information
+
+For information about the file and its relationship to other files, you can use
+the following active bindings, which are useful when working with Episodes in a
+lesson context.
+
+```{r file-active-bindings}
+intro$path
+intro$name
+intro$lesson
+# NOTE: relationships to other episodes are automatically handled in the
+# Lesson context
+intro$has_parents
+intro$has_children
+intro$children # separate documents processed as if they were part of this document
+intro$parents # the immediate documents that would require this document to build
+intro$build_parents # the final documents that would require this document to build
+```
+
+## Accessing Markdown Elements
+
+The `Episode` object is centered around the `$body` item, which contains the XML
+representation of the document. It is possible to find markdown elements using
+XPath statements:
+
+```{r xpath-active-bindings}
+xml2::xml_find_all(intro$body, ".//md:link", ns = intro$ns)
+xml2::xml_find_first(intro$body, ".//md:list[@type='ordered']", ns = intro$ns)
+```
+
+However, there are some useful elements that we want to know about, so I have
+implemented them in active bindings and methods:
+
+
+```{r active-bindings}
+# headings where level 2 headings are equivalent to sections
+intro$headings
+# all callouts/fenced divs
+intro$get_divs()
+intro$challenges
+intro$solutions
+# questions, objectives, and keypoints are standard and return char vectors
+intro$objectives
+intro$questions
+intro$keypoints
+# code blocks and output types
+intro$code
+intro$output
+intro$warning
+intro$error
+# images and links
+intro$images
+intro$get_images() # parses images embedded in `<img>` tags
+intro$links
+```
+
+Many of these are summarized in the `$summary()` method:
+
+```{r summary}
+intro$summary()
+```
+
+## Code blocks and code chunks
+
+In markdown, a **code block** is written with fences of at least three backtick
+characters (`` ` ``) followed by the language for syntax highlighting:
+
+````markdown
+
+List all files in reverse temporal order, printing their sizes in
+a human-readable format:
+
+```bash
+ls -larth /path/to/folder
+```
+````
+
+> List all files in reverse temporal order, printing their sizes in
+> a human-readable format:
+>
+> ````bash
+> ls -larth /path/to/folder
+> ````
+
+When these are processed by {pegboard}, the resulting XML has this structure
+where the backticks inform the kind of node (`code_block`) and the language
+type is stored in the "info" attribute. Everything inside the code block is the
+node text and has whitespace preserved.
+
+````{r show-code-block, echo = FALSE, results = 'asis'}
+cb <- "```bash
+
+ls -larth /path/to/folder
+```"
+cbx <- xml2::read_xml(commonmark::markdown_xml(cb))
+txt <- as.character(xml2::xml_find_first(cbx, ".//d1:code_block"))
+writeLines(c("```xml", txt, "```"))
+````
+
+In R Markdown, there are special code blocks that are called code chunks that
+can be dynamically evaluated. These are distinguished by the curly braces
+around the language specifier and [optional
+attributes](https://yihui.org/knitr/options/) that control the output of the
+chunk.
+
+````{verbatim}
+
+There is a code chunk here that will produce a plot, but not show the code:
+
+```{r chunky, echo=FALSE, fig.alt="a plot of y = mx + b for m = 1 and b = 0"}
+plot(1:10, type = "l")
+```
+
+````
+
+
+> There is a code chunk here that will produce a plot, but not show the code:
+>
+> ````{r chunky, echo=FALSE, fig.alt="a plot of y = mx + b for m = 1 and b = 0"}
+> plot(1:10, type = "l")
+> ````
+
+When this is processed with {pegboard}, the "info" part of the code block is
+further split into "language", "name" and further attributes based on the chunk
+options:
+
+````{r show-code-chunk, echo = FALSE, results = 'asis'}
+
+chunk <- 'There is a code chunk here that will produce a plot, but not show the code:
+
+```{r chunky, echo=FALSE, fig.alt="a plot of y = mx + b for m = 1 and b = 0"}
+
+plot(1:10, type = "l")
+```'
+tmp <- tempfile()
+writeLines(chunk, tmp)
+chunky <- pegboard::Episode$new(tmp)$code[[1]]
+xml2::xml_set_attr(chunky, "sourcepos", NULL)
+txt <- as.character(chunky)
+writeLines(c("```xml", txt, "```"))
+unlink(tmp)
+````
+
+Both kinds of code block are returned by `$code`, but the difference between
+them is that only the R Markdown code chunks will have the "language" attribute.
+This is an important concept to know about when you are searching and
+manipulating R Markdown documents with XPath
+(see `vignette("intro-xml", package = "pegboard")`). The next section will walk
+through some aspects of manipulation that we can do with these documents.
+
+## Manipulation
+
+Because everything centers around the `$body` element and is extracted with
+{xml2}, it's possible to manipulate the elements of the document. We can add
+new content to the document using the `$add_md()` method, which will add a
+markdown element after any paragraph in the document.
+
+For example, we can add information about pegboard with a new code block after
+the first heading:
+
+````{r add-code-block}
+intro$head(26) # first 26 lines
+intro$body # first heading is item 11
+cb <- c("You can clone the **{pegboard} package**:
+
+```sh
+git clone https://github.com/carpentries/pegboard.git
+```
+")
+intro$add_md(cb, where = 11)
+intro$head(26) # code block has been added
+intro$code
+````
+
+You can also manipulate existing elements. For example, let's say we wanted to
+make sure all R code chunks were named. We can do so by querying and
+manipulating the code blocks:
+
+```{r update-code-block}
+code <- intro$code
+code
+# executable code chunks will have the "language" attribute
+is_chunk <- xml2::xml_has_attr(code, "language")
+chunks <- code[is_chunk]
+chunk_names <- xml2::xml_attr(chunks, "name")
+nonames <- is.na(chunk_names) | chunk_names == "" # missing or empty names
+chunk_names[nonames] <- paste0("chunk-", seq_len(sum(nonames)))
+xml2::xml_set_attr(chunks, "name", chunk_names)
+code
+```
+
+We can see that the chunks now have names, but the proof is in the rendering:
+
+```{r show-updated}
+intro$show()
+```
+
+One advantage of manipulating these documents in code is that it is
+possible to go back and reset if things are not correct, which is why we have
+the `$reset()` method:
+
+```{r}
+intro$reset()$confirm_sandpaper()$protect_math()$head(25)
+```
+
+## Jekyll Lesson Markdown Content
+
+This section describes the features that you would expect to find in a lesson
+that was built with the former infrastructure,
+which was built using the Jekyll
+static site generator. Lessons in this style are no longer supported by The
+Carpentries. {pegboard} does support these lessons so that they can be
+transitioned to use The Workbench syntax via [The Carpentries Lesson Transition
+Tool](https://github.com/carpentries/lesson-transition#readme). This
+was the _first_ syntax that was supported by {pegboard} because the package was
+written initially as a way to explore the structure of our lessons.
+
+### The Syntax of Jekyll Lessons
+
+The former Jekyll syntax used [kramdown-flavoured
+markdown](https://kramdown.gettalong.org/syntax.html), which evolved separately
+from [commonmark](https://spec.commonmark.org/), the syntax that {pegboard}
+knows and that Pandoc-flavoured markdown extends. One of the key differences
+with the kramdown syntax is that it used something known as [Inline Attribute
+Lists (IAL)](https://kramdown.gettalong.org/syntax.html#inline-attribute-lists) to
+help define classes for markdown elements. These elements were formatted as
+`{: }`, with class definitions and key/value pairs placed after the colon.
+They always appear _after_ the relevant block, which led to
+code blocks that looked like this:
+
+````markdown
+~~~
+ls -larth /path/to/dir
+~~~
+{: .language-bash}
+````
+
+Moreover, to achieve the special callout blocks, we used blockquotes that were
+given special classes (which is an accessibility no-no because those blocks were
+not semantic HTML) and the nesting of these block quotes looked like this:
+
+
+````markdown
+> ## Challenge
+>
+> How do you list all files in a directory in reverse order by the time it was
+> last updated?
+>
+> > ## Solution
+> >
+> > ~~~
+> > ls -larth /path/to/dir
+> > ~~~
+> > {: .language-bash}
+> {: .solution}
+{: .challenge}
+````
+
+One of the biggest challenges with this for authors was that, unless you used an
+editor like vim or emacs, this was difficult to write with all the prefixed
+blockquote characters and keeping track of which IALs belonged to which block.
+
+### Special methods and active bindings
+
+```{r setup-again}
+library("pegboard")
+library("xml2")
+library("fs")
+```
+
+Episodes written in the Jekyll syntax have special functions and active bindings
+that allow them to be analyzed and transformed to Workbench episodes. Here is an
+example from a lesson fragment:
+
+
+```{r jekyll-fragment-read}
+lf <- lesson_fragment()
+ep <- Episode$new(path(lf, "_episodes", "14-looping-data-sets.md"))
+# show relevant sections of head and tail
+ep$head(29)
+ep$tail(53)
+```
+
+Notice that the questions, objectives, and keypoints are in the yaml frontmatter.
+This is why we have an accessor that returns the list instead of the node, for
+compatibility with the Jekyll lessons:
+
+```{r qok}
+ep$questions
+ep$objectives
+ep$keypoints
+```
+
+Even though the challenges are formatted differently, the accessors will still
+return them correctly:
+
+```{r challenges}
+ep$challenges
+ep$solutions
+```
+
+You can also get _all_ of the block quotes using the `$get_blocks()` method.
+NOTE: this will extract _all_ block quotes (including those that do not have
+the `ktag` attribute).
+
+```{r get_blocks}
+ep$get_blocks() # default is all top-level blocks (challenges/callouts)
+ep$get_blocks(level = 2) # nested blocks are usually solutions
+ep$get_blocks(level = 0) # level zero is all levels
+ep$get_blocks(type = ".solution", level = 0) # filter by type
+```
+
+One of the things that was advantageous about blockquotes is that we could
+analyze the pathway through the blockquotes and figure out how they were commonly
+written in a lesson. The `$get_challenge_graph()` method creates a data frame
+that describes these relationships:
+
+```{r get-challenge-graph}
+ep$get_challenge_graph()
+```
+
+You might notice that there is an attribute called `ktag`. When a
+Jekyll-formatted episode is read in, all of the IAL tags are processed and
+placed in an attribute called `ktag` (**k**ramdown **tag**), which is
+accessible via the `$tags` active binding. This is needed because commonmark
+does not know how to process postfix tags and it is important for the
+translation to commonmark syntax:
+
+```{r ktags}
+ep$tags
+xml2::xml_parent(ep$tags)
+```
+
+
+### Transformation
+
+It was always known that we would want to use a different syntax to write the
+lessons, as much of the community struggled with the kramdown syntax and it
+was difficult to parse and validate. The automated transformation workflow is
+what powers the Lesson Transition Tool and we have composed it into a few
+basic steps:
+
+1. transforming block quotes to fenced divs
+2. removing the Jekyll syntax and liquid templating, and fixing relative links
+3. moving the questions, objectives, and keypoints out of the yaml frontmatter
+
+The process looks like this composable chain of methods:
+
+```{r}
+ep$reset()
+ep$
+ unblock()$
+ use_sandpaper()$
+ move_questions()$
+ move_objectives()$
+ move_keypoints()
+ep$head(31)
+ep$tail(65)
+```
+
+
diff --git a/vignettes/intro-xml.Rmd b/vignettes/intro-xml.Rmd
new file mode 100644
index 00000000..a43a44b4
--- /dev/null
+++ b/vignettes/intro-xml.Rmd
@@ -0,0 +1,536 @@
+---
+title: "Working with XML data"
+output: rmarkdown::html_vignette
+vignette: >
+ %\VignetteIndexEntry{Working with XML data}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+ collapse = TRUE,
+ comment = "#>"
+)
+```
+
+## Introduction
+
+You will want to read this vignette if you are interested in contributing to
+{pegboard}, or if you would like to understand how to fine-tune the transition of
+a lesson from the styles infrastructure to The Workbench,
+or if you want to
+know how to better inspect the output of some of {pegboard}'s accessors. In
+this vignette, I assume that you are familiar with writing R functions and that
+R will default to passing an object's _value_ to a function and not a
+_reference_ (though if you do not understand that last part, do not worry, I
+will try to dispel this confusion).
+
+The {pegboard} package is an enhancement of the {tinkr} package, which
+transforms Markdown to XML and back again. [XML is a markup language that,
+like HTML, is derived from SGML](https://www.geeksforgeeks.org/html-vs-xml/) and is designed to
+handle structured data. A more modern format for storing and transporting data
+on the web is JSON, but the advantage of using XML is that we are able to use the
+[XPath] language to parse it (more on that later). Moreover, because XML has
+a tree structure similar to that of HTML, it can be parsed using the same tools,
+which is advantageous for a suite of packages that transforms Markdown to HTML.
+This transformation is facilitated by the [{commonmark}] package for transforming
+Markdown to XML and the [{xslt}] package for transforming XML to Markdown.
+
+[{commonmark}]: https://docs.ropensci.org/commonmark/
+[{xslt}]: https://docs.ropensci.org/xslt/
+[{xml2}]: https://xml2.r-lib.org/
+[XPath]: https://en.wikipedia.org/wiki/XPath
+
+## Motivating Example
+
+During the lesson transition, I was often faced with situations that required
+me to perform intricate replacements in documents while preserving the structure.
+One such example is transitioning the "workshop" or "overview" lessons that did
+not have any episodes and relied on separate child documents to separate out
+redundant elements. Let's say we had a file called `setup.md` and two other
+files called `setup-python.md` and `setup-r.md` that look like this:
+
+`setup.md`:
+
+````markdown
+## Setup Instructions
+
+### Python
+
+{% include setup-python.md%}
+
+### R
+
+{% include setup-r.md %}
+````
+
+`setup-python.md`:
+
+````markdown
+Install _python_ from **anaconda**
+````
+
+`setup-r.md`:
+
+````markdown
+Install _R_ from **CRAN**
+````
+
+The output of `setup.md` when it's rendered would include the text from both
+`setup-python.md` and `setup-r.md`, but the thing is, the `{% include %}` tags
+are a syntax that is specific to Jekyll. Instead, for The Workbench, we wanted
+to use the [R Markdown child document
+declaration](https://bookdown.org/yihui/rmarkdown-cookbook/child-document.html),
+so that `setup.md` would look like this:
+
+`setup.md`:
+
+````{verbatim}
+## Setup Instructions
+
+### Python
+
+```{r child="files/setup-python.md"}
+```
+
+### R
+
+```{r child="files/setup-r.md"}
+```
+````
+
+
+```{r setup-setup}
+setup_file <- tempfile(fileext=".md")
+stp <- "## Setup Instructions
+
+### Python
+
+{% include setup-python.md%}
+
+### R
+
+{% include setup-r.md %}
+"
+writeLines(stp, setup_file)
+```
+
+The following function (originally in
+[lesson-transition/datacarpentry/ecology-workshop.R](https://github.com/carpentries/lesson-transition/blob/f8edb10b2e13a926e3df9ba522983f930d0ee19b/datacarpentry/ecology-workshop.R#L23-L44)) made this conversion possible:
+
+```{r child-from-include}
+child_from_include <- function(from, to = NULL) {
+ to <- if (is.null(to)) fs::path_ext_set(from, "Rmd") else to
+ rlang::inform(c(i = from))
+ ep <- pegboard::Episode$new(from)
+ # find all the {% include file.ext %} statements
+ includes <- xml2::xml_find_all(ep$body,
+ ".//md:text[starts-with(text(), '{% include')]", ns = ep$ns)
+ # trim off everything but our precious file path
+ f <- gsub("[%{ }]|include", "", xml2::xml_text(includes))
+ # give it a name
+ fname <- paste0("include-", fs::path_ext_remove(f))
+ # make sure the file path is correct
+ f <- sQuote(fs::path("files", f), q = FALSE)
+ p <- xml2::xml_parent(includes)
+ # remove text node
+ xml2::xml_remove(includes)
+ # change paragraph node to a code block and add chunk attributes
+ xml2::xml_set_name(p, "code_block")
+ xml2::xml_set_attr(p, "language", "r")
+ xml2::xml_set_attr(p, "child", f)
+ xml2::xml_set_attr(p, "name", fname)
+ fs::file_move(from, to)
+ ep$write(fs::path_dir(to), format = "Rmd")
+}
+writeLines(readLines(setup_file)) # show the file
+child_from_include(setup_file)
+writeLines(readLines(fs::path_ext_set(setup_file, "Rmd"))) # show the file
+```
+
+This is only a small peek of what is possible with XML data and if you are
+familiar with R, some of this may seem like strange syntax. If you would like
+to understand a bit more, read on.
+
+## Working with XML data
+
+Each `Episode` object contains a field (you can think of each field as a list
+element) called `$body`, which contains an {xml2} document. This is the core of
+the `Episode` object and every function works in some way with this field.
+
+### The memory of XML objects
+
+For the casual R user (and even for the more experienced), the way you use this
+package may seem a little strange. This is because in R, functions usually do
+not have side effects, but the vast majority of methods in the `Episode` object
+will modify the object itself. This all has to do with the way XML data is
+handled in R by the {xml2} package.
+
+Normally in R, when you pass data to a function, it will make a copy of the
+data and then apply the function to the copy of the data:
+
+```{r}
+x <- 1:10
+f <- function(x) {
+ # insert 99 after the fourth position in a vector
+ return(append(x, 99, after = 4))
+}
+print(f(x))
+# note that x is not modified
+print(x)
+```
+
+When working with XML in R, the {xml2} package is unparalleled, but it leads to
+surprising outcomes because when you modify content within an XML object, you
+are modifying the object in place:
+
+```{r xml-example}
+x <- xml2::read_xml("<a><b></b></a>")
+print(x)
+f <- function(x, new = "c") {
+ xml2::xml_add_child(x, new, .where = xml2::xml_length(x))
+ return(x)
+}
+y <- f(x)
+# note that x and y are identical
+print(x)
+print(y)
+```
+
+It gets a bit stranger when you consider that in the above code, `y` and `x` are
+_exactly the same object_, as shown by the fact that if I manipulate `y`, then
+`x` will also be modified:
+
+```{r xml-example-dup}
+f(y, "d")
+print(y)
+print(x)
+```
+
+I can even extract child elements from the XML document and manipulate _those_
+and have them be reflected in the parent. For example, if I extract the second
+child of the document, and then apply the `cool="verified"` attribute to the
+child, it will be reflected in the parent document.
+
+```{r xml-example-child}
+child <- xml2::xml_child(x, 2)
+xml2::xml_set_attr(child, "cool", "verified")
+print(child)
+print(x)
+print(y)
+```
+
+This persistence lends itself very well to using the {R6} package for creating
+objects that work in a more object-oriented way (where methods belong to classes
+instead of the other way around). If you are familiar with how Python methods
+work, then you will be mostly familiar with how the {R6} objects behave. It is
+worthwhile to read the [{R6} introduction
+vignette](https://r6.r-lib.org/articles/Introduction.html) if you want to
+understand how to program and modify this package.
+
+In the example above, you will notice that I use `xml2::xml_child()` to extract
+child nodes, but the real power of XML comes with searching for items using XPath
+syntax for traversing the XML nodes, where I would be able to do one of the
+following to get the child called "c":
+
+```{r xml-example-xpath}
+xml2::xml_find_first(x, ".//c")
+xml2::xml_find_first(x, "/a/c")
+```
+
+The next section will cover a bit of XPath and provide some resources on how to
+practice and learn because this comes in very handy to quickly traverse the XML
+nodes without relying on loops.
+
+## Using XPath to parse XML
+
+### The structure of XPath
+
+In this section, we will talk about [XPath syntax][XPath-1.0], but it will be
+non-exhaustive. Unfortunately, good tutorials on the web are few and far between,
+but here are some that can help:
+
+ - The [MDN documentation](https://developer.mozilla.org/en-US/docs/Web/XPath)
+ is _usually_ pretty good, but in this case, it's better as a reference
+ - [MDN XPath Axes](https://developer.mozilla.org/en-US/docs/Web/XPath/Axes)
+ good for knowing how to navigate among nodes
+ - [MDN XPath
+ functions](https://developer.mozilla.org/en-US/docs/Web/XPath/Functions)
+ good for knowing how to filter node matches
+ - The [w3schools tutorial on
+ XPath](https://www.w3schools.com/xml/xpath_intro.asp) is actually one of the
+ best out there, but this is an exception to the rule. Other than this
+ tutorial, I would not trust any content from w3schools (they are not aligned
+ at all with the web consortium).
+ - An [XPath tester](https://extendsclass.com/xpath-tester.html), which works
+ like a regex tester and allows you to try out complex queries visually.
+
+[XPath-1.0]: https://en.wikipedia.org/wiki/XPath#Syntax_and_semantics_(XPath_1.0)
+
+It's important to remember that an XML document is a tree-like structure that
+is similar to directories or folders on your computer. For example, if you look
+at the source directory structure of this package, you would see a folder
+called `R/` and a nested folder called `tests/testthat/`. If you started from
+the root directory of this package, you would list the R files in the `R/`
+folder with `ls R/*.R`. Similarly, if you wanted to list the R files in the
+`tests/testthat/` folder, you would use `ls tests/testthat/*.R`. In this
+respect, XPath has a very similar syntax: to enter the next level of nesting,
+you add a slash (`/`). For example, let's take a look at what the file structure
+would look like in XML form:
+
+```{r XML-files, echo = FALSE, results = "asis"}
+x <- '
+
+ one
+ two
+
+
+
+
+ test-data
+
+ test-one
+ test-two
+
+
+'
+writeLines(c("```xml", x, "```"))
+xml <- xml2::read_xml(gsub("\\n", "", x))
+```
+
+The XPath syntax to find all files in the R and testthat folders would be
+the same if you started from the root: `R/file` and
+`tests/testthat/file`.
+
+```{r}
+xml2::xml_find_all(xml, "R/file")
+xml2::xml_find_all(xml, "tests/testthat/file")
+```
+
+However, XPath has one advantage that normal command line syntax doesn't have:
+you can short-cut paths, so if you wanted to find all files in any given folder,
+you can use the double slash (`//`) to recursively search through nesting. By
+habit, I will normally precede these slashes with a dot (`.`) so that
+I can be sure to start with the node that I have in my variable:
+
+```{r}
+xml2::xml_find_all(xml, ".//file")
+```
+
+Of course, this method finds _all_ files, so if you wanted to filter them, you
+can use the bracket notation to create filters for your selection based on
+attributes such as `ext`, which are prefixed by `@`. With the bracket notation,
+you add brackets to a node selection with a condition. In this case, we want to
+test that the extension is 'R', so we would use `[@ext='R']`:
+
+```{r}
+xml2::xml_find_all(xml, ".//file[@ext='R']")
+```
+
+In this scheme, I've put the file names as the text of the nodes, so we can
+use the bracket notation again with [XPath functions](https://developer.mozilla.org/en-US/docs/Web/XPath/Functions) to filter for only files that contain "one":
+
+```{r}
+xml2::xml_find_all(xml, ".//file[@ext='R'][contains(text(), 'one')]")
+```
+
+If I only wanted to extract source files that contain "one", I could also use
+the `parent::` [XPath axis](https://developer.mozilla.org/en-US/docs/Web/XPath/Axes):
+
+```{r}
+xml2::xml_find_all(xml, ".//file[@ext='R'][contains(text(), 'one')][parent::R]")
+```
+
+Note that if I used a slash (`/`) instead of square brackets for the parent, I
+would get the parent back:
+
+```{r}
+xml2::xml_find_all(xml, ".//file[@ext='R'][contains(text(), 'one')]/parent::R")
+```
+
+As you can see, an XPath query can often get kind of hairy, which is why
+I often like to compose it into different parts during programming with {glue}:
+
+```{r}
+predicate <- "[@ext='R'][contains(text(), 'one')]"
+XPath <- glue::glue(".//file{predicate}/parent::R")
+xml2::xml_find_all(xml, XPath)
+```
+
+In the next section, I will discuss how to extract and manipulate XML that comes
+from Markdown with namespaces.
+
+## XML data from Markdown using namespaces
+
+The XML from markdown transformation is fully handled by the {commonmark}
+package, which has the convenient `commonmark::markdown_xml()` function. For
+example, this is how the following markdown is processed:
+
+```markdown
+This is a bunch of [example markdown](https://example.com 'for example') text
+
+- this
+- is
+- a **list**
+```
+
+> This is a bunch of [example markdown](https://example.com 'for example') text
+>
+> - this
+> - is
+> - a **list**
+
+
+```{r commonmark-ex}
+md <- c("This is a bunch of [example markdown](https://example.com 'for example') text",
+ "",
+ "- this",
+ "- is",
+ "- a **list**"
+)
+xml_txt <- commonmark::markdown_xml(paste(md, collapse = "\n"))
+class(xml_txt)
+writeLines(xml_txt)
+```
+
+You can see that it has successfully parsed the markdown into a paragraph and
+a list and then the various elements within.
+
+### The default namespace
+
+Now here's the catch: The commonmark XML always starts with this basic
+skeleton which has the root node of `<document>`. The `xmlns` attribute defines the
+[default XML namespace][namespace]:
+
+[namespace]: https://developer.mozilla.org/en-US/docs/Web/SVG/Namespaces_Crash_Course
+
+```{r commonmark-skel, echo = FALSE}
+lines <- strsplit(commonmark::markdown_xml("hi"), "\n")[[1]][-(4:6)]
+writeLines(append(lines, "\nMARKDOWN CONTENT HERE\n", after = 3))
+```
+
+In many XML applications, namespaces will come with prefixes, which are defined
+in the `xmlns` attribute (e.g. `xmlns:svg="http://www.w3.org/2000/svg"`). If a
+node has a namespace, it needs to be selected with the namespace prefix like
+so: `.//svg:circle`. For default namespaces, the same rule applies, but the
+question becomes: how do you know what the namespace prefix is? In {xml2}, the
+default namespace always begins with `d1` and increments up as new namespaces
+are added. You can inspect the namespace with `xml2::xml_ns()`:
+
+```{r commonmark-namespace-show}
+xml <- xml2::read_xml(xml_txt)
+xml2::xml_ns(xml)
+```
+
+Thus, the XPath query you would use to select a paragraph would be
+`.//d1:paragraph`:
+
+```{r commonmark-namespace}
+# with namespace prefix
+xml2::xml_find_all(xml, ".//d1:paragraph")
+```
+
+Of course, having a default namespace in {xml2} has some drawbacks in that
+[adding new nodes will duplicate the namespace with a different
+identifier](https://community.rstudio.com/t/adding-nodes-in-xml2-how-to-avoid-duplicate-default-namespaces/84870), so one way we have avoided this in {tinkr} (the
+package that does the basic conversion) is to define a namespace with a prefix
+in a function so that we can use it when querying:
+
+```{r commonmark-namespace-md}
+tinkr::md_ns()
+xml2::xml_find_all(xml, ".//md:paragraph", ns = tinkr::md_ns())
+```
+
+It's also important to remember that _all nodes_ will require this namespace
+prefix, so if we wanted to only select paragraphs that were inside of a list,
+we would need to use `.//md:list//md:paragraph`:
+
+```{r commonmark-list-paragraph-select}
+xml2::xml_find_all(xml, ".//md:list//md:paragraph", ns = tinkr::md_ns())
+```
+
+### Pegboard namespace
+
+One of the reasons why we created pegboard was to handle markdown content that
+also included [fenced divs](https://pandoc.org/MANUAL.html#divs-and-spans), but
+we needed a way to programmatically label and extract them without affecting the
+stylesheet that is used to translate the XML back to Markdown (not covered in
+this tutorial). To achieve this, we define our own namespace and place nodes
+from that namespace around the fences.
+
+Here's an example:
+
+```markdown
+This is markdown with fenced divs
+
+::: discussion
+
+This is a discussion
+
+:::
+
+::: spoiler
+
+This is a spoiler that is hidden by default
+
+:::
+```
+
+When it's parsed by commonmark, the fenced divs are treated as paragraphs:
+
+```{r show-fenced-divs-paragraph}
+md <- 'This is markdown with fenced divs
+
+::: discussion
+
+This is a discussion
+
+:::
+
+::: spoiler
+
+This is a spoiler that is hidden by default
+
+:::
+'
+fences <- xml2::read_xml(commonmark::markdown_xml(md))
+fences
+```
+
+In {pegboard}, we have an internal function called `label_div_tags()` that will
+allow us to label and parse these tags without affecting the markdown document:
+
+```{r label-divs}
+pb <- asNamespace("pegboard")
+pb$label_div_tags(fences)
+fences
+```
+
+Note that we have added `<dtag>` XML nodes that are defined under the pegboard
+namespace. These sandwich the nodes that we want to query and allow us to use
+`tinkr::find_between()` to search for specific tags:
+
+```{r find-between}
+ns <- pb$get_ns()
+ns # both md and pegboard namespaces
+tinkr::find_between(fences, ns = ns, pattern = "pb:dtag[@label='div-1-discussion']")
+```
+
+This is automated in the `get_divs()` internal function:
+
+```{r get-divs}
+pb$get_divs(fences)
+```
+
+## Conclusion
+
+This is but a short introduction to using XML with {pegboard}. You now have the
+basics of what the structure of XML is, how to use XPath (with further resources),
+how to use XPath with namespaces, and how we use namespaces in {pegboard} to
+allow us to parse specific items. It is a good idea to practice working with
+XPath because it is useful not only for working with XML representations of
+markdown documents, but it is also heavily used for post-processing of HTML in
+both {pkgdown} and the {sandpaper} packages.
+