Skip to content

Latest commit

 

History

History
102 lines (67 loc) · 6.58 KB

02-intro-to-Rmd-files.md

File metadata and controls

102 lines (67 loc) · 6.58 KB

Getting started with R Markdown files

Gaston Sanchez

Learning Objectives:

  • Differentiate between .R and .Rmd files
  • To understand dynamic documents
  • To gain familiarity with R Markdown .Rmd files
  • To gain familiarity with code chunks

Introduction to R Markdown files

Besides using R script files to write source code, you will be using other type of source files known as R markdown files. Because you will be turning in most homework assignments as Rmd files, it is important that you quickly become familiar with this resource.

Opening and knitting an Rmd file

In the menu bar of RStudio, click on File, then New File, and choose R Markdown. Select the default option (Document), and click Ok. RStudio will open a new .Rmd file in the source pane. And you should be able to see a file with some default content.

Locate the button Knit HTML, the one with an icon of a ball of yarn and two needles. Click the button (knit to HTML) so you can see how Rmd files are rendered and displayed as HTML documents. Alternatively, you can use a keyboard shortcut: in Mac Command+Shift+K, in Windows Ctrl+Shift+K

What is an Rmd file?

Rmd files are a special type of file, referred to as a dynamic document. This is the fancy term we use to describe a document that allows us to combine narrative (text) with R code in one single file.

Rmd files are plain text files. This means that you can open an Rmd file with any text editor (not just RStudio) and being able to see and edit its contents.

The main idea behind dynamic documents is simple yet very powerful: instead of working with two separate files, one that contains the R code, and another one that contains the narrative, you use an .Rmd file to include both the commands and the narrative.

One of the main advantages of this paradigm, is that you avoid having to copy results from your computations and paste them into a report file. In fact, there are more complex ways to work with dynamic documents and source files. But the core idea is the same: combine narrative and code in a way that you let the computer do the manual, repetitive, and time consuming job.

Rmd files is just one type of dynamic document that you will find in RStudio. In fact, RStudio provides other file formats that can be used as dynamic documents: e.g. .Rnw, .Rpres, .Rhtml, etc.

Anatomy of an Rmd file

The structure of an .Rmd file can be divided in two parts: 1) a YAML header, and 2) the body of the document. In addition to this structure, you should know that .Rmd files use three types of syntaxes: YAML, Markdown, and R.

The YAML header consists of the first few lines at the top of the file. This header is established by a set of three dashes --- as delimiters (one starting set, and one ending set). This part of the file requires you to use YAML syntax (Yet Another Markup Language.) Within the delimiter sets of dashes, you specify settings (or metadata) that will apply to the entire document. Some of the common options are things like:

  • title
  • author
  • date
  • output

The body of the document is everything below the YAML header. It consists of a mix of narrative and R code. All the text that is narrative is written in a markup syntax called Markdown (although you can also use LaTeX math notation). In turn, all the text that is code is written in R syntax inside blocks of code.

There are two types of blocks of code: 1) code chunks, and 2) inline code. Code chunks are lines of text separated from any lines of narrative text. Inline code is code inserted within a line of narrative text .

How does an Rmd file work?

Rmd files are plain text files. All that matters is the syntax of its content. The content is basically divided in the header, and the body.

  • The header uses YAML syntax.
  • The narrative in the body uses Markdown syntax.
  • The code and commands use R syntax.

The process to generate a nice rendered document from an Rmd file is known as knitting. When you knit an Rmd file, various R packages and programs run behind the scenes. But the process can be broken down in three main phases: 1) Parsing, 2) Execution, and 3) Rendering.

  1. Parsing: the content of the file is parsed (examined line by line) and each component is identified as yaml header, or as markdown text, or as R code.

Each component receives a special treatment and formatting.

The most interesting part is in the pieces of text that are R code. Those are separated and executed if necessary. The commands may be included in the final document. Also, the output may be included in the final document. Sometimes, nothing is executed nor included.

Depending on the specified output format (e.g. HTML, pdf, word), all the components are assembled, and one single document is generated.

Yet Another Syntax to Learn

R markdown (Rmd) files use markdown as the main syntax to write content. Markdown is a very lightweight type of markup language, and it is relatively easy to learn.

One of the most common sources of confusion when learning about R and Rmd files has to do with the hash symbol #. As you know, # is the character used by R to indicate comments. The issue is that the # character has a different meaning in markdown syntax. Hashes in markdown are used to define levels of headings.

In an Rmd file, a hash # that is inside a code chunk will be treated as an R comment. A hash outside a code chunk, will be treated as markdown syntax, making its associated text a given type of heading.

Code chunks

There are dozens of options available to control the executation of the code, the formatting and display of both the commands and the output, the display of images, graphs, and tables, and other fancy things. Here's a list of the basic options you should become familiar with:

  • eval: whether the code should be evaluated
    • TRUE
    • FALSE
  • echo: whether the code should be displayed
    • TRUE
    • FALSE
    • numbers indicating lines in a chunk
  • error: whether to stop execution if there is an error
    • TRUE
    • FALSE
  • results: how to display the output
    • markup
    • asis
    • hold
    • hide
  • comment: character used to indicate output lines
    • the default is a double hash ##
    • "" empty character (to have a cleaner display)