-
Notifications
You must be signed in to change notification settings - Fork 118
/
chapter-abstracts_Advanced_R.Rmd
170 lines (103 loc) · 31.4 KB
/
chapter-abstracts_Advanced_R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
## R Markdown
# Meta data (for Advanced R)
# Template:
# * Start with in chapter <no> we discuss <take sth from prerequisites>.
# * <Then use a bit more from the introsection of a chapter, e.g. why it's important. What's the goal>
# * Then use the subsection names and the stuff from the "Outline" to sequentially refer to what is done in the chapter
# * 150-200 words per chapter
# * Don't use R code within the abstracts
## 2. Names and values
The second chapter explains the distinction between an object and its name. This distinction is important, as it helps:
* More accurately predicting the performance and memory usage of R code.
* Writeing faster code by avoiding accidental copies, a major source of slow code.
* Better understanding R’s functional programming tools.
In section "Binding basics", we introduce the distinction between names and values, and how bindings, or references, between a name and a value are created. Then, in "Copy-on-modify", we learn when R makes a copy and how to use tracemem to figure out when a copy actually occurs. We also learn about the implications for different R objects. In "Object size" we explore how much memory an object occupies. "Modify-in-place" describes two exceptions to copy-on-modify: environments and values with a single name are actually modified in place. Lastly, we conclude this chapter with a discussion of the garbage collector, in "Unbinding and the garbage collector".
## 3. Vectors
The third chapter discusses the most important family of data types in base R: vectors. It won’t cover individual vector types in too much detail, but it will show how all the types fit together as a whole.
Vectors come in two flavours: atomic vectors and lists. We start with "Atomic vectors", which are R’s simplest data structures. For atomic vectors, all elements must have the same type. Then, we take a small detour to discuss "Attributes", R’s flexible metadata specification. "S3 atomic vectors" discusses the important vector types that are built by combining atomic vectors with special attributes. These include factors, dates, date-times, and durations. Next, we dive into "Lists". Lists are very similar to atomic vectors. However, an element of a list can be any data type, including another list. This makes them suitable for representing hierarchical data. "Data frames and tibbles" teaches you about data frames and tibbles, which are used to represent rectangular data. They combine the behaviour of lists and matrices to make a structure ideally suited for the needs of statistical data. To finish up this chapter, we briefly talk about one final important data structure that’s closely related to vectors: "NULL".
## 4. Subsetting
In chapter 4 we discuss R's fast and powerful subsetting operators. Mastering them allows you to succinctly perform complex operations in a way that few other languages can match. Subsetting in R is easy to learn but hard to master because you need to internalise a number of interrelated concepts.
"Selecting multiple elements" starts by teaching you about [. You’ll learn the six ways to subset atomic vectors. You’ll then learn how those six ways act when used to subset lists, matrices, and data frames. "Selecting a single element" expands your knowledge of subsetting operators to include [[ and $ and focuses on the important principles of simplifying versus preserving. In "Subsetting and assignment" you’ll learn the art of subassignment, which combines subsetting and assignment to modify parts of an object. "Applications" leads you through eight important, but not obvious, applications of subsetting to solve problems that you often encounter in data analysis.
## 5. Control Flow
In chapter 5 you'll learn about the two primary tools of control flow: choices and loops. Choices, like if statements and switch() calls, allow you to run different code depending on the input. Loops, like for and while, allow you to repeatedly run code, typically with changing options.
We’d expect that you’re already familiar with the basics of these functions so we’ll briefly cover some technical details and then introduce some useful, but lesser known, features. The condition system (messages, warnings, and errors), which you’ll learn about in Chapter 8, also provides non-local control flow.
In "Choices" we dive into the details of the if statement, and then discuss the close relatives ifelse() and switch(). "Loops" starts off by reminding you of the basic structure of the for loop in R, discusses some common pitfalls, and then talks about the related while and repeat statements.
## 6. Functions
In chapter 6, you’ll learn how to turn your informal, working knowledge about functions into more rigorous, theoretical understanding. The tricks and techniques you will learn along the way will be important for understanding the more advanced topics discussed later in the book.
"Function fundamentals" describes the basics of creating a function, the three main components of a function, and the exception to many function rules: primitive functions (which are implemented in C, not R). "Function composition" discusses the strengths and weaknesses of the three forms of function composition commonly used in R code. "Lexical scoping" shows you how R finds the value associated with a given name, i.e. the rules of lexical scoping. "Lazy evaluation" is devoted to an important property of function arguments: they are only evaluated when used for the first time. "... (dot-dot-dot)" discusses the special ... argument, which allows you to pass on extra arguments to another function. "Exiting a function" discusses the two primary ways that a function can exit, and how to define an exit handler, code that is run on exit, regardless of what triggers it. "Function forms" shows you the various ways in which R disguises ordinary function calls, and how you can use the standard prefix form to better understand what’s going on.
## 7. Environments
In chapter 7 you will learn about environments, the data structure that powers scoping. This chapter dives deep into environments, describing their structure in depth, and using them to improve your understanding of the four scoping rules described in the “Lexical scoping” section of the functions chapter. Understanding environments is not necessary for day-to-day use of R. But they are important to understand because they power many important R features like lexical scoping, namespaces, and R6 classes, and interact with evaluation to give you powerful tools for making domain specific languages, like dplyr and ggplot2.
"Environment basics" introduces you to the basic properties of an environment and shows you how to create your own. "Recursing over environments" provides a function template for computing with environments, illustrating the idea with a useful function. "Special environments" describes environments used for special purposes: for packages, within functions, for namespaces, and for function execution. "Call stacks" explains the last important environment: the caller environment. This requires you to learn about the call stack, that describes how a function was called. You’ll have seen the call stack if you’ve ever called traceback() to aid debugging. “As data structures" briefly discusses three places where environments are useful data structures for solving other problems.
## 8. Conditions
In chapter 8 we will discuss R’s condition system, which provides a paired set of tools that allow the author of a function to indicate that something unusual is happening, and the user of that function to deal with it. The function author signals conditions with functions like stop(), warning(), and message(), then the function user can handle them with functions like tryCatch() and withCallingHandlers(). Understanding the condition system is important because you’ll often need to play both roles: signalling conditions from the functions you create, and handle conditions signalled by the functions you call.
R’s condition system is based on ideas from Common Lisp. Like R’s approach to object-oriented programming, it is rather different to currently popular programming languages so it is easy to misunderstand, and there has been relatively little written about how to use it effectively. Historically, this has meant that few people have taken full advantage of its power. The goal of this chapter is to remedy that situation. Here you will learn about the big ideas of R’s condition system, as well as learning a bunch of practical tools that will make your code stronger.
“Signalling conditions” introduces the basic tools for signalling conditions, and discusses when it is appropriate to use each type. “Ignoring conditions” teaches you about the simplest tools for handling conditions: functions like try() and supressMessages() that swallow conditions and prevent them from getting to the top level. “Handling conditions” introduces the condition object, and the two fundamental tools of condition handling: tryCatch() for error conditions, and withCallingHandlers() for everything else. “Custom conditions” shows you how to extend the built-in condition objects to store useful data that condition handlers can use to make more informed decisions. “Applications” closes out the chapter with a grab bag of practical applications based on the low-level tools found in earlier sections.
## 9. Functionals
In chapter 9 we'll learn about functionals. A functional is a function that takes a function as an input and returns a vector as output. A common use of functionals is as an alternative to for loops. As Loops do not convey what should be done with the results, it’s better to use a functional. Each functional is tailored for a specific task, so when you recognise the functional you immediately know why it’s being used.
"My first functional: map()" introduces your first functional: purrr::map(). "Purrr style" demonstrates how you can combine multiple simple functionals to solve a more complex problem and discusses how purrr style differs from other approaches. "Map variants" teaches you about 18 (!!) important variants of purrr::map(). Fortunately, their orthogonal design makes them easy to learn, remember, and master. "Reduce family" introduces a new style of functional: purrr::reduce(). reduce() systematically reduces a vector to a single result by applying a function that takes two inputs. "Predicate functionals" teaches you about predicates: functions that return a single TRUE or FALSE, and the family of functionals that use them to solve common problems. "Base functionals" reviews some functionals in base R that are not members of the map, reduce, or predicate families.
## 10. Function factories
In chapter 10 we’ll discuss function factories. A function factory is a function that makes functions.
Of the three main functional programming tools (functionals, function factories, and function operators), function factories are the least used. Generally, they don’t tend to reduce overall code complexity but instead partition complexity into more easily digested chunks. Function factories are also an important building block for the very useful function operators, which you’ll learn about in Chapter 11 (“Function operators”).
"Factory fundamentals" begins the chapter with an explanation of how function factories work, pulling together ideas from scoping and environments. You’ll also see how function factories can be used to implement a memory for functions, allowing data to persist across calls. "Graphical factories" illustrates the use of function factories with examples from ggplot2. You’ll see two examples of how ggplot2 works with user supplied function factories, and one example of where ggplot2 uses a function factory internally. "Statistical factories" uses function factories to tackle three challenges from statistics: understanding the Box-Cox transform, solving maximum likelihood problems, and drawing bootstrap resamples. "Function factories + functionals" shows how you can combine function factories and functionals to rapidly generate a family of functions from data.
## 11. Function operators
In chapter 11, you’ll learn about function operators. A function operator is a function that takes one (or more) functions as input and returns a function as output.
Function operators are closely related to function factories; indeed they’re just a function factory that takes a function as input. Like factories, there’s nothing you can’t do without them, but they often allow you to factor out complexity in order to make your code more readable and reusable. Function operators are typically paired with functionals. If you’re using a for-loop, there’s rarely a reason to use a function operator, as it will make your code more complex for little gain. If you’re familiar with Python, decorators is just another name for function operators.
"Existing function operators" introduces you to two extremely useful existing function operators, and shows you how to use them to solve real problems. "Case study: Creating your own function operators" works through a problem amenable to solution with function operators: downloading many web pages.
## 12. Base types
In chapter 12 we will learn about base types.
To talk about objects and OOP in R we first need to clear up a fundamental confusion about two uses of the word “object”. So far in this book, we’ve used the word in the general sense captured by John Chambers’ pithy quote: “Everything that exists in R is an object”. However, while everything is an object, not everything is object-oriented. This confusion arises because the base objects come from S, and were developed before anyone thought that S might need an OOP system. The tools and nomenclature evolved organically over many years without a single guiding principle.
Most of the time, the distinction between objects and object-oriented objects is not important. But here we need to get into the nitty gritty details so we’ll use the terms base objects and OO objects to distinguish them.
"Base versus OO objects" shows you how to identify base and OO objects.
"Base types" gives a complete set of the base types used to build all objects.
## 13. S3
In chapter 13 we’ll learn about R’s first and simplest OO system: S3. S3 is informal and ad hoc, but there is a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. For these reasons, you should use it, unless you have a compelling reason to do otherwise. S3 is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages.
"Basics" gives a rapid overview of all the main components of S3: classes, generics, and methods. You’ll also learn about sloop::s3_dispatch(), which we’ll use throughout the chapter to explore how S3 works. Then, "Classes" goes into the details of creating a new S3 class, including the three functions that should accompany most classes: a constructor, a helper, and a validator. Further, "Generics and methods" describes how S3 generics and methods work, including the basics of method dispatch. "Object styles" discusses the four main styles of S3 objects: vector, record, data frame, and scalar. "Inheritance" demonstrates how inheritance works in S3, and shows you what you need to make a class “subclassable”. "Dispatch details" concludes the chapter with a discussion of the finer details of method dispatch including base types, internal generics, group generics, and double dispatch.
## 14. R6
Chapter 14 describes the R6 OOP system which is more like OOP in other languages. It uses the encapsulated OOP paradigm, which means that methods belong to objects, not generics, and you call them like object$method(). Also, R6 objects are mutable, which means that they are modified in place, and hence have reference semantics.
If you’ve learned OOP in another programming language, it’s likely that R6 will feel very natural, and you’ll be inclined to prefer it over S3. Resist the temptation to follow the path of least resistance: in most cases R6 will lead you to non-idiomatic R code.
"Classes and methods" introduces R6::R6Class(), the one function that you need to know to create R6 classes. You’ll learn about the constructor method, $new(), which allows you to create R6 objects, as well as other important methods like $initialize() and $print(). "Controlling access" discusses the access mechanisms of R6: private and active fields. Together, these allow you to hide data from the user, or expose private data for reading but not writing. "Reference semantics" explores the consequences of R6’s reference semantics. You’ll learn about the use of finalizers to automatically clean up any operations performed in the initializer, and a common gotcha if you use an R6 object as a field in another R6 object. "Why R6?" describes why we cover R6, rather than the base RC system.
## 15. S4
In chapter 15 we’ll learn about the S4 OO system, which provides a formal approach to functional OOP. The underlying ideas are similar to S3, but the implementation is much stricter and makes use of specialised functions for creating classes (setClass()), generics (setGeneric()), and methods (setMethod()). Additionally, S4 provides both multiple inheritance (i.e. a class can have multiple parents) and multiple dispatch (i.e. method dispatch can use the class of multiple arguments).
An important new component of S4 is the slot, a named component of the object that is accessed using the specialised subsetting operator @ (pronounced at). The set of slots, and their classes, forms an important part of the definition of an S4 class.
“Basics” gives a quick overview of the main components of S4: classes, generics, and methods. “Classes” dives into the details of S4 classes, including prototypes, constructors, helpers, and validators. “Generics and methods” shows you how to create new S4 generics, and how to supply those generics with methods. You’ll also learn about accessor functions which are designed to allow users to safely inspect and modify object slots. “Method dispatch” dives into the full details of method dispatch in S4. The basic idea is simple, then it rapidly gets more complex once multiple inheritance and multiple dispatch are combined. “S4 and S3” discusses the interaction between S4 and S3, showing you how to use them together.
## 16. Trade-offs
By the time you get to Chapter 16, you should now know about the three most important OOP toolkits available in R and understand their basic operation and the principles that underlie them.
Overall, when picking an OO system, choosing S3 is usually a good default. S3 is simple, and widely used throughout base R and CRAN. While it’s far from perfect, its idiosyncrasies are well understood and there are known approaches to overcome most shortcomings. If you have an existing background in programming you are likely to lean towards R6, because it will feel familiar. There are two good reasons why to resist this tendency. Firstly, if you use R6 it’s very easy to create a non-idiomatic API that will feel very odd to native R users, and will have surprising pain points because of the reference semantics. Secondly, if you stick to R6, you’ll lose out on learning a new way of thinking about OOP that gives you a new set of tools for solving problems.
“S4 versus S3” compares S3 and S4. In brief, S4 is more formal and tends to require more upfront planning. That makes it more suitable for big projects developed by teams, not individuals. “R6 versus S3” compares S3 and R6. This section is quite long because these two systems are fundamentally different and there are a number of tradeoffs that you need to consider.
## 17. Big picture
Chapter 17 describes the big picture on metaprogramming. Metaprogramming brings together many formerly unrelated topics and forces you grapple with issues that you probably haven’t thought about before. You’ll also need to learn a lot of new vocabulary, and even if you’re an experienced programmer in another language, your existing skills are unlikely to be much help as few modern popular languages expose the level of metaprogramming that R provides. So don’t be surprised if you’re frustrated or confused at first; this is a natural part of the process that happens to everyone!
Each section in this chapter introduces one big new idea: “Code is data” shows that code is data and teaches you how to create and modify expressions by capturing code. “Code is a tree” describes the tree-like structure of code, called an abstract syntax tree. “Code can generate code” shows how to create new expressions programmatically. “Evaluation runs code” shows how to execute expressions by evaluating them in an environment. “Customising evaluation with functions” illustrates how to customise evaluation by supplying custom functions in a new environment. “Customising evaluation with data” extends that customisation to data masks, which blur the line between environments and data frames. “Quosures” introduces a new data structure called the quosure that makes all this simpler and more correct.
## 18. Expressions
The focus of chapter 18 chapter are the data structures that underlie expressions. Mastering this knowledge will allow you to inspect and modify captured code, and to generate code with code. We’ll come back to expr() in Chapter 19 (“Quasiquotation”), and to eval() in Chapter 20 (“Evaluation”).
“Abstract syntax trees” introduces the idea of the abstract syntax tree (AST), and reveals the tree like structure that underlies all R code. “Expressions” dives into the details of the data structures that underpin the AST: constants, symbols, and calls, which are collectively known as expressions. “Parsing and grammar” covers parsing, the act of converting the linear sequence of character in code into the AST, and uses that idea to explore some details of R’s grammar. “Walking AST with recursive functions” shows you how you can use recursive functions to compute on the language, writing functions that compute with expressions. “Specialised data structures” circles back to three more specialised data structures: pairlists, missing arguments, and expression vectors.
## 19. Quasiquotation
In chapter 19 you will learn about quasiquotation. Quasiquotation is one of the three pillars of tidy evaluation. You’ll learn about the other two (quosures and the data mask) in Chapter 20 (“Evaluation”). When used alone, quasiquotation is most useful for programming, particularly for generating code. But when it’s combined with the other techniques, tidy evaluation becomes a powerful tool for data analysis.
“Motivation” motivates the development of quasiquotation with a function, cement(), that works like paste() but automatically quotes its arguments so that you don’t have to. “Quoting” gives you the tools to quote expressions, whether they come from you or the user, or whether you use rlang or base R tools. “Unquoting” introduces the biggest difference between rlang quoting functions and base quoting function: unquoting with !! and !!!. “Non-quoting” discusses the three main non-quoting techniques that base R functions uses to disable quoting behaviour. “... (dot-dot-dot)” explores another place that you can use !!!, functions that take .... It also introduces the special := operator, which allows you to dynamically change argument names. “Case studies” shows a few practical uses of quoting to solve problems that naturally require some code generation. “History” finishes up with a little history of quasiquotation for those who are interested.
## 20. Evaluation
Chapter 20 discusses the topic evaluation. The user-facing inverse of quotation is unquotation: it gives the user the ability to selectively evaluate parts of an otherwise quoted argument. The developer-facing complement of quotation is evaluation: this gives the developer the ability to evaluate quoted expressions in custom environments to achieve specific goals.
“Evaluation basics” discusses the basics of evaluation using eval(), and shows how you can use it to implement key functions like local() and source(). “Quosures” introduces a new data structure, the quosure, which combines an expression with an environment. You’ll learn how to capture quosures from promises, and evaluate them using rlang::eval_tidy(). “Data masks” extends evaluation with the data mask, which makes it trivial to intermingle symbols bound in an environment with variables found in a data frame. “Using tidy evaluation” shows how to use tidy evaluation in practice, focussing on the common pattern of quoting and unquoting, and how to handle ambiguity with pronouns. “Base evaluation” circles back to evaluation in base R, discusses some of the downsides, and shows how to use quasiquotation and evaluation to wrap functions that use NSE.
## 21. Translating R code
Chapter 20 discusses the creation of domain specific languages. The combination of first-class environments, lexical scoping, and metaprogramming gives us a powerful toolkit for translating R code into other languages. One fully-fledged example of this idea is dbplyr, which powers the database backends for dplyr, allowing you to express data manipulation in R and automatically translate it into SQL. You can see the key idea in translate_sql() which takes R code and returns the equivalent SQL.
Translating R to SQL is complex because of the many idiosyncrasies of SQL dialects, so here I’ll develop two simple, but useful, domain specific languages (DSL): one to generate HTML, and the other to generate mathematical equations in LaTeX.
In “HTML” we create a DSL for generating HTML, using quasiquotation and purrr to generate a function for each HTML tag, then tidy evaluation to easily access them. In “LaTeX” we transform mathematically R code into its LaTeX equivalent using a combination of tidy evaluation and expression walking.
## 22. Debugging
Chapter 22 teaches you the art and science of debugging, starting with a general strategy, then following up with specific tools.
I’ll show the tools provided by both R and the RStudio IDE. I recommend using RStudio’s tools if possible, but I’ll also show you the equivalents that work everywhere. You may also want to refer to the official RStudio debugging documentation which always reflects the latest version of RStudio.
NB: You shouldn’t need to use these tools when writing new functions. If you find yourself using them frequently with new code, reconsider your approach. Instead of trying to write one big function all at once, work interactively on small pieces. If you start small, you can quickly identify why something doesn’t work, and don’t need sophisticated debugging tools.
“Overall approach” outlines a general strategy for finding and fixing errors. “Locating errors” introduces you to the traceback() function which helps you locate exactly where an error occurred. “Interactive debugger” shows you how to pause the execution of a function and launch environment where you can interactively explore what’s happening. “Non-interactive debugging” discusses the challenging problem of debugging when you’re running code non-interactively. “Non-error failures” discusses a handful of non-error problems that occassionally also need debugging.
## 23. Measuring performance
In chapter 23 we learn about measuring performance.
Before you can make your code faster, you first need to figure out what’s making it slow. This sounds easy, but it’s not. Even experienced programmers have a hard time identifying bottlenecks in their code. So instead of relying on your intuition, you should profile your code: measure the run-time of each line of code using realistic inputs.
Once you’ve identified bottlenecks you’ll need to carefully experiment with alternatives to find faster code that is still equivalent. In Chapter 24 (“Improving performance”) you’ll learn a bunch of ways to speed up code, but first you need to learn how to microbenchmark so that you can precisely measure the difference in performance.
“Profiling” shows you how to use profiling tools to dig into exactly what is making code slow. Section “Microbenchmarking” shows how to use microbenchmarking to explore alternative implementations and figure out exactly which one is fastest.
## 24. Improving performance
Chapter 24 teaches how to improve performance. Once you’ve used profiling to identify a bottleneck, you need to make it faster. It’s difficult to provide general advice on improving performance, but in this chapter we’ll see four techniques that can be applied in many situations and also a general strategy for performance optimisation that helps ensure that your faster code is still correct. Also it is easy to get caught up in trying to remove all bottlenecks. Don’t! Be pragmatic: don’t spend hours of your time to save seconds of computer time.
“Code organisation” teaches you how to organise your code to make optimisation as easy, and bug free, as possible. “Checking for existing solutions” reminds you to look for existing solutions. “Doing as little as possible” emphasises the importance of being lazy: often the easiest way to make a function faster is to let it to do less work. “Vectorise” concisely defines vectorisation, and shows you how to make the most of built-in functions. “Avoiding copies” discusses the performance perils of copying data. “Case study: t-test” pulls all the pieces together into a case study showing how to speed up repeated t-tests by about a thousand times. “Other techniques” finishes the chapter with pointers to more resources that will help you write fast code.
## 25. Rewriting R code in C++
Chapter 25 shows how to improve the performance by rewriting R code in C++ via the Rcpp package.
Rcpp makes it very simple to connect C++ to R. While it is possible to write C or Fortran code for use in R, it will be painful by comparison. Rcpp provides a clean, approachable API that lets you write high-performance code, insulated from R’s complex C API.
Typical bottlenecks that C++ can address include:
* Loops that can’t be easily vectorised because subsequent iterations depend on previous ones.
* Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.
* Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.
“Getting started with C++” teaches you how to write C++ by converting simple R functions to their C++ equivalents. You’ll learn how C++ differs from R, and what the key scalar, vector, and matrix classes are called. The Subsection “Using sourceCpp” shows you how to use sourceCpp() to load a C++ file from disk in the same way you use source() to load a file of R code. “Other classes” discusses how to modify attributes from Rcpp, and mentions some of the other important classes. “Missing values” teaches you how to work with R’s missing values in C++. “Standard Template Library” shows you how to use some of the most important data structures and algorithms from the standard template library, or STL, built-in to C++. “Case studies” shows two real case studies where Rcpp was used to get considerable performance improvements. “Using Rcpp in a package” teaches you how to add C++ code to a package. “Learning more” concludes the chapter with pointers to more resources to help you learn Rcpp and C++.