-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various fixes #541
Various fixes #541
Conversation
trevorcampbell
commented
Sep 25, 2023
•
edited
Loading
edited
- Closes Failed code in text in predictor selection #532
- Closes Latex formatted numbers in predictor selection that should be regular font #533
- Closes Repetitive text in webscraping #536
- Closes Inconsistent subheadings in scraping #538
- Closes Slightly improved writing in predictor selection from py #534
- Closes HTML syntax highlighting in craigslist snippet broken by broken html #535
- Closes Scraping section not reproducible #537
- Closes Twitter #487
- Closes Poorly formatted variable names in clustering #545
- Closes Add acknowledge to @gcohenfr 's regr-1 slides from MDS #544
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new changes look great! I've added my comments below
@@ -1045,20 +1033,18 @@ in the additional resources section. SelectorGadget provides in its toolbar | |||
the following list of CSS selectors to use: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should change the following line
That is definitely storing the price of a particular apartment_
to add "That snippet..." or "That line of source code" to make it more clear what we are referring to
@@ -1159,124 +1135,176 @@ is that data owners have much more control over the data they provide to users. | |||
web scraping, there is no consistent way to access an API across websites. Every website typically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this line:
The benefit of this is that data owners have much more control over the data they provide to users.
- I think we should add a short example after this line of what we mean by control
- The beginning of the sentence is a little awkward ("The benefit of this is that..."). Maybe we can change to "The benefit of using an API is..."
source/reading.Rmd
Outdated
Let's construct a small data set of the last 400 tweets and | ||
retweets from the [\@tidyverse](https://twitter.com/tidyverse) account. A few of the most recent tweets | ||
are shown in Figure \@ref(fig:01-tidyverse-twitter). | ||
First, you will need to visit the [NASA APIs page](https://api.nasa.gov/) and generate an API key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a brief explanation of "API key" in brackets after "...and generate an API key". Could say "... an API key (a password)." or "(a way to authenticate the user)" or whatever phrasing you think is best to explain it
source/reading.Rmd
Outdated
```{r 01-tidyverse-twitter, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "The tidyverse account Twitter feed.", fig.retina = 2, out.width="100%"} | ||
knitr::include_graphics("img/reading/tidyverse_twitter.png") | ||
```{r NASA-API-signup, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Generating the API access token for the NASA API", fig.retina = 2, out.width="100%"} | ||
knitr::include_graphics("img/reading/NASA-API-signup.png") | ||
``` | ||
|
||
**Stop! Think about your API usage carefully!** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Stop!" feels abrupt since if we are following along all we have done so far is sign up. Perhaps we can say "Caution" or "Before moving forward" or something? Not a huge deal
source/reading.Rmd
Outdated
API you are trying to access. NASA offers a variety of APIs, each with its own | ||
endpoint; in the case of the NASA "Astronomy Picture of the Day" API, the URL | ||
endpoint is `https://api.nasa.gov/planetary/apod`, as shown at the top of | ||
Figure \@ref(fig:NASA-API-parameters). Second, we write `?`, which denotes that a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot going on in Figure 2.11, and it took me a little bit to understand what the main point of the figure was. I think we should remove the HTTP Request and the Example query in that figure and only show the query parameters so there is less to absorb for the reader. Also with the line ...in the case of the NASA "Astronomy Picture of the Day" API, the URL
endpoint is https://api.nasa.gov/planetary/apod
, as shown at the top of figure 2.11. I'm not sure how helpful it is for the reader to see the URL at the top of the figure over you just telling them what the URL is. And you give an example query below anyway so I think those parts of the figure are unnecessary
from July 13, 2023, the API query would have two parameters: `api_key=YOUR_API_KEY` | ||
and `date=2023-07-13`. | ||
``` | ||
https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&date=2023-07-13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is obvious, but maybe we should add a reminder for the students to replace YOUR_API_KEY with their API key. Something like "Remember to replace YOUR_API_KEY
with the API key you received in from NASA in your email!"
source/reading.Rmd
Outdated
`"date":"2023-07-13"`, which indicates that we indeed successfully received | ||
data corresponding to July 13, 2023. | ||
|
||
So now the job is to do all of this programmatically in R. We will load |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now the job is to do all of this programmatically in R. We will load | |
So now our job is to do all of this programmatically in R. We will load |
source/reading.Rmd
Outdated
|
||
So for example, to obtain the image of the day | ||
from July 13, 2023, the API query would have two parameters: `api_key=YOUR_API_KEY` | ||
and `date=2023-07-13`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and `date=2023-07-13`. | |
and `date=2023-07-13`. Putting it all together our query will look like the following: |
source/reading.Rmd
Outdated
tidyverse_tweets <- get_timelines('tidyverse', n = 400) | ||
tidyverse_tweets | ||
req <- request("https://api.nasa.gov/planetary/apod?api_key=YOUR_API_KEY&start_date=2023-05-01&end_date=2023-07-13") | ||
response <- req_perform(req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should give these variables different names from the ones above to avoid confusion. At least nasa_data
We can obtain more records at once by using the `start_date` and `end_date` parameters, as | ||
shown in the table of parameters in \@ref(fig:NASA-API-parameters). | ||
Let's obtain all the records between May 1, 2023, and July 13, 2023, and store the result | ||
in an object called `nasa_data`; now the response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is a little confusing:
now the response will take the form of an R list (you’ll learn more about these in Chapter 3), with one item similar to the above for each of the 74 days between the start and end dates:
- The "now" makes it sound like this new
nasa_data
is a list and the othernasa_data
is not, whereas they are both lists. - Also I know what you mean with this part of the sentence with one item similar to the above for each of the 74 days between the start and end dates, but its a little confusing. Is there a way we can rephrase?