Skip to content

Commit

Permalink
Create Lab3Assignment.qmd
Browse files Browse the repository at this point in the history
  • Loading branch information
danagonz committed Sep 13, 2024
1 parent 1ee5e0f commit e0afb52
Showing 1 changed file with 146 additions and 0 deletions.
146 changes: 146 additions & 0 deletions Lab3Assignment.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
---
title: "Lab 3 Assignment"
author: Dana Gonzalez
format: html
editor: visual
---
## 1. Read in the data
```{r}
download.file(
"https://raw.githubusercontent.com/USCbiostats/data-science-data/master/02_met/met_all.gz",
destfile = file.path("~", "Downloads", "met_all.gz"),
method = "libcurl",
timeout = 60
)
met <- data.table::fread(file.path("~", "Downloads", "met_all.gz"))
met <- as.data.frame(met)
```

## 2. Check the dimensions, headers, footers

There are 30 columns and 6 rows in this dataset (the dataset was cleaned up during lecture).

```{r}
dim(met)
```
```{r}
head (met)
```
```{r}
tail(met)
```

## 3. Take a look at the variables

Based on our objective stated at the beginning of lab (find the weather station with the highest elevation and look at patterns in the time series of its wind speed and temperature), our variables of interest are most likely: 'USAFID', 'elev', 'wind.sp', and 'temp'.

```{r}
str(met)
```

## 4. Take a closer look at the key variables

There were 91853 NA values for the 'wind.sp' variable.

The highest elevation of any weather station in this dataset is 4,113 meters above sea level.

```{r}
table(met$year)
```
```{r}
table(met$day)
```
```{r}
table(met$hour)
```
```{r}
summary(met$temp)
```
```{r}
summary(met$elev)
```
```{r}
met$wind.sp[met$winds.sp == 9999.0] <- NA
summary(met$wind.sp)
```

```{r}
met[met$elev==9999.0, ] <- NA
summary(met$elev)
```
```{r}
met <- met[met$temp > -40, ]
head(met[order(met$temp), ])
```

```{r}
met$elev[met$elev == 9999.0] <- NA
summary(met$elev)
```


## 5. Check the data against an external source

The range of elevations are valid (according to my Google search). The lowest elevation in the continental US in Death Valley, CA (-86m), whereas the highest elevation in the continental US is found in Mount Whitney, CA (4,418m). This clearly encompasses this dataset's range of -13m to 4,113m.

The coordinates for the -17.2 degrees celsius data leads to a location outside of Colorado Springs, Colorado. Knowing that this data was pulled in August, this temperature does not make sense in this context.

## 6. Calculate summary statistics
```{r}
elev <- met[which(met$elev == max(met$elev, na.rm = TRUE)), ]
summary(elev)
```
```{r}
cor(elev$temp, elev$wind.sp, use="complete")
```
```{r}
cor(elev$temp, elev$hour, use="complete")
```
```{r}
cor(elev$wind.sp, elev$day, use="complete")
```
```{r}
cor(elev$wind.sp, elev$hour, use="complete")
```
```{r}
cor(elev$temp, elev$day, use="complete")
```

## 7. Exploratory graphs
```{r}
hist(met$elev,)
```
```{r}
hist(met$temp,)
```
```{r}
hist(met$wind.sp,)
```

```{r}
library(leaflet)
leaflet(elev) |>
addProviderTiles('OpenStreetMap') %>%
addCircles(lat=~lat,lng=~lon, opacity=1, fillOpacity=1, radius=100)
```

```{r}
library(lubridate)
elev$date <- with(elev, ymd_h(paste(year, month, day, hour, sep= ' ')))
summary(elev$date)
```
```{r}
elev <- elev[order(elev$date), ]
head(elev)
```
```{r}
plot(elev$date, elev$temp, type = "l", cex = 0.5)
```
```{r}
plot(elev$date, elev$wind.sp, type = "l", cex = 0.5)
```
Our temperature and date line graph seems to have two peaks, one around August 6th and another around August 20th. Two peaks were also seen in our wind speed line plot, however both peaks were seen later in the month, the first being around August 18th and the second around August 26th. Too, wind speeds seemed to be lower in the first third of the August.

## Ask questions
Given that we spent some time in class cleaning up this data, I'm wondering what led to the missing data in this dataset. Was it an issue with collection at the sites, gathering and distributing the data, or another issue?

0 comments on commit e0afb52

Please sign in to comment.