{"cells":[{"metadata":{},"cell_type":"markdown","source":"# ASHRAE WeatheR Analysis with OpenAir\n\nThe OpenAir package by [David Carslaw](https://davidcarslaw.com/) is designed for air quality analysis. Check it out.\n\nBy Nick Brooks, December 2019\n\n- [OpenAir Website](http://www.openair-project.org/)\n- [OpenAir Package Full Manual](http://www.openair-project.org/PDF/OpenAir_Manual.pdf)\n- [OpenAir Package Demo by me](https://www.kaggle.com/nicapotato/great-pollution-viz-with-r-openair-demo)"},{"metadata":{"trusted":true,"_kg_hide-output":true},"cell_type":"code","source":"library(ggplot2) # Data visualization\nlibrary(readr) # CSV file I/O, e.g. the read_csv function\nlibrary(openair)\nlibrary(lubridate)\nlibrary(tidyverse)\n\ndf <- read.csv(\"../input/ashrae-energy-prediction/weather_train.csv\") %>%\n mutate(\n date = as.POSIXct(strptime(timestamp, \"%Y-%m-%d %H:%M:%S\")),\n site_id = as.factor(site_id),\n year = year(date),\n wday = wday(date),\n hour = hour(date)) %>%\n rename(ws = wind_speed) %>%\n rename(wd = wind_direction) %>%\n select(-c(timestamp)) %>%\n as_tibble()\n\n# mutate_at pattern from: https://dplyr.tidyverse.org/reference/mutate_all.html\n# Min-max scaling: maps each column onto [0, 1], ignoring NAs\nmin_max_scaler <- function(x, na.rm = TRUE) (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))\nnorm_df <- df %>%\n mutate_at(c(\"air_temperature\",\"dew_temperature\",\"precip_depth_1_hr\",\"cloud_coverage\",\"sea_level_pressure\"), min_max_scaler)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"head(norm_df)","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"*******\n# OpenAir Visualisations\n\nThe summary plot is a good way to check the health of the time series data and to get a feel for distributions and descriptive statistics."},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=19, repr.plot.height=20)\nsummaryPlot(df %>% select(-c(year, wday, hour)), par.settings=list(fontsize=list(text=25)),\n main = \"Summary Plot of ASHRAE 
Weather Data Set\")","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"Next, the calendar plot is a universally understood format through which to present data. Air temperature is seasonal, as expected."},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=15, repr.plot.height=10)\ncalendarPlot(df, pollutant = \"air_temperature\",\n par.settings=list(fontsize=list(text=25)),\n main = \"Calendar Plot for Mean Air Temperature\",\n statistic = 'mean')","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"Now let's look at the time series element of this. First I will plot the min-max scaled values across all variables to check for obvious patterns."},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=20, repr.plot.height=7)\ntimePlot(norm_df, pollutant = c(\"air_temperature\",\"dew_temperature\",\"precip_depth_1_hr\",\"cloud_coverage\",\"sea_level_pressure\"),\n avg.time = \"week\",\n lwd = 2, lty = 1, group = TRUE,\n main=\"Normalized Line Plot for All Variables\",\n par.settings=list(fontsize=list(text=14)))","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"Next, plot the unscaled variables that share a similar magnitude."},{"metadata":{"trusted":true},"cell_type":"code","source":"timePlot(df, pollutant = c(\"air_temperature\",\"dew_temperature\",\"precip_depth_1_hr\"),\n avg.time = \"day\", lwd = 2, lty = 1,\n group = TRUE,\n smooth = TRUE,\n ci = TRUE,\n main=\"Line Plot for All Variables\",\n par.settings=list(fontsize=list(text=14)))","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"timeVariation(df, pollutant = \"air_temperature\",\n statistic = \"median\", col = \"firebrick\",\n main=\"Time Variation Plot\",\n par.settings=list(fontsize=list(text=14)))","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"There are clearly some correlations. 
Let's do a simple correlation check. This package also has a nifty dendrogram feature."},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=13, repr.plot.height=13)\ncorPlot(df, dendrogram = TRUE, col=\"default\", par.settings=list(fontsize=list(text=25)),\n main = \"Weather Correlation and Dendrogram\")","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=8, repr.plot.height=8)\nscatterPlot(df, x = \"air_temperature\", y = \"dew_temperature\", method = \"hexbin\", col= \"jet\")","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=16, repr.plot.height=8)\nscatterPlot(df, x = \"air_temperature\",\n y = \"dew_temperature\",\n z = 'cloud_coverage',\n col= \"jet\",\n type = c(\"season\", \"weekend\"),\n par.settings=list(fontsize=list(text=25)))","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=16, repr.plot.height=4)\nlinearRelation(df, x = \"air_temperature\", y = \"dew_temperature\", period = \"day.hour\",\n par.settings=list(fontsize=list(text=25)))","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=16, repr.plot.height=9)\ntrendLevel(df, y = \"wd\", pollutant = \"air_temperature\", type = \"site_id\",\n par.settings=list(fontsize=list(text=25)))","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=10, repr.plot.height=9)\nsmoothTrend(df, pollutant = c(\"air_temperature\", \"dew_temperature\"), type = c(\"wd\"), lty = 0)","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"*******\n\n## WindPlots - Looking at Wind Speed and Direction\n\nHow do wind speed and direction impact temperature? 
Since location is not considered in this analysis, the least I can do is separate by site."},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=20, repr.plot.height=10)\npercentileRose(df, pollutant = \"air_temperature\",\n percentile = c(25, 50, 75, 90, 95, 99, 99.9),\n col = \"brewer1\",\n key.position = \"right\", smooth = TRUE,\n type = c(\"site_id\"),\n par.settings=list(fontsize=list(text=18)),\n main = \"Air Temperature Pizza Plot by Site ID\")","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"polarPlot(df,\n pollutant = \"air_temperature\",\n col=\"jet\", type = c(\"site_id\"),\n par.settings=list(fontsize=list(text=18)),\n main = \"Air Temperature Polar Plot by Site ID\",\n force.positive = FALSE)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"polarPlot(df,\n pollutant = \"dew_temperature\",\n col=\"default\", type = c(\"site_id\"),\n par.settings=list(fontsize=list(text=18)),\n main = \"Dew Temperature Polar Plot by Site ID\",\n force.positive = FALSE)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"polarPlot(df,\n pollutant = \"precip_depth_1_hr\",\n col=c('blue', 'gold'), type = c(\"site_id\"),\n par.settings=list(fontsize=list(text=18)),\n main = \"Precipitation Depth 1h Polar Plot by Site ID\",\n force.positive = FALSE)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"polarPlot(df,\n pollutant = \"sea_level_pressure\",\n col=c(\"blue\",'green','yellow'), type = c(\"site_id\"),\n par.settings=list(fontsize=list(text=18)),\n main = \"Sea Level Pressure Polar Plot by Site ID\")","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"The next plot can help shed light on temporal impacts on wind direction and variables of 
interest."},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=20, repr.plot.height=10)\npolarAnnulus(df, pollutant = \"air_temperature\",\n period = \"hour\",\n main = \"Air Temperature Annulus by Hour\",\n par.settings=list(fontsize=list(text=18)),\n type = c(\"site_id\"))","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"*******\n## Look at Cloud Coverage\n\nHow does cloud coverage affect temperature across the day and night cycle?"},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=20, repr.plot.height=5)\npolarPlot(df,\n pollutant = \"air_temperature\",\n col= 'jet', type = c(\"cloud_coverage\",\"daylight\"),\n par.settings=list(fontsize=list(text=18)),\n main = \"Impact of Cloud Coverage on Air Temperature\",\n force.positive = FALSE)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"polarPlot(df,\n pollutant = \"dew_temperature\",\n col=\"default\", type = c(\"cloud_coverage\",\"daylight\"),\n par.settings=list(fontsize=list(text=18)),\n main = \"Impact of Cloud Coverage on Dew Temperature\",\n force.positive = FALSE)","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"*******\n## Clustering\n\nLastly, this package can also cluster the variable at hand by wind speed/direction."},{"metadata":{"trusted":true},"cell_type":"code","source":"options(repr.plot.width=8, repr.plot.height=8)\nair_temp_cluster <- polarCluster(df, pollutant=\"air_temperature\", n.clusters = c(3,5), cols= \"jet\", force.positive = FALSE)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"dew_temp_cluster <- polarCluster(df, pollutant=\"dew_temperature\", n.clusters = c(3,5), cols= \"default\", force.positive = 
FALSE)","execution_count":null,"outputs":[]}],"metadata":{"kernelspec":{"display_name":"R","language":"R","name":"ir"},"language_info":{"mimetype":"text/x-r-source","name":"R","pygments_lexer":"r","version":"3.4.2","file_extension":".r","codemirror_mode":"r"}},"nbformat":4,"nbformat_minor":1}