update
jirlong committed Mar 24, 2024
1 parent caf232f commit 4fc1d58
Showing 47 changed files with 73 additions and 67 deletions.
2 changes: 1 addition & 1 deletion 404.html
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
11 changes: 3 additions & 8 deletions R23_join_twdemo_ref.md
@@ -50,12 +50,9 @@ raw %>% head

### Cleaning the data {#moi_clean}

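When we previously discussed the three forms of data (observational, statistical, and two-dimensional table), we touched on the difference between the statistical form and the two-dimensional table form. In the "statistical form" described there, each column is exactly one variable as we understand it (for example, each column is exactly one demographic variable: age, gender, education level, or count), which helps statistical analysis; this is tidy data. By contrast, the table above spreads the data into a two-dimensional layout: each column is one marital status of one gender in one age group, so it combines three demographic variables. That is convenient for general readers, but it is not the tidy form suited to statistics.
The tidyverse packages call the former tidy form. Data that follows tidy
form has exactly one variable per column. For example, in the Ministry of the Interior open dataset "Resident population aged 15 and over by gender, age, marital status, and education level" (15歲以上現住人口按性別、年齡、婚姻狀況及教育程度分), each variable (age, marital status, education level, population count, and so on) occupies its own column.
When we previously discussed the three forms of data (observational, statistical, and two-dimensional table), we touched on the difference between the statistical form and the two-dimensional table form. In the "statistical form" described there, each column is exactly one variable as we understand it (for example, each column is exactly one demographic variable: age, gender, education level, or count), which helps statistical analysis; this is tidy data. By contrast, the table above spreads the data into a two-dimensional layout: each column is one marital status of one gender in one age group, so it combines three demographic variables. That is convenient for general readers, but it is not the tidy form suited to statistics. The tidyverse packages call the former tidy form. Data that follows tidy form has exactly one variable per column. For example, in the Ministry of the Interior open dataset "Resident population aged 15 and over by gender, age, marital status, and education level" (15歲以上現住人口按性別、年齡、婚姻狀況及教育程度分), each variable (age, marital status, education level, population count, and so on) occupies its own column.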

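- [Resident population aged 15 and over by gender, age, marital status, and education level (15歲以上現住人口按性別、年齡、婚姻狀況及教育程度分) \|
Government Open Data Platform (data.gov.tw)](https://data.gov.tw/dataset/32944)
- [Resident population aged 15 and over by gender, age, marital status, and education level (15歲以上現住人口按性別、年齡、婚姻狀況及教育程度分) \| Government Open Data Platform (data.gov.tw)](https://data.gov.tw/dataset/32944)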

Next, I convert the table-form data into tidy data. The original data looks like this.

@@ -85,9 +82,7 @@ tidy_data <- raw %>%
After that, I use the `tidyr::separate()` function to split `key` into four variables: `married`, `ageLower`, `ageUpper`, and `gender`.

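- `separate()` has a parameter `remove = TRUE` (the default), which means that once `key` has been split into four variables, the `key` column itself is dropped. If you will still need `key` later, set `remove` to `FALSE` so that `key` is kept after the split.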
- `tidyr::separate()`:Given either regular expression or a vector of
character positions, separate() turns a single character column into
multiple columns.
- `tidyr::separate()`:Given either regular expression or a vector of character positions, separate() turns a single character column into multiple columns.
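
The effect of `remove` can be checked on a toy table. This is only a sketch under stated assumptions: the column names (`single_15_19_m` and so on), the `_` separator, and the use of `pivot_longer()` to build `key` are made up for illustration and do not reproduce the real Ministry of the Interior columns or the exact pipeline used in this chapter.

```r
library(tidyr)

# Toy wide table; column names are hypothetical, not the real dataset's
wide <- data.frame(
  site = c("A", "B"),
  single_15_19_m  = c(10, 20),
  married_20_24_f = c(30, 40)
)

# Stack the wide columns into key/value pairs (one row per original cell)
long <- pivot_longer(wide, cols = -site, names_to = "key", values_to = "value")

# Split key into four variables; remove = FALSE keeps the key column
tidy_keep <- separate(long, key,
                      into = c("married", "ageLower", "ageUpper", "gender"),
                      sep = "_", remove = FALSE)

# With the default remove = TRUE, key is dropped after the split
tidy_drop <- separate(long, key,
                      into = c("married", "ageLower", "ageUpper", "gender"),
                      sep = "_")

print(names(tidy_keep))
print(names(tidy_drop))
```

Note that the split columns come back as character vectors, so `ageLower` and `ageUpper` would still need `as.integer()` (or `convert = TRUE`) before any numeric comparison.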

At this point, the cleaned data looks roughly like this:

12 changes: 10 additions & 2 deletions R25_tidy_temoral_features.md
@@ -62,11 +62,19 @@ class(t)
```r
?strptime
t1 <- strptime(t, "%Y-%m-%dT%H:%M:%SZ")
raw %>% glimpse()
clean %>% head # %>% View
```

```{.output}
## function (length = 0L)
## # A tibble: 6 × 7
## plink board pcontent poster ptitle ptime ipaddr
## <chr> <chr> <chr> <chr> <chr> <dttm> <chr>
## 1 https://www.ptt.cc/bb… Hate… "\n\n韓… loveb… Re: [… 2019-04-12 02:21:14 83.22…
## 2 https://www.ptt.cc/bb… Hate… "\n\n\n… ikr36… Re: [… 2019-04-12 02:13:45 114.4…
## 3 https://www.ptt.cc/bb… Hate… "\n\n正… sunye… Re: [… 2019-04-12 02:10:18 118.1…
## 4 https://www.ptt.cc/bb… Hate… "\n:\n\… rock7… Re: [… 2019-04-12 02:03:14 118.1…
## 5 https://www.ptt.cc/bb… Hate… "\n\n我… btm97… Re: [… 2019-04-12 02:01:12 101.1…
## 6 https://www.ptt.cc/bb… Hate… "\n\n\n… cblade [討論… 2019-04-12 01:55:06 180.2…
```
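
The format string passed to `strptime()` above can be tried on its own. A minimal sketch, assuming the raw timestamps look like the two made-up strings below and that the trailing `Z` means UTC as in ISO 8601; neither assumption is confirmed by the dataset itself.

```r
# Made-up timestamps in the shape the format string expects
t <- c("2019-04-12T02:21:14Z", "2019-04-12T01:55:06Z")

# strptime() matches "T" and "Z" as literal characters, so the
# time zone has to be passed explicitly (UTC is an assumption here)
t1 <- strptime(t, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
print(class(t1))

# POSIXct stores one number per timestamp, which suits a data frame column
t2 <- as.POSIXct(t1)
print(format(t2, "%Y-%m-%d %H:%M:%S"))

# Date-time arithmetic works once the strings are parsed
gap <- difftime(t2[1], t2[2], units = "secs")
print(gap)

# A string that does not match the format yields NA rather than an error
bad <- strptime("2019/04/12 02:21", "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
print(is.na(bad))
```

Because a mismatched string silently becomes `NA`, counting `NA`s after parsing is a cheap way to confirm that the format string fits every row.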

### Density plot along time
4 changes: 2 additions & 2 deletions R42_read_json.md
@@ -309,7 +309,7 @@ raw$retVal[["0001"]]

## Case 1: Air-Quality (well-formatted)

Go to [https://data.gov.tw/dataset/40448](https://data.gov.tw/dataset/40448), right-click the JSON file, and copy the link, for example "[https://data.epa.gov.tw/api/v2/aqx_p\_432?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=ImportDate%20desc&format=JSON](https://data.epa.gov.tw/api/v2/aqx_p_432?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=ImportDate%20desc&format=JSON)". (However, the link address, especially the `api_key=9be7b239-557b-4c10-9775-78cadfc555e9` part, changes every time, so you will have to try this yourself.)
Go to [https://data.gov.tw/dataset/40448](https://data.gov.tw/dataset/40448), right-click the JSON file, and copy the link, for example "[https://data.epa.gov.tw/api/v2/aqx_p_432?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=ImportDate%20desc&format=JSON](https://data.epa.gov.tw/api/v2/aqx_p_432?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=ImportDate%20desc&format=JSON)". (However, the link address, especially the `api_key=9be7b239-557b-4c10-9775-78cadfc555e9` part, changes every time, so you will have to try this yourself.)
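
Because the `api_key` part of the link changes, it may help to assemble the URL from its parts so that only the key needs replacing. A minimal sketch, assuming the query parameters seen in the example link above; `build_aqx_url()` is a hypothetical helper and `YOUR-API-KEY` is a placeholder, not a working key.

```r
# Hypothetical helper: build the query URL from its parts so the
# api_key can be swapped without editing the whole string
build_aqx_url <- function(api_key, limit = 1000) {
  base <- "https://data.epa.gov.tw/api/v2/aqx_p_432"
  query <- c(
    api_key = api_key,
    limit   = as.character(limit),
    sort    = utils::URLencode("ImportDate desc"),  # the space becomes %20
    format  = "JSON"
  )
  paste0(base, "?", paste(names(query), query, sep = "=", collapse = "&"))
}

url <- build_aqx_url("YOUR-API-KEY")  # placeholder key
print(url)

# Fetching needs network access and a valid key, so it stays commented out:
# raw <- jsonlite::fromJSON(url)
# str(raw, max.level = 1)
```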


```r
@@ -384,7 +384,7 @@ glimpse(df)
### Combining all

- UVI Open data: <https://data.gov.tw/dataset/6076>
- [https://data.epa.gov.tw/api/v2/uv_s\_01?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=publishtime desc&format=JSON](https://data.epa.gov.tw/api/v2/uv_s_01?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=publishtime%20desc&format=JSON)
- [https://data.epa.gov.tw/api/v2/uv_s_01?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=publishtime desc&format=JSON](https://data.epa.gov.tw/api/v2/uv_s_01?api_key=e8dd42e6-9b8b-43f8-991e-b3dee723a52d&limit=1000&sort=publishtime%20desc&format=JSON)

#### Get from web api

Binary file modified V01_Learning_ggplot_files/figure-html/unnamed-chunk-25-1.png
Binary file modified V01_Learning_ggplot_files/figure-html/unnamed-chunk-26-1.png
Binary file modified Z2_Exploring_data_Visually_files/figure-html/eda-boxplot-1.png
2 changes: 1 addition & 1 deletion amount.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion appendix.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion association.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion base2dplyr.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion basic.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion categorical.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion coordinate.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion crawler-overview.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion crosstab.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion dataframe.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion distribution-histogram-density.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion geospatial.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion ggplot.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
2 changes: 1 addition & 1 deletion html-parser.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
8 changes: 4 additions & 4 deletions index.html
@@ -23,7 +23,7 @@
<meta name="author" content="HSIEH, JI-LUNG" />


<meta name="date" content="2024-03-17" />
<meta name="date" content="2024-03-24" />

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
@@ -784,15 +784,15 @@ <h1>
<div id="header">
<h1 class="title">R for Data Journalism</h1>
<p class="author"><em>HSIEH, JI-LUNG</em></p>
<p class="date"><em>2024-03-17</em></p>
<p class="date"><em>2024-03-24</em></p>
</div>
<div id="about" class="section level1 unnumbered hasAnchor">
<h1>About<a href="index.html#about" class="anchor-section" aria-label="Anchor link to header"></a></h1>
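<p>This book is written for the course "<strong>News Data Analysis and Visualization</strong>" at the Graduate Institute of Journalism, National Taiwan University. The course gives equal weight to three areas of training: programming, visualization, and data journalism. Students must first become fluent in using R to manipulate, read, clean, and visualize data. Then, with news production as the course goal, they learn how data should be cleaned and how to choose visualization methods that strengthen news storytelling while avoiding visualizations that lead readers to misread the news. Accordingly, the book is divided into several parts: PART I introduces programming fundamentals; PART II uses international and domestic news stories as cases to introduce data acquisition (scraping), cleaning, merging, filtering, and transformation; PART III focuses on how to use data visualization to strengthen storytelling.</p>
<p>The data analysis and visualization cases used in this book are all drawn from domestic and international news, on topics such as paid maternity leave levels across countries, housing justice, air pollution, population issues, COVID-19, uneven regional distribution of resources, elections and referendums, and transportation. The book also draws heavily on the "<a href="https://www.nytimes.com/column/whats-going-on-in-this-graph">What’s going on in this graph?</a>" news series, which The New York Times curates to promote data literacy and visualization, including a visualization case on the net worth of different age groups in the United States across eras. In designing the visualization material, the book relies on the categories of the Times' "<a href="https://www.nytimes.com/column/whats-going-on-in-this-graph">What’s going on in this graph?</a>" and on the organization of "<a href="https://clauswilke.com/dataviz/">Fundamentals of Data Visualization</a>" by <span class="citation">(<a href="#ref-wilke2019fundamentals">Wilke 2019</a>)</span>. It emphasizes using data visualization to present quantities, distributions, proportions, and trends in news data, and replaces the original examples with related data journalism cases from Taiwan or the Times to help Chinese-language readers follow along.</p>
<p><strong>Learning path</strong></p>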
<div class="grViz html-widget html-fill-item" id="htmlwidget-2db1c4bba43f2459a86d" style="width:672px;height:480px;"></div>
<script type="application/json" data-for="htmlwidget-2db1c4bba43f2459a86d">{"x":{"diagram":"\ndigraph G {\n fontname=\"Helvetica,Arial,sans-serif\"\n graph [layout = dot, rankdir=TD]\n node [shape = rect, height=0, fontname=\"Helvetica\", width=2]\n node [style = filled, fillcolor=\"honeydew1\"]\n edge [fontname=\"Courier\", splines=false, weight=2]\n\n # define nodes\n rbasic [label = \"R Basic\"]\n matleave[label = \"Filter & Select data\n(Maternal leave)\"]\n tptheft [label = \"Pivot Analysis\n(Taipei Theft)\"]\n \n subgraph cluster_1{\n label=\"base to dplyr\"\n style=filled; fillcolor=\"lightgrey\";\n dplyr_ml [label = \"Filter & Select data with dplyr\n(Maternal leave)\"]\n dplyr_tp [label = \"Pivot Analysis with dplyr\n(Taipei Theft)\"]\n }\n \n # define edge\n node [fillcolor=\"azure1\"]\n rbasic -> matleave -> tptheft -> {dplyr_tp, dplyr_ml} -> dplyr \n node [fillcolor=\"gold\"]\n dplyr -> {join, ggplot, textmining, scraper}\n node [fillcolor=\"lightgrey\"]\n join -> DB\n node [fillcolor=\"yellow1\"]\n ggplot -> temporal -> geospatial\n node [fillcolor=\"skyblue\"]\n scraper -> json -> htmlparser\n node [fillcolor=\"pink1\"]\n textmining -> doclevel -> wordlevel -> keyness -> POS -> sentiment\n \n # define path over edge\n edge [constraint=false, penwidth=3, color=\"#ff000033\", weight=1, splines=curved]\n rbasic -> matleave -> tptheft -> dplyr_ml\n dplyr_ml -> dplyr_tp -> dplyr -> join -> ggplot\n ggplot -> temporal -> geospatial -> textmining\n textmining -> doclevel -> wordlevel -> keyness -> scraper\n}\n","config":{"engine":"dot","options":null}},"evals":[],"jsHooks":[]}</script>
<div class="grViz html-widget html-fill-item" id="htmlwidget-aa696927c88eeb1fe24b" style="width:672px;height:480px;"></div>
<script type="application/json" data-for="htmlwidget-aa696927c88eeb1fe24b">{"x":{"diagram":"\ndigraph G {\n fontname=\"Helvetica,Arial,sans-serif\"\n graph [layout = dot, rankdir=TD]\n node [shape = rect, height=0, fontname=\"Helvetica\", width=2]\n node [style = filled, fillcolor=\"honeydew1\"]\n edge [fontname=\"Courier\", splines=false, weight=2]\n\n # define nodes\n rbasic [label = \"R Basic\"]\n matleave[label = \"Filter & Select data\n(Maternal leave)\"]\n tptheft [label = \"Pivot Analysis\n(Taipei Theft)\"]\n \n subgraph cluster_1{\n label=\"base to dplyr\"\n style=filled; fillcolor=\"lightgrey\";\n dplyr_ml [label = \"Filter & Select data with dplyr\n(Maternal leave)\"]\n dplyr_tp [label = \"Pivot Analysis with dplyr\n(Taipei Theft)\"]\n }\n \n # define edge\n node [fillcolor=\"azure1\"]\n rbasic -> matleave -> tptheft -> {dplyr_tp, dplyr_ml} -> dplyr \n node [fillcolor=\"gold\"]\n dplyr -> {join, ggplot, textmining, scraper}\n node [fillcolor=\"lightgrey\"]\n join -> DB\n node [fillcolor=\"yellow1\"]\n ggplot -> temporal -> geospatial\n node [fillcolor=\"skyblue\"]\n scraper -> json -> htmlparser\n node [fillcolor=\"pink1\"]\n textmining -> doclevel -> wordlevel -> keyness -> POS -> sentiment\n \n # define path over edge\n edge [constraint=false, penwidth=3, color=\"#ff000033\", weight=1, splines=curved]\n rbasic -> matleave -> tptheft -> dplyr_ml\n dplyr_ml -> dplyr_tp -> dplyr -> join -> ggplot\n ggplot -> temporal -> geospatial -> textmining\n textmining -> doclevel -> wordlevel -> keyness -> scraper\n}\n","config":{"engine":"dot","options":null}},"evals":[],"jsHooks":[]}</script>

</div>
<h3>References</h3>