---
title: "Introduction to Chain and DataFrameMacros packages"
author: "Julius Krumbiegel"
---
This script uses a subset of the data reported in Fühner, Golle, Granacher, & Kliegl (2021), Physical fitness in third grade of primary school: A mixed model analysis of 108,295 children and 515 schools.
All children were between 6.0 and 6.99 years old at the legal key date (30 September) of school enrollment, that is, they were in their ninth year of life in the third grade. To speed things up, we work with a reduced data set and less complex models than those in the reference publication. The script also illustrates how to draw a stratified subsample from a large data set.
```{julia}
using DataFrames
using Chain
using DataFrameMacros
using CSV
using Arrow
using Downloads
using StatsBase
using Dates
using MacroTools: prettify
datadir = joinpath(@__DIR__, "data")
```
## Readme for './data/fggk21.rds'
Number of scores: 525126
1. Cohort: 9 levels; 2011-2019
2. School: 515 levels
3. Child: 108295 levels; all children are between 8.0 and 8.99 years old
4. Sex: "Girls" (n=55,086), "Boys" (n= 53,209)
5. age: testdate - middle of month of birthdate
6. Test: 5 levels
   + Endurance (`Run`): 6-minute endurance run [m]; to nearest 9 m in 9x18 m field
   + Coordination (`Star_r`): star coordination run [m/s]; 9x9 m field, 4 x diagonal = 50.912 m
   + Speed (`S20_r`): 20-meter sprint [m/s]
   + Muscle power low (`SLJ`): standing long jump [cm]
   + Muscle power up (`BPT`): 1-kg medicine ball push test [m]
7. score - see units
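Two of the figures in this readme can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the list above; the variable names are made up for illustration.

```{julia}
# Star run: four diagonals of a 9 m x 9 m square field
side = 9
diagonal = hypot(side, side)                  # 9 * sqrt(2)
star_distance = round(4 * diagonal; digits = 3)

# Girls and boys should add up to the number of children
girls, boys = 55_086, 53_209
total_children = girls + boys

println("star run distance [m]: ", star_distance)
println("children: ", total_children)
```

The rounded distance agrees with the 50.912 m given for `Star_r`, and the two sex counts sum to the 108295 levels of Child.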
## Read the data
```{julia}
df = DataFrame(Arrow.Table(joinpath(datadir, "fggk21.arrow")))
describe(df)
```
## Extract a stratified subsample
We extract a random sample of 5 children from each of the Sex (2) x Test (5) cells of the design. Cohort and School are left to vary at random.
```{julia}
dat = @chain df begin
    @transform(:Sex2 = :Sex == "Girls" ? "female" : "male")
    @groupby(:Test, :Sex)
    combine(x -> x[sample(1:nrow(x), 5), :])
end
```
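The same stratified draw can be tried on a small table that needs no data file. This is a minimal sketch on hypothetical toy data (`toy`, `strat`, and `counts` are illustrative names, not part of the tutorial data). It passes `replace = false` because, as far as we know, `sample` from StatsBase draws with replacement by default, so without that keyword the same row could be picked twice within a cell.

```{julia}
using DataFrames
using StatsBase: sample
using Random

Random.seed!(42)  # fixed seed so the draw is reproducible

# Hypothetical toy data: 2 sexes x 2 tests, 10 children per cell
toy = DataFrame(
    Child = 1:40,
    Sex = repeat(["female", "male"]; inner = 20),
    Test = repeat(["Run", "SLJ"]; inner = 10, outer = 2),
)

# Draw 5 distinct rows from every Test x Sex cell
strat = combine(groupby(toy, [:Test, :Sex])) do cell
    cell[sample(1:nrow(cell), 5; replace = false), :]
end

# Check the cell sizes of the subsample
counts = combine(groupby(strat, [:Test, :Sex]), nrow => :n)
```

Each of the four cells in `counts` should contain exactly 5 rows, and no `Child` should appear twice in `strat`.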
## Three macros: @transform, @groupby, and @chain -- one at a time
### transform and @transform -- also note the ternary operator for ifelse
### long
```{julia}
transform(
    dat,
    :Sex => ByRow(bla -> bla == "female" ? "girl" : "boy") => :Sex2,
)
```
```{julia}
transform(dat, :age => (col -> col .+ 1) => :ageplus)
```
### short - as used above
```{julia}
@transform(dat, :Sex2 = :Sex == "female" ? "girl" : "boy")
```
```{julia}
@transform(dat, :ageplus = :age + 1)
```
```{julia}
@transform(dat, @c :age .- mean(:age)) # @c = columnwise
```
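To see that the long and short forms above produce the same result, they can be compared on a small table. This is a sketch on hypothetical toy data; the column name `:age_c` is an illustrative choice. The whole-column case is written in plain DataFrames form here, where the function receives the entire column rather than one element at a time.

```{julia}
using DataFrames
using DataFrameMacros
using Statistics: mean

toy = DataFrame(Sex = ["female", "male", "female"], age = [8.1, 8.5, 8.9])

# Long form: source column => row-wise function => target column
long = transform(toy, :Sex => ByRow(s -> s == "female" ? "girl" : "boy") => :Sex2)

# Short form: the macro applies the expression row by row
short = @transform(toy, :Sex2 = :Sex == "female" ? "girl" : "boy")

# Whole-column operation: without ByRow, the function sees the full vector
centered = transform(toy, :age => (col -> col .- mean(col)) => :age_c)

long == short
```

The ternary `cond ? a : b` plays the role of `ifelse` for a single row, which is why it needs either `ByRow` or the row-wise default of `@transform`.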
## groupby and @groupby
```{julia}
@groupby(dat, :Age = round(Int, :age))
```
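The `@groupby` call above both creates the `:Age` column and groups by it. A rough equivalent in plain DataFrames is a `transform` followed by `groupby`, sketched here on hypothetical toy data (`toy`, `g_short`, and `g_long` are illustrative names).

```{julia}
using DataFrames
using DataFrameMacros

toy = DataFrame(age = [8.1, 8.4, 8.6, 8.9])

# Macro form: create :Age and group by it in one step
g_short = @groupby(toy, :Age = round(Int, :age))

# Long form: add the column first, then group by it
g_long = groupby(transform(toy, :age => ByRow(a -> round(Int, a)) => :Age), :Age)

# Count the rows per group in both versions
n_short = combine(g_short, nrow => :n)
n_long = combine(g_long, nrow => :n)
```

Both versions should yield two groups, `Age == 8` and `Age == 9`, with two rows each.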
## @chain
### reference to a file
```{julia}
url = "https://github.com/RePsychLing/SMLP2021/raw/main/notebooks/data/fggk21.arrow"
```
### version 1 - traditional Julia style
```{julia}
df1 = DataFrame(Arrow.Table(Downloads.download(url)));
describe(df1)
```
### version 2 - with Chain
```{julia}
df2 = @chain url begin
    Downloads.download
    Arrow.Table
    DataFrame
    @groupby(:Test, :Sex)
    combine(x -> x[sample(1:nrow(x), 2), :])
    @aside CSV.write("test.csv", _)
end;
describe(df2)
```
### A look behind the scenes
```{julia}
prettify(@macroexpand(@chain url begin
    Downloads.download
    Arrow.Table
    DataFrame
end))
```
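The rewriting that `@chain` performs can also be tried without any download. The sketch below uses a hypothetical vector `nums` and shows the three patterns used in this section: a bare function name, the `_` placeholder, and `@aside`.

```{julia}
using Chain

nums = [1, 2, 3, 4, 5]

chained = @chain nums begin
    filter(isodd, _)                     # `_` marks where the previous result goes
    @aside println("odd values: ", _)    # side effect only; the result is passed on unchanged
    sum                                  # bare function name: called on the previous result
end

# The same computation written as a nested call
nested = sum(filter(isodd, nums))

chained == nested
```

Reading the chain top to bottom follows the order of the computation, whereas the nested call has to be read from the inside out.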