va_cs.qmd

---
title: "GDOC Recidivism Analysis"
---

## Introduction

Our DOC captures 10 Evidence Based Recidivism Reduction (EBRR) programs listed by the Federal Bureau of Prisons and the individuals involved:

* Anger Management
* The Bureau Rehabilitation and Values Enhancement Program
* Basic Cognitive Skills
* The Resolve Program
* Residential Drug Abuse Program
* Dialectical Behavior Therapy
* Sex Offender Treatment Program
* Challenge Program
* Mental Health Step Down Program
* Steps Toward Awareness, Growth, and Emotional Strength Program

Your DOC Research Department tracks clients served and staff involved in running and supporting these programs.

```{r setup, echo=FALSE, warning=FALSE, message=FALSE}
#libraries
{{< include libraries.R >}}

#data setup
{{< include data_setup.R >}}

###################################
####write out dataframes to CSV####
###################################
write.csv(roster,"roster.csv", row.names = FALSE)
write.csv(roster.update,"rosterupdate.csv", row.names = FALSE)
write.csv(staff,"staff.csv", row.names = FALSE)
write.csv(staff.update,"staffupdate.csv", row.names = FALSE)

```

```{r toggle, echo=FALSE}
#toggle data and year format
{{< include re_report.R >}}
{{< include execute.R >}}
{{< include toggle.R >}}

#colors
date1c  <- "deepskyblue1"
date2c  <- "darkolivegreen3"
staffc  <- "brown"
hlinew1 <- "orange"
hlinew2 <- "darkgrey"
```

```{r remove, echo=FALSE}
#programs to remove per the CWC report
{{< include rm_pgms.R >}}
```

## Data Exploration

```{r explore1, echo=FALSE, include=FALSE}
#what's in our data
print(dfSummary(roster, varnumbers = FALSE, valid.col = FALSE), 
      method = "render", footnote = NA)

#capture number of columns for printing in text
numcol <- ncol(roster)
```

Let's take a look at this EBRR program data. The name of our data is ``r dataname1``. Trying to gather anything from raw data row by row can be painful. We need to explore and synthesize what variables/columns we have, and get a quick summary of what they all look like. We know that GDOC has 10 programs. How can we find out more?

::: {.panel-tabset .nav-pills}

## Quick look

```{r look}
datatable(roster, rownames=FALSE, options=list(pageLength=5, dom='ltip'))
```

## Summary

```{r explore2}
#| ref-label: 'explore1'
```

:::

From what the summary above shows us, it appears we have `r numcol` variables in the ``r dataname1`` dataset. What further exploring, cleaning, and manipulation is required for us to successfully produce results for Director Summers?

## Data Cleaning and Manipulation

### Duplicates

What other pieces of information might be relevant to what we need to know about the data? Since it appears to be person-level data from our data exploration summary, let's check to make sure that there aren't any duplicate observations.

```{r explore4}
#are there any duplicates?
roster[duplicated(roster) | duplicated(roster, fromLast=TRUE),]

#how many duplicates?
dupct <- length(unique(
  roster[duplicated(roster) | duplicated(roster, fromLast=TRUE),]
  ))
```

<br />

It's a really good thing we checked! From the table above it appears we have `r dupct` duplicate observations/rows in our data. Let's remove them and keep exploring!

```{r nodup}
#| code-fold: show
#deduplicate across all columns
{{< include dedup.R >}}

#check for dups again
roster.nodup[duplicated(roster.nodup) | duplicated(roster.nodup, fromLast=TRUE),]

```

<br />

Alright! No more duplicates! What else could require cleaning that we haven't thought of?

### Recoding

We need to take a closer look at our other variables that may help us report out what the GDOC director needs. Let's start with our `programs`, `dt`, and `ret`.

::: {.panel-tabset .nav-pills}

## PROGRAMS

```{r explore5}
#count total number of programs
define_keywords(title.freq = "PROGRAMS values")
print(freq(roster.nodup$programs, report.nas = FALSE, cumul = FALSE, display.type = FALSE), 
      method = "render", footnote = NA, Variable = "")

#count number of programs
prgnum <- n_distinct(roster.nodup$programs)
```

## DT

```{r explore6}
#check out weird date values
yeardt <- as.factor(year(roster.nodup$dt))
define_keywords(title.freq = "DT values")
print(freq(yeardt, report.nas = FALSE, cumul = FALSE, display.type = FALSE), 
      method = "render", footnote = NA, Variable = "")
```

## RET

```{r explore7}
#check out weird return values
define_keywords(title.freq = "RET values")
print(freq(roster.nodup$ret, report.nas = FALSE, cumul = FALSE, display.type = FALSE), 
      method = "render", footnote = NA, Variable = "")
```

:::

Hm - it looks like there are more than 10 programs; `r prgnum` programs to be exact. That doesn't match what you know about your GDOC EBRR programs! Could there be something wrong with the data? It looks like there are also some errors in your data across `programs`, `ret`, and `dt`! 

We'll probably have to make some assumptions on our data. For example, `ret` must be our variable that indicates whether an individual enrolled in an EBRR program returned to prison within 365 days of their release date. While the majority of the values are 0s and 1s, a select few are greater than 1 or less than 0. Clean them up and check your work so you can accurately report all EBRR programs and their associated recidivism rates.

::: {.panel-tabset .nav-pills}

## PROGRAMS Clean

```{r cleanroster1}
#clean program names
{{< include roster_clean.R >}}

#review cleaned program names
roster.clean |>
  count(programs_clean)
```

## DT Clean

```{r cleanroster2}
#review date values
roster.clean |>
  count(year(dt))
```

## RET Clean

```{r cleanroster3}
#review out weird return values
roster.clean |>
  count(ret)

```

:::

Much better! 10 programs (`programs_clean`) as expected for our DOC, and cleaned dates (`dt`) and returns (`ret`)! 

### Calculating rates

Now that we have a clean dataset, we can finally calculate recidivism rates for all of our programs. Since we appear to have release dates spanning two years from the `dt` column, from `r min(roster.clean$dt)` to `r max(roster.clean$dt)`, perhaps we should calculate recidivism rates overall and by release year.

::: {.panel-tabset .nav-pills}

## Overall

```{r explore8}
##create dataset of numerators and denominators
#recidivism rates overall
{{< include rates.R >}}

#verify that join did not lose any observations
triplecheck <- anti_join(roster2.1, roster2.2, by = "programs_clean")


#print out overall rates
roster2.1 |>
  arrange(programs_clean) |>
  select(programs_clean,recid_rate_all) |> datatable(rownames=FALSE, colnames=c('Program', 'Overall Recidivism Rate'), 
                          options=list(pageLength=10, dom='t'))
```

## By Year

```{r explore9}
#CROSSTALK by year rates
shared_roster2 <- SharedData$new(roster2.2 |>
  arrange(year,programs_clean) |>
  select(year,programs_clean,recid_rate_year))

filter_checkbox("year", "Select Year", shared_roster2, ~year, inline=FALSE)
datatable(shared_roster2, rownames=FALSE, colnames=c('Year', 'Program', 'Recidivism Rate'), 
                          options=list(pageLength=10, dom='tip'))
```

:::

### Staff data

Great work! Now let's take a look at our program staffing! Our DOC captures 10 Evidence Based Recidivism Reduction (EBRR) programs listed by the Federal Bureau of Prisons and the individuals involved. The name of our data is ``r dataname2``.

```{r explorestaff1}
#what's in our data
print(dfSummary(staff, varnumbers = FALSE, valid.col = FALSE), 
      method = "render", footnote = NA)

#capture number of columns for printing in text
numcolst <- ncol(staff)
```

<br />

It appears we only have `r numcolst` variables in the program staffing data. Let's keep exploring! It appears to be person-level data **again**! Why don't we check for duplicates just in case.

```{r explorestaff4}
#| code-fold: show
#are there any duplicates?
staff[duplicated(staff) | duplicated(staff, fromLast=TRUE),]
```

<br />

```{r staffval, include=FALSE}
#count total number of programs
prgnum.stf <- n_distinct(staff$prg)
```

Phew! No duplicates. That was a close one. 

Looking closer at the summary, yet again we have data with more than 10 programs; `r prgnum.stf` to be exact. And there appear to be some errors in the data (again!?)! Clean them up so you can accurately report all EBRR programs and their associated program staff, and let's see how many staff we have by program! We'll be able to use this in our final report to our Director.

::: {.panel-tabset .nav-pills}

## PRG

```{r explorestaff5}
#count total number of programs
define_keywords(title.freq = "PRG values")
print(freq(staff$prg, report.nas = FALSE, cumul = FALSE, display.type = FALSE), 
      method = "render", footnote = NA, Variable = "")
```

## PRG Clean

```{r cleanstaff}
#clean program names
{{< include staff_clean.R >}}

#check cleaned program names
staffcheck <- staff.clean |>
  count(programs_clean,prg)

#print staffing
staff.clean |>
  count(programs_clean)

```

:::

## Reporting Results

### Data Visualizations

We have to get out those results **now!** Let's combine the program staff and recidivism rates data so we can print out a table for our Director. Create some tables and put them into a format the Director will appreciate.

::: {.panel-tabset .nav-pills}

## Tables: Overall

```{r merge}
#create table dataset
{{< include tabledata.R >}}

#verify join was successful
joincheck <- anti_join(roster2, staff2, by = ("programs_clean"))

#print out overall rates and staff
{{< include finaltable_report.R >}} 
reportit |> 
  datatable(extensions = 'Buttons', rownames=FALSE, colnames=c('Program', 'Recidivism Rate', 'Staffing'), 
                          options=list(pageLength=10, dom='Bt',
  buttons = list(
          list(extend = "csv", text = "Download Data", filename = "data",
               exportOptions = list(
                 modifier = list(page = "all")))
)
)
)
```

## Tables: By Year

```{r merge2}
#CROSSTALK by year rates and staff
tabout2 <- SharedData$new(tabout |> 
  select(c(year, program_official, recid_rate_year, num_staff)))

filter_checkbox("year", "Select Year", tabout2, ~year, inline=FALSE)
datatable(tabout2, extensions = 'Buttons', rownames=FALSE, colnames=c('Year', 'Program', 'Recidivism Rate', 'Staffing'), 
                          options=list(pageLength=10, dom='Btip',
 buttons = list(
          list(extend = "csv", text = "Download Table View", filename = "view_year",
               exportOptions = list(
                 modifier = list(page = "current")
               )
          ),
          list(extend = "csv", text = "Download Data", filename = "data",
               exportOptions = list(
                 modifier = list(page = "all")))
)
)
)
```

:::

::: {.callout-important}
## Important

Save this for the final report!
:::

These tables are fantastic! But I recall that our Director is a bit of a "visual" person. Can we turn these into some pretty charts?

::: {.panel-tabset .nav-pills}

## Plots: Overall

```{r basicviz1, warning=FALSE}
#basic bar chart of overall recidivism rate by program
ggplot(tabout |>
         filter(year == date1)
       ,aes(x=programs_clean, y=recid_rate_all)) +
  geom_bar(stat="identity")
```

## Plots: By Year

```{r basicviz2, warning=FALSE}
#basic bar chart of recidivism rate by year by program
ggplot(tabout,aes(x=programs_clean, y=recid_rate_year,fill=year)) +
         geom_bar(position="dodge", stat="identity")

```

:::

Oh I think we could do better than that!

::: {.panel-tabset .nav-pills}

## Plot

```{r dataviz1, warning=FALSE}
#build bar chart of recidivism rates across programs
#information to plot, pick dates
dates <- as.numeric(c(date1,date2)) #what years of data do you want to plot?

#custom title header of plot
titledates <- ifelse(length(dates)>=2 & date1 != date2, paste0(date1," - ",date2),
                     ifelse((dates==date1 | dates==date2) & ALL.BY, as.character(dates),
                            ifelse(!ALL.BY, date1, "")))

#which years/programs are missing data?
prg.NA <- tabout |> 
  filter(is.na(recid_rate_year)) |>
  pull(programs_clean)

#caption text about missing program data
{if(length(prg.NA)!=0) cond.text <- capture.output(cat("The following programs were missing data in some years:", unique(toupper(prg.NA)), sep=" ")) else cond.text <- ""}

#plot it! this will plot recidivism rates with overlaid staffing text
rr <- ggplot(tabout |> 
               filter(if(ALL.BY) year %in% dates else year == date2) |>
               mutate(recid_rate = case_when(ALL.BY  ~ recid_rate_year,
                                             !ALL.BY ~ recid_rate_all))
             ,aes(x=programs_clean, y=recid_rate, fill=year)) +
  geom_bar(position = "dodge",stat = "identity", na.rm=TRUE) +
  geom_text(aes(label=ifelse(year==dates[2],paste(num_staff,"staff"),"")), vjust=-0.3, color = staffc, na.rm=TRUE) +
  scale_fill_manual(values=c(date1c,date2c)) +
  ylim(0,1) +
  ylab("Recidivism Rate") +
  xlab("EBRR Programs") +
  ggtitle(paste0("Recidivism Rates across EBRR programs\n",titledates)) +
  theme_classic() +
  #remove legend if plotting overall (not by year)
  {if(!ALL.BY) theme(legend.position="none")}+
  #only print caption if a program is missing data
  labs(caption = cond.text) +
  theme(plot.caption=element_text(hjust=0))

#display
rr
```

```{r dataviz1print, echo=FALSE}
suppressMessages(download_this(rr, button_label = "Download Plot", class = "button_large", output_name = paste0("recidivism_rate_",date1,"_to_",date2)))
```

## Interactive Plot

```{r, warning=FALSE}
#keep or hide legend depending on overall or by years
{if (ALL.BY) cond.leg <- T else cond.leg <- F}

hc_setup <- highchart() |> 
  hc_tooltip(formatter = JS("function(){return(this.point.tooltip)}")) |>
  hc_title(text = paste0("Recidivism Rates across EBRR programs\n",titledates)) |>
  hc_xAxis(title = list(text = "EBRR Programs"), type = "category", labels = list(style = list(width = 200))) |>
  hc_yAxis(title = list(text = "Recidivism Rate"), max = 1) |>
  hc_legend(enabled = cond.leg) |>
  hc_caption(text = cond.text) |>
  hc_add_dependency(name = "modules/exporting.js") |>
  hc_exporting(enabled = TRUE,
               chartOptions  = list(
                 chart = list(
                   backgroundColor = 'white')),
               buttons       = list(
                 contextButton = list(
                   menuItems = list("downloadPNG", "downloadSVG"))))
#overall
{if (!ALL.BY)
hc_setup |>
  hc_add_series(data = tabout |> 
                  filter(year == date2) |>
                  mutate(tooltip = paste0("<b>", program_official, "</b><br>",
                                   "Recidivism Rate: ",recid_rate_all, "<br>",
                                   "Staffing: ", num_staff)), 
                hcaes(x=program_official, y=recid_rate_all), 
                color = "lightblue", 
                type  = "bar")
}

#by year
{if (ALL.BY)
hc_setup |> 
  hc_add_series(data = tabout |> 
                  filter(year %in% dates) |>
                  mutate(tooltip = paste0("<b>", program_official, "</b><br>",
                                   "Recidivism Rate: ",recid_rate_year, "<br>",
                                   "Staffing: ", num_staff)), 
                hcaes(x=program_official, y=recid_rate_year, group=year), 
                color = c("lightblue","darkgreen"), 
                type  = "bar")
}
```

:::

::: {.callout-important}
## Important

Save this for the final report!
:::

### CWC Damned Lies and Statistics

```{r advreport, echo=FALSE, include=FALSE}
#remove 5 of the 10 programs because the advocacy group was sneaky
adv <- tabout |>
  filter(!(programs_clean %in% rm.pgms) &
             year == date1) #dates repeat the same information, so just pick one date to average over
#calculate ADVOCACY rate, which will be inserted into document text
adv_rate <- round(mean(adv$recid_rate_all,na.rm=TRUE)*100,1)
cat(capture.output(cat(paste0(adv_rate,"%"), "average recidivism rate overall for the following programs:", unique(tabout[which(!tabout$programs_clean %in% rm.pgms),]$programs_clean), sep=" ")))

```

This was amazing work; our Director is so happy! But wait! Oh no!! The Center Wing Coalition advocacy group just published [a report](cwc_report.html) that EBRR programs' recidivism rates are at an all time high of `r adv_rate`% with a report that claims to have used **your** DOC's reported data on EBRR program recidivism rates! Find out what's going on, and fast!

```{r unweighted}
#manage the data to produce recidivism rates
{{< include cwc_unw.R >}}

#verify join was successful
doublecheck <- anti_join(roster2, staff2, by = ("programs_clean"))

#print values
print(paste0(unw.a*100,"%", " average recidivism rate overall"))
print(paste0(unw.d1*100,"%", " average recidivism rate in ",date1))
print(paste0(unw.d2*100,"%", " average recidivism rate in ",date2))

```

Hm - something still doesn't line up. We need to keep investigating and find out why our numbers aren't matching up! 

```{r}
#| code-fold: show
#| ref-label: 'remove'
```


```{r giveemwhattheywant}
#| ref-label: 'advreport'
```

### Data-Informed Reporting

Alright - there's the number the advocacy group reported. But what's missing? Our Director is not going to be satisfied with just replicating the Center Wing Coalition results! What if we considered calculating a weighted recidivism rate?

```{r weighted1}
#manage the data to produce recidivism rates
#total clients served (all years, year1, year2)
{{< include cwc_w.R >}}

#print values
print(paste0(w.a*100,"%", " average recidivism rate (weighted) overall"))
print(capture.output(cat(paste0(w.a5*100,"%"), "average recidivism rate (weighted) overall for the following programs:", unique(tabout[which(!tabout$programs_clean %in% rm.pgms),]$programs_clean), sep=" ")))
```

Alright! If we just weight our data then we see that the average overall recidivism rate across the five programs that the advocacy group highlighted is only `r w.a5*100`%. Great work!

Now let's report it through some fancy data visualization work.

::: {.callout-note}
Change the highlighted code below (`ALL.BY`, `CWC`, and plot colors) to update your output.
:::

{{< include va_cs_webr.qmd >}}

::: {.callout-tip}
## Download your plot!

A `Download Image` button will appear when you hover over the plot.
:::

## More Data Viz!

Let's prepare our data to do some really fun data viz! What are some other engaging ways we could plot recidivism rates for leadership and our stakeholders pooled overall for these programs?

```{r dataviz4, echo=FALSE, warning=FALSE, message=FALSE, include=FALSE}
#this code will run if plotting data for multiple years, otherwise nothing will be produced (i.e., ALL.BY <- T)

#manipulate data for plotting
tabout.date1 <- tabout |>
  filter(year==date1) |>
  select(c(recid_rate_year, programs_clean, recid_rate_all)) |>
  rename(recid_rate_date1 = recid_rate_year)
tabout.date2 <- tabout |>
  filter(year==date2) |>
  select(c(recid_rate_year, programs_clean)) |>
  rename(recid_rate_date2 = recid_rate_year)
tabout.dates <- inner_join(tabout.date1, tabout.date2, by = "programs_clean") |>
  select(programs_clean, recid_rate_date1, recid_rate_date2, recid_rate_all)

#make some really cool horizontal floating dot charts!
#overwrite value of rates to overall if ALL.BY
{if(!ALL.BY) tabout.dates$recid_rate_date1 <- tabout.dates$recid_rate_all}

#plot two years or one year depending on ALL.BY setting
{if(ALL.BY) plotit <- c(tabout.dates[which(tabout.dates$programs_clean=="stages"),]$recid_rate_date1, tabout.dates[which(tabout.dates$programs_clean=="stages"),]$recid_rate_date2) else plotit <- tabout.dates[which(tabout.dates$programs_clean=="stages"),]$recid_rate_date1}

#remove label legend if by year
{if(ALL.BY) titledates2 <- c(as.factor(date1),as.factor(date2)) else titledates2 <- ""}

#plot!
gg_dot <- tabout.dates |>
  # rearrange the factor levels for programs by rates for date1
  arrange(recid_rate_date1) |>
  mutate(discipline = fct_inorder(programs_clean)) |>
  
  ggplot() +
  # remove axes and superfluous grids
  theme_classic() +
  theme(axis.title = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line = element_blank()) +
  
  # add a dummy point for scaling purposes
  geom_point(aes(x = 0.7, y = programs_clean), 
             size = 0, col = "white") + 
  
  # add the horizontal programs_clean lines
  geom_hline(yintercept = 1:length(tabout.dates$programs_clean), col = "grey80") +
  
  # add a point for each date1 recidivism rate
  geom_point(aes(x = recid_rate_date1, y = programs_clean), 
             size = 11, col = date1c) +

  # add a point for each date2 recidivism rate
 {if(ALL.BY) geom_point(aes(x = recid_rate_date2, y = programs_clean),size = 11, col = date2c)} + 

  # round each date2 recidivism rate
  {if(ALL.BY) geom_text(aes(x = recid_rate_date2, y = programs_clean, label = paste0(round(recid_rate_date2, 2))), col = "black")} +

  # round each date1 recidivism rate
  geom_text(aes(x = recid_rate_date1, y = programs_clean, 
                label = paste0(round(recid_rate_date1, 2))),
            col = "white") +

  # add a label above the first two points
  geom_text_repel(aes(x = x, y = y, label = label, col = label), force_pull = 50,
            data.frame(x     = plotit, 
                       y     = length(tabout.dates$programs_clean) + 2, 
                       label = titledates2), size = 6) +
  scale_color_manual(values = c(date1c, date2c), guide = "none") +
  
  # manually specify the x-axis
  scale_x_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1), 
                     labels = c("0","0.25", "0.50", "0.75", "1")) +
  # manually set the spacing above and below the plot
  scale_y_discrete(expand = c(0.2, 0)) 

#add titles/captions
gg_dot + 
  {if (ALL.BY) ggtitle("Recidivism Rates across EBRR programs\n") else ggtitle(paste0("Recidivism Rates across EBRR programs\n",titledates))} +
  #only print caption if a program is missing data
  labs(caption = cond.text) +
  theme(plot.caption=element_text(hjust=0))

```

### Plotting Overall

::: {.panel-tabset .nav-pills}

## Dots

```{r, include=FALSE}
ALL.BY <- F
```

```{r dataviz5, warning=FALSE}
#| ref-label: 'dataviz4'
```

## Lollipops

```{r dataviz6, warning=FALSE, message=FALSE}
##horizontal lollipop chart
ggplot(tabout, aes(x=programs_clean, y=recid_rate_all)) +
  geom_segment( aes(x=programs_clean, xend=programs_clean, y=0, yend=recid_rate_all), color=date1c) +
  geom_point( color=staffc, size=4, alpha=0.6) +
  theme_light() +
  coord_flip() +
  xlab("EBRR Programs") +
  ylab("Recidivism Rate") +
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.y = element_blank()
  ) + 
  ggtitle(paste0("Recidivism Rates across EBRR programs\n",titledates)) +
  theme(plot.caption=element_text(hjust=0)) +
  #only print caption if a program is missing data
  labs(caption = cond.text)
```

## More Lollipops!

```{r dataviz7, warning=FALSE, message=FALSE}
##horizontal lollipop chart w/weighted average
ggplot(tabout, aes(x=programs_clean, y=recid_rate_all)) +
  geom_segment(aes(x=programs_clean, xend=programs_clean, y=w.a, yend=recid_rate_all), color=date1c) +
  geom_point(color=staffc, size=4, alpha=0.6) +
  geom_hline(yintercept=w.a, linetype = "dashed", color = hlinew1, size = 1) +
  geom_label(aes(label=paste0("Weighted avg: ",w.a), x=w.a, vjust = -9, hjust = 0.75), fill=hlinew1,
                  data = tabout |>
               filter(programs_clean == last & year == date2)) +
  theme_light() +
  coord_flip() +
  xlab("EBRR Programs") +
  ylab("Recidivism Rate") +
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.y = element_blank()
  ) + 
  ggtitle(paste0("Recidivism Rates across EBRR programs\n",titledates)) +
  #only print caption if a program is missing data
  labs(caption = cond.text) +
  theme(plot.caption=element_text(hjust=0))


```

:::

### Plotting by Year

What about displaying these rates by release year?

::: {.panel-tabset .nav-pills}

## Dots

```{r, include=FALSE}
ALL.BY <- T
```

```{r dataviz8, warning=FALSE}
#| ref-label: 'dataviz4'
```

## Lines

```{r dataviz9, warning=FALSE, message=FALSE}
#plot!
gg_line <- tabout.dates |>
  # add a variable for when rates are higher in date1 than in date2 (for colours)
  mutate(date1high = recid_rate_date1 > recid_rate_date2) |>
  ggplot() +
  # add a line segment that goes from date1 to date2 for each program
  geom_segment(aes(x     = 1,                xend = 2, 
                   y     = recid_rate_date1, yend = recid_rate_date2,
                   group = programs_clean,
                   col = date1high), 
               size = 1.2) +
  # set the colors
  scale_color_manual(values = c(date1c, date2c), guide = "none")  +
  # remove all axis stuff
  theme_classic() + 
  theme(axis.line  = element_blank(),
        axis.text  = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank()) +
  # add vertical lines that act as axis for date1
  geom_segment(x    = 1, 
               xend = 1, 
               y    = min(tabout.dates$recid_rate_date1, na.rm=T) - 0.1,
               yend = max(tabout.dates$recid_rate_date1, na.rm=T) + 0.125,
               col  = "grey70", size = 0.5) +
  # add vertical lines that act as axis for date2
  geom_segment(x    = 2, 
               xend = 2, 
               y    = min(tabout.dates$recid_rate_date1, na.rm=T) - 0.1,
               yend = max(tabout.dates$recid_rate_date1, na.rm=T) + 0.125,
               col  = "grey70", size = 0.5) +
  # add the labels above their axes
  geom_text(aes(x = x, y = y, label = label),
            data = data.frame(x = 1:2, 
                              y = max(tabout.dates$recid_rate_date2, na.rm=T) + 0.05,
                              label = c(date1, date2)),
            col = "grey30",
            size = 6)  +
  # add the label and rate for each program next the date1 axis
  geom_text_repel(aes(x     = 1 - 0.03, 
                      y     = recid_rate_date1, 
                      label = paste0(programs_clean, ", ", round(recid_rate_date1, 2))),
             force_pull   = 0,
             nudge_y      = 0.05, nudge_x = -0.075,
             direction    = "y",
             hjust        = 1,
             segment.size = 0.2,
             max.iter = 1e4, max.time = 1) +
  # add the rate next to each point on the date2 axis
  geom_text(aes(x = 2 + 0.08, 
                y = recid_rate_date2, 
                label = paste0(round(recid_rate_date2, 2))),
            col = "grey30") +
  # set the limits of the x-axis so that the labels are not cut off
  scale_x_continuous(limits = c(0.5, 2.1)) + 
  
  # add the white outline for the points at each rate for date1
  geom_point(aes(x = 1, 
                 y = recid_rate_date1), size = 4.5,
             col = "white") +
  # add the white outline for the points at each rate for date2
  geom_point(aes(x = 2, 
                 y = recid_rate_date2), size = 4.5,
             col = "white") +
  
  # add the actual points at each rate for date1
  geom_point(aes(x = 1, 
                 y = recid_rate_date1), size = 4,
             col = "grey60") +
  # add the actual points at each rate for date2
  geom_point(aes(x = 2, 
                 y = recid_rate_date2), size = 4,
             col = "grey60") 
  
gg_line +
  ggtitle("Recidivism Rates across EBRR programs\n") +
  #only print caption if a program is missing data
  labs(caption = cond.text) +
  theme(plot.caption=element_text(hjust=0))
```

## Interactive Bars

```{r, warning=FALSE, message=FALSE}
highchart() |>
  hc_add_series(data = tabout |> 
                  filter(year %in% dates) |>
                  mutate(recid_rate = ifelse(year == date1, -1*recid_rate_year, recid_rate_year),
                         tooltip    = paste0("<b>", program_official, "</b><br>",
                                             "Recidivism Rate: ", abs(recid_rate), "<br>",
                                             "Staffing: ", num_staff)), 
                hcaes(x=program_official, y=recid_rate, group=year), 
                color = c("lightblue","darkgreen"), 
                type  = "bar", 
                showInLegend = F) |> 
  hc_plotOptions(bar = list(stacking = "normal")) |>
  # format the labels on the x-axis (y-axis per HC)
  hc_yAxis(labels = list(formatter = htmlwidgets::JS( 
           "function() {return Math.abs(this.value); /* all labels to absolute values */
            }"
           )), title = list(text = "Recidivism Rate"), min = -1, max = 1) |>
  hc_tooltip(formatter = JS("function(){return(this.point.tooltip)}")) |>
  hc_xAxis(title = list(text = "EBRR Programs"), type = "category", labels = list(style = list(width = 200))) |>
  hc_caption(text = cond.text) |>
  hc_title(   text = date1, align = "center", x = 0, y = 20, margin = 0,
              style = list(fontSize = "12px", color = "lightblue")) |>
  hc_subtitle(text = date2, align = "center", x = 250, y = 20, margin = 0, 
              style = list(fontSize = "12px", color = "darkgreen"))
```

:::

## Final Report

This exploratory document has been really useful for our internal purposes! But what if we want to get all of the pertinent info into a single report for your Director in a format they can actually digest; something similar to the original report?

::: {.callout-tip collapse=true}
## Expand to view the R Markdown that produces the PDF/DOCX
```{r rendered, echo=FALSE, output=TRUE}
#code that is rendered
print(noquote(scan("finaltable.Rmd", what=character(), skip=0, nlines=98, sep='\n')))
```
:::

```{r tablefinal, include=FALSE}
#render final report for director

##create PDF output
quarto_render("finaltable.Rmd", output_format = "pdf")

##create word doc output
render(       "finaltable.Rmd", output_file   = "finaltable.docx")

#insert image
sample.doc                      <- read_docx("finaltable.docx")                                                             #read in rendered doc without image
sample.doc$officer_cursor$which <- 6                                                                                        #set position to place image
sample.doc                      <- body_add_img(sample.doc, src = "rrfinal.png", width = 6.4, height = 4.8, pos = "before") #place image
print(sample.doc, target = "finaltable.docx")                                                                               #recreate output
```

![](finaltable.pdf){width=100% height=1100}

```{r tablefinaldoc, echo=FALSE}
suppressMessages(download_file("finaltable.docx", button_label = "Download Word DOCX", class = "button_large", output_name = "finaltable"))
```

## R Session

```{r, collapse=TRUE}
#for reproducibility
si <- sessioninfo::session_info()
si$packages$library <- NULL
si$platform$pandoc <- NULL
si
```