Building this document

To build this README, run build_readme.R. The talks data is stored in the CSV file talks_table.csv.

Workshops

Leon Eyrich Jessen (Technical University of Denmark)
Artificial Neural Networks in R with Keras and TensorFlow

Abstract

This workshop is introductory and open to everyone, assuming basic R/data science skills. Please note that the workshop is very hands-on, so expect to get your fingers dirty! The aim is an introduction to ANNs in R. ANNs form the basic unit of deep learning and are immensely powerful in predictive modelling, but not without pitfalls. In this workshop, we will build a conceptual understanding of what an ANN is, how an ANN is trained, and how predictions are subsequently made. We will also touch upon parameters, hyper-parameters, and how to handle data, all in the context of model over-fitting. All of the aforementioned will be done using TensorFlow via Keras for R.

Link to Workshop Material

Mike Stackhouse (Atorus), Nathan Kosiba (Atorus)
Multilingual Markdown with R and Python Using Reticulate
Abstract

We will be presenting an overview of the interoperability between Python and R for the R user community at R/Pharma 2020. This workshop will highlight how statistical programmers can leverage the power of both R and Python in their daily processes. Participants will get hands on experience working with some of the best aspects of both R and Python, and how these two languages can work together within R Markdown.

Link to Workshop Material      Workshop Recording

Stefano Mangiola (Walter and Eliza Hall Institute of Medical Research), Maria Doyle (Peter MacCallum Cancer Centre)
Tidy Transcriptomics
Abstract

In this workshop we will present how to perform analysis of RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions. We can achieve this for bulk RNA sequencing data with the tidybulk, tidyHeatmap and tidyverse packages. We will also touch on packages for tidy single-cell transcriptional analyses. These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data.
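To make the tidy layout described above concrete, here is a minimal, language-agnostic sketch in plain Python. It is not the tidybulk or tidyverse API: the function `to_tidy`, the column names, and the toy counts are all hypothetical and chosen only for illustration. It reshapes a wide gene-by-sample count table into one row per observation (gene, sample, count).

```python
def to_tidy(wide, id_col="gene", key="sample", value="count"):
    """Reshape a wide table (one column per sample) into tidy rows.

    `wide` maps column names to equal-length lists. Each output row is one
    observation: a single gene measured in a single sample.
    """
    sample_cols = [col for col in wide if col != id_col]
    rows = []
    for i, gene in enumerate(wide[id_col]):
        for col in sample_cols:
            rows.append({id_col: gene, key: col, value: wide[col][i]})
    return rows


# Hypothetical toy counts, for illustration only.
wide_counts = {
    "gene": ["BRCA1", "TP53"],
    "sample_A": [10, 25],
    "sample_B": [12, 30],
}

tidy_counts = to_tidy(wide_counts)

# Every row has the same shape, so a summary is a single uniform loop.
total_per_sample = {}
for row in tidy_counts:
    total_per_sample[row["sample"]] = total_per_sample.get(row["sample"], 0) + row["count"]
```

Because the structure stays the same after each step, filtering, grouping, and plotting can all operate on the same three columns.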

Recommended pre-requisites

  • Basic knowledge of RStudio
  • Some familiarity with tidyverse syntax
  • Background reading: Introduction to R for Biologists

Link to Workshop Material

Andy Nicholls (GlaxoSmithKline), Marly Gotti (Biogen)
Implementing a Risk-based Approach to R Validation
Abstract

In this workshop we will walk through an implementation of the R Validation Hub's white paper: A Risk-based Approach for Assessing R Package Accuracy within a Validated Infrastructure (https://www.pharmar.org/white-paper/). The workshop will explore two core themes:

  1. R Packages Risk Assessment

  2. Testing

In part 1, we will use a small set of pre-selected R packages to see how the R Validation Hub's Risk Assessment Application and the riskmetric R package can be used to create risk assessment reports for an R package.

In part 2, we will discuss how testing can be used to reduce the risk for those packages with high risk. In particular, we will discuss the testing philosophy with respect to software validation and demonstrate how the 'testthat' package can be used to perform the necessary steps to test traceability requirements.

Prior knowledge of the basic structure of R packages is required for the second part of this workshop.

Link to Workshop Material      Workshop Recording

Alison Hill (RStudio), Tom Mock (RStudio)
RMarkdown
Abstract

A four-hour workshop that will take you on a tour of how to get from data to manuscript using R Markdown. You'll learn:

  • The basics of Markdown and knitr
  • How to add tables for different outputs
  • Workflows for working with data
  • How to include and style graphics

Link to Workshop Material

Daniel Lee (Generable)
Stan
Abstract

This is a 3-hour workshop on Stan (https://mc-stan.org). The overall goal of the workshop will be to make the best use of time to answer as many Stan-related questions as possible. The level of the workshop will be intermediate to advanced, but anyone is welcome to join.

The workshop will be taught by Daniel Lee. Daniel is one of the original Stan developers (started in 2011). He's been involved in the whole stack: language, CmdStan, RStan, PyStan, continuous integration, setting up the forums, StanCon, and more. He's had a lot of experience with debugging computational issues, the crossover between statistical models and computation, understanding how all the pieces fit together, and knowing a lot of different ways to accomplish the same thing in the Stan language.

The format of this workshop won't be a straight online lecture. I'm personally tired of Zoom meetings; I don't think the intro course I teach works in this format. Instead, we'll have a blend of an instructor-led example, a masterclass, and an AMA. Please come with questions or ask them as we go along. Here's a rough plan (but we can deviate from this):

  1. Brief introduction to Stan. Goal: understand what Stan is, what the inferences are (and why they're different), and agree on terminology.

  2. Walk through an example (survival or PK/PD model). Show differences between posterior distributions and point estimates. Maybe discuss quality of MCMC sampling.

  3. One-on-one with a participant. Walk through their problem, attempt at a solution, walk through the different modeling choices we're making, how to structure simulated data, etc.

  4. One-on-one with another participant.

  5. Questions / Wrap up.

Link to Workshop Material

Will Landau (Eli Lilly)
targets / Reproducible Pipelines
Abstract

Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that actually run. Targets learns how your pipeline fits together, skips costly runtime for steps that are already up to date, runs the rest with optional implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the output matches the underlying code and data. In other words, the package saves time while increasing our ability to trust the conclusions of the research. Targets surpasses the most burdensome permanent limitations of its predecessor, drake, to achieve greater efficiency and provide a safer, smoother, friendlier user experience. This hands-on workshop teaches targets using a realistic case study. Participants begin with the R implementation of a machine learning project, convert the workflow into a targets-powered pipeline, and efficiently maintain the output as the code and data change.
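The idea of skipping steps that are already up to date can be sketched in a few lines. The following is a conceptual illustration in Python, not the targets API and not how targets is implemented: the `Pipeline` class, the `fingerprint` helper, and the step names are hypothetical. It assumes a step is up to date when a hash of its code label and its inputs matches the hash stored from the previous run.

```python
import hashlib
import json


def fingerprint(code, inputs):
    """Hash a step's code label together with its upstream input values."""
    payload = json.dumps({"code": code, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Pipeline:
    """Toy pipeline that reruns a step only when its fingerprint changes."""

    def __init__(self):
        self.cache = {}  # step name -> (fingerprint, stored value)
        self.runs = []   # names of steps that actually executed

    def run_step(self, name, code, func, inputs):
        key = fingerprint(code, inputs)
        cached = self.cache.get(name)
        if cached is not None and cached[0] == key:
            return cached[1]  # up to date: skip the expensive call
        value = func(**inputs)
        self.cache[name] = (key, value)
        self.runs.append(name)
        return value


pipe = Pipeline()


def run_all(raw):
    # The code labels stand in for the source text of each step.
    cleaned = pipe.run_step(
        "clean", "drop_negatives_v1",
        lambda raw: [x for x in raw if x >= 0], {"raw": raw},
    )
    return pipe.run_step(
        "summarise", "mean_v1",
        lambda cleaned: sum(cleaned) / len(cleaned), {"cleaned": cleaned},
    )


first = run_all([1, -2, 3])   # both steps execute
second = run_all([1, -2, 3])  # nothing changed, so nothing reruns
third = run_all([1, -5, 3])   # "clean" reruns; its output is unchanged, so "summarise" is skipped
```

In this sketch a change in the raw data reruns only the steps whose inputs actually differ, which is the time-saving behaviour the abstract describes.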

R proficiency: intermediate and above required.

Link to Workshop Material

Rich Iannone (RStudio)
gt
Abstract

Link to Workshop Material

David Granjon (Novartis), John Coene (World Economic Forum)
Unleash Shiny
Abstract

In recent years, R users' understanding of Shiny has greatly increased, but so have client expectations. While one of Shiny's greatest strengths is that it allows producing web applications solely from R code, meeting clients' more delicate expectations will often involve going beyond R code and working with HTML, CSS, and JavaScript.

We recognize that R developers tend not to be familiar with the latter, as they generally do not have a significant background in web development; these technologies may therefore appear daunting at first.

This workshop aims to put attendees at ease with inviting those web technologies into their Shiny applications so they can exceed clients' expectations. The workshop will comprise three parts.

Part 1 homes in on the development of a new template on top of Shiny with the {htmltools} package. Workshop attendees will have the opportunity to collaborate with the RinteRface team on the {shinybulma} project (https://github.com/RinteRface/shinybulma).

Part 2 delves into bi-directional communication in Shiny: how the R server communicates with the front-end and vice versa, how the input/output system works.

Part 3 ends the workshop by exposing the lesser-known functions/methods that are nonetheless likely to help you in your Shiny journey!

Prerequisites for the workshop:

  • Be proficient with Shiny
  • Basic knowledge about R6
  • Be proficient with package development
  • JavaScript/CSS skills may help but are not mandatory

[Link to Workshop Material](https://rinterface.com/shiny/talks/RPharma2020/)

Devin Pastoor (Metrum Research Group), Kyle Baron (Metrum Research Group)
A PK & PBPK Modelling Workflow in R: Simulation, Optimization & Visualization
Abstract

After a brief introduction to mrgsolve (https://mrgsolve.github.io), we will discuss concepts and applications for using the package in R to simulate from pharmacokinetic (PK) and physiologically-based PK (PBPK) models, estimate parameters given a model and data, and visualize simulation results with a Shiny app. We will establish a basic framework for running optimization in R and work through hands-on examples using different optimizers, including local and global search algorithms. Building on this framework, we will also illustrate related workflows including global and local sensitivity analysis. Finally, we will develop and deploy a Shiny app using RStudio Connect, allowing interaction with the model and optimization results by non-modeling stakeholders.

[Link to Workshop Material](https://github.com/metrumresearchgroup/r-pharma-pkpd-2020)

Emil Hvitfeldt (University of Southern California)
Predictive modeling with text using tidy data principles
Abstract

Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems? Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling? This tutorial is geared toward an R user with intermediate familiarity with R, RStudio, the basics of regression and classification modeling, and tidyverse packages such as dplyr and ggplot2. This person is comfortable with the main functions from dplyr and ggplot2 and is now ready to learn how to analyze and model text using tidy data principles. This R user has some experience with statistical modeling (such as using lm() and glm()) for prediction and classification and wants to learn how to build models with text.

[Link to Workshop Material](https://textmodels4pharma.netlify.app/)

Talks - Day 1

Max Kuhn (RStudio), Simon Couch (Reed College)
Stack'Em High! Ensembles Using tidymodels

Abstract

Slides

Andy Nicholls (GlaxoSmithKline)
The R Validation Hub: Implementing a Risk-Based Approach to R
Abstract

Slides

Laure Cougnaud (OpenAnalytics), Michela Pasetto (OpenAnalytics), Arne de Roeck (Galapagos)
Interactive medical oversight reporting in R
Abstract

Medical oversight during a clinical trial is an extensive and time-consuming process. To safeguard patient safety, medical monitors need to review and explore raw safety data interactively, using standard visualizations as well as specific analyses tailored to the disease and the clinical study.

The creation of semi-automated reports in R could facilitate this operation. The reports include interactive visualizations (with the plotly package) and interactive descriptive statistics tables and listings (with the DT package) for safety review of the patients.

Template reports (based on R Markdown) incorporating standard analyses are integrated within an R package. The reports are set up via YAML configuration files to allow non-R users to customize the report for their specific study. Each report is created from datasets in CDISC standard SDTM or ADaM format and delivered in the form of linked, self-contained HTML pages.

The creation of the report documentation (in the R package) and the validation of the input parameters in the config files are automated using the JSON schema format.

The medical oversight tool integrates functionality to generate patient profiles and CSR-ready in-text tables, and enables comparison of results between multiple interim data batches delivered in the course of the clinical trial.

The tool will be demonstrated on a publicly available dataset.

Slides

Daniel Sabanés Bové (Hoffmann-La Roche Ltd)
Implementing Mixed Models with Repeated Measures (MMRM) in R and Shiny for Regulatory Purposes
Abstract

MMRMs are often used as the primary analysis of continuous endpoints in longitudinal clinical trials (see e.g. Mallinckrodt et al., 2008). Essentially, an MMRM is a specific linear mixed effects model that includes (at least) an interaction of treatment arm and categorical visit variables as fixed effects. The covariance structure of the residuals can have different forms, and often an unstructured (i.e. saturated parametrization) covariance matrix is preferred. This structure can be represented by random effects in the mixed model. All of this has typically been implemented in proprietary software, such as SAS, as its PROC MIXED routine is generally seen as a gold standard for mixed models. However, this does not allow the use of interactive web applications to explore the clinical study data in a flexible way. Furthermore, fitting such proprietary software into workflows such as automatic document generation is not convenient. Therefore, we wanted to implement MMRM in R. Several challenges had to be solved, such as finding the right R packages for this purpose. We finally settled on {lme4} in combination with {lmerTest}, which could match results in SAS up to numerical precision. Convergence of estimates can be an issue, and multiple optimization algorithms are therefore tried in parallel to enhance robustness. Extracting the covariance matrix estimate from {lme4} results was solved, as was finding model fit statistics that match SAS results. We use our own {rtables} to produce tables and {ggplot2} for plots. We developed a Shiny module in our internal framework for exploratory web applications. Further validation in the next months will allow us to use the R implementation for regulatory purposes, with greater flexibility and efficiency than before.

Slides

Ellis Hughes (Fred Hutch Cancer Research Center)
R Package Validation Framework
Abstract

In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention). Housed within Fred Hutch, SCHARP is an instrumental partner in the research and clinical trials surrounding HIV prevention and vaccine development. Part of SCHARP's work involves analyzing experimental biomarkers and endpoints which change as the experimental question, analysis methods, antigens measured, and assays evolve. Maintaining a validated code base that is rigid in its output format, but flexible enough to cater to a variety of inputs with minimal custom coding, has proven to be important for reproducibility and scalability. SCHARP has developed several key steps in the creation, validation, and documentation of R packages that take advantage of R's packaging functionality. First, the programming team works with leadership to define specifications and lay out a roadmap of the package at the functional level. Next, statistical programmers work together to develop the package, taking advantage of the rich R ecosystem of packages for development such as roxygen2, devtools, usethis, and testthat. Once the code has been developed, the package is validated to ensure it passes all specifications using a combination of testthat and rmarkdown. Finally, the package is made available for use across the team on live data. These procedures set up a framework for validating assay processing packages that furthers the ability of Fred Hutch to provide world-class support for our clinical trials.

Slides

Bo Wang (Novartis)
{metashiny}: build Shiny apps with a Shiny app
Abstract

metashiny is an R package that provides a point-and-click interface to quickly design, prototype, and deploy essential Shiny applications without having to write a single line of R code. The core idea behind metashiny is to parametrize Shiny modules, which are reusable units of Shiny logic with their own namespace. Instead of modifying a module to fit various analytical needs, metashiny strives to build a module template that encompasses a wide range of popular Shiny logic, then uses a "meta"-Shiny interface to collect user requirements and customize the Shiny modules using these inputs as parameters. The customized Shiny modules are embedded in the "meta"-Shiny for preview, may be downloaded in a self-contained, functioning Shiny directory, and may be deployed to a Shiny Server with minimal configuration. metashiny may be very useful in the initial design phase of Shiny products. Finally, an important feature for non-R users is that it eliminates the need to learn Shiny code and the R environment, thus enabling analytical colleagues from all backgrounds to explore the fantastic power of Shiny.

Slides

Adrian Olszewski (2KMM)
Numerical validation as a critical aspect in bringing R to the Clinical Research
Abstract

Validation of the R statistical package has become a hot topic since 2015, when the FDA issued the Statistical Software Clarifying Statement, stating officially that no specific software is required for submissions, and that any tool can be used provided that it is reliable and documented appropriately. It instantly attracted the attention of the pharmaceutical industry. Individual attempts to fulfil validation requirements and bring R into controlled environments were made by a number of companies independently. In addition, combined efforts of the biggest pharmaceutical companies resulted in launching the R Validation Hub project. While most of the initiatives seem to focus on documentation and package quality assessment, relying on the results of unit tests delivered "as is" by the authors of R packages, we at 2KMM CRO set different priorities, driven by the importance of exhaustive numerical validation done in the first place. Without that, there is a risk that all the efforts on documentation and quality assurance will pertain to routines whose results differ from those obtained with other trusted software in a way that cannot be adequately justified. While we do not undermine the importance of documentation and early unit testing, we believe that numerical validation, going far beyond running those tests, is mandatory to achieve a satisfactory level of reliability. We would like to share our findings in this area, including: the choice of reference input data and results used during the validation, sources of discrepancies between R and other software, and interpretation and acceptance of the results.

Slides

Jaejoon Song (US Food and Drug Administration)
geoMapr: A Shiny Application for Enriched Analysis of Drug Utilization Data
Abstract

The crisis of opioid abuse and overdose in the United States has involved unprecedented levels of opioid prescriptions and opioid-related mortality. Greater understanding of current trends in prescription opioid utilization may help prevent new cases of abuse, addiction, and overdose. The U.S. Food and Drug Administration (FDA, the Agency) is expanding its capacity for proactive pharmacovigilance of drug abuse, in addition to other drug safety signals. In post-market safety surveillance, pharmacy dispensing data provide valuable insights to the Agency for oversight of drug utilization. The drug dispensing data include the number of product dispensings aggregated over a time frame (e.g., months) by geographical locations (e.g., states, core-based statistical areas). One promising approach to enhance pharmacovigilance using these data would be through data enrichment: geographically referenced public data sources covering detailed information on demographics, socioeconomics, and healthcare services can be overlaid on proprietary, nationally projected data for prescription drug dispensing. Our project, funded by the Center for Drug Evaluation and Research (CDER) Safety Research Interest Group (SRIG) program, seeks to develop a data analysis pipeline and software for generating real-world evidence (RWE) that will monitor changes in prescription opioid use and guide proactive pharmacovigilance of drug abuse. The software will provide tools to augment proprietary, nationally projected data for prescription drug dispensing with other geographically referenced, publicly available, demographic, socioeconomic, or healthcare service data. The software will generate RWE including user-interactive data visualization, spatio-temporal modeling, and machine learning for identifying factors potentially associated with drug utilization, misuse, and abuse.

Slides

Bella Feng (KitePharma, A Gilead Company)
Becoming Multilingual
Abstract

As stated in my 2018 R/Pharma presentation "Becoming Bilingual in SAS and R", I believe in problem-solving using different data science tools. This talk is about my team's efforts at using different data science tools (SAS, R, and Python) to harmonize data from 10+ clinical studies to build a robust and automated data mart that will eventually integrate biomarker data from clinical studies and real-world data (RWD). (1) The SAS data dictionary and ODS are used first for two reasons: firstly, ADaM datasets are in sas7bdat format; secondly, the data dictionary and ODS are powerful tools for which neither R nor Python has a well-established package. (2) R is used for its visualization power, for Shiny, and for RStudio's reticulate tools for integrating Python into R projects. (3) Python is used for its fuzzywuzzy package and potentially the NLTK package. In this project we are particularly pleased and impressed by RStudio's work on seamlessly integrating Python tools into R projects. This project showcases the use case of combining the three programming languages in the clinical data integration space. It also provides a proof of concept (POC) for integrating Kite internal data with external data and RWD. It is also forward-looking in the sense that it prepares us to deal with the wearable device data that innovative technology and precision medicine will bring into the oncology treatment scene.

Slides

Talks - Day 2

Julia Silge (RStudio)
Data visualization for real-world machine learning

Abstract

Visual representations of data inform how machine learning practitioners think, understand, and decide. Before charts are ever used for outward communication about a ML system, they are used by the system designers and operators themselves as a tool to make better modeling choices. Practitioners use visualization, from very familiar statistical graphics to creative and less standard plots, at the points of most important human decisions when other ways to validate those decisions can be difficult. Visualization approaches are used to understand both the data that serves as input for machine learning and the models that practitioners create. In this talk, learn about the process of building a ML model in the real world, how and when practitioners use visualization to make more effective choices, and considerations for ML visualization tooling.

Slides

Christina Fillmore (GSK)
Using R to Create an End to End Process for Predicting Delays in Recruitment from Covid-19
Abstract

Supporting data-driven decisions in the planning of clinical trials during the current pandemic involves extensive integration of heterogeneous data sources, sophisticated predictive modelling, and custom visualization to communicate the predictions to decision makers. We used R to rapidly deliver end-to-end planning tools for GSK in this difficult time. We built a pipeline to integrate, clean and, crucially, test a variety of internal and external datasets. This data then fed into a patient recruitment model and, finally, into a SQL-powered Shiny app for interactive visualizations. The creation of the planning tool required bringing together statisticians, data scientists and clinical operations in an intense collaboration, powered by R.

Slides

Baldur Magnusson (Novartis)
With great graphs comes great power
Abstract

Effective visual communication is a core competency for pharmacometricians, statisticians, and, more generally, any quantitative scientist. It is essential in every step of a quantitative workflow, from scoping to execution and communicating results and conclusions. With this competency, we can better understand data and influence decisions toward appropriate actions. Without it, we can fool ourselves and others and pave the way to wrong conclusions and actions. The goal of this talk is to convey this competency through three laws of effective visual communication for the quantitative scientist: have a clear purpose, show the data clearly, and make the message obvious.

Slides

Charlotta Fruchtenicht (F. Hoffmann-La Roche)
visR - a package for effective visualization in Pharma
Abstract

The visR project for effective graphics in drug development is an open collaborative effort to develop solutions for effective visual communication, with a focus on reporting medical and clinical data. The aim of the collaboration is to develop a user-friendly, fit-for-purpose, open source package to simplify the use of good graphical principles for effective visual communication of typical analyses of interventional and observational data encountered in clinical drug development.

Slides

Jeremy Wildfire (Gilead)
safetyGraphics v2.0 - Open Source Collaboration in Pharma using R and Shiny
Abstract

Slides

Carson Sievert (RStudio)
Styling Shiny & R Markdown with bootstraplib & thematic
Abstract

Slides

Michael Rimler (GlaxoSmithKline)
Clinical Reporting Using R at GSK
Abstract

The pharmaceutical industry has witnessed a growing interest in open source languages such as R and Python as an alternative to SAS for many activities related to clinical research. Hop on board for a whistle-stop tour of our efforts within GSK Biostatistics to integrate R programming into the clinical reporting pipeline. Hear how our journey started, where we are now, and what challenges and opportunities lie ahead.

Slides

Mike Mehan (Illumina Inc)
Leveraging Nested Data Frames for Analytical Studies (note title change, our legal dept requested this change)
Abstract

The development of laboratory developed tests (LDTs) and in vitro diagnostics (IVDs) requires the execution of studies to determine the analytical performance of the assay. Examples of analytical studies include limit of detection, intermediate precision, and stability studies. These studies often require similar analyses to be repeated multiple times on replicates or different sample types. The results of these analyses need to be stored in data structures that are easily accessible to the lead analyst as well as additional team members responsible for validating the work. Nested data frames are powerful and flexible data structures that are well suited for these requirements. This talk will show how storing all of the steps of an analysis pipeline in a nested data frame allows analysts to utilize the well-established functionality of the tidyverse family of packages for efficient analysis and summarization of the data. It will also discuss how nested data frames are well suited for reproducibility and traceability, which are vital to documenting analytical performance. Reproducibility is often achieved by writing R notebooks in an environment that maintains package version consistency (e.g. Docker, RStudio Server). Using nested data frames as the underlying data structure within these frameworks provides a transparent and modular method for storing the results of an existing analysis and providing easily accessible data for downstream analysis.

Slides

J. Kyle Wathen (Gilead)
PREP - Packages fRom tEmPlates - An R Package to Streamline Development of Shiny Apps and R Packages
Abstract

In recent years, R Shiny apps have gained considerable momentum and have been used to develop many useful dashboards and user interfaces (UI) that give non-programmers access to innovative tools. Due to the ease of development of Shiny apps and the lack of complex examples, R developers often create a new Shiny app in a single app.r file that contains both the ui and server code. As a project grows, and capabilities expand in the app, a common practice is to separate the code into two files, one for the server object and one for the ui object. While these approaches may suffice for simple applications, they can lead a developer or team of developers down a path to an application with many lines of code (e.g. 15,000+) in a single file that can be extremely difficult to debug, test, maintain or expand. This approach can also lead to a file with a mixture of UI/server related code in the same files as complex computational code.

In this talk, I will present the {PREP} (Packages fRom tEmPlates) package that was created to help teams streamline development of R Shiny apps and R packages using an approach that follows software development best practices. The PREP package adds new project types to RStudio to help streamline new project creation and development. There are three PREP project type options: 1) a Shiny app as a package, 2) a Shiny app, or 3) an R package that is set up with the unit testing framework included, utilizing {testthat}, and is intended to contain all the complex computational functionality. Both Shiny app options are organized using modules with a consistent default theme, the ability to switch between color theme options, and example code for commonly implemented tasks. By developing the complex computations in the R packages and the Shiny app as separate projects, teams can utilize each person's skill set better and simplify the testing, thus making a more robust final product. By developing the Shiny app with modules, teams can avoid extremely long single files, share customized controls across different pages, and make it much easier to use source control technology like GitHub. In addition, the PREP package includes functions to add new tabs and modules to the Shiny app and create new functions with testing set up in the computational package, avoiding the multiple steps of creating files for new functions and testing. PREP is designed to be used by new package/Shiny developers and is highly customizable for expert users without adding a dependency to your final product.

[Slides](https://github.com/rinpharma/2020_presentations/tree/master/talks_folder/2020-Wathen-PREP.pptx)

Thomas Tensfeldt (Pfizer Research and Development)
openNCA Pharmacokinetic data repository and Non-compartmental Analysis System
Abstract

Non-compartmental pharmacokinetic analysis (NCA) is used in the characterization of a drug's absorption, distribution and elimination in the body. Software that implements NCA is available from both commercial and non-commercial, open-source sources.

openNCA is a desktop application with enterprise capabilities, developed in-house at Pfizer, Inc., designed to provide a PK bioanalysis result repository as well as NCA computation routines. The system is built with modern technologies including JavaScript/TypeScript, Angular, Electron, Elasticsearch, Modeshape, Splunk, Docker and a substantial R code base that implements system functions, configuration, analysis, reporting and user defined functionality.

openNCA capabilities include:

- Repository/Library/Metadata stores
- Data Loading/Merging/Validation
- Integration with Clinical Trial operational data
- Integration with Patient Information Management System
- Data Access controls
- Data Transformation
- NCA Analysis
- RMarkdown and LaTeX Reporting
- Shiny Apps
- Quality Control
- Workflow, Data, Transformation and Analysis Lineage
- Navigation and Search
- Reporting Event management
- Publishing/Data Sharing

Design considerations for openNCA include reproducibility, security/integrity, extensibility, discoverability and traceability. Extensibility is a cornerstone characteristic that is enabled through extensive use of R scripts and Shiny apps to configure the system functions. The openNCA computation engine R package (https://github.com/tensfeldt/openNCA) for NCA analyses enables some unique capabilities, forms one module of the system, and is open-sourced under the MIT license.

openNCA, both the R-driven application and the NCA computation R package, provides an example of an industrial application of R and represents the in-kind contribution from Pfizer Inc. to the initial prototype project of the Pharmaceutical Open Source Software Consortium (POSSC: https://www.possc.org/) to promote industrial support for open-source software development and innovation for the Clinical Pharmacology and Pharmacometrics discipline.
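
As a rough illustration of the kind of quantities a non-compartmental analysis produces, the sketch below computes Cmax, Tmax and AUC up to the last sampling time using the linear trapezoidal rule. It is written in Python for brevity and is not taken from the openNCA package; the function name `nca_summary` and the concentration-time profile are hypothetical.

```python
def nca_summary(times, concs):
    """Return Cmax, Tmax and AUC(0-tlast) by the linear trapezoidal rule.

    Illustrative only: `times` and `concs` are paired lists of sampling
    times and measured concentrations for a single subject.
    """
    if len(times) != len(concs) or len(times) < 2:
        raise ValueError("need at least two paired time/concentration points")
    if any(t2 <= t1 for t1, t2 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")

    cmax = max(concs)
    tmax = times[concs.index(cmax)]  # first time at which Cmax is observed

    # Sum the trapezoid areas between consecutive sampling times.
    auc_last = sum(
        (t2 - t1) * (c1 + c2) / 2.0
        for t1, t2, c1, c2 in zip(times, times[1:], concs, concs[1:])
    )
    return {"cmax": cmax, "tmax": tmax, "auc_last": auc_last}


# Hypothetical profile: hours post-dose and plasma concentration (ng/mL).
times = [0.0, 1.0, 2.0, 4.0, 8.0]
concs = [0.0, 10.0, 8.0, 4.0, 1.0]
summary = nca_summary(times, concs)
print(summary)
```

Production NCA software typically offers further options (for example log-linear interpolation in the elimination phase and extrapolation to infinity) that this sketch leaves out.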

Slides

Maya Gans (Cytel), Marly Gotti (Biogen)
tidyCDISC an Open Source Platform in R to Analyze Clinical Trial Data
Abstract

The current paradigm for analyzing clinical trial data is cumbersome: it is an inefficient, slow, and expensive process. Several rounds of iterations between the main programmer and the validation programmer are usually needed to thoroughly explore the data. Furthermore, clinicians and statisticians often would like to explore the data themselves but lack a robust and flexible platform to carry out this data exploration. For instance, they may need to inspect an endpoint for patients with certain genetic markers, to analyze biomarker data, or to create table summaries. To meet these needs, we created tidyCDISC, an open source Shiny application that can be used to generate custom tables, statistics, and figures. The application has three modules: a drag and drop table generator, a graphical population explorer, and a patient history explorer. We've taken a modularized approach to our package to ensure the application can be easily expanded upon to include further analyses and figures. By sharing our application as an open source solution, we hope to help other scientists with similar problems as well as promote collaboration in the pharmaceutical industry.

Slides

Mike Stackhouse (Atorus Research, LLC)
Tplyr: An Intuitive Framework for Building Clinical Safety Summaries
Abstract

In the pharmaceutical industry, a great deal of the data presented in the outputs we create is very similar. Most of these tables can be broken down into a few categories: counting for event-based variables or categories, shifting to describe changes in state, and descriptive statistics to summarize continuous variables. For many of the tables that go into a clinical submission, at least when considering safety outputs, the tables are made up of a combination of these approaches. Consider a demographics table. When you look at the table, you can begin breaking the output down into smaller, redundant components. These components can be viewed as 'layers', and the table as a whole is constructed by stacking those layers. Tplyr uses this concept to provide an intuitive framework for building clinical safety summaries.

Slides

Kyle Baron (Metrum)
yspec: an R Package to Create and Deploy Data Set Specification Objects in a Modeling and Simulation Workflow
Abstract

Slides

Talks - Day 3

Douglas Robinson (Novartis)
Would John, Paul, George or Ringo have been famous if it were not for The Beatles

Abstract

The Beatles rose to music fame in the 1960s and became a worldwide phenomenon. With millions of screaming fans and over 600 million records sold, they are often cited as one of the most influential rock bands in history. One reason for their fame was their ability to communicate in the middle of songs without using words and without missing a single beat. This led me to consider some of the best collaborations I have been a part of, which are those where the team is in complete alignment and information flows easily from one team member to another. As analysts, it is our job to enhance the ability of teams to communicate with the best tool at our disposal: graphics. Just like Paul and John, our graphics need to communicate without speaking to convey information and help teams make critical decisions about clinical trials.

Novartis leverages the potential of R - Shiny to develop interactive tools that engage users to explore their clinical trial data with ease. Although several programs have been impacted with this technology, the goal of reaching the entire drug development portfolio is still a work in progress. This talk will describe our experiences with R - Shiny with some examples. Finally, it should be stressed that creating effective Shiny Apps requires thought, as well as adherence to strong graphical principles. In this vein, we will provide and describe our Graphical Principles Cheat Sheet(TM) that covers many aspects and considerations one should follow when devising either static or dynamic graphics.

Slides

Viral B. Shah (Julia Computing), Vijay Ivaturi (Pumas-AI)
Julia in Pharma
Abstract

Julia is a modern programming language that provides the ease of use of R with the speed of C++. Julia has been in development for over 11 years; research on Julia originated at MIT in 2009. Julia is powered by multiple dispatch, a generalization of both object-oriented programming and functional programming. Julia's multiple dispatch makes it easy to write programs at a high level of abstraction while simultaneously getting high performance. This has led to Julia being used by over 10,000 companies and 1,500 universities worldwide.

Pumas, developed in Julia, integrates mechanistic pharmacometric models with Scientific Machine Learning and neural networks. In a recent case study, we demonstrated a 175x speedup for a QSP workload. Pumas is designed for every type of analysis scientists perform throughout the drug development lifecycle in one seamless environment. Building on Julia's parallel capabilities, Pumas uses distributed computing and GPUs and runs on the cloud through the JuliaHub platform. These workflows leverage Julia's database, statistics, and visualization functionality in a single package.

[Slides](https://github.com/rinpharma/2020_presentations/tree/master/talks_folder/2020-Shah-Julia_in_Pharma.pdf)

Kathleen Zeglinski (CSL)
Seamless Visualisation of Complex Genomic Variations in GMOs and Edited Cell Lines Using gmoviz
Abstract

Genetically modified organisms (GMOs) and cell lines are widely used models to estimate the efficacy of drugs and understand mechanisms of action in biopharmaceutical research. As part of characterising these models, DNA sequencing technology and bioinformatics analyses are used systematically to study their genomes. As a result, large volumes of data are generated and various algorithms are applied to analyse this data, which introduces the challenge of representing all findings in an informative and concise manner. Scientific visualisation can be used to facilitate the explanation of complex genomic editing events such as integration events, deletions, insertions, etc. However, current visualisation tools tend to focus on numerical data, ignoring the need to visualise editing events on a larger yet biologically relevant scale. Thus, we have developed gmoviz, an R package designed to extend traditional bioinformatics workflows used for genomic characterisation with powerful visualisation capabilities based on the Circos plotting framework. The circular layout used in gmoviz's plots enables users to succinctly display genome-wide information about complex genomic editing events along with contextual biological information to improve the interpretation of findings. The gmoviz package has been developed by utilising the many features of the Bioconductor ecosystem in order to support several genomic file formats and to seamlessly generate publication-quality figures. Finally, a complex transgenic mouse model, which harbours human gene knock-in, gene knock-outs, segmental insertion, deletion and concatemerisation events, has been used to illustrate the functionality of gmoviz.

[Slides](https://github.com/rinpharma/2020_presentations/tree/master/talks_folder/2020-Zeglinski-gmoviz.pptx)

Radhika Etikala (Statistical Center for HIV/AIDS Research & Prevention (SCHARP) at Fred Hutchinson Cancer Research Center), Xuehan Zhang (Emily) (Statistical Center for HIV/AIDS Research & Prevention (SCHARP) at Fred Hutchinson Cancer Research Center)
Using R Markdown to Generate Clinical Trials Summary Reports
Abstract

The scope of the paper is to show how to produce a statistical summary report along with explanatory text using R Markdown in RStudio. Programmers write a lot of reports that describe the results of data analyses. There should be a clear and automatic path from data and code to the final report. R Markdown is ideal for this as it is a system for combining code and text into a single document. It is also an efficient, user-friendly tool for producing reports that do not need constant updating. RStudio is often used in the Pharmaceutical and Healthcare industries for analysis and data visualization, and the R Markdown tool can also be leveraged for creating reports and datasets for submission to regulatory agencies.

This paper presents an RStudio program that demonstrates how to use R Markdown to generate a statistical table showing adverse events (AE) by system organ class (or preferred term) and severity grade, along with text that explains the table. Collecting AE data and performing analysis of AEs is a common and critical part of clinical trials. A well-developed reporting system, such as one generated with R Markdown, provides a solid foundation and an efficient approach towards a better understanding of what the data represent.

Slides

Will Landau (Eli Lilly and Company)
Reproducible computation at scale in R with targets
Abstract

Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that actually run. The targets package learns how your pipeline fits together, skips costly runtime for steps that are already up to date, runs the rest with optional implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the output matches the underlying code and data. In other words, the package saves time while increasing our ability to trust the conclusions of the research. In addition, it surpasses the most burdensome permanent limitations of its predecessor, drake, to achieve greater efficiency and provide a safer, smoother, friendlier user experience. This talk debuts targets with an example COVID-19 clinical trial simulation study.
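
The idea of skipping steps that are already up to date can be sketched generically as follows. This is a minimal, language-agnostic illustration written in Python, not the targets API or its implementation: the `Pipeline` class, the `run_step` method and the `code_version` argument are all hypothetical, and the sketch assumes that step inputs are JSON-serializable.

```python
import hashlib
import json


def fingerprint(code_version, inputs):
    """Hash a step's code version together with its inputs."""
    payload = json.dumps({"code": code_version, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Pipeline:
    """Toy pipeline that reruns a step only when its fingerprint changes."""

    def __init__(self):
        self.store = {}  # step name -> (fingerprint, cached value)
        self.runs = []   # names of steps that actually executed

    def run_step(self, name, func, code_version, *inputs):
        key = fingerprint(code_version, list(inputs))
        cached = self.store.get(name)
        if cached is not None and cached[0] == key:
            return cached[1]  # up to date: skip the expensive call
        value = func(*inputs)
        self.store[name] = (key, value)
        self.runs.append(name)
        return value


def simulate(n):
    return [i * i for i in range(n)]


def summarize(values):
    return sum(values)


pipe = Pipeline()
# Running the same pipeline twice executes each step only once.
for _ in range(2):
    data = pipe.run_step("data", simulate, "v1", 4)
    total = pipe.run_step("summary", summarize, "v1", data)
first_runs = list(pipe.runs)

# Changing an upstream input invalidates the step and everything downstream.
data = pipe.run_step("data", simulate, "v1", 5)
total = pipe.run_step("summary", summarize, "v1", data)
print(first_runs, pipe.runs, total)
```

In this toy version the downstream step reruns because its input value changed; a real pipeline tool also has to track code changes and file dependencies automatically rather than through a hand-written version string.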

Slides

Dariusz Ratman (Roche)
Building and Managing Unified R Environments for Data Science and Software Development
Abstract

R and Bioconductor are important tools supporting scientific workflows across early Research and Development at Roche/Genentech. We have a broad R user community, which includes Data Scientists, Software Developers and consumers of Data Products developed with R. The presentation will explain the guiding principles behind the creation and management of computational environments for Research. The first part will show how we provide shared R environments, which enable result reproducibility and provide access to custom compute resources for interactive data analysis workflows. The second part will demonstrate how we create corresponding environments for software development, including a brief overview of the tooling and infrastructure that streamlines the development, testing and deployment of R packages and Shiny applications.

Slides

Richard Wyss (Brigham and Women's Hospital and Harvard Medical School)
Automated Data-Adaptive Analytics to Improve Robustness of Confounding Control when Estimating Treatment Effects in Electronic Healthcare Databases
Abstract

Routinely-collected healthcare databases generated from insurance claims and electronic health records have tremendous potential to provide information on the real-world effectiveness and safety of medical products. However, unmeasured confounding stemming from non-randomized treatments and poorly measured comorbidities remains the greatest obstacle to utilizing these data sources for real-world evidence generation. To reduce unmeasured confounding, data-driven algorithms can be used to leverage the large volume of information in healthcare databases to identify proxy variables for confounders that are either unknown to the investigator or not directly measured in these data sources (proxy confounder adjustment). Evidence has shown that data-driven algorithms for proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. Consequently, there has been a recent explosion in the development of data-driven methods for high-dimensional proxy confounder adjustment. In this talk, I will discuss recent advancements in data-driven methods for high-dimensional proxy confounder adjustment and their implementation within the R computing environment. I will discuss challenges in assessing the validity of alternative analytic choices to tailor analyses to the given study to improve validity and robustness when estimating treatment effects in healthcare databases.

Slides

Yilong Zhang (Merck & Co.), Siruo Wang (Johns Hopkins Bloomberg School of Public Health, MD, USA), Simiao Ye (Merck & Co.), Madhusudhan Ginnaram (Merck & Co.), Keaven M. Anderson (Merck & Co.)
r2rtf - a Lightweight R Package to Produce Tables and Figures in RTF Format
Abstract

The use of open-source R is evolving in drug discovery, research and development for study design, data analysis, visualization, and report generation in the pharmaceutical industry. The ability to produce tables, listings and figures (TLFs) in customized rich text format (RTF) using R is crucial to enhance the workflow of using Microsoft Word to assemble analysis results. We developed an R package, r2rtf, that standardizes the approach to generating highly customized TLFs in RTF format. The r2rtf package provides flexibility to customize table appearance for table title, subtitle, column header, footnote, and data source. The table size, border type, color, and line width can be adjusted in each cell, as well as column width, row height, text format, font size, text color, alignment, etc. The control of the format can be row or column vectorized by leveraging the vectorization in R. Furthermore, r2rtf provides pagination, section grouping, and concatenation of multiple tables for complicated table layouts. In this paper, we provide an overview of the r2rtf workflow with examples for both required and optional easy-to-use functions. Code examples are provided to create customized RTF tables and figures with highlighted features. The open-source r2rtf R package is available at: https://github.com/Merck/r2rtf.

Slides

Sophie Sun (Novartis)
Subgroup Benchmarking Framework
Abstract

Identification of subgroups with increased or decreased treatment effect is a challenging topic with several traps and pitfalls. In this project, we would like to establish good practices for subgroup identification, by building a simulation platform that allows for assessment and comparison of different quantitative subgroup identification strategies. Based on that we would like to provide guidance on different technical approaches. In addition, we would like to provide guidance on a recommended workflow for subgroup identification efforts to ensure best practices are used.

Slides

Hannah Diehl (MIT), Andy Stein (Novartis), Niladri Roy Chowdhury (Novartis), Tamara Broderick (MIT)
The "See"-Value App: Visual Decision Making for Drug Development
Abstract

Statistical graphics play an important role in exploratory data analysis, model checking and diagnostics. The lineup protocol (Buja et al. 2009) enables statistical significance testing using visualizations, bridging the gap between exploratory and inferential statistics. We created an R Shiny app that lets the user generate these lineups using preloaded examples or by uploading their own data. The user can then act as a human judge, selecting the plot they think shows the real data and seeing whether the choice is correct. A correct choice provides evidence that the real plot is significantly different from the "null" plots. The app also calculates the "see"-value based on the selections made by multiple independent users, which can be used to decide statistical significance. The app supports different types of analysis using continuous, binary or time-to-event response and continuous or categorical predictors.
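
The abstract does not state how the app computes the "see"-value, so the following is only a sketch of one common way to turn lineup selections into a p-value. It assumes that, under the null hypothesis, each viewer independently picks one of the panels uniformly at random, so the number of correct picks follows a binomial distribution. It is written in Python for brevity; the function name `lineup_p_value` and the default of 20 panels are hypothetical.

```python
from math import comb


def lineup_p_value(n_correct, n_viewers, n_panels=20):
    """Probability of at least `n_correct` correct picks by pure guessing.

    Assumes each of `n_viewers` independent viewers picks the real-data
    panel with probability 1 / n_panels under the null hypothesis.
    """
    if n_panels < 2:
        raise ValueError("a lineup needs at least two panels")
    if not 0 <= n_correct <= n_viewers:
        raise ValueError("n_correct must be between 0 and n_viewers")

    p = 1.0 / n_panels
    # Upper tail of the Binomial(n_viewers, p) distribution.
    return sum(
        comb(n_viewers, k) * p**k * (1.0 - p) ** (n_viewers - k)
        for k in range(n_correct, n_viewers + 1)
    )


# A single viewer who picks correctly in a 20-panel lineup.
single = lineup_p_value(1, 1, n_panels=20)
# Two out of two viewers picking correctly.
both = lineup_p_value(2, 2, n_panels=20)
print(single, both)
```

With a single viewer and 20 panels, a correct pick corresponds to a p-value of 1/20, which is why lineups of that size are a natural match for a 5% significance level.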

Slides

Leon Eyrich Jessen (Department of Health Technology, Section for Bioinformatics, Technical University of Denmark)
It's a trap! A data science case story
Abstract

Predictive modeling is a powerful tool, which amongst other things can be applied for prioritising drug candidates. Limiting the search space needed for target exploration can reduce costs markedly, partly by eliminating lab time and expensive kits. Predictive modeling is, however, not without pitfalls. In this short talk, I will present a (fictive) data science case story, outlining one major challenge in predictive modeling, while demonstrating how to address said challenge.

Slides

Stefan Pinkert (Merck)
X-Omics Platform
Abstract

Introduction to the X-Omics Platform (XOP), a digital biomarker research platform for bioinformaticians and other scientists at Merck KGaA. XOP is a validated system for storing, processing, and analyzing "omics" data, including RNASeq, DNASeq (whole-exome and whole-genome), digital pathology datasets, and eventually proteomics and other datatypes.

Slides

Sean Lopp (RStudio)
RStudio Pharma Updates
Abstract

RStudio will share a number of exciting product updates specifically built for pharmaceutical companies.

Slides