Merge branch 'summit2024'
epijim committed Aug 23, 2024
2 parents f79d1c3 + 7a95043 commit 576627a
Showing 7 changed files with 304 additions and 21 deletions.
3 changes: 3 additions & 0 deletions _quarto.yml
@@ -28,9 +28,12 @@ book:
- os-depends.qmd
- shiny-csr.qmd
- case-os.qmd
- smallmidpharma.qmd
- datatrials.qmd
- contributors.qmd
- references.qmd


csl: jama.csl
bibliography: references.bib

43 changes: 39 additions & 4 deletions change-management.qmd
@@ -1,12 +1,47 @@
# Change management

## 2024 notes

Chairs: Cassie Murcray and Dror Berel

### Summary: Transitioning from SAS to R in the Pharmaceutical Industry

Over the past two years, there has been a significant shift in the pharmaceutical industry from SAS to R, driven by the rising costs of SAS licenses and the influx of new talent trained in R and other open-source tools. Despite these trends, the industry’s conservative nature, particularly in a highly regulated environment, often results in a reluctance to change well-established practices.

As of 2024, this transition is well underway, with several companies already setting timelines for a complete migration to R. This includes replicating legacy SAS code in R for ongoing and long-term studies, as well as opting not to renew SAS licenses. While some SAS programmers find R relatively intuitive, others may face significant challenges.

### Supporting SAS Programmers in Transitioning to R

To facilitate the learning process for SAS programmers, various strategies can be employed:

1. **Traditional Training and Mentorship**: Programs such as Posit Academy and mentoring from experienced R programmers are essential. Large organizations often establish Centers of Excellence, where a designated "R floating buddy" mentors SAS programmers throughout the various stages of learning R. This mentorship should be conducted with patience and empathy, recognizing the challenges of adapting to a new programming language with different principles and syntax.
2. **Strategic Learning Approaches**:
   - **Address Key Pain Points**: Focus on demonstrating how specific challenges are effectively resolved with R, highlighting the value of the R-based solution, and celebrating small wins to foster continued learning.
   - **Simplify the Learning Ecosystem**: Introduce a small, consistent set of R packages, such as the tidyverse, before gradually introducing more advanced concepts. Avoid overwhelming learners with multiple equivalent approaches.
   - **Gradual Progression**: Start with basic concepts and gradually introduce more advanced topics like version control, beginning with individual contributions and progressing to collaborative work.

### Managing the Transition

The successful shift from SAS to R requires active management by senior leadership, including clear directives, timelines, and ongoing support. It is crucial to allocate time for learning while maintaining productivity on ongoing projects.

### Action Items

To support this transition, the following resources should be developed:

- **Cheat Sheets**: Create reference guides with common code examples translated from SAS to R.
- **Cross-Tool Comparison**: Develop a Comparative Analysis of Methods and Implementations in SAS, R, and Python (CAMIS) to help programmers understand the default parameters and methods used in each tool.

This transition will not happen organically; it requires deliberate management and a structured approach to ensure successful adoption of R across the industry.

## 2023 notes

Chairs: Matthew Kumar and Cassie Burns

### Question

Where are we on getting data analysts and data scientists who work with clinical data on board (in particular, those delivering CSRs and submission packages)? What are the challenges, and what has been overcome?

### Who are our people?

Prefaced both sessions by asking individuals to define the *our* in *our people*:

@@ -58,7 +93,7 @@ Prefaced both sessions by asking individuals to define the *our* in *our people*

- Having leadership advocacy is vital at the end of the day.

### Theme: Emerging Talent

- Newer talent is increasingly trained in open-source approaches and languages, with fewer exposed to proprietary tools.

@@ -77,7 +112,7 @@ Prefaced both sessions by asking individuals to define the *our* in *our people*

- General trends suggest companies are demanding a secondary language in addition to proprietary software (not necessarily R), but knowledge of at least two languages indicates an individual could reasonably learn R.

### Theme: Other Points and Considerations

- Questions to consider include: *What kind of training will people need in the future state?* *How should the support be arranged to enable the future state, potentially with IT and DEV involvement?*

79 changes: 79 additions & 0 deletions datatrials.qmd
@@ -0,0 +1,79 @@
# Can we do data better?

## 2024

Chairs: Stephanie Lussier and Doug Kelkhoff

### Databases

There is currently a lot of thought about databases, but few companies use them in primary data flows (although they are used for curated trial data for secondary use, e.g. Novartis' Data42 and Roche's EDIS).

### Blockers

- Dependence on CROs who deliver SAS datasets generated by SAS code is a factor.
- IT groups are often wary of the cloud, which can be confusing when platforms like Medidata are already cloud-based and other companies already keep SDTM/ADaM in AWS S3 or other cloud storage.
- Unclear justification for change, particularly what databases would add for current primary use of SDTM/ADaM; existing systems are mostly functional.
- Challenges with concurrent data access by multiple teams in some file-based approaches, leading to errors.

### An approach around TortoiseSVN

- One company has been using TortoiseSVN for a while and is considering moving to Snowflake.
- Pros: Integration with version control and modern cloud storage solutions.
- Cons:
  - Higher entry threshold for users.
  - Lack of a user-friendly GUI.
  - Storing data in 'normal' version control rather than tools designed for data versioning rapidly leads to bloated repositories.

### Version Control and Data Storage

- Alignment on versioning code in Git and data in tools such as S3 versioning.
- S3 can be accessed both as a mounted drive (e.g. Lustre) and through the S3 API.

### Denodo as Data Fabric Mesh

One company uses Denodo as a data fabric mesh; users interact via Denodo,
which serves as an API layer, and do not interact directly with the source data.

### Nontabular Data

- Not common for statistical programmers working on clinical trial data.

### CDISC Dataset JSON vs. Manifest JSON

Writing CDISC Dataset JSON is very slow and potentially not sufficient for day-to-day working data.

### Popularity and Concerns with Parquet Datasets

- Workflows built on the admiral R package generate Parquet directly; others convert from SAS to Parquet.
- Questions about the longevity and maintenance requirements of Parquet, since it is a binary format (vs. 'human-readable' formats like CSV/JSON).

### Handling Legacy Data

- Suggestion: stack legacy data into a database if it is intended for secondary use.

### Change Management

- For statistical programming, direct instruction on new systems is necessary.
- Emphasize direct support over broad training.
- Simplify systems for users to reduce friction.
- Consider a GUI similar to Azure.
- Focus on reducing the user burden.

### Different Data Use Cases

- Data use differs by purpose (e.g., Shiny apps vs. regulatory documents).
- Dashboards can access EDC directly without needing snapshots.

### Summary

- Uncertain value in moving from CDISC data standards to databases.
- Limited interest and action in this area across the organization.
- Not a high priority given other ongoing organizational changes.
- Ongoing shift away from SAS-based datasets and file storage to cloud-based systems, with increasing use of Parquet.

### Action Items

- SCE whitepaper: Mark Bynum from J&J
- Is there actual value/gain in databases?
  - Not the best investment relative to other non-data changes going on across the organization (e.g. R, containers, etc.)

24 changes: 21 additions & 3 deletions index.qmd
@@ -6,7 +6,24 @@ this document being created.

::: {.callout-note icon=false}

## {{< fa users >}} 2024 F2F Roundtables in Seattle

~120 leaders from 40+ companies met F2F in Seattle for a series of
discussions on the most pressing topics for late-stage reporting in R.

Topics were crowdsourced via a GitHub discussion, which led to the
following:

- [How to move people happy with SAS over to an R backboned open source future?](/change-management.html)
- [How best to engage and enable small/mid-size pharma to use open source tools](/smallmidpharma.html)
- [Future core competency of clinical statistical programmer with AI/LLM](/llms.html)
- [Can we do data storage and processing better in clinical trials?](/datatrials.html)

:::

::: {.callout-note icon=false}

## {{< fa users >}} 2023 F2F Roundtables in Chicago

~60 leaders from 40+ companies met F2F in Chicago for a series of
discussions on the most pressing topics for late-stage reporting in R.
@@ -29,8 +46,6 @@ following topics:

## {{< fa graduation-cap >}} References

[R Validation Hub update](https://pharmar.github.io/events-rpharma2023/)

[Doug's slides on the shared validated repo](https://pharmar.github.io/events-positconf2023/#/title-slide)
@@ -39,4 +54,7 @@ following topics:

[PHUSE Open Source guidance](https://phuse-org.github.io/E2E-OS-Guidance/)

[2023 planning repo](https://rinpharma.github.io/positconf-roundtables-2023/)


:::
105 changes: 102 additions & 3 deletions llms.qmd
@@ -1,14 +1,113 @@
# AI opportunities

## 2024

### LLMs/AA/AI opportunities?

Chairs: Vincent Shen and Melanie Hullings

### 1. Current AI Usage and Adoption

- **Academia**:
  - Cautious approach with AI project review committees and approvals
  - Concerns about safety, trust, and legal implications
  - Challenges in education: balancing AI use without it becoming a crutch
- **Pharma**:
  - Varying levels of adoption across companies
  - Legal and trust issues are often the limiting factor
  - Emphasis on human-in-the-loop approaches
  - Some companies offer a wide range of AI tools and models
- **Smaller Pharma/Biotech**:
  - More open to AI and developing custom tools
  - Applications in genomics data querying, report generation, and biological discovery

### 2. Specific Use Cases and Tools

- Writing a first draft that is then reviewed by human experts
- DSUR report generation automation
- Querying public databases and grant-writing assistance
- Code conversion (e.g., R to Python)
- Tools mentioned: Copilot, RTutor, chattr
- Fine-tuning of open-source models for specific tasks (e.g., R package chatbot)
- Prototypes of AI agents for data analysis
- Manufacturing use case: API access to a range of GenAI models, and a RAG-based application to search historical logs for specific processes
- LLM/GenAI for drug discovery (on genetic structures)

### 3. Challenges and Limitations

- Legal and regulatory concerns, including the new EU AI Act, which assigns risk levels
- Resistance to using clinical data with LLMs
- Hosting issues for AI models and applications
- Need for better tools in data processing and manipulation
- Potential dangers of using AI without understanding the underlying processes

### 4. Implementation and Cultural Shifts

- Need for workforce training on responsible AI use
- Varying levels of AI adoption across companies require guides and training
- Importance of leadership support, IT infrastructure, and legal guidance
- Need for standardization and policies (e.g., documentation of AI-generated code)

### 5. Opportunities and Benefits

- Time-saving potential, especially for those with basic programming knowledge
- Knowledge management improvements
- Potential for automating routine tasks and reports
- Use of RAG (Retrieval-Augmented Generation) for various applications

### 6. Future of Statistical Programming

- Main benefit of AI is increased efficiency in programming tasks
- Leadership questioning the impact on workforce size and composition
- Current stage: Proof of concept tools, full impact still uncertain
- Evolution of programmer roles:
  - Shift from coding from scratch to code review and oversight
  - Expansion into new areas within clinical data analysis domain
  - Transition from coding to solution architecture
- Need to redefine essential aspects of the AP (Analysis Programmer) role as tasks become automated
- Statistical Programmer job will evolve but not be eliminated
- Increased efficiency allows for focus on more complex analytical tasks

### 7. AI Model Development and Evaluation

- Transition from general GPT models to fine-tuned, domain-specific models
- Distinct approaches needed for coding vs. RAG/document tasks
- Importance of evaluating RAG effectiveness, potentially using LLMs for this purpose

### Next Steps

1. **Establish AI<>R Working Group**
   - The goal would be to develop an open-source R package bot, but the group would need to work out how to fine-tune, host, and collaborate on it.
   - Address model storage and deployment challenges
2. **Enhance Education and Standardization**
   - Create guidelines for responsible AI use in statistical programming
   - Develop industry-wide best practices and policies
3. **Advance Use Cases and Infrastructure**
   - Validate AI tools for specific tasks (e.g., DSUR report generation)
   - Develop secure frameworks for AI use with clinical data
4. **Redefine Roles and Processes**
   - Analyze impact of AI on statistical programming roles
   - Integrate AI into workflows while maintaining human oversight
5. **Improve Knowledge Management and Collaboration**
   - Implement systems for sharing AI solutions across organizations
   - Foster partnerships for developing industry-specific AI models
6. **Develop AI Evaluation Methods**
   - Create standardized processes for QC of AI outputs
   - Improve methods for assessing AI-generated code quality

## 2023

### LLMs/AA/AI opportunities?

Chairs: Paulo Bargo & Ning Leng

### Question

What should we be doing to leverage advances in LLMs/AA/AI (at every level, from drug development through to developer efficiency)?

::: {.callout-warning}

### Missing notes

Content is still coming, an email will be shared once the site is complete.

44 changes: 44 additions & 0 deletions smallmidpharma.qmd
@@ -0,0 +1,44 @@
# Small/mid pharma & OS

Chairs: Katie Igartua and Kas Yousefi

### Proposal

A pressing topic I think we should discuss is how best to engage and enable small/mid-sized pharma to use open-source ecosystems.

It can sometimes be overwhelming, so it would be helpful to know where to start on the open-source journey and how to use the resources and approaches already available.

Some questions:

1. How to go about setting up an open-source environment? What are the hurdles? How can the crowd-sourced groups help?
2. Could small pharma use R for IB/DSUR? Does this need to be in a validated environment?
3. How do other small pharma companies use open-source environments, and what would be areas for collaboration?

### Expected impact

I believe this topic would be beneficial because small pharma often needs to be innovative but, due to resource limits, may rely on traditional approaches and depend heavily on CROs. A stronger small pharma presence could help shape new resources and approaches for open-source solutions.

### Prior discussions/work

Some prior work includes:

- BBSW Panel Discussion for Open Source Tools in Small/mid sized pharma
- [How best to engage and enable small/mid-size pharma to use open source tools · rinpharma rinpharma-summit-2024 · Discussion #17](https://github.com/rinpharma/rinpharma-summit-2024/discussions/17)

### Roundtable

- Open-source adoption/contribution barriers
  - Risk taking when you only have one product
  - Cost: small companies often fully outsource regulatory work to CROs, so it is hard to justify additional investment in infrastructure or open-source work for non-regulatory tasks
  - IT resources: Posit installation for small pharma; a small company may not have the right in-house IT talent to even get Posit running
    - Are you ready to be a Shiny DevOps person & R admin?
- What are the use cases for open source? Mainly in non-validated environments
  - Data review, monitoring, visualization, patient profiles, DSUR
  - IDCC uses Shiny for IDMC closed sessions (instead of a long PDF)
- Asks to the community
  - Can large pharma open-source their infrastructure config, such as an AMS YAML file?
    - From the perspective at the intersection of IT, DS, and stats
  - What about a "single" opinionated workflow (pharmaverse is seen more as a comprehensive toolbox)?
    - From an independent body such as the R Consortium?
    - Or could individual big pharma companies publish their whole workflow?
- GxP setup: timing tradeoff. Is it better to set up early or late? Setting up early can impose restrictions that are hard to change later; balance flexibility, best practice, and cost.