From 5f9855d790f5ddb69f8e5b49c987662446ee2e99 Mon Sep 17 00:00:00 2001 From: James Black Date: Fri, 23 Aug 2024 22:55:30 +0200 Subject: [PATCH 1/7] repo link is 2023 --- index.qmd | 3 +++ 1 file changed, 3 insertions(+) diff --git a/index.qmd b/index.qmd index 305ab9c..7205b5e 100644 --- a/index.qmd +++ b/index.qmd @@ -39,4 +39,7 @@ following topics: [PHUSE Open Source guidance](https://phuse-org.github.io/E2E-OS-Guidance/) +[2023 planning repo](https://rinpharma.github.io/positconf-roundtables-2023/) + + ::: From adb58d4463b4a0d2f5560906fd45bb19ea60bed1 Mon Sep 17 00:00:00 2001 From: James Black Date: Fri, 23 Aug 2024 22:55:45 +0200 Subject: [PATCH 2/7] add 2024 notes to change management --- change-management.qmd | 43 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 39 insertions(+), 4 deletions(-) diff --git a/change-management.qmd b/change-management.qmd index 3e2a9a0..f549691 100644 --- a/change-management.qmd +++ b/change-management.qmd @@ -1,12 +1,47 @@ # Change management +## 2024 notes + +Chairs: Cassie Murcray and Dror Berel + +### Summary: Transitioning from SAS to R in the Pharmaceutical Industry + +Over the past two years, there has been a significant shift in the pharmaceutical industry from SAS to R, driven by the rising costs of SAS licenses and the influx of new talent trained in R and other open-source tools. Despite these trends, the industry’s conservative nature, particularly in a highly regulated environment, often results in a reluctance to change well-established practices. + +As of 2024, this transition is well underway, with several companies already setting timelines for a complete migration to R. This includes replicating legacy SAS code in R for ongoing and long-term studies, as well as opting not to renew SAS licenses. While some SAS programmers find the transition to R more intuitive, others may face significant challenges. 
+ +### Supporting SAS Programmers in Transitioning to R + +To facilitate the learning process for SAS programmers, various strategies can be employed: + +1. **Traditional Training and Mentorship**: Programs such as Posit Academy and mentoring from experienced R programmers are essential. Large organizations often establish Centers of Excellence, where a designated "R floating buddy" mentors SAS programmers throughout the various stages of learning R. This mentorship should be conducted with patience and empathy, recognizing the challenges of adapting to a new programming language with different principles and syntax. +2. **Strategic Learning Approaches**: + - **Address Key Pain Points**: Focus on demonstrating how specific challenges are effectively resolved with R, highlighting the value of the R-based solution, and celebrating small wins to foster continued learning. + - **Simplify the Learning Ecosystem**: Introduce a simple set of R packages, such as the tidyverse, before gradually introducing more advanced concepts. Avoid overwhelming learners with multiple equivalent approaches. + - **Gradual Progression**: Start with basic concepts and gradually introduce more advanced topics like version control, beginning with individual contributions and progressing to collaborative work. + +### Managing the Transition + +The successful shift from SAS to R requires active management by senior leadership, including clear directives, timelines, and ongoing support. It is crucial to allocate time for learning while maintaining productivity on ongoing projects. + +### Action Items + +To support this transition, the following resources should be developed: + +- **Cheat Sheets**: Create reference guides with common code examples translated from SAS to R. +- **Cross-Tool Comparison**: Develop a Comparative Analysis of Methods and Implementations in SAS, R, and Python (CAMIS) to help programmers understand the default parameters and methods used in each tool. 
+ +This transition will not happen organically; it requires deliberate management and a structured approach to ensure successful adoption of R across the industry. + +## 2023 notes + Chairs: Matthew Kumar and Cassie Burns -# Question +### Question Where are we on getting data analysts and data scientists that work with clinical data on board (in particular, those delivering CSRs and submission packages)? What are the challenges - what has been overcome? -# Who are our people? +### Who are our people? Prefaced both sessions by asking individuals to define the *our* in *our people*; @@ -58,7 +93,7 @@ Prefaced both sessions by asking individuals to define the *our* in *our people* - Having leadership advocacy is vital at the end of the day. -# Theme: Emerging Talent +### Theme: Emerging Talent - Newer talent is increasingly trained in open-source approaches and languages, with fewer exposed to proprietary tools. @@ -77,7 +112,7 @@ Prefaced both sessions by asking individuals to define the *our* in *our people* - General trends suggest companies are demanding a secondary language in addition to proprietary software (not necessarily R), but knowledge of at least two languages indicates an individual could reasonably learn R. -# Theme: Other Points and Considerations +### Theme: Other Points and Considerations - Questions to consider include: *What kind of training will people need in the future state?* *How should the support be arranged to enable the future state, potentially with IT and DEV involvement?* From fcdd65c75db0294c120aeec139cdc7bc873add1e Mon Sep 17 00:00:00 2001 From: James Black Date: Fri, 23 Aug 2024 22:55:54 +0200 Subject: [PATCH 3/7] add ai notes --- llms.qmd | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 102 insertions(+), 3 deletions(-) diff --git a/llms.qmd b/llms.qmd index 3f61432..5420922 100644 --- a/llms.qmd +++ b/llms.qmd @@ -1,14 +1,113 @@ -# LLMs/AA/AI opportunities? 
+# AI opportunities + +## 2024 + +### LLMs/AA/AI opportunities? + +Chairs: Vincent Shen and Melanie Hullings + +### 1. Current AI Usage and Adoption + +- **Academia**: + - Cautious approach with AI project review committees and approvals + - Concern on safety safety, trust, and legal implications + - Challenges in education, balancing AI use without it becoming a crutch +- **Pharma**: + - Varying levels of adoption across companies + - Legal and trust issues often limiting factor + - Emphasis on human-in-the-loop approaches + - Some companies offer a wide range of AI tools and models +- **Smaller Pharma/Biotech**: + - More open to AI and developing custom tools + - Applications in genomics data querying, report generation, and biological discovery + +### 2. Specific Use Cases and Tools + +- Writing a first draft that is then reviewed by human experts +- DSUR report generation automation +- Querying public databases and grant-writing assistance +- Code conversion (e.g., R to Python) +- Tools mentioned: Copilot, rtutor, Chattr +- Fine-tuning of open-source models for specific tasks (e.g., R package chatbot) +- Prototypes of AI agents for data analysis +- Manufacturing use case: API access to all kinds of GenAI models / RAG-based application to search from historic logs on certain process +- LLM/GenAI for drug discovery (on genetic structures) + +### 3. Challenges and Limitations + +- Legal and regulatory concerns, including new EU law with assigned risk levels +- Resistance to using clinical data with LLMs +- Hosting issues for AI models and applications +- Need for better tools in data processing and manipulation +- Potential dangers of using AI without understanding the underlying processes + +### 4. 
Implementation and Cultural Shifts + +- Need for workforce training on responsible AI use +- Varying levels of AI adoption across companies require guides and training +- Importance of leadership support, IT infrastructure, and legal guidance +- Need for standardization and policies (e.g., documentation of AI-generated code) + +### 5. Opportunities and Benefits + +- Time-saving potential, especially for those with basic programming knowledge +- Knowledge management improvements +- Potential for automating routine tasks and reports +- Use of RAG (Retrieval-Augmented Generation) for various applications + +### 6. Future of Statistical Programming + +- Main benefit of AI is increased efficiency in programming tasks +- Leadership questioning the impact on workforce size and composition +- Current stage: Proof of concept tools, full impact still uncertain +- Evolution of programmer roles: + - Shift from coding from scratch to code review and oversight + - Expansion into new areas within clinical data analysis domain + - Transition from coding to solution architecture +- Need to redefine essential aspects of the AP (Analysis Programmer) role as tasks become automated +- Statistical Programmer job will evolve but not be eliminated +- Increased efficiency allows for focus on more complex analytical tasks + +### 7. AI Model Development and Evaluation + +- Transition from general GPT models to fine-tuned, domain-specific models +- Distinct approaches needed for coding vs. RAG/document tasks +- Importance of evaluating RAG effectiveness, potentially using LLMs for this purpose + +### Next Steps + +1. **Establish AI<>R Working Group** + - Goal would be to develop an open-source R package bot but would need to figure out how to fine-tune, host, collaborate, etc. + - Address model storage and deployment challenges +2. 
**Enhance Education and Standardization** + - Create guidelines for responsible AI use in statistical programming + - Develop industry-wide best practices and policies +3. **Advance Use Cases and Infrastructure** + - Validate AI tools for specific tasks (e.g., DSUR report generation) + - Develop secure frameworks for AI use with clinical data +4. **Redefine Roles and Processes** + - Analyze impact of AI on statistical programming roles + - Integrate AI into workflows while maintaining human oversight +5. **Improve Knowledge Management and Collaboration** + - Implement systems for sharing AI solutions across organizations + - Foster partnerships for developing industry-specific AI models +6. **Develop AI Evaluation Methods** + - Create standardized processes for QC of AI outputs + - Improve methods for assessing AI-generated code quality + +## 2023 + +### LLMs/AA/AI opportunities? Chairs: Paulo Bargo & Ning Leng -# Question +### Question What should we be doing to leverage advances in LLMs/AA/AI impact? (at the drug development through to developer efficiency levels) ::: {.callout-warning} -## Missing notes +### Missing notes Content is still coming, an email will be shared once the site is complete. From 097e9c8b2bd6a12f0172085c8bc4be05eb56fdcd Mon Sep 17 00:00:00 2001 From: James Black Date: Fri, 23 Aug 2024 22:57:06 +0200 Subject: [PATCH 4/7] add smallmid pharma --- smallmidpharma.qmd | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) create mode 100644 smallmidpharma.qmd diff --git a/smallmidpharma.qmd b/smallmidpharma.qmd new file mode 100644 index 0000000..9e88b2d --- /dev/null +++ b/smallmidpharma.qmd @@ -0,0 +1,44 @@ +# Small/mid pharma & OS + +Chair: Katie Igartua and Kas Yousefi + +### Proposal + +A pressing topic I think we should discuss is how best to engage and enable small/mid-sized pharma to use open source Ecosystems. 
+ +It can sometimes be overwhelming, and it would be helpful to have an idea of where to start on the open source journey and how to use the resources and approaches already available. + +Some questions: + +1. How should we go about setting up an open source environment? What are the hurdles? How can the crowd-sourced groups help? +2. Could small pharma use R for IB/DSUR? Does this need to be in a validated environment? +3. How do other small pharma companies use open-source environments, and what would be areas for collaboration? + +### Expected impact + +I believe this topic would be beneficial because small pharma often needs to be innovative, but due to resource limits may stick to traditional approaches and rely heavily on CROs. A stronger small pharma presence could help in developing new resources and approaches for open source solutions. + +### Prior discussions/work + +Some prior work includes: + +- BBSW Panel Discussion for Open Source Tools in Small/mid-sized pharma +- [How best to engage and enable small/mid-size pharma to use open source tools · rinpharma rinpharma-summit-2024 · Discussion #17](https://github.com/rinpharma/rinpharma-summit-2024/discussions/17) + +# Round table + +- Barriers to open source adoption/contribution + - risk-taking is harder when you only have one product + - cost: small companies often fully outsource regulatory work to CROs, so it is hard to justify additional investment in infrastructure or open source work for non-regulatory tasks + - IT resources: Posit installation for small pharma - a small company may not have the right in-house IT talent to even get Posit running + - Are you ready to be a Shiny DevOps person & R admin? +- What are the use cases for open source? Mainly in non-validated environments + - data review, monitoring, visualization, patient profiles, DSUR + - IDCC uses Shiny for IDMC closed sessions (instead of long PDFs) +- Asks of the community + - can large pharma open source their infrastructure config, such as an AMS YAML file?
+ - from the perspective of the intersection of IT, DS, and statistics + - what about a “single” opinionated workflow (pharmaverse is seen more as a comprehensive toolbox) + - from an independent body such as the R Consortium? + - or can individual big pharma companies publish their whole workflow? +- GxP setup: timing tradeoff - is it better to set up early or late? Setting up early can impose restrictions that are hard to change later; balance flexibility, best practice, and cost \ No newline at end of file From 8f8c567a0efa80c4ffab7c5fce92477fe9be4c1a Mon Sep 17 00:00:00 2001 From: James Black Date: Fri, 23 Aug 2024 22:57:29 +0200 Subject: [PATCH 5/7] add small pharma to menu --- _quarto.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_quarto.yml b/_quarto.yml index 51eadba..9e3f677 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -28,9 +28,11 @@ book: - os-depends.qmd - shiny-csr.qmd - case-os.qmd + - smallmidpharma.qmd - contributors.qmd - references.qmd + csl: jama.csl bibliography: references.bib From b846ad153084108b750fb682c14335eb43d3b5cc Mon Sep 17 00:00:00 2001 From: James Black Date: Fri, 23 Aug 2024 22:59:11 +0200 Subject: [PATCH 6/7] add notes on 2024 --- index.qmd | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/index.qmd b/index.qmd index 7205b5e..6866265 100644 --- a/index.qmd +++ b/index.qmd @@ -6,7 +6,23 @@ this document being created. ::: {.callout-note icon=false} -## {{< fa users >}} F2F Roundtables in Chicago +## {{< fa users >}} 2024 F2F Roundtables in Seattle + +~120 leaders from 40+ companies met F2F in Seattle for a series of +discussions on the most pressing topics for late-stage reporting in R.
+ +The discussion was crowdsourced via a GitHub discussion, and led to the +following topics: + +- [How to move people happy with SAS over to an R backboned open source future?](/change-management.html) +- [How best to engage and enable small/mid-size pharma to use open source tools](/smallmidpharma.html) +- [Future core competency of clinical statistical programmer with AI/LLM](/llms.html) + +::: + +::: {.callout-note icon=false} + +## {{< fa users >}} 2023 F2F Roundtables in Chicago ~60 leaders from 40+ companies met F2F in Chicago for a series of discussions on the most pressing topics for late-stage reporting in R. @@ -29,8 +45,6 @@ following topics: ## {{< fa graduation-cap >}} References -[Promotional site for round-tables](https://rinpharma.github.io/positconf-roundtables-2023/) - [R Validation Hub update](https://pharmar.github.io/events-rpharma2023/) [Doug's slides on the shared validated repo](https://pharmar.github.io/events-positconf2023/#/title-slide)
+ +## 2024 + +Chairs: Stephanie Lussier and Doug Kelkhoff + +### Databases + +There is a lot of current thinking about databases, but few companies use them in primary data flows (although they are used for curated trial data for secondary use, e.g. Novartis' Data42 and Roche's EDIS). + +### Blockers + +- Dependence on CROs who deliver SAS datasets generated by SAS code is a factor. +- IT groups are often wary of the cloud, which is confusing when platforms like Medidata are already cloud-based and other companies already store SDTM/ADaM in AWS S3 or other cloud storage. +- Unclear justification for change, particularly what we gain from databases for current SDTM/ADaM primary use; existing systems are mostly functional. +- Challenges with concurrent data access by multiple teams in some file-based approaches, leading to errors. + +### An Approach Around TortoiseSVN + +- One company has been using TortoiseSVN for a while and is considering moving to Snowflake. +- Pros: Integration with version control and modern cloud storage solutions. +- Cons: + - Higher entry threshold for users. + - Lack of a user-friendly GUI. + - Storing data in 'normal' version control rather than tools designed for data versioning rapidly leads to bloated repositories. + +### Version Control and Data Storage + +- Alignment: code versioning in Git; data versioning in tools like S3 versioning +- S3 can be accessed both as a mounted drive (e.g. via Lustre) and through the S3 API. + +### Denodo as Data Fabric Mesh + +One company uses Denodo as a data fabric mesh; users interact via Denodo, +which serves as an API layer, and have no direct access to the source data. + +### Nontabular Data + +- Not common for statistical programmers working on clinical trial data. + +### CDISC Dataset JSON vs. Manifest JSON +
+ +### Popularity and Concerns with Parquet Datasets + +- Admiral tool generates Parquet directly; others convert from SAS to Parquet. +- Questions about the longevity and maintenance requirements of Parquet as it's a blob (vs a 'human readable' format like CSV/JSON) + +### Handling Legacy Data + +- Suggest stacking legacy data into a database if for secondary data use + +### Change Management + +- For statistical programming, direct instruction to new systems is necessary. +- Emphasize direct support over broad training. +- Simplify systems for users to reduce friction. +- Consider a GUI similar to Azure. +- Focus on reducing the user burden. + +### Different Data Use Cases + +Differences in data use (e.g., Shiny App vs. regulatory documents). +Dashboards directly accessing EDC without needing snapshots. + +### **Summary** + +Uncertain value in moving from CDISC data standards to databases. +Limited interest and action in this area across the organization. +Not a high priority given other ongoing organizational changes. +Ongoing shift away from SAS-based datasets and file storage to cloud-based systems, with increasing use of Parquet. + +### **Action Items** + +- SCE whitepaper - mark bynum from J&J +- Is there actual value / gain in databases? +- Not the best investment relative to other non-data changes going on across organization (e.g. 
R, containers, etc.) + diff --git a/index.qmd b/index.qmd index 6866265..014e3eb 100644 --- a/index.qmd +++ b/index.qmd @@ -17,6 +17,7 @@ following topics: - [How to move people happy with SAS over to an R backboned open source future?](/change-management.html) - [How best to engage and enable small/mid-size pharma to use open source tools](/smallmidpharma.html) - [Future core competency of clinical statistical programmer with AI/LLM](/llms.html) +- [Can we do data storage and processing better in clinical trials?](/datatrials.html) ::: diff --git a/validate-shiny.qmd b/validate-shiny.qmd index e62f2db..640de52 100644 --- a/validate-shiny.qmd +++ b/validate-shiny.qmd @@ -1,21 +1,27 @@ # Validate shiny? +## 2024 + +Chairs: Devin Pastoor and Ellis Hughes + +## 2023 + Chairs: James Black and Harvey Lieberman -# Question +### Question We have a path to R package validation - but what about shiny apps? In what context would validation become relevant to shiny app code, and how can we get ahead of this topic to pave a way forward for interactive CSRs? -# Topics discussed +### Topics discussed -## Do we need to validate? +#### Do we need to validate? - Tiered approach / decision tree - Lowest is made by study team for study team. 2nd level is risk is unsupervised use, or specific contexts - e.g. making an app for dosing or safety. 3rd would be shiny CSR. - Is the results going directly from the app into a submission? - Don’t validate a shiny app - validate the static functions in the R packages. CSV may not be relevant for UIs (vs static R packages) -## What are we Testing and Why? +#### What are we Testing and Why? There is a clear difference of opinion throughout the industry, often led by quality groups. Some companies validate shiny apps as if they were distinct pieces of software, using their internal software validation procedures. These processes are often outdated and unsuitable, requiring timestamped user-testing and screen captures.
@@ -23,11 +29,11 @@ Other companies solely consider packages, not even validating shiny apps, but va This brings up the question – **do we really need to validate shiny apps? Can we just validate the logic?** -## Who Does the Testing? +#### Who Does the Testing? Again, there is some difference between companies in who does the testing. Generally, the developer writes the tests but tests are performed either by the business or by the quality group. -## Use of Automation +#### Use of Automation Question posed to people present around the table: Does your company’s validation system allow for automation? Answers from the table: 8 companies = yes, 2 companies = no. Another 4 companies = no (offered by consultant who works with Pharma companies). Clearly a range of capabilities across the industry. @@ -36,14 +42,13 @@ Tools such as {shinytest2} are daunting to use. Can they be made more user frie It’s very challenging to validate a reactive graph. Automated processes have the ability to detect changes in a single pixel – is this desirable or undesirable? -## Types of Testing +#### Types of Testing There is a clear difference across companies in opinions as to the amount of unit testing vs UAT and end-user testing. Unit tests are easy to write but are do not demonstrate how an app works. {shinytest2} can be used for end-user testing but, as mentioned above, may be daunting to use, may not be acceptable within a quality organization and may not fit in current work practices. Unit tests are generally written as code is written. They are fast to write and fast to execute. End-to-end tests, however, are written once code is complete and tend to be slow to execute. - -## Robust UIs? +#### Robust UIs? - Good to have unit tests - often manual testing. Automated can easily get messed up as the code evolves. - We should use the git flow - e.g. protect master and disable manual deployments @@ -54,7 +59,7 @@ Unit tests are generally written as code is written. 
They are fast to write and - How to handle risk of UI problems if our focus is on the static code - e.g. misnamed reactive values so wrong values being shown, even if static R packages giving correct results. - Risk based is really important - e.g. for something like dark mode breaking, we need to know what requirements are high risk (e.g. table is correct) vs low risk (e.g. dark mode button) -# Ideas to improve the process +#### Ideas to improve the process - Validation tests as text files (ATDD/BDD from software engineering). - Frame in [Gherkin format](https://www.guru99.com/gherkin-test-cucumber.html) plus package of fixtures @@ -68,7 +73,7 @@ Unit tests are generally written as code is written. They are fast to write and - Can we talk to QA departments / QA senior leadership to get them to write up their thoughts / requirements? Ask “How can we make your job easier?” - Should we include QA and more IT at next year’s summit? -# Actions +#### Actions - Can we share some common high level guidance on stratifying risk in shiny shared across companies? (Pfizer has written this already internally). - Discuss if we should have an extension of R package whitepaper to cover shiny?