generated from quarto-ext/manuscript-template-vscode
Showing 7 changed files with 304 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
# Can we do data better? | ||
|
||
## 2024 | ||
|
||
Chairs: Stephanie Lussier and Doug Kelkhoff | ||
|
||
### `databases` | ||
|
||
There is currently a lot of thought about databases, but few companies use them in primary data flows (although they are used for curated trial data for secondary use, e.g. Novartis' Data42 and Roche's EDIS). | ||
|
||
### Blockers | ||
|
||
- Dependence on CROs who deliver SAS datasets generated by SAS code is a factor. | ||
- IT groups often fear the cloud, which is sometimes confusing given that platforms like Medidata are already cloud-based and other companies already have SDTM/ADaM in AWS S3/cloud. | ||
- Unclear justification for change, particularly what databases would add for the current primary use of SDTM/ADaM; existing systems are mostly functional. | ||
- Challenges with concurrent data access by multiple teams in some file-based approaches, leading to errors. | ||
|
||
### An approach around TortoiseSVN | ||
|
||
- One company has been using TortoiseSVN for a while and is considering moving to Snowflake. | ||
- Pros: Integration with version control and modern cloud storage solutions. | ||
- Cons: | ||
  - Higher entry threshold for users. | ||
  - Lack of a user-friendly GUI. | ||
  - Storing data in 'normal' version control rather than tools designed for data versioning rapidly leads to bloated repositories. | ||
|
||
### Version Control and Data Storage | ||
|
||
- Alignment: code versioning in Git; data versioning in tools such as S3 versioning. | ||
- S3 can be accessed as a mounted drive (e.g. via Lustre) or through the S3 API. | ||
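
One way to picture the split between code in Git and data in a separate store is a small manifest of content hashes that is committed alongside the code. The sketch below is only an illustration under assumptions: the manifest layout, the helper names (`sha256_of`, `build_manifest`, `changed_files`) and the example file names are hypothetical, and a temporary local directory stands in for the real data store (for example a versioned S3 bucket).

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's relative path to its size and content hash."""
    return {
        p.relative_to(data_dir).as_posix(): {
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict, new: dict) -> list:
    """List paths that were added, removed, or whose content changed."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Demo: a temporary directory stands in for the data store (assumption).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "adsl.csv").write_text("USUBJID,AGE\n01-001,54\n")
    (data_dir / "adae.csv").write_text("USUBJID,AETERM\n01-001,HEADACHE\n")
    before = build_manifest(data_dir)

    # Simulate a data update; only this file's hash should change.
    (data_dir / "adsl.csv").write_text("USUBJID,AGE\n01-001,55\n")
    after = build_manifest(data_dir)

    # The JSON text is what would be committed to Git instead of the data.
    manifest_json = json.dumps(after, indent=2, sort_keys=True)
    changed = changed_files(before, after)
```

Only the small JSON manifest would live in Git, so the repository stays light while each commit still pins the exact data it was run against.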
|
||
### Denodo as Data Fabric Mesh | ||
|
||
One company uses Denodo as a data fabric mesh; users interact via Denodo, | ||
which serves as an API layer. Users do not interact directly with the source data. | ||
|
||
### Nontabular Data | ||
|
||
- Not common for statistical programmers working on clinical trial data. | ||
|
||
### CDISC Dataset JSON vs. Manifest JSON | ||
|
||
Writing CDISC Dataset-JSON is very slow and potentially not suitable for regular working data. | ||
|
||
### Popularity and Concerns with Parquet Datasets | ||
|
||
- Admiral tool generates Parquet directly; others convert from SAS to Parquet. | ||
- Questions about the longevity and maintenance requirements of Parquet as it's a blob (vs a 'human readable' format like CSV/JSON) | ||
|
||
### Handling Legacy Data | ||
|
||
- Suggestion: stack legacy data into a database if it is intended for secondary use. | ||
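
As a rough illustration of what "stacking" could look like, the sketch below loads demographics rows from two hypothetical legacy studies into a single table keyed by study. Everything here is an assumption for illustration only: the study identifiers, the columns, the table name `dm`, and the use of an in-memory SQLite database in place of whatever database a company would actually choose.

```python
import sqlite3

# Hypothetical legacy extracts: one list of rows per study, with differing columns.
legacy_studies = {
    "STUDY-A": [
        {"USUBJID": "A-001", "AGE": 54, "SEX": "F"},
        {"USUBJID": "A-002", "AGE": 61, "SEX": "M"},
    ],
    "STUDY-B": [
        {"USUBJID": "B-001", "AGE": 47},  # SEX is absent in this extract
    ],
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dm (
        STUDYID TEXT NOT NULL,
        USUBJID TEXT NOT NULL,
        AGE     INTEGER,
        SEX     TEXT,
        PRIMARY KEY (STUDYID, USUBJID)
    )
    """
)

# Stack every study into the same table; absent columns become NULL.
with conn:  # commits on success, rolls back on error
    for study_id, rows in legacy_studies.items():
        conn.executemany(
            "INSERT INTO dm (STUDYID, USUBJID, AGE, SEX) VALUES (?, ?, ?, ?)",
            [(study_id, r["USUBJID"], r.get("AGE"), r.get("SEX")) for r in rows],
        )

# Cross-study queries become a single SQL statement.
per_study = conn.execute(
    "SELECT STUDYID, COUNT(*), AVG(AGE) FROM dm GROUP BY STUDYID ORDER BY STUDYID"
).fetchall()
missing_sex = conn.execute("SELECT COUNT(*) FROM dm WHERE SEX IS NULL").fetchone()[0]
conn.close()
```

The design choice worth noting is the explicit `STUDYID` column: once studies share one table, pooled queries for secondary use no longer need per-study file handling.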
|
||
### Change Management | ||
|
||
- For statistical programming, direct instruction on new systems is necessary. | ||
- Emphasize direct support over broad training. | ||
- Simplify systems for users to reduce friction. | ||
- Consider a GUI similar to Azure. | ||
- Focus on reducing the user burden. | ||
|
||
### Different Data Use Cases | ||
|
||
Data use differs by context (e.g., Shiny apps vs. regulatory documents). | ||
Dashboards can access EDC directly without needing snapshots. | ||
|
||
### **Summary** | ||
|
||
Uncertain value in moving from CDISC data standards to databases. | ||
Limited interest and action in this area across the organization. | ||
Not a high priority given other ongoing organizational changes. | ||
Ongoing shift away from SAS-based datasets and file storage to cloud-based systems, with increasing use of Parquet. | ||
|
||
### **Action Items** | ||
|
||
- SCE whitepaper - Mark Bynum from J&J | ||
- Is there actual value / gain in databases? | ||
- Not the best investment relative to other non-data changes going on across the organization (e.g. R, containers, etc.) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Small/mid pharma & OS | ||
|
||
Chairs: Katie Igartua and Kas Yousefi | ||
|
||
### Proposal | ||
|
||
A pressing topic I think we should discuss is how best to engage and enable small/mid-sized pharma to use open-source ecosystems. | ||
|
||
It can sometimes be overwhelming, and it would be helpful to have an idea of where to start in order to join the open-source journey and use the resources and approaches already available. | ||
|
||
Some questions: | ||
|
||
1. How should a company go about setting up an open-source environment? What are the hurdles? How can the crowd-sourced groups help? | ||
2. Could small pharma use R for IB/DSUR? Does this need to be in a validated environment? | ||
3. How do other small pharma companies use open-source environments? What would be areas for collaboration? | ||
|
||
### Expected impact | ||
|
||
I believe this topic would be beneficial because small pharma often needs to be innovative but, due to resource limits, may fall back on traditional approaches and rely heavily on CROs. A stronger small pharma presence could help bring new resources and approaches to open-source solutions. | ||
|
||
### Prior discussions/work | ||
|
||
Some prior work includes: | ||
|
||
- BBSW Panel Discussion for Open Source Tools in Small/mid-sized pharma | ||
- [How best to engage and enable small/mid-size pharma to use open source tools · rinpharma rinpharma-summit-2024 · Discussion #17](https://github.com/rinpharma/rinpharma-summit-2024/discussions/17) | ||
|
||
# Round table | ||
|
||
- Open sourcing adoption/contribution barrier | ||
  - risk taking when you only have one product | ||
  - cost: small companies often fully outsource regulatory work to CROs, so it is hard to justify additional investment in infra or open source work for non-regulatory tasks | ||
  - IT resource: Posit installation for small pharma - a small company may not have the right in-house IT talent to even get Posit running | ||
    - Are you ready to be a Shiny DevOps person & R admin? | ||
- What are the use cases for open source? Mainly in non-validated environments | ||
  - data review, monitoring, visualization, patient profile, DSUR | ||
  - IDCC uses Shiny for IDMC closed sessions (instead of a long PDF) | ||
- Asks to the community | ||
  - can large pharma open source their infra config? such as AMS yaml file? | ||
    - from the perspective at the intersection of IT, DS, stat | ||
  - what about a “single” opinionated workflow (pharmaverse is seen more as a comprehensive tool box) | ||
    - from an independent body such as the R Consortium? | ||
    - or can individual big pharma publish the whole workflow | ||
- GxP set up: timing tradeoff - is it good to set up early or late? Setting up early can impose restrictions that are hard to change later. Balance of flexibility, best practice and cost |