
Overview of Domain

JadenLing edited this page Mar 5, 2024 · 23 revisions

Introduction

The core of the project is to democratise HPC for imaging researchers at WEHI. The project aims to propose a user-friendly and easily accessible workflow for analysing 3D Lightsheet image data using Milton HPC. The Bioimaging Team has attempted to run their scripts with Jupyter Notebook, Nextflow Tower and Python Flask. Python Flask is preferred but cannot run on Milton, so we have been looking at R/Shiny instead.

Short-Term and Long-Term Workflow

The nature of the accessible workflow should be determined by the specific needs of the researchers. Workflows can be categorised as short-term or long-term.

Short-Term Workflow

This is characterised by its reactive nature. Here, the workflow is tailored to specific, immediate research needs, often requiring customisation and quick adaptations. The workflow is driven by the unique requirements of each research project, leading to varied and often non-repetitive tasks. See Research Scientist's Request for Custom Workflow

Long-Term Workflow

In contrast, this is proactive and more consistent. The workflow is typically established based on regular data coming from instruments, with only minor modifications needed over time. This leads to a more fixed, standardized process. See this page for more details on long term workflows.

R/Shiny App


Shiny is an R package provided by RStudio that lets you build a web interface and serve it from a local host. However, we will focus on launching a Shiny app on OnDemand. In this case, there are two ways to launch the web interface: through RStudio or through the Shiny app itself.

Check Out our Custom Platform here

Nextflow

Note: The following sections on Nextflow and Nextflow Tower are no longer relevant for current and future intakes. However, please read them to get a better understanding of the project.


Nextflow is an open-source workflow management system designed to facilitate the development, execution, and sharing of data-intensive and scalable scientific workflows. It allows researchers and data scientists to define complex computational pipelines as code, making it easy to automate and reproduce their data analysis tasks across various computing environments.

Key features of Nextflow include:

  • Declarative Workflow Language: Nextflow provides a domain-specific language (DSL) that allows users to define their workflows as code in a declarative manner. This DSL abstracts away the underlying computing infrastructure, making workflows portable and easily adaptable to different environments.
  • Scalability and Parallel Execution: Nextflow follows a data-driven model, where tasks are executed as soon as their inputs become available. This enables parallel execution of tasks, taking full advantage of available computing resources and making it suitable for processing large-scale datasets.
  • Support for Various Computing Environments: Nextflow allows workflows to be executed on diverse platforms, including local machines, high-performance computing (HPC) clusters, cloud providers (AWS, GCP, Azure), and containerized environments (Docker, Singularity). This flexibility enables seamless workflow deployment in different computational infrastructures.
  • Reproducibility and Versioning: Nextflow emphasizes reproducibility by allowing users to specify software dependencies and tool versions required for each task. Workflows can be version-controlled, ensuring that results are consistent across different executions.
  • Containerization Support: Nextflow integrates with container technologies like Docker and Singularity, enabling users to encapsulate software dependencies and ensure consistency across different execution environments.
  • Community and Collaboration: Nextflow has a growing community of users and contributors who share their workflows and collaborate on the improvement of the platform. This collaborative aspect fosters knowledge exchange and supports the reuse of existing workflows.
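The data-driven model described above, where a task runs as soon as its inputs become available, can be illustrated with a small sketch. This is plain Python rather than Nextflow code, and it does not reflect how Nextflow is implemented. The `run_dataflow` helper and the `load`, `denoise`, `measure` and `report` steps are hypothetical names invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_dataflow(tasks, initial):
    """Run each task as soon as all of its named inputs are available.

    tasks:   dict mapping an output name to (function, list of input names)
    initial: dict of values that exist before any task runs
    """
    results = dict(initial)   # every value produced so far
    pending = dict(tasks)     # tasks not yet submitted
    running = {}              # future -> output name
    order = []                # completion order, kept for inspection

    with ThreadPoolExecutor(max_workers=4) as pool:
        while pending or running:
            # Submit every pending task whose inputs are all available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(dep in results for dep in deps)]
            for name in ready:
                func, deps = pending.pop(name)
                future = pool.submit(func, *(results[dep] for dep in deps))
                running[future] = name
            if not running:
                # Nothing is running and nothing can start: a missing input.
                raise ValueError(f"unsatisfiable inputs for: {sorted(pending)}")
            # Block until at least one running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                order.append(name)
    return results, order


# Hypothetical stand-ins for image analysis steps.
def load(path):
    return [3, 1, 2]          # pretend this reads an image stack


def denoise(stack):
    return sorted(stack)


def measure(stack):
    return max(stack)


def report(clean, peak):
    return {"clean": clean, "peak": peak}


tasks = {
    "stack": (load, ["path"]),
    "clean": (denoise, ["stack"]),         # depends only on "stack"
    "peak": (measure, ["stack"]),          # independent of "clean"
    "summary": (report, ["clean", "peak"]),
}
results, order = run_dataflow(tasks, {"path": "example.tif"})
print(results["summary"])
```

Here `clean` and `peak` both depend only on `stack`, so they are submitted together and may run in parallel, while `summary` waits for both. No explicit ordering is written anywhere; it falls out of the declared inputs, which is the idea behind a data-driven workflow.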

Getting Started With Nextflow

Follow the installation guide to install Nextflow. Here is an example pipeline. Refer to the documentation as needed.

Pros:

  • Flexibility and Portability: Nextflow allows users to define workflows as code, providing greater flexibility and portability across different computing environments.
  • Scalability: Nextflow's data-driven model and parallelization capabilities enable efficient processing of large-scale data on various platforms.
  • Containerization Support: Nextflow integrates with container technologies like Docker and Singularity, ensuring reproducibility across different environments.
  • Versatility: Nextflow is not limited to specific scientific domains, making it suitable for various data analysis tasks beyond bioinformatics.

Cons:

  • Command-Line Interface: Nextflow relies on a command-line interface, which might be less intuitive for users without programming experience.
  • Initial Learning Curve: Users need to learn the Nextflow DSL and command-line commands to develop and run workflows effectively.
  • Development Overhead: Creating workflows as code requires more initial development effort than the visual workflow design offered by Galaxy, a platform explored by a previous intake.
  • Limited Built-in Toolset: Unlike Galaxy, Nextflow does not come with an extensive set of pre-built tools; users need to script their own tools or use third-party tools.

Nextflow Tower


Nextflow Tower is the centralized command post for the management of Nextflow data pipelines. It brings monitoring, logging, and observability to distributed workflows and simplifies the deployment of pipelines on any cloud, cluster, or laptop.

Users can launch pre-configured pipelines with ease, while the flexible API provides programmatic integration to meet the needs of organizations building on Nextflow Tower. Workflow developers can publish pipelines to shared workspaces and administrators can set up and manage the infrastructure required to run data analysis at scale.


Feel free to update this wiki as you find other useful resources.


Box of Archives

The Box of Archives contains previous students' work that has now been archived. The Bioimaging Team has focused on three options: Jupyter Notebook, Python Flask and Nextflow Tower. These are the technologies explored by previous interns as of the end of the Summer 23/24 intake.

Wiki Pages

Project Overview

  1. Home
  2. User Stories

2024 Semester 1 Implementation

  1. 2024 Semester 1 Onboarding Checklist
  2. Current Overview of Domain

Note: The current code implementation can be found here, or by following the instructions in the Onboarding Checklist.

The following Sections are outdated but are here if you are curious what previous intakes did

Previous Intakes

2023-24 Summer Intake

  1. 2023-24 Summer Intake Onboarding Checklist

2023 Semester 2 Implementation

  1. Galaxy App Wiki