This repository has been archived by the owner on Aug 20, 2024. It is now read-only.

Add array support #127

Closed


awgymer
Collaborator

awgymer commented Oct 25, 2023

This draft PR demonstrates the feasibility of adding limited support for simple array fields to the fromSamplesheet method (there is already undocumented support for the full JSON Schema spec in the validateFile method).

The proposed scope of support is:

  • supported only in samplesheet validation (I don't think CLI params have a syntax for specifying an array, so they are rarely used there?)
  • simple item arrays only, i.e. every element is validated against a single "items" schema [spec]
  • supported only for input fields, not for meta fields
  • no support in TSV/CSV (how would that even logically work?)
  • only formats that are already supported can be used within the array
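As a rough illustration of the "simple item arrays only" rule, here is a sketch that classifies a property schema that has already been parsed into a Map. The class SimpleArrayCheck, the method isSimpleArrayField, the set of scalar item types, and passing the supported formats in as a parameter are all hypothetical choices for illustration, not the plugin's actual code.

```java
import java.util.Map;
import java.util.Set;

final class SimpleArrayCheck {
    // Item types treated as "simple" in this sketch (an assumption, not the plugin's list).
    private static final Set<String> SCALAR_TYPES = Set.of("string", "integer", "number", "boolean");

    private SimpleArrayCheck() {}

    /**
     * Hypothetical helper: true if the property is list-style array validation
     * with a single scalar "items" schema, no tuple keywords, and any "format"
     * restricted to the caller-supplied set of already-supported formats.
     */
    static boolean isSimpleArrayField(Map<String, Object> property, Set<String> supportedFormats) {
        if (!"array".equals(property.get("type"))) {
            return false;
        }
        if (property.containsKey("prefixItems")) {
            return false; // tuple validation (JSON Schema 2020-12 keyword) is out of scope
        }
        Object items = property.get("items");
        if (!(items instanceof Map)) {
            return false; // no "items", or the draft-07 tuple form where "items" is a list of schemas
        }
        Object itemType = ((Map<?, ?>) items).get("type");
        if (!(itemType instanceof String) || !SCALAR_TYPES.contains(itemType)) {
            return false; // nested arrays/objects (or untyped items) are not "simple"
        }
        Object format = ((Map<?, ?>) items).get("format");
        return format == null || supportedFormats.contains(format);
    }
}
```

Taking the supported formats as a parameter keeps the sketch from hard-coding a format list that it cannot know.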

The code has no tests in the plugin yet, but it has been shown to work almost exactly as it does for flat files, with one major exception:

  • There is currently no support for exists checking of file/directory format fields within an array.

Update: existence checking for file/directory fields within an array is now supported.
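A minimal sketch of what per-element existence checking involves. It assumes the cell value arrives either as a single string or as a Collection of strings, and that paths are local. Remote URIs such as s3:// would need Nextflow's own file handling rather than java.nio.file.Paths. ArrayExistsCheck and missingPaths are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ArrayExistsCheck {
    private ArrayExistsCheck() {}

    /**
     * Hypothetical helper: returns every path in a samplesheet value that does
     * not exist on the local filesystem. A scalar is treated as one path; a
     * Collection (an array field) is checked element by element.
     */
    static List<String> missingPaths(Object value) {
        List<String> missing = new ArrayList<>();
        if (value instanceof Collection) {
            for (Object item : (Collection<?>) value) {
                missing.addAll(missingPaths(item)); // check each array element individually
            }
        } else if (value != null) {
            String path = value.toString();
            if (!Files.exists(Paths.get(path))) {
                missing.add(path);
            }
        }
        return missing;
    }
}
```

Returning the individual missing elements lets an error message name the exact file in the array that is absent, rather than rejecting the whole list.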

The following samplesheet.yaml:

- sample: mysample1_10
  fastq: 
    - input1_10_R1.fq.gz
    - input1_10_R2.fq.gz
  strandedness: forward
- sample: mysample1_11
  fastq: 
    - input1_11_R1.fq.gz
    - input1_11_R2.fq.gz
  strandedness: forward
- sample: mysample1_12
  fastq: 
    - input1_12_R1.fq.gz
    - input1_12_R2.fq.gz
  strandedness: forward
- sample: mysample1_13
  fastq: 
    - input1_13_R1.fq.gz
    - input1_13_R2.fq.gz
  strandedness: forward
- sample: mysample1_14
  fastq: 
    - input1_14_R1.fq.gz
    - input1_14_R2.fq.gz
  strandedness: forward

With the following schema.json:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "https://raw.githubusercontent.com/nf-validation/example/master/assets/schema_input.json",
  "title": "nf-validation example - params.input schema",
  "description": "Schema for the file provided with params.input",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces"
      },
      "fastq": {
        "type": "array",
        "items": {
          "type": "string",
          "pattern": "^\\S+\\.f(ast)?q\\.gz$"
        },
        "errorMessage": "FastQ files cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      },
      "strandedness": {
        "type": "string",
        "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
        "enum": ["forward", "reverse", "unstranded"]
      }
    },
    "required": ["sample", "fastq", "strandedness"]
  }
}
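To make the per-item semantics of this schema concrete: the "pattern" under "items" is applied to each element of fastq, while the array-level "errorMessage" is the single message reported if any element fails. The sketch below hand-codes those two checks for one row as an illustration only. FastqRowCheck and errors are hypothetical names, and this is not how the plugin validates; a real implementation would go through a full JSON Schema validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class FastqRowCheck {
    // The "fastq" items pattern from schema.json after JSON unescaping.
    private static final Pattern FASTQ = Pattern.compile("^\\S+\\.f(ast)?q\\.gz$");
    private static final Set<String> STRANDEDNESS = Set.of("forward", "reverse", "unstranded");

    private FastqRowCheck() {}

    /** Returns the error messages for one samplesheet row; an empty list means valid. */
    static List<String> errors(List<String> fastq, String strandedness) {
        List<String> errs = new ArrayList<>();
        // JSON Schema "pattern" is an unanchored search, hence find(); the regex carries its own anchors.
        boolean fastqOk = fastq != null
                && fastq.stream().allMatch(f -> f != null && FASTQ.matcher(f).find());
        if (!fastqOk) {
            errs.add("FastQ files cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'");
        }
        if (strandedness == null || !STRANDEDNESS.contains(strandedness)) {
            errs.add("Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'");
        }
        return errs;
    }
}
```

Note that an empty fastq list passes here, as it would under the schema as written, because nothing sets "minItems".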

This produces the following in Nextflow:

Channel.fromSamplesheet("input").view()
---
[mysample1_10, [input1_10_R1.fq.gz, input1_10_R2.fq.gz], forward]
[mysample1_11, [input1_11_R1.fq.gz, input1_11_R2.fq.gz], forward]
[mysample1_12, [input1_12_R1.fq.gz, input1_12_R2.fq.gz], forward]
[mysample1_13, [input1_13_R1.fq.gz, input1_13_R2.fq.gz], forward]
[mysample1_14, [input1_14_R1.fq.gz, input1_14_R2.fq.gz], forward]
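The output above keeps each fastq array as one nested list element of the tuple instead of spreading its files across the tuple. Here is a minimal sketch of that row-to-tuple step. It assumes the YAML row has already been parsed into a Map and that the field order comes from the schema's "properties". RowToTuple and toTuple are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

final class RowToTuple {
    private RowToTuple() {}

    /** Hypothetical helper: builds one channel element from a parsed row, in schema field order. */
    static List<Object> toTuple(Map<String, Object> row, List<String> fieldOrder) {
        List<Object> tuple = new ArrayList<>();
        for (String field : fieldOrder) {
            Object value = row.get(field);
            if (value instanceof Collection) {
                // Array field: keep it as a single nested list rather than flattening it.
                tuple.add(new ArrayList<>((Collection<?>) value));
            } else {
                tuple.add(value);
            }
        }
        return tuple;
    }
}
```

Because ArrayList.toString() prints elements as "[a, b]", the tuple for mysample1_10 prints in the same shape as the view() line above.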

Things to consider:

  • warnings if we detect a CSV/TSV samplesheet together with "type": "array" in the schema (we don't currently do this, and it is technically supported for validation)
  • warnings for "type": "array" with "prefixItems" (this is tuple validation)
  • check that length ("minItems"/"maxItems") and uniqueness ("uniqueItems") constraints work as intended
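One possible shape for the first two warnings is sketched below. It assumes the samplesheet type is inferred from the file extension alone, and that the schema's "properties" are available as a Map of Maps. ArraySchemaWarnings, warnings, and the message texts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ArraySchemaWarnings {
    private ArraySchemaWarnings() {}

    /** Hypothetical check: warnings for array-typed properties fromSamplesheet would not handle as intended. */
    static List<String> warnings(String samplesheetName, Map<String, Map<String, Object>> properties) {
        String lower = samplesheetName.toLowerCase(Locale.ROOT);
        // Assumption: CSV/TSV is detected from the file extension only.
        boolean delimited = lower.endsWith(".csv") || lower.endsWith(".tsv");
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : properties.entrySet()) {
            Map<String, Object> property = entry.getValue();
            if (!"array".equals(property.get("type"))) {
                continue;
            }
            if (delimited) {
                out.add("Field '" + entry.getKey() + "' is an array, which CSV/TSV samplesheets cannot express");
            }
            if (property.containsKey("prefixItems")) {
                out.add("Field '" + entry.getKey() + "' uses prefixItems (tuple validation), which is not supported");
            }
        }
        return out;
    }
}
```

The length and uniqueness keywords are deliberately left out. Those constraints are enforced by the validator, so they are better covered by tests against it than by a separate hand-written check.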

N.B.: I tried to use polymorphism (is this the right term?) to have a different method for each input type, but every call went to the method declared for Object (the fallback). If anyone can help me understand why, I would appreciate it.
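One likely explanation assumes the class is compiled with @CompileStatic, which has not been checked against the plugin source. Under static compilation, overloads are chosen as in Java: at compile time, from the declared type of the argument. A value declared as Object therefore always reaches the Object overload. Groovy's default dynamic mode instead selects the overload from the runtime type. The Java sketch below shows the static behaviour alongside an explicit instanceof dispatch that works either way. OverloadDemo and its methods are hypothetical.

```java
import java.util.List;

final class OverloadDemo {
    private OverloadDemo() {}

    static String describe(Object value) { return "Object"; }
    static String describe(String value) { return "String"; }
    static String describe(List<?> value) { return "List"; }

    /** Static dispatch: the declared type is Object, so describe(Object) is fixed at compile time. */
    static String viaStaticType(Object value) {
        return describe(value);
    }

    /** Explicit runtime dispatch: casting first makes the compiler pick the specific overload. */
    static String viaRuntimeType(Object value) {
        if (value instanceof String) {
            return describe((String) value);
        }
        if (value instanceof List) {
            return describe((List<?>) value);
        }
        return describe(value);
    }
}
```

In Groovy, annotating only the dispatching method with @CompileDynamic should also restore runtime overload selection, at the cost of static checking for that method.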

awgymer closed this Oct 26, 2023