feature request: support for colorspace sequence? #22

Open
biocyberman opened this issue Jan 22, 2014 · 5 comments

Comments

@biocyberman

quip looks very practical for NGS work. For colorspace sequences, I can generate BAM files and compress them with quip. However, it would be more useful if quip supported colorspace sequences directly in CSFASTQ or, preferably, in the XSQ file format.

This link contains some info about XSQ http://www.lifetechnologies.com/dk/en/home/technical-resources/software-downloads/xsq-software.html

This is what I am thinking: quip currently works with the base-space character set (A, C, T, G, and N). If quip could be generalized to work on any character set, it could handle the colorspace character set (0, 1, 2, 3, and "." dot), as well as other applications. If that is not easy to implement, it is still possible to translate a colorspace sequence into a "fake" base-space sequence (i.e. tr '0123.' 'ACTGN' on the sequence lines) and treat the rest as base space. That would solve the CSFASTQ case right away. Binary XSQ files are a bit more complicated, but we can discuss those afterward.
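
A minimal Python sketch of the translation idea above. The function names are hypothetical and this is not quip code. It assumes each colorspace read starts with a primer base (for example `T`) that should be left untouched, and the digit-to-letter mapping is arbitrary as long as it is reversible.

```python
# Assumed one-to-one mapping between colorspace symbols and pseudo bases.
COLOR_TO_BASE = str.maketrans("0123.", "ACTGN")
BASE_TO_COLOR = str.maketrans("ACTGN", "0123.")


def encode_colors(read: str) -> str:
    """Translate a colorspace read into a "fake" base-space read.

    The first character is assumed to be the primer base and is kept as-is.
    """
    if not read:
        return read
    return read[0] + read[1:].translate(COLOR_TO_BASE)


def decode_colors(read: str) -> str:
    """Invert encode_colors, restoring the original colorspace symbols."""
    if not read:
        return read
    return read[0] + read[1:].translate(BASE_TO_COLOR)


if __name__ == "__main__":
    original = "T0123."
    fake = encode_colors(original)
    print(fake, decode_colors(fake) == original)
```

Because the mapping is one-to-one, decoding after decompression recovers the original colorspace read.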

@dcjones
Owner

dcjones commented Jan 22, 2014

Adding support for CSFASTQ wouldn't be hard at all. So far I haven't bothered since I personally don't work with colorspace data much, and no one has asked me to until now.

XSQ would be more difficult for a couple of reasons.

  1. XSQ is implemented in Java and quip is in C. It's not impossible to call Java code from C, but it would be painful and ugly.

  2. More importantly, XSQ has a restrictive license. I'm not a lawyer, but I don't think I could legally use it as part of quip:

    3.2.5 You agree not to modify, sell, rent, transfer (except
    temporarily in the event of a computer malfunction), resell for
    profit, or distribute this license or the Software, or create
    derivative works based on the Software, or any part thereof or any
    interest therein.

I don't know why these guys think it's a good idea to put pointless restrictions on their software, but so it is.

That said, XSQ is based on an actual open format (HDF5), so writing my own parser isn't totally out of the question.

@biocyberman
Author

Great to hear a positive response to this request :-) You are right that XSQ is based on HDF5. I am actually testing some Python scripts I wrote to manipulate XSQ files using the h5py package (http://www.h5py.org/), which is a Python interface to HDF5's C libraries. After a brief look at the HDF5 documentation (http://www.hdfgroup.org/HDF5/doc/index.html), I believe the HDF5 library itself provides C, Fortran, and Java interfaces, with Python available through h5py.

Regarding the restrictive license: I think it applies to the XSQ tools that Lifetech releases, not to the file format or the XSQ data files themselves. From what I understand, there is no licensing issue with developing tools that work with XSQ files. From page 533 of this Advanced User Guide:

What software accepts the XSQ file format?
Initially, only LifeScope™ Software supports the new format. Life Technologies Corporation is working with third-party developers to adapt their workflows to support the new chemistry and data format

And for your information, here are some XSQ tools on Github:
https://github.com/search?q=XSQ+solid&ref=cmdform

@biocyberman
Author

Maybe I missed part of your point. Yes, you may need to write a parser that handles the data, metadata, and attributes inside an XSQ file using the HDF5 C libraries.
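
A rough sketch of how such a parser could start, using h5py to walk an HDF5 file and list its contents. It assumes h5py is installed and makes no assumption about the actual XSQ group or dataset names. `list_datasets` is a hypothetical helper, not part of h5py or quip.

```python
import h5py


def list_datasets(path):
    """Return (name, shape, dtype, attribute names) for every dataset in an HDF5 file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype), sorted(obj.attrs.keys())))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Running this on a small sample file is a quick way to see which groups hold reads, qualities, and metadata before writing an equivalent reader against the HDF5 C API.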

@dcjones
Owner

dcjones commented Jan 22, 2014

Ok, I see. If it's just a simple, well-documented HDF5 schema, then it shouldn't be too hard.

@biocyberman
Author

Awesome! If you need a sample XSQ or CSFASTQ file, please let me know. I will find out whether I can send a minimal, small XSQ file. I have many XSQ files over 8 GB, which are not practical to send around.
