feature request: support for colorspace sequence? #22
Adding support for CSFASTQ wouldn't be hard at all. So far I haven't bothered since I personally don't work with colorspace data much, and no one has asked me to until now. XSQ would be more difficult for a couple of reasons.
I don't know why these guys think it's a good idea to put pointless restrictions on their software, but there it is. That said, XSQ is based on an actual open format (HDF5), so writing my own parser isn't totally out of the question.
Great to hear a positive response to this request :-) You are right that XSQ is based on HDF5. I am actually testing some Python scripts I wrote to manipulate XSQ files using the h5py package (http://www.h5py.org/), which is a Python interface to the HDF5 C library. After a brief look at the HDF5 documentation (http://www.hdfgroup.org/HDF5/doc/index.html), I believe HDF5 can be used from Python, Java, C, and Fortran. Regarding the restrictive license, I think it applies to the XSQ tools that Lifetech releases, not to the file format or to the XSQ data files themselves. From what I understand, there is no license issue with developing tools that work with XSQ files. On page 533 of this Advanced User Guide:
And for your information, here are some XSQ tools on GitHub:
Maybe I did not get the whole point you wanted to make. Yes, you may need to write a parser that handles the data, metadata, and attributes inside an XSQ file using the HDF5 C libraries.
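As a rough sketch of what such a parser would first need to discover, the Python snippet below uses h5py to list every group and dataset in an HDF5 file together with its attributes. It assumes nothing about the actual XSQ layout, which is not described in this thread, and requires h5py to be installed; `describe_hdf5` is a hypothetical helper name. A C implementation inside quip would have to do an equivalent walk with the HDF5 C library.

```python
import h5py


def describe_hdf5(path):
    """Return one summary line per group/dataset in an HDF5 file (hypothetical helper)."""
    lines = []

    def visit(name, obj):
        # Attributes carry per-object metadata; render them as key=value pairs.
        attrs = " ".join(f"{k}={v!r}" for k, v in sorted(obj.attrs.items()))
        if isinstance(obj, h5py.Dataset):
            lines.append(f"dataset {name} shape={obj.shape} dtype={obj.dtype} {attrs}".rstrip())
        elif isinstance(obj, h5py.Group):
            lines.append(f"group {name}/ {attrs}".rstrip())
        # Other object kinds (e.g. named datatypes) are skipped in this sketch.

    with h5py.File(path, "r") as f:
        # visititems walks every object below the root group, recursively.
        f.visititems(visit)
    return lines
```

For example, `for line in describe_hdf5("sample.xsq"): print(line)` would print the layout of a file (the file name is a placeholder), which is a cheap way to check whether the schema really is simple before writing a C parser.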
OK, I see. If it's just a simple, well-documented HDF5 schema, then it shouldn't be too hard.
Awesome! If you need any sample XSQ or CSFASTQ files, please let me know. I will find out whether I can send a minimal XSQ file of small size. Many of my XSQ files are over 8 GB, which is too large to send around.
I see that quip has very practical uses in the NGS area. For colorspace sequences, I can generate BAM files and compress them with quip. However, it would be more useful if quip supported colorspace sequences in CSFASTQ or, preferably, in the XSQ file format.
This link contains some info about XSQ http://www.lifetechnologies.com/dk/en/home/technical-resources/software-downloads/xsq-software.html
This is what I am thinking: currently quip works with the basespace "character set" (`A`, `C`, `T`, `G`, and `N`). If quip could be generalized to work with any character set, then it could handle the colorspace character set (`0`, `1`, `2`, `3`, and `.` "dot"), as well as other applications. If that is not easy to implement, it is still possible to translate a colorspace sequence into a "fake" basespace sequence (e.g. `tr '0123.' 'ACTGN'`) and treat the rest as a basespace sequence. This would solve the CSFASTQ case right away. For binary XSQ files it is a bit more complicated, but I think we can discuss that afterward.
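A minimal sketch of the "fake basespace" translation, assuming plain 4-line FASTQ records and assuming each colorspace read starts with a single primer base (as SOLiD reads usually do, e.g. `T0123...`). Only the sequence lines are translated: Phred+33 quality strings can legitimately contain `0` to `3` and `.` (Q15 to Q18 and Q13), and headers often contain digits, so running `tr` over the whole file would corrupt them. The primer base is passed through unchanged so the mapping can be inverted exactly. The function names are hypothetical.

```python
# Same mapping as the tr command: 0->A, 1->C, 2->T, 3->G, .->N
TO_FAKE = str.maketrans("0123.", "ACTGN")
TO_COLOR = str.maketrans("ACTGN", "0123.")  # exact inverse of TO_FAKE


def _translate_reads(lines, table):
    """Apply `table` to the sequence line of each 4-line FASTQ record.

    The first character of a sequence line is assumed to be the primer base
    and is kept as-is; only the colour calls after it are translated.
    Headers, '+' lines, and quality lines pass through untouched.
    """
    for i, line in enumerate(lines):
        if i % 4 == 1 and line:
            line = line[0] + line[1:].translate(table)
        yield line


def csfastq_to_fake_bases(lines):
    """Colour calls -> fake bases, ready for a basespace compressor."""
    return _translate_reads(lines, TO_FAKE)


def fake_bases_to_csfastq(lines):
    """Fake bases -> original colour calls (inverse of csfastq_to_fake_bases)."""
    return _translate_reads(lines, TO_COLOR)
```

From a small script, `sys.stdout.writelines(csfastq_to_fake_bases(sys.stdin))` would turn a CSFASTQ stream into FASTQ that quip could compress as usual, and `fake_bases_to_csfastq` restores the original reads after decompression.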