Each line of the file must be less than 65,536 characters #345
Comments
It’s a limitation in the indexing integer size, I believe. Is there a reason you can’t use multi-line FASTA sequences for this specific challenge?
I generally use hmmer esl-sfetch, BLAST indexes, or cdbfasta for my indexing and retrieval at this point, but I understand the desire for native Perl-only solutions.
Jason Stajich, PhD
…On Jul 1, 2020, 4:55 AM -0700, Jacques Dainat ***@***.***>, wrote:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Each line of the file must be less than 65,536 characters.
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/Root/Root.pm:447
STACK: Bio::DB::IndexedBase::_check_linelength /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:757
STACK: Bio::DB::Fasta::_calculate_offsets /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/Fasta.pm:227
STACK: Bio::DB::IndexedBase::_index_files /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:659
STACK: Bio::DB::IndexedBase::index_file /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:487
STACK: Bio::DB::IndexedBase::new /home/lgs6452/.conda/envs/exonerate_env/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm:364
Does anyone know why this limitation exists? Could we add a patch to fix the sequence to make it shorter in such a case?
|
I made a toolkit (AGAT) and I have several scripts using |
@Juke34 the problem is that the index is built based on the location of the record delimiters in the file. (to add to @hyphaltip 's alternatives, I've been using |
@Juke34 BTW I should have mentioned back in August that a patch is more than welcome. |
I don’t see any obvious way to patch that. |
Kenji Yip Tong ***@***.***> writes:
@Juke34 Hi, just to double check as I am trying to incorporate AGAT in
a snakemake workflow. I get this error when using
"agat_sp_extract_sequences.pl". The solution to this type of error is
to reformat the fasta file from single to multi-line? Any suggestions
in how this can be done? (AGAT is a great suite btw).
You can use the shell's `fold` command, like so:
fold -w 80 file-with-very-long-lines.fasta > file-with-80-char-lines.fasta
This should work fine as long as the "description" line is below 80
characters (otherwise fold will also split that line). If that is an
issue for you, just use a long width, e.g. `fold -w 1000`.
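
If the description lines may be longer than the wrap width, one alternative to `fold` is to wrap only the sequence lines. The following is a minimal Python sketch of that idea, not something proposed in this thread; the function name `wrap_fasta` and the sample record are made up for illustration. It assumes description lines start with `>` and reads one line at a time, so a very long sequence line is held in memory in full.

```python
import io


def wrap_fasta(src, dst, width=80):
    """Copy FASTA text from src to dst, wrapping sequence lines at `width`.

    Description lines (starting with '>') are written unchanged,
    however long they are. Empty lines are dropped.
    """
    for line in src:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            dst.write(line + "\n")
            continue
        # Emit the sequence line in chunks of at most `width` characters.
        for start in range(0, len(line), width):
            dst.write(line[start:start + width] + "\n")


# Small in-memory demonstration: a 106-character description line
# followed by a single 200-character sequence line.
demo_in = io.StringIO(">seq1 " + "x" * 100 + "\n" + "ACGT" * 50 + "\n")
demo_out = io.StringIO()
wrap_fasta(demo_in, demo_out, width=80)
wrapped_lines = demo_out.getvalue().splitlines()
```

For real files, the same function can be called with two open file handles, for example `wrap_fasta(open("in.fasta"), open("out.fasta", "w"))` (file names hypothetical).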
|
Another day, another useful shell command I've never heard of
before--that's a good one!
(I learned about `comm` a few weeks ago and it's also awesome!)
Thanks,
Scott
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott
at scottcain dot net
GMOD Project Manager (http://gmod.org/)
216-392-3087
WormBase Developer (http://wormbase.org/)
Alliance of Genome Resources Group Leader (http://alliancegenome.org/)
Human Cancer Models Initiative Project Manager (
https://hcmi-searchable-catalog.nci.nih.gov/)
|
@carandraug Thanks. I ended up going with fastx_toolkit to do it (see here), applying it only if the FASTA has a line with more than 65536 characters (checked for each file using 'wc -L'). But perhaps I should use fold instead, as I won't need fastx_toolkit in my environment and it might be faster. Cheers. |
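
The per-file check described above (reformat only when some line is too long) can also be sketched in Python for environments where `wc -L` is not available. This is an illustrative sketch only: `longest_line` and `needs_wrapping` are hypothetical helper names, the 65,536 threshold is taken from the error message, and whether the line terminator counts toward that limit is not stated in this thread (here it is excluded).

```python
import os
import tempfile

LIMIT = 65536  # from the error message: lines must be less than 65,536 characters


def longest_line(path):
    """Return the length in bytes of the longest line, excluding line endings."""
    longest = 0
    with open(path, "rb") as handle:
        for line in handle:
            longest = max(longest, len(line.rstrip(b"\r\n")))
    return longest


def needs_wrapping(path, limit=LIMIT):
    """True if any line is at or above the limit (assumption: terminator excluded)."""
    return longest_line(path) >= limit


# Demonstration on a temporary single-line FASTA record of 70,000 bases.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">chr1\n" + "A" * 70000 + "\n")
demo_longest = longest_line(tmp.name)
demo_needs = needs_wrapping(tmp.name)
os.remove(tmp.name)
```

A workflow rule could call `needs_wrapping` on each input and run the reformatting step only when it returns true.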
I used to use |
Does anyone know why this limitation exists? Could we add a patch to fix the sequence to make it shorter in such a case?