-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I am looking for a function that gives me an iterator over the read SequenceRecord's. #70
Comments
I'm not an author of the code here, but I dug around a bit, and the API does not ever create an iterator. There's a One could set up an iterator API like this (it's a sketch I just made up): use needletail::{errors::ParseError, parser::FastaReader};
struct FastaIter<'a, R: std::io::Read> {
inner: FastaReader<R>,
phantom: std::marker::PhantomData<&'a R>,
}
struct IntoFastaIter<'a, R: std::io::Read> {
fastaiter: FastaIter<'a, R>,
index: usize,
}
impl<'a, R> IntoIterator for FastaIter<'a, R>
where
R: std::io::Read,
{
type Item = Result<SequenceRecord<'a>, ParseError>;
type IntoIter = IntoFastaIter<'a, R>;
fn into_iter(self) -> Self::IntoIter {
IntoFastaIter {
fastaiter: self,
index: 0,
}
}
}
impl<'a, R> Iterator for IntoFastaIter<'a, R>
where
R: std::io::Read,
{
type Item = Result<SequenceRecord<'a>, ParseError>;
fn next(&mut self) -> Option<Self::Item> {
let inner = self.fastaiter.inner;
if inner.finished {
return None;
}
// Load some data in the buffer to start
if inner.position.line == 0 {
match fill_buf(&mut inner.buf_reader) {
Ok(n) => {
if n == 0 {
inner.finished = true;
return None;
}
}
Err(e) => {
return Some(Err(e.into()));
}
};
if inner.get_buf()[0] == b'>' {
inner.position.line = 1;
inner.position.byte = 0;
inner.buf_pos.start = 0;
inner.search_pos = 1;
} else {
return Some(Err(ParseError::new_invalid_start(
inner.get_buf()[0],
ErrorPosition {
line: inner.position.line,
id: None,
},
Format::Fasta,
)));
}
}
if !inner.buf_pos.is_new() {
inner.next_pos();
}
// Can we identify the start of the next record ?
let complete = inner.find();
if !complete {
// Did we get a record?
let got_record = match inner.next_complete() {
Ok(f) => f,
Err(e) => {
return Some(Err(e));
}
};
if !got_record {
return None;
}
}
if inner.buf_pos.seq_pos.is_empty() {
return Some(Err(ParseError::new_unexpected_end(
ErrorPosition {
line: inner.position.line,
id: None,
},
Format::Fasta,
)));
}
if inner.line_ending.is_none() {
inner.line_ending = inner.buf_pos.find_line_ending(inner.get_buf());
}
Some(Ok(SequenceRecord::new_fasta(
inner.get_buf(),
&inner.buf_pos,
&inner.position,
inner.line_ending,
)))
}
} But I need access to private members of the struct, so this would have to be implemented in needletail itself, which I haven't had time to do yet. |
Hey there, It's been a while since I've looked at that code but I don't think we can impl Iterator due to the parser behaviour: we are not allocating anything and borrowing from the internal buffer. I'd be happy to be proven wrong though if it's possible! |
Thank you for this package!
This is really helpful for me and also especial thanks for giving me the initial help to start with Rust.
I now try to use the par_bridge() functionality in Rust to process two fastq entries in a muti-processor way and not load all data into memory first.
ChatGPT sent me down this rabbit hole:
Of cause not using the needletail logics to parse through a fastq file. After creating a somewhat promising version of this kind of logics I get this error:
I have created the readers like that:
Is there something in your library that could give me the result I need for that. I assume it wants an iterator.
THANK YOU FOR YOUR HELP SO FAR!
The text was updated successfully, but these errors were encountered: