This library enables fast and safe streaming of byte data, in either Word8 or Char form. It is a core addition to the streaming ecosystem and avoids the usual pitfalls of combining lazy ByteStrings with lazy IO.
This library is used by streaming-attoparsec to enable vanilla Attoparsec parsers to work with streaming "for free".
Modules from this library are intended to be imported qualified. To avoid conflicts with both the bytestring library and streaming, we recommend Q as the qualified name:
import qualified Streaming.ByteString.Char8 as Q
Like the bytestring library, leaving off the Char8 will expose an API based on Word8. Following the philosophy of streaming that "the best API is the one you already know", these APIs are based closely on bytestring. The core type is ByteStream m r, where:
m: The Monad used to fetch further chunks from the "source", usually IO.
r: The final return value after all streaming has concluded, usually () as in streaming.
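To make the two type parameters concrete, here is a minimal sketch that builds a small in-memory ByteStream and consumes it. Using Identity for m is an assumption made purely for illustration (real sources usually run in IO), and the names greeting and greetingLength are hypothetical.

```haskell
import Data.Functor.Identity (Identity, runIdentity)
import qualified Streaming.ByteString.Char8 as Q

-- m = Identity: no effects are needed to produce the next chunk.
-- r = (): nothing is returned once the bytes run out.
greeting :: Q.ByteStream Identity ()
greeting = Q.string "hello, streaming"

-- length_ consumes the whole stream and keeps only the byte count.
greetingLength :: Int
greetingLength = runIdentity (Q.length_ greeting)

main :: IO ()
main = print greetingLength
```

Swapping Identity for IO changes nothing in the consumer, which is the point of keeping m abstract.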
You can imagine this type to represent an infinitely-sized collection of bytes, although internally it references a strict ByteString no larger than 32 KB, followed by monadic instructions to fetch further chunks.
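The chunked representation can be observed directly. The sketch below sequences two strict pieces into one stream and then recovers the strict chunks with toChunks. The names twoPieces and pieces are hypothetical, and Identity again stands in for a real effectful source.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Functor.Identity (Identity, runIdentity)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- Two strict pieces sequenced into one logical stream of bytes.
twoPieces :: Q.ByteStream Identity ()
twoPieces = Q.fromStrict (B.pack "abc") >> Q.fromStrict (B.pack "de")

-- toChunks walks the stream and yields each strict chunk it encounters.
pieces :: [B.ByteString]
pieces = runIdentity (S.toList_ (Q.toChunks twoPieces))

main :: IO ()
main = do
  mapM_ B.putStrLn pieces            -- the individual strict chunks
  print (sum (map B.length pieces))  -- total number of bytes
```

Consumers such as length_ never need to see these boundaries; they matter only when handing data to code that expects strict ByteStrings.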
To open a file of any size and count its characters:
import Control.Monad.Trans.Resource (MonadResource, runResourceT)
import qualified Streaming.ByteString.Char8 as Q

-- | Represents a potentially-infinite stream of `Char`.
-- `Q.readFile` needs a `MonadResource` context so the file handle is released.
chars :: MonadResource m => Q.ByteStream m ()
chars = Q.readFile "huge-file.txt"

main :: IO ()
main = runResourceT (Q.length_ chars) >>= print
Note that file IO specifically requires the resourcet library.
In the example above you may have noticed a lack of Of that we usually see with Stream. Our old friend lines hints at this too:
lines :: Monad m => ByteStream m r -> Stream (ByteStream m) m r
A stream-of-streams, yet no Of here either. The return type can't naively be Stream (Of ByteString) m r, since the first line break might be at the very end of a large file. Forcing that into a single strict ByteString could exhaust memory and crash your program.
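Because each line is itself a ByteStream, whole lines can be selected or dropped without ever being materialized. As a rough sketch, the following keeps only the first two lines of an input, much like head -n 2. It assumes takes from the Streaming module and unlines from this library; the names input and firstTwo are hypothetical, and toStrict_ is used only because the sample is tiny.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Functor.Identity (Identity, runIdentity)
import Streaming (takes)
import qualified Streaming.ByteString.Char8 as Q

input :: Q.ByteStream Identity ()
input = Q.string "alpha\nbeta\ngamma"

-- lines splits into a stream of per-line ByteStreams, takes keeps the
-- first two of them, and unlines glues them back together with newlines.
firstTwo :: B.ByteString
firstTwo = runIdentity (Q.toStrict_ (Q.unlines (takes 2 (Q.lines input))))

main :: IO ()
main = B.putStr firstTwo
```

On a real file the final step would typically be a streaming sink such as Q.stdout rather than toStrict_.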
To count the number of lines whose first letter is i:
-- S is Streaming.Prelude, imported qualified.
countOfI :: IO Int
countOfI = runResourceT
  . S.length_                   -- ResourceT IO Int
  . S.filter (== 'i')           -- Stream (Of Char) (ResourceT IO) ()
  . S.concat                    -- Stream (Of Char) (ResourceT IO) ()
  . S.mapped Q.head             -- Stream (Of (Maybe Char)) (ResourceT IO) ()
  . Q.lines                     -- Stream (ByteStream (ResourceT IO)) (ResourceT IO) ()
  $ Q.readFile "huge-file.txt"  -- ByteStream (ResourceT IO) ()
Critically, there are several functions which when combined with mapped can bring us back into Of-land:
head :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
last :: Monad m => ByteStream m r -> m (Of (Maybe Char) r)
null :: Monad m => ByteStream m r -> m (Of Bool r)
count :: Monad m => Char -> ByteStream m r -> m (Of Int r)
toLazy :: Monad m => ByteStream m r -> m (Of ByteString r) -- Lazy ByteString. Be careful with this.
toStrict :: Monad m => ByteStream m r -> m (Of ByteString r) -- Strict ByteString. Be even *more* careful with this.
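Any function of this shape can be handed to mapped. As an illustrative sketch, the following uses length (which also returns an Of pair) to turn a stream of lines into an ordinary stream of line lengths. The names sample and lineLengths are hypothetical, and the input is a small in-memory string rather than a file.

```haskell
import Data.Functor.Identity (Identity, runIdentity)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

sample :: Q.ByteStream Identity ()
sample = Q.string "alpha\nbeta\ngamma"

-- mapped runs Q.length over each line in turn, replacing every inner
-- ByteStream with its length and yielding a Stream (Of Int).
lineLengths :: [Int]
lineLengths = runIdentity (S.toList_ (S.mapped Q.length (Q.lines sample)))

main :: IO ()
main = print lineLengths
```

Only one line is in flight at a time, so memory use does not depend on the number of lines.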
When moving in the opposite direction API-wise, consider:
fromChunks :: Monad m => Stream (Of ByteString) m r -> ByteStream m r
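A small sketch of that direction: a Stream of strict ByteStrings, here produced from a list with each, is fused into a single ByteStream. The chunk contents and the names chunks and rejoined are made up for illustration, and toStrict_ is safe only because the data is tiny.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Functor.Identity (runIdentity)
import qualified Streaming.ByteString.Char8 as Q
import qualified Streaming.Prelude as S

-- Strict chunks whose boundaries do not line up with the line break.
chunks :: [B.ByteString]
chunks = map B.pack ["al", "pha\nbe", "ta"]

-- fromChunks erases the chunk boundaries from the consumer's point of view.
rejoined :: B.ByteString
rejoined = runIdentity (Q.toStrict_ (Q.fromChunks (S.each chunks)))

main :: IO ()
main = B.putStrLn rejoined
```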