Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many functions in D.B.Lazy lack INLINE(ABLE) annotations #335

Open
noughtmare opened this issue Dec 18, 2020 · 10 comments
Open

Many functions in D.B.Lazy lack INLINE(ABLE) annotations #335

noughtmare opened this issue Dec 18, 2020 · 10 comments

Comments

@noughtmare
Copy link
Contributor

noughtmare commented Dec 18, 2020

In particular the substring functions:

-- ---------------------------------------------------------------------
-- Substrings
-- | /O(n\/c)/ 'take' @n@, applied to a ByteString @xs@, returns the prefix
-- of @xs@ of length @n@, or @xs@ itself if @n > 'length' xs@.
take :: Int64 -> ByteString -> ByteString
take i _ | i <= 0 = Empty
take i cs0 = take' i cs0
where take' 0 _ = Empty
take' _ Empty = Empty
take' n (Chunk c cs) =
if n < fromIntegral (S.length c)
then Chunk (S.take (fromIntegral n) c) Empty
else Chunk c (take' (n - fromIntegral (S.length c)) cs)
-- | /O(n\/c)/ 'drop' @n xs@ returns the suffix of @xs@ after the first @n@
-- elements, or @[]@ if @n > 'length' xs@.
drop :: Int64 -> ByteString -> ByteString
drop i p | i <= 0 = p
drop i cs0 = drop' i cs0
where drop' 0 cs = cs
drop' _ Empty = Empty
drop' n (Chunk c cs) =
if n < fromIntegral (S.length c)
then Chunk (S.drop (fromIntegral n) c) cs
else drop' (n - fromIntegral (S.length c)) cs
-- | /O(n\/c)/ 'splitAt' @n xs@ is equivalent to @('take' n xs, 'drop' n xs)@.
splitAt :: Int64 -> ByteString -> (ByteString, ByteString)
splitAt i cs0 | i <= 0 = (Empty, cs0)
splitAt i cs0 = splitAt' i cs0
where splitAt' 0 cs = (Empty, cs)
splitAt' _ Empty = (Empty, Empty)
splitAt' n (Chunk c cs) =
if n < fromIntegral (S.length c)
then (Chunk (S.take (fromIntegral n) c) Empty
,Chunk (S.drop (fromIntegral n) c) cs)
else let (cs', cs'') = splitAt' (n - fromIntegral (S.length c)) cs
in (Chunk c cs', cs'')
-- | Similar to 'P.takeWhile',
-- returns the longest (possibly empty) prefix of elements
-- satisfying the predicate.
takeWhile :: (Word8 -> Bool) -> ByteString -> ByteString
takeWhile f = takeWhile'
where takeWhile' Empty = Empty
takeWhile' (Chunk c cs) =
case findIndexOrEnd (not . f) c of
0 -> Empty
n | n < S.length c -> Chunk (S.take n c) Empty
| otherwise -> Chunk c (takeWhile' cs)
-- | Similar to 'P.dropWhile',
-- drops the longest (possibly empty) prefix of elements
-- satisfying the predicate and returns the remainder.
dropWhile :: (Word8 -> Bool) -> ByteString -> ByteString
dropWhile f = dropWhile'
where dropWhile' Empty = Empty
dropWhile' (Chunk c cs) =
case findIndexOrEnd (not . f) c of
n | n < S.length c -> Chunk (S.drop n c) cs
| otherwise -> dropWhile' cs
-- | Similar to 'P.break',
-- returns the longest (possibly empty) prefix of elements which __do not__
-- satisfy the predicate and the remainder of the string.
--
-- 'break' @p@ is equivalent to @'span' (not . p)@ and to @('takeWhile' (not . p) &&& 'dropWhile' (not . p))@.
--
break :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString)
break f = break'
where break' Empty = (Empty, Empty)
break' (Chunk c cs) =
case findIndexOrEnd f c of
0 -> (Empty, Chunk c cs)
n | n < S.length c -> (Chunk (S.take n c) Empty
,Chunk (S.drop n c) cs)
| otherwise -> let (cs', cs'') = break' cs
in (Chunk c cs', cs'')

But there might also be other functions that can benefit.

@Bodigrim
Copy link
Contributor

Do such functions benefit from inlining? Could this improvement be measured?

@noughtmare
Copy link
Contributor Author

noughtmare commented Dec 18, 2020

@Bodigrim I have made a benchmark here: https://gist.github.com/noughtmare/f2478b9ea7a466d33b3f0185dc51f0dd. That indicates that inlining can play a pretty big role (3x performance improvement) with the dropWhile function. I assume the other functions are similar. So I think they should at least be marked INLINEABLE, but I do not know whether they should be marked INLINE. The strict equivalents are annotated with INLINE, so we could just mirror that for the lazy variants.

EDIT: Now that I look at the strict versions more closely, I think that they are shorter, so it makes more sense to mark them with INLINE. The lazy versions are a bit larger, maybe that is the reason that they have not been marked INILINE. But I think that due to the way that these functions are written in a split style with a non-recursive wrapper and a recursive local function also means that it makes sense to make them inline. Although, I am not an expert on this so I would love some feedback from someone who has more experience.

@Bodigrim
Copy link
Contributor

INLINEABLE itself allows specialization (but dropWhile is already monomorphic), but does not make GHC more inclined to inline. If we see a clear benefit from inlining, it should be marked as INLINE.

Care to take a look at Core as well? I'd like to understand what exactly contributes to x3 speed up.

@noughtmare
Copy link
Contributor Author

noughtmare commented Dec 18, 2020

As I understand it INLINEABLE would still make it possible to inline it if it is imported in another module. Without INLINEABLE I don't think the function is available for inlining at all in other modules.

@noughtmare
Copy link
Contributor Author

noughtmare commented Dec 19, 2020

Care to take a look at Core as well? I'd like to understand what exactly contributes to x3 speed up.

I have looked a bit at it and it seems that the inline version compiles the dropWhile predicate to a fast unboxed inline case expression, while the version that is not inlined needs to box every character and pass it to a separate predicate function.

@Bodigrim
Copy link
Contributor

Ah, I see. This is probably because of

{-# RULES
"ByteString specialise takeWhile (x /=)" forall x.
takeWhile (x /=) = fst . breakByte x
"ByteString specialise takeWhile (/= x)" forall x.
takeWhile (/= x) = fst . breakByte x
"ByteString specialise takeWhile (x ==)" forall x.
takeWhile (x ==) = fst . spanByte x
"ByteString specialise takeWhile (== x)" forall x.
takeWhile (== x) = fst . spanByte x
#-}

Could you try the similar set of rules for lazy dropWhile? It could be a more conservative option than inlining, without a risk of blowing code size.

@noughtmare
Copy link
Contributor Author

So the inline version does not actually take advantage of those rules. Here is the relevant part of the core dump of the inline version:

$wgo4_sciy ww_sciw w_scit
  = case >=# ww_sciw dt2_d9O8 of {
      __DEFAULT ->
        case readWord8OffAddr# (plusAddr# dt_d9O6 ww_sciw) 0# w_scit of
        { (# ipv_a9Rj, ipv1_a9Rk #) ->
        case chr# (word2Int# ipv1_a9Rk) of {
          __DEFAULT -> jump $wgo4_sciy (+# ww_sciw 1#) ipv_a9Rj;
          ' '# -> jump $w$j_scir ipv_a9Rj ww_sciw
        }
        };
      1# -> jump exit_X1m w_scit

You see that it reads off a byte and then directly compares it to the space character, but it does not use one of those rewrite rules. Here is the uninlined version:

$wgo4_schD ww_schB w_schy
  = case >=# ww_schB dt2_d9O5 of {
      __DEFAULT ->
        case readWord8OffAddr# (plusAddr# dt_d9O3 ww_schB) 0# w_schy of
        { (# ipv_a9Rj, ipv1_a9Rk #) ->
        case f_a8ll (C# (chr# (word2Int# ipv1_a9Rk))) of {
          False -> jump $w$j_schw ipv_a9Rj ww_schB;
          True -> jump $wgo4_schD (+# ww_schB 1#) ipv_a9Rj
        }
        };
      1# -> jump exit_X1p w_schy

You can see that it reads off a byte and then passes it to the f_a8ll function which is the predicate argument of dropWhile.

@Bodigrim
Copy link
Contributor

OK, then I suggest we do both, bringing lazy version in line with the strict one: mark dropWhile as INLINE [1] and add rewrite rules.

@Bodigrim
Copy link
Contributor

Ideally both strict and lazy dropWhile (/= ' ') should boil down to a memchr call in Core.

@noughtmare
Copy link
Contributor Author

noughtmare commented Dec 19, 2020

Yes, and I also found out that the D.B.Char8.dropWhile doesn't have the same rules as D.B.dropWhile and it is annotated with INLINE [1], so it also won't inline before applying the rules. In my benchmark I use the Char8 versions, so not even the strict version boils down to memchr.

But anyway the added w2c also means that the rule won't fire, so rules need to be added to the D.B.Char8 (and maybe also D.B.Lazy.Char8) separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants