Many functions in D.B.Lazy lack INLINE(ABLE) annotations #335

noughtmare · 2020-12-18T19:57:54Z

In particular the substring functions:

Lines 676 to 755 in 684047c

    
           -- --------------------------------------------------------------------- 
        
           -- Substrings 
        
           -- | /O(n\/c)/ 'take' @n@, applied to a ByteString @xs@, returns the prefix 
        
           -- of @xs@ of length @n@, or @xs@ itself if @n > 'length' xs@. 
        
           take :: Int64 -> ByteString -> ByteString 
        
           take i _ | i <= 0 = Empty 
        
           take i cs0         = take' i cs0 
        
             where take' 0 _            = Empty 
        
                   take' _ Empty        = Empty 
        
                   take' n (Chunk c cs) = 
        
                     if n < fromIntegral (S.length c) 
        
                       then Chunk (S.take (fromIntegral n) c) Empty 
        
                       else Chunk c (take' (n - fromIntegral (S.length c)) cs) 
        
           -- | /O(n\/c)/ 'drop' @n xs@ returns the suffix of @xs@ after the first @n@ 
        
           -- elements, or @[]@ if @n > 'length' xs@. 
        
           drop  :: Int64 -> ByteString -> ByteString 
        
           drop i p | i <= 0 = p 
        
           drop i cs0 = drop' i cs0 
        
             where drop' 0 cs           = cs 
        
                   drop' _ Empty        = Empty 
        
                   drop' n (Chunk c cs) = 
        
                     if n < fromIntegral (S.length c) 
        
                       then Chunk (S.drop (fromIntegral n) c) cs 
        
                       else drop' (n - fromIntegral (S.length c)) cs 
        
           -- | /O(n\/c)/ 'splitAt' @n xs@ is equivalent to @('take' n xs, 'drop' n xs)@. 
        
           splitAt :: Int64 -> ByteString -> (ByteString, ByteString) 
        
           splitAt i cs0 | i <= 0 = (Empty, cs0) 
        
           splitAt i cs0 = splitAt' i cs0 
        
             where splitAt' 0 cs           = (Empty, cs) 
        
                   splitAt' _ Empty        = (Empty, Empty) 
        
                   splitAt' n (Chunk c cs) = 
        
                     if n < fromIntegral (S.length c) 
        
                       then (Chunk (S.take (fromIntegral n) c) Empty 
        
                            ,Chunk (S.drop (fromIntegral n) c) cs) 
        
                       else let (cs', cs'') = splitAt' (n - fromIntegral (S.length c)) cs 
        
                              in (Chunk c cs', cs'') 
        
           -- | Similar to 'P.takeWhile', 
        
           -- returns the longest (possibly empty) prefix of elements 
        
           -- satisfying the predicate. 
        
           takeWhile :: (Word8 -> Bool) -> ByteString -> ByteString 
        
           takeWhile f = takeWhile' 
        
             where takeWhile' Empty        = Empty 
        
                   takeWhile' (Chunk c cs) = 
        
                     case findIndexOrEnd (not . f) c of 
        
                       0                  -> Empty 
        
                       n | n < S.length c -> Chunk (S.take n c) Empty 
        
                         | otherwise      -> Chunk c (takeWhile' cs) 
        
           -- | Similar to 'P.dropWhile', 
        
           -- drops the longest (possibly empty) prefix of elements 
        
           -- satisfying the predicate and returns the remainder. 
        
           dropWhile :: (Word8 -> Bool) -> ByteString -> ByteString 
        
           dropWhile f = dropWhile' 
        
             where dropWhile' Empty        = Empty 
        
                   dropWhile' (Chunk c cs) = 
        
                     case findIndexOrEnd (not . f) c of 
        
                       n | n < S.length c -> Chunk (S.drop n c) cs 
        
                         | otherwise      -> dropWhile' cs 
        
           -- | Similar to 'P.break', 
        
           -- returns the longest (possibly empty) prefix of elements which __do not__ 
        
           -- satisfy the predicate and the remainder of the string. 
        
           -- 
        
           -- 'break' @p@ is equivalent to @'span' (not . p)@ and to @('takeWhile' (not . p) &&& 'dropWhile' (not . p))@. 
        
           -- 
        
           break :: (Word8 -> Bool) -> ByteString -> (ByteString, ByteString) 
        
           break f = break' 
        
             where break' Empty        = (Empty, Empty) 
        
                   break' (Chunk c cs) = 
        
                     case findIndexOrEnd f c of 
        
                       0                  -> (Empty, Chunk c cs) 
        
                       n | n < S.length c -> (Chunk (S.take n c) Empty 
        
                                             ,Chunk (S.drop n c) cs) 
        
                         | otherwise      -> let (cs', cs'') = break' cs 
        
                                              in (Chunk c cs', cs'')

But there might also be other functions that can benefit.

Bodigrim · 2020-12-18T21:53:16Z

Do such functions benefit from inlining? Could this improvement be measured?

noughtmare · 2020-12-18T22:16:26Z

@Bodigrim I have made a benchmark here: https://gist.github.com/noughtmare/f2478b9ea7a466d33b3f0185dc51f0dd. That indicates that inlining can play a pretty big role (3x performance improvement) with the dropWhile function. I assume the other functions are similar. So I think they should at least be marked INLINEABLE, but I do not know whether they should be marked INLINE. The strict equivalents are annotated with INLINE, so we could just mirror that for the lazy variants.

EDIT: Now that I look at the strict versions more closely, I think that they are shorter, so it makes more sense to mark them with INLINE. The lazy versions are a bit larger, maybe that is the reason that they have not been marked INILINE. But I think that due to the way that these functions are written in a split style with a non-recursive wrapper and a recursive local function also means that it makes sense to make them inline. Although, I am not an expert on this so I would love some feedback from someone who has more experience.

Bodigrim · 2020-12-18T22:28:25Z

INLINEABLE itself allows specialization (but dropWhile is already monomorphic), but does not make GHC more inclined to inline. If we see a clear benefit from inlining, it should be marked as INLINE.

Care to take a look at Core as well? I'd like to understand what exactly contributes to x3 speed up.

noughtmare · 2020-12-18T22:29:29Z

As I understand it INLINEABLE would still make it possible to inline it if it is imported in another module. Without INLINEABLE I don't think the function is available for inlining at all in other modules.

noughtmare · 2020-12-19T11:34:28Z

Care to take a look at Core as well? I'd like to understand what exactly contributes to x3 speed up.

I have looked a bit at it and it seems that the inline version compiles the dropWhile predicate to a fast unboxed inline case expression, while the version that is not inlined needs to box every character and pass it to a separate predicate function.

Bodigrim · 2020-12-19T13:09:50Z

Ah, I see. This is probably because of

bytestring/Data/ByteString.hs

Lines 959 to 968 in 8c631df

    
           {-# RULES 
        
           "ByteString specialise takeWhile (x /=)" forall x. 
        
               takeWhile (x /=) = fst . breakByte x 
        
           "ByteString specialise takeWhile (/= x)" forall x. 
        
               takeWhile (/= x) = fst . breakByte x 
        
           "ByteString specialise takeWhile (x ==)" forall x. 
        
               takeWhile (x ==) = fst . spanByte x 
        
           "ByteString specialise takeWhile (== x)" forall x. 
        
               takeWhile (== x) = fst . spanByte x 
        
             #-}

Could you try the similar set of rules for lazy dropWhile? It could be a more conservative option than inlining, without a risk of blowing code size.

noughtmare · 2020-12-19T14:22:14Z

So the inline version does not actually take advantage of those rules. Here is the relevant part of the core dump of the inline version:

$wgo4_sciy ww_sciw w_scit
  = case >=# ww_sciw dt2_d9O8 of {
      __DEFAULT ->
        case readWord8OffAddr# (plusAddr# dt_d9O6 ww_sciw) 0# w_scit of
        { (# ipv_a9Rj, ipv1_a9Rk #) ->
        case chr# (word2Int# ipv1_a9Rk) of {
          __DEFAULT -> jump $wgo4_sciy (+# ww_sciw 1#) ipv_a9Rj;
          ' '# -> jump $w$j_scir ipv_a9Rj ww_sciw
        }
        };
      1# -> jump exit_X1m w_scit

You see that it reads off a byte and then directly compares it to the space character, but it does not use one of those rewrite rules. Here is the uninlined version:

$wgo4_schD ww_schB w_schy
  = case >=# ww_schB dt2_d9O5 of {
      __DEFAULT ->
        case readWord8OffAddr# (plusAddr# dt_d9O3 ww_schB) 0# w_schy of
        { (# ipv_a9Rj, ipv1_a9Rk #) ->
        case f_a8ll (C# (chr# (word2Int# ipv1_a9Rk))) of {
          False -> jump $w$j_schw ipv_a9Rj ww_schB;
          True -> jump $wgo4_schD (+# ww_schB 1#) ipv_a9Rj
        }
        };
      1# -> jump exit_X1p w_schy

You can see that it reads off a byte and then passes it to the f_a8ll function which is the predicate argument of dropWhile.

Bodigrim · 2020-12-19T14:26:01Z

OK, then I suggest we do both, bringing lazy version in line with the strict one: mark dropWhile as INLINE [1] and add rewrite rules.

Bodigrim · 2020-12-19T16:03:46Z

Ideally both strict and lazy dropWhile (/= ' ') should boil down to a memchr call in Core.

noughtmare · 2020-12-19T16:06:33Z

Yes, and I also found out that the D.B.Char8.dropWhile doesn't have the same rules as D.B.dropWhile and it is annotated with INLINE [1], so it also won't inline before applying the rules. In my benchmark I use the Char8 versions, so not even the strict version boils down to memchr.

But anyway the added w2c also means that the rule won't fire, so rules need to be added to the D.B.Char8 (and maybe also D.B.Lazy.Char8) separately.

sjakobi added the performance label Dec 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Many functions in D.B.Lazy lack INLINE(ABLE) annotations #335

Many functions in D.B.Lazy lack INLINE(ABLE) annotations #335

noughtmare commented Dec 18, 2020 •

edited

Loading

Bodigrim commented Dec 18, 2020

noughtmare commented Dec 18, 2020 •

edited

Loading

Bodigrim commented Dec 18, 2020

noughtmare commented Dec 18, 2020 •

edited

Loading

noughtmare commented Dec 19, 2020 •

edited

Loading

Bodigrim commented Dec 19, 2020

noughtmare commented Dec 19, 2020

Bodigrim commented Dec 19, 2020

Bodigrim commented Dec 19, 2020

noughtmare commented Dec 19, 2020 •

edited

Loading

Many functions in D.B.Lazy lack INLINE(ABLE) annotations #335

Many functions in D.B.Lazy lack INLINE(ABLE) annotations #335

Comments

noughtmare commented Dec 18, 2020 • edited Loading

Bodigrim commented Dec 18, 2020

noughtmare commented Dec 18, 2020 • edited Loading

Bodigrim commented Dec 18, 2020

noughtmare commented Dec 18, 2020 • edited Loading

noughtmare commented Dec 19, 2020 • edited Loading

Bodigrim commented Dec 19, 2020

noughtmare commented Dec 19, 2020

Bodigrim commented Dec 19, 2020

Bodigrim commented Dec 19, 2020

noughtmare commented Dec 19, 2020 • edited Loading

noughtmare commented Dec 18, 2020 •

edited

Loading

noughtmare commented Dec 18, 2020 •

edited

Loading

noughtmare commented Dec 18, 2020 •

edited

Loading

noughtmare commented Dec 19, 2020 •

edited

Loading

noughtmare commented Dec 19, 2020 •

edited

Loading