LiquidHaskell integration #288

Bodigrim · 2020-09-26T12:17:42Z

Given recent improvements to LiquidHaskell, how difficult and how desirable would it be to integrate liquid-bytestring into bytestring package itself?

CC @ranjitjhala

The text was updated successfully, but these errors were encountered:

ranjitjhala · 2020-09-26T16:18:51Z

Very interesting idea! I think the best way is to try add the signatures over in a local branch and see how that works. Maybe I will try to do this at some point over the next few weeks..

vdukhovni · 2020-10-01T23:41:07Z

What new duties will that impose on ongoing maintainers of bytestring? How familiar will one have to be with liquid-haskell? Will one have to have access to the relevant toolchain? Will CI support performing the additional compile-time tests?

Is there any runtime cost?

Bodigrim · 2020-10-03T12:41:28Z

@vdukhovni good questions!

AFAIU there is no associated runtime cost. My understanding is that if we manage to install Z3 on CI (which is hopefully just apt-get install z3), Liquid Haskell as a GHC plugin should not require any additional toolchain.

From bytestring perspective I am willing to pay more maintenance costs if it helps to avoid embarassing off-by-one bugs like #204. But a hands-on study is needed before we make any decisions.

ranjitjhala · 2020-10-03T14:53:16Z

Yes these are exactly the right questions. There is no runtime cost as the LH additions should only be comments. I think I’ll try to see what it would look like to push the code through LH and then maybe some of these costs will be clearer!

…

On Sat, Oct 3, 2020 at 5:41 AM Bodigrim ***@***.***> wrote: @vdukhovni <https://urldefense.com/v3/__https://github.com/vdukhovni__;!!Mih3wA!V0cLeA8tecyHNZfNgMIKV_5-fe8pK0rL-PoSyw1_eHXsN-tvp92C9jPXRCIcqcRk$> good questions! AFAIU there is no associated runtime cost. My understanding is that if we manage to install Z3 on CI (which is hopefully just apt-get install z3 <https://urldefense.com/v3/__https://packages.ubuntu.com/focal/z3__;!!Mih3wA!V0cLeA8tecyHNZfNgMIKV_5-fe8pK0rL-PoSyw1_eHXsN-tvp92C9jPXRM3QiEE4$>), Liquid Haskell as a GHC plugin should not require any additional toolchain. From bytestring perspective I am willing to pay more maintenance costs if it helps to avoid embarassing off-by-one bugs like #204 <https://urldefense.com/v3/__https://github.com/haskell/bytestring/pull/204__;!!Mih3wA!V0cLeA8tecyHNZfNgMIKV_5-fe8pK0rL-PoSyw1_eHXsN-tvp92C9jPXRJjIXhG-$>. But a hands-on study is needed before we make any decisions. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://urldefense.com/v3/__https://github.com/haskell/bytestring/issues/288*issuecomment-703097566__;Iw!!Mih3wA!V0cLeA8tecyHNZfNgMIKV_5-fe8pK0rL-PoSyw1_eHXsN-tvp92C9jPXRHii91-q$>, or unsubscribe <https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AAMS4OEYRJZSNFFBSROE2UDSI4LYJANCNFSM4R23U5ZA__;!!Mih3wA!V0cLeA8tecyHNZfNgMIKV_5-fe8pK0rL-PoSyw1_eHXsN-tvp92C9jPXRDCn7UnQ$> .

vdukhovni · 2020-10-03T21:41:33Z

From bytestring perspective I am willing to pay more maintenance costs if it helps to avoid embarassing off-by-one bugs like #204. But a hands-on study is needed before we make any decisions.

Yes, this could catch some things like #204, though catching #204 itself would be rather difficult, it wasn't violating any bounds, it was rather refusing to use the entire provided buffer. The properties for that would be non-trivial to state, and would probably require extensive code refactoring... So I expect that #204 would still happen with LH, but other issues could be caught.

The main annoying footgun in ByteString is the legacy offset that users can forget to add to the pointer when using PS and withForeignPtr. Fortunately, with 0.11 the offset will always be zero. Otherwise, I think ByteString has a pretty good safety record. So it is not obvious that expending the cycles here is the best return on investment, there may be other libraries where the need is greater?

Bodigrim · 2020-10-03T21:57:02Z

Looking at Liquid Haskell annotations at https://github.com/ucsd-progsys/liquidhaskell/blob/develop/liquid-bytestring/src/Data/ByteString.spec, I would not say that they are terribly brain-damaging or particularly difficult to maintain. Actually, they look useful even without any prover machinery, just for documentation purposes.

Another point is that users of bytestring will be able to rely on exported Liquid Haskell annotations in their code as well.

ranjitjhala · 2020-11-23T23:06:13Z

Hi all,

I have been working on this for the past few days, and making solid progress.

However, LH raised an error that I'm not sure how to deal with.

Recall that the invariant for the representation of Lazy.ByteString requires
that each chunk be a non-empty strict bytestring.

https://github.com/haskell/bytestring/blob/master/Data/ByteString/Lazy/Internal.hs#L172-L178

-- | The data type invariant:
-- Every ByteString is either 'Empty' or consists of non-null 'S.ByteString's.
-- All functions must preserve this, and the QC properties must check this.
--
invariant :: ByteString -> Bool
invariant Empty                     = True
invariant (Chunk (S.BS _ len) cs) = len > 0 && invariant cs

Consider the implementation of Data.ByteString.foldl1

https://github.com/haskell/bytestring/blob/master/Data/ByteString/Lazy.hs#L465

The code says:

-- | 'foldl1' is a variant of 'foldl' that has no starting value
-- argument, and thus must be applied to non-empty 'ByteString's.
foldl1 :: (Word8 -> Word8 -> Word8) -> ByteString -> Word8
foldl1 _ Empty        = errorEmptyList "foldl1"
foldl1 f (Chunk c cs) = foldl f (S.unsafeHead c) (Chunk (S.unsafeTail c) cs)

Now, AFAICT, the S.unsafeTail will return the tail of the chunk c
but this may well be empty, and so the Chunk (S.unsafeTail c) ...
can violate the invariant?

Can someone explain what I am missing?

Should the Chunk instead use the "smart constructor" chunk?

Thanks!

Bodigrim · 2020-11-23T23:17:36Z

Yes, several function in Data.ByteString.Lazy violate the internal invariant. I discovered this, when @Cmdv was rewriting unsafeHead / unsafeTail to uncons: #304 (comment)

The violation is probably harmless, because folds keep invalid strings to themselves and do not return them outside. Nevertheless, it is still ugly and I'd prefer to fix it, if performance impact is negligible. I'd be happy to accept a PR, accompanied by careful analysis of Core and potentially benchmarks.

vdukhovni · 2020-11-24T03:04:26Z

I think this does not require a fix. The whole point of the invariant is to permit such "sloppy" code to not have to check pre-conditions. All's well, provided the non-canonical lazy bytestrings don't leak out of the library.

[ Similar observations might be made about code that handles the first element of a non-empty list, and then its tail. ]
The foldl here is internal to the lazy bytestring modules and iterates over the bytes of each chunk, never exposing a lazy bytestring with an empty chunk to any user-provided functions.

This optimisation saves redundant tests/branches in the by far most common case where chunks are longer than 1 byte.

Other cases may be "ugly", but this one is just fine IMHO.

vdukhovni · 2020-11-24T03:29:48Z

By the way, while looking at how empty chunks are handled in lines, I notice that the strictness comment is perhaps outdated:

{-
This function is too strict!  Consider,

> prop_lazy =
    (L.unpack . head . lazylines $ L.append (L.pack "a\nb\n") (error "failed"))
  ==
    "a"

fails.  Here's a properly lazy version of 'lines' for lazy bytestrings

    lazylines           :: L.ByteString -> [L.ByteString]
    lazylines s
        | L.null s  = []
        | otherwise =
            let (l,s') = L.break ((==) '\n') s
            in l : if L.null s' then []
                                else lazylines (L.tail s')

we need a similarly lazy, but efficient version.
-}

and yet, everything seems to be working as lazily as I'd expect:

$ ghci
GHCi, version 8.10.2: https://www.haskell.org/ghc/  :? for help
λ> import qualified Data.ByteString.Lazy.Char8  as L
λ> (L.unpack . head . L.lines $ L.append (L.pack "a\nb\n") (error "failed"))
"a"
λ> (L.unpack . head . tail . L.lines $ L.append (L.pack "a\nb\n") (error "failed"))
"b"
λ> (L.unpack . head . tail . tail . L.lines $ L.append (L.pack "a\nb\n") (error "failed"))
"*** Exception: failed
CallStack (from HasCallStack):
  error, called at <interactive>:4:72 in interactive:Ghci3

vdukhovni · 2020-11-24T03:40:38Z

It looks like that comment was added on 2008-01-04 and a commit to resolve the issue was added a day later on 2008-01-05, but the comment was never removed.

Bodigrim · 2020-11-24T22:03:58Z

@ranjitjhala do you possibly have a list of functions violating the invariant?

@vdukhovni for foldl1 the shortcut saves at most O(1) time (or less, because we have to evaluate S.foldl Empty). It is worth to benchmark, but I do not expect this to have any measurable impact.

Agreed about the obsolete lines comment, I'd appreciate a PR.

ranjitjhala · 2020-11-24T22:16:30Z

@Bodigrim -- I am still working through the modules, but I can gather up such a list. With LH the way this works is, you define the invariant as part of the constructor, so:

{-@ data ByteString
      = Empty
      | Chunk { lbHead :: {v:S.ByteString | 0 < bsLen v }, lbTail :: ByteString }
  @-}

and now LH will complain about all those places where the Chunk constructor is used with a possibly empty strict ByteString. Of course LH can prove that in the vast majority of places the constructor is "safe" but I am making a
list of the places where the invariant doesn't seem to hold.

Will post as soon as done!

Bodigrim added the discussion/rfc label Sep 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LiquidHaskell integration #288

LiquidHaskell integration #288

Bodigrim commented Sep 26, 2020 •

edited

Loading

ranjitjhala commented Sep 26, 2020 •

edited

Loading

vdukhovni commented Oct 1, 2020 •

edited

Loading

Bodigrim commented Oct 3, 2020

ranjitjhala commented Oct 3, 2020 via email

vdukhovni commented Oct 3, 2020

Bodigrim commented Oct 3, 2020

ranjitjhala commented Nov 23, 2020 •

edited

Loading

Bodigrim commented Nov 23, 2020

vdukhovni commented Nov 24, 2020 •

edited

Loading

vdukhovni commented Nov 24, 2020

vdukhovni commented Nov 24, 2020 •

edited

Loading

Bodigrim commented Nov 24, 2020

ranjitjhala commented Nov 24, 2020

LiquidHaskell integration #288

LiquidHaskell integration #288

Comments

Bodigrim commented Sep 26, 2020 • edited Loading

ranjitjhala commented Sep 26, 2020 • edited Loading

vdukhovni commented Oct 1, 2020 • edited Loading

Bodigrim commented Oct 3, 2020

ranjitjhala commented Oct 3, 2020 via email

vdukhovni commented Oct 3, 2020

Bodigrim commented Oct 3, 2020

ranjitjhala commented Nov 23, 2020 • edited Loading

Bodigrim commented Nov 23, 2020

vdukhovni commented Nov 24, 2020 • edited Loading

vdukhovni commented Nov 24, 2020

vdukhovni commented Nov 24, 2020 • edited Loading

Bodigrim commented Nov 24, 2020

ranjitjhala commented Nov 24, 2020

Bodigrim commented Sep 26, 2020 •

edited

Loading

ranjitjhala commented Sep 26, 2020 •

edited

Loading

vdukhovni commented Oct 1, 2020 •

edited

Loading

ranjitjhala commented Nov 23, 2020 •

edited

Loading

vdukhovni commented Nov 24, 2020 •

edited

Loading

vdukhovni commented Nov 24, 2020 •

edited

Loading