Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LiquidHaskell integration #288

Open
Bodigrim opened this issue Sep 26, 2020 · 13 comments
Open

LiquidHaskell integration #288

Bodigrim opened this issue Sep 26, 2020 · 13 comments

Comments

@Bodigrim
Copy link
Contributor

Bodigrim commented Sep 26, 2020

Given recent improvements to LiquidHaskell, how difficult and how desirable would it be to integrate liquid-bytestring into bytestring package itself?

CC @ranjitjhala

@ranjitjhala
Copy link

ranjitjhala commented Sep 26, 2020

Very interesting idea! I think the best way is to try add the signatures over in a local branch and see how that works. Maybe I will try to do this at some point over the next few weeks..

@vdukhovni
Copy link
Contributor

vdukhovni commented Oct 1, 2020

What new duties will that impose on ongoing maintainers of bytestring? How familiar will one have to be with liquid-haskell? Will one have to have access to the relevant toolchain? Will CI support performing the additional compile-time tests?

Is there any runtime cost?

@Bodigrim
Copy link
Contributor Author

Bodigrim commented Oct 3, 2020

@vdukhovni good questions!

AFAIU there is no associated runtime cost. My understanding is that if we manage to install Z3 on CI (which is hopefully just apt-get install z3), Liquid Haskell as a GHC plugin should not require any additional toolchain.

From bytestring perspective I am willing to pay more maintenance costs if it helps to avoid embarassing off-by-one bugs like #204. But a hands-on study is needed before we make any decisions.

@ranjitjhala
Copy link

ranjitjhala commented Oct 3, 2020 via email

@vdukhovni
Copy link
Contributor

From bytestring perspective I am willing to pay more maintenance costs if it helps to avoid embarassing off-by-one bugs like #204. But a hands-on study is needed before we make any decisions.

Yes, this could catch some things like #204, though catching #204 itself would be rather difficult, it wasn't violating any bounds, it was rather refusing to use the entire provided buffer. The properties for that would be non-trivial to state, and would probably require extensive code refactoring... So I expect that #204 would still happen with LH, but other issues could be caught.

The main annoying footgun in ByteString is the legacy offset that users can forget to add to the pointer when using PS and withForeignPtr. Fortunately, with 0.11 the offset will always be zero. Otherwise, I think ByteString has a pretty good safety record. So it is not obvious that expending the cycles here is the best return on investment, there may be other libraries where the need is greater?

@Bodigrim
Copy link
Contributor Author

Bodigrim commented Oct 3, 2020

Looking at Liquid Haskell annotations at https://github.com/ucsd-progsys/liquidhaskell/blob/develop/liquid-bytestring/src/Data/ByteString.spec, I would not say that they are terribly brain-damaging or particularly difficult to maintain. Actually, they look useful even without any prover machinery, just for documentation purposes.

Another point is that users of bytestring will be able to rely on exported Liquid Haskell annotations in their code as well.

@ranjitjhala
Copy link

ranjitjhala commented Nov 23, 2020

Hi all,

I have been working on this for the past few days, and making solid progress.

However, LH raised an error that I'm not sure how to deal with.

Recall that the invariant for the representation of Lazy.ByteString requires
that each chunk be a non-empty strict bytestring.

https://github.com/haskell/bytestring/blob/master/Data/ByteString/Lazy/Internal.hs#L172-L178

-- | The data type invariant:
-- Every ByteString is either 'Empty' or consists of non-null 'S.ByteString's.
-- All functions must preserve this, and the QC properties must check this.
--
invariant :: ByteString -> Bool
invariant Empty                     = True
invariant (Chunk (S.BS _ len) cs) = len > 0 && invariant cs

Consider the implementation of Data.ByteString.foldl1

https://github.com/haskell/bytestring/blob/master/Data/ByteString/Lazy.hs#L465

The code says:

-- | 'foldl1' is a variant of 'foldl' that has no starting value
-- argument, and thus must be applied to non-empty 'ByteString's.
foldl1 :: (Word8 -> Word8 -> Word8) -> ByteString -> Word8
foldl1 _ Empty        = errorEmptyList "foldl1"
foldl1 f (Chunk c cs) = foldl f (S.unsafeHead c) (Chunk (S.unsafeTail c) cs)

Now, AFAICT, the S.unsafeTail will return the tail of the chunk c
but this may well be empty, and so the Chunk (S.unsafeTail c) ...
can violate the invariant?

Can someone explain what I am missing?

Should the Chunk instead use the "smart constructor" chunk?

Thanks!

@Bodigrim
Copy link
Contributor Author

Yes, several function in Data.ByteString.Lazy violate the internal invariant. I discovered this, when @Cmdv was rewriting unsafeHead / unsafeTail to uncons: #304 (comment)

The violation is probably harmless, because folds keep invalid strings to themselves and do not return them outside. Nevertheless, it is still ugly and I'd prefer to fix it, if performance impact is negligible. I'd be happy to accept a PR, accompanied by careful analysis of Core and potentially benchmarks.

@vdukhovni
Copy link
Contributor

vdukhovni commented Nov 24, 2020

I think this does not require a fix. The whole point of the invariant is to permit such "sloppy" code to not have to check pre-conditions. All's well, provided the non-canonical lazy bytestrings don't leak out of the library.

[ Similar observations might be made about code that handles the first element of a non-empty list, and then its tail. ]
The foldl here is internal to the lazy bytestring modules and iterates over the bytes of each chunk, never exposing a lazy bytestring with an empty chunk to any user-provided functions.

This optimisation saves redundant tests/branches in the by far most common case where chunks are longer than 1 byte.

Other cases may be "ugly", but this one is just fine IMHO.

@vdukhovni
Copy link
Contributor

By the way, while looking at how empty chunks are handled in lines, I notice that the strictness comment is perhaps outdated:

{-
This function is too strict!  Consider,

> prop_lazy =
    (L.unpack . head . lazylines $ L.append (L.pack "a\nb\n") (error "failed"))
  ==
    "a"

fails.  Here's a properly lazy version of 'lines' for lazy bytestrings

    lazylines           :: L.ByteString -> [L.ByteString]
    lazylines s
        | L.null s  = []
        | otherwise =
            let (l,s') = L.break ((==) '\n') s
            in l : if L.null s' then []
                                else lazylines (L.tail s')

we need a similarly lazy, but efficient version.
-}

and yet, everything seems to be working as lazily as I'd expect:

$ ghci
GHCi, version 8.10.2: https://www.haskell.org/ghc/  :? for help
λ> import qualified Data.ByteString.Lazy.Char8  as L
λ> (L.unpack . head . L.lines $ L.append (L.pack "a\nb\n") (error "failed"))
"a"
λ> (L.unpack . head . tail . L.lines $ L.append (L.pack "a\nb\n") (error "failed"))
"b"
λ> (L.unpack . head . tail . tail . L.lines $ L.append (L.pack "a\nb\n") (error "failed"))
"*** Exception: failed
CallStack (from HasCallStack):
  error, called at <interactive>:4:72 in interactive:Ghci3

@vdukhovni
Copy link
Contributor

vdukhovni commented Nov 24, 2020

It looks like that comment was added on 2008-01-04 and a commit to resolve the issue was added a day later on 2008-01-05, but the comment was never removed.

@Bodigrim
Copy link
Contributor Author

@ranjitjhala do you possibly have a list of functions violating the invariant?

@vdukhovni for foldl1 the shortcut saves at most O(1) time (or less, because we have to evaluate S.foldl Empty). It is worth to benchmark, but I do not expect this to have any measurable impact.

Agreed about the obsolete lines comment, I'd appreciate a PR.

@ranjitjhala
Copy link

@Bodigrim -- I am still working through the modules, but I can gather up such a list. With LH the way this works is, you define the invariant as part of the constructor, so:

{-@ data ByteString
      = Empty
      | Chunk { lbHead :: {v:S.ByteString | 0 < bsLen v }, lbTail :: ByteString }
  @-}

and now LH will complain about all those places where the Chunk constructor is used with a possibly empty strict ByteString. Of course LH can prove that in the vast majority of places the constructor is "safe" but I am making a
list of the places where the invariant doesn't seem to hold.

Will post as soon as done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants