-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new Dhall.Map
module
#612
Conversation
This change drops the dependency on `insert-ordered-containers` in favor of a new `Dhall.Map` module which implies a limited subset of the functionality we need. In addition to simplifying the dependency tree this also improves the performance of the `deep-nested-large-record` benchmark by 2.4x: Before: ``` time 1.412 s (1.388 s .. 1.436 s) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.398 s (1.387 s .. 1.404 s) std dev 10.02 ms (2.502 ms .. 13.39 ms) variance introduced by outliers: 19% (moderately inflated) ``` After: ``` time 582.2 ms (552.5 ms .. 628.4 ms) 0.999 R² (0.999 R² .. 1.000 R²) mean 565.9 ms (560.3 ms .. 574.6 ms) std dev 8.901 ms (1.218 ms .. 11.75 ms) variance introduced by outliers: 19% (moderately inflated) ```
@Gabriel439 Thanks! Yeah, @FintanH tried this and we saved a significant amount of time (50–60s) on one of our benchmarks. Our branch that just eliminates insert-order completely (plain Map, no sorting) shaves like another 50–60s, but at a loss of the normalized ordering (and @FintanH may have more details to share, since he’s been doing all our benchmarking. |
@sellout: Just out of curiosity: what is the total running time? (or, equivalently, what is the fractional performance improvement?) |
@Gabriel439 @sellout Hey! So I did a couple of runs today and noticed the speedup was even better. It went from ~190s to ~100s. I've been adding some tweaks like inlining and I'm seeing some promising feedback. Just witnessed a run that was 55s 😁 |
So, actually, my numbers might be off – we have a lot of benchmark numbers floating around at the moment. But this branch gives us a 102s run. The previous time I see (including the Pinging @FintanH to verify. Other than this Map stuff (which is rapidly improving) our other bottlenecks look a lot like
the normalization cases are for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting! This is probably a big deal for dhall-to-cabal
too.
I wonder if we ought to be explicitly property-testing Dhall.Map
?
> union = unionWith (\v _ -> v) | ||
-} | ||
union :: Ord k => Map k v -> Map k v -> Map k v | ||
union (Map mL ksL) (Map mR ksR) = Map m ks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not define in terms of unionWith
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quasicomputational: This was just in case Data.Map.union
was more efficient than Data.Map.unionWith const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, good thinking. Having just looked at the code from containers
I'd be moderately surprised if they differ after inlining, but it's uncertain enough that duplicating is probably OK.
> intersection = intersectionWith (\v _ -> v) | ||
-} | ||
intersection :: Ord k => Map k a -> Map k b -> Map k a | ||
intersection (Map mL ksL) (Map mR _) = Map m ks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, why not define in terms of intersectionWith
?
src/Dhall/Map.hs
Outdated
ks = nub (map fst kvs) | ||
|
||
nub :: Ord k => [k] -> [k] | ||
nub = go Data.Set.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest renaming to nubOrd
so that no-one else gets this confused with Data.List.nub
.
This is done primarily to avoid a dependency on @insert-ordered-containers@ | ||
and also to improve performance | ||
-} | ||
data Map k v = Map (Data.Map.Map k v) [k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seq
instead of a list improves the asymptotic of union
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to add a benchmark first to double-check this because sometimes the better per-element constant factors of lists outweigh the asymptotic penalty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quasicomputational: So I added a new benchmark that exercises Dhall.Map.union
and the performance difference between []
and Seq
is indistinguishable:
With []
:
benchmarking issue 412
time 666.6 ms (643.0 ms .. 680.4 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 668.7 ms (664.5 ms .. 672.4 ms)
std dev 4.518 ms (3.886 ms .. 4.976 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking union performance
time 399.0 ms (378.1 ms .. 429.2 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 399.1 ms (393.0 ms .. 403.2 ms)
std dev 5.819 ms (2.832 ms .. 7.247 ms)
variance introduced by outliers: 19% (moderately inflated)
With Seq
:
benchmarking issue 412
time 669.0 ms (628.7 ms .. 702.0 ms)
1.000 R² (0.998 R² .. 1.000 R²)
mean 673.7 ms (668.5 ms .. 678.3 ms)
std dev 5.797 ms (2.593 ms .. 7.541 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking union performance
time 409.8 ms (369.2 ms .. 441.9 ms)
0.999 R² (0.996 R² .. 1.000 R²)
mean 402.8 ms (393.8 ms .. 409.7 ms)
std dev 9.250 ms (3.117 ms .. 11.40 ms)
variance introduced by outliers: 19% (moderately inflated)
... so I'll go with []
for now since it's simpler and leads to fewer conversions for other operations
src/Dhall/TypeCheck.hs
Outdated
instance Monoid b => Monoid (Process a b) where | ||
mempty = Process (pure mempty) | ||
|
||
mappend = (<>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to break on GHC 8.2 because Semigroup =/> Monoid
.
deriving (Data, Eq, Foldable, Functor, Show, Traversable) | ||
|
||
instance Ord k => Data.Semigroup.Semigroup (Map k v) where | ||
(<>) = union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might I suggest strategically avoiding defining Semigroup
and Monoid
for Map
? These forgetful instances often do the wrong thing for Data.Map.Map
; if you want the left-biased behaviour, it's often best to write union
explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@quasicomputational: Yeah, I'm fine with removing these instances and using union
or a newly-added Dhall.Map.empty
explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, now I remember why I added it. Dhall.RecordType
depends on Map
being a Monoid
. I could add a newtype wrapper there just for this purpose, though
... as suggested by @quasicomputational
Ya, so I was able to go from that ~100s to ~55s with the changes here: https://github.com/FintanH/dhall-haskell/tree/fintan/new-map Unfortunately it seems we're still choking our more complicated dhall expression. You can get an idea of the hot points from the profile here (had to cut the processing short) and grep for "TYPECHECK": |
To shed some more light on the context of the above profile. There are a lot of unions in our AST and these grow every time we use a constructor. We think that's where the bottle neck is but how to optimize that is the million dollar question. |
... as suggested by @quasicomputational I also made a change to not expose the implementation of `RecordType` and `RecordInputType` since exposing the internals is not necessary and frees us to make changes like these safely in the future
... as suggested by @quasicomputational
@FintanH: Give me a million dollars and I can fix it 😉 In all seriousness, the next major step is just being more intelligent about where we are strict and lazy in normalization. The logical conclusion of this is implementing the algorithm described in this book, since that algorithm prevents duplication of this sort of work: https://dl.acm.org/citation.cfm?id=320040 Until that happens we might be able to play tricks by delaying expanding |
Hi, i've tried to build this pr with C:\dhall-haskell\src\Dhall\TypeCheck.hs:3622:12:
Ambiguous occurrence ‘<>’
It could refer to either ‘Data.Monoid.<>’, imported from ‘Data.Monoid’ at src\Dhall\TypeCheck.hs:28:21-24
or ‘Pretty.<>’, imported from ‘Data.Semigroup’ at src\Dhall\TypeCheck.hs:30:24-36 |
For lts-6 the fix seems to be removing |
Another possible caveat: |
src/Dhall/Map.hs
Outdated
and also to improve performance | ||
-} | ||
data Map k v = Map (Data.Map.Map k v) [k] | ||
deriving (Data, Eq, Foldable, Functor, Show, Traversable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use the Eq
instance? That's semi-dubious for this type, since we can reasonably want either strict equality of key order, or equality up to key order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used by Expr
needing an Eq
instance, since Record
and Union
are Map
s under the hood
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually going to lead to changed behaviour, then: the Eq
instance for InsOrdHashMap
is equality up to key order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure that I follow. Would Eq
not check that the list is in order as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look at the definition of the instances for InsOrdHashMap
: it totally tosses away the indices of the elements. That's the code Dhall's currently using, meaning Record
and Union
compare for equality up to key insertion order, but this PR is going to change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see it only compares values inserted into the underlying HashMap
. Maybe I'm misunderstanding something? :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what the InsOrdHashMap
instance does. However, the derived Dhall.Map.Map
instance is going to check for key ordering as well. To match the previous behaviour, we'd need to write something like:
instance (Eq k, Eq v) => Eq (Map k v) where
Map a _ == Map b _ = a == b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right right, I've done that here actually 😄 https://github.com/FintanH/dhall-haskell/blob/fintan/new-map/src/Dhall/Map.hs#L55-L57
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FintanH @quasicomputational: I prefer to use the new order-sensitive Eq
instance. In other words, I view a Dhall.Map.Map
as an ordered association list that just happens to provide better time complexity for element lookup.
The main reason why is that this is more consistent with the standard, which no longer requires that record/union equality is order insensitive (See: dhall-lang/dhall-lang#223). Instead, whenever order-insensitivity is required the standard requires an explicit β-normalization step which (among other things) sorts the records/unions.
import Control.Exception (Exception) | ||
import Data.Data (Data(..)) | ||
import Data.Foldable (forM_, toList) | ||
import Data.Monoid ((<>)) | ||
import Data.Sequence (Seq, ViewL(..)) | ||
import Data.Semigroup (Semigroup(..)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an ambiguous occurrence between the <>
coming from here and the one above from Data.Monoid
. I'd recommend removing the one from Data.Monoid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also happens in src/Dhall.hs
. This seems to be just for GHC-8.2 btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ghc-7.10.3 too 😉
* Cleanup * Centralize all `Map`-related operations * Use custom `Map` implementation * Inline and specialise functions * Custom Eq. Use inlinable * Mark inlinable * Remove SCC annotations * mappend from Semigroup * mappend from Semigroup * Clean up of Map, Expr Eq, and TypeCheck
This is done primarily to avoid a dependency on @insert-ordered-containers@ | ||
and also to improve performance | ||
-} | ||
data Map k v = Map (Data.Map.Map k v) [k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could the constructor be exported? it will let users extend it with new ops and automatic class deriving. dhall-to-cabal
would need to add some operators and instances for Semigroup
and Monoid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid exporting the constructor so that we can easily change it if other representations end up being more efficient. I'd prefer to just add whatever dhall-to-cabal
needs, including the Semigroup
/Monoid
instance if necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I write down the list of operations used by dhall-to-cabal here
src/Dhall.hs
Outdated
@@ -90,7 +90,6 @@ import Control.Exception (Exception) | |||
import Control.Monad.Trans.State.Strict | |||
import Data.Functor.Contravariant (Contravariant(..), (>$<)) | |||
import Data.Functor.Contravariant.Divisible (Divisible(..), divided) | |||
import Data.Monoid ((<>)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack build --stack-yaml stack-lts6.yaml
fails with
C:\dhall-haskell\src\Dhall.hs:153:16:
Not in scope: ‘<>’
Perhaps you meant one of these:
data constructor ‘Data.Sequence.:>’ (imported from Data.Sequence),
‘<|>’ (imported from Control.Applicative),
‘<*>’ (imported from Prelude)
Adding it again makes it work and the builds with stack.yaml
and stack-lts11.yaml
(ghc-8.2.2
) work too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems in a previous version the module imported Data.Semigroup
, so it was necessary to remove Dhall.Monoid.(<>)
, but without any of them <>
is out of scope for previous versions of ghc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried searching stackage.org and it seems Data.Semigroup
wasn't available in base
at the time. Is there a reason we're supporting and old lts? If it's to support stack + a certain version of ghc we can just specify which compiler to use in the yaml file instead of the lts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am afraid that etlas, that uses dhall
through dhall-to-etlas is being compiled with lts-6
and afaik it can't be compiled with newer lts's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But adding import Data.Monoid ((<>))
seems to fix it, at least for lts-6
, lts-11
and lts-12
, could it cause some problem now or in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @rahulmutt (owner of etlas) could answer it better than me 😉
The main goal is make dhall the default configuration format for etlas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've currently got a stack build
going with etlas
and this yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, etlas is a subproject of eta itself and we should test the entire build too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. I'll move the discussion to those repos. In the meantime, I did a fix up of this branch for both stack yamls. I'll make a PR against this branch now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fmap f (Note s e1) = Note s (fmap f e1) | ||
fmap f (ImportAlt e1 e2) = ImportAlt (fmap f e1) (fmap f e2) | ||
fmap f (Embed a) = Embed (f a) | ||
{-# INLINABLE fmap #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding a comment noting that this annotation is the whole reason why this instance isn't derived.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance of compiling dhall-to-cabal files has improved significantly 🎆 : C:\dhall-to-cabal>timer "dhall-master < dhall-to-cabal.dhall > null"
00:00:02,44
00:00:23,93
C:\dhall-to-cabal>timer "dhall-newmap < dhall-to-cabal.dhall > null"
00:00:36,72
00:00:48,08 The new profile shows that cost has moved to
|
@jneira: Do those timings mean that this branch got slower compared to |
* Cleanup * Centralize all `Map`-related operations * Use custom `Map` implementation * Inline and specialise functions * Custom Eq. Use inlinable * Mark inlinable * Remove SCC annotations * bump up megaparsec version (#613) * mappend from Semigroup * mappend from Semigroup * Clean up of Map, Expr Eq, and TypeCheck * Add fixes to compilation of stack.yaml and lts-6
@Gabriel439 no, no, sorry for the poor stats but master took ~21 seconds and new_map ~11 seconds |
@FintanH: Alright, I think I've addressed all feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 it
@Gabriel439 is there any plan for implement the Map operations used by dhall-to-cabal (difference, filter,...)? |
@jneira: Oops! Yeah, I can add those |
@jneira: Alright, I've pushed a branch of I'm just going to add some doctests for |
This includes a change to `traverseWithKey_` since the test revealed that it didn't traverse the keys in their original order
@Gabriel439 Not sure if i should comment here but i've detected that performance of actual master version is slightly worse that the pr (at least my branch created from it) jneira@laptop MINGW64 /d/ws/eta/dhall/dhall-to-cabal
$ time dhall-newmap < dhall-to-cabal.dhall > nul
real 0m11.479s
user 0m0.000s
sys 0m0.031s
jneira@laptop MINGW64 /d/ws/eta/dhall/dhall-to-cabal
$ time dhall-master < dhall-to-cabal.dhall > nul
real 0m16.116s
user 0m0.000s
sys 0m0.015s Those results match the benchmarks? |
Profiles are significative:
vs
|
This change drops the dependency on
insert-ordered-containers
in favor of anew
Dhall.Map
module which implements a limited subset of the functionality weneed. In addition to simplifying the dependency tree this also improves
the performance of the
deep-nested-large-record
benchmark by 2.4x:Before:
After: