Performance could be better #40
Unfortunately, unless something can be done, this issue is probably going to force me to switch back to using …
Performance is 10 times worse than …
There's no problem with dlist, as far as I can see, so this profile doesn't tell us where the problem really lies. I tried replacing dlist with Data.Sequence from containers (which is a dependency of this package anyway), and this didn't affect performance significantly. After that, profiling says
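For readers comparing the two structures: a difference list and a Data.Sequence both support cheap appends at the end, which would be consistent with the swap not changing performance much. The sketch below accumulates the same values both ways. It uses a hand-rolled DList newtype (hypothetical, not the dlist package's API) so that it only needs base and containers; viaDList and viaSeq are illustrative names.

```haskell
import qualified Data.Sequence as Seq
import Data.Foldable (toList)
import Data.List (foldl')

-- Hand-rolled difference list (hypothetical; not the dlist package's API).
-- A DList is a function that prepends its contents to whatever list it is given.
newtype DList a = DList ([a] -> [a])

emptyD :: DList a
emptyD = DList id

-- Constant-time snoc: compose one more prepend onto the end of the chain.
-- Each call allocates a new closure.
snocD :: DList a -> a -> DList a
snocD (DList f) x = DList (f . (x :))

-- Materialise the list by applying the composed function to [].
toListD :: DList a -> [a]
toListD (DList f) = f []

-- Accumulate 1..n by repeated snoc using a difference list.
viaDList :: Int -> [Int]
viaDList n = toListD (foldl' snocD emptyD [1 .. n])

-- The same accumulation using Data.Sequence's constant-time (|>).
viaSeq :: Int -> [Int]
viaSeq n = toList (foldl' (Seq.|>) Seq.empty [1 .. n])
```

In GHCi, viaDList 5 and viaSeq 5 both give [1,2,3,4,5]. Note that every snocD allocates a fresh closure, so a difference list that is built up token by token can still show up prominently in a heap profile even when its asymptotics are fine.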
It would be nice to make progress on this issue. Maybe the …
I've had a brief look at the Core of
I also thought that it was weird that
Maybe it would also be helpful to define … I also noticed that this package uses …
https://wg21.link/index.yaml could be used for benchmarking.
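A throughput harness along these lines could be pointed at that file. This is a minimal sketch using only base: throughput is a hypothetical helper, and the real HsYAML entry point would have to be wrapped so that its result is reduced to an Int (for example a node count), because evaluate only forces to weak head normal form. It also counts characters rather than bytes, which only agree for ASCII input.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Measure parser throughput in MB/s (decimal megabytes), using CPU time.
-- 'parse' is a stand-in: wrap the real decoder so its result is an Int that
-- depends on the whole parse, since 'evaluate' only forces to WHNF.
throughput :: (String -> Int) -> String -> IO Double
throughput parse input = do
  -- Force the whole input first so that reading the file is not timed.
  -- This counts Chars, which equals the byte count only for ASCII input.
  n <- evaluate (length input)
  start <- getCPUTime
  _ <- evaluate (parse input)
  end <- getCPUTime
  let secs = fromIntegral (end - start) / 1e12 :: Double -- getCPUTime reports picoseconds
      mb = fromIntegral n / 1e6 :: Double
  -- Guard against a zero-length interval on very small inputs.
  pure (mb / max secs 1e-9)
```

For example, readFile "index.yaml" >>= throughput (length . lines) only measures line splitting; replacing length . lines with a function that decodes the input and then walks the resulting document would measure the parser itself.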
I've given this a spin, building with the following patch:

--- a/HsYAML.cabal
+++ b/HsYAML.cabal
@@ -108,7 +108,7 @@ library
   if !impl(ghc >= 7.10)
     build-depends: nats >= 1.1.2 && < 1.2
 
-  ghc-options: -Wall
+  ghc-options: -Wall -fprof-late
 
 executable yaml-test
   hs-source-dirs: src-test
@@ -133,7 +133,7 @@ executable yaml-test
   else
     buildable: False
 
-  ghc-options: -rtsopts
+  ghc-options: -rtsopts -fprof-late
 
 test-suite tests
   default-language: Haskell2010

I then profiled the following command:
…where

Results:
I'm not quite sure what …

STG for the other top cost centers:
I think some allocations could be avoided by turning …

Lines 260 to 282 in be60400
It might be helpful to compress some of these fields into a bit field. Still, I wonder how far we can get by tweaking the existing code: it's clear that the priority has been full compliance with the YAML spec, and getting good performance out of the same code might be rather tricky.
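As a rough illustration of the bit-field idea, the sketch below packs a small state record into a single Word64 using Data.Bits. The record ScanState and its fields are hypothetical; the actual fields in Lines 260 to 282 in be60400 are not reproduced here, and the bit layout (indent in bits 0 to 31, flags in bits 32 and 33) is an assumption made for the example.

```haskell
import Data.Bits (shiftL, testBit, (.&.), (.|.))
import Data.Word (Word64)

-- Hypothetical unpacked state record; the real fields are not reproduced here.
data ScanState = ScanState
  { ssIndent   :: !Int  -- indentation column, assumed to be in [0, 2^32)
  , ssInFlow   :: !Bool -- inside a flow collection?
  , ssAllowKey :: !Bool -- may a simple key start here?
  } deriving (Eq, Show)

-- Packed form (assumed layout): bits 0-31 indent, bit 32 ssInFlow, bit 33 ssAllowKey.
newtype Packed = Packed Word64
  deriving (Eq, Show)

packState :: ScanState -> Packed
packState (ScanState ind flow key) =
  Packed ((fromIntegral ind .&. 0xFFFFFFFF) .|. flag 32 flow .|. flag 33 key)
  where
    -- Set bit i when the flag is True, otherwise contribute nothing.
    flag :: Int -> Bool -> Word64
    flag i b = if b then 1 `shiftL` i else 0

unpackState :: Packed -> ScanState
unpackState (Packed w) =
  ScanState
    { ssIndent   = fromIntegral (w .&. 0xFFFFFFFF)
    , ssInFlow   = testBit w 32
    , ssAllowKey = testBit w 33
    }
```

The motivation: with optimisation, GHC can unpack a strict Int field into the constructor, but each Bool field typically still occupies a pointer-sized slot, so folding several flags into one word shrinks every state value that gets allocated. Whether that actually pays off for this parser would need measuring.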
One pandoc user has run into an issue with a large (100k-line) bibliography in YAML format (for details see jgm/pandoc#6084). Prior to pandoc 2.8, when we used the yaml package, this was handled fairly quickly, but now that we use HsYAML it takes 18 seconds to read the bibliography. I confirmed that the slowdown is due to HsYAML by loading the file in a GHCi session as b and trying …

What are the performance expectations for HsYAML? Have you made any efforts to optimize it? aeson claimed decoding speeds of 46M/sec on a slower machine than mine; this file is 3M. I wouldn't expect YAML parsing to be as fast as JSON parsing, but it would be nice to get into the 4M/sec range (10x slower than aeson).
EDIT: 82G allocated with 1G max residency seems an awful lot to parse a 3M file!
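Putting the numbers from this comment side by side, and taking M and G as decimal megabytes and gigabytes (aeson's figure was measured on a different machine, so the ratios are only indicative):

$$
\begin{aligned}
\text{current throughput} &= \frac{3\ \text{MB}}{18\ \text{s}} = \tfrac{1}{6}\ \text{MB/s} \approx 0.17\ \text{MB/s} \\
\text{gap to aeson's claim} &= \frac{46\ \text{MB/s}}{\tfrac{1}{6}\ \text{MB/s}} = 276 \\
\text{time at } 4\ \text{MB/s} &= \frac{3\ \text{MB}}{4\ \text{MB/s}} = 0.75\ \text{s} \quad (24\times \text{ faster than now}) \\
\text{allocation per input byte} &= \frac{82 \times 10^{9}\ \text{B}}{3 \times 10^{6}\ \text{B}} \approx 2.7 \times 10^{4}
\end{aligned}
$$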
Profiling reports these as the biggest cost centers:
Heap profiling shows that the DLists account for a lot of the allocation.