Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

construct huffman table from symbol spans #97

Merged
merged 1 commit into from
Oct 22, 2023

Conversation

oliverlee
Copy link
Collaborator

@oliverlee oliverlee commented Sep 28, 2023

Define class template symbol_span that represents an inclusive range
from first to last. The symbol_bitsize constructor of huffman::table
now takes a range of pair-likes of symbol_span and bitsize, allowin
gthe same bitsize to be associated with a contiguous range of symbols.

For example, the static Huffman tree corresponding to the literal/length
alphabet in section 3.2.6 of the DEFLATE RFC can be defined as:

constexpr auto table =
huffman::table<std::uint16_t, 288>{
huffman::symbol_bitsize,
{{{ 0, 143}, 8},
{{144, 255}, 9},
{{256, 279}, 7},
{{280, 287}, 8}}};

resolves #91

Change-Id: I50f7ea63561de0bb785c85467cdb943829b65edf
Co-Authored-By: Gary Miguel [email protected]

@oliverlee oliverlee force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch 2 times, most recently from c3f3a71 to b8a3e0d Compare September 28, 2023 07:11
@codecov-commenter
Copy link

codecov-commenter commented Sep 28, 2023

Codecov Report

Merging #97 (ac578ee) into master (87d0405) will increase coverage by 2.14%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #97      +/-   ##
==========================================
+ Coverage   58.98%   61.12%   +2.14%     
==========================================
  Files          12       14       +2     
  Lines         885      939      +54     
==========================================
+ Hits          522      574      +52     
- Misses        363      365       +2     
Files Coverage Δ
...ffman/src/detail/flattened_symbol_bitsize_view.hpp 100.00% <100.00%> (ø)
huffman/src/encoding.hpp 100.00% <ø> (ø)
huffman/src/symbol_span.hpp 100.00% <100.00%> (ø)
huffman/src/table.hpp 95.61% <100.00%> (-1.53%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@oliverlee oliverlee force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch 3 times, most recently from 457b6d0 to 9440b84 Compare September 28, 2023 23:57
@oliverlee oliverlee changed the title wip nicer symbol-bitsize ctor construct huffman table from symbol spans Sep 28, 2023
@oliverlee oliverlee force-pushed the Ia125173ce91e2e189c23de0956d9eaa7aa51762c branch from 3cb36c5 to 8bac466 Compare October 5, 2023 00:54
@oliverlee oliverlee force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch 4 times, most recently from 69122c0 to c70f795 Compare October 5, 2023 01:09
@oliverlee oliverlee force-pushed the Ia125173ce91e2e189c23de0956d9eaa7aa51762c branch 2 times, most recently from 3a66b30 to fb781a1 Compare October 7, 2023 03:38
@oliverlee oliverlee force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch 2 times, most recently from 6a09bb7 to ec2ce26 Compare October 7, 2023 03:43
@oliverlee oliverlee force-pushed the Ia125173ce91e2e189c23de0956d9eaa7aa51762c branch from fb781a1 to 3079c1d Compare October 7, 2023 03:43
@garymm garymm force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch from ec2ce26 to efcadda Compare October 12, 2023 04:33
@garymm
Copy link
Owner

garymm commented Oct 12, 2023

I made a couple of very minor changes and pushed. We can discuss the other stuff next time we meet if you want.

huffman/src/symbol_span.hpp Outdated Show resolved Hide resolved
/// An inclusive span of symbols
///
template <symbol S>
class symbol_span : public std::ranges::view_interface<symbol_span<S>>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this and ranges::views::iota?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It pretty much is the same as ranges::iota_view (ranges::views::iota is a function that creates an iota_view).

It just has a slightly different constructor.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the only difference being end is inclusive rather than exclusive?
If so seems like a lot of code for that, but OK.


namespace starflate::huffman::detail {

template <std::ranges::view V>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead add a little complexity to the table constructor and avoid this class?
Could we have a new tag for a constructor to differentiate the constructor that takes symbol-bitsize pairs and the one that takes symbol-range-bitsize pairs?

I'm honestly a bit lost in what's going on here, which doesn't mean it's wrong, but I want to explore if we can make it a bit easier for a noob like me to follow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure let's look into that. I'll give it some thought tomorrow or this weekend.

At a high level, we get a range of ranges:

[
  (a, b, c), // subrange 1
  (d, e),    // subrange 2
  (f, g, h), // subrange 3
  (i),       // subrange 4
]

and we want to flatten to:

[a, b, c, d, e, f, g, h, i]

This view provides a lazy flatten so we don't need to actually construct the flattened range of all the subranges.

Note that the subranges (symbol_spans) don't actually store all the elements but are instead are generators.

We're essentially doing this: https://docs.python.org/3/library/itertools.html#itertools.chain
for a collection of symbol_spans.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some thoughts about this and I'll try to write it out tomorrow.

Copy link
Collaborator Author

@oliverlee oliverlee Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, table constructors take a range and potentially a tag, and pass that range to the table_storage's constructor. The table_storage's constructor is responsible for taking that range and filling up the internal container with "valid enough" codes and symbols. Once the table_storage's constructor is complete, no elements can be added or removed, as members such as resize and emplace_back are not exposed upwards. After table_storage's constructor is complete, we get to the function body of table's constructor which is responsible for building the Huffman tree, setting skip fields, and then canonicalizing the codes. table_storage is responsible for adding elements, table is responsible for moving them around.

I was thinking about passing the symbol-span+count range down to the table_storage constructor and doing nested for loops. If we're able to reuse existing constructors, then we can focus on reviewing the correctness of the range adapter code, which admittedly requires understanding the range abstractions. But I think this will be pretty useful given that this library is focusing on handling streams of data and we'll likely run into other cases where a range abstraction is useful, even after switching to execution on the GPU. I am very much in the no raw loops camp.

I've added more documentation to the flattened_symbol_bitsize_view class template. I've also removed the to_code_symbol function and instead do the equivalent in flattened_symbol_bitsize_view::iterator::operator*(). Hopefully things are a bit clearer but let me know if there's anything I can explain or we can also try another approach.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing that threw me off was the use of concepts to figure out whether to use this flattened_symbol_bitsize_view, and I was suggesting using a constructor tag intead. I was not objecting to the existence of this class.
But I guess from a caller's point of view this is nicer, and you've already figured it out so I guess we can keep it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I misunderstood. But yeah, I'll generally try to keep the interface simpler for the caller at the expense of more complexity in the implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is your understanding of the view class? I don't mind going over it if there are things you don't understand.

@oliverlee oliverlee force-pushed the Ia125173ce91e2e189c23de0956d9eaa7aa51762c branch 2 times, most recently from f596820 to 5360f66 Compare October 13, 2023 06:16
Base automatically changed from Ia125173ce91e2e189c23de0956d9eaa7aa51762c to master October 13, 2023 15:47
@oliverlee oliverlee force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch from efcadda to 811d680 Compare October 18, 2023 04:56
@oliverlee oliverlee requested a review from garymm October 18, 2023 04:56
@garymm garymm force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch from 811d680 to 57ef4a6 Compare October 22, 2023 03:02
Define class template `symbol_span` that represents an inclusive range
from first to last. The `symbol_bitsize` constructor of `huffman::table`
now takes a range of pair-likes of `symbol_span` and bitsize, allowin
gthe same bitsize to be associated with a contiguous range of symbols.

For example, the static Huffman tree corresponding to the literal/length
alphabet in section 3.2.6 of the DEFLATE RFC can be defined as:

 constexpr auto table =
  huffman::table<std::uint16_t, 288>{
    huffman::symbol_bitsize,
    {{{  0, 143}, 8},
     {{144, 255}, 9},
     {{256, 279}, 7},
     {{280, 287}, 8}}};

resolves #91

Change-Id: I50f7ea63561de0bb785c85467cdb943829b65edf
Co-Authored-By: Gary Miguel <[email protected]>
@garymm garymm force-pushed the I50f7ea63561de0bb785c85467cdb943829b65edf branch from 57ef4a6 to ac578ee Compare October 22, 2023 03:06
@garymm garymm merged commit 217076e into master Oct 22, 2023
15 checks passed
@garymm garymm deleted the I50f7ea63561de0bb785c85467cdb943829b65edf branch October 22, 2023 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add constructor to huffman::table that takes symbols and bit lengths
3 participants