Wavelet Tree Module #4813

Open
wants to merge 63 commits into
base: master
Changes from 42 commits
764cbc8
Update Wavelet.mdx
crafticat May 15, 2024
fd171f2
Update Wavelet.mdx
crafticat May 15, 2024
9103e44
Update Wavelet.mdx
crafticat May 16, 2024
0640ad1
Update Wavelet.mdx
crafticat Jun 19, 2024
70f4794
Update Wavelet.problems.json
crafticat Jun 19, 2024
8d77548
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 19, 2024
3c29d69
Requested changes
crafticat Jun 22, 2024
54abaaf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 22, 2024
d367691
Added image
crafticat Jun 22, 2024
0002d9e
Req changes v2.mdx
crafticat Jun 24, 2024
2978c67
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 24, 2024
a49cd77
Update Wavelet.mdx
crafticat Jun 30, 2024
6df8c0b
Update Wavelet.problems.json
crafticat Jun 30, 2024
0379489
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 30, 2024
a2ae4c1
Update Wavelet.mdx
crafticat Jul 7, 2024
76bd734
Update Wavelet.problems.json
crafticat Jul 7, 2024
22d3b84
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 7, 2024
dc752bd
Update Wavelet.mdx
crafticat Jul 9, 2024
8d1c959
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 9, 2024
f4ad194
Update Wavelet.mdx
crafticat Jul 9, 2024
a5df338
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 9, 2024
fd9509e
fix build
ryanchou-dev Jul 11, 2024
c5394c4
solution metadata
ryanchou-dev Jul 11, 2024
b5864e2
format
ryanchou-dev Jul 11, 2024
56a111c
Update Wavelet.mdx
crafticat Jul 25, 2024
10585a2
Update Wavelet.mdx
crafticat Jul 25, 2024
8eb4b89
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 25, 2024
f964801
Update Wavelet.mdx
crafticat Jul 26, 2024
f459084
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 26, 2024
5738143
Update Wavelet.problems.json
crafticat Aug 15, 2024
89c01e0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
27ca815
Update Wavelet.problems.json
crafticat Aug 15, 2024
0dbb8e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2024
3466558
Update content/6_Advanced/Wavelet.mdx
crafticat Aug 21, 2024
cb82a0c
Update content/6_Advanced/Wavelet.mdx
crafticat Aug 21, 2024
1631205
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 21, 2024
75ebdca
Changed problem name.json
crafticat Oct 2, 2024
c49efd4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 2, 2024
fd77752
Changed description.mdx
crafticat Oct 3, 2024
03ddc49
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 3, 2024
5013297
Some minor changes.mdx
crafticat Oct 3, 2024
a3f89eb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 3, 2024
39519df
Changed variable name styles (to snake cases).mdx
crafticat Oct 6, 2024
3a1c37a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 6, 2024
0cd5cfa
Update content/6_Advanced/Wavelet.mdx
crafticat Oct 8, 2024
4ee7ff8
Update content/6_Advanced/Wavelet.mdx
crafticat Oct 8, 2024
43abe71
Update content/6_Advanced/Wavelet.mdx
crafticat Oct 8, 2024
8586c51
Some grammer and latex changes + added motivation
crafticat Oct 8, 2024
3774454
Structure changes + latex errors
crafticat Oct 21, 2024
d36e36f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 21, 2024
920e150
Update content/6_Advanced/Wavelet.mdx
crafticat Oct 25, 2024
e688cb5
Update content/6_Advanced/Wavelet.mdx
crafticat Oct 25, 2024
af86360
Update content/6_Advanced/Wavelet.mdx
crafticat Oct 25, 2024
f5d7c2a
Added code for updates.mdx
crafticat Nov 5, 2024
6f4a74a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
0c64b49
var names changed.mdx
crafticat Nov 5, 2024
ee1f440
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 5, 2024
967677c
Formatting errors and grammar changes .mdx
crafticat Nov 27, 2024
acb53cf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 27, 2024
1983f24
Update Wavelet.mdx
crafticat Nov 27, 2024
9702306
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 27, 2024
b4c4fbf
Latex changes + desc for updates.mdx
crafticat Nov 30, 2024
46be5f5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 30, 2024
4 changes: 2 additions & 2 deletions content/5_Plat/DC-SRQ.problems.json
Expand Up @@ -93,11 +93,11 @@
{
"uniqueId": "cf-840D",
"name": "Destiny",
"url": "https://codeforces.com/problemset/problem/840/D",
"url": "https://codeforces.com/contest/840/problem/D",
"source": "CF",
"difficulty": "Hard",
"isStarred": false,
"tags": [],
"tags": ["Wavelet"],
"solutionMetadata": {
"kind": "autogen-label-from-site",
"site": "CF"
Expand Down
237 changes: 229 additions & 8 deletions content/6_Advanced/Wavelet.mdx
@@ -1,19 +1,17 @@
---
id: wavelet
title: 'Wavelet Tree'
author: Benjamin Qi
author: Benjamin Qi, Omri Jerbi
prerequisites:
- RURQ
description: "?"
description: "Wavelet trees support efficient queries for the k-th minimum element in a range"
frequency: 0
---

## Wavelet Tree
# Wavelet Tree
Wavelet trees are data structures that support efficient queries for the k-th minimum element in a range by maintaining a segment tree over values instead of indices.

<FocusProblem problem="waveletSam" />

Like a segment tree on values rather than indices.

<Resources>
<Resource
source = "IOI"
Expand All @@ -32,9 +30,232 @@ Like a segment tree on values rather than indices.
</Resource>
</Resources>

### Solution - Range K-th Smallest
Suppose you want to support the following queries:
Member

why was this solution header deleted?

if you want to explain how a wavelet tree works before an explanatory problem, move it up

Author

The example problem is essentially a second type of query, but to explain it more clearly, I need first to describe the first type of query

Member

you should make that clear from the start 👀

also there should still be a solution header for the problem. you would probably move the first type of query and its explanation before this header.

Author

If I understood correctly, the structure should be as follows:

  • First, we will discuss a different type of query to clarify the second.
      ◦ Explanation of the first type of query.
  • Solution header followed by the explanation of the second type of query.

Member

you don't have the solution header at the moment—if you think explaining the first query is necessary to understand the second query, i would suggest a structure like so:

  • solution header
  • before we solve this problem, let's consider a simpler ver. where ...
  • explanation of first query
  • bring it back to second query
  • explain
  • conclude the problem
  • implementation

also if you could integrate some of the info blocks in the module that'd be great, right now they kind of interrupt the flow of the content. they generally should only appear once at the beginning or end of a section.

Author

The info blocks were once part of the text, but since it is not necessary to read them to understand the module, I moved them into info blocks. However, I believe they could still help those who need a recap or didn't fully understand what was said.
Also, the structure was changed.


- Given a range $l$, $r$, count the number of occurrences of value $x$.
- Given a range $l$, $r$, find the $k$-th smallest element.

With a wavelet tree, you can support each of these queries in $\mathcal{O}(\log M)$ time,
where $M$ is the maximum value in the array.

## Wavelet tree structure

A wavelet tree is a binary tree where each node represents a range of values.
The root node covers the entire range, and each subsequent level splits the
range into two halves.

We are going to maintain a segment tree over values instead of indices. Each segment will contain the indices whose values lie within the segment's range. We'll save those indices in a vector. Notice that an index appears in exactly one segment per level, so it is stored in $\mathcal{O}(\log M)$ segments in total.
Member

you can combine these two paragraphs since rurq is a prereq

also might be worth leading the reader to why we'd want to segtree over values instead of indices (why?), but it's your choice on how you want to explain it

Author

That's true, but it's a bit different. I want to emphasize that we are maintaining segments over values and explain what that means, so they won't confuse it with a standard segment tree.

as for your second comment I agree that could be useful information, I will try to add it.


<Spoiler title="Wavelet tree Visualization">

Let's say our array is $[3,5,3,1,2,2,3,4,5,5]$.
Each node stores the indices of every element whose value lies between that node's $l$ and $r$.

![Wavelet Tree Visualization](./assets/diagram.png)
</Spoiler>
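
To make the idea of a segment tree over values more concrete, here is a minimal sketch that collects, for a single node, the indices whose values fall in that node's value range. The function name `indices_in_value_range` and the half-open range $[lo, hi)$ are assumptions made for this illustration; they are not part of the module's implementation.

```cpp
#include <vector>

// Hypothetical helper: returns the indices i (in increasing order) with
// lo <= a[i] < hi, i.e. the list that a node covering the values [lo, hi)
// would store.
std::vector<int> indices_in_value_range(const std::vector<int> &a, int lo, int hi) {
	std::vector<int> result;
	for (int i = 0; i < (int)a.size(); i++) {
		if (lo <= a[i] && a[i] < hi) { result.push_back(i); }
	}
	return result;
}
```

For the array above, a node covering the values $[1, 3)$ would hold the 0-indexed positions $3, 4, 5$, while a node covering $[3, 6)$ would hold all the others.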

## Solving the first type of query
**Given a range $l$, $r$, count the number of occurrences of value $x$.**

To calculate the number of occurrences from $l$ to $r$, we can use the following
formula:

$$
\begin{aligned}
\texttt{occurrences}(l, r) = \texttt{occurrences}(r) - \texttt{occurrences}(l)
\end{aligned}
$$

This reduces the problem to counting the number of occurrences in a prefix.

One way to solve the problem is to go to the leaf node for $x$
and binary search for the number of indices less than $r$.
However, let's explore a different approach that can also be extended to the
second type of query.

**A different way to find the position of $r$ in the list of indices**

Instead of binary searching on the leaf, we update $r$ as we recurse down the
tree.
If we can determine the position (index) of $r$ in the left and right
children of a node, we can recurse down the tree and determine its position in
the leaf node.

To find the position of $r$ in a node's left and right children, we need to
determine how many of the elements that precede $r$ have values smaller than
the middle value (mid).
This can be done using a prefix sum.

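Let's define:
- $c[i]$ as $1$ if the value at $index[i]$ is smaller than mid, and $0$ otherwise
- $prefixB[i]$ as the prefix sum of $c$, i.e. the number of elements among the first $i$ whose values are smaller than mid

Formally, where $a$ is the original array,

$$
\begin{aligned}
c[i] &= \begin{cases} 1 & \text{if } a[index[i]] < mid \\ 0 & \text{otherwise} \end{cases} \\
prefixB[i] &= prefixB[i - 1] + c[i], \quad prefixB[0] = 0
\end{aligned}
$$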
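
The definitions above can be sketched as a single pass over a node's list. In this sketch, `node_values` (the values of the node's elements, in index order) and the function name `build_prefix_b` are hypothetical; the result has one more entry than the list so that entry $i$ counts the first $i$ elements.

```cpp
#include <cstddef>
#include <vector>

// prefix_b[i] = number of elements among the first i whose value is < mid.
std::vector<int> build_prefix_b(const std::vector<int> &node_values, int mid) {
	std::vector<int> prefix_b(node_values.size() + 1, 0);
	for (std::size_t i = 0; i < node_values.size(); i++) {
		int c = node_values[i] < mid ? 1 : 0;  // the indicator c from the text
		prefix_b[i + 1] = prefix_b[i] + c;
	}
	return prefix_b;
}
```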
Member

i would suggest getting rid of the formally part if you're just rewriting your statement in code

Author

I thought it might help people who don't fully understand what's written, or make it a bit easier to comprehend.

Member

Suggested change
Let's define:
- $c[i]$ = as 1 if $index[i]$ is smaller than mid otherwise 0
- $prefixB[i]$ as prefix sum of $c[i]$
Formally
$$
c[i] = index[i] < mid ? 1 : 0;
prefixB[i] = prefixB[i - 1] + c[i]
$$
Let's define:
- $c[i]$ = as $1$ if $index[i]$ is smaller than mid otherwise $0$
- $prefixB[i]$ as prefix sum of $c[i]$
Formally
$$
c[i] = index[i] < mid ? 1 : 0;
prefixB[i] = prefixB[i - 1] + c[i]
$$

Hm, to me it feels a little redundant. If you want to keep it, it would be better in a code block instead of LaTeX.

Author

I changed it a bit to use LaTeX formatting. Should we get rid of the first part?



To update $r$ as we recurse down, we do the following:
- If we recurse left, the new value of $r$ is $prefixB[r]$
- If we recurse right, the new value of $r$ is $r - prefixB[r]$
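
The two rules above are enough to answer the first type of query. The following is a minimal sketch, assuming half-open index ranges $[l, r)$, a root that covers the values $[lo, hi)$, and a tree that is built eagerly; the names `CountNode`, `count_prefix`, and `count_in_range` are hypothetical and do not appear in the module's implementation.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node covering the values [lo, hi).
struct CountNode {
	int lo, hi, mid;
	std::vector<int> prefix_b;  // prefix_b[i] = how many of the first i values are < mid
	std::unique_ptr<CountNode> left, right;

	CountNode(int lo, int hi, const std::vector<int> &vals)
	    : lo(lo), hi(hi), mid(lo + (hi - lo) / 2) {
		if (hi - lo <= 1 || vals.empty()) { return; }  // leaf or empty node
		prefix_b.assign(vals.size() + 1, 0);
		std::vector<int> small, large;
		for (std::size_t i = 0; i < vals.size(); i++) {
			prefix_b[i + 1] = prefix_b[i] + (vals[i] < mid ? 1 : 0);
			(vals[i] < mid ? small : large).push_back(vals[i]);
		}
		left = std::make_unique<CountNode>(lo, mid, small);
		right = std::make_unique<CountNode>(mid, hi, large);
	}

	// Occurrences of x among the first r elements of this node's list.
	int count_prefix(int x, int r) const {
		if (r == 0) { return 0; }
		if (hi - lo <= 1) { return r; }  // every element in a leaf equals lo
		if (x < mid) { return left->count_prefix(x, prefix_b[r]); }
		return right->count_prefix(x, r - prefix_b[r]);
	}
};

// Occurrences of x among the positions [l, r) of the original array.
int count_in_range(const CountNode &root, int x, int l, int r) {
	if (x < root.lo || x >= root.hi) { return 0; }
	return root.count_prefix(x, r) - root.count_prefix(x, l);
}
```

Building every non-empty node up front keeps the sketch short, but it is only practical for small value ranges; the lazy construction used in the module's implementation avoids this.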

## Solving the second type of query
**Given a range $l$, $r$, find the $k$-th smallest element**

We will determine whether the answer for a given node is in the left or the
right segment.
We can calculate how many times the elements within a segment's value range appear
in our range $(l, r)$ using our first type of query.
Note that this also works for non-leaf nodes: once $l$ and $r$ have been updated to positions in a node's list, the count is given by the following formula:

$$
\texttt{occurrences}(l, r) = r - l
$$
<Info title="Similar">
This is similar to counting how many times a value appears up to index $r$ in our previous query. We did this by using the updated value of $r$ at the leaf node. Now we consider the difference between the updated $r$ and the updated $l$.
</Info>

Therefore, the number of occurrences in the left child is

$$
\texttt{left\_occurrences} = prefixB[r] - prefixB[l]
$$

<Info title="Left occurrences">
Note that $\texttt{left\_occurrences}$ is the number of indices between $l$ and $r$ whose value is less than mid.

</Info>

- If $\texttt{left\_occurrences}$ is greater than or equal to $k$, the $k$-th smallest element is in
the left subtree. Therefore, we update our range and recurse into the left
child.
- If $\texttt{left\_occurrences}$ is less than $k$, the
$k$-th smallest element is in the right subtree. We subtract
$\texttt{left\_occurrences}$ from $k$, update our range, and recurse into the right child.

<Info title="Notice">
Notice we still update $l, r$ accordingly when we go left or right
</Info>

The answer is then the value of the leaf node we end up on.
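
As an illustrative trace of the descent, take the array $[3,5,3,1,2,2,3,4,5,5]$ and ask for the $3$rd smallest element among positions $[2, 6)$, which hold the values $3, 1, 2, 2$. The conventions used here are assumptions for the example only: positions are 0-indexed, ranges are half-open, $k$ is 1-indexed, and the root covers the values $[0, 8)$.

$$
\begin{aligned}
&\text{values } [0,8),\ mid = 4: && prefixB[6] - prefixB[2] = 5 - 1 = 4 \ge 3 && \Rightarrow \text{left},\ (l, r, k) = (1, 5, 3) \\
&\text{values } [0,4),\ mid = 2: && prefixB[5] - prefixB[1] = 1 - 0 = 1 < 3 && \Rightarrow \text{right},\ (l, r, k) = (1, 4, 2) \\
&\text{values } [2,4),\ mid = 3: && prefixB[4] - prefixB[1] = 2 - 0 = 2 \ge 2 && \Rightarrow \text{left},\ (l, r, k) = (0, 2, 2) \\
&\text{values } [2,3): && \text{leaf} && \Rightarrow \text{answer } 2
\end{aligned}
$$

Sorting the four values gives $1, 2, 2, 3$, whose third element is indeed $2$.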

## Implementation
**Time Complexity:** $\mathcal{O}((N + Q) \cdot \log(M))$

<LanguageSection>
<CPPSection>

```cpp
#include <bits/stdc++.h>

using namespace std;

struct Segment {
	Segment *left = nullptr, *right = nullptr;
	int l, r, mid;  // This node covers the values in [l, r)
	bool children = false;
	vector<pair<int, int>> indices;  // Index, Value
	vector<int> prefixB;

	Segment(int l, int r, const vector<pair<int, int>> &indices)
	    : l(l), r(r), mid(l + (r - l) / 2), indices(indices) {
		calculate_b();
	}

	// Sparse since values can go up to 1e9: children are created lazily
	void update() {
		if (children) { return; }
		children = true;
		if (r - l > 1) {
			// Split the indices for left and right child
			vector<pair<int, int>> leftIndices, rightIndices;
			// back_inserter is needed because both vectors start out empty
			partition_copy(indices.begin(), indices.end(),
			               back_inserter(leftIndices),
			               back_inserter(rightIndices),
			               [this](const pair<int, int> &elem) {
				               return elem.second < mid;
			               });

			left = new Segment(l, mid, leftIndices);
			right = new Segment(mid, r, rightIndices);
		}
	}

	// Calculates prefixB: prefixB[i] = number of the first i values below mid
	void calculate_b() {
		int i = 1;
		int j = 0;
		prefixB.resize(indices.size() + 1);
		for (auto [ind, val] : indices) {
			if (val < mid) j++;
			prefixB[i++] = j;
		}
	}

	// k-th smallest (1-indexed) among positions [a, b) of this node's list
	int find_k_smallest(int a, int b, int k) {
		update();
		if (r - l <= 1) { return l; }

		int lb = prefixB[a];
		int lr = prefixB[b];
		// Amount of values in range [a, b) that are less than mid
		int inLeft = lr - lb;

		if (k <= inLeft) {
			return left->find_k_smallest(lb, lr, k);  // Appears in left
		} else {
			return right->find_k_smallest(a - lb, b - lr,
			                              k - inLeft);  // Appears in right
		}
	}
};

int main() {
	int n, q;
	cin >> n >> q;

	vector<pair<int, int>> indices;
	for (int i = 0; i < n; ++i) {
		int v;
		cin >> v;
		indices.emplace_back(i, v);
	}
	Segment seg(0, (int)1e9 + 2, indices);

	for (int i = 0; i < q; ++i) {
		int a, b, k;
		cin >> a >> b >> k;
		k++;  // Convert k to 1-indexed
		cout << seg.find_k_smallest(a, b, k) << " ";
	}
}
```
</CPPSection>
</LanguageSection>
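
When testing an implementation like the one above, it can help to compare its answers against a brute-force reference on small inputs. The sketch below is not part of the module's solution; the name `kth_smallest_naive` and its conventions (half-open range $[l, r)$ and 1-indexed $k$, chosen to match `find_k_smallest`) are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

// Reference answer: the k-th smallest (1-indexed) value among a[l], ..., a[r - 1].
// Assumes 0 <= l < r <= a.size() and 1 <= k <= r - l.
int kth_smallest_naive(const std::vector<int> &a, int l, int r, int k) {
	std::vector<int> window(a.begin() + l, a.begin() + r);
	std::sort(window.begin(), window.end());
	return window[k - 1];
}
```

Each call sorts a copy of the range, so this is only suitable for checking small cases.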



## Supporting updates

Let's support updates of the following type:
- Change the value at index $i$ to $y$.

We can traverse down to the leaf of the old value to remove the element, and then traverse down to the leaf of the new value to add it back.

So what do the updates change?
- Our indices vector
- Our prefix vector

To change the indices vector, we can store a set instead of a vector.
Then erasing and adding values becomes easy.

<IncompleteSection />
On the other hand, a single update could change many entries of our prefix vector, so we can't maintain it as a plain vector. What we could do is use a sparse segment tree:
- Erasing and inserting can be done by setting the value at the specific index to $0$ or $1$.
- Querying for a prefix can be done by querying the segment tree from $0$ to $i$.

This approach is not memory efficient and requires a segment tree implementation.
A friendlier approach would be to use an order statistics tree,
so that querying for a prefix is equivalent to `order_of_key(i)`.
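
The order statistics tree idea can be sketched as follows, under several assumptions that go beyond the text above. Every node stores the original array indices of its elements in a GNU policy-based `tree` (available with GCC's libstdc++, not standard C++), so $l$ and $r$ never need to be translated between nodes, and each query or update costs $\mathcal{O}(\log M \cdot \log N)$. The names `DynamicNode`, `DynamicWavelet`, `change`, `set`, and `kth` are hypothetical.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>
#include <memory>
#include <vector>

// GNU-specific order statistics tree of ints.
using IndexSet =
    __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                     __gnu_pbds::rb_tree_tag,
                     __gnu_pbds::tree_order_statistics_node_update>;

struct DynamicNode {
	int lo, hi;          // this node covers the values [lo, hi)
	IndexSet positions;  // original indices whose value lies in [lo, hi)
	std::unique_ptr<DynamicNode> left, right;

	DynamicNode(int lo, int hi) : lo(lo), hi(hi) {}

	// Adds (add = true) or removes index i along the root-to-leaf path of value v.
	void change(int i, int v, bool add) {
		if (add) {
			positions.insert(i);
		} else {
			positions.erase(i);
		}
		if (hi - lo <= 1) { return; }
		int mid = lo + (hi - lo) / 2;
		auto &child = v < mid ? left : right;
		if (!child) {  // children are created lazily
			child = std::make_unique<DynamicNode>(v < mid ? lo : mid, v < mid ? mid : hi);
		}
		child->change(i, v, add);
	}

	// Number of stored indices in [l, r).
	int count(int l, int r) {
		return (int)positions.order_of_key(r) - (int)positions.order_of_key(l);
	}

	// k-th smallest (1-indexed) value among indices [l, r); assumes 1 <= k <= r - l.
	int kth(int l, int r, int k) {
		if (hi - lo <= 1) { return lo; }
		int in_left = left ? left->count(l, r) : 0;
		if (k <= in_left) { return left->kth(l, r, k); }
		return right->kth(l, r, k - in_left);
	}
};

struct DynamicWavelet {
	std::vector<int> values;
	DynamicNode root;

	// Assumes every value, now and after updates, lies in [0, max_value].
	DynamicWavelet(const std::vector<int> &init, int max_value)
	    : values(init), root(0, max_value + 1) {
		for (int i = 0; i < (int)values.size(); i++) { root.change(i, values[i], true); }
	}

	// Changes the value at index i to y.
	void set(int i, int y) {
		root.change(i, values[i], false);
		values[i] = y;
		root.change(i, y, true);
	}

	int kth(int l, int r, int k) { return root.kth(l, r, k); }
};
```

Keeping original indices in every node trades an extra logarithmic factor for simplicity: the prefix counts that `prefixB` provided in the static version are replaced by two `order_of_key` calls per level.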

### Problems

Expand Down
36 changes: 36 additions & 0 deletions content/6_Advanced/Wavelet.problems.json
Expand Up @@ -16,6 +16,42 @@
}
],
"wavelet": [
{
"uniqueId": "cf-840D",
"name": "Destiny",
"url": "https://codeforces.com/contest/840/problem/D",
"source": "CF",
"difficulty": "Normal",
"isStarred": false,
"tags": ["Wavelet"],
"solutionMetadata": {
"kind": "none"
}
},
{
"uniqueId": "spoj-ILKQUERY2",
"name": "I Love Kd-Trees",
"url": "https://www.spoj.com/problems/ILKQUERY2/",
"source": "SPOJ",
"difficulty": "Normal",
"isStarred": false,
"tags": ["Wavelet"],
"solutionMetadata": {
"kind": "none"
}
},
{
"uniqueId": "coci-20-index",
"name": "2021 - Index",
"url": "https://evaluator.hsin.hr/tasks/HONI202167index/",
"source": "COCI",
"difficulty": "Normal",
"isStarred": false,
"tags": ["Wavelet", "Persistent Segtree"],
"solutionMetadata": {
"kind": "none"
}
},
{
"uniqueId": "kattis-easyquery",
"name": "Easy Query",
Expand Down
Binary file added content/6_Advanced/assets/diagram.png