Reasoning behind choosing the same bucket length for different buckets in adwin #1477

LorenzoCutrupi · 2023-12-18T11:54:02Z

I was looking at the ADWIN code and noticed something I didn't expect in a method of adwin_c.pyx (pasted below for convenience):

cdef void _compress_buckets(self):

  cdef:
      unsigned int idx, k
      double n1, n2, mu1, mu2, temp, total12
      Bucket bucket, next_bucket

  bucket = self.bucket_deque[0]
  idx = 0
  while bucket is not None:
      k = bucket.current_idx
      # Merge buckets if there are more than max_buckets
      if k == self.max_buckets + 1:
          try:
              next_bucket = self.bucket_deque[idx + 1]
          except IndexError:
              self.bucket_deque.append(Bucket(max_size=self.max_buckets))
              next_bucket = self.bucket_deque[-1]
          n1 = self._calculate_bucket_size(idx)   # length of bucket 1
          n2 = self._calculate_bucket_size(idx)   # length of bucket 2
          mu1 = bucket.get_total_at(0) / n1       # mean of bucket 1
          mu2 = bucket.get_total_at(1) / n2       # mean of bucket 2

          # Combine total and variance of adjacent buckets
          total12 = bucket.get_total_at(0) + bucket.get_total_at(1)
          temp = n1 * n2 * (mu1 - mu2) * (mu1 - mu2) / (n1 + n2)
          v12 = bucket.get_variance_at(0) + bucket.get_variance_at(1) + temp
          next_bucket.insert_data(total12, v12)
          self.n_buckets += 1
          bucket.compress(2)

          if next_bucket.current_idx <= self.max_buckets:
              break
      else:
          break

      try:
          bucket = self.bucket_deque[idx + 1]
      except IndexError:
          bucket = None
      idx += 1

In particular I have doubts about the lines:

n1 = self._calculate_bucket_size(idx)   # length of bucket 1
n2 = self._calculate_bucket_size(idx)   # length of bucket 2

Why is the index the same if we want the lengths of two different buckets?

smastelini · 2023-12-18T12:15:17Z

Maintainer

Hi @LorenzoCutrupi, thanks for reporting. This does indeed look odd. I went to check the original MOA code and the same happens there.

I was not involved in the original port from MOA, nor in the conversion to Cython, so I will need to study it a bit more to understand the reason, if there is one.

A possible reason is that ADWIN organizes the buckets by levels and follows an exponential histogram idea where buckets at the same level have the same capacity:

level 0: capacity 1
level 1: capacity 2
level 2: capacity 4
level 3: capacity 8

and so on...

Maybe that is the reasoning but I can't say for sure.
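
If that reading is right, both buckets being merged sit at the same level `idx`, so each summarizes the same number of items and a single size lookup serves for both. A minimal sketch of the idea, assuming a bucket at level `i` summarizes `2**i` items (`bucket_size` and `merge_same_level` are hypothetical helpers for illustration, not River's `_calculate_bucket_size` or its merge logic):

```python
def bucket_size(level: int) -> int:
    """Items summarized by one bucket at `level` (assumed to be 2**level)."""
    return 2 ** level


def merge_same_level(level: int, total_a: float, total_b: float):
    """Merge two buckets of the same level into one bucket for level + 1."""
    n1 = bucket_size(level)  # size of the first bucket
    n2 = bucket_size(level)  # same level, hence the same size
    return level + 1, n1 + n2, total_a + total_b


new_level, new_size, new_total = merge_same_level(2, 10.0, 14.0)
# The merged bucket holds exactly the capacity of the next level.
assert new_size == bucket_size(new_level)
```

Under this assumption, `n1 == n2` is expected rather than a bug, because the merge always combines two entries of the same row.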

2 replies

LorenzoCutrupi Dec 18, 2023
Author

I also checked the code of skmultiflow, an outdated library, and the same thing happens there. At this point I don't know whether there is a reason for it or whether the same error has been ported across libraries, lol. Anyway, I'm trying to change the index of the second bucket to see whether it makes any significant difference.
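
One way to sanity-check that experiment on paper is to compare the merge formula from `_compress_buckets` against a direct computation on toy data. The sketch below assumes the stored "variance" is the sum of squared deviations from the bucket mean, and that both buckets hold 4 items each; `sum_sq_dev` and `merged_stat` are hypothetical helpers, not part of River:

```python
def sum_sq_dev(xs):
    """Sum of squared deviations from the mean (assumed meaning of 'variance')."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def merged_stat(total1, var1, total2, var2, n1, n2):
    """Same arithmetic as the merge step: var1 + var2 + correction term."""
    mu1 = total1 / n1
    mu2 = total2 / n2
    temp = n1 * n2 * (mu1 - mu2) * (mu1 - mu2) / (n1 + n2)
    return var1 + var2 + temp


a = [1.0, 2.0, 3.0, 4.0]  # contents of the first bucket (4 items)
b = [5.0, 6.0, 7.0, 8.0]  # contents of the second bucket (4 items)

exact = sum_sq_dev(a + b)  # statistic computed directly on all 8 items
same_idx = merged_stat(sum(a), sum_sq_dev(a), sum(b), sum_sq_dev(b), n1=4, n2=4)
next_idx = merged_stat(sum(a), sum_sq_dev(a), sum(b), sum_sq_dev(b), n1=4, n2=8)

assert abs(same_idx - exact) < 1e-9  # equal sizes reproduce the direct value
assert abs(next_idx - exact) > 1.0   # doubling n2 distorts mu2 and the result
```

On this toy data, using the same size for both buckets matches the direct computation, while using the next level's size for the second bucket does not. This only checks the arithmetic under the stated assumptions; it says nothing about how drift detection behaves on real streams.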

smastelini Dec 18, 2023
Maintainer

Yep, River's ADWIN indeed comes from skmultiflow :)

smastelini · 2023-12-18T12:18:19Z

Maintainer

You can also check a pure Python version I implemented a while ago if you want. I did not follow the Java code for this one, just the paper. We considered bringing it to River to replace the Cython version, as none of the current maintainers are very familiar with the current Java-ish code.

0 replies
