
Scale queries to 100 billion tiles. #329

Open
tbenthompson opened this issue Apr 6, 2023 · 1 comment

Comments


tbenthompson commented Apr 6, 2023

New thoughts

CHUNKING. CHUNKING. CHUNKING. CHUNKING. CHUNKING. CHUNKING. CHUNKING.
We can track the minimum of the bootstrapped lambda* values for every chunk and then easily select the chunks with the lowest lambda* minima.

This is somewhat inefficient because some chunks will contain tiles that don't need to be refined or deepened. But the loss of efficiency might only be in the single-digit percent range: when we insert chunks, we can group tiles so that all the tiles in a chunk will very likely either need to be refined/deepened or not.

If we simulate 100k tiles in a packet sent to a worker and split that packet into 100 chunks so that the tile grouping can be tight (the tiles in a chunk will generally all need to be refined/deepened, or none will), then each chunk will have 1000 tiles and the cost of all our queries will drop by a factor of 1000.
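
To make the bookkeeping concrete, here's a minimal sketch of the chunk-minimum idea in numpy. The array names, chunk size, random lambda* values, and the use of a single per-tile lambda* summary are all illustrative assumptions, not tied to the existing code:

```python
import numpy as np

rng = np.random.default_rng(0)

n_tiles = 100_000   # one packet of tiles sent to a worker
chunk_size = 1_000  # -> 100 chunks per packet
n_chunks = n_tiles // chunk_size

# Hypothetical per-tile summary of the bootstrapped lambda* values.
lam_star = rng.uniform(0.01, 0.05, size=n_tiles)

# Group tiles so that a chunk is "tight": sort by lambda* and cut into
# consecutive runs, so a chunk's tiles mostly all need refinement or none do.
order = np.argsort(lam_star, kind="stable")
chunk_of_tile = np.empty(n_tiles, dtype=np.int64)
chunk_of_tile[order] = np.arange(n_tiles) // chunk_size

# The only per-chunk statistic the selection query needs is the minimum.
chunk_min = np.full(n_chunks, np.inf)
np.minimum.at(chunk_min, chunk_of_tile, lam_star)

# Selecting work is now a query over n_chunks rows instead of n_tiles rows.
lowest_chunks = np.argsort(chunk_min, kind="stable")[:10]
selected = np.flatnonzero(np.isin(chunk_of_tile, lowest_chunks))
print(f"{n_chunks} chunk rows stand in for {n_tiles} tile rows; "
      f"selected {selected.size} tiles from the 10 lowest-minimum chunks")
```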

Original thoughts

Scaling to 100 billion tiles requires another architectural change to what we're doing.

  • The easier problem is the "tile selection" query.
  • The harder problem is the bias calculation where we need to minimize the bootstrapped lambda* over all the tiles.

Tile selection:

  • Currently, the database queries that select the next batch of tiles (typically 1-5 million tiles) require scanning almost the entire tile list. This seems to start hitting performance barriers in Clickhouse around 2 billion tiles. We need to fix this to scale to bigger problems!
  • Fundamentally, we need a disk-backed priority queue.
  • The queue doesn't need to be exact. Some imprecision in the ordering is fine and might be useful in getting better performance. The queue should be deterministic even if it is inexact.
  • 100 billion tiles with an 8-byte identifier and a 4-byte ordering key is 1.2TB of storage.
  • While that's feasible in memory on a giant AWS instance, I think it makes more sense to do this on disk; an SSD is fast enough that working on disk shouldn't be a performance barrier.
  • If possible, we should just do this within the existing Clickhouse database.
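
As a shape for what "disk-backed, inexact, but deterministic" could look like, here is a minimal sketch using sqlite3 purely as a stand-in for the eventual storage layer; the table and column names are invented for illustration:

```python
import sqlite3

# Stand-in disk-backed store; ideally this would live inside Clickhouse instead.
conn = sqlite3.connect("tile_queue.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS tile_queue (
           tile_id  INTEGER PRIMARY KEY,  -- 8-byte identifier
           orderkey INTEGER NOT NULL      -- 4-byte, coarsely quantized priority
       )"""
)
# The index is what lets "grab the next batch" avoid scanning the whole table.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_orderkey ON tile_queue (orderkey, tile_id)"
)


def push(tiles):
    """tiles: iterable of (tile_id, orderkey). The orderkey can be an inexact,
    quantized version of the true priority; imprecision in ordering is fine."""
    conn.executemany("INSERT OR REPLACE INTO tile_queue VALUES (?, ?)", tiles)
    conn.commit()


def pop_batch(n):
    """Deterministically take the n highest-priority tiles: ties on the inexact
    orderkey break on tile_id, so a rerun selects exactly the same tiles."""
    rows = conn.execute(
        "SELECT tile_id FROM tile_queue ORDER BY orderkey, tile_id LIMIT ?", (n,)
    ).fetchall()
    ids = [r[0] for r in rows]
    conn.executemany("DELETE FROM tile_queue WHERE tile_id = ?", [(i,) for i in ids])
    conn.commit()
    return ids


push([(1, 40), (2, 12), (3, 12), (4, 99)])
print(pop_batch(2))  # -> [2, 3]: lowest orderkey first, tile_id breaks the tie
```

In Clickhouse, the analogous move would presumably be a MergeTree table whose sorting key starts with the ordering key, so the batch-selection query only ever reads the front of the table.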

Minimizing bootstrapped lambda*
The more severe problem is taking the minimum of the bootstrapped lambda* values for each bootstrap. Taking a minimum over 50 different columns is a similar challenge to the ones above and all the same solutions will help. But it's more difficult because of the sheer amount of data: 50 * 8 * 100e9 = 40 TB. So the minimum query will require being smart about excluding large subsets of data and caching intermediate results, all while working with a pile of data that doesn't fit on a single disk. This problem fits well with a distributed database. Otherwise, we will need to fight with a lot of shit.
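
A minimal sketch of the "cache intermediate results" direction, assuming the chunking scheme above and tiles stored contiguously by chunk: keep one row of 50 per-bootstrap minima per chunk, so the global minimum only ever reduces over chunk rows, and only chunks whose tiles changed need their row recomputed. The sizes and values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tiles, n_boot, chunk_size = 100_000, 50, 1_000  # a miniature of the real problem
n_chunks = n_tiles // chunk_size

# Hypothetical per-tile, per-bootstrap lambda* values (the 40 TB table, in miniature).
lam_boot = rng.uniform(0.01, 0.05, size=(n_tiles, n_boot))

# Cached intermediate result: 50 minima per chunk. At 100e9 tiles and
# 1000-tile chunks this is 50 * 8 bytes * 1e8 chunks = 40 GB instead of 40 TB.
chunk_min = lam_boot.reshape(n_chunks, chunk_size, n_boot).min(axis=1)

# The per-bootstrap global minimum only touches the cached chunk minima.
global_min = chunk_min.min(axis=0)

# After refining/deepening a batch, only the affected chunks are recomputed.
dirty = np.array([3, 17, 42])
chunk_min[dirty] = lam_boot.reshape(n_chunks, chunk_size, n_boot)[dirty].min(axis=1)
global_min = chunk_min.min(axis=0)
print(global_min.shape)  # (50,)
```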

Solutions:

  1. Clever table structuring in Clickhouse. One idea is to have two separate tables: one table of "important" tiles that might be refined in the next few steps and another table of tiles that are likely to be unimportant because they are not anywhere close to being the most important tiles. (A sketch of this split follows the list below.)
  2. Consider other distributed databases that might be more amenable to deletes and updates. DynamoDB comes to mind here. Replacing the database layer is not an insane amount of work.
  3. Embrace the imprecision of the Adagrid process. At every step, it's more important to be doing something useful than to be doing the optimally useful thing. So, having an imprecise calculation of bias and an imprecise tile selection process is okay. At the end of the day, two things matter: 1) having a good guess of the lambda** value that will come out of the final resimulation of the final grid. 2) being deterministic/reproducible.
  • So this suggests that we could just treat adagrid as maxing out at 5 billion tiles or so. When we go above that, we can just loop through the separate adagrid problems in sequence and improve each "subproblem" individually.
  • When we insert tiles into the database, we could group them into chunks of 1000 tiles or so that have generally similar lambda* values. Then, we select groups of tiles by the minimum lambda* of that group. This kind of technique should intuitively reduce the scale of all the queries by about a factor of the chunk size while mildly reducing efficiency.
  4. I'm sure there are more ideas!!
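
For idea 1, here's a minimal sketch of the two-table split, again using sqlite3 only as a stand-in for Clickhouse; the table names, schema, and threshold rule are assumptions for illustration, not the project's actual schema:

```python
import sqlite3

# Stand-in for two Clickhouse tables; names and threshold logic are invented.
conn = sqlite3.connect("tiles_split.db")
for name in ("tiles_hot", "tiles_cold"):
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} (tile_id INTEGER PRIMARY KEY, lam_star REAL)"
    )


def insert(tiles, threshold):
    """Route tiles at insert time: anything close to the current minimum goes in
    the small 'hot' table that per-step queries scan; the rest goes 'cold'."""
    hot = [t for t in tiles if t[1] <= threshold]
    cold = [t for t in tiles if t[1] > threshold]
    conn.executemany("INSERT OR REPLACE INTO tiles_hot VALUES (?, ?)", hot)
    conn.executemany("INSERT OR REPLACE INTO tiles_cold VALUES (?, ?)", cold)
    conn.commit()


def rebalance(threshold):
    """Occasionally sweep the cold table for tiles that have become important;
    this is the only time the big table gets scanned."""
    conn.execute(
        "INSERT OR REPLACE INTO tiles_hot "
        "SELECT tile_id, lam_star FROM tiles_cold WHERE lam_star <= ?", (threshold,)
    )
    conn.execute("DELETE FROM tiles_cold WHERE lam_star <= ?", (threshold,))
    conn.commit()


insert([(1, 0.011), (2, 0.048), (3, 0.013)], threshold=0.02)
print(conn.execute("SELECT COUNT(*) FROM tiles_hot").fetchone()[0])   # 2
print(conn.execute("SELECT COUNT(*) FROM tiles_cold").fetchone()[0])  # 1
```
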
tbenthompson changed the title from "Disk-backed priority queue for Adagrid" to "Scale queries to 100 billion tiles." on Apr 6, 2023