Scaling the caching system #112

asdine · 2019-05-10T14:16:22Z

Regula is designed to be used by any kind of program: mobile apps, microservices, monoliths, frontend applications, etc.
Programs can use Regula in two ways:

Using the Eval API

The client makes a synchronous HTTP call to the Regula server to evaluate a ruleset. It can pass parameters and specify which version of the ruleset to use. This can be summed up as the following curl command:

curl "http://localhost:5331/rulesets/some/path?eval&param1=a&param2=b"
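From a Go program, the same request URL could be built as in the minimal sketch below. The `evalURL` helper is hypothetical and not part of Regula; it only mirrors the shape of the curl command above.

```go
package main

import (
	"fmt"
	"net/url"
)

// evalURL builds the Eval API URL for a ruleset path and its parameters.
// The /rulesets/ prefix and the "eval" flag mirror the curl command above;
// the helper itself is an assumption made for illustration.
func evalURL(base, path string, params map[string]string) string {
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	u := base + "/rulesets/" + path + "?eval"
	// Encode sorts parameters by key, so the result is deterministic.
	if enc := q.Encode(); enc != "" {
		u += "&" + enc
	}
	return u
}

func main() {
	fmt.Println(evalURL("http://localhost:5331", "some/path", map[string]string{
		"param1": "a",
		"param2": "b",
	}))
}
```

Every evaluation done this way costs one network round trip, which is what the embedded mode tries to avoid.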

By embedding Regula

The program imports Regula as a library, loads a bunch of ruleset definitions from the server, caches that in memory and evaluates everything locally.

The issue with cache prefilling

The second option is great to avoid network round trips but the "loads a bunch of ruleset definitions from the server" part is problematic.

Currently, the client can load these rulesets by selecting a prefix:

Every ruleset whose path begins with this prefix will get downloaded, including all the different versions of these rulesets.

Of course, an empty prefix is a valid one, meaning that it is possible to tell the client to download every ruleset, including all of their previous versions.

This obviously doesn't scale at all.

This could be mitigated by selecting a longer prefix to narrow the results, but any node of a tree is a tree itself, so it would only help provided that we are sure that node doesn't grow big.
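To illustrate the problem, here is a minimal sketch of prefix selection. The paths are made up and `selectByPrefix` is a hypothetical helper, not the actual client code; in the real system each selected path would also bring all of its versions along.

```go
package main

import (
	"fmt"
	"strings"
)

// selectByPrefix returns every path that begins with prefix.
func selectByPrefix(paths []string, prefix string) []string {
	var out []string
	for _, p := range paths {
		if strings.HasPrefix(p, prefix) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Hypothetical ruleset paths.
	paths := []string{"billing/fees", "billing/taxes/fr", "rides/pricing"}

	// The empty prefix matches every path in the tree.
	fmt.Println(len(selectByPrefix(paths, "")))
	// A longer prefix narrows the result, but the subtree can still grow.
	fmt.Println(len(selectByPrefix(paths, "billing/")))
}
```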

Let's review our requirements

Regula was designed with a set of requirements in mind and I believe that wanting to absolutely comply with all of them will lead to a bad design. Here is the list:

Only one network round trip (to fill the local cache)
Have access to any ruleset regardless of its path in the tree without any additional round trip
Have access to any previous version of any ruleset regardless of its path in the tree without any additional round trip
Being notified if any new version is created so the program can update its cache

While brainstorming with @tealeg, we realized that we had focused too much on making sure every ruleset is available locally just in case it might be needed, whereas in practice people know which rulesets they need within their program.

Here is a revised list of requirements that in our opinion scales much better, while providing a great experience:

Only one network round trip (to fill the local cache) for selected rulesets
Have access to any ruleset regardless of its path in the tree ~~without any additional round trip~~
Have access to any previous version of any ruleset regardless of its path in the tree ~~without any additional round trip~~
Being notified if any new version is created so the program can update its cache

With these new requirements, we can provide the following solution.

The solution

In order to provide a solution that works in multiple situations, I will first describe the default behavior then talk about customizations that will lead to satisfying the requirements listed above.

Default client

The Regula client, with its default configuration, will act as a simple HTTP client library and will run an HTTP request every time the program wants to evaluate a ruleset.
The client will allow evaluating any version of any ruleset.

Caching option

With the caching option enabled, any ruleset evaluated for the first time will be:

downloaded instead of being evaluated remotely
cached
evaluated locally

Subsequent calls to the same ruleset will use the cached version.
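The behavior could look like the minimal sketch below. `CachingClient`, `Ruleset` and the `fetch` function are hypothetical names used for illustration, not the actual Regula API; local evaluation is left out and only the download-once logic is shown.

```go
package main

import "fmt"

// Ruleset is a stand-in for a downloaded ruleset definition.
type Ruleset struct {
	Path    string
	Version string
}

// CachingClient downloads a ruleset on first use and reuses it afterwards.
type CachingClient struct {
	fetch      func(path string) Ruleset // hypothetical download call
	cache      map[string]Ruleset
	RoundTrips int // number of downloads performed so far
}

func NewCachingClient(fetch func(path string) Ruleset) *CachingClient {
	return &CachingClient{fetch: fetch, cache: map[string]Ruleset{}}
}

// Get returns the ruleset to evaluate locally, downloading it only on a cache miss.
func (c *CachingClient) Get(path string) Ruleset {
	if rs, ok := c.cache[path]; ok {
		return rs
	}
	c.RoundTrips++
	rs := c.fetch(path)
	c.cache[path] = rs
	return rs
}

// fakeFetch simulates the server so the sketch runs without a network.
func fakeFetch(path string) Ruleset {
	return Ruleset{Path: path, Version: "v1"}
}

func main() {
	c := NewCachingClient(fakeFetch)
	c.Get("some/path") // first call: downloaded and cached
	c.Get("some/path") // second call: served from the cache
	fmt.Println("round trips:", c.RoundTrips)
}
```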

Caching option + Watch option

Same as the previous one, but any cached version will get updated if a new version is created on the server.
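A minimal sketch of how the cache could be kept up to date. The `Event` type and the channel-based delivery are assumptions made for this example (the watch API itself is not specified here), as is the choice to ignore events for rulesets that were never cached.

```go
package main

import "fmt"

// Event describes a new ruleset version announced by the server.
// Its shape is an assumption made for this sketch.
type Event struct {
	Path    string
	Version string
}

// applyEvents updates the cached version of every path already in the cache.
// Events for paths that are not cached are ignored (an assumption).
func applyEvents(cache map[string]string, events <-chan Event) {
	for ev := range events {
		if _, cached := cache[ev.Path]; cached {
			cache[ev.Path] = ev.Version
		}
	}
}

func main() {
	cache := map[string]string{"some/path": "v1"} // path -> cached version

	events := make(chan Event, 2)
	events <- Event{Path: "some/path", Version: "v2"}  // cached: updated
	events <- Event{Path: "other/path", Version: "v7"} // not cached: ignored
	close(events)

	applyEvents(cache, events)
	fmt.Println(cache)
}
```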

Prefill mechanism

The program can ask the client to prefill its cache with a defined set of rulesets.

no prefix, complete paths must be provided
only one round trip would be performed to fetch all of the selected rulesets using a Batch API
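As a rough sketch, prefilling could look like this. The Batch API does not exist yet, so `batchFetch`, `Prefiller` and the map-based cache are all hypothetical; the only point is that a list of complete paths is resolved in a single call.

```go
package main

import "fmt"

// batchFetch stands in for a single call to the proposed Batch API.
type batchFetch func(paths []string) map[string]string

// Prefiller fills a local cache from an explicit list of complete paths.
type Prefiller struct {
	fetch      batchFetch
	Cache      map[string]string // path -> ruleset definition
	RoundTrips int
}

// Prefill fetches all the given paths in one round trip and caches them.
func (p *Prefiller) Prefill(paths ...string) {
	if len(paths) == 0 {
		return
	}
	if p.Cache == nil {
		p.Cache = map[string]string{}
	}
	p.RoundTrips++
	for path, def := range p.fetch(paths) {
		p.Cache[path] = def
	}
}

// fakeBatch simulates the server so the sketch runs without a network.
func fakeBatch(paths []string) map[string]string {
	out := make(map[string]string, len(paths))
	for _, p := range paths {
		out[p] = "definition of " + p
	}
	return out
}

func main() {
	p := &Prefiller{fetch: fakeBatch}
	p.Prefill("billing/fees", "rides/pricing") // hypothetical paths
	fmt.Println("cached:", len(p.Cache), "round trips:", p.RoundTrips)
}
```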

Solution analysis

Let's explore various scenarios using the solution described above:

Scenario	What to use
I don't care about network round trips	Use the default Regula client or any HTTP client
I want no round trips besides the first one	Use the prefill option to declare all the rulesets you use in your program
I want no round trips besides the first one and I want my cache to be updated if new versions are created in the admin	Use the prefill and watch options
I'm storing the version of the ruleset used in a database to reuse it again and I want to avoid as many round trips as possible	Use the cache and prefill options. The latest versions of the rulesets used will be evaluated from the cache. Previous versions will be too, as long as the program has not been restarted. If the program restarts and expects to use an old version, there will be a cache miss and a network round trip will be necessary to fetch that ruleset version
I'm storing the version of the ruleset used in a database to reuse it again and I want absolutely no other round trips	No solution provided by Regula
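The fourth scenario can be sketched as follows, assuming an in-memory cache keyed by path and version. `VersionedCache` and its methods are hypothetical names; a fresh cache holding only the latest version plays the role of a program that has just restarted and prefilled.

```go
package main

import "fmt"

// key identifies one version of one ruleset.
type key struct{ Path, Version string }

// VersionedCache keeps definitions in memory only, so its content is lost
// when the program restarts.
type VersionedCache struct {
	entries    map[key]string
	fetch      func(path, version string) string // hypothetical remote call
	RoundTrips int
}

func NewVersionedCache(fetch func(path, version string) string) *VersionedCache {
	return &VersionedCache{entries: map[key]string{}, fetch: fetch}
}

// Put stores a definition without a round trip, as a prefill would.
func (c *VersionedCache) Put(path, version, def string) {
	c.entries[key{path, version}] = def
}

// Get returns the requested version, fetching it remotely on a cache miss.
func (c *VersionedCache) Get(path, version string) string {
	k := key{path, version}
	if def, ok := c.entries[k]; ok {
		return def
	}
	c.RoundTrips++
	def := c.fetch(path, version)
	c.entries[k] = def
	return def
}

// fakeFetch simulates the server so the sketch runs without a network.
func fakeFetch(path, version string) string { return path + "@" + version }

func main() {
	// Freshly started program: only the latest version (v2) was prefilled.
	c := NewVersionedCache(fakeFetch)
	c.Put("some/path", "v2", "some/path@v2")

	c.Get("some/path", "v2") // latest version: cache hit
	fmt.Println("round trips:", c.RoundTrips)

	c.Get("some/path", "v1") // old version stored in a database: cache miss
	c.Get("some/path", "v1") // now cached until the next restart
	fmt.Println("round trips:", c.RoundTrips)
}
```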

Conclusion

I'm certain that by slightly changing the requirements we can provide a scalable solution that still works for our use case.

@drommk @christophe-dufour @genesor does that still work for you?


yaziine · 2019-05-10T15:21:55Z

I think that for the "Prefill mechanism" we should provide a way to use prefixes.

Let's say that we want a complete node filled with a lot of rulesets: instead of listing them all, what prevents us from retrieving the node entirely?

asdine · 2019-05-10T15:25:35Z

For various reasons:

Why download rulesets you won't use? If you are not using them, why bother downloading them?
It's not scalable, for the reasons I explained above.
It would make us code and maintain a complex API for no good reason (that's the point of this issue, to avoid writing that specific API)

drommk · 2019-05-13T02:54:26Z

I'm good with this approach for a generic lib-level solution.
Probably not an issue, but let's keep in mind that it limits direct discoverability from the client POV though.

asdine · 2019-05-13T09:15:53Z

> Probably not an issue, but let's keep in mind that it limits direct discoverability from the client POV though.

Indeed, but I think that's what created this issue in the first place: we mixed serving rulesets for caching purposes with discoverability.
Decoupling them will allow both APIs to scale much better.

qmathe · 2019-05-13T14:22:32Z

For the mobile side, the prefill mechanism should be ok.

There is one downside though: it's going to require us to type each ruleset path twice (download + evaluation). We could use a constant to avoid hardcoding each path as a string twice. Do you see a way to avoid this, or do you consider it an acceptable trade-off in terms of API?

More generally speaking, I'm not yet convinced we should always avoid prefixes or some other tagging mechanism to indicate which rulesets to download. The main issue I see with prefixes as they exist is that they force the ruleset tree structure to encode both:

semantic organization
download/cache boundaries

imo splitting the ruleset tree into downloadable/cacheable subtrees should not be handled by the existing tree structure as it is. I'm not sure this responsibility should be entirely shifted to the client side though. Did you consider tagging rulesets directly when writing them or supporting rulesets appearing under multiple paths? For example, we could have tags like mobile or service names, then on the client side instead of using a prefix we would use one or more tags to download/cache rulesets. Each ruleset could be required to have at least one tag.

As a side note, downloading only the latest ruleset versions on mobile sounds good enough.

drommk · 2019-05-14T04:48:43Z

> Did you consider tagging rulesets directly when writing them or supporting rulesets appearing under multiple paths?

The idea is appealing but I think that would be a slippery slope, because in practice you'd end up including consumer logic into your rulesets (eg having "mobile" tags on rulesets)

Again, I think that any heetch-app-oriented optimization should be handled by the gateway, not Regula itself.

I agree that the double writing is painful (that's what I called the discoverability issue) but not something we can't live with until we have the need to build heetch-app-oriented optimizations

qmathe · 2019-05-15T14:13:52Z

> The idea is appealing but I think that would be a slippery slope, because in practice you'd end up including consumer logic into your rulesets (eg having "mobile" tags on rulesets)

Makes sense.

From an implementation/storage standpoint, I agree that tags should be stored outside the ruleset tree and the tag notion should not exist in the Regula main API. They could be part of the Batch API outlined by Asdine though.

Even if tags are not part of Regula, exposing the ability to tag rulesets in Regula UI editor is what matters imo (rather than introducing a distinct tool or web app).

asdine added this to the v0.7.0 milestone May 10, 2019

asdine mentioned this issue May 22, 2019

New watch api #122

Closed

asdine self-assigned this May 28, 2019

asdine removed their assignment Apr 20, 2020
