Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle configuration fetching vs. cache race condition #49

Merged
merged 5 commits into from
May 21, 2024

Conversation

aarsilv
Copy link
Contributor

@aarsilv aarsilv commented May 21, 2024

Eppo Internal:
🎟️ Ticket: FF-2111 - Android SDK should initialize with fetched data if it finishes before the cache load
🗨️ Slack Conversation: "...Race condition between cache access/http request?..."

While typically, one would expect file systems to be quicker than HTTP, this is not guaranteed. Especially if emulation is at play, the storage speed is slowed (encrypted, degraded), etc.

If the cache load finishes after the configuration fetch, it is possible it may unintentionally overwrite that newly-obtained configuration with a cached, older version.

This PR tracks when flags are set via fetch so that they are not later overwritten by reading the cache.

It also adds additional logging to help any future debugging.

@@ -99,9 +101,6 @@ static class AssignmentValueTypeAdapter implements JsonDeserializer<AssignmentVa
this.defaultValue = defaultValue;
}

AssignmentValueTypeAdapter() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🪓 Wasn't used

httpClientOverrideField.setAccessible(false);
}
}
setHttpClientOverrideField(mockHttpClient);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up the modification of the private static override variables into a little helper function 🧹

@@ -429,6 +413,7 @@ public void testDifferentCacheFilesPerKey() {

// Pre-seed a different flag configuration for the other API Key
ConfigCacheFile cacheFile2 = new ConfigCacheFile(ApplicationProvider.getApplicationContext(), safeCacheKey(DUMMY_OTHER_API_KEY));
// Set the experiment_with_boolean_variations flag to always return true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is unrelated to the fix in this PR, but useful for that test

Comment on lines +464 to +465
@Test
public void testFetchCompletesBeforeCacheLoad() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test that reproduced the issue (and after the fix, verified it is handled correctly) 💪

@@ -497,22 +515,22 @@ private void waitForPopulatedCache() {
}
}

private void waitForNonNullAssignment() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🪓 Not used

@@ -24,45 +24,50 @@ public ConfigurationRequestor(ConfigurationStore configurationStore, EppoHttpCli
}

public void load(InitializationCallback callback) {
AtomicBoolean cachedUsed = new AtomicBoolean(false);
AtomicBoolean callbackCalled = new AtomicBoolean(false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of tracking cache usage, we just track if the callback has been called yet. This way both fetch won't call the callback if the cache load finished first, and the cache load won't call the callback if the fetch finished first.

configurationStore.loadFromCache(new CacheLoadCallback() {
@Override
public void onCacheLoadSuccess() {
cachedUsed.set(true);
if (callback != null) {
if (callback != null && callbackCalled.compareAndSet(false, true)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compareAndSet() is atomic and should prevent any race conditions resulting in multiple threads each thinking the other hasn't yet called the callback

throw new IllegalStateException("Cached configuration file has empty flags");
}
if (loadedFromFetchResponse.get()) {
Log.w(TAG, "Configuration already updated via fetch; ignoring cache");
callback.onCacheLoadFail();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering this a cache failure. Conceptually it sort of is as the cache was too slow 😛

@@ -72,14 +76,25 @@ public void loadFromCache(CacheLoadCallback callback) {
});
}

protected RandomizationConfigResponse readCacheFile() throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled this out so it can be mocked by tests, but it cleans things up a bit too.

private static EppoHttpClient httpClientOverride = null;
private static ConfigurationStore configurationStoreOverride = null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give the ability for tests to use reflection to inject a configuration store without complicating the public "API" of this class.

public void setFlagsFromResponse(Reader response) {
RandomizationConfigResponse config = gson.fromJson(response, RandomizationConfigResponse.class);
if (config == null || config.getFlags() == null) {
Log.w(TAG, "Flags missing in configuration response");
flags = new ConcurrentHashMap<>();
} else {
flags = config.getFlags();
Log.d(TAG, "Loaded" + flags.size() + "flags from configuration response");
loadedFromFetchResponse.set(true); // Record that flags were set from a response so we don't later clobber them with a slow cache read
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like flags could still be overwritten if flags are set here, then in the cached method before line 96 runs, right? if so, we can move this to a line above/before setting flags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, let's move this above!

@aarsilv aarsilv assigned leoromanovsky and aarsilv and unassigned petzel May 21, 2024
@aarsilv aarsilv merged commit af501a7 into main May 21, 2024
1 check passed
@aarsilv aarsilv deleted the aaron/ff-2111/fetch-cache-race-condition branch May 21, 2024 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants