Fall back to non-persistent object caching if Redis is unavailable #4

jtsternberg · 2015-06-29T16:44:38Z

[27-Jun-2015 23:12:44 UTC] PHP Fatal error: Uncaught exception 'RedisException' with message 'Redis server went away' in PATH/object-cache.php:713
[28-Jun-2015 03:10:20 UTC] PHP Fatal error: Uncaught exception 'RedisException' with message 'read error on connection' in PATH/object-cache.php:652

There should be some handling in place for these errors. Ideally, we would be catching these exceptions and dumping them to the error log so that we know when these errors are occurring as well as what the errors are, but the site should simply fall back to connecting to the DB for its info, rather than fatal erring.

khromov · 2015-08-20T22:55:24Z

Agree something better than fatal error has to be done, but just returning that the key does not exist isn't a great solution either - WordPress will make thousands of SQL queries for each page load if the Object Cache is broken.

The best solution would be to look at the connection management in phpredis and if that connection fails, default to using the default Object Cache implementation, which is a simple non-persistent array structure.

joshkoenig · 2015-09-01T21:56:10Z

Hey @jtsternberg - thanks for filing this. Picking up this issue to get it some love in September.

I can replicate this locally by taking down the redis container, and I agree that we should fall-back to the database as well as logging this to the error log.

joshkoenig · 2015-10-22T18:52:44Z

We should also make the plugin "Pantheon savvy" and let the user know they need to enable it in the dashboard for it to work (if it isn't).

jtsternberg · 2015-10-22T21:31:28Z

@joshkoenig curious, what is the status here? (for having it fallback to DB)

joshkoenig · 2015-10-22T21:55:49Z

Should be getting eyes on it in the next two weeks.

joshkoenig · 2015-10-26T20:11:05Z

Other option that might be preferable: fall back to APC? Could this be a setting?

danielbachhuber · 2015-10-28T23:45:08Z

The best solution would be to look at the connection management in phpredis and if that connection fails, default to using the default Object Cache implementation, which is a simple non-persistent array structure.

Looking at this in a bit more depth. Because wp-redis.php/object-cache.php defines the same functions and class name as wp-includes/cache.php, we can't simply load and instantiate the latter.

In #10, I've subclassed Redis so we can fall back to a run-time object cache when PHPRedis is unavailable. However, there's a conceptual problem we need to address:

Redis is available, so WordPress uses it as a persistent object cache.
Redis becomes unavailable, so WordPress falls back to its run-time object cache, adding, updating, and deleting keys.
Redis becomes available again, so WordPress begins to pull now stale and incorrect values from the persistent storage backend.

danielbachhuber · 2015-10-29T00:08:56Z

Related tollmanz/wordpress-pecl-memcached-object-cache#55

jtsternberg · 2015-10-29T18:05:04Z

@danielbachhuber your concerns are indeed valid. What if we store the modifed keys in an option (update the option at shutdown) and when persistent cache comes back, update (or delete) those keys? Not super-elegant, but better than getting stuck in a stale-cache feedback loop.

khromov · 2015-10-29T21:58:48Z

@danielbachhuber

Redis is available, so WordPress uses it as a persistent object cache.
Redis becomes unavailable, so WordPress falls back to its run-time object cache, adding, updating, and deleting keys.
Redis becomes available again, so WordPress begins to pull now stale and incorrect values from the persistent storage backend.

How about this - if an error is detected and we fall back to the run-time object cache, we won't return to Redis during that request.

Side effects aren't that bad (you get newer data once the fallback kicks in) and everything is nice and tidy once the request is over. (The next request might use Redis if it's back.)

jtsternberg · 2015-10-29T22:42:44Z

The side effects come with cache writes, which will do a DB update. The DB has a new value, but Redis doesn't know that. So the next time Redis kicks back on, it will pull back the value from the cache, NOT the updated value in the DB.

danielbachhuber · 2015-10-30T21:09:33Z

@joshkoenig and I talked about this a bit and came up with an idea:

If WP Redis fails to connect to Redis, then WP Redis will write an option to the database documenting such and then fall back to the internal object cache.
On every page load, WP Redis checks for the presence of this option. If the option exists, and WP Redis is able to reconnect to Redis, the WP Redis emits a flushAll call to Redis and removes the option. The flushAll would effectively ensure Redis won't serve stale results.
If the option doesn't exist and WP Redis can connect to Redis, then WP Redis would just serve the cache from Redis.

#18 is everything but the database write and check. I'm hesitant about a write stampede, and adding another SQL SELECT to every page load. However, I think we can mitigate both by making sure we're checking the primed alloptions cache on reads, and doing a read check before we do a write. Plus, we can add a kill switch for this behavior (falling back to potentially serving stale cache) if someone runs into a situation where it's causing a performance problem.

jtsternberg · 2015-10-30T21:14:10Z

That plan is similar to #4 (comment) except instead of flushing individual keys, you're flushing the entire cache, which is better than doing an option write every load where a key is updated/deleted. Seems like a pretty great solution to me. Thanks for the updates!

joshkoenig · 2015-10-31T00:38:56Z

Thanks for the input @jtsternberg!

While it's a bummer to have to dump the cache on a the fail-back event, I think this is the only "bulletproof" way to insure cache coherence. Even if it does mean a slower request to repopulate caches, it is highly preferable to downtime or crazy difficult bugs due to cache incoherence.

@danielbachhuber +1 for having a killswitch option. I do not think one more sql select should be a problem. wp_options is well keyed, and the data transferred is minimal, so it should add very little load (mysql actually makes a pretty good key-value store too...).

khromov · 2015-10-31T01:22:33Z

Clearing all cache on connection error is the safest route for sure.

From experience on a large multisite network, we get very high database load when object cache gets cleared, so that'd be a potential caveat for this solution.

joshkoenig · 2015-11-04T19:32:12Z

@khromov I agree that there may be a case where big sites may want to handle things differently. The implementation will include a CONSTANT that will allow you to opt-out of the flushing problem. This opts you into a cache-coherency problem, but that's your choice! 😄

shmuel-krakower · 2015-11-05T10:11:05Z

Hi
Just want to share (and hope not to confuse) as I see this issue mentions a similar issue in the memcached plugin @ tollmanz/wordpress-pecl-memcached-object-cache#55

First - I think it is important to keep in mind that once cache is introduced into a system - it is expected that cache might return stale data. This means that it might be nice to try and sync the external cache once it is available again after some downtime period, but the application should deal with it and not persist any objects/data which shouldn't be cached. I think WP is doing that correctly - so you shouldn't be concerned with that case.

Nevertheless, it seems like there is a core WP issue with external caching plugins, where there is a specific place where WP misbehave with external caching plugins which try to fallback to internal non persistant cache.
I have created a defect and a suggested fix to WP core and I suggest you to review it and understand that it might probably break your fallback strategy as well:
https://core.trac.wordpress.org/ticket/33276#ticket

(I should update my fix because it is incomplete but seems like there's no response anyway from the WP team for few months now)

joshkoenig changed the title ~~Does not catch RedisException errors and white-screens the site~~ Fall back to DB if Redis is unavailable Sep 1, 2015

joshkoenig mentioned this issue Sep 1, 2015

Retry on 'Redis server went away' #7

Closed

joshkoenig assigned danielbachhuber Oct 26, 2015

joshkoenig changed the title ~~Fall back to DB if Redis is unavailable~~ Fall back (to DB?) if Redis is unavailable Oct 26, 2015

joshkoenig added the priority:high label Oct 28, 2015

danielbachhuber mentioned this issue Oct 28, 2015

Better handle when Redis is unavailable or intermittent #10

Merged

3 tasks

This was referenced Oct 30, 2015

Split PHP_REDIS=disabled into a separate build #17

Merged

Better handling when Redis is enabled, but unavailable or gone away #18

Merged

danielbachhuber modified the milestone: v0.2.0 Oct 30, 2015

This was referenced Nov 3, 2015

Trigger an error when Redis goes away and we reconnected #21

Merged

Force cache flush if Redis went away #22

Merged

joshkoenig changed the title ~~Fall back (to DB?) if Redis is unavailable~~ Fall back to non-persistent object caching if Redis is unavailable Nov 4, 2015

joshkoenig closed this as completed in #22 Nov 12, 2015

dbarczak1 mentioned this issue Oct 27, 2021

WP Redis Error #334

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fall back to non-persistent object caching if Redis is unavailable #4

Fall back to non-persistent object caching if Redis is unavailable #4

jtsternberg commented Jun 29, 2015

khromov commented Aug 20, 2015

joshkoenig commented Sep 1, 2015

joshkoenig commented Oct 22, 2015

jtsternberg commented Oct 22, 2015

joshkoenig commented Oct 22, 2015 via email

joshkoenig commented Oct 26, 2015

danielbachhuber commented Oct 28, 2015

danielbachhuber commented Oct 29, 2015

jtsternberg commented Oct 29, 2015

khromov commented Oct 29, 2015

jtsternberg commented Oct 29, 2015

danielbachhuber commented Oct 30, 2015

jtsternberg commented Oct 30, 2015

joshkoenig commented Oct 31, 2015

khromov commented Oct 31, 2015

joshkoenig commented Nov 4, 2015

shmuel-krakower commented Nov 5, 2015

Fall back to non-persistent object caching if Redis is unavailable #4

Fall back to non-persistent object caching if Redis is unavailable #4

Comments

jtsternberg commented Jun 29, 2015

khromov commented Aug 20, 2015

joshkoenig commented Sep 1, 2015

joshkoenig commented Oct 22, 2015

jtsternberg commented Oct 22, 2015

joshkoenig commented Oct 22, 2015 via email

joshkoenig commented Oct 26, 2015

danielbachhuber commented Oct 28, 2015

danielbachhuber commented Oct 29, 2015

jtsternberg commented Oct 29, 2015

khromov commented Oct 29, 2015

jtsternberg commented Oct 29, 2015

danielbachhuber commented Oct 30, 2015

jtsternberg commented Oct 30, 2015

joshkoenig commented Oct 31, 2015

khromov commented Oct 31, 2015

joshkoenig commented Nov 4, 2015

shmuel-krakower commented Nov 5, 2015