Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fall back to non-persistent object caching if Redis is unavailable #4

Closed
jtsternberg opened this issue Jun 29, 2015 · 17 comments
Closed
Assignees
Milestone

Comments

@jtsternberg
Copy link

[27-Jun-2015 23:12:44 UTC] PHP Fatal error: Uncaught exception 'RedisException' with message 'Redis server went away' in PATH/object-cache.php:713
[28-Jun-2015 03:10:20 UTC] PHP Fatal error: Uncaught exception 'RedisException' with message 'read error on connection' in PATH/object-cache.php:652

There should be some handling in place for these errors. Ideally, we would be catching these exceptions and dumping them to the error log so that we know when these errors are occurring as well as what the errors are, but the site should simply fall back to connecting to the DB for its info, rather than fatal erring.

@khromov
Copy link

khromov commented Aug 20, 2015

Agree something better than fatal error has to be done, but just returning that the key does not exist isn't a great solution either - WordPress will make thousands of SQL queries for each page load if the Object Cache is broken.

The best solution would be to look at the connection management in phpredis and if that connection fails, default to using the default Object Cache implementation, which is a simple non-persistent array structure.

@joshkoenig
Copy link
Member

Hey @jtsternberg - thanks for filing this. Picking up this issue to get it some love in September.

I can replicate this locally by taking down the redis container, and I agree that we should fall-back to the database as well as logging this to the error log.

@joshkoenig joshkoenig changed the title Does not catch RedisException errors and white-screens the site Fall back to DB if Redis is unavailable Sep 1, 2015
@joshkoenig
Copy link
Member

We should also make the plugin "Pantheon savvy" and let the user know they need to enable it in the dashboard for it to work (if it isn't).

@jtsternberg
Copy link
Author

@joshkoenig curious, what is the status here? (for having it fallback to DB)

@joshkoenig
Copy link
Member

joshkoenig commented Oct 22, 2015 via email

@joshkoenig joshkoenig changed the title Fall back to DB if Redis is unavailable Fall back (to DB?) if Redis is unavailable Oct 26, 2015
@joshkoenig
Copy link
Member

Other option that might be preferable: fall back to APC? Could this be a setting?

@danielbachhuber
Copy link
Contributor

The best solution would be to look at the connection management in phpredis and if that connection fails, default to using the default Object Cache implementation, which is a simple non-persistent array structure.

Looking at this in a bit more depth. Because wp-redis.php/object-cache.php defines the same functions and class name as wp-includes/cache.php, we can't simply load and instantiate the latter.

In #10, I've subclassed Redis so we can fall back to a run-time object cache when PHPRedis is unavailable. However, there's a conceptual problem we need to address:

  1. Redis is available, so WordPress uses it as a persistent object cache.
  2. Redis becomes unavailable, so WordPress falls back to its run-time object cache, adding, updating, and deleting keys.
  3. Redis becomes available again, so WordPress begins to pull now stale and incorrect values from the persistent storage backend.

@danielbachhuber
Copy link
Contributor

@jtsternberg
Copy link
Author

@danielbachhuber your concerns are indeed valid. What if we store the modifed keys in an option (update the option at shutdown) and when persistent cache comes back, update (or delete) those keys? Not super-elegant, but better than getting stuck in a stale-cache feedback loop.

@khromov
Copy link

khromov commented Oct 29, 2015

@danielbachhuber

Redis is available, so WordPress uses it as a persistent object cache.
Redis becomes unavailable, so WordPress falls back to its run-time object cache, adding, updating, and deleting keys.
Redis becomes available again, so WordPress begins to pull now stale and incorrect values from the persistent storage backend.

How about this - if an error is detected and we fall back to the run-time object cache, we won't return to Redis during that request.

Side effects aren't that bad (you get newer data once the fallback kicks in) and everything is nice and tidy once the request is over. (The next request might use Redis if it's back.)

@jtsternberg
Copy link
Author

The side effects come with cache writes, which will do a DB update. The DB has a new value, but Redis doesn't know that. So the next time Redis kicks back on, it will pull back the value from the cache, NOT the updated value in the DB.

@danielbachhuber
Copy link
Contributor

@joshkoenig and I talked about this a bit and came up with an idea:

  1. If WP Redis fails to connect to Redis, then WP Redis will write an option to the database documenting such and then fall back to the internal object cache.
  2. On every page load, WP Redis checks for the presence of this option. If the option exists, and WP Redis is able to reconnect to Redis, the WP Redis emits a flushAll call to Redis and removes the option. The flushAll would effectively ensure Redis won't serve stale results.
  3. If the option doesn't exist and WP Redis can connect to Redis, then WP Redis would just serve the cache from Redis.

#18 is everything but the database write and check. I'm hesitant about a write stampede, and adding another SQL SELECT to every page load. However, I think we can mitigate both by making sure we're checking the primed alloptions cache on reads, and doing a read check before we do a write. Plus, we can add a kill switch for this behavior (falling back to potentially serving stale cache) if someone runs into a situation where it's causing a performance problem.

@jtsternberg
Copy link
Author

That plan is similar to #4 (comment) except instead of flushing individual keys, you're flushing the entire cache, which is better than doing an option write every load where a key is updated/deleted. Seems like a pretty great solution to me. Thanks for the updates!

@joshkoenig
Copy link
Member

Thanks for the input @jtsternberg!

While it's a bummer to have to dump the cache on a the fail-back event, I think this is the only "bulletproof" way to insure cache coherence. Even if it does mean a slower request to repopulate caches, it is highly preferable to downtime or crazy difficult bugs due to cache incoherence.

@danielbachhuber +1 for having a killswitch option. I do not think one more sql select should be a problem. wp_options is well keyed, and the data transferred is minimal, so it should add very little load (mysql actually makes a pretty good key-value store too...).

@khromov
Copy link

khromov commented Oct 31, 2015

Clearing all cache on connection error is the safest route for sure.

From experience on a large multisite network, we get very high database load when object cache gets cleared, so that'd be a potential caveat for this solution.

@joshkoenig
Copy link
Member

@khromov I agree that there may be a case where big sites may want to handle things differently. The implementation will include a CONSTANT that will allow you to opt-out of the flushing problem. This opts you into a cache-coherency problem, but that's your choice! 😄

@joshkoenig joshkoenig changed the title Fall back (to DB?) if Redis is unavailable Fall back to non-persistent object caching if Redis is unavailable Nov 4, 2015
@shmuel-krakower
Copy link

Hi
Just want to share (and hope not to confuse) as I see this issue mentions a similar issue in the memcached plugin @ tollmanz/wordpress-pecl-memcached-object-cache#55

First - I think it is important to keep in mind that once cache is introduced into a system - it is expected that cache might return stale data. This means that it might be nice to try and sync the external cache once it is available again after some downtime period, but the application should deal with it and not persist any objects/data which shouldn't be cached. I think WP is doing that correctly - so you shouldn't be concerned with that case.

Nevertheless, it seems like there is a core WP issue with external caching plugins, where there is a specific place where WP misbehave with external caching plugins which try to fallback to internal non persistant cache.
I have created a defect and a suggested fix to WP core and I suggest you to review it and understand that it might probably break your fallback strategy as well:
https://core.trac.wordpress.org/ticket/33276#ticket

(I should update my fix because it is incomplete but seems like there's no response anyway from the WP team for few months now)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants