Caching - Gotchas & Lessons Learned
It has been said time and again - “There are only two hard things in Computer Science: cache invalidation and naming things”. Having run into both of these problems in my professional career, I figured I could write a post summarizing the lessons I have learned along the way from seeing and building various caching architectures across many companies, big and small.
Just like threading, caching is easy to code, but often creates more problems than it intends to solve. These problems can arise from - you guessed it - invalidation, sub-par efficiency, inconsistency, and many more. It is also one of my favorite topics for technical interviews :)
Caching - An Oversimplified Introduction
By definition, a cache is a data store that holds the result of a previous data access, in order to serve future accesses more efficiently. Cached data usually has a defined time to live (TTL), so it can be evicted and replaced with more recent state. When data is found in the cache, it is called a hit; a failure to find it is a miss. When a miss happens, the underlying data store is accessed, the result is placed in the cache with a TTL, and the cycle repeats.
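To make that read cycle concrete, here is a minimal cache-aside sketch in Java. The CacheClient interface and loadFromDatabase method are hypothetical stand-ins for a real cache client (Memcached, Redis, etc.) and the real data access -
import java.util.Optional;

public class CacheAsideExample {

    // Hypothetical cache client - stands in for a Memcached/Redis client
    interface CacheClient {
        Optional<Object> get(String key);
        void set(String key, Object value, int ttlSeconds);
    }

    private final CacheClient cache;

    CacheAsideExample(CacheClient cache) {
        this.cache = cache;
    }

    public Object read(String key) {
        // Hit: serve straight from the cache
        Optional<Object> cached = cache.get(key);
        if (cached.isPresent()) {
            return cached.get();
        }
        // Miss: go to the underlying store, then cache the result with a TTL
        Object fresh = loadFromDatabase(key);
        cache.set(key, fresh, 300); // 5-minute TTL, tune per use case
        return fresh;
    }

    private Object loadFromDatabase(String key) {
        return "value-for-" + key; // placeholder for the real data access
    }
}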
Hit ratio (nHits:nMisses) is the metric used to measure caching efficiency - a higher hit ratio means the cache is doing its job, while a lower hit ratio means the cache is not helping as much as it should. The latter is typically the case when the cached data is highly mutable and gets invalidated far too often to be of any value for future requests.
With this introduction, here are the lessons learned, and best practices:
1. Take it easy
Caching is super easy to implement. It also has a charm that attracts many architects and developers, giving the false impression of being low-hanging fruit. However, do not get carried away. Identify the immutable vs. mutable objects in your application, as the balance between the two can make caching a blessing or a curse. Identify the usage patterns around your data. Start small by caching the immutable data, and monitor the performance of the application. Measure the hit ratio before moving on to more aggressive approaches. Make data-driven decisions.
2. Not all caches are alike
A cache can be in-process (taking up memory space in the application server) or global (out-of-process). For example, EHCache is an in-process cache, while Memcached and Redis are global cache servers. There are pros and cons to each, and some architectures mix and match, which is why knowing your data usage patterns is the first step in designing a good caching strategy. An in-process cache adds state to the server, and, in a cluster, there is a good chance each instance will end up with a different state of the data, causing inconsistency. However, for immutable data, in-process caches work great - as does a simple Map. Global caches hold a shared state for the application but need extra infrastructure; they can be clustered and provide a single, consistent view of the cached data.
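As a sketch of the "simple Map" approach for immutable (or rarely changing) data, here is a tiny in-process cache built on ConcurrentHashMap with a coarse TTL. The class and field names are illustrative, not a real library -
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InProcessCache<K, V> {

    private static class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public InProcessCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> entry = map.get(key);
        if (entry == null) {
            return null;             // miss
        }
        if (entry.expiresAtMillis < System.currentTimeMillis()) {
            map.remove(key, entry);  // expired - evict lazily on read
            return null;
        }
        return entry.value;          // hit
    }
}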
3. Design for High Availability
In-process caches are more available than global caches, provided the consistency issues are mitigated. Global caches are a different story. When a global cache fails, depending on the client, connections will time out, eating network time and locking threads. A failure also puts a huge load on the back end, snowballing into increased database contention and possibly rippling across the other applications that rely on your application’s API. Highly available caching is implemented via cache replication and/or persistent caches. Replication can happen via shards and/or a master/slave type configuration. Couchbase provides replication via a Couchbase cluster, and Redis provides high availability via Redis Sentinel. Consider these servers instead of inventing one of your own, unless absolutely necessary. Varnish HTTP Cache is an HTTP cache useful for caching HTTP responses, with a built-in grace period where the cache serves stale data for a period of time even if the back end is unavailable.
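One way to keep a failing global cache from locking threads is to put a hard, short timeout on cache lookups and treat any failure as a miss, falling back to the database. A rough sketch, where cacheGet and loadFromDatabase are hypothetical placeholders for the real calls -
import java.util.concurrent.*;

public class TimeBoundedCacheLookup {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public Object readWithFallback(String key) {
        Future<Object> lookup = pool.submit(() -> cacheGet(key));
        try {
            Object cached = lookup.get(50, TimeUnit.MILLISECONDS); // fail fast
            if (cached != null) {
                return cached;
            }
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            lookup.cancel(true); // cache is slow or down - treat it as a miss
        }
        return loadFromDatabase(key);
    }

    private Object cacheGet(String key) { return null; }               // placeholder cache call
    private Object loadFromDatabase(String key) { return "db-" + key; } // placeholder DB call
}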
4. Beware of Cache Hotspots
Hotspots are created when the value tied to a single key becomes all too popular - a breaking news story, for example. In that event, the global cache server holding the heavily accessed key ends up degrading performance for every other key on the same server. This can be avoided via a near/far cache architecture, where a cache server with a very low TTL runs on the local instance (localhost), and the application looks up the key in the near server first, reaching out to the far server only on a miss. The low TTL (a few seconds) ensures the data is not too stale, while the far cache server does not get hammered in a hotspot situation.
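A near/far lookup can be sketched as two chained cache-aside reads: check the local (near) cache, then the global (far) cache, then the database. The Cache interface and TTL values below are hypothetical -
public class NearFarCacheReader {

    interface Cache {
        Object get(String key);
        void set(String key, Object value, int ttlSeconds);
    }

    private final Cache near; // localhost cache, very low TTL
    private final Cache far;  // shared global cache cluster

    NearFarCacheReader(Cache near, Cache far) {
        this.near = near;
        this.far = far;
    }

    public Object read(String key) {
        Object value = near.get(key);
        if (value != null) {
            return value;                  // near hit - hotspot traffic stops here
        }
        value = far.get(key);
        if (value == null) {
            value = loadFromDatabase(key);
            far.set(key, value, 600);      // longer TTL on the far cache
        }
        near.set(key, value, 5);           // a few seconds only, keeps data fresh enough
        return value;
    }

    private Object loadFromDatabase(String key) {
        return "value-for-" + key;         // placeholder for the real data access
    }
}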
5. Namespace your Keys
Keys are the identifiers against which values are stored in the cache. Key collisions can have nightmarish results, so have a good naming strategy for your keys, especially when sharing a global cache cluster across multiple application servers. Have a common function that takes the application name, object name, and object ID and spits out a key. This is particularly important for multi-tenant servers, where tenants will otherwise cache identical key names: Tenant A’s person_1234 is not the same as Tenant B’s person_1234. If all the IDs are GUIDs, that offers some protection, but most often the IDs are auto-incremented numeric primary keys. Keys are also distributed via a hash function, usually implemented in the client (like spymemcached), which ensures consistent key distribution across the members of a cache cluster. You may want to write your own hash function if the one provided by the cache client does not meet your requirements.
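The key-building function can be as small as a single static method that namespaces on application (or tenant), object type, and ID. The names below are illustrative -
public final class CacheKeys {

    private CacheKeys() {}

    // e.g. buildKey("tenantA.myapp", "person", "1234") -> "tenantA.myapp:person:1234"
    public static String buildKey(String appName, String objectName, String objectId) {
        if (appName == null || objectName == null || objectId == null) {
            throw new IllegalArgumentException("All key parts are required");
        }
        return appName + ":" + objectName + ":" + objectId;
    }
}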
6. Do not Update
Do not bother updating the cache on state changes. Just delete the key from the cache, and let the next read take care of caching the fresh value. It may seem like a good idea to update the cache with the new data when state changes, but it is not worth the effort. Not only does this tend to create race conditions in a clustered environment, but in write-heavy scenarios data consistency becomes an issue as well. Just have your state-change methods invalidate the key.
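In code, the state-change path only talks to the database and then deletes the key - it never writes the fresh value into the cache. A sketch with hypothetical cache and DAO interfaces -
public class PersonService {

    interface Cache { void delete(String key); }
    interface PersonDao { void update(String personId, String newName); }

    private final Cache cache;
    private final PersonDao dao;

    PersonService(Cache cache, PersonDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    public void rename(String personId, String newName) {
        dao.update(personId, newName);            // write goes to the source of truth
        cache.delete("myapp:person:" + personId); // invalidate only - the next read re-caches
    }
}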
7. Build a secure, auditable invalidation mechanism
Realistically, no one should update the database directly while bypassing the API or the server code. However, this is an all too common scenario in the real world, and it creates a situation where the cache no longer holds the state the database has. The options are limited - delete the key, or wait for it to expire. Neither works well if the keys are hashed (vs. human-readable like myapp.myperson.1234) or the TTL is too high. I have seen cache servers having to be bounced after this, causing a snowball effect as the database gets overloaded - akin to a complete failure of a cache server. To avoid this, create an API endpoint that takes basic parameters, builds and hashes the key, and deletes that key from the cache. The best way is to drive this operation from a parameterized Jenkins job, which provides security and auditing out of the box.
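A minimal invalidation endpoint could look like the sketch below, using the JDK's built-in HttpServer. The Cache interface and the buildKeyFromQuery/hash helpers are hypothetical placeholders; they should reuse the exact key and hash logic the application uses, and a real deployment would add authentication and audit logging (or sit behind the Jenkins job mentioned above) -
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class CacheInvalidationEndpoint {

    interface Cache { void delete(String key); }

    public static void start(Cache cache) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // Invalidation via e.g. /invalidate?app=myapp&object=person&id=1234
        server.createContext("/invalidate", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // app=...&object=...&id=...
            String key = hash(buildKeyFromQuery(query));        // same key/hash logic as the app
            cache.delete(key);
            byte[] body = ("deleted " + key).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    private static String buildKeyFromQuery(String query) { return query; } // placeholder
    private static String hash(String key) { return key; }                  // placeholder
}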
8. Be aware of the key and value size limits
Different cache servers have different limits. Memcached key names cannot be longer than 250 bytes, and values cannot be larger than 1 MB. Redis has much larger limits - up to 512 MB for both keys and values. For memcached, keep this in mind when namespacing. A good idea is to MD5 (or otherwise hash) the key name to produce a consistent, unique key of acceptable length. Similarly, depending on the efficiency trade-offs, you may want to compress the values or change their serialization format as appropriate.
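Hashing the namespaced key keeps it well under memcached's 250-byte limit while staying deterministic. A sketch using the JDK's MessageDigest; the myapp prefix is illustrative -
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class HashedCacheKeys {

    private HashedCacheKeys() {}

    // e.g. a long namespaced key -> "myapp:" + a fixed-length 32-char MD5 hex digest
    public static String hashedKey(String namespacedKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(namespacedKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return "myapp:" + hex;   // keep a short readable prefix, hash the rest
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}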
9. HTTP Cache is not the same as Application Cache
While so far we have discussed caching objects from the database, there are cases where it makes a lot of sense to cache at the HTTP level, i.e. the HTTP response from an API, or an entire HTML response from a webserver. CDNs (Content Delivery Networks) are giant HTTP caches. This is also a common scenario in edge caching, where the cache sits at the HTTP layer to serve static content with much larger TTLs. In the case of APIs, the HTTP GET URL is used as the key name. If going this route, be aware of the query strings that may have been used to modify the response. If the API supports features like field filtering, projections, sorting, etc., URL-based caching may produce a lower hit ratio or inconsistent results. It is a good idea to account for these features while constructing the key: caching the response for ?expand=address,business should also result in a hit for ?expand=business,address.
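One way to make those two URLs resolve to the same key is to sort the parameter values before building it. A rough sketch, assuming a comma-separated expand parameter; the class and method names are illustrative -
import java.util.Arrays;

public final class HttpCacheKeys {

    private HttpCacheKeys() {}

    // normalizedKey("/people/1234", "business,address") -> "/people/1234?expand=address,business"
    public static String normalizedKey(String path, String expandParam) {
        if (expandParam == null || expandParam.isEmpty()) {
            return path;
        }
        String[] fields = expandParam.split(",");
        Arrays.sort(fields);                       // order-insensitive key
        return path + "?expand=" + String.join(",", fields);
    }
}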
10. Use HTTP Cache Headers for HTTP Responses
This is where we move caching from the server to the client. In other words, we can instruct the requester of our APIs or HTTP content to cache the response at their end. Instead of inventing our own vernacular for these instructions, use the standard HTTP spec. RFC 7234 describes how HTTP content caching should behave, particularly when the client is a browser, and the same directives and headers should be used for all HTTP responses. For example, the no-cache directive instructs the client to revalidate a stored response with the server before re-using it (no-store tells it to never cache the response at all), while max-age instructs the client to safely re-use the response until it expires (the max-age value being the TTL in seconds). Whether the response is a webpage or JSON from an API, it is a good idea to adhere to the HTTP spec.
Here is an example of cache headers returned by Google -
bash-3.2$ curl --head https://www.google.com
HTTP/1.1 200 OK
Expires: -1
Cache-Control: private, max-age=0
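On the serving side, these directives are just response headers. Here is a minimal sketch using the JDK's built-in HttpServer; the path, payload, and max-age value are illustrative -
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class CacheHeaderExample {

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/people/1234", exchange -> {
            byte[] body = "{\"id\":1234}".getBytes(StandardCharsets.UTF_8);
            // Tell the client it may re-use this response for 5 minutes
            exchange.getResponseHeaders().set("Cache-Control", "private, max-age=300");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}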
11. Centralize the Cache Interface, and keep it simple
Last but not the least, design all the cache operations to go through a single class instead of spreading it everywhere. This has many benefits, like centralizing the the logic for key name creation, providing vendor independent method for the rest of the application to interact with the cache, and also, optimize the serialization of the objects (like replacing Java Object Serialization with Kyro or protobuf). This also provides a way for cache key lookups and invalidations with human-friendly key names, despite the cache having hashed key names. For instance, in Java, this can be a Singleton
sitting in a commons-jar used across all the applications sharing a cache cluster.
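A sketch of such a centralized interface: a single enum-based singleton that owns key construction, hashing, and (de)serialization, so the rest of the code never touches the cache client directly. The CacheClient interface and method names are hypothetical -
public enum CacheService {
    INSTANCE;

    // Hypothetical vendor-neutral client - wraps Memcached, Redis, etc.
    public interface CacheClient {
        byte[] get(String key);
        void set(String key, byte[] value, int ttlSeconds);
        void delete(String key);
    }

    private volatile CacheClient client;

    public void init(CacheClient client) {
        this.client = client;
    }

    // Callers pass human-friendly key parts; hashing stays hidden in here
    public byte[] get(String app, String object, String id) {
        return client.get(hashedKey(app, object, id));
    }

    public void put(String app, String object, String id, byte[] serialized, int ttlSeconds) {
        client.set(hashedKey(app, object, id), serialized, ttlSeconds);
    }

    public void invalidate(String app, String object, String id) {
        client.delete(hashedKey(app, object, id));
    }

    private String hashedKey(String app, String object, String id) {
        return app + ":" + object + ":" + id;   // plug in the real hash function here
    }
}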
In conclusion, while no solution is outright wrong, it is best to apply caching where it gives the biggest bang for the buck, reduces infrastructure overhead, and satisfies your consistency and availability requirements. The caching strategy for a financial application will be entirely different from that of a social networking app, which in turn will be entirely different from that of a content/media application. Caching is not a one-size-fits-all proposition. However, I hope that with the notes above, you can design an efficient, data-driven caching strategy across all tiers of your application.