By Simon Karpen, Sr. Technical Consultant
The MySQL Query Cache is one of the most commonly misunderstood and mistuned components of the MySQL database. It was an early feature addition (from when MySQL clearly in the “toy database” market), and it has the potential to cause a significant speedup or slowdown for your workload. Before we get into the details of tuning, it helps to understand what the query cache is (and isn’t).
The MySQL query cache is a very simple, straightforward query-level cache. This means it is caching the results of a specific query, not operating at the table or database level. It is completely separate from the key buffer, InnoDB buffer pool, or other MySQL memory structures. It’s also invalidated at the table level, not the row level.
For our examples, we have a simple table of T-shirts, called T-shirts, with the following columns:
Now, let’s say a user searches for green shirts:
SELECT id FROM tshirts WHERE color=”green”
If the query cache is enabled, MySQL will first check the query cache for this exact query. If this query is in the cache, and the T-shirts table has not changed, MySQL will return the cached result. If the query is not in the cache (or the T-shirts table) has changed, MySQL will run the query and store the result in the cache.
Now, let’s say the next user looks for green shirts in a specific size, and this exact query is not in the query cache:
SELECT id FROM tshirts WHERE color=”green” AND size=”large”
Even if no tables have been updated, this query will not make use of the previous query for green shirts. It is not the exact same query, so MySQL will have to execute this query, and store the result in the query cache. Of course, if another user searches for large green shirts (and the table has not been updated), that query will be served from the cache.
Now, let’s look at another query example. Let’s say you just want to know how many large green shirts are in stock:
SELECT COUNT(id) FROM tshirts WHERE color=”green” AND size=”large”
Even though you just ran a query that accessed all of this data, since your specific query is not in the query cache, it will go to the database. MySQL will of course store the result in the query cache. If you have experience tuning MySQL, you’re probably saying “It’ll be fast anyway, because your indices have already been loaded”.
There are several popular “query cache tuning” web tools, which give a tongue-in-cheek answer that the proper size for the query cache is always zero. While this is true of the vast majority of workloads, there are cases where the query cache can help performance. It is also true that the query cache is less likely to help well designed /well-written applications, but as an operations DBA, you do not always have control of the quality of the queries, application-level caching, or access patterns.
When is the query cache useful?
In short, the query cache can help poorly designed applications. Applications that make repetitive inefficient queries are often helped by the query cache, particularly if the data set is relatively static (think directory type applications). If an application is helped by the query cache, it would almost certainly be better served by a proper caching solution such as Memcached or Redis, managed at the application level. This gives you the advantages of caching, without all of the downsides of the MySQL Query Cache (below!)
Even if you think you have such an application, it is critical to monitor performance with and without the query cache. This is particularly true if the query cache usage appears to be gradually growing (not steady state); low-memory prunes (when the cache removes the least recently used objects) are particularly expensive due to locking.
To figure out the query cache hit rate, simply do:
Qcache_hits / (Qcache_hits + Qcache_inserts)
This does not include all queries covered by Com_select. To calculate an overall hit rate (including queries not cached, so therefore not subject to 100% of the query cache overhead), you would do:
Qcache_hits / (Qcache_hits + Com_select)
A hit rate in the 50–75% range is typical of an application that is helped by the query cache. If your hit rate is significantly lower than 50%, you are almost certainly better off disabling the query cache and using the memory for something more useful such as the InnoDB Buffer Pool.
Best case, how much overhead does the query cache add?
Even if all other factors are optimal, the query cache adds approximately 10% overhead to query processing; this means that the cache must serve over 10% of the query workload from itself just to break even. If queries are unique (within cache lifetime / table-level invalidation windows), a query cache with a minimal hit-rate will cost a 10% performance penalty, plus the opportunity cost of the database server RAM dedicated to the query cache.
What other overhead does the query cache add?
Cache invalidation: anytime a table is written to (INSERT, UPDATE, etc), all queries referencing this table are invalidated. This is true even if the rows impacted by the query are not touched; the query cache does not have any fine-grained invalidation.
Cache pruning: If the query cache is full, the least recently used items will be pruned. This is a blocking operation and will block all threads on the database when it happens. In a specific case, a 32MB query cache on a large memory server has been observed consuming more CPU than the database workload itself.
Locking: Query cache operations are atomic. Especially in modern many-core servers, gaining access to the query cache can result in very significant lock / wait overhead. This is particularly true if the query cache is too small, or the database server is serving a high-transaction workload from fast storage.
Opportunity Cost: What else could you use the memory for? Would you be better served adding it to the InnoDB Buffer Pool? Or use it to run a Memcached to store frequently
accessed slow-changing data?
These overheads combined can, in some cases, add up to substantially more load on the database server than the production workload itself. The query cache was designed in the era of single-core systems with very limited RAM and I/O, and these limitations show on modern multi-core systems, particularly with fast storage.
As an ops DBA, what should I do?
As a DBA, the key is to understand your workload. Hopefully, this article will give you an idea as to whether or not it’s worth benchmarking the query cache. If the query cache is already enabled, measure the hit rate and lowmem prunes, and consider whether or not it makes sense to disable the query cache. If the query cache is off, and your application does not meet the recommended criteria off, you are almost certainly safe leaving the query cache off and focusing on more productive ways to improve application performance.