When Memcached Doesn’t Cut It - Website Caching Part II

11 October 2007 – 7:38 pm

We use memcached on lots of sites. The beautiful thing is, once you’ve got a bunch of servers running memcached, it’s really easy to cache queries and save some serious database hits.

We use it sometimes when we probably shouldn’t. Or at least we should be caching a little differently. Case in point, our costume site, Costumzee.com on which we’ve recently been enjoying a large amount of traffic.

We did a redesign a few weeks ago, and right after the launch, during the busiest times of the day (evening), the database server was grinding to a halt, no more available connections, tmp tables all over, database deadlock, CPU load through the roof. Nightmare.

Here’s the CPU load on that server for the past two weeks. I’ve marked the times of the new design and when I implemented a new caching method.


Liquidweb PIMS - Home

The cause was a newly designed page that had several fairly heavy queries on it. They were all cached in memcache, but it wasn’t really helping. First, there are a lot of unique queries, thousands of them. Second, the data coming back is pretty big — big for storing in RAM anyway. Third, the data on the page changes only once a day, so the data could potentially be cached for up to 24 hours, depending on the time of day.

This situation cries out for a different sort of cache. More permanent memory for hours, not seconds or minutes.

I can’t simply implement page caching on this site as there is user-specific information on every page. So portions of pages would have to be cached. It’s possible, but not something I currently have in my bag of tricks. Further, caching to files is complicated because this site runs on two front-end servers behind a load balancer. If I employ file caching, I’d have to somehow share these files between the two servers. It’s possible with NFS mount, or the like, but again, not something I had quickly at my fingertips.

Since I was in a bit of a panic as traffic was increasing by about 25% per day, I had to go with what I knew: PHP and MySQL.

These queries that were bogging down the server were doing joins across several very large tables. The killer was that they were fulltext searches, sorted by relevance, and this sorting was causing tmp tables to be created. Once the data came back, there really wasn’t THAT much of it. So I implemented a way to cache the query result data structures to a database table designed for the purpose. So, now, once a page is viewed, the data structures (database result objects) are stored in this cache table indexed by the id of the page.

Here’s the trimmed down pseudo code that I use for this. The database is a very simple structure with fields for the primary key, data, and cached data.


$sql = "select * from page_cache where 1 and page_name = '$page_name' ";
$result = db_query($sql);
if ($cache_data = mysql_fetch_object($result)) {
// we have cached data, use it...
$page_cached = 1;
$product_array = unserialize($cache_data->product_list);
}

if (!$page_cached) {
// get the data from the database using the heavy queries
// code omitted. Data ends up in $product_array
// save the data to the cache
$page_cache = serialize($product_array);
$sql = "replace into page_cache set page_name = '$page_name', cache_date = '$now', product_list = '$page_cache'";
db_query($sql);
}

The magic is in the PHP serialize and unserialize commands. These commands will take any complex data structure, like a database result object and put it in a text format that can be stored in a text field and retrieved later. The unserialize command turns this text back into the data structure.

Since my data refreshes once a day, I don’t expire this cached data after a certain amount of time (though I easily could). Instead, I simply wipe it all clean right after the database is updated. By the time I do, I’ve cached tens of thousands of pages, and hundreds of megabytes of data.

And as you can see from the graph above, the server is now barely breaking a sweat, even during the busiest times of day. Further, the days after the “episode” saw traffic much greater than the days that had the problem. In fact, if you look closely, you can see that load is now lower than before the redesign.

So, if you’ve got a dynamic site, keep an eye on your database and server load. And ask yourself the question, do you really need to get fresh data with every page view? The answer to that question is probably “no”, in which case you might want to look at some method of caching.

If you enjoyed this post, please subscribe to my RSS feed

  1. 11 Responses to “When Memcached Doesn’t Cut It - Website Caching Part II”

  2. This is basically how the CMS “Drupal” handles caching.

    What system are you using for that site?

    An alternative to caching the entire page could be to cache the result of one of your funky queries… Simply create an array of result objects and serialize that into an array… Maybe use some kind of MD5 hash of a UserID and PageID/URL?

    Drupal caches things at different levels… Anonymous users can be presented with purely cached pages and the content for anonymous users is usually static - its often authenticated users which get custom pages (even if it is something like changing the “login” link to some text like “Welcome, Joe Bloggs”). On top of that, modules such as Views (which provides a VERY nice API and GUI for generating lists of pages based on certain rules like “all blog posts written by John” or “all video nodes tagged ‘running’”). You define the rules and this module generates a query to get it for you. The actual object the module creates (which determines things like filters, returned fields, output type (page or block), sort order and more) also gets serialized and cached. This way the next time a page is generated by the views module it doesn’t have to do all the hard PHP work of generating the $view object again.

    I suggest you take a look into Drupal. Its not the lightest of CMS’s and its certainly capable of causing server load like you’re experiencing, however it also provides very nice developer API’s to save time.

    By Nicholas Thompson on Oct 12, 2007

  3. Hi Nicholas.

    Our system is built from scratch custom. That’s how we like to roll. :)
    Though I’m sure I could learn a heck of a lot from a system like Drupal.

    By Scott on Oct 12, 2007

  4. Scott,

    We do custom stuff as well. That’s how we roll also. My business partner is a PHP MySql guy. He’ll understand the details better than I do, so I’ll send him the link. Thanks for getting the wheels ticking on things to do.

    Cheers,

    David

    By David on Oct 13, 2007

  5. This is similar to what jpcache does (http://www.jpcache.com/). Jpcache caches the whole rendered page though. Interesting post.

    By Fernando on Oct 14, 2007

  6. Smells like overengineering. If you’re effectively caching the entire page, why not let HTTP manage it? Manage your Expires headers properly, and you can let Apache’s mod_cache handle this in-line (and with no modification push it out to an intermediary front-end HTTP cache or CDN).

    One of the most common mistakes in scaling a content site is not looking at the life cycle of the dynamic source — I’m guessing there may be too many permutations to warrant it, but there’s the alternative angle of content pre-generation. As database contents are refreshed, the really boring solution is to look at refreshing “dynamic” content at the same time so you’re really just serving static content (note this could still be pseudo-dynamic and still have some small amounts of last-minute customization, but the heavy lifting is already done). Even if there’s some wasted effort, its predictable and scheduled on the backend that isn’t affecting your real-time delivery.

    By Kevin on Oct 15, 2007

  7. Hi Kevin,

    I’m sometimes prone to over-engineering for sure.

    In this case, I’m not caching the entire page. A good portion of it is dynamic, from the header with the current user info to the sidebar that has some custom content per user, to any comments entered by users.

    In any case, the number one priority was to do something immediate, and this was something I was able to go from brainstorm to development to production between 1am and 3am after that last spike. (I do have a hard time calling something that took me two hours over-engineered. ;) )

    When we do a major redesign after this costume season is over in a few weeks, I’ll look at a more “boring” solution to cache the data that updates once a day but allow dynamic updates of the truly dynamic stuff.

    thanks for the comment and forcing me to think. :)

    By Scott on Oct 15, 2007

  8. Follow-up… I’ve been working on caching Part III today.

    I took a little time to do some file caching on a few pages that still had some heavy queries on them.

    There are lots of caching how-tos and classes out there, but they focus mainly on caching entire pages. I can’t have that. I just want to cache chunks of the page.

    So, using PHP output buffering and file caches, I’ve come up with a good solution, which I’ll write up soon.

    By Scott on Oct 17, 2007

  9. Why didn’t you just add the query results to memcache with an appropiate ttl? You say you guys already use memcache, it would seem logical to put these results in memcache as well then….

    By Arjen on Nov 16, 2007

  10. Might want to look at this class for Data Base querying and caching.

    http://www.phpclasses.org/browse/package/4222.html

    By EllisGL on Nov 16, 2007

  11. such a difficult topic to fully understand.

    but yes, I think this will save some database hits.

    By dynasty gamers on Dec 28, 2007

  12. Yes it will….and it will lead to me swallow my words and use the recomended plugin

    By Magic on Mar 4, 2008

Post a Comment


Note: Your comment may be delayed for moderation. Sorry for the delay.