Magento Config: A Critique and Caching


This article is part of a longer series exploring the Magento global configuration object. While this article contains useful stand-alone information, you’ll want to read part 1, part 2, and part 3 of the series to fully understand what’s going on.

Please Note: The specifics of this article assume a Magento 1.6.1 CE system, although the concepts apply to all versions of Magento.

If you were following along with the previous article, you may have noticed a significant compromise/problem/trade-off in the loading of Magento’s system XML configuration variables. We hinted at it ourselves.

This means each individual node at websites/[CODE] has a full copy of every configuration value in the system.

Every time you create a new store view in Magento, you’re adding between 50KB and 75KB (depending on the modules you have installed) of data to the full global configuration tree. The problem’s even greater when you’re adding new websites to the system. Consider the Magento sample data. It uses a store view for the English, French, and German versions of the store. Each new website created will (presumably) need the same three store views, and suddenly you’re adding between 150KB and 225KB of data to the configuration tree.

While this may seem like a trivial amount of data (it fits on a floppy), because of certain implementation details this trivial amount of data can start to create problems under load.

First off, there’s caching. It’s unavoidable that a modular web system will end up using a cache, as loading every module’s configuration from disk on each page load would quickly fail under load. Disk is always the second thing to bottleneck after database throughput. Caching the global configuration saves at least 60 disk reads per request. Caching is a good thing.

However, because PHP is a poor environment for serializing objects directly to memory, this means the global configuration is serialized as an XML string. When the configuration is loaded from cache

#File: lib/Varien/Simplexml/Config.php
$xmlString = $this->_loadCache($this->getCacheId());
$xml = simplexml_load_string($xmlString, $this->_elementClass);
if ($xml) {
    $this->_xml = $xml;
    $this->setCacheSaved(true);
    return true;
}

PHP needs to load the entire string into memory while creating the object, and then the object itself needs to store that data for the lifetime of the request. Even if PHP properly discards the local variables there’s still a split second of time where it will need enough memory to hold the configuration tree twice over.

Even assuming the PHP internals handle this more elegantly than described above, 150KB extra for a single user isn’t that big a deal. But with 10 concurrent users that’s suddenly closer to 1.5MB of data. For 100 concurrent users that’s 15MB of additional data, just for adding a few store views.
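The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope estimate only: the function name is made up for illustration, and it assumes every concurrent request holds exactly one extra copy of the added data.

```php
<?php
// Rough estimate only: assumes each concurrent request holds exactly one
// extra copy of the configuration data added by the new store views.
function extraConfigMemoryKb($addedKbPerRequest, $concurrentUsers)
{
    return $addedKbPerRequest * $concurrentUsers;
}

// 150KB is the low-end figure for three new store views used in the text
foreach (array(1, 10, 100) as $users) {
    printf("%3d users: %d KB\n", $users, extraConfigMemoryKb(150, $users));
}
```

Swap in the size of your own configuration tree to get a feel for how quickly the number grows on your system.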

Total Configuration Memory Usage

Let’s consider the memory usage of the entire global configuration. Create a blank controller action (or other bootstrapped place to run code), clear your cache, and run the following

$xml = Mage::getConfig()->getNode()->asXml();
file_put_contents('/tmp/no-cache.xml', $xml);

What we’re doing here is serializing the entire XML tree as an XML string, and then saving it to a file in the /tmp directory. Let’s take a look at the size of this file

$ ls -lh /tmp/no-cache.xml
-rw-rw-rw-  1 _www  wheel   584K Feb 11 13:54 /tmp/no-cache.xml

On my development system that’s over half a meg. This means 2 concurrent users will need more than 1MB of memory just for the configuration. 200 concurrent users will need over 100MB, and 2,000 will need more than a gig. Remember, this is just for the global configuration; it doesn’t take the layout system, or memory for objects loaded from the database, into consideration.

Fortunately, the core team has taken steps to implement some intelligent caching around this, but the fact remains: Magento is a RAM greedy system, and there’s a reason it will never run on a $15/month hosting plan.

Smart Cache Loading

Now we’re going to take a look at some curious behavior. Change the code from above to write out to a file named with-cache.xml

$xml = Mage::getConfig()->getNode()->asXml();
file_put_contents('/tmp/with-cache.xml', $xml);

Make sure caching is enabled on your system and run your code a few times to be sure you’re getting a file saved during a cache hit. Now take a look at your file sizes

$ ls -lh /tmp/*.xml
-rw-rw-rw-  1 _www  wheel   215K Feb 11 13:54 /tmp/with-cache.xml
-rw-rw-rw-  1 _www  wheel   584K Feb 11 13:54 /tmp/no-cache.xml

Somehow there’s less data (215 KB vs. 584 KB) in the tree during a cache hit. What gives?

Next, try running the following code, both with and without a cache hit.

$xml = Mage::getConfig()->getNode();
foreach($xml as $node)
{
    var_dump($node->getName());
}

On my system, without a cache hit (i.e. immediately after clearing cache), I see the following output

string 'global' (length=6)
string 'default' (length=7)
string 'varien' (length=6)
string 'admin' (length=5)
string 'modules' (length=7)
string 'frontend' (length=8)
string 'adminhtml' (length=9)
string 'install' (length=7)
string 'stores' (length=6)
string 'websites' (length=8)
string 'crontab' (length=7)
string 'phoenix' (length=7)

The name of each node of the configuration tree under <config/> is dumped out. However, on the next request (with a cache hit), I see the following.

string 'global' (length=6)
string 'default' (length=7)
string 'varien' (length=6)
string 'modules' (length=7)
string 'frontend' (length=8)
string 'phoenix' (length=7)

What?! Where are all the nodes? We’re missing admin, adminhtml, install, stores, websites, and crontab. This explains the smaller memory footprint, but how can Magento load its system configuration variables without a <stores/> node? If you really want to confuse yourself, try grabbing that node directly

$xml = Mage::getConfig()->getNode('stores');
var_dump($xml->asXml());

With the above code you’ll see the XML for the stores node, even during a cache hit.

string '<config><global><install><date>Wed, 19 Oct 2011 21:19:04 +0000</date></install><resources ...

However the following code exporting the entire global configuration tree

$xml = Mage::getConfig()->getNode();
var_dump($xml->asXml());

will not contain the XML for the stores node.

At this point a sane PHP developer unfamiliar with Magento innards will run away screaming, give up programming, and join a monastery/nunnery.

What’s Going On

So what’s really going on here? Strange voodoo? System bugs?

Neither.

What we’re seeing is Magento taking advantage of wrapping these XML trees in objects, and implementing intelligent caching and uncaching of the XML configuration tree to reduce the amount of RAM needed for each request. The remainder of this article will investigate how Magento stores the global configuration tree to the cache, and how data is intelligently fetched from the cache.

To understand how Magento caches data, you’ll need to understand how most web frameworks handle caching. If you’re unfamiliar with the term, the Wikipedia definition should do

In computer engineering, a cache is a component that transparently stores data so that future requests for that data can be served faster.

Caching is one of the quickest ways to improve performance of your computer program. You do something that takes a long time once, save the result, and the next time you need to do that something you just load the saved result. The cache most familiar to web developers is the browser cache. Your web browser fetches a CSS file once, and then the next time you request the same page your web browser will (should, may, depending) load the CSS from its local cache-storage rather than make another HTTP request.

Caching in Magento (as with most web frameworks), works something like an array/dictionary would. That is, anytime you want to cache a bit of data, you (at minimum)

  1. Think up an ID for the piece of data
  2. Tell the caching system to save that piece of data with your ID

Then, when you want to load something from cache, you say

Hey, caching system, give me the piece of data saved with this ID.

In Magento that looks something like this

#File: app/code/core/Mage/Core/Model/App.php
public function saveCache($data, $id, $tags=array(), $lifeTime=false)
{
    $this->_cache->save($data, $id, $tags, $lifeTime);
    return $this;
}

The Magento application object’s $_cache property contains the PHP object that implements a caching interface. It has a save method that allows you to save data with a certain ID. There are two additional parameters as well: $tags and $lifeTime. The $tags variable is an array of tag names, which will be assigned to your piece of cache data. Most caching interfaces allow you to interact with tagged data en masse. For example, you can quickly say something like

delete all cache values tagged with LAYOUT_GENERAL_CACHE_TAG

This is what allows you to selectively delete certain cache types via the Magento admin console.

The $lifeTime parameter allows you to indicate how long a value should be cached for. You can cache an item for 5 minutes, 5 hours, 5 days, or 500 years. This allows you to use a long interval for things that don’t change often, while using a short interval for things that might change more frequently.

Loading a value from cache is as simple as passing the ID value to the cache object’s load method.

#File: app/code/core/Mage/Core/Model/App.php
public function loadCache($id)
{
    return $this->_cache->load($id);
}

So that’s a quick “caching for dummies” overview. The key thing to take away is the cache allows you to store a certain value at a certain ID, and then load it later on.
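If it helps to see the whole idea in one place, here’s a toy in-memory cache with the same save/load shape. This is a sketch for illustration, not Magento’s implementation: the ToyCache class and its cleanByTag method are invented names, and the data lives in a plain PHP array that vanishes at the end of the request.

```php
<?php
// A toy in-memory cache, NOT Magento's implementation. The class name
// ToyCache and the method cleanByTag are invented for this sketch.
class ToyCache
{
    protected $_entries = array();

    public function save($data, $id, $tags = array(), $lifeTime = false)
    {
        $this->_entries[$id] = array(
            'data'    => $data,
            'tags'    => $tags,
            // false means "never expires" in this sketch
            'expires' => ($lifeTime === false) ? false : time() + $lifeTime,
        );
        return true;
    }

    public function load($id)
    {
        if (!isset($this->_entries[$id])) {
            return false; // cache miss
        }
        $entry = $this->_entries[$id];
        if ($entry['expires'] !== false && $entry['expires'] <= time()) {
            unset($this->_entries[$id]); // expired entries count as misses
            return false;
        }
        return $entry['data'];
    }

    public function cleanByTag($tag)
    {
        foreach ($this->_entries as $id => $entry) {
            if (in_array($tag, $entry['tags'])) {
                unset($this->_entries[$id]);
            }
        }
    }
}

$cache = new ToyCache();
$cache->save('<layout/>', 'layout_default', array('LAYOUT_GENERAL_CACHE_TAG'), 300);
$cache->save('<config/>', 'config_global', array('CONFIG'));

var_dump($cache->load('layout_default')); // the saved string
$cache->cleanByTag('LAYOUT_GENERAL_CACHE_TAG');
var_dump($cache->load('layout_default')); // false: removed by tag
var_dump($cache->load('config_global'));  // still cached
```

Note that load returns false on a miss, which is the same convention the Magento snippets in this article rely on.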

Saving the Configuration to Cache

Let’s take a look at how the configuration is saved into cache.

#File: app/code/core/Mage/Core/Model/App.php
protected function _initModules()
{
    if (!$this->_config->loadModulesCache()) {
        $this->_config->loadModules();
        if ($this->_config->isLocalConfigLoaded() && !$this->_shouldSkipProcessModulesUpdates()) {
            Varien_Profiler::start('mage::app::init::apply_db_schema_updates');
            Mage_Core_Model_Resource_Setup::applyAllUpdates();
            Varien_Profiler::stop('mage::app::init::apply_db_schema_updates');
        }
        $this->_config->loadDb();
        $this->_config->saveCache();
    }
    return $this;
}

After we’ve loaded our system configuration variables from the database with $this->_config->loadDb();, Magento is finished loading the entire configuration tree, and saves the value to cache with a call to

$this->_config->saveCache();

Before we trace this method to the configuration cache, let’s consider how a simple implementation of cache saving might happen. We need to cache the entire tree, so it would make sense for us to come up with a cache ID for the global configuration tree, serialize the tree as XML, and then save it to cache. Something like this

Mage::app()->saveCache($xml->getNode()->asXml(), 'global_config');

However, when we take a look at the real thing, we see a method that’s more complicated than a simple key/value save.

#File: app/code/core/Mage/Core/Model/Config.php
public function saveCache($tags=array())
{
    if (!Mage::app()->useCache('config')) {
        return $this;
    }
    if (!in_array(self::CACHE_TAG, $tags)) {
        $tags[] = self::CACHE_TAG;
    }
    $cacheLockId = $this->_getCacheLockId();
    if ($this->_loadCache($cacheLockId)) {
        return $this;
    }

    if (!empty($this->_cacheSections)) {
        $xml = clone $this->_xml;
        foreach ($this->_cacheSections as $sectionName => $level) {
            $this->_saveSectionCache($this->getCacheId(), $sectionName, $xml, $level, $tags);
            unset($xml->$sectionName);
        }
        $this->_cachePartsForSave[$this->getCacheId()] = $xml->asNiceXml('', false);
    } else {
        return parent::saveCache($tags);
    }

    $this->_saveCache(time(), $cacheLockId, array(), 60);
    $this->removeCache();
    foreach ($this->_cachePartsForSave as $cacheId => $cacheData) {
        $this->_saveCache($cacheData, $cacheId, $tags, $this->getCacheLifetime());
    }
    unset($this->_cachePartsForSave);
    $this->_removeCache($cacheLockId);
    return $this;
}

At the risk of ruining the ending, here’s the 10,000 foot view of this method. Rather than save the entire XML tree under one cache ID, Magento splits the entire global configuration tree into sections. These sections are defined by the keys of the following array property

#File: app/code/core/Mage/Core/Model/Config.php
protected $_cacheSections = array(
    'admin'     => 0,
    'adminhtml' => 0,
    'crontab'   => 0,
    'install'   => 0,
    'stores'    => 1,
    'websites'  => 0
);

Each cache “section” corresponds to a node immediately under the top level <config/> node. For each key in this array, Magento will create a cache ID based on that name

'config_global_admin'
'config_global_adminhtml'
'config_global_crontab'
etc..

and then pull out that part of the global configuration tree to save separately. You can see that happening here

#File: app/code/core/Mage/Core/Model/Config.php
$xml = clone $this->_xml;
foreach ($this->_cacheSections as $sectionName => $level) {
    $this->_saveSectionCache($this->getCacheId(), $sectionName, $xml, $level, $tags);
    unset($xml->$sectionName);
}
$this->_cachePartsForSave[$this->getCacheId()] = $xml->asNiceXml('', false);

The _saveSectionCache method is where the actual work happens.

#File: app/code/core/Mage/Core/Model/Config.php
protected function _saveSectionCache($idPrefix, $sectionName, $source, $recursionLevel=0, $tags=array())
{
    if ($source && $source->$sectionName) {
        $cacheId = $idPrefix . '_' . $sectionName;
        if ($recursionLevel > 0) {
            foreach ($source->$sectionName->children() as $subSectionName => $node) {
                $this->_saveSectionCache(
                    $cacheId, $subSectionName, $source->$sectionName, $recursionLevel-1, $tags
                );
            }
        }
        $this->_cachePartsForSave[$cacheId] = $source->$sectionName->asNiceXml('', false);
    }
    return $this;
}

Here you can see Magento generate a $cacheId

#File: app/code/core/Mage/Core/Model/Config.php
$cacheId = $idPrefix . '_' . $sectionName;

and then generate the XML tree as a string to save, stashing it in an internal property array ($this->_cachePartsForSave) for later.

#File: app/code/core/Mage/Core/Model/Config.php
$this->_cachePartsForSave[$cacheId] = $source->$sectionName->asNiceXml('', false);

Then, back up in saveCache, the part of the tree we just saved is unset

#File: app/code/core/Mage/Core/Model/Config.php
unset($xml->$sectionName);

We can safely remove this section of the config, since we pulled out a copy which will be added to cache later.
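Here’s a small stand-alone check of why slicing nodes off the clone is safe. It uses a plain SimpleXMLElement rather than Magento’s own element class, and the sample XML is made up for illustration.

```php
<?php
// Sample tree invented for illustration
$original = new SimpleXMLElement(
    '<config><global><a>1</a></global><stores><german/></stores></config>'
);

// Work on a copy, as saveCache does with "clone $this->_xml"
$copy = clone $original;
$sectionXml = $copy->stores->asXML(); // stash the section as a string
unset($copy->stores);                 // then slice it off the copy

var_dump(isset($copy->stores));     // the copy no longer has <stores/>
var_dump(isset($original->stores)); // the original tree is untouched
echo $sectionXml, "\n";
```

Because the unset only touches the clone, the configuration object that’s serving the current request keeps its full tree.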

Once each cache section has been stashed away in $_cachePartsForSave, we take the remainder of the tree and save it with the main cache ID (config_global)

#File: app/code/core/Mage/Core/Model/Config.php
$this->_cachePartsForSave[$this->getCacheId()] = $xml->asNiceXml('', false);

There’s one more important thing to cover here. The $_cacheSections property contains an array whose keys define the cache sections for the configuration. However, there’s also the value to consider

#File: app/code/core/Mage/Core/Model/Config.php
protected $_cacheSections = array(
    'admin'     => 0,
    'adminhtml' => 0,
    'crontab'   => 0,
    'install'   => 0,
    'stores'    => 1,
    'websites'  => 0
);

This value is “0” for all the cache sections except for stores.

If we consider our foreach again

#File: app/code/core/Mage/Core/Model/Config.php
foreach ($this->_cacheSections as $sectionName => $level) {
    $this->_saveSectionCache($this->getCacheId(), $sectionName, $xml, $level, $tags);

We can see this value is being iterated over as the variable $level and passed in as the fourth parameter of the _saveSectionCache method. If we look at _saveSectionCache again,

#File: app/code/core/Mage/Core/Model/Config.php
protected function _saveSectionCache($idPrefix, $sectionName, $source, $recursionLevel=0, $tags=array())
{
    if ($source && $source->$sectionName) {
        $cacheId = $idPrefix . '_' . $sectionName;
        if ($recursionLevel > 0) {
            foreach ($source->$sectionName->children() as $subSectionName => $node) {
                $this->_saveSectionCache(
                    $cacheId, $subSectionName, $source->$sectionName, $recursionLevel-1, $tags
                );
            }
        }
        $this->_cachePartsForSave[$cacheId] = $source->$sectionName->asNiceXml('', false);
    }
    return $this;
}

we can see this is the $recursionLevel variable, which triggers a conditional block we ignored earlier. The $recursionLevel value allows you to specify how deeply Magento will split up the cache tree. Since the stores cache section is defined with a recursion level of 1, this means cache IDs with names like

'config_global_stores_default'
'config_global_stores_admin'
'config_global_stores_german'
'config_global_stores_french'

would be generated for the sample data system. That is, with the <stores/> node, Magento will persist each sub-node of <stores/>

stores/default/*
stores/admin/*
etc ...

as a separate cache entry.
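To see which cache IDs fall out of a given recursion level, here’s a stand-alone sketch that mirrors the recursion but only collects the IDs. The collectSectionCacheIds function is a hypothetical helper, not part of Magento, and it runs against a plain SimpleXMLElement with made-up sample XML.

```php
<?php
// Hypothetical helper that mirrors the recursion described above and
// returns only the cache IDs that would be generated.
function collectSectionCacheIds($idPrefix, $sectionName, SimpleXMLElement $source, $recursionLevel = 0)
{
    $ids = array();
    if (!isset($source->$sectionName)) {
        return $ids;
    }
    $cacheId = $idPrefix . '_' . $sectionName;
    if ($recursionLevel > 0) {
        foreach ($source->$sectionName->children() as $subSectionName => $node) {
            $ids = array_merge(
                $ids,
                collectSectionCacheIds($cacheId, $subSectionName, $source->$sectionName, $recursionLevel - 1)
            );
        }
    }
    $ids[] = $cacheId; // the section itself is saved as well
    return $ids;
}

// Store codes taken from the sample data described in this article
$xml = new SimpleXMLElement(
    '<config><stores><default/><admin/><german/><french/></stores><websites><base/></websites></config>'
);

print_r(collectSectionCacheIds('config_global', 'stores', $xml, 1));
print_r(collectSectionCacheIds('config_global', 'websites', $xml, 0));
```

With a level of 1 the stores section yields one ID per store plus an ID for the whole section, while websites (level 0) yields a single ID.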

Once all this is done and the global configuration tree has been sliced up into the $_cachePartsForSave array property, Magento will go through each section and actually save them to the caching engine.

#File: app/code/core/Mage/Core/Model/Config.php
foreach ($this->_cachePartsForSave as $cacheId => $cacheData) {
    $this->_saveCache($cacheData, $cacheId, $tags, $this->getCacheLifetime());
}

The _saveCache method is a wrapper to the Magento application object’s saveCache method

#File: app/code/core/Mage/Core/Model/Config.php
protected function _saveCache($data, $id, $tags=array(), $lifetime=false)
{
    return Mage::app()->saveCache($data, $id, $tags, $lifetime);
}

Loading Cache Values

So that’s saving the cache, but what about cache loading? When Magento loads the configuration from cache, it only loads the config_global cache ID (fetched with a call to $this->getCacheId())

#File: lib/Varien/Simplexml/Config.php
public function loadCache()
{
    if (!$this->validateCacheChecksum()) {
        return false;
    }

    $xmlString = $this->_loadCache($this->getCacheId());
    $xml = simplexml_load_string($xmlString, $this->_elementClass);
    if ($xml) {
        $this->_xml = $xml;
        $this->setCacheSaved(true);
        return true;
    }

    return false;
}

That’s why our earlier file size check returned a smaller value when the configuration was being loaded from cache. The XML tree only contains nodes that were not sliced off as cache sections. Of course, this still raises the question

What happens when we attempt to retrieve one of the configuration sections that’s saved separately

That’s to say, when we ask Magento for the admin node, or the stores node,

$config->getNode('admin');
$config->getNode('stores');
$config->getNode('stores/default');

how can it return the values if they haven’t been loaded?

We’d be right to be confused if getNode were a simple dumb wrapper to the bare XML object, but fortunately for us, it’s an intelligent wrapper to the bare XML object. Let’s take a look at getNode’s definition

#File: app/code/core/Mage/Core/Model/Config.php
public function getNode($path=null, $scope='', $scopeCode=null)
{
    //... altering path to account for $scope and $scopeCode ...

    /**
     * Check path cache loading
     */
    if ($this->_useCache && ($path !== null)) {
        $path   = explode('/', $path);
        $section= $path[0];
        if (isset($this->_cacheSections[$section])) {
            $res = $this->getSectionNode($path);
            if ($res !== false) {
                return $res;
            }
        }
    }
    return  parent::getNode($path);
}

In the conditional, we see Magento will look at the first portion ($path[0]) of any configuration path that’s passed in. If this portion matches one of our predefined cache sections (the sections we used to slice up the configuration), Magento will load the node via the getSectionNode method instead of the parent’s getNode method. If we take a look at that method definition

#File: app/code/core/Mage/Core/Model/Config.php
public function getSectionNode($path)
{
    $section    = $path[0];
    $config     = $this->_getSectionConfig($path);
    $path       = array_slice($path, $this->_cacheSections[$section]+1);
    if ($config) {
        return $config->descend($path);
    }
    return false;
}

we can see it wraps a call to the _getSectionConfig method,

#File: app/code/core/Mage/Core/Model/Config.php
protected function _getSectionConfig($path)
{
    $section = $path[0];
    if (!isset($this->_cacheSections[$section])) {
        return false;
    }
    $sectioPath = array_slice($path, 0, $this->_cacheSections[$section]+1);
    $sectionKey = implode('_', $sectioPath);

    if (!isset($this->_cacheLoadedSections[$sectionKey])) {
        Varien_Profiler::start('init_config_section:' . $sectionKey);
        $this->_cacheLoadedSections[$sectionKey] = $this->_loadSectionCache($sectionKey);
        Varien_Profiler::stop('init_config_section:' . $sectionKey);
    }

    if ($this->_cacheLoadedSections[$sectionKey] === false) {
        return false;
    }
    return $this->_cacheLoadedSections[$sectionKey];
}

which will load a particular cache section with the _loadSectionCache method. This method

#File: app/code/core/Mage/Core/Model/Config.php
protected function _loadSectionCache($sectionName)
{
    $cacheId = $this->getCacheId() . '_' . $sectionName;
    $xmlString = $this->_loadCache($cacheId);

    /**
     * If we can't load section cache (problems with cache storage)
     */
    if (!$xmlString) {
        $this->_useCache = false;
        $this->reinit($this->_options);
        return false;
    } else {
        $xml = simplexml_load_string($xmlString, $this->_elementClass);
        return $xml;
    }
}

is the one that does the actual loading of the saved configuration fragment. The XML configuration is loaded from the cache

#File: app/code/core/Mage/Core/Model/Config.php
$xmlString = $this->_loadCache($cacheId);

and then loaded into an XML object that’s returned and used in the getNode configuration request.

#File: app/code/core/Mage/Core/Model/Config.php
$xml = simplexml_load_string($xmlString, $this->_elementClass);
return $xml;
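The array_slice juggling in getSectionNode and _getSectionConfig is easier to follow with concrete paths. Here’s a minimal stand-alone sketch of that bookkeeping. The resolveSectionPath function is a hypothetical helper, not a Magento method, and the 'config_global' prefix is assumed from the cache IDs shown earlier.

```php
<?php
// Same section => level map as the $_cacheSections property shown earlier
$cacheSections = array(
    'admin' => 0, 'adminhtml' => 0, 'crontab' => 0,
    'install' => 0, 'stores' => 1, 'websites' => 0,
);

// Hypothetical helper: works out which cache entry a path lives in and
// which part of the path is left to descend inside that entry.
function resolveSectionPath($path, $cacheSections, $idPrefix = 'config_global')
{
    $parts   = explode('/', $path);
    $section = $parts[0];
    if (!isset($cacheSections[$section])) {
        return false; // not a split-off section
    }
    $depth = $cacheSections[$section] + 1;
    return array(
        'cacheId'   => $idPrefix . '_' . implode('_', array_slice($parts, 0, $depth)),
        'remainder' => array_slice($parts, $depth),
    );
}

print_r(resolveSectionPath('stores/default/web/secure/base_url', $cacheSections));
print_r(resolveSectionPath('websites/base/web', $cacheSections));
var_dump(resolveSectionPath('global/models', $cacheSections));
```

A stores path consumes two segments for the cache ID because of its recursion level of 1, while a websites path consumes only one. Paths outside the listed sections fall through to the main tree.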

The end result of this configuration splitting is the solution to the previously discussed problem of out of control configuration growth. While it’s true a full configuration tree will still grow out of control, with this technique a full configuration tree is rarely loaded into memory for each request. Instead, only the parts that are needed for a particular request are loaded, which keeps the memory footprint down.

To an end-programmer user, the change is seamless. The getNode method behaves exactly the same, even if the implementation details are different.
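To show the pattern end to end, here’s a toy lazy-loading configuration object. It’s a sketch under heavy assumptions, not Magento’s class: LazyConfig and its cacheReads counter are invented names, the cache is a plain array of ID => XML string, and there’s no recursion level handling.

```php
<?php
// A toy lazy-loading config, NOT Magento's class. The cache is a plain
// array of ID => XML string; names such as LazyConfig are invented here.
class LazyConfig
{
    protected $_cache;
    protected $_sections;
    protected $_main;
    protected $_loaded = array();
    public $cacheReads = 0; // counts reads so the laziness is visible

    public function __construct($cache, $sections)
    {
        $this->_cache    = $cache;
        $this->_sections = $sections;
        $this->_main     = $this->_read('config_global');
    }

    protected function _read($cacheId)
    {
        $this->cacheReads++;
        return isset($this->_cache[$cacheId])
            ? new SimpleXMLElement($this->_cache[$cacheId])
            : false;
    }

    public function getNode($path)
    {
        $parts   = explode('/', $path);
        $section = $parts[0];
        $root    = $this->_main;
        if (in_array($section, $this->_sections)) {
            // Split-off section: fetch it on first use, then reuse it
            if (!isset($this->_loaded[$section])) {
                $this->_loaded[$section] = $this->_read('config_global_' . $section);
            }
            $root  = $this->_loaded[$section];
            $parts = array_slice($parts, 1);
        }
        // Walk the remaining path one element at a time
        foreach ($parts as $name) {
            if ($root === false || !isset($root->$name)) {
                return false;
            }
            $root = $root->$name;
        }
        return $root;
    }
}

$cache = array(
    'config_global'        => '<config><global><foo>bar</foo></global></config>',
    'config_global_stores' => '<stores><german><name>German</name></german></stores>',
);
$config = new LazyConfig($cache, array('stores'));
echo $config->getNode('global/foo'), "\n";         // answered from the main tree
echo $config->getNode('stores/german/name'), "\n"; // triggers one extra cache read
echo $config->getNode('stores/german/name'), "\n"; // reuses the loaded section
echo $config->cacheReads, "\n";
```

The caller uses the same getNode call for every path. Only the read counter reveals that the stores section was fetched on demand and then reused.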

This doesn’t fully alleviate Magento’s hunger for RAM, but it does prevent it from becoming a Unicron/Galactus level threat.

Trade Off

Of course, this solution doesn’t come without some tradeoffs. The first, and worst, and too often overlooked in any startup created code-base, is complexity. It’s not that the concepts here are particularly hard to follow, it’s that they’re non-obvious for anyone who needs to do some customizations beyond the getStoreConfig method. The seemingly weird behavior due to this caching scheme makes it less likely that any third party (or even first party) developers will want anything to do with this part of the codebase.

This multiplies the effects of the second tradeoff, which is reliability. There’s a bug in this implementation that folks familiar with caching will have already noticed.

One important rule of caching is this: Always assume a cache hit will fail, and never rely on a cache save having worked. That’s because the cache server may need to expire things before their specified end-of-life to clear up space for future cache saves, or a caching system may be distributed across multiple servers and one of those servers may be reset at any time.
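In code, that rule usually turns into a “load it, or rebuild it” helper. The sketch below is generic and not taken from Magento: loadOrRebuild and buildExpensiveConfig are hypothetical names, and a PHP array passed by reference stands in for real cache storage.

```php
<?php
// Hypothetical helper: never trust the cache to still hold a value.
// $cache is an array passed by reference standing in for real storage.
function loadOrRebuild(&$cache, $id, $rebuild)
{
    if (isset($cache[$id])) {
        return $cache[$id];            // hit
    }
    $value = call_user_func($rebuild); // miss: do the slow work again
    $cache[$id] = $value;              // best effort; may be evicted later
    return $value;
}

// Stand-in for the slow step of building the value from scratch
function buildExpensiveConfig()
{
    return '<config><global/></config>';
}

$cache = array();
echo loadOrRebuild($cache, 'config_global', 'buildExpensiveConfig'), "\n";
unset($cache['config_global']); // simulate the cache server evicting the entry
echo loadOrRebuild($cache, 'config_global', 'buildExpensiveConfig'), "\n";
```

The caller gets the same answer whether the entry was there or not, which is the property the split configuration cache has trouble guaranteeing.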

The trouble with splitting the cache across multiple IDs is it’s possible that the main cache ID

config_global

will be present in the cache storage and the configuration object will load properly, but that one of the other cache IDs will not be there

config_global_stores
config_global_stores_default
config_global_stores_admin

When a piece of Magento client code attempts to load a node in one of these cache sections, Magento will return false, causing a cascade of failure from this non-atomic system behavior. Although not common, you will see posts on internet forums from users reporting mysteriously vanishing cache sections.

What’s interesting is the Magento core team seems to know about this problem. If we look at the cache loading code again,

#File: app/code/core/Mage/Core/Model/Config.php
$cacheId = $this->getCacheId() . '_' . $sectionName;
$xmlString = $this->_loadCache($cacheId);

/**
 * If we can't load section cache (problems with cache storage)
 */
if (!$xmlString) {
    $this->_useCache = false;
    $this->reinit($this->_options);
    return false;
} else {
    $xml = simplexml_load_string($xmlString, $this->_elementClass);
    return $xml;
}

we see that the Magento core code checks if $xmlString contains a returned cache hit, along with a comment hinting that this code is here to deal with failures in cache storage. Magento even kicks off a re-initialization of the configuration tree after detecting a cache hit failure

#File: app/code/core/Mage/Core/Model/Config.php
$this->_useCache = false;
$this->reinit($this->_options);
return false;

but it also returns false for the node request, which will certainly make your system do something completely unexpected.
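For contrast, here’s one way a section loader could answer the request from the rebuilt data instead of handing back false. This is a hypothetical sketch of the idea, not a patch for Magento. The function names are invented, $cache is a plain array standing in for cache storage, and the rebuild callback is assumed to return every cache part at once.

```php
<?php
// Hypothetical sketch, not a patch for Magento. $cache stands in for cache
// storage and $rebuildAll for the slow "load every config file" step; it
// must return an array of cache ID => XML string covering every section.
function loadSectionOrRecover(&$cache, $cacheId, $rebuildAll)
{
    if (!empty($cache[$cacheId])) {
        return simplexml_load_string($cache[$cacheId]);
    }
    // Section missing: rebuild every part so the set is consistent again
    foreach (call_user_func($rebuildAll) as $id => $xmlString) {
        $cache[$id] = $xmlString;
    }
    // Answer from the rebuilt data instead of returning false
    return isset($cache[$cacheId]) ? simplexml_load_string($cache[$cacheId]) : false;
}

// Stand-in for rebuilding the whole configuration and slicing it up
function rebuildAllParts()
{
    return array(
        'config_global'        => '<config><global/></config>',
        'config_global_stores' => '<stores><german><name>German</name></german></stores>',
    );
}

// The main entry survived, but the stores entry was evicted
$cache  = array('config_global' => '<config><global/></config>');
$stores = loadSectionOrRecover($cache, 'config_global_stores', 'rebuildAllParts');
echo $stores->german->name, "\n";
```

The cost is a slow request whenever a section goes missing, but the calling code never sees a false where it expected a node.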

Wrap Up

While Magento should be applauded for their efforts in combating their early performance problems, the decision to go with a native PHP XML based configuration system is one mistake in the early development of the system that we’re still living with today, and will likely be living with for as long as there is a Magento.

In a void, the development of a configuration system that maintains the current interface without loading the entire tree into memory for each request would be possible, but because there are no explicit (or even implicit) public or private interfaces for the PHP configuration API, there’s a sea of third party code which assumes the global configuration is, and always will be, stored as native PHP XML objects.

This means the Magento core team is stuck between supporting an early design decision, or causing massive failure in community modules and existing integrations. Rather than solve this (and other) performance problems with software engineering, Magento took a business tack and partnered with Rackspace, throwing old school physical servers at the problem.

While this sort of solution is never satisfying for a programmer, sometimes it’s the smarter business choice. Knowing when and where to make these sorts of choices is a key aspect of running your business’s technology, whether you’re an internet behemoth, savvy startup, or just a small business riding the waves.

This concludes the series on the loading of Magento’s global configuration. If you’ve found it at all useful, please consider purchasing Commerce Bug or No Frills Magento Layout from the Pulse Storm store. Beyond supporting my Magento tutorials, if you’re doing professional Magento work these products will pay for themselves the first time you need to debug your system.

For those who have already purchased a Pulse Storm product, thank you for your support.

Originally published February 13, 2012