Fixing Magento Flat Collections with Chaos

astorm


This entry is part 28 of 43 in the series Miscellaneous Magento Articles.

There are a number of Magento tasks that, despite four plus years with the platform, I’m still bad at. One task is remembering to test my code for compatibility with the flat category and flat product collection mode.

After shipping code to a client that had a flat category bug (and then quickly fixing it) I had two clear choices. The first was to get serious about methodically testing any code that leaves my computer. The second was to come up with some crazy development methodology that would allow me to continue being lazy.

This article explores both the implementation details of the flat catalog features and my new solution for ensuring my code is compatible with both flat and non-flat modes.

Trusting the Abstractions

One technique that allows a Magento developer to succeed is a day-to-day blind trust in the abstractions. When you’re working with a lighter-weight framework, you don’t necessarily need this blind trust. When you say

$product = new Product(27);

you’re aware that the simple ActiveRecord-ish ORM is making a query that looks something like this

SELECT * FROM products WHERE id = 27;

However, trying to stay aware of these underlying details in Magento, especially when you’re starting out, is impossible. Instead, trust that Magento’s objects will load the information you need, and only worry about the specifics if things stop behaving as you expect. Over time you’ll start to get an understanding of how Magento’s various sub-systems work, and which parts use ActiveRecord, EAV, or custom SQL.

This approach has served me well, but it’s also part of the reason I keep letting flat catalog bugs into my code.

The Flat Collections

Most models in Magento have a corresponding collection object, and this collection object can be instantiated by calling the model’s getCollection method.

$collection = Mage::getModel('cms/page')
    ->getCollection()
    ->addFieldToFilter('title', array('like' => '%science%'));

Collection objects are used to fetch multiple instances of a particular object. If we could talk to Magento in plain English, that would sound something like

Hey Magento, give me all the CMS page objects whose titles contain the word “science”

The Magento product and category objects have collections

$collection = Mage::getModel('catalog/product')->getCollection();
$collection = Mage::getModel('catalog/category')->getCollection();

However, both these collection objects present a performance problem. The Product objects are EAV objects, and fully loading each one requires multiple SQL queries. The Category objects are also EAV objects, and in addition to that the category collection is implemented as a SQL-heavy, tree-based data structure.

To solve these performance problems, Magento introduced the idea of “flat” data. Reimplementing these models and collections in non-EAV/tree terms would have been a huge engineering effort. Magento’s entire “highly configurable” product model relies on EAV attributes and collections, and the category editing UI would be difficult to recreate without the nested tree features. Even if Magento had taken all this on, a lot of third party code would have stopped working.

Instead, Magento implemented indexers which will periodically query the standard collections and populate flat database tables

catalog_category_flat_store_*
catalog_product_flat_*

These tables have non-normalized product and category data that’s intended to be read only. This allows Magento to fetch category and product data in a single query.
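To make the EAV-to-flat idea concrete, here's a minimal sketch in plain PHP. It is not Magento's indexer code. The rows, attribute names, and the buildFlatRows function are all made up for illustration; the only point is to show many attribute rows collapsing into one row per entity.

```php
<?php
// Hypothetical EAV-style rows: one row per (entity, attribute) pair.
// The attribute names and values are invented for this example.
$eavRows = array(
    array('entity_id' => 27, 'attribute' => 'name',  'value' => 'Science Kit'),
    array('entity_id' => 27, 'attribute' => 'price', 'value' => '19.99'),
    array('entity_id' => 28, 'attribute' => 'name',  'value' => 'Telescope'),
    array('entity_id' => 28, 'attribute' => 'price', 'value' => '149.00'),
);

/**
 * Pivot attribute rows into one flat row per entity. This is roughly
 * the kind of work an indexer does when it fills a flat table.
 */
function buildFlatRows(array $eavRows)
{
    $flat = array();
    foreach ($eavRows as $row) {
        $id = $row['entity_id'];
        if (!isset($flat[$id])) {
            // First time we see this entity: start its flat row.
            $flat[$id] = array('entity_id' => $id);
        }
        // Each attribute becomes a column on the entity's single row.
        $flat[$id][$row['attribute']] = $row['value'];
    }
    return $flat;
}

$flatRows = buildFlatRows($eavRows);
print_r($flatRows[27]);
```

Once the data is in this shape, reading an entity is a single lookup instead of one lookup per attribute. The trade-off is that the flat copy is only as fresh as the last time it was rebuilt.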

Of course, we need to tell Magento it should do this instead of loading data from the primary EAV tables. Turning on flat catalog mode gives us a performance boost, but at the cost of some functionality, so Magento put that decision in the hands of its users. There are two configuration flags at

System -> Configuration -> Catalog -> Frontend -> Use Flat Catalog Category
System -> Configuration -> Catalog -> Frontend -> Use Flat Catalog Product

Magento will reference these flags (via helper methods) when it instantiates category or product objects. If set to Yes, Magento will instantiate flat resource models, and these new resource model classes will reference the flat tables for reading data.

In grand Magento tradition, this has been implemented slightly differently by the teams/individuals responsible for each feature. For categories, this is implemented during model construction

#File: app/code/core/Mage/Catalog/Model/Category.php
protected function _construct()
{
    if (Mage::helper('catalog/category_flat')->isEnabled()) {
        $this->_init('catalog/category_flat');
        $this->_useFlatResource = true;
    } else {
        $this->_init('catalog/category');
    }
}

whereas for products it’s implemented during construction of the collection itself

#File: app/code/core/Mage/Catalog/Model/Resource/Product/Collection.php
protected function _construct()
{
    if ($this->isEnabledFlat()) {
        $this->_init('catalog/product', 'catalog/product_flat');
    }
    else {
        $this->_init('catalog/product');
    }
    $this->_initTables();
}

Ignoring internal Magento software architecture politics, what this means is when Magento is running in “flat” mode the category collection object is a

Mage_Catalog_Model_Resource_Category_Flat_Collection

as opposed to

Mage_Catalog_Model_Resource_Category_Collection

when running in “normal” mode.

Products are a little trickier. There’s no flat collection object — instead the main collection object creates different SQL depending on which mode Magento is running in (complicated by further different SQL depending on a cached loading vs. a normal loading). However, if you call the product collection’s getEntity method

$products = Mage::getModel('catalog/product')->getCollection();
var_dump(get_class($products->getEntity()));

you’ll see different classes depending on which mode Magento is running in

Mage_Catalog_Model_Resource_Product_Flat
vs.
Mage_Catalog_Model_Resource_Product

As you can see, despite the two different approaches, the factory pattern is leveraged extensively in the flat catalog features.

OOP Gone Wrong

So why do we care? In theory, we shouldn’t need to. As client developers we can continue to use the Magento abstractions regardless of which mode Magento is running in and we’ll get back the information we’re after.

Unfortunately, while it’s clear the original Magento core team excelled at principles of object oriented programming, not all team members were on the same page. If you’re going to do the sort of polymorphism that lets you swap in the class

Mage_Catalog_Model_Resource_Category_Flat_Collection

for the class

Mage_Catalog_Model_Resource_Category_Collection

at runtime, the replacement class should be interface compatible with the original. As many Magento developers learn time and time again, this is not the case.

Here’s a concrete example of what I’m talking about. I was recently working on a feature that required getting the product count for each category. I did some digging and discovered if I loaded the category collection with the following code

$collection = Mage::getModel('catalog/category')
    ->getCollection()
    ->setLoadProductCount(true);

the categories would load with a product count. Pleased with myself, I implemented the feature and moved on to my next task.

However, during testing the QA team received the following error

Fatal error: Call to undefined method Mage_Catalog_Model_Resource_Category_Flat_Collection::setLoadProductCount

I had been developing with flat category mode off. Unfortunately, the setLoadProductCount method didn’t exist on the flat category collection, which triggered a fatal error when the QA team tested the site in flat category mode. The two collection objects weren’t interface compatible.

(For developers just entering the field, “QA” stands for quality assurance. In ancient times companies hired teams of people to test code and features before shipping them to clients. A few companies still practice this arcane art.)

While my philosophy of “trusting the abstractions” usually steers me right, in this specific case it steered me wrong, because I assumed the collection object would operate the same in flat vs. non-flat mode.
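One defensive pattern for this kind of mismatch is to check that a method exists before calling it. The sketch below uses two stand-in classes (Example_Category_Collection and Example_Category_Flat_Collection) and a helper named requestProductCount, all hypothetical, so it runs without Magento. It only illustrates the guard, and what to do when the method is missing depends on the feature.

```php
<?php
// Stand-in classes for illustration only. They mimic the mismatch
// described above and are NOT the real Magento collection classes.
class Example_Category_Collection
{
    public $loadProductCount = false;

    public function setLoadProductCount($flag)
    {
        $this->loadProductCount = (bool) $flag;
        return $this;
    }
}

class Example_Category_Flat_Collection
{
    // Deliberately has no setLoadProductCount() method.
}

/**
 * Ask for product counts only when the collection supports it.
 * Returns true if the request was made, false if the caller
 * needs some other way to get the counts.
 */
function requestProductCount($collection)
{
    if (method_exists($collection, 'setLoadProductCount')) {
        $collection->setLoadProductCount(true);
        return true;
    }
    return false;
}

var_dump(requestProductCount(new Example_Category_Collection()));      // bool(true)
var_dump(requestProductCount(new Example_Category_Flat_Collection())); // bool(false)
```

A guard like this turns a fatal error into a branch you have to think about, but it only helps if you remember the two classes can differ in the first place.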

Solving the Problem — with Chaos

All this brings us full circle. While I quickly fixed the specific problem in my code, there’s still the question of how to fix my bug of always forgetting about these collections.

I could double my billable time by developing in non-flat mode, testing in flat mode, fixing problems, re-testing in non-flat mode, and continuing the cycle until everything works. As much as that sounds like a good idea on paper, in practice it means I’d be burning a lot of cycles re-testing code that works fine in both modes, all to catch the rare instance when it doesn’t.

There’s always a full-scale unit/integration/acceptance testing suite, but again the issue of billable time rears its ugly but unavoidable head.

Instead, what I’ve landed on is automating that switching with a little bit of Netflix inspired chaos.

Pulse Storm Chaos

If you’re not familiar with Chaos Monkey, the TL;DR version is that Netflix deliberately introduces code into their production system that replicates “bad things” happening to their server farm: servers dying, slow responses, etc. This forces their engineering teams to write fault tolerant systems.

While I’d never run this code in production, I decided a similar approach might be useful for dealing with Magento’s plethora of configuration options. The end result was a simple module called Pulse Storm Chaos. (GitHub, Magento Connect Package)

The Chaos module allows you to specify random values for Magento configuration variables in a fields.php configuration file. These values are swapped in at runtime during the pre-dispatch action controller stage; the actual persistent database storage is never touched.

If that doesn’t make sense, take a look at fields.php

return array(
    'catalog/frontend/flat_catalog_category'=>'getRandomBoolean',
    'catalog/frontend/flat_catalog_product'=>'getRandomBoolean',
);    

This include file returns an array of key/value pairs. The key is the configuration node, and the value is a valid PHP callback or a method on the pulsestorm_chaos/values model. The above configuration will, on a per request basis, randomly switch your store into flat or non-flat mode.

This may seem like a crazy way to work, but it will quickly surface any assumptions you’re making about flat vs. non-flat collections. Another possible use might be randomly swapping between themes to make sure theme specific problems are surfaced sooner rather than later.

return array(
    'catalog/frontend/flat_catalog_category'=>'getRandomBoolean',
    'catalog/frontend/flat_catalog_product'=>'getRandomBoolean',
    'design/theme/default'=>function(){
        $themes = array('modern','default');
        return $themes[rand(0,1)];
    }        
);    

The above code uses a PHP 5.3+ anonymous function as a callback to set the design/theme/default value. With the above code in place, your store would swap between themes during development.
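If the "method name or callback" rule sounds abstract, here's a rough, self-contained sketch of how a fields.php style array could be turned into concrete values. This is not the module's actual source. Example_Chaos_Values is a hypothetical stand-in for the pulsestorm_chaos/values model, resolveChaosFields is a made-up name, and both the integer return value of getRandomBoolean and the choice to check for a method on the values object before trying a generic callback are assumptions on my part.

```php
<?php
// Hypothetical stand-in for the pulsestorm_chaos/values model.
class Example_Chaos_Values
{
    // Assumption: a "random boolean" is represented as integer 0 or 1.
    public function getRandomBoolean()
    {
        return mt_rand(0, 1);
    }
}

/**
 * Turn a fields.php style array into concrete config values.
 * Each value is either the name of a method on $values or a PHP callable.
 */
function resolveChaosFields(array $fields, $values)
{
    $resolved = array();
    foreach ($fields as $path => $callback) {
        if (is_string($callback) && method_exists($values, $callback)) {
            // A plain string that names a method on the values object.
            $resolved[$path] = call_user_func(array($values, $callback));
        } elseif (is_callable($callback)) {
            // Any other valid callback, such as an anonymous function.
            $resolved[$path] = call_user_func($callback);
        }
        // Anything else is skipped, so the stored value stays in effect.
    }
    return $resolved;
}

$fields = array(
    'catalog/frontend/flat_catalog_category' => 'getRandomBoolean',
    'design/theme/default' => function () {
        $themes = array('modern', 'default');
        return $themes[mt_rand(0, 1)];
    },
);

$resolved = resolveChaosFields($fields, new Example_Chaos_Values());
print_r($resolved);
```

Run it a few times and the values change from run to run, which is the same effect the module produces per request.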

Wrap Up

Pulse Storm Chaos won’t be for everyone, and may cause more confusion than it helps solve. However, if you’re willing to accept a little chaos in your life, this module can help you spot problems before they get to the QA team or worse, before they get to your customers.

Originally published October 1, 2012