In Depth Magento Dispatch: Advanced Rewrites

astorm


We’re in the middle of a series covering the various Magento sub-systems responsible for routing a URL to a particular code entry point. So far we’ve covered the general front controller architecture and four router objects that ship with the system, as well as the basics of Magento’s two request rewrite systems. Today we’ll be diving deeper into Magento’s database-based request rewrite system. You may want to bone up on parts one, two, three, and four before continuing.

Loading a Rewrite

Last time, we glossed over how a rewrite object would take the two path variables (one slashed, one non-slashed)

/foo/baz/bar.html
/foo/baz/bar.html/    

and use them to load a rewrite object’s data. Let’s de-gloss those details. If we look at the definition for loadByRequestPath (remembering that $path is actually the array with our two string paths)

#File: app/code/core/Mage/Core/Model/Url/Rewrite.php
public function loadByRequestPath($path)
{
    $this->setId(null);
    $this->_getResource()->loadByRequestPath($this, $path);
    $this->_afterLoad();
    $this->setOrigData();
    $this->_hasDataChanges = false;
    return $this;
}

we see a pretty standard loading pattern. The code that ultimately does the loading is (as expected) in the resource model, in a method with the same name

$this->_getResource()->loadByRequestPath($this, $path);

The loadByRequestPath method on the resource model ends up being a bit more complex than a standard model select SQL query, for reasons we’ll explore below.

Custom Query

The first step in loading our rewrite object is to query the database for any records that match our path values and match the current store id, (set earlier on the passed in rewrite $object).

That’s what the first half of the loadByRequestPath method is for.

#File: app/code/core/Mage/Core/Model/Resource/Url/Rewrite.php
public function loadByRequestPath(Mage_Core_Model_Url_Rewrite $object, $path)
{

    if (!is_array($path)) {
        $path = array($path);
    }

    $pathBind = array();
    foreach ($path as $key => $url) {
        $pathBind['path' . $key] = $url;
    }
    // Form select
    $adapter = $this->_getReadAdapter();
    $select  = $adapter->select()
        ->from($this->getMainTable())
        ->where('request_path IN (:' . implode(', :', array_flip($pathBind)) . ')')
        ->where('store_id IN(?)', array(Mage_Core_Model_App::ADMIN_STORE_ID, (int)$object->getStoreId()));

    $items = $adapter->fetchAll($select, $pathBind);

This is much simpler than it looks. The query we’re constructing uses the Zend Framework’s object oriented database syntax along with named parameters. If you’re not familiar with the concept, the code above ultimately generates a query string that looks like this (notice :path0 and :path1)

SELECT `core_url_rewrite`.* 
FROM `core_url_rewrite` 
WHERE 
    (request_path IN (:path0, :path1)) 
AND 
    (store_id IN(0, 1))

Then, the select object and the $pathBind array are passed to the fetchAll method. The $pathBind contains our two normalized path information strings, and looks something like this

array
    'path0' => string 'electronics/cell-phones.html' 
    'path1' => string 'electronics/cell-phones.html/'

The values from $pathBind are swapped in for the named parameters in the query, with each array key (path0, path1) matched to the query token of the same name (:path0, :path1). These key names were generated at the beginning of the method with the following

#File: app/code/core/Mage/Core/Model/Resource/Url/Rewrite.php
$pathBind = array();
foreach ($path as $key => $url) {
    $pathBind['path' . $key] = $url;
}

A key was generated for each $url, and used to create the $pathBind array. Then, when creating the bind parameters for the query

->where('request_path IN (:' . implode(', :', array_flip($pathBind)) . ')')

we use the same array keys by flipping $pathBind (which turns the keys into values) and imploding the result with ', :' as the glue.
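
If you’d like to see that bookkeeping outside of Magento, here’s a minimal sketch in plain PHP. It builds the bind array and the IN fragment the same way the resource model does, without Zend_Db. The function name buildPathBind is hypothetical, not something that ships with Magento.

```php
<?php
// Hypothetical helper (not part of Magento): builds the bind array and
// the matching "request_path IN (...)" fragment for a list of paths.
function buildPathBind(array $paths)
{
    $pathBind = array();
    foreach ($paths as $key => $url) {
        $pathBind['path' . $key] = $url;   // path0, path1, ...
    }

    // array_flip() turns the keys into values, so implode() joins the key
    // names. array_keys($pathBind) would produce the same list.
    $placeholders = ':' . implode(', :', array_flip($pathBind));

    return array($pathBind, 'request_path IN (' . $placeholders . ')');
}

list($bind, $where) = buildPathBind(array(
    'electronics/cell-phones.html',
    'electronics/cell-phones.html/',
));

echo $where, "\n";
print_r($bind);
```

The string in $where still contains the :path0 and :path1 tokens. It’s the database adapter that pairs them with the values in $bind when the query runs.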

Next, you may have noticed that in addition to searching for our two request_path variables, we’re also searching for two store ids

(store_id IN(0, 1))

Magento allows you to create different rewrites for different stores, so that explains one of the store ids, but what about the other one? In addition to searching for a rewrite with the store id on the passed in rewrite object $object->getStoreId(), Magento will also always search for items that match the admin store_id (typically 0). This indicates that in addition to providing rewrites for the frontend application, the request rewrite system can provide rewrites for the admin console. It also creates the possibility, although rare, that a rewrite would apply to both the admin console and the frontend. The logic that resolves these conflicts is our next topic.

Resolving the Right Rewrite

Searching on multiple store ids and multiple request paths creates a problem. Once our query runs and populates the result set $items, we could have up to four different rows returned. After querying, we need a way of prioritizing which of these four items to use.

Magento solves this problem with some bitwise logic that implements a priority system. Each row is assigned a penalty, and the row with the lowest penalty wins

#File: app/code/core/Mage/Core/Model/Resource/Url/Rewrite.php
$mapPenalty = array_flip(array_values($path)); // we got mapping array(path => index), lower index - better

$currentPenalty = null;
$foundItem = null;
foreach ($items as $item) {
    $penalty = $mapPenalty[$item['request_path']] << 1 + ($item['store_id'] ? 0 : 1);
    if (!$foundItem || $currentPenalty > $penalty) {
        $foundItem = $item;
        $currentPenalty = $penalty;
        if (!$currentPenalty) {
            break; // Found best matching item with zero penalty, no reason to continue
        }
    }
}

If you’re not up for a quick primer on bitwise operations, feel free to skip ahead to the next section.

Bitwise operators are always an interesting trip in the land of non-compiled programming languages. They come from the early days of programming, and operate directly on the binary representation of a variable. For example, let’s consider the bitwise OR operator (|)

$a = 5;    //expressed as the binary 0101
$b = 3;    //expressed as the binary 0011

$c = $a | $b;

echo $c;

The above bit of code will output the number 7. How do you take 5 and 3 and get 7? Take a look at the binary version of the numbers

 0101   #binary 5
 0011   #binary 3
-----
 0111

A bitwise OR looks directly at each binary column (also known as a “bit” of memory) and returns a new binary number with a column set true (or 1) wherever either of the original numbers’ columns was set true.

Counting from the right in the above example, the first column of binary 5 is “1” (0101), and the first column of binary 3 is also “1” (0011). This means the first column of our result is a “1” (1 OR 1 == true). Follow this logic through for the remaining columns, and you arrive at 0111.

The binary number 0111 translates to 7 in decimal, and that’s how 5 and 3 make 7.
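
To see the columns for yourself, here’s a small sketch that prints each value in binary using PHP’s built-in decbin and str_pad functions. It also shows the left shift operator (<<), which the Magento code above relies on. The helper name bits is just for illustration.

```php
<?php
// Illustrative helper: a 4-column binary string for a small integer.
function bits($n)
{
    return str_pad(decbin($n), 4, '0', STR_PAD_LEFT);
}

$a = 5;
$b = 3;

echo bits($a), "  (5)\n";
echo bits($b), "  (3)\n";
echo bits($a | $b), "  (5 | 3 = ", $a | $b, ")\n";

// A left shift moves every bit one column to the left,
// which doubles the number: 0011 becomes 0110.
echo bits($b << 1), "  (3 << 1 = ", $b << 1, ")\n";
```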

In the early days of programming, when the performance of every instruction mattered, bitwise operators let programmers come up with a number of clever tricks that took fewer instructions on the then-simple processors than the more complicated algorithms for doing regular math. Wikipedia has a good overview of the topic if you want to learn more.

If you’re using bitwise operators every day (i.e. you’re a C programmer or in school) and are binary inclined, their logic becomes relatively simple. If you’re not dealing with binary math on a regular basis, they become a giant headache. Additionally, in the higher level languages you often lose the performance improvements that are available with the lower level languages, or the performance increases are trivial enough not to warrant the added complexity to you and your team.

In Plain English (or, What Language do they Speak in What?)

That’s all we’re going to say on bitwise operators. If you’re so inclined, picking apart how this particular chunk of code works would be a useful exercise in bitwise shifts (the << operator).

More directly useful is a plain English explanation of the priority levels, and which items Magento will pick over the others.
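
For anyone who’d like a head start on that exercise, here’s a minimal sketch that runs the same penalty expression over the four rows the query could return. The sample paths and the frontend store id of 1 are assumptions for illustration. One thing to watch for: in PHP, + binds more tightly than <<, so the expression shifts the path index by 1 plus the admin flag, rather than shifting first and adding afterwards.

```php
<?php
// Paths in the order Magento passes them: index 0 is the path as requested.
$path = array('electronics/cell-phones.html', 'electronics/cell-phones.html/');
$mapPenalty = array_flip(array_values($path));

// Every row the query could return (store id 1 is an assumed frontend store).
$items = array(
    array('request_path' => $path[0], 'store_id' => 1),
    array('request_path' => $path[0], 'store_id' => 0),
    array('request_path' => $path[1], 'store_id' => 1),
    array('request_path' => $path[1], 'store_id' => 0),
);

$penalties = array();
foreach ($items as $item) {
    // Same expression as the core code. PHP parses it as
    // $index << (1 + ($item['store_id'] ? 0 : 1)).
    $penalty = $mapPenalty[$item['request_path']] << 1 + ($item['store_id'] ? 0 : 1);
    $penalties[] = $penalty;
    echo $item['request_path'], ' / store ', $item['store_id'], ' => ', $penalty, "\n";
}
```

Both rows for the requested path score zero, since shifting zero by any amount is still zero. The rows for the alternate path score 2 for the frontend store and 4 for the admin store.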

Restating the problem, we have a URL string in its natural state (matches the current request)

/electronics/cell-phones.html
/electronics/foo/

and a URL string in its unnatural, or adulterated, state, with the trailing slash removed or added (has or does not have a slash, depending on the natural state)

/electronics/cell-phones.html/
/electronics/foo

We also have a real store id, and the admin store id. This gives us four possible rewrites that could be returned: Natural with a Store ID, Natural with the Admin Store ID, Adulterated with a Store ID, and Adulterated with the Admin Store ID.
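
As a rough sketch of where those four candidates come from, the following plain PHP toggles the trailing slash on a request path and pairs each variant with two store ids. The function name pathVariants is hypothetical and this is not Magento’s own normalization code. The store id of 1 is an assumed frontend store.

```php
<?php
// Hypothetical helper (not Magento's own code): returns the path as
// requested first, then the variant with the trailing slash toggled.
function pathVariants($requestPath)
{
    $withoutSlash = rtrim($requestPath, '/');
    $hadSlash     = ($withoutSlash !== $requestPath);

    return $hadSlash
        ? array($requestPath, $withoutSlash)
        : array($requestPath, $requestPath . '/');
}

// Pair each variant with the assumed frontend store id (1) and the admin id (0).
foreach (pathVariants('/electronics/foo/') as $index => $candidate) {
    foreach (array(1, 0) as $storeId) {
        echo ($index === 0 ? 'natural' : 'adulterated'), ' ', $candidate, ' store ', $storeId, "\n";
    }
}
```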

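When loading a URL rewrite, the penalty expression shown earlier works out to the following order of preference. (In PHP, + binds more tightly than <<, so the path index is shifted by 1 for a non-admin store and by 2 for the admin store.)

  1. A URL in its Natural state, with either Store ID. Both rows score a penalty of zero, so whichever row the database returns first wins
  2. A URL in its adulterated state, with the Non-Admin Store ID set (penalty of 2)
  3. A URL in its adulterated state, with the Admin Store ID set (penalty of 4)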

This can be somewhat confusing to figure out and trust, as the algorithm used in the core code is dependent on the order the rows are returned in, and there’s no ORDER BY for the query.

It’s a confusing and confounding state of affairs. Although this code is here to let you be lazy about your trailing slashes, I’d highly recommend you don’t add to the confusion with that laziness. If your team (or the previous team) hasn’t been able to manage that, then just keep the above ordering in mind, and don’t be afraid to drop a version of Rewrite.php into your local code pool with some logging added to suss out why your rewrites aren’t being applied.

Where do Rewrite Objects Come From

So far we’ve treated the database rewrite system as a generalized, system level tool. However, throughout Magento’s life (either by design or engineering expedience), the database rewrite system started to become intimately, and inseparably, intertwined with the shopping cart application. We’ve already seen hints of this with the store id situation mentioned above, but things shift into overdrive when we consider category and product URLs.

The key problem is this. No store owner wants their product or category pages to have URLs that look like this

catalog/category/view/id/8
catalog/product/view/id/16/

They want the category and product name to be in the URL so search engines (i.e. Google) will drive more traffic to these pages. Google rewards sites that use semantic URLs, or put a different way, Google saw a pattern in the early web where sites that used semantic URLs tended to be more relevant, so those sites ranked higher in the search engines. It’s inevitable that rewrites would need to become a core part of the cart offering, and not a stand-alone system. From a system developer’s point of view, this presents a tricky problem because there’s no set list of categories or product names, which means they can’t be incorporated directly into the routing/MVC system. Here’s what Magento did.

When you create a category in the Magento admin console, one of the fields is named URL Key.

If a category has this field set, Magento will automatically create a rewrite for the category landing page, as well as a product URL based on the full category tree path. Product objects have their own URL key as well.

Additionally, if you change the URL Key for a category, not only will Magento create a new set of rewrites for you, it will also use the rewrite’s “options” field to create permanent HTTP 301 redirects from the old pages to the new, attempting to preserve any existing SEO juice. An HTTP status code of 301 is meant to indicate that a web page has moved somewhere else, and you should stop looking for it here. Sort of like a forwarding address. Without these redirects in place, Google treats the moved page as brand new and ranks it accordingly.
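
To make the redirect idea concrete, here’s a minimal sketch of how a row’s options value might be mapped to an HTTP status. The 'RP' (permanent) and 'R' (temporary) values are an assumption based on my recollection of the Magento 1 core, so check them against the rows in your own core_url_rewrite table. The function name and sample paths are hypothetical.

```php
<?php
// Sketch only: maps a rewrite row's "options" value to an HTTP status.
// The 'RP' and 'R' values are assumptions, check them against your install.
function redirectStatusFor(array $rewrite)
{
    $raw     = isset($rewrite['options']) ? (string) $rewrite['options'] : '';
    $options = explode(',', $raw);

    if (in_array('RP', $options, true)) {
        return 301; // moved permanently
    }
    if (in_array('R', $options, true)) {
        return 302; // temporary redirect
    }
    return null;    // no redirect, the request is rewritten internally
}

// Hypothetical row left behind after a URL Key change.
$old = array(
    'request_path' => 'old-name.html',
    'target_path'  => 'new-name.html',
    'options'      => 'RP',
);

$status = redirectStatusFor($old);
if ($status !== null) {
    echo 'HTTP ', $status, ' -> /', $old['target_path'], "\n";
}
```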

That’s what all those extra data properties on the rewrite object are for

array(
    'url_rewrite_id' => string '213' (length=3)
    'store_id' => string '1' (length=1)
    'category_id' => string '25' (length=2)
    'product_id' => string '133' (length=3)
    'id_path' => string 'product/133/25' (length=14)
    'request_path' => string 'electronics/cameras/accessories/universal-camera-case.html' (length=58)
    'target_path' => string 'catalog/product/view/id/133/category/25' (length=39)
    'is_system' => string '1' (length=1)
    'options' => null
    'description' => null
);

The category_id, product_id, and id_path properties are there so Magento can keep track of which rewrite applies to a particular category, product, or both. The is_system property might be more accurately named is_canonical_rewrite_for_category_or_product_category_combo. That is, is_system is a boolean flag that Magento sets to let itself know which rows are system level rewrites, created by Magento, that currently represent the “main” URL for a particular entity (as opposed to the redirection rewrites, which are also created by the system, but have their is_system flag set to false).
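
As an illustration of how id_path and target_path relate to the ids, here’s a small sketch modeled on the sample row above. The helper name productRewriteKeys is hypothetical and isn’t Magento’s API. The category-less form is an assumption that follows the same pattern.

```php
<?php
// Hypothetical helper, modeled on the sample row above (not Magento's API).
function productRewriteKeys($productId, $categoryId = null)
{
    $idPath     = 'product/' . $productId;
    $targetPath = 'catalog/product/view/id/' . $productId;

    if ($categoryId !== null) {
        // A product viewed inside a category gets the category id appended.
        $idPath     .= '/' . $categoryId;
        $targetPath .= '/category/' . $categoryId;
    }

    return array('id_path' => $idPath, 'target_path' => $targetPath);
}

print_r(productRewriteKeys(133, 25));
```

Because the id_path is built from ids and not from names, it stays the same when a URL Key changes, which is what lets a row be found again later.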

This data is also used to drive the Admin Console’s rewrite UI at

Catalog -> URL Rewrite Management

as well as to determine which URL should be used when a programmer uses the Category or Product object’s URL helper methods.

Catalog Rewrite Generation Code

This raises the question of where the code for automatically generating these rewrites lives. You might think it lives in the category and product save methods (it doesn’t), or maybe in a post save event (wrong again). Magento’s automatically generated rewrites are managed by the indexing engine. If you browse to

System -> Index Management

you’ll see one of the index processors is named Catalog URL Rewrites. This is the process responsible for ensuring the database request rewrites are up to date, and reflect the information stored with a product or category object. (With apologies, the indexing system is an article series in and of itself, so we’ll be skipping over some of its nuances.)

When you run the Catalog URL Rewrites index, it ultimately instantiates a catalog/indexer_url model, and calls its reindexAll method

#File: app/code/core/Mage/Catalog/Model/Indexer/Url.php
public function reindexAll()
{
    Mage::getSingleton('catalog/url')->refreshRewrites();
}

As you can see, “indexing” the catalog urls simply means instantiating a catalog/url model and calling its refreshRewrites method. If we dive into that method

#File: app/code/core/Mage/Catalog/Model/Url.php
public function refreshRewrites($storeId = null)
{
    if (is_null($storeId)) {
        foreach ($this->getStores() as $store) {
            $this->refreshRewrites($store->getId());
        }
        return $this;
    }

    $this->clearStoreInvalidRewrites($storeId);
    $this->refreshCategoryRewrite($this->getStores($storeId)->getRootCategoryId(), $storeId, false);
    $this->refreshProductRewrites($storeId);
    $this->getResource()->clearCategoryProduct($storeId);

    return $this;
}

We can see that, from a high level, the work of the system automatically creating rewrites is, for each store id,

  1. Clearing out old rewrites for deleted products and root categories (clearStoreInvalidRewrites)
  2. Updating rewrites for category pages (refreshCategoryRewrite)
  3. Updating rewrites for product pages (refreshProductRewrites)
  4. Cleaning up rewrites for products no longer in a particular category (clearCategoryProduct)
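
The control flow above can be sketched with a toy class. ToyUrlRefresher, its store ids, and its step labels are all made up for illustration. Only the “null means every store” recursion and the order of the four steps mirror refreshRewrites.

```php
<?php
// Toy stand-in for the catalog/url model. Store ids and step names are
// illustrative, only the control flow mirrors refreshRewrites().
class ToyUrlRefresher
{
    public $log = array();
    private $storeIds;

    public function __construct(array $storeIds)
    {
        $this->storeIds = $storeIds;
    }

    public function refreshRewrites($storeId = null)
    {
        if (is_null($storeId)) {
            // No store given: recurse once per store, then stop.
            foreach ($this->storeIds as $id) {
                $this->refreshRewrites($id);
            }
            return $this;
        }

        // The four steps, in the same order as the list above.
        $steps = array('clearInvalid', 'categories', 'products', 'clearCategoryProduct');
        foreach ($steps as $step) {
            $this->log[] = $storeId . ':' . $step;
        }
        return $this;
    }
}

$refresher = new ToyUrlRefresher(array(1, 2));
$refresher->refreshRewrites();
print_r($refresher->log);
```

Calling the method with no argument walks every store, which is why a full reindex touches the rewrites for all of your stores and not just the one you happen to be looking at.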

The specifics of this are left as an exercise for the reader. The important takeaway here is that Magento is constantly updating and refreshing this rewrite table on its own, which means your store may be generating numerous URLs behind the scenes without any explicit action by you. If you’re looking to seriously overhaul how Magento handles URLs, this is the place you’ll need to dig into.

Wrap Up

With that, we’ll need to end our journey into the Magento routing system. There’s plenty more here to dig into (specifics of auto-generated rewrites, how the indexing system works), but there’s always more to explore with Magento. The way the routing system interacts with the rewrite system, which in turn is slowly being consumed by the needs of Magento’s SEO system, which in turn is managed by the indexing system, is a perfect example of how Magento’s various sub-systems interlock into the full system, and how it’s often impossible to know one part of Magento without understanding five others.

Junior and intermediate level developers may know one or two Magento sub-systems well, but if you’re looking for true Magento mastery it’s better to understand how the systems interact, which will allow you to dive in and discover the correct solution to your particular problem in your particular installation.

I hope these five articles have peeled back enough layers for you to feel more comfortable exploring the routing and rewrite systems on your own, and that they encourage you to develop your own best practices so you can avoid spending time down at this level debugging.

If this series has been in any way useful I’d encourage you to browse through Pulse Storm’s Magento related products like the Commerce Bug debugging extension, or No Frills Magento Layout, the only layout book you’ll ever need.

If digital products don’t strike your fancy, please consider sending a few dollars my way to encourage future article writing

For those of you who’ve already purchased a product, or donated a few bucks, you have my gratitude. These articles and my work with Magento wouldn’t exist without you.

Originally published September 26, 2011

Copyright © Alan Storm 1975 – 2017 All Rights Reserved
