Generating Google Sitemaps in Magento

astorm


This entry is part 17 of 43 in the series Miscellaneous Magento Articles.

Magento comes bundled with the ability to generate a Google sitemap. Google sitemaps are XML files that tell Google’s webmaster tools where your site’s content is. There’s debate in the professional webmaster community as to how a Google sitemap will or won’t affect your search engine listings, but chances are someone’s going to ask you to generate one, and this article will get you sorted.

Magento’s Google sitemap implementation also offers a simple example of the Magento Inc. approach to object oriented programming.

To use the Google Sitemap feature, you’ll first need to tell Magento you want a sitemap. This is done via the

Catalog -> Google Sitemap 

menu in the Admin Console.

This screen lists the sitemaps Magento knows about. Click on the Add Sitemap button, enter a filename and a file path, and select a store view.

Magento allows/requires you to set up an individual sitemap for each store in your system. The file path and filename are combined to create a path from the root of your installation. The target directory must be writable by the user your web server (PHP) runs as.
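As a rough illustration of how the path and filename combine, here is a minimal sketch in plain PHP. The function name example_sitemap_target and its return format are made up for this example and are not part of Magento; the base directory is passed in by hand rather than read from Magento.

```php
<?php
// Hypothetical helper, not a Magento API: joins the installation root,
// the sitemap path, and the filename, then checks the directory.
function example_sitemap_target($baseDir, $sitemapPath, $filename)
{
    // Normalize slashes so "/var/www/magento/" + "/media/" joins cleanly
    $dir = rtrim($baseDir, '/') . '/' . trim($sitemapPath, '/');
    $dir = rtrim($dir, '/');

    return array(
        'file'     => $dir . '/' . $filename,
        // The directory has to exist and be writable by the PHP user
        'writable' => is_dir($dir) && is_writable($dir),
    );
}

$target = example_sitemap_target('/var/www/magento/', '/media/', 'sitemap.xml');
echo $target['file'], "\n";
```

If the writable flag comes back false for your real installation path, fix the directory permissions before clicking Save & Generate.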

Click on Save & Generate and Magento will save your sitemap configuration, as well as generate a file at the location you specified above.

Automatic Sitemap Generation

If you’ve set up the Magento maintenance cron job on your system, you can also have Magento generate a sitemap for you on a regular basis. In the Admin Console, browse to

System -> Configuration -> Catalog: Google Sitemap -> Generation Settings

These system configuration settings let you control how often Magento generates a sitemap. The rest of this article is going to dive into some code, but if you submit a lot of sitemaps you should check out Ashley Schroder’s Sitemap Submit extension.

Into the Code

The sitemap generation code is a good introduction to Magento’s object oriented philosophy. You’ve probably seen a lot of one-off sitemap generation scripts that came about something like

  1. OK, we need a sitemap, so we better make a shell script to generate one

  2. First that shell script needs to read a bunch of information from somewhere to figure out what URLs are active in a Magento site.

  3. Next that shell script needs to generate a bunch of information in XML format based on the information I read above in step two

  4. Finally, I need to write that XML out to the file system somewhere

At first, there’s nothing wrong with the above. It will do the job and generate a sitemap. The problems with this non-object-oriented approach come later, when someone else wants to generate a Google sitemap, or new types of pages are added that the original script missed. All there is to work with is this one-off shell script. Maybe they can require_once it into their project, but all the logic of which URLs make up the site (step two) is “trapped” in global level variables and (maybe) functions. It becomes difficult for anyone but the person who wrote that script to use any of that code elsewhere.
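For reference, a one-off script of this kind might look something like the sketch below. It’s plain PHP with no Magento involved: the URL list is hard-coded where a real script would query the store, and the example.com URLs and temp-directory output path are placeholders.

```php
<?php
// Step 2 (faked): a real script would read the active URLs from the store.
$urls = array(
    'http://magento.example.com/',
    'http://magento.example.com/some-category.html',
);

// Step 3: build the sitemap XML by hand, in global scope.
$xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($urls as $url) {
    $xml .= '  <url><loc>' . htmlspecialchars($url) . '</loc></url>' . "\n";
}
$xml .= '</urlset>' . "\n";

// Step 4: write the XML out to the file system.
$path = sys_get_temp_dir() . '/sitemap.xml';
file_put_contents($path, $xml);
```

Everything lives in global variables, so the only way to reuse any of it is to run the whole script.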

Magento’s approach to this (and every other problem) is different. Whenever a Magento programmer approaches a problem, they break the problem entities out into Domain Model objects. With the example of Google sitemaps, that means a single Model object to represent a sitemap

Mage_Sitemap_Model_Sitemap

The reason this object is a model isn’t because it’s reading/writing information to/from a database. It’s a model because it models the problem domain of Google sitemaps.

The developer responsible for creating the sitemap cron job doesn’t need to know anything about Google sitemaps. All they need to know is which method to call on the sitemap. Consider the Magento cron job code

#File: app/code/core/Mage/Sitemap/Model/Observer.php
public function scheduledGenerateSitemaps($schedule)
{
    $errors = array();
    if (!Mage::getStoreConfigFlag(self::XML_PATH_GENERATION_ENABLED)) {
        return;
    }
    $collection = Mage::getModel('sitemap/sitemap')->getCollection();
    foreach ($collection as $sitemap) {
        try {
            $sitemap->generateXml();
        }
        catch (Exception $e) {
            $errors[] = $e->getMessage();
        }
    }

    //...
}    

This code gets a collection of all sitemap objects

$collection = Mage::getModel('sitemap/sitemap')->getCollection();

and then iterates over them calling the generateXml method.

foreach ($collection as $sitemap) {
    try {
        $sitemap->generateXml();
    }
    catch (Exception $e) {
        $errors[] = $e->getMessage();
    }
}    

That’s it. The cron job writer doesn’t need to know anything about a Google sitemap. All the programmer needs to know is what method to call.

On the other side of the contract, the sitemap object implements sitemap generation in the generateXml method.

#File: app/code/core/Mage/Sitemap/Model/Sitemap.php
public function generateXml()
{
    ...
}

This method may change, but as long as it fulfills its contract, the cron job writer never needs to change the cron job code. It can continue to run unmolested.
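The contract idea can be tried out without Magento at all. In the sketch below, Example_Sitemap, Example_BrokenSitemap, and example_generate_all are hypothetical stand-ins written for this article, not Magento classes, and this generateXml simply returns a string, which is an assumption made to keep the example self-contained.

```php
<?php
// Hypothetical stand-ins for sitemap models. These are NOT Magento classes.
class Example_Sitemap
{
    protected $_urls;

    public function __construct(array $urls)
    {
        $this->_urls = $urls;
    }

    // The contract: callers rely only on this method existing.
    // Assumption: it returns the XML as a string instead of writing a file.
    public function generateXml()
    {
        $xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
        foreach ($this->_urls as $url) {
            $xml .= '<url><loc>' . htmlspecialchars($url) . '</loc></url>';
        }
        return $xml . '</urlset>';
    }
}

// A different implementation behind the same method name.
class Example_BrokenSitemap extends Example_Sitemap
{
    public function generateXml()
    {
        throw new Exception('could not write sitemap');
    }
}

// The "cron job" side: it knows the method name and nothing else.
function example_generate_all(array $sitemaps)
{
    $errors = array();
    foreach ($sitemaps as $sitemap) {
        try {
            $sitemap->generateXml();
        } catch (Exception $e) {
            $errors[] = $e->getMessage();
        }
    }
    return $errors;
}

$errors = example_generate_all(array(
    new Example_Sitemap(array('http://magento.example.com/')),
    new Example_BrokenSitemap(array()),
));
```

Swapping the body of generateXml, or adding a new subclass, requires no change to example_generate_all. The caller keeps working as long as the method keeps its name and keeps signalling failure with an exception.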

Long Term vs. Short Term

That’s a key benefit of Magento’s object oriented system. By structuring code into accepted design patterns it becomes easier to split a large project into different components. The person working on the cron job system doesn’t need to worry about what the person working on the sitemap system is doing, and neither of them needs to worry about what the person implementing the promotional engine is doing. Then, when it comes time to integrate everyone’s code, the path forward is

  1. Look at the domain objects that have been implemented
  2. Learn what the contracts are for any particular method

When you’re stuck on a particularly gnarly bit of Magento code, think like you’re a part of their team. Accept that the only unifying thing between Magento’s various sub-systems is the use of domain model objects, which perform actions in controllers and are read from in blocks to generate output. Look at the Model objects used in the system you’re investigating, investigate what their implied contracts are, and go forward from there.

You will never understand what every sub-system in Magento does, but if you learn its core architectural principles, you’ll be able to quickly zero in on whatever bit of functionality you need to understand and/or modify. That’s what separates a good Magento developer from a bad one: being able to use the system as a black box, but knowing how the black box is put together when it won’t do what you want.

Originally published March 14, 2011