Magento’s Many 404 Pages

astorm


This entry is part 19 of 43 in the series Miscellaneous Magento Articles. Earlier posts include Magento Front Controller, Reinstalling Magento Modules, Clearing the Magento Cache, Magento's Class Instantiation Abstraction and Autoload, Magento Development Environment, Logging Magento's Controller Dispatch, Magento Configuration Lint, Slides from Magento Developer's Paradise, Generated Magento Model Code, Magento Knowledge Base, Magento Connect Role Directories, Magento Base Directories, PHP Error Handling and Magento Developer Mode, Magento Compiler Mode, Magento: Standard OOP Still Applies, Magento: Debugging with Varien Object, Generating Google Sitemaps in Magento, and IE9 fix for Magento. Later posts include Magento Quickies, Commerce Bug in Magento CE 1.6, Welcome to Magento: Pre-Innovate, Magento's Global Variable Design Patterns, Magento 2: Factory Pattern and Class Rewrites, Magento Block Lifecycle Methods, Goodnight and Goodluck, Magento Attribute Migration Generator, Fixing Magento Flat Collections with Chaos, Pulse Storm Launcher in Magento Connect, StackExchange and the Year of the Site Builder, Scaling Magento at Copious, Incremental Migration Scripts in Magento, A Better Magento 404 Page, Anatomy of the Magento PHP 5.4 Patch, Validating a Magento Connect Extension, Magento Cross Area Sessions, Review of Grokking Magento, Imagine 2014: Magento 1.9 Infinite Theme Fallback, Magento Ultimate Module Creator Review, Magento Imagine 2014: Parent/Child Themes, Early Magento Session Instantiation is Harmful, Using Squid for Local Hostnames on iPads, and Magento, Varnish, and Turpentine.

The 404 page has a long and illustrious history in the world of web development. What started as a simple, unfriendly error message has turned into a key part of any site’s experience, and any retail outlet’s conversion rate. Like many other PHP frameworks, Magento faces the challenge of providing a unified 404 experience. Also like many other PHP frameworks, Magento has punted that responsibility onto the end user-developer of the system. In this article we’ll explore the various ways that the Magento cart application generates 404 pages, which will allow you to make educated choices when building your 404 experience.

Before we begin though, a quick history lesson and HTTP primer is in order.

Some HTTP Background

If you have curl installed on your system, try running the following command

curl -I http://example.com

Assuming your computer is connected to the internet, you should see output something like this

HTTP/1.0 302 Found
Location: http://www.iana.org/domains/example/
Server: BigIP
Connection: Keep-Alive
Content-Length: 0

Here’s another one

curl -I http://www.iana.org/domains/example/

with results something like

HTTP/1.1 200 OK
Date: Fri, 22 Apr 2011 13:15:13 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
Content-Length: 2945
Connection: close
Content-Type: text/html; charset=UTF-8

The command line curl program allows you to download files over HTTP from the shell. The -I option tells curl to make a HEAD request, so only the HTTP headers are returned to us, not the actual contents of the file. HTTP headers are metadata that a web server sends to the client along with its response. While it’s good to understand what every line means, it’s the first line of each response that we’re interested in

HTTP/1.0 302 Found
HTTP/1.1 200 OK

These are HTTP status lines. HTTP stands for Hypertext Transfer Protocol, and it is the common language of the web. It defines how a computer or software application should act when it requests or receives information. A status line can be broken into two parts. The first is the HTTP version being used (HTTP/1.0, HTTP/1.1), and the second is the status itself: a three-digit status code followed by a short, human-readable reason phrase (302 Found, 200 OK).

This status describes the type of response from the server. For example, a code of 200 means everything went as expected (OK). A code of 302 tells the client/browser that the resource is temporarily available at a different URL. This may seem like over-engineered nerdy fluff, but it’s actually important.
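To make the two parts concrete, here is a small PHP sketch that splits a status line into its version, numeric code, and reason phrase. The parseStatusLine() function is hypothetical, written only for this illustration; neither PHP nor Magento ships it.

```php
<?php
// Hypothetical helper: split an HTTP status line into its parts.
function parseStatusLine(string $line): array
{
    // e.g. "HTTP/1.1 200 OK" -> version "HTTP/1.1", code 200, reason "OK"
    if (!preg_match('#^(HTTP/\d\.\d)\s+(\d{3})\s*(.*)$#', trim($line), $m)) {
        throw new InvalidArgumentException("Not a status line: $line");
    }
    return [
        'version' => $m[1],
        'code'    => (int) $m[2],
        'reason'  => $m[3],
    ];
}

$status = parseStatusLine('HTTP/1.0 302 Found');
echo $status['version'], ' / ', $status['code'], ' / ', $status['reason'], PHP_EOL;
```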

When a browser receives a status code of 200, it knows to expect a document after the headers, and that it should attempt to render the document, or in the case of supporting files (images, CSS, Javascript), apply the contents of those files to the main HTML document in a way that makes sense (display the image, apply the CSS, run the Javascript). However, when a browser receives a status code of 302, it knows to look for a companion Location header, and then automatically make another request for the URL it finds there.

That’s the first reason status codes are important: they tell the browser what to do with a particular response. Status codes also allow other kinds of web clients, particularly web spiders, to infer information about a page/resource. For example, if a URL returns a status of

301 Moved Permanently

the spider knows it may safely ignore the previous URL in the future, and start treating the new URL in the Location field as canonical. Google infers a significant amount of information about your site based on its headers, which is why their webmaster tools are geared towards cleaning these up.
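The client behavior described above can be summarized as a lookup from status code to action. This is a deliberately simplified sketch (the clientAction() helper and its return strings are invented for illustration, and it assumes PHP 8 for match); real browsers and spiders handle many more codes and edge cases.

```php
<?php
// Hypothetical helper: what a simple client might do with a response,
// following only the rules described above (real clients do far more).
function clientAction(int $code, array $headers = []): string
{
    $location = $headers['Location'] ?? '';
    return match ($code) {
        200     => 'render the document',
        301     => "request $location and treat it as the canonical URL",
        302     => "request $location for now, keep using the old URL",
        404     => 'not found: show whatever error document came back',
        default => 'status not covered by this sketch',
    };
}

echo clientAction(302, ['Location' => 'http://www.iana.org/domains/example/']), PHP_EOL;
```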

Status 404

This brings us, finally, to the topic at hand. Give the following request a try

curl -I http://www.iana.org/domains/example/notthere.html

You should get a response something like

HTTP/1.1 404 NOT FOUND
Date: Fri, 22 Apr 2011 14:02:26 GMT
Server: Apache/2.2.3 (CentOS)
Connection: close
Content-Type: text/html; charset=utf-8

A “404 page” gets its name from the HTTP status code for Not Found. Back in the day, the original web servers were designed to share documents, and the 404 status code was intended to tell a browser that the file it was looking for was not available. The HTTP specification says very little about how a browser should present a 404 response. Early web servers included a brief HTML document along with the Not Found status

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /testing was not found on this server.</p>
</body></html>

and most browsers chose to display whatever HTML was returned via a 404 document. This seemingly innocuous choice had an interesting effect on web development and internet culture.

Webmasters of that bygone era quickly realized that the standard 404 page provided an awful user experience for their visitors, and they started customizing the HTML output on a per-site basis so that a more useful page was returned. From a user experience point of view this allowed the end-user-visitor to continue navigating the site despite the fact that the page they were looking for wasn’t there. From an engineering point of view this created a weird situation where you needed to return a document even though the document wasn’t found. The 404 code went from being a simple status to becoming an integral part of any website’s design.

Interestingly, when some modern browsers encounter a short, default 404 page (such as the one above), they display their own built-in error message instead of the page.

If the original web browsers/web-culture had chosen to implement things this way, the entire idea of a 404 page may have never existed.

404 in the MVC Era

Modern PHP web development presents a problem for 404s. Out of the box, most web servers (Apache, etc.) handle 404 pages themselves. Early PHP web applications relied on the web server’s 404 mechanism to handle file-not-found responses. If the URL was for a file that existed, PHP would process the request. If the user requested a PHP page that didn’t exist, Apache would send back its configured 404 document, and the request would never reach PHP.

However, as you’re likely aware, most modern PHP MVC systems route all requests through a single PHP file.

http://example.com/index.php/some/uri/path
http://example.com/some/uri/path

The code in index.php is then responsible for bootstrapping the system and handing off control to a PHP controller class. The problem this creates is that, with PHP handling every request, the web server (Apache) can no longer detect 404s. As far as the web server is concerned, if the request mapped to a PHP file, that’s a 200 OK. This means if a user enters an invalid route, it’s the responsibility of the PHP framework to

  1. Send back HTML for a 404 page
  2. Send back the proper HTTP 404 header

Framework authors need to be careful to provide a centralized 404 mechanism, or else they may end up with multiple sources for 404 page content. Also, and very commonly missed, is sending the proper 404 header. If your PHP page returns a status 200 header, Google ends up indexing every file-not-found page as an actual page, meaning you may have an infinite number of identical pages in your Google results, which will negatively impact your search rankings.

Magento gets the status code right. However, it falls prey to the problem most PHP frameworks do, in that there are multiple ways a 404 page is created and rendered. Let’s take a look at those now.

Magento 404 Pages

If we take a look at the rewrite rules (in .htaccess) that capture and redirect requests into Magento’s bootstrap file

############################################
## always send 404 on missing files in these folders

    RewriteCond %{REQUEST_URI} !^/(media|skin|js)/

############################################
## never rewrite for existing files, directories and links

    RewriteCond %{REQUEST_FILENAME} !-f 
    RewriteCond %{REQUEST_FILENAME} !-d 
    RewriteCond %{REQUEST_FILENAME} !-l 

############################################
## rewrite everything else to index.php

    RewriteRule .* index.php [L]

we can see that the line that does the capturing is

RewriteRule .* index.php [L]

However, it’s preceded by four RewriteCond statements. These conditions allow certain requests to skip the bootstrapping process. For example, these three

RewriteCond %{REQUEST_FILENAME} !-f 
RewriteCond %{REQUEST_FILENAME} !-d 
RewriteCond %{REQUEST_FILENAME} !-l 

tell Apache to apply the rule only if no file (-f), directory (-d), or symbolic link (-l) exists for the request. This allows Apache to serve existing static files without incurring the performance cost of Magento’s bootstrapping. The primary reason these conditions are here is to allow the serving of CSS, JavaScript, and images from any folder in the system without additional special cases. You can also take advantage of them to implement a simple static cache. If you have a URL like this

http://magento.example.com/some/controller/route

and created a static HTML file at the following location

/path/to/wwwroot/some/controller/route/index.html

Apache would serve out the index.html file instead of handing control over to Magento.

Of particular interest to us is the first rule

RewriteCond %{REQUEST_URI} !^/(media|skin|js)/

This one says that if the request URL starts with /media/, /skin/, or /js/, Apache should handle the request itself instead of rewriting it to index.php. This means requests for files that don’t exist with URLs that look like the following

http://magento.example.com/media/file.jpg
http://magento.example.com/skin/base/badstyle.css
http://magento.example.com/js/another-file-that-is-not-there.js

will use the web server’s configured 404 page. This means if you want to ensure all 404 pages provide the same experience, you still need to configure a custom 404 page in your web server (for example, with Apache’s ErrorDocument 404 directive).
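To see how the four conditions interact, here is a rough PHP model of the decision Apache makes for each request. The whoHandles() function and its return strings are hypothetical; it approximates -f, -d and -l with is_file(), is_dir() and is_link(), and the real decision is of course made by mod_rewrite, not PHP.

```php
<?php
// Rough, hypothetical model of the four RewriteCond lines and the RewriteRule.
function whoHandles(string $uri, string $docRoot): string
{
    $path = parse_url($uri, PHP_URL_PATH) ?: '/';
    $file = $docRoot . $path;

    // RewriteCond %{REQUEST_URI} !^/(media|skin|js)/
    if (preg_match('#^/(media|skin|js)/#', $path)) {
        // Never rewritten: Apache serves the file or its own 404 page
        return file_exists($file) ? 'apache: static file' : 'apache: 404 page';
    }

    // RewriteCond %{REQUEST_FILENAME} !-f, !-d, !-l
    if (is_file($file) || is_dir($file) || is_link($file)) {
        return 'apache: existing file/directory/link';
    }

    // RewriteRule .* index.php [L]
    return 'magento: index.php';
}
```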

That’s the first 404 page you need to be aware of in a Magento system.

Magento’s Outer Shell

Magento’s index.php bootstrap is relatively simple. A few environment variables are set and checked, and then the following static method is called

Mage::run($mageRunCode, $mageRunType);

The run method is on the Mage class located in app/Mage.php. On the surface this run method is relatively simple.

public static function run($code = '', $type = 'store', $options=array())
{
    try {
        Varien_Profiler::start('mage');
        self::setRoot();
        self::$_app = new Mage_Core_Model_App();
        self::$_events = new Varien_Event_Collection();
        self::$_config = new Mage_Core_Model_Config();
        self::$_app->run(array(
            'scope_code' => $code,
            'scope_type' => $type,
            'options'    => $options,
        ));            
        Varien_Profiler::stop('mage');
    //...    
}

Outside of the profiler lines, all that’s involved in starting up a Magento system is five lines of code

  1. First, the root file path for the application is stored for later retrieval and path creation (self::setRoot();)

  2. Then, an “application” domain model object is instantiated (self::$_app = new Mage_Core_Model_App();)

  3. Then, an event collection is instantiated (self::$_events = new Varien_Event_Collection();)

  4. Then, a configuration object is instantiated (self::$_config = new Mage_Core_Model_Config();)

  5. Finally, the run method of the application domain model object is called (self::$_app->run(...))

Each of the objects instantiated here gets assigned as a static property of the Mage class, and will be referenced later during the processing of the request. You’ll notice this entire bit of code is enclosed in a try block. Let’s take a look at the exception catching to see what happens if an exception bubbles up to this top layer

...
    Varien_Profiler::stop('mage');

} catch (Mage_Core_Model_Session_Exception $e) {
    header('Location: ' . self::getBaseUrl());
    die();
} catch (Mage_Core_Model_Store_Exception $e) {
    require_once(self::getBaseDir() . DS . 'errors' . DS . '404.php');
    die();
} catch (Exception $e) {
    if (self::isInstalled() || self::$_isDownloader) {
        self::printException($e);
        exit();
    }
    try {
        self::dispatchEvent('mage_run_exception', array('exception' => $e));
        if (!headers_sent()) {
            header('Location:' . self::getUrl('install'));
        } else {
            self::printException($e);
        }
    } catch (Exception $ne) {
        self::printException($ne, $e->getMessage());
    }
}

Here we can see there are three catch blocks. First, Magento looks for its custom exceptions (Mage_Core_Model_Session_Exception, Mage_Core_Model_Store_Exception), and then the last block is a catch-all for any other exception type. The session and generic exception blocks are worth exploring, but that’s for another article. It’s the store exception we’re interested in.

} catch (Mage_Core_Model_Store_Exception $e) {
    require_once(self::getBaseDir() . DS . 'errors' . DS . '404.php');
    die();
}

If a Mage_Core_Model_Store_Exception is thrown anywhere in the system and is uncaught, Magento will catch it up here. When a store exception is caught, Magento will require in the following file.

errors/404.php
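Stripped of the Magento specifics, this is a simple pattern: a particular exception type, thrown anywhere below, is turned into a 404 page at the outermost layer. The sketch below uses a hypothetical StoreNotFoundException and runApp() in place of Mage_Core_Model_Store_Exception and Mage::run(), and returns strings instead of requiring errors/404.php and calling die().

```php
<?php
// Hypothetical stand-in for Mage_Core_Model_Store_Exception
class StoreNotFoundException extends Exception {}

// Hypothetical stand-in for the try/catch wrapper in Mage::run()
function runApp(callable $app): string
{
    try {
        return $app();
    } catch (StoreNotFoundException $e) {
        // Magento does: require_once errors/404.php; die();
        return 'rendered errors/404.php';
    } catch (Exception $e) {
        // Catch-all for every other exception type
        return 'generic error: ' . $e->getMessage();
    }
}

echo runApp(function () {
    throw new StoreNotFoundException('unknown store code');
}), PHP_EOL;
```

Note that the specific catch block must come before the generic one; PHP checks catch blocks in order.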

The errors/404.php file is Magento’s second 404 handler. It handles page-not-found states for requests that don’t quite make it to the controller dispatch stage. Let’s take a look at what’s going on in 404.php

Error Processor

If you take a look at 404.php you’ll see the following code.

require_once 'processor.php';    
$processor = new Error_Processor();
$processor->process404();

This code bootstraps a mini error processing system inside Magento. (If you’ve spent any time with Magento, you’ll find that it’s the Mandelbrot set of software systems.) The end result of process404 being called is the rendering of the following phtml template

errors/default/page.phtml

In turn, this phtml template will include the following inner-template

errors/default/404.phtml

If you had called $processor->process503(); then 503.phtml would have been rendered instead, with page.phtml remaining the outer template. If you’re interested in tracing how this happens, check out the definition of the Error_Processor class in

errors/processor.php
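The outer/inner arrangement can be sketched without Magento at all. In this hypothetical version, templates are plain strings and the inner template is spliced in at an invented {{content}} placeholder; the real page.phtml pulls in its inner template with PHP code, so treat this as an illustration of the structure only.

```php
<?php
// Hypothetical sketch: one shared outer template, one inner template per error code.
function renderErrorPage(int $code, array $templates): string
{
    $inner = $templates[$code . '.phtml'] ?? '';
    // The outer template never changes; only the inner part depends on the code
    return str_replace('{{content}}', $inner, $templates['page.phtml']);
}

$templates = [
    'page.phtml' => '<html><body>{{content}}</body></html>',
    '404.phtml'  => '<h1>404 error: Page not found.</h1>',
    '503.phtml'  => '<h1>Service Temporarily Unavailable</h1>',
];

echo renderErrorPage(404, $templates), PHP_EOL;
```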

Customizing the Store Exception 404 Page

Chances are you’re going to want to customize this 404 page. You could just edit page.phtml and 404.phtml with your desired style and content. However, like any Magento core hack, you run the risk of your changes being overwritten during an upgrade, and the general scorn of the Magento development community.

Fortunately, Magento provides a mechanism for creating a custom skin folder for your error pages. Take a look at the following file

errors/local.xml.sample

This is a sample error configuration override file. If you rename it to

errors/local.xml

the Error_Processor class will load this file and use its values rather than the defaults hard coded in the class (for legacy reasons, Magento will also look for a design.xml file). Take a look at the skin node in this file

<config>
    <skin>default</skin>
    <!-- ... -->
</config>

This is the value that controls which folder the Error_Processor object looks in for its phtml files. Let’s change that to something like

<config>
    <skin>our_custom_skin</skin>
    <!-- ... -->
</config>

Error skin names must be composed of letters, numbers, and the underscore character. A folder named with any other characters will be ignored.
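Taken together, these rules amount to a small selection routine. The sketch below is an approximation under stated assumptions, not the actual Error_Processor code: chooseErrorSkin() is a hypothetical name, it reads only errors/local.xml (ignoring the legacy design.xml), and it falls back to default when the skin name is invalid or its folder does not exist.

```php
<?php
// Hypothetical approximation of how an error skin could be chosen.
function chooseErrorSkin(string $errorsDir): string
{
    $skin = 'default';
    $localXml = $errorsDir . '/local.xml';
    if (is_file($localXml)) {
        $config = simplexml_load_file($localXml);
        if ($config !== false && isset($config->skin)) {
            $candidate = trim((string) $config->skin);
            // Letters, digits, underscore only, and the folder must exist
            if (preg_match('/^[a-z0-9_]+$/i', $candidate) && is_dir($errorsDir . '/' . $candidate)) {
                $skin = $candidate;
            }
        }
    }
    return $skin;
}
```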

To test our custom 404 we’ll need to trigger a Magento store exception. The simplest way to do that is to temporarily add one to the run method in Mage.php

#File: app/Mage.php
public static function run($code = '', $type = 'store', $options=array())
{
    try {
        Varien_Profiler::start('mage');
        self::setRoot();
        self::$_app = new Mage_Core_Model_App();
        self::$_events = new Varien_Event_Collection();
        self::$_config = new Mage_Core_Model_Config();

        #our new exception             
        throw new Mage_Core_Model_Store_Exception('');

        self::$_app->run(array(
            'scope_code' => $code,
            'scope_type' => $type,
            'options'    => $options,
        ));
        //...

If you reload your development environment with the above in place, you’ll see the stock 404 page from Magento’s default error skin.

Creating the Custom Skin

When we jiggered our system to throw that exception, Magento ignored the custom value in the <skin/> node because it didn’t find an errors/our_custom_skin folder. Let’s fix that now. Copy the existing errors/default folder to create a new errors/our_custom_skin folder

cp -r errors/default errors/our_custom_skin

and then edit the text in our_custom_skin/404.phtml, replacing its contents with the following.

#File: errors/our_custom_skin/404.phtml
<div id="main" class="col-main">
<!-- [start] content -->
    <div class="page-title">
        <h1>404 error: Page not found.</h1>
        <p>
            <em>we're sorry that you / had to see this four o four / it is what it is</em>
        </p>

    </div>
<!-- [end] content -->
</div>

Reload the page and you should now see your own custom 404 page.

Important: The entire default folder needs to be copied over to make this work. There isn’t a robust “look in my custom folder, then look in default” fallback system in place, as there is in other parts of the Magento system.

Before we continue, you’ll want to restore app/Mage.php by removing (or commenting out) the custom exception we dropped into place.

#our new exception             
##throw new Mage_Core_Model_Store_Exception('');

self::$_app->run(array(
    'scope_code' => $code,
    'scope_type' => $type,
    'options'    => $options,
));

No Route 404

So far we’ve covered two of Magento’s 404 errors. The first was the Apache 404 issued when requesting nonexistent files in the media/skin/js folders. The second was the store exception 404. The third, both the most common and the most complex, is the no route 404. You can see this 404 page by browsing to the following URL

http://magento.example.com/not/a/file

In a default install you should see Magento’s “Whoops, our bad…” page.

Magento is an MVC system. If you’re not sure what that means, now might be a good time to review the Magento for PHP MVC Developers series. We’ll be here when you get back.

Similar to other web MVC systems, when Magento encounters a URL like /not/a/file, it searches the configuration for a frontName with the name of not. If it finds one, next it will look for a controller in the associated module(s) named something like

class Packagename_Modulename_AController

if it finds the controller, it will look for an action method in that controller named

public function fileAction()

If any of the above steps fail, Magento will search the database for a CMS page with the identifier

not/a/file

If it finds a CMS page, that page will be rendered. If none of the above result in a match, Magento needs to create a 404 page to let the user know their resource wasn’t found. In a default installation, Magento does this by manually setting the controller on the request object to the CMS Index controller, and the action to noRouteAction.
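The lookup order described above can be modeled in a few lines. This is a simplified, hypothetical sketch (resolve() and its array-based configuration are invented); in Magento the work is split across several router objects, and defaulting a missing controller or action to index is an assumption of this sketch.

```php
<?php
// Hypothetical model: controller action first, then CMS page, then no route.
function resolve(string $path, array $controllers, array $cmsPages): string
{
    $parts = explode('/', trim($path, '/'));
    [$front, $controller, $action] = array_pad($parts, 3, 'index');

    // 1. frontName -> controller -> <action>Action method
    if (isset($controllers[$front][$controller])
        && in_array($action . 'Action', $controllers[$front][$controller], true)) {
        return "controller: $front/$controller/{$action}Action";
    }
    // 2. CMS page whose identifier is the whole path
    $identifier = trim($path, '/');
    if (isset($cmsPages[$identifier])) {
        return "cms page: $identifier";
    }
    // 3. Nothing matched: hand off to the no route action
    return 'no route: cms/index/noRouteAction';
}

$controllers = ['catalog' => ['product' => ['viewAction']]];
$cmsPages = ['about-magento-demo-store' => true];
echo resolve('/not/a/file', $controllers, $cmsPages), PHP_EOL;
```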

When cms/index/noRoute is dispatched, the following code runs

#File: app/code/core/Mage/Cms/controllers/IndexController.php
public function noRouteAction($coreRoute = null)
{
    $this->getResponse()->setHeader('HTTP/1.1','404 Not Found');
    $this->getResponse()->setHeader('Status','404 File not found');

    $pageId = Mage::getStoreConfig(Mage_Cms_Helper_Page::XML_PATH_NO_ROUTE_PAGE);
    if (!Mage::helper('cms/page')->renderPage($this, $pageId)) {
        $this->_forward('defaultNoRoute');
    }
}

In a default installation, this code looks for a CMS page with the identifier no-route, and if it finds one, that CMS page is rendered. Magento ships with a default CMS page identified as no-route, which is the “Whoops, our bad…” page you’ve probably seen too much of.
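The decision inside noRouteAction() can be sketched with plain arrays standing in for the store configuration and the CMS page lookup. The noRoute() function and the no_route_page key are hypothetical; the point the sketch makes is that both branches send the same 404 headers, so the status stays correct whichever page ends up rendered.

```php
<?php
// Hypothetical sketch of noRouteAction(): config and CMS lookups are plain arrays.
function noRoute(array $config, array $cmsPages): array
{
    // Same headers as the real action, sent on both branches
    $headers = ['HTTP/1.1' => '404 Not Found', 'Status' => '404 File not found'];
    $pageId = $config['no_route_page'] ?? 'no-route';
    if (isset($cmsPages[$pageId])) {
        return [$headers, 'cms page: ' . $cmsPages[$pageId]];
    }
    // Mirrors $this->_forward('defaultNoRoute')
    return [$headers, 'forward: defaultNoRoute'];
}

[$headers, $body] = noRoute(['no_route_page' => 'no-route'], []);
echo $headers['HTTP/1.1'], ' / ', $body, PHP_EOL;
```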

If the no-route CMS page has been deleted or renamed, Magento will forward the request on to the defaultNoRoute controller action,

$this->_forward('defaultNoRoute');

which looks like this

#File: app/code/core/Mage/Cms/controllers/IndexController.php
public function defaultNoRouteAction()
{
    $this->getResponse()->setHeader('HTTP/1.1','404 Not Found');
    $this->getResponse()->setHeader('Status','404 File not found');

    $this->loadLayout();
    $this->renderLayout();
}

which renders a basic, layout-driven 404 page.

Here, Magento is simply setting the correct headers for a 404, and then loading and rendering the layout. This results in a layout handle of cms_index_defaultnoroute being issued, which (again, in a default installation), results in the following Layout Update XML being applied

<cms_index_defaultnoroute>
    <remove name="right"/>
    <remove name="left"/>

    <reference name="root">
        <action method="setTemplate"><template>page/1column.phtml</template></action>
    </reference>
    <reference name="content">
        <block type="core/template" name="default_no_route" template="cms/default/no-route.phtml"/>
    </reference>
</cms_index_defaultnoroute>

In layman’s terms, this removes the left and right content blocks, sets the root template to page/1column.phtml and then adds a content block that renders the following theme template

cms/default/no-route.phtml        

If you’re getting tripped up on layout concepts, reviewing this article or (better yet!) purchasing No Frills Magento Layout should set you straight.

Customizing No Route 404

You’ll notice the above paragraphs were peppered with phrases like “in a default install”. Out in the wild, there’s a huge number of ways the no route page might be customized. If you’re working for a variety of clients, or on a team with a number of headstrong developers, you’ll probably run into some combination of the following. Neither the community nor Magento Inc. offers much guidance on “the right” way to do this, so your best bet is to be aware of each possible customization point and learn to debug them quickly. Let’s take a look.

Default Pages

If you open up the Admin Console’s system configuration at

System -> Configuration -> Web -> Default Pages

you’ll see there are several ways you might configure the behavior of the no route 404. Above we mentioned that Magento will attempt to load a page with the CMS identifier of no-route. The page that Magento attempts to load is actually controlled by the CMS No Route Page setting. If you want to hand over management of the 404 page to some folks from marketing, this is your best bet.

Using a Different Controller Action

The CMS no route page works because there’s code in Magento that will override the controller and action used for a request if no real route to a controller is detected. By default, that’s the CMS controller and the noRouteAction method. However, using the Default No-route URL System Configuration, a system owner can change which controller action is dispatched in a no route state. By default, this value is

cms/index/noRoute

The format of this string is

frontname/controller/action-name

You might do this if you were creating a custom module to run a significant amount of logic before (or after) displaying the 404 page.

If it wasn’t obvious from the above, if you’re using a custom controller action for your 404 page, you lose the ability to set a custom CMS page with CMS No Route Page.
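As a small illustration of the frontname/controller/action-name format, here is a hypothetical helper that splits a Default No-route URL value into its parts and derives the action method name by appending Action. Defaulting missing parts to index is an assumption of this sketch.

```php
<?php
// Hypothetical helper: split "frontname/controller/action-name" into parts.
function parseNoRouteUrl(string $value): array
{
    // Assumption: missing parts default to "index"
    [$frontName, $controller, $action] = array_pad(explode('/', trim($value, '/')), 3, 'index');
    return [
        'frontName'  => $frontName,
        'controller' => $controller,
        'method'     => $action . 'Action', // e.g. noRoute -> noRouteAction
    ];
}

echo parseNoRouteUrl('cms/index/noRoute')['method'], PHP_EOL;
```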

Controller 404 via Layout XML

The no route 404 page is rendered using the Magento layout XML system. That means its appearance may be customized by adding custom Layout Update XML to the handles cms_index_noroute and cms_index_defaultnoroute. This could happen via local.xml, a custom layout XML file, or by editing/replacing one of the existing layout XML files in the design package.
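Both handle names appear to follow the same convention: the route name, controller name, and action name joined with underscores and lowercased. The hypothetical helper below reproduces that pattern, which can be handy when working out which handle a custom no route action would use.

```php
<?php
// Hypothetical helper: build a layout handle name from route, controller, action.
function layoutHandle(string $route, string $controller, string $action): string
{
    return strtolower(implode('_', [$route, $controller, $action]));
}

echo layoutHandle('cms', 'index', 'defaultNoRoute'), PHP_EOL; // cms_index_defaultnoroute
```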

Finally, even if no custom Layout Update XML has been added, it’s possible that a new no-route.phtml template has been added to the current theme, or that someone has modified the no-route.phtml template in the base folder.

Wrap Up

A good 404 page is an important part of any website’s user experience, and a Magento store is no exception. We’ve shown you the various places where Magento detects a 404 state and renders a 404 page, as well as the various ways that experience may have been customized. With these tools in hand, you’ll be ready to conquer any 404-related challenges that the fates (or your boss!) throw at you.

Originally published April 25, 2011

Copyright © Alan Storm 1975 – 2017 All Rights Reserved
