Recently I was moving a few of the websites and web applications I run on my laptop from
.dev domain names over to
.localhost domain names. The
.dev top-level domain became a real thing a few years back. Google bought it, started using it, and (most importantly) browsers started automatically redirecting
.dev domain names from http to https. Between doing a cert dance whenever I wanted to spin up a locally hosted web application and switching over to
.localhost, the latter seemed like the better option.
Along the way I ended up running into a sea of paper cuts moving my WordPress sites over, and figured I’d leave this here to light the way when I (or you) need to do this all again. As a side note — I’m philosophically ground zero for “everyone should run their own websites”, but looking at how — annoying? — this one issue was for me I don’t think the world will embrace self-web-publishing anytime soon.
The Paper Cuts
WordPress, like most CMSes, needs to know the domain name (also known as the host name) of your website. While this information is available via PHP’s
HTTP_HOST server variable, that value comes from the web server, the web server gets it from the Host header of the HTTP request, and that’s a problem. Whenever an end user can send an arbitrary string into your system via the HTTP request, that’s a vector for an exploit of your system. Rather than take on the hard challenge of validating and escaping the value in
HTTP_HOST before outputting it (in multiple contexts), most CMSes just ask you to configure it.
This works great — until you decide it’s time to change the URL of your website. Then you’re in for a headache. Ask any Magento developer about cookie domains.
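To make the HTTP_HOST trade-off concrete, here is a minimal sketch of the "configure it" approach, written in Python rather than PHP so it can be run standalone. The allowlist contents and the function name `trusted_host` are assumptions made up for illustration, not anything WordPress ships. The idea is only that comparing the request's host against configured values is far simpler than trying to escape an arbitrary string for every output context.

```python
from typing import Optional

# Hypothetical allowlist; a real CMS would load this from its configuration.
ALLOWED_HOSTS = {"example.localhost", "blog.example.localhost"}


def trusted_host(raw_host: Optional[str]) -> Optional[str]:
    """Return the normalized host if it is on the allowlist, else None."""
    if not raw_host:
        return None
    # Drop an optional ":port" suffix and normalize case.
    # IPv6 literals are deliberately not handled in this sketch.
    host = raw_host.strip().lower().split(":", 1)[0]
    return host if host in ALLOWED_HOSTS else None


# A well-formed header maps to the configured name; anything else is rejected
# instead of being echoed back into HTML, redirects, or emails.
print(trusted_host("Example.localhost:8080"))
print(trusted_host('evil.example"><script>'))
```

The cost of this simplicity is the one described next: the configured value has to be changed by hand whenever the site moves.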
For WordPress it’s easy enough to pop over to
Settings -> General and change both your WordPress Address (URL) and Site Address (URL), but that’s only the start. WordPress’s URL-generating functions appear to generate full URLs, host name included (vs.
/relative/path URLs), and this has led to a culture where paths are generated and stored with the domain name.
In practice this means that any of the core WordPress database tables might have your old URL in them, and depending on why you’re switching URLs you may, or may not, want to update these values. Then there are extensions. WordPress offers some simple APIs for storing configuration and option values in the
wp_options table. When extensions want a little more structure to the data they’re saving, they’ll often store data in this table using either JSON or PHP’s
serialize format. The
serialize option makes a simple SQL-based search and replace for your host name a non-viable option. Consider an array of PHP serialized strings, where each string carries a prefix like
s:5. Those numbers are the byte lengths of the strings in the array. If you were to edit one of the strings such that its length didn’t match the number, you’d get a PHP
Notice and the call to
unserialize would return false:
PHP Notice: unserialize(): Error at offset 46 of 50 bytes in ...
In other words, if you try to search and replace your URLs without updating those lengths, things will break.
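As a rough illustration of what "updating those lengths" involves, here is a small Python sketch that walks a PHP-serialized byte string, replaces text inside each `s:N:"...";` entry, and rewrites `N` to match. The function name `replace_in_serialized` and the sample data are made up for this example. It assumes well-formed input and does not recurse into serialized data that is itself stored inside a string, so treat it as a demonstration of the problem rather than a migration tool.

```python
import re

# Matches the header of a serialized PHP string, e.g. s:24:"
_STRING_HEADER = re.compile(rb's:(\d+):"')


def replace_in_serialized(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new` inside serialized strings, fixing lengths."""
    out = bytearray()
    pos = 0
    while True:
        match = _STRING_HEADER.search(data, pos)
        if match is None:
            out += data[pos:]  # no more strings; copy the tail unchanged
            return bytes(out)
        out += data[pos:match.start()]  # copy everything before the header
        length = int(match.group(1))    # declared length, in bytes
        start = match.end()
        end = start + length
        if data[end:end + 2] != b'";':
            raise ValueError(f"malformed string at offset {match.start()}")
        value = data[start:end].replace(old, new)
        out += b's:%d:"%s";' % (len(value), value)  # recomputed length
        pos = end + 2  # skip past the closing quote and semicolon


original = b'a:2:{i:0;s:24:"http://example.dev/about";i:1;s:5:"hello";}'
naive = original.replace(b"example.dev", b"example.localhost")
fixed = replace_in_serialized(original, b"example.dev", b"example.localhost")
print(naive)  # still claims s:24 for a string that is now longer
print(fixed)
```

Working on bytes rather than text matters here because the declared lengths count bytes, not characters. A tool that unserializes the data, edits the values, and serializes them again sidesteps the length bookkeeping entirely.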
There’s nothing official from WordPress core to deal with this sort of URL change. Willem Wigman was kind enough to point out the interconnectit/Search-Replace-DB repo, which is a PHP program that will read through database tables and perform a search and replace, including unserializing serialized data and recursing into arrays and objects. Its README also contains vague gems like this,
Three character UTF8 seems to break in certain cases.
but, like all free and open source software, you get what you get and you don’t get upset.
The last bit of existential dread I ran into was post guids. The term guid usually refers to a specific sort of universally unique identifier: a string or value that a programmer can generate with a near-zero probability of it being duplicated. WordPress guids are something else. They’re still unique identifiers for a post, but part of the algorithm for that uniqueness includes using the host name. So I was faced with the existential question of whether I should change these guids to reflect the new host name or leave the old host name and honor the intent of the guid.
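The difference between the two kinds of identifier can be sketched in a few lines of Python. The URL shape used below (site URL plus `?p=` and the post ID) is an assumption about what a typical guid looks like, and `wordpress_style_guid` is a hypothetical helper, not a WordPress function.

```python
import uuid


def wordpress_style_guid(site_url: str, post_id: int) -> str:
    """Build a guid in the assumed '<site url>/?p=<id>' shape."""
    return f"{site_url.rstrip('/')}/?p={post_id}"


# A conventional GUID: random, and unrelated to where the site lives.
random_guid = uuid.uuid4()

# A host-based guid: unique per post, but tied to the host name in use
# when it was generated.
old_guid = wordpress_style_guid("http://example.dev", 42)
new_guid = wordpress_style_guid("http://example.localhost", 42)

print(random_guid)
print(old_guid)
print(new_guid)
```

Under that assumption, the same post gets a different identifier on each host name, which is exactly the dilemma: rewriting the guid keeps it looking right but changes an identifier that was meant to stay fixed.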
Change the host name, I said. How hard could it be?