Using Apache Rewrite Rules to make cleaner, prettier URLs

This is my explanation of how to turn ugly, auto-generated URLs into pretty ones using the magic of Apache's mod_rewrite module.

[Comic: Apache rewrite rules and regular expressions]
Just kidding! Regular expressions are not magic

My goal

  • My goal was to transform an ugly URL which originally looked like this:
    https://ankiewicz.com/photos/travel/hullbeach/slides/tree-in-sand.html
  • ...into this prettier, cleaner URL:
    https://ankiewicz.com/photos/travel/hullbeach/tree-in-sand/
  • I wanted to do two things:
    1. Get rid of the term slides
    2. Get rid of the extension .html

You may have reasons for wanting to do this

  • Your gallery generator (I use jAlbum) only outputs ugly URLs
  • You have 10,000 ugly URLs and cannot be bothered to clean them up by hand
  • Some third-party plugin creates ugly URLs
  • You're OCD and the URLs drive you crazy
  • It's okay. We get it. Let's make them pretty

Some things to know

  • The mod_rewrite module might already be enabled
  • If not, you may need to ask your system administrator to enable it for your website
  • The URL is the domain followed by the path
  • My domain is ankiewicz.com
  • This is a sample ugly path as the user would see it in their browser when visiting my domain:
    /photos/travel/hullbeach/slides/tree-in-sand.html
  • This is a sample pretty path as the user would see it in their browser when visiting my domain:
    /photos/travel/hullbeach/tree-in-sand/
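
If the module turns out not to be enabled, the change is made in the main server configuration rather than in .htaccess. The fragment below is only a sketch of what an administrator might add. The module file path and the /var/www/html directory are assumptions that vary from one installation to the next.

```apache
# Main server configuration (for example httpd.conf), not .htaccess.
# ASSUMPTION: the module path differs between distributions.
LoadModule rewrite_module modules/mod_rewrite.so

# ASSUMPTION: /var/www/html stands in for your site's document root.
<Directory "/var/www/html">
    # Allow .htaccess files in this directory to use mod_rewrite directives
    AllowOverride FileInfo
</Directory>
```

On shared hosting you usually cannot edit this file yourself, which is why asking the administrator is the normal route.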

First things first

  • Create a plain-text file named .htaccess (don't forget the dot)
  • Add as many comments as you want using the # symbol
  • Upload it to the top level of your website (root, as they say)
  • Include the following code:
  <IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteBase /
  #### Prettify my URLs
  # Serve /photos/A/B/C/ from the real file /photos/A/B/slides/C.html
  RewriteRule ^photos/(.*)/(.*)/(.*)/$ /photos/$1/$2/slides/$3.html [L]
  </IfModule>

The nitty gritty

  • Domain consistency is not part of this discussion but I included it so you can see how it works
  • The line that starts with RewriteRule is where the magic happens
  • There are three parts to a RewriteRule:
    1. the pattern on the left, which matches the pretty path
    2. the substitution on the right, which builds the ugly path
    3. the flag in brackets at the end
  • The flag [L] tells mod_rewrite to stop processing further rules once this one matches
  • The path on the left is what I want my path to look like
  • The path on the right is the original path, where the file actually lives and which my users are used to seeing
  • The symbols are part of a system of regular expressions for matching one string to another

What the symbols mean

^ matches the beginning of the string
. matches any character
.* matches any character zero or more times
( ) groups characters into a single unit and captures a match for use in a back-reference
/ are the normal slashes in your path
$ matches the end of your string
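
One consequence of .* matching any character is that it can also swallow slashes, so on a deeper path a single (.*) may span several folders. A stricter variant, offered here as a sketch rather than as the rule shown above, uses the character class [^/]+ (one or more characters that are not a slash) so that each group captures exactly one path segment:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# [^/]+ means one or more characters that are not a slash,
# so $1, $2 and $3 each hold exactly one path segment
RewriteRule ^photos/([^/]+)/([^/]+)/([^/]+)/$ /photos/$1/$2/slides/$3.html [L]
</IfModule>
```

For the sample path both versions behave the same. The difference only shows up when a path has more folders than the pattern expects.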

How it works

  • There are 3 instances of (.*) in my pretty path
  • Each instance of (.*) can be back-referenced in my ugly path as $1, $2, and $3
  • The path on the right is a symbolic representation of a real, ugly file system path:
    • $1 represents the 1st instance of (.*)
    • $2 represents the 2nd instance of (.*)
    • $3 represents the 3rd instance of (.*)
  • You can now format your <a href> links to take the pretty form
  • For instance, here is my new link format:
    <a href="/photos/travel/hullbeach/tree-in-sand/">here is a link</a>
  • You can link to your pretty path and it will find and serve up the file at the ugly path without the user being the wiser
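
To make the matching concrete, here is the rule again with a hand trace of one request written out as comments. The request is the sample pretty path from earlier. Note that inside a .htaccess file at the site root the leading slash is removed before the pattern is tested, which is why the pattern begins with photos/ rather than /photos/.

```apache
# Browser requests:          /photos/travel/hullbeach/tree-in-sand/
# Pattern is tested against: photos/travel/hullbeach/tree-in-sand/
#   $1 = travel
#   $2 = hullbeach
#   $3 = tree-in-sand
# File actually served:      /photos/travel/hullbeach/slides/tree-in-sand.html
RewriteRule ^photos/(.*)/(.*)/(.*)/$ /photos/$1/$2/slides/$3.html [L]
```

The address bar keeps showing the pretty path because this is an internal rewrite, not a redirect.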

Caveats

  • For this to work seamlessly you must change all the links in your HTML to be pretty
  • Otherwise your users will still be able to see the ugly paths
  • The ugly paths didn't die; I am simply not linking to them in my web pages anymore
  • If I typed the ugly path in by hand, it would still be accessible to me
  • If Google had indexed my ugly paths, they would still be accessible via the search engine. I make sure this doesn't happen by establishing a canonical URL for each page on my site.
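
If you would rather have the ugly paths forward visitors to the pretty ones, one common approach is an external redirect placed before the prettifying rule. This is a sketch under assumptions rather than part of the rule above: it assumes the same /photos/.../slides/....html layout, and it tests %{THE_REQUEST} (the request line the browser originally sent) so that the internal rewrite cannot trigger the redirect a second time.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Redirect only when the browser itself asked for the ugly path.
# %1, %2 and %3 refer to the groups captured by the RewriteCond pattern.
RewriteCond %{THE_REQUEST} \s/photos/([^/\s]+)/([^/\s]+)/slides/([^/\s?]+)\.html[\s?]
RewriteRule ^ /photos/%1/%2/%3/ [R=301,L]
# Existing rule: serve the pretty path from the real file
RewriteRule ^photos/(.*)/(.*)/(.*)/$ /photos/$1/$2/slides/$3.html [L]
</IfModule>
```

Browsers cache 301 redirects aggressively, so it may be worth using R=302 while testing and switching to 301 once the behavior looks right.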

End results