Making Friendly URLs
By Flib
2009-05-04
Category: PHP
The problem
By default, each PHP script maps onto a single URL.
e.g. http://example.com/myscript.php maps directly to DOCROOT/myscript.php (where DOCROOT is the document root of the vhost).
The normal way to add extra data to a URL is to use query strings appended to the end of the script URL.
e.g. http://example.com/myscript.php?q=foo&num=100&bar=baz
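As a quick illustration of what the script sees, here is a minimal sketch that pulls the parameters out of the example URL above. Inside a real request PHP already does this for you and exposes the result as $_GET; the variable names here are only for illustration.

```php
<?php
// The example URL from above.
$url = 'http://example.com/myscript.php?q=foo&num=100&bar=baz';

// Extract just the query string part: "q=foo&num=100&bar=baz".
$query = parse_url($url, PHP_URL_QUERY);

// Decode it into an associative array, much like $_GET in a live request.
parse_str($query, $params);

// $params now holds three string values keyed by q, num and bar.
```

Note that every value arrives as a string, so num is '100' rather than the integer 100.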
For many sites this format is undesirable, for several reasons.
Some of these are:
- Search Engine Optimisation
- Not easily human readable
- Many proxies won't cache any URL with ? in it
Solution 1 - 404 Error docs
When a file doesn't exist the webserver looks for a 404 error document. This can be a static file, a script or even another server. Out of these three options, only the second is useful to us here.
At its simplest, you can put an ErrorDocument directive in either a .htaccess file (assuming AllowOverride permits it; ErrorDocument needs FileInfo) or the httpd.conf file to tell the server what file to use as the error page.
e.g.
ErrorDocument 404 /errorhandler.php
Exactly what information is available depends on how the server is set up. On my suPHP-based setup here, the following useful variables are set when accessing a non-existent URL, http://example.com/test, with the above directive added to the .htaccess file.
REDIRECT_STATUS 404
SCRIPT_URL /test
SCRIPT_URI http://www.example.com/test
SCRIPT_FILENAME /home/example/example.com/errorhandler.php
REDIRECT_URL /errorhandler.php
ORIG_SCRIPT_FILENAME /var/www/cgi/php5.cgi
ORIG_PATH_INFO /errorhandler.php
ORIG_PATH_TRANSLATED /home/example/example.com/errorhandler.php
ORIG_SCRIPT_NAME /cgi/php5.cgi
As you can see, all the information we need is present in the above variables.
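To show how the error handler might use these, here is a minimal sketch that recovers the originally requested path. It assumes PHP 7.1 or later and that SCRIPT_URL is set as in the dump above; the REQUEST_URI fallback and the function name requested_path are my own assumptions, so check what your server actually provides.

```php
<?php
/**
 * Return the path the visitor originally asked for.
 * Takes a $_SERVER-style array so it can be tried outside a web request.
 */
function requested_path(array $server): string
{
    // Preferred: SCRIPT_URL, as seen in the variable dump above.
    if (isset($server['SCRIPT_URL'])) {
        return $server['SCRIPT_URL'];
    }

    // Fallback (an assumption): REQUEST_URI, which may carry a query
    // string, so keep only the path component.
    $uri  = $server['REQUEST_URI'] ?? '/';
    $path = parse_url($uri, PHP_URL_PATH);

    return is_string($path) ? $path : '/';
}
```

In errorhandler.php you would call requested_path($_SERVER) and decide what content to serve based on the result.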
The last issue we need to solve is that both the client and the server still see the response as a 404 error. This means that IE will replace the page with its own error page if it's small enough, and Apache will fill up your error logs with every request that uses this mechanism.
To get around this we need to do two things.
header('Status: 200 OK');
will tell Apache to log it as a normal request, and
header('HTTP/1.1 200 OK');
will send the correct response code to the browser and any intermediate proxies, solving both issues with only two extra lines of code.
This technique is particularly useful for 'just-in-time' content generation. You can produce a static HTML file in your error handler, save it to the location that was originally requested, and send a copy to the user. The next time the same request comes in, the file will already exist and no redirection will occur.
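The saving step could look something like the sketch below. It assumes PHP 7.1 or later; the function names cache_target and store_page, and the rule that only simple paths ending in .html get cached, are my own illustrative choices rather than anything required by the technique. Since the path comes straight from the visitor, it is checked before anything is written.

```php
<?php
/**
 * Map a requested URL path to a file under the document root.
 * Returns null when the path is not something we want to write to disk.
 */
function cache_target(string $docroot, string $urlPath): ?string
{
    // Refuse anything that tries to climb out of the document root.
    if (strpos($urlPath, '..') !== false) {
        return null;
    }
    // Only accept simple paths ending in .html (an illustrative rule).
    if (preg_match('#^/[A-Za-z0-9_/-]+\.html$#', $urlPath) !== 1) {
        return null;
    }
    return rtrim($docroot, '/') . $urlPath;
}

/**
 * Save generated HTML at the originally requested location so that the
 * web server can serve it directly next time. Returns true on success.
 */
function store_page(string $docroot, string $urlPath, string $html): bool
{
    $target = cache_target($docroot, $urlPath);
    if ($target === null) {
        return false;
    }
    // Create any missing parent directories first.
    $dir = dirname($target);
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    // LOCK_EX avoids two simultaneous requests interleaving their writes.
    return file_put_contents($target, $html, LOCK_EX) !== false;
}
```

In the error handler you would generate the HTML, call store_page() with the document root and the requested path, and then echo the same HTML to the visitor.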
Solution 2 - Mod_Rewrite
Mod_Rewrite is an Apache module designed for rewriting URLs. Each rule can be guarded by zero or more conditions; if the conditions hold and the rule's pattern matches the URL, the rewrite is applied.
We can best demonstrate this with an example.
We have some rewrite rules as follows:
RewriteEngine On
RewriteRule ^test/.* rewritehandler.php
RewriteRule ^test$ rewritehandler.php
The user requests http://example.com/test. The server looks at the request and runs it through the URL rewriting rules.
The first rule doesn't match, since there is no slash on the end of the URL, so no rewriting happens. The second rule does match, however, and Apache internally redirects the request to the rewritehandler.php script.
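If you want to experiment with the patterns before touching the server, a rough way is to try them with PHP's preg_match, which uses a similar regular expression syntax. The sketch below assumes PHP 7.1 or later and that the rules live in a .htaccess file in the document root, where the path is matched without its leading slash. The function name first_matching_rule is hypothetical, and it only reports which pattern matches first; mod_rewrite itself carries on to later rules unless told otherwise.

```php
<?php
/**
 * Return the index of the first pattern that matches $path,
 * or null if none of them match.
 */
function first_matching_rule(string $path, array $patterns): ?int
{
    foreach ($patterns as $index => $pattern) {
        // '#' is used as the delimiter so slashes need no escaping.
        if (preg_match('#' . $pattern . '#', $path) === 1) {
            return $index;
        }
    }
    return null;
}

// The two patterns from the rewrite rules above.
$patterns = ['^test/.*', '^test$'];
```

With these patterns, 'test' matches the second pattern (index 1), 'test/anything' matches the first (index 0), and 'testing' matches neither.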
If you haven't seen regular expressions before, then now might be the time to have a look at them, since rewrite rules rely heavily on them.
Many of the same variables are set when suPHP is used, including:
REDIRECT_STATUS 200
SCRIPT_URL /test
SCRIPT_URI http://www.example.com/test
SCRIPT_FILENAME /home/example/example.com/errorhandler.php
REDIRECT_URL /errorhandler.php
REQUEST_URI /test
ORIG_SCRIPT_FILENAME /var/www/cgi/php5.cgi
ORIG_PATH_INFO /errorhandler.php
ORIG_PATH_TRANSLATED /home/example/example.com/errorhandler.php
ORIG_SCRIPT_NAME /cgi/php5.cgi
Personally, I use something a little more complex for my URL rewriting since my requirements are a little more restrictive.
RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* page.php
In short, this rewrites every URL on the site except the index page, http://example.com/, and only when the requested file doesn't exist. It is effectively a variant of the 404 redirection above.
This only scratches the surface of what Mod_Rewrite can actually do, but it should be enough to get you started with this technique.
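With every request funnelled into page.php, the script has to work out what was asked for. Here is a minimal sketch of one way to do that, assuming PHP 7 or later and that REQUEST_URI holds the original URL as in the dump above. The dispatch function, the route table and the 'articles' handler are all hypothetical names used purely for illustration.

```php
<?php
/**
 * Pick a handler based on the first path segment and pass it the rest.
 * $routes maps a segment name to a callable that returns the page body.
 */
function dispatch(string $requestUri, array $routes): string
{
    // Drop any query string and the surrounding slashes.
    $path = parse_url($requestUri, PHP_URL_PATH);
    $path = is_string($path) ? trim($path, '/') : '';

    $segments = $path === '' ? [] : explode('/', $path);
    $name     = array_shift($segments); // null when there are no segments

    if ($name === null || !isset($routes[$name])) {
        return 'Not found';
    }
    return $routes[$name]($segments);
}

// A hypothetical route table with a single handler.
$routes = [
    'articles' => function (array $args): string {
        return 'Article list for ' . implode('/', $args);
    },
];
```

In page.php you would echo dispatch($_SERVER['REQUEST_URI'], $routes). A real script should also send a proper 404 status when nothing matches, rather than just the text used here.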
Solution 3 - PathInfo
There is one last technique that few people seem to be aware of, and that is PATH_INFO.
Put simply, if you append extra data to the end of a url it is available to the script.
e.g. We have a script page.php in the root of the site, so http://example.com/page.php works fine. If we request http://example.com/page.php/your/pathinfo/goes/here/ instead, /your/pathinfo/goes/here/ will be made available to the script in a variable called $_SERVER['PATH_INFO'].
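Inside the script, the extra path is usually easier to work with once it has been split up. A minimal sketch, assuming PHP 7 or later; the function name path_info_segments is my own, and it takes a $_SERVER-style array so it can be tried from the command line.

```php
<?php
/**
 * Split PATH_INFO into its individual segments.
 * Returns an empty array when no extra path was supplied.
 */
function path_info_segments(array $server): array
{
    // PATH_INFO is simply absent when nothing follows the script name.
    $info = $server['PATH_INFO'] ?? '';

    // Strip the leading and trailing slashes before splitting.
    $info = trim($info, '/');

    return $info === '' ? [] : explode('/', $info);
}
```

For the example URL above, path_info_segments($_SERVER) would give the four segments your, pathinfo, goes and here.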
We still have that pesky .php as part of the path though. Can we do anything about it?
The answer is yes. We could use mod_rewrite to do it; however, if you have rejected that option already, then we need to use something else.
One thing we can use is the MultiViews option, which turns on 'Content Negotiation'.
We can do this by adding Options +MultiViews in the .htaccess file.
This may have some impact on caching, since content negotiation makes Apache send headers that can stop proxies caching the response, but it does the job we want it to.
With MultiViews enabled, we can use URLs such as http://example.com/page/your/pathinfo/goes/here/ where page.php is a script that exists in the document root of the server.
Another option, instead of MultiViews, is to enable mod_speling (this is not a typo), which looks for common spelling mistakes in URLs before falling back to a 404 error page. I have not used this technique myself since PHP 4.0.6, so I cannot be certain it will still work. As always, test, test, test...