.htaccess Grammar Tutorial – .htaccess Special Characters

One thing this blog promises is to provide information about anything online that someone new to the business of online marketing needs to know. The whole point is that my pain is your gain. Well, I have had some real pain lately around .htaccess rewrite rules, and I wanted to translate what I learned into a .htaccess grammar tutorial for beginners.

What is a .htaccess file and Why Do I Care?

A .htaccess file is a type of configuration file for Apache web servers only; if you are working with Microsoft IIS, this tutorial does not apply. There are several ways an Apache web server can be configured. Webmasters who have write access to the Apache configuration directories can edit the main server configuration files (especially httpd.conf) directly. That is preferable in many cases because it allows more powerful directives and tends to run faster than .htaccess files, which the server has to look for and re-read on every request.

You and I care about .htaccess files because many of us run in a hosted environment where we do not have access to the Apache configuration directories. In many cases we run on a shared server with other websites. In these cases, the only way to control the configuration of the Apache web server is to use a .htaccess file.

For site-wide rules, put the .htaccess file in the root directory of the site to which it applies. A .htaccess file affects the directory it sits in and every subdirectory below it.

Why would I want to control the configuration of the Apache server? Well, the most likely scenario is that you have moved, deleted, or renamed pages and you don’t want to lose the authority they have gained with the search engines, which is what gives you a good placement in the SERPs. You do this through what are called redirects, which tell the server that when someone requests a specific URL like http://www.onlinematters.com/oldpage.htm it should automatically send them to an existing URL such as http://www.onlinematters.com/seo.htm. Another common reason to have a .htaccess file is to show a custom error page when someone types in a bad URL.
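As a minimal sketch of both cases, assuming the mod_alias module is enabled and your host allows these directives in .htaccess, the lines below redirect the old page from the example above and point bad URLs at a custom error page. The file name /notfound.htm is a hypothetical placeholder, not an actual page on this site.

```apache
# Permanently (301) redirect the old page to its replacement
Redirect 301 /oldpage.htm http://www.onlinematters.com/seo.htm

# Serve a custom page when a requested URL is not found
# (/notfound.htm is a hypothetical file name; use your own)
ErrorDocument 404 /notfound.htm
```

After saving the file, request the old URL in a browser and confirm that you land on the new page.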

.htaccess Files are Touchy

.htaccess files are very powerful and, like most computer communications, are very exacting in the grammar they use to communicate with the Apache server. The slightest syntax error (like a missing space) can take the whole site down, typically with a 500 Internal Server Error on every page. Thus it is crucial to make backup copies of everything related to your site (including any original .htaccess files) before working with your .htaccess. It is also important to check your entire website thoroughly after making any changes. If you encounter any errors or other problems, restore the original configuration from your backups immediately, then continue testing your new .htaccess file somewhere other than the live site.

Is There a Place I Can Check the Grammar of My .htaccess File?

I asked this question at SMX West 2009 at a panel on Apache server configuration and 301 redirects (301 Redirect, How Do I Love You? Let Me Count The Ways). The speakers were Alex Bennert, In House SEO, Wall Street Journal; Jordan Kasteler, Co-Founder, SearchandSocial.com; Carolyn Shelby from CShel; Stephan Spencer, Founder & President, Netconcepts; and Jonah Stein, Founder, ItsTheROI. These are all serious SEO players, so they would know if anyone would. When the question got asked, they all looked puzzled and then said "I just test it live on my staging server." I have spent hours looking for a .htaccess grammar checker and have yet to find anything that has any real horsepower. So seemingly the only options to check your .htaccess grammar are either to test it on your staging or live server or to find a friend or Apache guru who can review what you have done.

Basic .htaccess Character Set

We’re going to start this overview of .htaccess grammar with a review of the core character definitions (which is probably the hardest documentation I’ve had to find.  You’d think everyone would start with "the letters"  of the alphabet, but believe it or not, they don’t).  In the next post, we will then construct basic statements with these character sets so you can see them in action.  After that, we’ll move into multipage commands. 

#
the # instructs the server to ignore the line. Used for comments. Each comment line requires its own # at the start of the line; a comment cannot share a line with a directive. Within comments, it is good practice to use only letters, numbers, dashes, and underscores, as this helps avoid potential server parsing errors.
 
[C]
Chain: instructs the server to chain the current rule with the next rule. If the current rule does not match, the rules chained after it are skipped.
 
[E=variable:value]
Environment Variable: instructs the server to set the environment variable "variable" to "value".
 
[F]
Forbidden: instructs the server to return a 403 Forbidden to the client. 
 
[G]
Gone: instructs the server to return a 410 Gone (no longer exists) status to the client.
 
[L]
Last rule: instructs the server to stop processing further rewrite rules once this rule has been applied.
 
[N]
Next: instructs Apache to restart the rewriting process from the first rule, using the URL as rewritten so far. Use it with care, as it can easily create an endless loop.
 
[NC]
No Case: defines any associated argument as case-insensitive. i.e., "NC" = "No Case".
 
[NE]
No Escape: instructs the server not to convert special characters (such as & and ?) in the rewritten URL to their hex-code equivalents.
 
[NS]
No Subrequest: instructs the server to skip the rule if the request is an internal sub-request.
 
[OR]
Or: used on rewrite conditions (RewriteCond). Specifies a logical "or" that ties two conditions together such that either one proving true will cause the associated rule to be applied. Without it, all conditions must be true.
 
[P]
Proxy: instructs the server to hand the request to mod_proxy.
 
[PT]
Pass Through: instructs mod_rewrite to pass the rewritten URL back to Apache for further processing.  
 
[QSA]
Append Query String: when the rewritten URL contains its own query string, directs the server to add the original request's query string to it rather than discard it.
 
[R]
Redirect: instructs Apache to issue a redirect, causing the browser to request the rewritten/modified URL. A plain [R] sends a 302 (temporary) redirect; use [R=301] for a permanent one.
 
[S=x]
Skip: instructs the server to skip the next "x" number of rules if a match is detected.
 
[T=MIME-type]
MIME Type: forces the MIME type of the target resource to the value given.
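To see how a few of these flags combine in practice, here is a short sketch. It assumes mod_rewrite is enabled and that the file sits in the site root. The domain is the one used earlier in this post; the /private/ directory and old-catalog.htm are hypothetical names used only for illustration.

```apache
# Turn on the rewrite engine (required before any RewriteRule)
RewriteEngine On

# If the host is onlinematters.com without "www" (any letter case, per NC),
# redirect permanently to the www version and stop processing (R=301 plus L)
RewriteCond %{HTTP_HOST} ^onlinematters\.com$ [NC]
RewriteRule ^(.*)$ http://www.onlinematters.com/$1 [R=301,L]

# Return 403 Forbidden for anything under the hypothetical /private/ directory
RewriteRule ^private/ - [F]

# Return 410 Gone for a hypothetical page that was deleted on purpose
RewriteRule ^old-catalog\.htm$ - [G]
```

Flags go in square brackets at the end of the line, and several flags can be combined with commas, as in [R=301,L].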
 
[]
specifies a character class, in which any character within the brackets will be a match. e.g., [xyz] will match either an x, y, or z.
 
[]+
a character class followed by +, in which one or more of the characters within the brackets, in any combination, will be a match. e.g., [xyz]+ will match any number (at least one) of x’s, y’s, z’s, or any combination of these characters.
 
[^]
specifies not within a character class. e.g., [^xyz] will match any character that is neither x, y, nor z.
 
[a-z]
a dash (-) between two characters within a character class ([]) denotes the range of characters between them. e.g., [a-zA-Z] matches all lowercase and uppercase letters from a to z.
 
a{n}
specifies an exact number, n, of the preceding character. e.g., x{3} matches exactly three x’s.
 
a{n,}
specifies n or more of the preceding character. e.g., x{3,} matches three or more x’s.
 
a{n,m}
specifies a range of numbers, between n and m, of the preceding character. e.g., x{3,7} matches three, four, five, six, or seven x’s.
 
()
used to group characters together, thereby considering them as a single unit. e.g., (perishable)?press will match press, with or without the perishable prefix. Whatever a group matches can also be reused in the rewritten URL as $1, $2, and so on.
 
^
denotes the beginning of a regex (regex = regular expression) test string. i.e., the string must begin with the character that follows.
 
$
denotes the end of a regex (regex = regular expression) test string. i.e., the string must end with the character that precedes it.
 
?
declares the preceding character (or group) optional. i.e., x? matches zero or one x. e.g., monzas? will match monza or monzas, while mon(za)? will match either mon or monza.
 
!
declares negation when placed in front of a pattern. e.g., “!string” matches everything except “string”.
 
.
a dot (or period) indicates any single arbitrary character.
 
-
used in place of the rewritten URL, a dash instructs the server “not to” rewrite the URL, as in “...domain.com.* - [F]”.
 
+
matches one or more of the preceding character. e.g., G+ matches one or more G’s, while ".+" will match one or more characters of any kind.
 
*
matches zero or more of the preceding character. e.g., use “.*” as a wildcard.
 
|
declares a logical “or” operator. for example, (x|y) matches x or y.
 
\
escapes special characters ( ^ $ ! . * | ). e.g., use “\.” to indicate/escape a literal dot.
 
\.
indicates a literal dot (escaped).
 
/*
zero or more slashes.
 
.*
zero or more arbitrary characters.
 
^$
defines an empty string.
 
^.*$
the standard pattern for matching everything.
 
[^/.]
defines one character that is neither a slash nor a dot.
 
[^/.]+
defines one or more characters, none of which is a slash or a dot.
 
http://
this is a literal statement: characters with no special meaning match themselves, so this matches the literal character string “http://”.
 
^domain.*
defines a string that begins with the term “domain”, which then may be followed by any number of any characters.
 
^domain\.com$
defines the exact string “domain.com”.
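The following sketch puts several of these characters to work in rewrite rules. It assumes mod_rewrite is enabled and that the .htaccess file sits in the site root. Every path and file name in it (products, item.php, about.php, images, gallery, archive.php) is hypothetical and exists only to illustrate the patterns.

```apache
RewriteEngine On

# ^ and $ anchor the pattern; [0-9]+ is one or more digits; the parentheses
# capture those digits so $1 can reuse them in the rewritten URL.
# Example: a request for products/42 is served by /item.php?id=42
RewriteRule ^products/([0-9]+)$ /item.php?id=$1 [L]

# \. is a literal dot; the ? makes the final "l" optional, so this pattern
# matches both about.htm and about.html
RewriteRule ^about\.html?$ /about.php [L]

# [^/.]+ is one or more characters that are neither a slash nor a dot;
# (jpg|gif|png) matches any one of the three alternatives; NC ignores case
RewriteRule ^images/([^/.]+)\.(jpg|gif|png)$ /gallery/$1.$2 [NC,L]

# [0-9]{4} is exactly four digits, such as archive/2009
RewriteRule ^archive/([0-9]{4})$ /archive.php?year=$1 [L]
```

Note that in a .htaccess file the pattern is compared against the requested path without its leading slash, which is why the patterns start with ^products rather than ^/products.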
 
-d
tests if string is an existing directory
 
-f
tests if string is an existing file
 
-s
tests if the string is an existing file with a size greater than zero
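These file tests are used in rewrite conditions, usually combined with ! for negation. A common pattern, sketched here under the assumption that mod_rewrite is enabled and that /index.php is a hypothetical script meant to handle every request that does not match a real file or directory:

```apache
RewriteEngine On

# Only rewrite when the requested path is NOT an existing file (!-f)
# and NOT an existing directory (!-d)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# The single dot matches any request that has at least one character;
# /index.php is a hypothetical handler script
RewriteRule . /index.php [L]
```

Because both conditions carry no [OR] flag, both must be true before the rule is applied, so real files such as images and stylesheets are still served directly.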

 

Redirection Header Codes

  • 301 – Moved Permanently
  • 302 – Found (Moved Temporarily)
  • 403 – Forbidden
  • 404 – Not Found
  • 410 – Gone
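As a rough illustration of how some of these codes are produced, the sketch below uses the Redirect directive from mod_alias (assumed to be enabled), which accepts keywords in place of the numeric codes. All page names are hypothetical.

```apache
# 301: the page has moved for good
Redirect permanent /old-news.htm http://www.onlinematters.com/news.htm

# 302: the page is only temporarily somewhere else
Redirect temp /sale.htm http://www.onlinematters.com/spring-sale.htm

# 410: the page was removed on purpose (no target URL is given)
Redirect gone /discontinued.htm
```

A 404 needs no directive at all, since the server returns it on its own when nothing matches the requested URL, and a 403 can be produced with the [F] flag described above.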

 

 

 

 
