About Online Matters

The End of The Chasm and the Beginning of The Rapids

I’m back at it after traveling to SMX Advanced London, where I had my first speaking opportunity as the CEO of OnlineMatters. Now that that presentation is behind me, it’s time to get back to the topic of my prior post: (not) crossing The Chasm.

In my last post, I posited that The Chasm, as far as Internet-based businesses are concerned, is quickly disappearing. Let me explain why, and why The Chasm has now been replaced by what I call The Rapids.

The Chasm exists because of the disconnect between

  • The time and resources needed to evolve both a high-tech product and its business model from meeting the needs of early adopters to serving the larger market segment – the early majority.
  • The money available to fund this transition, which is limited by the size of the early adopter market. The available funds are not likely to grow until the early majority purchases the product en masse, for two reasons: first, sales generate cash; second, sales validate that the business model and product features are positioned for scale, increasing the company’s valuation and its ability to attract the new third-party capital needed for growth.

Voilà – a Catch-22, and the birth of The Chasm.

But today’s Internet-based business startups work differently. Thanks to new technologies and approaches to IP (open APIs, open source licensing, crowdsourcing, cloud computing, mashups), software products that used to take months or years and large engineering and marketing teams to conceive and develop can now be brought to market in a few weeks by two or three people in a condo. (Sad to say, the garage crowd went upscale in the late 90s in Silicon Valley and has never really returned to its roots in the humble garage.) Without a lot of effort, they use word-of-mouth and viral techniques, as well as search engine optimization and perhaps a little PR, to acquire an initial set of customers who hopefully like the product and tell their friends. If they are smart, they add a customer feedback tool like getsatisfaction or uservoice to their sites and begin collecting immediate, extensive, and continuous feedback from customers. They plow this feedback into the product using daily or even hourly code pushes.

The difference in speed of product evolution between traditional high tech startups and Internet-based business startups is like the difference between the gestational period of an elephant and a virus. And that difference is one of two reasons The Chasm is quickly shrinking, and ultimately disappearing, for Internet-based business startups.

While products in both cases evolve in discrete steps, the steps in the case of an Internet-based business are relatively small, and from day to day customers do not see huge changes in the product they already know. They also get to use the product even as it is evolving, unlike the case in more traditional hardware or software businesses, where customers can only engage with a new set of features when they are released in large, discrete “chunks” and not before. As a result, the early adopters are brought along even as new customers try the product. Each step involves a group that more and more reflects the larger majority of customers, until at some point the product meets the needs of the earliest of what would have been called the early majority – who also happen to be the latest of the early adopters. There is no way, in this case, to find a demarcation between the two groups. Customers’ perceptions of the product and their needs from it change as the product itself changes, because they experience those changes in small steps as they occur. Obviously some early customers will choose to leave the product as it evolves, but that is true of any product at any time; customer attrition is a fact of life. The point is that customer needs and product features evolve in tandem in a virtuous cycle, removing any need to leap between one set of customer expectations and another.

So the capital requirements for an Internet-based business startup have declined (and continue to decline) substantially, while the time required to evolve the product and engage customers in its evolution continues to shrink. As a result, the challenge for Internet businesses is not The Chasm, because The Chasm effectively no longer exists.

The new strategic challenge for Internet-based startups is sifting through the reams of data available – customer feedback, site analytics, Twitter feeds, Facebook fan pages, competitive data (of which there is much more today than ever) – in near real time, and clearly identifying the critical strategic requirements for the business and the product requirements needed to serve the core customer segments. Whereas the traditional high-tech startup has difficulty getting enough feedback and making enough changes in a short enough time to perfect the product and business model, the Internet-based business startup faces the problem of extracting the key insights from a plethora of data (“the rocks”) and making those insights actionable in time frames better suited to a supercomputer than a human being. More importantly, these startups need the discipline to avoid taking on too many strategic imperatives (“oversteering”) and over-evolving the product because…well…because it is relatively easy to do. Internet startups aren’t crossing a chasm. They are trying to avoid the rocks and not oversteer their course. They are navigating The Rapids.


The End of The Chasm Is Nigh – Intro

Many years ago at Stanford, I had the opportunity to work with a team of researchers connected to Everett Rogers, who wrote the book Diffusion of Innovations. That book has had a huge influence in high tech, because it was the first accessible, mass-market publication to provide a working model of how new technologies achieve market acceptance. Its most famous image is the Adoption Curve (see below), which defined five categories of technology adopters: innovators, early adopters, early majority, late majority, and laggards. These terms have become fundamental in high-tech marketing, and you will often hear phrases like “our initial target market is the early adopters” in marketing planning sessions.

Everett Rogers Original Adoption Curve

Since I was involved with the team that developed the Adoption Curve, it became a standard part of my repertoire as a marketer.  Like most others, it structured my views on how to approach any market for a new product innovation.

Then in 1991, along came Geoffrey Moore, a consultant with the McKenna Group, who published Crossing the Chasm. Crossing the Chasm expanded on Rogers’s diffusion of innovations model. Moore argued that there is a chasm between the early adopters of a product (the “innovators,” or technology enthusiasts and visionaries) and the early majority, who, while appreciating a new technology, tend to be more pragmatic about its application. As a result, the needs and purchasing decision-making of these two groups are quite different. Since effective marketing requires selling to the needs of a specific segment, there comes a time when young companies face a “chasm”: the features and marketing that helped them gain their early followers will no longer work, and they need to adapt their business to a new set of customers and expectations. It takes time, energy, and a lot of experimentation to find the right new model. But in high-tech businesses, especially prior to the Web, sales cycles tend to be relatively long (12–18 months is not unusual). Given that most small companies have limited resources, the number of experimental cycles they can undertake to discover the correct new model is limited. This makes the transition extremely hard – limited resources, limited time, and a lot of spinning of wheels until the right model is discovered. It requires a lot of heavy lifting and long hours – and if you’ve ever been through this, you’ll know why Moore chose to call it “a chasm.” It feels like a huge, almost overwhelming leap from where you are today to where you need to be tomorrow. Even with a running start, when you take the leap to grow your company to the next level, it’s easy to miss and “fall into the chasm.”

Everett Rogers Technology Adoption Curve Adapted with The Chasm

I had been working with the technology adoption model visually in my head for almost 10 years when Moore published his book. And when I saw his curve, I realized that we tend to see only what we have modeled (or had modeled for us by others) in our minds about how the world works. I had been struggling with the chasm all that time and never saw it, even though it was staring me in the face. I swore that the next time I experienced something at odds with my internal models of reality, I wouldn’t ‘ignore the data’ but would instead make a concerted effort to see past the limitations of my own mind.

So, Geoff, I have one for you. For web-based businesses, the chasm is closing, and I can already see a time in the near future when it will no longer be a barrier to a company’s transition from a customer base made up mainly of early adopters to one drawn from the early majority. The End of “The Chasm” Is Nigh. Darwin – and the real-time web – are dealing with it.

The detailed rationale will come in my next post. Right now, I need to get back to my day job.


Web Site Latency and Performance Issues – Part 6

Taking up where we left off in part 5…

In the last post, we had just moved aboutonlinematters.com to a privately hosted standalone server and seen a substantial decrease in web site latency. Our rating in Google Page Speed had improved from being better than 43% of similar websites to better than about 53% – a great improvement. But we were still showing a lot of issues in YSlow and the Google Page Speed tool. These fell into three categories:

  • Server Configuration. This involves optimizing settings on our Apache web server: enabling gzip for file compression, configuring entity tags, adding expires headers, turning on keep-alive, and splitting components across domains.
  • Content Compression. This involves items like compressing images, JavaScript, and CSS, specifying image sizes, and reducing the number of DOM elements.
  • Reducing External Calls. This involves combining all external CSS and JavaScript files into a single file, using cookieless domains, minimizing DNS lookups and redirects, and optimizing the order and style of scripts.

We decided to attack the web site latency issues in stages, first addressing the elements that were easiest to fix (server configuration) and leaving the most difficult (reducing external calls) until last.

Server Configuration Issues

In their simplest form, server configuration issues related to web site latency have to do with settings on a site’s underlying web server, such as Apache. For larger enterprise sites, server configuration covers a broader set of technical topics, including load balancing across multiple servers and databases as well as the use of a content delivery network. This section covers only the basic server settings as they relate to web site latency, not the enterprise topics.

With Apache (and Microsoft IIS), the server settings we care about can be managed and tracked through a page’s HTTP headers.  Thus, before we get into the settings we specifically care about, we need to have a discussion of what HTTP headers are and why they are important.

HTTP Headers

HTTP headers are the part of the HTTP protocol – the set of rules governing web traffic – used to format certain types of data and instructions that are either:

  • included in a request from a web client/browser, or
  • sent by the server along with a response to a browser.

HTTP headers carry information in both directions. A client or browser can send headers with its request to the server for a web page or other resource, usually a file or dynamic output from a server-side script. Likewise, other headers are designed to be sent by the server along with its response to the browser or client.

As SEOs, we care about HTTP headers because our request from the client to the server will return information about various elements of server configuration that may impact web site latency and performance. These elements include:

  • Response status: 200 indicates a successful response from the server.
  • Date of request.
  • Server details: type, configuration, and version numbers – for example, the PHP version.
  • Cookies: the cookies set on your system for the domain.
  • Last-Modified: available only if set on the server; usually the time the requested file was last modified.
  • Content-Type: text/html is an HTML web page, text/xml an XML file.

Two kinds of requests matter here. A HEAD request returns only the header information from the server. A GET request returns both the header information and the file content, exactly as a browser would request them. For our analysis, we only care about HEAD requests. Here is an example of a request:

Headers Sent Request
HEAD / HTTP/1.0
Host: www.aboutonlinematters.com
Connection: Close

And here is what we get back in its simplest form, using the Trellian Firefox Toolbar:

Response: HTTP/1.1 200 OK
Date: Sun, 04 Apr 2010 00:17:06 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.3.0
X-Pingback: http://www.aboutonlinematters.com/xmlrpc.php
Link: <http://wp.me/DbBZ>; rel=shortlink
Content-Encoding: gzip
Cache-Control: max-age=31536000
Expires: Mon, 04 Apr 2011 00:17:06 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: Chunked
Proxy-Connection: Keep-alive
x-ua-compatible: IE=EmulateIE7

Different tools will return different header information depending on the specific requests made in the calling script. For example, Live HTTP Headers, a plugin for Firefox, provides detailed header request and response information for every element on a page (it basically breaks out each GET and shows you the actual response that comes back from the server). This level of detail will prove helpful later, when we undertake deep analysis to reduce external server requests. But for now, what is shown here is adequate for the purposes of our analysis.

For a summary of HTTP header requests and response codes, click here. But for now, let’s get back to configuring our Apache server to reduce web site latency.

Apache Server Settings Related to Site Latency

Enabling Gzip Compression

Web site latency improves substantially when the amount of data that has to flow between the server and the browser is kept to a minimum. I believe I’ve read somewhere that image requests account for 80% of the load time of most web pages, so just following good image-handling practices for web sites (covered in a later installment) can substantially improve web site latency and page loading times. However, manually compressing images is painful and time-consuming. Moreover, there are other types of files – JavaScript and CSS are the most common – that can also be compressed.

Designers of web servers identified this problem early on and provided a built-in tool on their servers for compressing files moving between the server and the browser.  Starting with HTTP/1.1, web clients indicate support for compression by including the Accept-Encoding header in the HTTP request.

Accept-Encoding: gzip, deflate

If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.

Content-Encoding: gzip

Gzip remains the most popular and effective compression method. It was developed by the GNU project and standardized by RFC 1952. The other compression format you are likely to see is deflate, but it is less effective and less popular.

Gzipping generally reduces the response size by about 70%. Approximately 90% of today’s Internet traffic travels through browsers that claim to support gzip. If you use Apache, the module that handles gzip depends on your version: Apache 1.3 uses mod_gzip, while Apache 2.x uses mod_deflate.
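
As an illustration, here is a minimal mod_deflate configuration sketch for Apache 2.x. The content types listed are examples rather than the exact settings we used, and the sketch assumes mod_deflate is already loaded:

# Sketch: enable gzip compression for text-based responses (Apache 2.x, mod_deflate)
<IfModule mod_deflate.c>
  # Only text types are listed; images and other binary formats are already compressed
  AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/javascript
</IfModule>

With something like this in place, a HEAD request against the site should show Content-Encoding: gzip in the response, as in the header dump above.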

Configuring Entity Tags

Web servers and browsers use entity tags (ETags) to determine whether a component in the browser’s cache, like an image or script (each an example of an “entity”), matches the one on the origin server. An ETag is a simple string, surrounded by quotation marks, that uniquely identifies a specific version of the component. The origin server specifies the component’s ETag using the ETag response header.

HTTP/1.1 200 OK
Last-Modified: Sun, 04 Apr 2010 00:37:48 GMT
Etag: "1896-bf9be880"
Expires: Mon, 04 Apr 2011 00:37:48 GMT

Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned.

GET http://www.aboutonlinematters.com/wp-content/plugins/web-optimizer/cache/f39a292fcf.css?1270299922 HTTP/1.1
Host: www.aboutonlinematters.com
If-Modified-Since: Sun, 04 Apr 2010 00:37:48 GMT
If-None-Match: "1896-bf9be880"

HTTP/1.1 304 Not Modified

ETags can impact site latency because they are typically constructed using attributes that make them unique to a specific server. ETags won’t match when a browser gets the original component from one server and later tries to validate that component on a different server – a fairly standard scenario on web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers. If the ETags don’t match, the web client doesn’t receive the small, fast 304 response that ETags were designed for. Instead, it gets a normal 200 response along with all the data for the component. This isn’t a problem for small sites hosted on a single server, but it is a substantial problem for sites with multiple servers running Apache or IIS with the default ETag configuration: web clients see higher web site latency, web servers carry a higher load, bandwidth consumption goes up, and proxies can’t cache content efficiently.

So when a site does not benefit from the flexible validation model ETags provide, it is better to remove the ETag altogether. In Apache, this is done by simply adding the following line to your Apache configuration file:

FileETag none
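
Some configurations go one step further and also unset any ETag header that other modules might still add to a response. This is a hedged sketch that assumes mod_headers is available; it was not something our single-server setup strictly required:

# Sketch: turn off ETags completely
FileETag None
<IfModule mod_headers.c>
  # Strip any ETag header a module may still attach to responses
  Header unset ETag
</IfModule>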

Expires Headers

The Expires header makes the components in an HTTP response cacheable. This avoids unnecessary HTTP requests on any page views after the initial visit, because components downloaded during the initial visit – for example, images and script files – remain in the browser’s local cache and do not have to be downloaded on subsequent requests. Expires headers are most often used with images, but they should be used on all components, including scripts, stylesheets, and Flash components.

Browsers (and proxies) use a cache to reduce the number and size of HTTP requests, making web pages load faster.  The Expires header in the HTTP response tells the client how long a component can be cached. This far future Expires header

Expires: Thu, 15 Apr 2020 20:00:00 GMT

tells the browser that this response won’t be stale until April 15, 2020.

Apache uses the ExpiresDefault directive to set an expiration date relative to the current date. So for example:

ExpiresDefault "access plus 10 years"

sets the Expires date 10 years out from the time of the request.
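
For reference, here is a slightly fuller mod_expires sketch; the lifetimes are illustrative values, not the ones we settled on for this site:

# Sketch: far-future Expires headers via mod_expires
<IfModule mod_expires.c>
  ExpiresActive On
  # Fallback for anything without a more specific rule
  ExpiresDefault "access plus 10 years"
  # Per-content-type overrides for common static assets
  ExpiresByType image/png "access plus 1 year"
  ExpiresByType text/css "access plus 1 month"
</IfModule>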

Using a far-future Expires header affects page views only after a user has already visited a site for the first time or after the cache has been cleared, so the benefit of this change depends on how often users hit your pages with a primed cache. In the case of About Online Matters, we still do not get lots of visitors, so you would expect this change to have little impact on our performance – and, indeed, that proved to be true.

Keep Alive Connections

The Keep-Alive extension to HTTP/1.0 and the persistent connection feature of HTTP/1.1 provide long-lived HTTP sessions that allow multiple requests to be sent over the same TCP connection. This avoids opening a separate TCP connection for every object on a page and instead allows multiple objects to be requested and retrieved over a single connection. Each new TCP connection requires a three-way handshake and starts under congestion-control algorithms that restrict available bandwidth at the beginning of the connection; making multiple requests over a single connection reduces the number of times those costs are paid. As a result, in some cases, enabling keep-alive on an Apache server has been shown to result in an almost 50% speedup in latency times for HTML documents with many images. To enable keep-alive, add the following line to your Apache configuration:

KeepAlive On
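
In practice, KeepAlive is usually tuned together with two companion directives. The values below are illustrative defaults rather than measurements from this server:

# Sketch: persistent connections with illustrative tuning values
KeepAlive On
# How many requests may be served over one connection before it is closed
MaxKeepAliveRequests 100
# How many seconds to keep an idle connection open waiting for the next request
KeepAliveTimeout 5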

Is The Configuration Correct?

Once I make these various changes to the server configuration, how can I verify that they have actually been implemented? This is where the HTTP headers come into play. Let’s take a look at the response we got earlier from www.aboutonlinematters.com when we made a HEAD request:

Response: HTTP/1.1 200 OK
Date: Sun, 04 Apr 2010 00:17:06 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.3.0
X-Pingback: http://www.aboutonlinematters.com/xmlrpc.php
Link: <http://wp.me/DbBZ>; rel=shortlink
Content-Encoding: gzip
Cache-Control: max-age=31536000
Expires: Mon, 04 Apr 2011 00:17:06 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: Chunked
Proxy-Connection: Keep-alive
x-ua-compatible: IE=EmulateIE7

The Content-Encoding, Cache-Control/Expires, and Proxy-Connection lines show that gzip, expires headers, and keep-alive have been implemented on our server. ETags won’t show in this set of responses because ETags are associated with a specific entity on a page. They show up instead in tools that provide detailed analysis of HTTP requests and responses, such as Live HTTP Headers or Charles. No ETags should be visible in an HTTP request or response if FileETag None has been implemented.

Results

We made the changes in two steps. First we activated gzip compression and Expires headers and removed ETags. These changes produced only negligible improvements in overall web site latency. Then we implemented the keep-alive setting. Almost immediately, our site latency rating in the Google Page Speed tool improved from being better than 53% of similar sites to being better than 61%.

We’ll stop there for today and pick up with content compression in the next installment.
