PawPrint.net News


November 21st, 2010

Technobloggle

Web Site Optimization

Speeding delivery of content, PHP optimization and caching

Over the last two weeks, PawPrint has dramatically enhanced its existing optimization strategy (built over several years) to deliver dynamic web sites faster and more efficiently. This article explains how we keep sites fast and efficient for all our clients, in the hope of giving others some ideas.

PawPrint builds web sites on a platform based on PHP, MySQL, CSS, and JavaScript. This article therefore focuses on those technologies, but the ideas apply regardless of which languages are used. This is a high-level technical discussion of the methodology we use to make our web sites fast. Many other things could be done in different circumstances, but we must account for a variety of hosting environments and a variety of client needs and applications.

It is important to note that there is no single golden rule for generating the fastest possible web sites. Server speed, hosting bandwidth, and the client's browser all play their part. These practices are what we have found work best; however, this area is constantly evolving and we are always tweaking our approach.

Web optimization really can be broken down into 3 primary areas:

Server Side
Data Transmission
Client Side

An effective strategy for any web site will seek gains in all three areas.

Server Side

On the server side, we leave the optimization of the database, operating system, and platform to other discussions and focus on what can be done programmatically in PHP. In effect, we focus on minimizing database access (through caching and writing efficient queries) and on minimizing processing in general (again through caching).

Minimizing Database Access

In any large web application there may be literally thousands of chunks of data that must be collected to generate a single web page. Something as simple as a person's name or address may be needed several times on the same page, and if it is pulled from the database every time it is needed, a lot of time is wasted. To eliminate these unnecessary queries easily, we use a RAM-based cache for the queries used to generate a single page: once a chunk of data has been collected, all subsequent requests for the same data are served from RAM. This avoids repeated database access and saves a considerable amount of time.
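A per-request query cache like the one described above can be sketched in a few lines. This is a hypothetical illustration in Python rather than PawPrint's PHP code: the class name `RequestQueryCache` and the `run_query` callback are assumptions, and `sqlite3` stands in for MySQL only so that the example runs anywhere. Query parameters must be hashable (a tuple) because they form part of the cache key.

```python
import sqlite3
from typing import Any, Callable, Dict, Tuple

Params = Tuple[Any, ...]


class RequestQueryCache:
    """Hypothetical in-memory cache of query results for the page being built.

    A new instance is created per page request, so cached rows never outlive
    that request. `run_query` is whatever function actually hits the database.
    """

    def __init__(self, run_query: Callable[[str, Params], Any]) -> None:
        self._run_query = run_query
        self._results: Dict[Tuple[str, Params], Any] = {}
        self.db_hits = 0  # number of queries that actually reached the database

    def fetch(self, sql: str, params: Params = ()) -> Any:
        key = (sql, params)  # identical SQL + parameters => identical result
        if key not in self._results:
            self.db_hits += 1
            self._results[key] = self._run_query(sql, params)
        return self._results[key]


# sqlite3 stands in for MySQL so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO people (name) VALUES ('Alice')")

cache = RequestQueryCache(lambda sql, params: conn.execute(sql, params).fetchall())
for _ in range(3):  # e.g. the same name appears three times on the page
    rows = cache.fetch("SELECT name FROM people WHERE id = ?", (1,))
print(rows, cache.db_hits)  # [('Alice',)] 1
```

Scoping the cache to a single request keeps it simple: there is nothing to invalidate, because the cache is thrown away as soon as the page has been built.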

Minimizing Processing

Generating a single web page in a complex application may involve literally thousands of function calls, computations, data lookups, and queries to external servers, and all of this overhead is repeated every time the page is built. Because the application is dynamic and its content may change, the page usually has to be rebuilt each time a user requests it. We use a disk-based cache to avoid regenerating pages for subsequent requests for the same information. The system must update itself intelligently whenever anything changes, but this can be implemented without too much difficulty through a careful combination of request and state tracking, and by deferring some data loading to Ajax. The system employed by the XDe keeps separate copies of the pages generated for users (which may contain data specific to them) and for non-users (which can be cached more effectively) in files on disk. When the same user or guest makes an identical request, the system returns that file rather than regenerating the entire page. Because this eliminates all processing and database access, the speed-up is quite dramatic.
We take care to delete the cached files whenever the site is updated, so that stale content is never delivered to the user. The system is smart enough to determine whether the nature of an update requires all of the disk cache files to be erased or only some of them.
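The disk cache and its selective invalidation might look something like the following minimal Python sketch. It is not the XDe's implementation: the `DiskPageCache` class, the per-"section" directory layout used to erase only some files, and the `render` callback are all assumptions made for illustration. Users get their own cached copy, while all guests share one.

```python
import hashlib
import os
import tempfile
from typing import Callable, Optional


class DiskPageCache:
    """Hypothetical on-disk cache of fully rendered pages.

    Files live under root/<section>/<hash>.html, where the hash covers the
    visitor (a specific user, or "guest") and the requested URL, so that an
    update can erase a single section or the whole cache.
    """

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, section: str, user_id: Optional[str], url: str) -> str:
        visitor = f"user:{user_id}" if user_id else "guest"  # all guests share one copy
        digest = hashlib.sha256(f"{visitor}|{url}".encode("utf-8")).hexdigest()
        return os.path.join(self.root, section, digest + ".html")

    def get_page(self, section: str, user_id: Optional[str], url: str,
                 render: Callable[[], str]) -> str:
        path = self._path(section, user_id, url)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return f.read()  # hit: no page building, no database access
        html = render()  # miss: build the page the expensive way
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = f"{path}.{os.getpid()}.tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(html)
        os.replace(tmp, path)  # readers never see a half-written file
        return html

    def invalidate(self, section: Optional[str] = None) -> None:
        """Erase one section's cached pages, or every page if section is None."""
        target = self.root if section is None else os.path.join(self.root, section)
        for dirpath, _dirs, files in os.walk(target):
            for name in files:
                os.remove(os.path.join(dirpath, name))


calls = []

def render_news() -> str:
    calls.append(1)  # count how often the expensive path runs
    return "<html>news</html>"

page_cache = DiskPageCache(tempfile.mkdtemp())
page_cache.get_page("news", None, "/news", render_news)
page_cache.get_page("news", None, "/news", render_news)  # served from disk
page_cache.invalidate("news")                            # a news item was edited
page_cache.get_page("news", None, "/news", render_news)
print(len(calls))  # 2
```

Grouping files by section is one simple way to decide "all or only some": an edit to a news item clears only the news pages, while a site-wide template change calls `invalidate()` with no argument.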

Data Transmission

More can be done than one might think to reduce the overall size of the data transmitted between the server and the client. The simplest and most effective method is to compress the data being sent. Most modern browsers accept gzipped web content, and most web servers, as well as PHP, have built-in methods to gzip content transparently as it is sent. We always make sure this is enabled for our web sites (for the content as well as the CSS and JS), which usually reduces the size of the content by around 40-50%.
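The negotiation behind transparent gzip is straightforward: compress only when the browser's `Accept-Encoding` header lists gzip, and label the response accordingly. The sketch below uses Python's standard `gzip` module in place of the web server or PHP machinery the article relies on; the function name `encode_response` is hypothetical, and q-values such as `gzip;q=0` are deliberately ignored.

```python
import gzip
from typing import Dict, Tuple


def encode_response(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the response body only when the browser advertises gzip support."""
    # Caches between server and browser must keep gzipped and plain copies apart.
    headers = {"Vary": "Accept-Encoding"}
    # Simplification: q-values such as "gzip;q=0" are not honoured here.
    accepted = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in accepted:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers


page = b"<html><body>" + b"<p>Hello from PawPrint</p>" * 200 + b"</body></html>"
compressed, hdrs = encode_response(page, "gzip, deflate")
print(len(page), "->", len(compressed), "bytes")
```

The sample page is highly repetitive, so it compresses far better than typical markup; the ratio printed here should not be read as representative of real pages.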
All CSS and JS can also be "minified", which involves stripping away all the comments, spaces, newlines, and any other characters that help developers read the code but have no effect on how it functions and can therefore be removed. We have a built-in minimizer that does this once for each code update and stores the minified CSS and JS files, which are then delivered to the client instead of the files we develop with.
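To show what minification removes, here is a deliberately naive CSS minifier in Python. It is a hypothetical sketch, not PawPrint's minimizer: it handles only CSS, and because it is regex-based it does not protect string literals or `url()` values, and it would wrongly join selectors such as `a :hover`. A real build step should use a proper minifier.

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS minifier: drop comments and whitespace that do not affect rendering."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # remove /* comments */
    css = re.sub(r"\s+", " ", css)                   # collapse whitespace runs
    css = re.sub(r"\s*([{}:;,>])\s*", r"\1", css)    # no spaces around punctuation
    css = css.replace(";}", "}")                     # last semicolon in a block is optional
    return css.strip()


source = """
/* Main layout */
body {
    margin: 0;
    font-family: Arial, sans-serif;
}

h1 > a {
    color: #336699;
}
"""
print(minify_css(source))
# body{margin:0;font-family:Arial,sans-serif}h1>a{color:#336699}
```

Running this once per code update, as the article describes, means the cost of minification is never paid on an actual page request.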
Of course, in addition, we are very careful about how we code both our CSS and JS to ensure that it is efficient and small in the first place.

Content Delivery Networks

A slightly more advanced way to speed content delivery is to use a content delivery network (CDN), or at the very least a separate domain for static content such as images and video. This speeds things up on several fronts. First, most browsers limit the number of concurrent downloads from a single domain, so using more than one domain increases the number of items downloaded for your web page at the same time. Second, if a content delivery network is used, the content comes from servers closer to the end user. This reduces latency and speeds delivery, all other factors being equal (and content delivery network servers are typically connected to the net via very fast links, so they are faster anyway). Finally, because the separate domain will probably not have cookies attached to it the way your main web domain does, the browser sends less data when it requests each file, so the request itself is faster too.
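Moving static files onto separate, cookieless hosts usually comes down to rewriting asset URLs as the page is generated. The Python sketch below assumes two hypothetical hosts, `static1.example.com` and `static2.example.com`, and a fixed list of static file extensions. It picks the host with `zlib.crc32` rather than Python's salted `hash()`, so a given file always maps to the same host and stays in the browser's cache.

```python
import re
import zlib

# Hypothetical cookieless hosts; in practice these might be CDN hostnames.
STATIC_HOSTS = ["static1.example.com", "static2.example.com"]
STATIC_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".mp4")


def static_url(path: str) -> str:
    """Map a site-relative asset path to a stable choice of static host."""
    host = STATIC_HOSTS[zlib.crc32(path.encode("utf-8")) % len(STATIC_HOSTS)]
    return f"//{host}{path}"


def rewrite_static_urls(html: str) -> str:
    """Point src/href attributes for static files at the static hosts."""
    def replace(match: re.Match) -> str:
        attr, path = match.group(1), match.group(2)
        if path.lower().endswith(STATIC_EXTENSIONS):
            return f'{attr}="{static_url(path)}"'
        return match.group(0)  # pages and other links stay on the main domain
    # Only site-relative paths ("/x"), not protocol-relative ones ("//host/x").
    return re.sub(r'(src|href)="(/(?!/)[^"]*)"', replace, html)


html = ('<link href="/css/site.css" rel="stylesheet">'
        '<a href="/about">About</a><img src="/img/logo.png">')
print(rewrite_static_urls(html))
```

Splitting assets across two hosts raises the number of parallel downloads under per-domain limits, while the stable hash avoids the same image being fetched twice from different hosts.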

Client Side

On the client side, there are several ways to make pages load faster, primarily caching and efficient rendering.

Browser Caching

Client-side caching is one of the best things to leverage for speed, because any file cached on the client needs neither to be generated on the server nor transmitted, so it displays virtually instantly. However, client-side caching for a dynamic web site is also one of the most difficult things to get right, because we must ensure that the browser actually checks that the site has not changed (which, being dynamic, it may frequently do). As a result, many content management systems simply ignore client caching and tell web browsers to cache nothing. This may be an easy approach, but it is not a very good one. The XDe tries to get the browser to use something called a "conditional GET", where the browser asks the server whether the content has changed and downloads the page only if it has. This doesn't work all the time, and because some browsers don't follow the standards very well, we must be very careful about how this is implemented for the page HTML itself. The images, CSS, and JavaScript on the page are a different matter. We have gone to great lengths to ensure these items are almost always cached by the client (thankfully, in most cases they are also the largest objects on the page). If any of these items change on the server, they are always renamed, forcing the browser to download them again. If they have not changed, however, even when the page itself has, the browser is free to use its cached copy, saving many KB of data transfer and dramatically speeding up the site.
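Both techniques in the paragraph above can be sketched briefly: renaming assets whenever their content changes, and answering a conditional GET for the page HTML with `304 Not Modified`. This Python sketch is an assumption-laden illustration, not the XDe's code: the names `fingerprinted_name` and `conditional_get`, the use of a SHA-256 prefix as both fingerprint and ETag, and the one-year `max-age` are all choices made here. It only matches a single strong ETag in `If-None-Match`.

```python
import hashlib
import os
from typing import Dict, Optional, Tuple

# A fingerprinted file gets a brand-new URL whenever it changes, so the
# browser can be told to keep it for a year without asking again.
ASSET_HEADERS = {"Cache-Control": "public, max-age=31536000"}


def fingerprinted_name(filename: str, content: bytes) -> str:
    """'site.css' -> 'site.<8 hex chars>.css'; the name changes with the content."""
    stem, ext = os.path.splitext(filename)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{stem}.{digest}{ext}"


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes, Dict[str, str]]:
    """Serve page HTML, answering 304 Not Modified if the browser's copy is current.

    Simplification: only a single strong ETag in If-None-Match is matched.
    """
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    # "no-cache" lets the browser store the page but makes it revalidate first.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        return 304, b"", headers
    return 200, body, headers


html = b"<html>Welcome</html>"
status, _, headers = conditional_get(html, None)            # first visit
status2, body2, _ = conditional_get(html, headers["ETag"])  # browser revalidates
print(status, status2)  # 200 304
print(fingerprinted_name("site.css", b"body{margin:0}"))
```

Because this ETag is computed from the rendered HTML, a 304 saves transmission but not page generation; pairing it with a server-side page cache avoids both.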

Efficient Rendering

This last speed-up concerns how fast the browser can render the completed page once it has downloaded all the objects. We take care in how we add CSS and JavaScript, and in what exactly each of them does, so that any manipulation of the content on the page is efficient and fast. In some cases we go so far as to defer loading complex JavaScript until after the entire page has loaded and been displayed, which makes the response "feel" faster to the end user. Advertising, dynamic or frequently changing content, and various other page components can also be downloaded after the main page has displayed, reducing the wait for the initial page display.
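One way to defer non-critical JavaScript until after the page has loaded is to emit normal `<script>` tags only for critical code and have a tiny inline loader fetch everything else once the browser fires the window `load` event. The Python helper below, `script_tags`, is hypothetical and only generates the HTML; the inline loader is a minimal sketch that relies on `addEventListener`, which very old browsers lack.

```python
import json
from html import escape
from typing import List

# Inline loader: injects each deferred script after the page has finished loading.
LOADER = """<script>
window.addEventListener("load", function () {
  %s.forEach(function (src) {
    var s = document.createElement("script");
    s.src = src;
    document.body.appendChild(s);
  });
});
</script>"""


def script_tags(critical: List[str], deferred: List[str]) -> str:
    """Load critical scripts normally; queue the rest until after window load."""
    tags = [f'<script src="{escape(src)}"></script>' for src in critical]
    if deferred:
        # A JSON array is valid JavaScript; "</" is escaped so a URL
        # cannot close the inline <script> element early.
        urls = json.dumps(deferred).replace("</", "<\\/")
        tags.append(LOADER % urls)
    return "\n".join(tags)


print(script_tags(["/js/menu.js"], ["/js/gallery.js", "/js/ads.js"]))
```

The same loader pattern suits advertising and frequently changing widgets: the main content paints first, and those pieces arrive afterwards without blocking it.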

Conclusion

How much of a difference does all this make? We have implemented many of these changes gradually over several years, but our best estimates show that pages that took 3-6 seconds to load 4-5 years ago now load in roughly 0.3-0.8 seconds, about a factor of 10! With client caching, the times drop to virtually unmeasurable levels. For a different statistic, Google provides an add-on for Firefox called Page Speed (which we recommend enthusiastically) that gives sites a score out of 100 based on how many optimizations they employ. The PawPrint.net home page currently scores 97/100 (and, not to nit-pick, one of the reasons it isn't higher is that we use a single CSS file for the entire site so that it can be cached, and Page Speed complains that there is too much unused CSS on the home page; this is, of course, by design). For comparison, a personal page on Facebook scores 86/100, Microsoft.com 79/100, and CNN.com a disappointing 73/100.
