Gajula Mahesh Kumar: How to Create a Web Server Farm Cache

Introduction

Dynamic database-driven web sites are required if your site has large quantities of data to upload, manage, and serve back to your customers. However, IT departments typically say this creates too big a load for the database -- hence many people keep the files on the hard drive along with their script pages. Not only does this not scale well when you expand from one web server to a web farm, it is hard to back-up, and the file system makes for a poor database -- lacking the transactional qualities of a database.

You can keep your files and images in the database if you take advantage of client-side caching and implement a web server file cache. The benefits include:

All your content in one place: textual, images, and relationships are in the database.
The ability to deploy the files to multiple web servers, without additional replication work.
The ability to backup all the content in a single database backup.
Minimum database reads for files, on low-write / high-read systems.

This article will show a simple way to deploy caching so that you can keep all your content in the database efficiently.

Associated Article

This article is provided in conjunction with another article about streaming uploaded files. The caching article's portion of this web site is in the Cache.aspx file. The code for the upload portion of that web site and the associated file is called UploadBlob.aspx.

Requirements

You need an IIS web server and a SQL server. The SQL tables, and stored procedures are provided in the Upload.sql file. You will also need files to upload. The /images directory of the associated web site has three images that you can use. A small view of each of the three images is below.

The SQL statement for creating the table is:

CREATE TABLE [dbo].[UploadBytesOfBlob]( [FileGuid] [uniqueidentifier] NOT NULL, [Name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Content] [varbinary](max) NULL, [ContentType] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Size] [numeric](18, 0) NULL, [NumberOfChunks] [numeric](18, 0) NULL, [FirstModifiedDate] [datetime] NULL, [LastModifiedDate] [datetime] NULL, CONSTRAINT [PK_UploadBytesOfBlob] PRIMARY KEY CLUSTERED ( [FileGuid] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]

The primary key for the table in the sql database is called FileGuid and is of type UniqueIdentifier. This is a Guid in terms of asp.net.

Compile and Run the Site

Download, compile, and run the site. Make sure to create the SQL table and stored procedures using the Upload.sql file. You will be presented with the Default.aspx file which has a link to the UploadBlob.aspx page. Upload the ascent.jpg file found in the /images directory. After the upload, you will be redirected to the default.aspx and a GridView will show the image information. Do not click on either link just yet. Open a windows explorer and check that the file guid listed in the default.aspx is not found in the /CacheFolder of the web site. The folder should be empty. When the image is requested, the folder will have a file name guid.dat where the actual database guid is the prefix of the filename.

How the Cache System Works

For the rest of this article we will use the following web site idea. Assume we have an editor who uploads images as part of his work. The image is large and must eventually be loaded onto 12 web servers. The image may change from time to time so that the relevance of the image is current. When the image does change, it is important that the latest version of the image is used and not the image from the client-side cache or the web-server cache.

A customer of the web site then requests the page which includes the image. What happens? For the purposes of explaining the system, there will be several requests.

The First Request

On the very first request, the image is in the database but not in any cache. This system checks the client-side cache for the image including the relevant date of the image using the If-Modified-Since request header. After the file is not found there, since it is the first request, the system checks the web server file cache including the relevant date of the image. The file is not found there, so the last step is to grab the image from the database. Since the web server is grabbing the file, the file is placed in the web's cache folder (our own system, not the IIS cache), and returned to the client. As part of the brower's work to handle the image coming down, it is placed in the client-side cache.

The Second Request

The customer comes back a second time to request the file while the file on the client-side cache is still the most current version. The cache system responds with a 304 and the client's system fetches the image locally. This has saved a trip to the database and it has also saved a trip to the web server's file system.

The Content has Changed

Imagine the editor has now uploaded a new version of the image into the database. All twelve web servers need to be updated with this new version. Instead of uploading the file to all twelve servers at the time of the change, the change is made to each web server as the image is requested. This means that the work of moving the file from the database to the web server is only done when necessary and is based on a client-request. This saves the IT department from having to deal with replication of the content across the web servers.

The Third Request

On the next request, the file in the client's local cache is no longer current. The cache system checks the web server cache to see if that version is the most current. If the web server cache folder does have the most current, then that version is returned to the client. If that web server cache folder version is not the most current, the current image is fetched from the database, reloaded into the web server cache by writing the database content to a file, then returned to the client and placed in the client cache.

The URL for the system looks like cache.aspx?FileGuid=XYZ... The FileGuid is used instead of a name or incremental number for the content so that the database can grow to a large size while still managing the content.

The image URL on the page then looks like .

The Sample Web Site

The sample web site has three main files: default.aspx, cache.aspx, and uploadblob.aspx. Download, compile and run the web site. Make sure to create the SQL table and stored procedures using the Upload.sql file. Put a break point at the top of the Page_Load method of the cache.aspx.cs file. This is the area of code this article focuses on.

The first time you load the file, nothing will appear except a link to the upload page on the default.aspx. As you upload images, this page will show a GridView listing those images.

Click on the Upload File link, and you will go to the upload page. Once you upload the ascent.jpg image, you will be redirected to the default.aspx file which will now show a GridView with a row for the ascent.jpg file:

Set a break point in the cache.aspx at the top of the Page_Load function. Now click on the link in the FileGuid column. The breakpoint should activate in Visual Studio.

How the Code Works

The FileGuid of each row is used in the URL to the cache.aspx file. The URL looks like cache.aspx?FileGuid=91d... The FileGuid is not only the primary key for the database but is also the name of the file in the cache folder.

The code from cache.aspx for reading the FileGuid from the URL is:

// DFB: fileGuid is required if (Request["FileGuid"] == String.Empty) throw (new Exception("FileGuid missing")); // DFB: Capture fileGuid Guid fileguid = new Guid(Request["FileGuid"]); // DFB: fileGuid is required if (fileguid == Guid.Empty) throw (new Exception("FileGuid missing"));

Notice that the FileGuid value from the Request is cast to a Guid datatype in aspx. This allows you to deal with the Guid object more easily and convert to the appropriate SQL data type when you need to.

Fetch From the Client Cache

The client's browser cache is checked by comparing the If-Modified-Since request header against the LastModifiedDate from the database. If the client's browser cache has the most recent copy, a respond code of 304 is returned.

The code from cache.aspx for checking the client's cache If-Modified-Since request header is:

// DFB: connect to database using (SqlConnection sqlConnection = new System.Data.SqlClient.SqlConnection(ConfigurationManager.ConnectionStrings["Test"].ConnectionString)) { sqlConnection.Open(); // DFB: Get LastModifiedDate value only. We only fetch more from this row as needed. using (sqlCommand = new SqlCommand("select LastModifiedDate from SingleDataStore where FileGuid='" + fileguid.ToString() + "'", sqlConnection)) { lastModifiedDate = DateTime.Parse(sqlCommand.ExecuteScalar().ToString()); } // DFB: Get Client-side Browser Cache's If-Modified-Since Date if(!String.IsNullOrEmpty(Request.Headers["If-Modified-Since"])) { ifModifiedSince = DateTime.Parse(Request.Headers["If-Modified-Since"], System.Globalization.CultureInfo.InvariantCulture); // DFB: If the client-side browser cache has the most current version, // then return 304 and let the client fetch from its own cache if (lastModifiedDate <= ifModifiedSince) { Response.StatusCode = 304; Response.StatusDescription = "Not Modified"; Response.End(); return; } }

From this point on in the code, the content will either come from the local web server's cache or the database.

The Web Server's File System Cache

In order to reduce reads from the database and propagate the content to all web servers in the farm, the cache.aspx file will store a copy on the local file system of each web server in the farm as the content is requested.

The first time a request for the data is made, the data is served from the database and a file is created on the local web server. Each subsequent request is served from the local web server instead of making the unnecessary request for the content from the database if the LastModifiedDate value in the database hasn't changed.

The location of the cache folder is in the web.config. You will want to put the folder at the root of the website. Make sure the IUSR_servername, IWAM_servername, and the ASPNET local accounts have full permission to the cache folder.

// DFB: Build up cache path with file name where file name is guid.dat // and the guid is the primary key of the table cachePath = String.Format("{0}{1}.dat", ConfigurationManager.AppSettings["CacheFolder"], fileguid.ToString());

The file name will be FileGuid.dat. The file isn't given an extension associated with the type of content because we do not need to associate it to a type of content until it is requested. It is faster to read the database content type (mime type) than to look up the corresponding file extension in the registry and then determine its content type.

When we set the file's last modified date, the date will be the LastModifiedDate found in the database. This allows the code to determine if the file in the cache is out of sync with the database in future requests for the file.

If the directory is not found, it will be created. If the file is found in the directory, the date of the file is compared to the known LastModifiedDate returned from the database in the above code used for checking the client's cache. If the file is not found in the cache folder, the dirty flag is set to true.

// DFB: if the file system cache folder doesn't exist, create it if (!Directory.Exists(ConfigurationManager.AppSettings["CacheFolder"])) { Directory.CreateDirectory(ConfigurationManager.AppSettings["CacheFolder"]); dirty = true; } else { // DFB: Check to see if the file is in the cache if (File.Exists(cachePath)) { // DFB: Check the last write time of the file FileInfo fileInfo = new FileInfo(cachePath); // DFB: If times are different, mark the flag // DFB: marking dirty mean to go get new data out of database lastWriteTime = fileInfo.LastWriteTime; dirty = (lastWriteTime != lastModifiedDate); } else { // DFB: file is not found in cache dirty = true; } }

The dirty flag indicates that the content does need to be fetched from the database and written to the cache folder. It is important to understand that the content is read back from the database one chunk at a time inside a WHILE look. This allows for only the chunk size of the entire file to be in memory at a time.

if (dirty) { // DFB: Cached File Doesn't Exist or has old data so get data. // DFB:Write Database Content to file using (FileStream fileStream = File.Create(cachePath)) { using (BinaryWriter binaryWriter = new BinaryWriter(fileStream)) { Int64 sqlRead = 0; Int64 index = 0; Byte[] buffer = new Byte[1024]; using (sqlCommand = new SqlCommand("select Content from UploadBytesOfBlob where FileGuid='" + fileguid.ToString() + "'", sqlConnection)) { using (SqlDataReader sqlDataReader = sqlCommand.ExecuteReader(CommandBehavior.SequentialAccess)) { if (!sqlDataReader.HasRows) throw (new Exception("Invalid Id")); sqlDataReader.Read(); while ((sqlRead = sqlDataReader.GetBytes(0, index, buffer, 0, buffer.Length)) != 0) { // WWB: Cast Is Fine Size Buffer < int.MaxValue binaryWriter.Write(buffer, 0, (int)sqlRead); index += sqlRead; } } } } } // DFB: Time Stamp The Last Modified Time With // The Last Modified Time of the FileStore Object // So that we can track changes FileInfo fileInfo = new FileInfo(cachePath); fileInfo.LastWriteTime = lastModifiedDate; }

Once the file is in the cache folder, you need to construct the response to the client. This includes cache values, the content type, and the content.

// DFB: Setup To Successfully Cache The Page On the Client Response.Cache.SetLastModified((lastModifiedDate).ToUniversalTime()); Response.Cache.SetCacheability(HttpCacheability.Public); // DFB: ETag determines if data has changed on server // DFB: ETag can be server/programmer specific be should // be generated at the server to indicate data uniqueness // DFB: For this program, I'll has the row's id with the row's // last modified date Response.Cache.SetETag(Hash(Request["FileGuid"].ToString() + lastModifiedDate)); // DFB: Set The Content-Type So the Browser Knows How To Render // The Response using (sqlCommand = new SqlCommand("select ContentType from SingleDataStore where FileGuid='" + fileguid.ToString() + "'", sqlConnection)) { Response.ContentType = sqlCommand.ExecuteScalar().ToString(); } // DFB: Writing The Cached File Allows IIS To Stream The Output Response.WriteFile(cachePath); } }

Subsequent Requests for the Content

After the content is in a file in the cache folder, all subsequent requests will either come from the client's browser cache or the web server's cache folder until the content LastModifiedDate changes on the server. Since both situations check the LastModifiedDate in the database before returning the content, you can be sure that only the most current version is displayed on the client.

When the data is updated in the database, the client cache and the web server cache versions will be checked for date freshness and the dirty flag will be set to true. At this point, the entire content will be re-fetched from the database and the file recreated on the file system.

Clearing the Web Server Cache

Now that the file is in the web server cache with a file name that corresponds to the FileGuid, you can clear out the cache on a schedule of your choosing. You can delete all the contents, all contents before a certain date, or all contents of a certain content type.

Summary

This caching system illustrates how to build a database-driven web site using the single storage of a database while still using the cache features of the client and web server. The caches are checked using the If-Modified-Since of the cache content and the LastModifiedDate of the database file in order to deliver the most current data. Once the database content is modified, the cached file is no longer valid.

Gajula Mahesh Kumar

Wednesday, January 20, 2010

How to Create a Web Server Farm Cache

Labels

Labels