Posts by Adam Franco

More GO Info

Categories: Midd Blogosphere

Since we moved GO to its new home last week, I’ve been busy fixing a number of bugs that have come up, as well as making a few improvements that I hope will be helpful.

Today’s big improvement is that the GOtionary now provides info pages for every shortcut.

The info page tells you who created and who administers the shortcut, so you know whom to contact when a link is broken. The info page also now serves as the landing page when you try to access a broken GO shortcut, rather than presenting you with a blank screen.

Head to the GOtionary to check it out.

GO is moving to a new server

Categories: Midd Blogosphere

Over the past few years the GO shortcut and redirection application has become central to the college’s web infrastructure, allowing easy-to-remember permalinks that can be updated as resources are moved.

Tomorrow morning we will be migrating GO from a multi-use Windows server to its own RedHat server. The primary impetus for this move is to resolve a PHP-on-Windows memory leak bug that has taken out GO for several minutes every few months. In addition to this bug fix, migrating GO to its new environment allows a few additional improvements at this time:

  • GO will be on its own server, better isolated from interference by other applications.
  • GO will now fail over to a secondary database should its primary database become unavailable.
  • Improved user-information caching will dramatically speed up the self-service admin screens.
  • Redirects will now be re-written internally, requiring one fewer round-trip to the GO application for every redirect.
  • go/shortcut should now work more reliably on the MIIS network without having to type the full go.miis.edu/shortcut URL in the address bar.

    Note: the full http://go.middlebury.edu/shortcut or http://go.miis.edu/shortcut URL should still be used when putting links in websites or email.

  • The GOtionary will now live under go.middlebury.edu and go.miis.edu, allowing go.miis.edu to have its own logo.

We do not anticipate that this migration will result in any downtime, as the new GO server and the old GO server will both continue to operate at the same time, against the same database. After we switch the DNS records for go.middlebury.edu and go.miis.edu, users will gradually move over to the new GO server as their computers look up the address of go.middlebury.edu again. For on-campus users this may happen quickly, while for off-campus users it may take several weeks. Once the vast majority of users are accessing the new GO server (likely two weeks or so), we will turn off the old one.

Update 1 – June 23rd

We successfully migrated go.middlebury.edu to the new host and haven’t had any problems. We’ll be waiting for a while for go.miis.edu to switch over.

Website Performance: Pressflow, Varnish, Oh-My!

Categories: Midd Blogosphere

Executive summary:

We’ve migrated from core Drupal-6 to Pressflow, a back-port of Drupal-7 performance features. Using Pressflow allows us to cache anonymous web-requests (about 77% of our traffic) for 5 minutes and return them right from memory. While this vastly improves both the amount of traffic we can handle and the speed of anonymous page-loads, it does mean that anonymous users may not see new versions of content for up to 5 minutes. Traffic for logged-in users will always continue to flow directly through to Drupal/Pressflow and will always be up-to-the-instant fresh.

Read on for more details about what has changed and where we are with regard to website performance.


Background

When we first launched the new Drupal website back in February we went through some growing pains that necessitated code fixes (Round 1 and Round 2) as well as the addition of an extra web-server host and database changes (Round 2).

These improvements brought our site up to acceptable performance levels, but I was concerned that we might run into performance problems if the college ended up in the news and thousands of people suddenly went to view our site.

At DrupalCon a few weeks ago I attended a Drupal Performance Workshop where I learned a number of techniques that can be used to scale Drupal sites to be able to handle internet-scale traffic — not Facebook or Google-level traffic, but that of The Grammys, Economist, or World Bank.

Since before the launch of the new site we were already making use of opcode caching via APC to speed code execution, and were doing data caching with Memcache to reduce the load on the database. This system architecture is far more performant than a baseline setup, but we could still only handle a sustained average of 20 requests each second before the web-host became fully loaded. While this is double our normal average of 10 requests per second, it is not nearly enough headroom to feel safe from traffic spikes.


Request flow through our Drupal web-host prior to May 13th; using normal Drupal page-caching stored in Memcache. Click for full-size.

Switching to Pressflow

Last week we switched from the standard Drupal-6.16 to Pressflow-6.16.77, a version of Drupal 6 that has had a number of the performance-related improvements from Drupal 7 back-ported to it. Code changes in Pressflow, such as dropping legacy PHP4 support and supporting only MySQL, enable Pressflow to execute about 27% faster than Drupal, a useful improvement but not enough to make a huge difference were we to get double or triple our normal traffic.

For us, the most important difference between Pressflow and Drupal-6 is that sessions are ‘lazily’ created. This means that rather than creating a new ‘session’ on the server to hold user-specific information on the first page each user sees on the website, Pressflow only creates the session when the user hits a page (such as the login page) that actually has user-specific data to store. This change makes it easy to differentiate between anonymous requests (no session cookies) and authenticated requests (which have session cookies), and enables the next change: Varnish page caching.
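The change itself lives in Pressflow’s PHP, but the idea is easy to sketch. Here is an illustrative Python sketch; the class and cookie names are made up for the example and are not Pressflow’s actual code:

```python
# Illustrative sketch of 'lazy' session creation (not Pressflow's
# actual PHP code): a session cookie is only set once there is
# user-specific state to store, so purely anonymous requests stay
# cookie-free and therefore cacheable.

class Request:
    def __init__(self, cookies=None):
        self.cookies = dict(cookies or {})
        self.session = None

    def set_session_value(self, key, value):
        # Create the session only at the moment state is stored,
        # e.g. when the user logs in -- not on the first page view.
        if self.session is None:
            self.session = {}
            self.cookies['SESSID'] = 'new-session-id'
        self.session[key] = value

anon = Request()
print('SESSID' not in anon.cookies)   # True: no cookie, so cacheable

user = Request()
user.set_session_value('uid', 42)     # e.g. the login page
print('SESSID' not in user.cookies)   # False: must pass through
```

Because the anonymous request never acquires a cookie, a cache sitting in front of the web server can tell at a glance which requests are safe to serve from cache.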

Varnish Page Caching

Varnish is a reverse-proxy server that runs on our web hosts and can return pages and images from its own in-memory cache so that they don’t have to be generated by Drupal/Pressflow every single time. The default rule in Varnish is that if there are any cookies in the request, then the request is for a particular user and should be transparently passed through to the back-end (Drupal/Pressflow). If there are no cookies in the request, then Varnish treats it as an anonymous request and tries to respond from its cache without bothering the back-end.
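In VCL (Varnish’s configuration language), the default rule described above boils down to something like the following. This is a simplified sketch of the Varnish 2-era default logic, not our production configuration:

```vcl
sub vcl_recv {
    # Requests carrying cookies belong to a specific user: hand them
    # straight to the back-end (Apache/PHP/Drupal) uncached.
    if (req.http.Cookie) {
        return (pass);
    }
    # No cookies: treat the request as anonymous and try the
    # in-memory cache first.
    return (lookup);
}
```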

Request flow through our Drupal/Pressflow web-host after May 13th; using the Varnish proxy-server for caching. Click for full-size.

Request flow through our Drupal/Pressflow web-host after May 13th; using the Varnish proxy-server for caching. Click for full-size.

Since about 77% of our traffic is non-authenticated, Varnish only sends about 30% of the total requests through to Apache/PHP/Drupal: all authenticated requests, plus anonymous requests where the cache hasn’t been refreshed in the past 5 minutes. Were we to have a large spike in anonymous traffic, virtually all of the increase would be served directly from Varnish’s cache, preventing any load increase on Apache/PHP/Drupal or the back-end MySQL database. In my tests against our home page, Varnish was able to easily handle more than 10,000 requests each second, with the limiting factor being network speed rather than Varnish itself.
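For the curious, here is how those percentages fit together, worked out in a few lines of Python using the rough averages from this post:

```python
# Rough model of how Varnish reduces back-end load, using the
# estimates from this post (none of these are exact measurements).
total_rate = 10.0          # average requests/second site-wide
anon_share = 0.77          # ~77% of traffic is anonymous
backend_share = 0.30       # ~30% of requests reach Apache/PHP/Drupal

authenticated = total_rate * (1 - anon_share)   # always passed through
backend_total = total_rate * backend_share      # everything hitting the back-end
anon_misses = backend_total - authenticated     # anonymous cache misses/refreshes
cache_served = total_rate - backend_total       # answered from Varnish's cache

print(round(authenticated, 2))   # 2.3 req/s of authenticated traffic
print(round(anon_misses, 2))     # 0.7 req/s of anonymous cache misses
print(round(cache_served, 1))    # 7.0 req/s served straight from memory
```

In other words, at normal traffic levels roughly 7 of every 10 requests never touch Drupal at all, and a spike in anonymous traffic only grows that cached share.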

A histogram of requests to the website. Y-axis is the number of requests, X-axis is the time to return requests, '|' requests were handled by Varnish's cache and '#' were passed through to Drupal. The majority of our requests are being handled quickly by Varnish while a smaller portion are being passed-through to Drupal.

MySQL Improvements

During the scheduled downtime this past Sunday, Mark updated our MySQL server and installed the InnoDB Plugin from Innobase, a high-performance storage engine for MySQL that can provide twice the performance of MySQL’s built-in InnoDB engine for the types of queries done by Drupal.

Last week Mark and I also went through our database configuration and verified that the important parameters were tuned correctly.

As the MySQL database is not currently the bottleneck that limits our site performance, these improvements will likely have a minor (though widespread) effect. Were our authenticated traffic to increase further (due to more people editing, for instance), these improvements would become more important.

Where We Are Now

At this point the website should be able to handle at least 20,000 requests/second of anonymous users (10,000 on each of two web-hosts) at the same time that it is handling up to 40 requests/second from authenticated users (20 on each of two web-hosts).

While it is impossible to accurately translate these request rates into the number of users we can support visiting the site, a very rough estimation would be to divide the number of requests/second by 10 (a guess at the average number of requests needed for each page view) to get a number of page-views that can be handled each second. (1)
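Worked out with the rough numbers above (both figures are the loose estimates from this post, not measurements):

```python
# Back-of-the-envelope capacity estimate: divide requests/second
# by a rough guess of ~10 requests per page view.
requests_per_page = 10

anon_capacity = 20000   # cached requests/second across both web-hosts
auth_capacity = 40      # uncached requests/second across both web-hosts

print(anon_capacity // requests_per_page)   # 2000 anonymous page views/second
print(auth_capacity // requests_per_page)   # 4 authenticated page views/second
```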

In addition to how many requests can be handled, how fast requests are returned is also important. Our current response times for un-cached pages usually fall between 0.5 seconds and 2 seconds. If pages take much longer than 2 seconds, the site can “feel slow”. For anonymous pages cached in Varnish, response times range from 0.001 seconds to 0.07 seconds, much faster than Apache/Drupal can manage and more than fast enough for anything we need.

The last performance metric that we are concerned with is the time it takes for a page to be usable by the viewer. Even if the browser receives all of the files for a page in only 0.02 seconds, it may still take several seconds to parse those files, execute JavaScript code, and turn them into a displayable page. Due to these factors, my testing has shown that most pages on our site take between 1 and 3 seconds to feel loaded. For authenticated users, this stretches to 2-4 seconds.

Finally, please be aware that anonymous users see pages that may be cached for up to 5 minutes. While this is fine for the vast majority of our content, there are a few cases where we may need the content shown to be up-to-the-second fresh. We will address these few special cases over the coming months.

Future Performance Directions

Now that we have our caching system in place our system architecture is relatively complete for our current performance needs. While we may do a bit of tuning on various server parameters, our focus now shifts to PHP and Javascript code optimization to further improve server-side and client-side performance respectively.

One big drag on JavaScript performance (and hence perceived load-time) is that we currently have to include two separate versions of the jQuery JavaScript library because different parts of the site rely on different versions. Phasing out the older version will reduce by almost half the amount of code that the browser has to parse.

Additional Notes

(1) As people browse the site, their browser needs to load the main HTML page as well as make separate requests for JavaScript files, style-sheet (CSS) files, and every image. After these have been loaded the first time, [most] browsers will cache these files locally and only request them again after 5 minutes or if the user clears their browser cache. CSS files and images that haven’t been seen before will need to be loaded as new pages are visited. For example, the first time someone loads the Athletics page, it requires about 40 requests to the server for a variety of files. A subsequent click on the Arts page would require an additional 13 requests, while a click back to the Athletics page would require only 1 additional request, as the images would still be cached in the browser.

Introducing: The Identity Management Project

Categories: Midd Blogosphere

The Identity Management Project kicked off in December of 2009. The current project team (small ‘t’) is Tom Cutter, Adam Franco, Mike Lynch, Chris Norris, Carol Peddie, Mark Pyfrom, Jeff Rehbach, Mike Roy, and Marcy Smith.

The Identity Management (IDM) project seeks to organize our concept of a “person” or “identity” among our various systems (including Banner, the Active Directory, web-applications, hosted systems, and others). This project focuses on three facets of each identity:

Unique identifier:
Every identity will have a unique identifier. Currently, only people in Banner have one of its identifiers (guests and vendor-staff aren’t in Banner) and only people in AD have log-in names (alumni, parents, and others aren’t in the AD).
Unified Properties:
Each identity will have a set of properties (name, email, address, title, department, etc) that is consistent and available to all of our applications. Currently user properties may be different or unavailable depending on which source of user information is used; a person’s title is a good example of this inconsistency.
Roles:
Identities will gain zero or more “roles” that can be used to grant or deny access to our systems and services. We currently have no consistent way (in AD or web applications) of determining if a person is a current student, faculty, staff, or other role — the best we can do now is to look at membership in certain mailing lists like “All_Faculty”. With the IDM project, we will be able to access an authoritative list of the current roles for a person (visitors would have no roles) and will be able to ensure that access to services properly matches an individual’s relationship to the college.

In addition to organizing and improving the properties and roles of our current set of users (current students, faculty, staff, emeriti, vendors, spouses, and limited guests), the IDM project will also enable us to expand the number of usable (authenticate-able) accounts to include alumni, prospective students, and visitors. As well, we gain the potential to include users from other institutions via federated authentication systems such as Shibboleth.

Here is a list of a few things that will become possible with the completion of the IDM project:

  • Rather than accounts being immediately deleted upon graduation, they would instead lose the “student” role and gain the “alumnus” role. These users would continue to use their same log-in credentials to access alumni-only and public resources (i.e. commenting on blogs, renewing library books), but would lose access to student-only resources (i.e. course websites, JSTOR and other subscription library materials).
  • We will be able to grant access (individually or in groups) to many of our online systems for guests, alumni, emeriti, visitors, vendors, prospective students, and others with loose affiliations with the college.
  • Inter-institutional projects will be able to make use of any of our online systems as collaboration platforms.
  • A fan of Middlebury Hockey could create a visitor account to use for purchasing panther gear from the college book store, then come back and log in with the same account to purchase tickets from the box office, make comments on the coach’s blog, and fill out a form to sign up their kids for participation in the Winter Carnival ice show. Their name, email, mailing address, and other properties would be available to all of the systems.

Please note that some of these examples will require additional changes and development projects beyond the IDM project itself. However, all require aspects of the IDM project to be possible.

Website Improvements #3: Better Performance

Categories: Midd Blogosphere

During the week since the new website launched you may have noticed slow page-load times, especially when logged in and saving edits. For the past week the Web Application Development team and our colleagues in Central Systems & Network Services have been working to improve the performance of the site and prevent heavy load from overwhelming the servers and causing intermittent outages. We have made several fixes over the past few days that bring us out of the slow-site woods and into sunny pastures of snappy responses.

The first big change was documented by Ian in Website Improvement #1: Reducing home page load time by 80%.

The second big change this week was a fix to prevent Google and other search engines from crawling a particularly slow editing page. Repeated hits to this page were overwhelming one of our web-servers and slowing down requests for everyone.

The big change today was to move the databases for other web applications off of the database-server used by Drupal. This change has drastically improved our query-cache hit-rate and has been the main factor in speeding up saves and other editing operations.

Travis has reworked how the Athletics-roster images are fetched from their database, improving image-load times from 11 seconds to 12 milliseconds. The Athletics page loads much faster now.

The last performance improvement this week came from a fix to the access-denied page. This fix prevents browsers from periodically falling into a never-ending loop of redirects. Preventing this redirect loop gives a better user experience when trying to access a restricted page, and leaves more server power available for handling pages that will load successfully.

At this point authenticated users should be experiencing page-load times between 300 milliseconds and 5 seconds for almost all view and edit operations (down from a range of 2-25 seconds). Unauthenticated users should be experiencing page-load times between 20 milliseconds and 3 seconds for all page views. We plan to improve performance even further in the coming weeks, but our hope is that page speed is no longer a major impediment to performing needed tasks.

Thank you to our whole community for your patience while we worked through these growing pains.

Website Improvements #2: Custom Redirects

Categories: Midd Blogosphere

Our GO service has been, and will continue to be, our supported way of maintaining permalinks to resources. By publishing GO links to resources online and in print, you are able to move your resources to new homes (such as a different location in the new site, a blog, or a wiki) and update the GO link with the self-service GO management screens.

During web-makeover project planning it was decided that we needed to move forward with a new site architecture (where everything lives) and drop support for the old URLs from previous versions of the site, which are 3-15+ years old. Most of the time links can and should be updated at their original locations, but if that is impossible (such as in a print mailing), you can now ensure that the correct link shows up on the main site’s 404 page.

[Screenshot: the 404 page with the suggested GO link annotated]

Steps to add a link for a 404 page on the main site:

  1. Create a nice GO shortcut to the new destination if one doesn’t exist.
    Go to the GOtrol Panel and create a new go shortcut to the new destination URL.
    If a go shortcut for this destination already exists, then you can skip this step.
  2. In the GOtrol Panel, click on the ‘Create’ tab and add an alias for your shortcut from step one. The important thing here is that the alias ‘name’ is the path portion of the URL that is hitting the 404 page after the initial ‘/’.

    For example, if this URL is getting a 404 page:
    http://www.middlebury.edu/area/department/someimportantpage/default.htm
    then the alias name should be:
    area/department/someimportantpage/default.htm

    [Screenshot: the GO admin alias-creation form]

  3. Go back to the 404 page and verify that it now includes the GO link to your resource.
    [Screenshot: the 404 page showing the GO link]
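If it helps to see the alias ‘name’ rule from step 2 spelled out: it is simply the URL’s path with the leading ‘/’ removed. Here is an illustrative Python sketch (not part of GO itself):

```python
# Derive a GO alias 'name' from a URL that is hitting the 404 page:
# it is the path portion of the URL after the initial '/'.
from urllib.parse import urlparse

def alias_name(url):
    return urlparse(url).path.lstrip('/')

print(alias_name(
    'http://www.middlebury.edu/area/department/someimportantpage/default.htm'))
# area/department/someimportantpage/default.htm
```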

We still recommend that you update the pages that link to the site to use their new URLs or GO links, but if that is impossible, you now have a work-around to direct users to the appropriate place.