<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Library &#38; Information Services &#187; MySQL</title>
	<atom:link href="http://sites.middlebury.edu/lis/tag/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://sites.middlebury.edu/lis</link>
	<description>We Bring Knowledge to You</description>
	<lastBuildDate>Fri, 24 May 2013 14:57:22 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Website Performance: Pressflow, Varnish, Oh-My!</title>
		<link>http://sites.middlebury.edu/lis/2010/05/17/website-performance-pressflow-varnish-oh-my/</link>
		<comments>http://sites.middlebury.edu/lis/2010/05/17/website-performance-pressflow-varnish-oh-my/#comments</comments>
		<pubDate>Mon, 17 May 2010 18:33:00 +0000</pubDate>
		<dc:creator>Adam Franco</dc:creator>
				<category><![CDATA[LIS Staff Interest]]></category>
		<category><![CDATA[Central Systems & Network Services]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[Middlebury]]></category>
		<category><![CDATA[MIIS]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[User Services]]></category>
		<category><![CDATA[Web Application Development]]></category>
		<category><![CDATA[website]]></category>

		<guid isPermaLink="false">http://sites.middlebury.edu/lis/?p=23170</guid>
		<description><![CDATA[Executive summary: We&#8217;ve migrated from core Drupal-6 to Pressflow, a back-port of Drupal-7 performance features. Using Pressflow allows us to cache anonymous web-requests (about 77% of our traffic) for 5-minutes and return them right from memory. While this vastly improves &#8230; <a href="http://sites.middlebury.edu/lis/2010/05/17/website-performance-pressflow-varnish-oh-my/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<h4>Executive summary:</h4>
<p>We&#8217;ve migrated from core Drupal-6 to Pressflow, a back-port of Drupal-7 performance features. Using Pressflow allows us to cache anonymous web-requests (about 77% of our traffic) for 5 minutes and return them right from memory. While this vastly improves the amount of traffic we can handle as well as the speed of anonymous page-loads, it does mean that anonymous users may not see new versions of content for up to 5 minutes. Traffic for logged-in users will always continue to flow directly through to Drupal/Pressflow and will always be up-to-the-instant-fresh. </p>
<p>Read on for more details about what has changed and where we stand with regard to website performance.</p>
<p><span id="more-23170"></span></p>
<hr />
<h4>Background</h4>
<p>When we first launched the new Drupal website back in February we went through some growing pains that necessitated code fixes (<a href="http://sites.middlebury.edu/lis/2010/02/08/website-improvements-1/">Round 1</a> and <a href="http://sites.middlebury.edu/lis/2010/02/12/website-improvements-3-better-performance/">Round 2</a>) as well as the addition of an extra web-server host and database changes (<a href="http://sites.middlebury.edu/lis/2010/02/18/website-improvements-4-previews/">Round 2</a>). </p>
<p>These improvements brought our site up to acceptable performance levels, but I was concerned that we might run into performance problems if the college ended up <a href="http://www.nytimes.com/2007/02/21/education/21wikipedia.html">in the news</a> and thousands of people suddenly went to view our site. </p>
<p>At DrupalCon a few weeks ago <a href="http://sites.middlebury.edu/lis/2010/04/21/drupalcon-2010-day0-performance/">I attended a Drupal Performance Workshop</a> where I learned a number of techniques that can be used to scale Drupal sites to be able to handle internet-scale traffic &#8212; not Facebook or Google-level traffic, but that of <a href="https://wiki.fourkitchens.com/display/PF/Who+uses+Pressflow">The Grammys, Economist, or World Bank</a>. </p>
<p>Since before the launch of the new site we were already making use of <a href="http://en.wikipedia.org/wiki/PHP_accelerator">opcode-caching via APC</a> to speed code execution and were doing data caching with <a href="http://memcached.org/">Memcache</a> to reduce the load on the database. This system-architecture is far more performant than a baseline setup, but we still could only handle a sustained average of 20 requests each second before the web-host started becoming fully loaded. While this is double our normal average of 10 requests per second, it is not nearly enough headroom to feel safe from traffic spikes.</p>
<div id="attachment_23209" class="wp-caption aligncenter" style="width: 610px"><a href="http://sites.middlebury.edu/lis/files/2010/05/Page-Caching-Drupal-Memcache.png"><img src="http://sites.middlebury.edu/lis/files/2010/05/Page-Caching-Drupal-Memcache.png" alt="Diagram of the execution flow through the web-host using normal Drupal page caching." title="Page Caching - Drupal and Memcache" width="600" class="size-full wp-image-23209" /></a><p class="wp-caption-text">Request flow through our Drupal web-host prior to May 13th; using normal Drupal page-caching stored in Memcache. Click for full-size.</p></div>
<h4>Switching to Pressflow</h4>
<p>Last week we switched from the standard Drupal-6.16 to <a href="http://pressflow.org/">Pressflow-6.16.77</a>, a version of Drupal 6 that has had a number of the performance-related improvements from Drupal-7 back-ported to it. Code changes in Pressflow such as dropping legacy PHP4 support and using only MySQL enable Pressflow to execute about 27% faster than Drupal, a useful improvement but not enough to make a huge difference were we to get double or triple our normal traffic. </p>
<p>For us, the most important difference between Pressflow and Drupal-6 is that sessions are &#8216;lazily&#8217; created. This means that rather than creating a new &#8216;session&#8217; on the server to hold user-specific information on the first page each user sees on the website, Pressflow instead only creates the session when the user hits a page (such as the login page) that actually has user-specific data to store. This change makes it very easy to differentiate between anonymous requests (no session cookies) and authenticated requests (that have session cookies) and enables the next change, Varnish page caching.</p>
<h4>Varnish Page Caching</h4>
<p>Varnish is a <a href="http://en.wikipedia.org/wiki/Reverse_proxy">reverse-proxy server</a> that runs on our web hosts and can return pages and images from its own in-memory cache so that they don&#8217;t have to execute in Drupal/Pressflow every single time. The default rule in Varnish is that if there are any cookies in the request, then the request is for a particular user and should be transparently passed through to the back-end (Drupal/Pressflow). If there are no cookies in the request, then Varnish assumes correctly that it is an anonymous  request and tries to respond from its cache without bothering the back-end.</p>
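<p>The cookie rule and the 5-minute lifetime can be sketched in a few lines of Python. This is only a toy model of the decision described above, not Varnish&#8217;s actual implementation or configuration; the <code>PageCache</code> class, its method names, and the injectable clock are all hypothetical names chosen for illustration.</p>

```python
import time

CACHE_TTL_SECONDS = 5 * 60  # the five-minute lifetime described in the post


class PageCache:
    """Toy stand-in for a reverse-proxy cache; not Varnish's real logic."""

    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend   # hypothetical callable: url -> page body
        self.clock = clock       # injectable so that expiry can be simulated
        self.store = {}          # url -> (expiry time, body)
        self.backend_hits = 0

    def _fetch(self, url):
        self.backend_hits += 1
        return self.backend(url)

    def handle(self, url, cookies):
        # Any cookie marks the request as user-specific: pass it through.
        if cookies:
            return self._fetch(url)
        now = self.clock()
        entry = self.store.get(url)
        # A cached copy is fresh while its expiry time is still in the future.
        if entry is not None and entry[0] > now:
            return entry[1]
        body = self._fetch(url)
        self.store[url] = (now + CACHE_TTL_SECONDS, body)
        return body
```

<p>With this model, repeated anonymous requests for the same URL reach the back-end at most once per 5-minute window, while every request that carries a cookie goes straight through.</p>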
<div id="attachment_23218" class="wp-caption aligncenter" style="width: 610px"><a href="http://sites.middlebury.edu/lis/files/2010/05/Page-Caching-Varnish.png"><img src="http://sites.middlebury.edu/lis/files/2010/05/Page-Caching-Varnish.png" alt="Request flow through our Drupal/Pressflow web-host after May 13th; using the Varnish proxy-server for caching. Click for full-size." title="Page Caching - Varnish" width="600" class="size-full wp-image-23218" /></a><p class="wp-caption-text">Request flow through our Drupal/Pressflow web-host after May 13th; using the Varnish proxy-server for caching. Click for full-size.</p></div>
<p>Since about 77% of our traffic is non-authenticated traffic, Varnish only sends about 30% of the total requests through to Apache/PHP/Drupal: all authenticated requests and anonymous requests where the cache hasn&#8217;t been refreshed in the past 5 minutes. Were we to have a large spike in anonymous traffic, virtually all of this increase would be served directly from Varnish&#8217;s cache, preventing any load-increase on Apache/PHP/Drupal or the back-end MySQL database. In my tests against our home-page, Varnish was able to easily handle more than 10,000 requests each second, with the limiting factor being network speed rather than Varnish.</p>
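<p>The relationship between the 77% and 30% figures can be checked with a little arithmetic. The sketch below assumes that every authenticated request (the remaining 23%) reaches the back-end and that the rest of the back-end traffic is anonymous cache misses; the function names are made up for this illustration, and the resulting miss rate (roughly 9%) is derived from those two rounded figures rather than measured.</p>

```python
def backend_share(anonymous_share, anonymous_miss_rate):
    """Fraction of all requests that reach Apache/PHP/Drupal."""
    authenticated_share = 1.0 - anonymous_share
    return authenticated_share + anonymous_share * anonymous_miss_rate


def implied_miss_rate(anonymous_share, observed_backend_share):
    """Solve backend_share() for the anonymous cache-miss rate."""
    authenticated_share = 1.0 - anonymous_share
    return (observed_backend_share - authenticated_share) / anonymous_share


# Figures quoted in the post: 77% anonymous traffic, about 30% passed through.
miss = implied_miss_rate(0.77, 0.30)
print(f"implied anonymous miss rate: {miss:.1%}")
```

<p>Because both inputs are approximate, the result should be read as an order-of-magnitude estimate only.</p>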
<div id="attachment_23249" class="wp-caption aligncenter" style="width: 610px"><a href="http://sites.middlebury.edu/lis/files/2010/05/Varnish-histogram.png"><img src="http://sites.middlebury.edu/lis/files/2010/05/Varnish-histogram.png" alt="A histogram of requests to the website. Y-axis is the number of requests, X-axis is the time to return requests, &#39;|&#39; requests were handled by Varnish&#39;s cache and &#39;#&#39; were passed through to Drupal. The majority of our requests are being handled quickly by Varnish while a smaller portion are being passed-through to Drupal." title="Varnish-histogram" width="600" class="size-full wp-image-23249" /></a><p class="wp-caption-text">A histogram of requests to the website. Y-axis is the number of requests, X-axis is the time to return requests, '|' requests were handled by Varnish's cache and '#' were passed through to Drupal. The majority of our requests are being handled quickly by Varnish while a smaller portion are being passed-through to Drupal.</p></div>
<h4>MySQL Improvements</h4>
<p>During the scheduled downtime this past Sunday, Mark updated our MySQL server and installed the <a href="http://www.innodb.com/products/innodb_plugin/">InnoBase InnoDB Plugin</a>, a high-performance storage engine for MySQL that can provide twice the performance of the built-in InnoDB engine in MySQL for the types of queries done by Drupal.</p>
<p>Last week Mark and I also went through our database configuration and verified that the important parameters were tuned correctly.</p>
<p>As the MySQL database is not currently the bottleneck that limits our site performance, these improvements will likely have a minor (though wide-spread) effect. Were our authenticated traffic to further increase (due to more people editing, for instance), these improvements would become more important.</p>
<h4>Where We Are Now</h4>
<p>At this point the website should be able to handle at least 20,000 requests/second of anonymous users (10,000 on each of two web-hosts) at the same time that it is handling up to 40 requests/second from authenticated users (20 on each of two web-hosts). </p>
<p>While it is impossible to accurately translate these request rates into the number of users we can support visiting the site, a very rough estimation would be to divide the number of requests/second by 10 (a guess at the average number of requests needed for each page view) to get a number of page-views that can be handled each second. <a href='#note1'>(1)</a></p>
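<p>Applying that rule of thumb to the figures above gives a rough sense of scale. This is only the back-of-the-envelope division described in the previous paragraph; the factor of 10 requests per page view is the guess already stated, and the function and constant names are made up for the example.</p>

```python
REQUESTS_PER_PAGE_VIEW = 10  # the post's own rough guess, not a measurement
WEB_HOSTS = 2


def page_views_per_second(requests_per_second,
                          requests_per_page=REQUESTS_PER_PAGE_VIEW):
    """Very rough conversion from request rate to page-view rate."""
    return requests_per_second / requests_per_page


# Per-host capacities quoted in the post, summed over both web-hosts.
anonymous = page_views_per_second(10_000 * WEB_HOSTS)
authenticated = page_views_per_second(20 * WEB_HOSTS)
print(anonymous, authenticated)
```

<p>Under these assumptions that works out to roughly 2,000 anonymous and 4 authenticated page-views per second.</p>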
<p>In addition to how many requests can be handled, how fast the requests are returned is also important. Our current response times for un-cached pages usually fall between 0.5 seconds and 2 seconds. If pages take much longer than 2 seconds, the site can &#8220;feel slow&#8221;. For anonymous pages cached in Varnish, response times range from 0.001 seconds to 0.07 seconds, much faster than Apache/Drupal can do and more than fast enough for anything we need.</p>
<p>The last performance metric that we are concerned with is the time it takes for the page to be usable by the viewer. Even if they receive all of the files for a page in only 0.02 seconds, it may still take their browser several seconds to parse these files, execute javascript code, and turn them into a displayable graphic. Due to these factors, my testing has shown that most pages on our site take between 1 and 3 seconds before users feel that they are loaded. For authenticated users, this stretches to 2-4 seconds.</p>
<p>Finally, please be aware that anonymous users see pages that may be cached for up to 5 minutes. While this is fine for the vast majority of our content, there are a few cases where we may need to have the content shown be up-to-the-second fresh. We will address these few special cases over the coming months.</p>
<h4>Future Performance Directions</h4>
<p>Now that we have our caching system in place our system architecture is relatively complete for our current performance needs. While we may do a bit of tuning on various server parameters, our focus now shifts to PHP and Javascript code optimization to further improve server-side and client-side performance respectively. </p>
<p>One big impact on javascript performance (and hence perceived load-time) is that we currently have to include two separate versions of the <a href="http://jquery.com/">jQuery Javascript Library</a> due to different parts of the site relying on different versions. Phasing out the older version will reduce almost by half the amount of code that the browser has to parse.</p>
<h4>Additional Notes</h4>
<p><a name="note1"></a><strong>(1)</strong> As people browse the site their browser needs to load the main HTML page as well as make separate requests for Javascript files, style-sheet (CSS) files, and every image. After these have been loaded the first time, [most] browsers will cache these files locally and only request them again after 5 minutes or if the user clears their browser cache. CSS files and images that haven&#8217;t been seen before will need to be loaded as new pages are browsed to.  For example, the first time someone loads the <a href="http://www.middlebury.edu/athletics/">Athletics</a> page, it requires about 40 requests to the server for a variety of files. A subsequent click on the <a href="http://www.middlebury.edu/arts/">Arts</a> page would require an additional 13 requests, while a click back to the <a href="http://www.middlebury.edu/athletics/">Athletics</a> page would require only 1 additional request as the images would still be cached in the browser. </p>
]]></content:encoded>
			<wfw:commentRss>http://sites.middlebury.edu/lis/2010/05/17/website-performance-pressflow-varnish-oh-my/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DrupalCon 2010 Trip Report &#8211; Day 2</title>
		<link>http://sites.middlebury.edu/lis/2010/04/21/drupalcon2010-day2/</link>
		<comments>http://sites.middlebury.edu/lis/2010/04/21/drupalcon2010-day2/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 09:23:57 +0000</pubDate>
		<dc:creator>Adam Franco</dc:creator>
				<category><![CDATA[LIS Staff Interest]]></category>
		<category><![CDATA[Areas and Workgroups]]></category>
		<category><![CDATA[Conference Reports]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[Web Application Development]]></category>

		<guid isPermaLink="false">http://sites.middlebury.edu/lis/?p=22867</guid>
		<description><![CDATA[Here is an overview and some notes from day 2 of the DrupalCon conference that Ian and I are attending in San Francisco. As Ian mentioned in yesterday&#8217;s report, day 1 of DrupalCon was mostly focused on the future of &#8230; <a href="http://sites.middlebury.edu/lis/2010/04/21/drupalcon2010-day2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Here is an overview and some notes from day 2 of the DrupalCon conference that Ian and I are attending in San Francisco. As <a href="http://sites.middlebury.edu/lis/2010/04/20/drupalcon-2010-trip-report-day-1/">Ian mentioned in yesterday&#8217;s report</a>, day 1 of DrupalCon was mostly focused on the future of Drupal, specifically on the changes and improvements in the upcoming Drupal 7. Today&#8217;s sessions dealt much more with the current Drupal release, as well as with version-neutral topics.</p>
<p>Read on for more on the following topics:</p>
<ul>
<li>Drupal deployment strategies</li>
<li>The Chaos tools for Drupal module development</li>
<li>Drupal in Education</li>
<li>Searching with Apache Solr</li>
<li>Recent MySQL happenings</li>
</ul>
<p><span id="more-22867"></span></p>
<h3>Drupal deployment strategies</h3>
<p>The session <a href="http://sf2010.drupal.org/conference/sessions/dont-touch-server-toolkit-zero-touch-production-environments">&#8220;Don&#8217;t Touch that Server&#8221;: A toolkit for zero-touch production environments</a> focused on ways of deploying Drupal servers that don&#8217;t require SSHing into each machine and running updates. While this is hugely useful when managing 10-100s of servers, most of the techniques aren&#8217;t worth the effort for our little cluster of 4 webservers.</p>
<p>I did learn about a few neat tools and techniques that will likely be useful to improve the type of work that we do:</p>
<ul>
<li><a href="http://www.splunk.com/">Splunk</a> &#8212; A tool for aggregating, monitoring and viewing server logs collected from many systems. This could make it much easier to see trends in the access and error logs of the three main-site web servers.</li>
<li>Syntax checking with <a href="http://www.icosaedro.it/phplint/">PHPLint</a> &#8212; <a href="http://luhman.org/blog/2010/02/12/cheap-php-lint-checking-git">Pre-commit hooks</a> can be set up in our source-control systems just to make sure that we never push a typo to production.</li>
</ul>
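<p>As a rough idea of what such a pre-commit check could look like, here is a minimal Python sketch. Note that it swaps in PHP&#8217;s built-in <code>php -l</code> syntax check rather than PHPLint, and that the list of Drupal file extensions and all function names are assumptions made for this example. It also checks the working-tree copy of each file, which can differ from what is actually staged.</p>

```python
import subprocess
import sys

# Assumed set of extensions that hold PHP code in a Drupal code base.
PHP_EXTENSIONS = (".php", ".module", ".inc", ".install")


def php_files(paths):
    """Keep only the paths that look like PHP source files."""
    return [p for p in paths if p.endswith(PHP_EXTENSIONS)]


def staged_files():
    """Ask git for files added, copied, or modified in the pending commit."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def lint(path):
    """Return True when `php -l` reports no syntax errors for the file."""
    return subprocess.run(["php", "-l", path], capture_output=True).returncode == 0


def main():
    """Return 1 (blocking the commit) if any staged PHP file fails to parse."""
    failures = [p for p in php_files(staged_files()) if not lint(p)]
    for path in failures:
        print("PHP syntax error in " + path, file=sys.stderr)
    return 1 if failures else 0
```

<p>To use it as a hook, the script would be saved as an executable <code>.git/hooks/pre-commit</code> file that ends by calling <code>sys.exit(main())</code>; git aborts the commit when the hook exits with a non-zero status.</p>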
<h3>The Chaos tools for Drupal module development</h3>
<p>The session <a href="http://sf2010.drupal.org/conference/sessions/leveraging-chaos-tool-suite-module-development">Leveraging the Chaos tool suite for module development</a> discussed the variety of abilities included in the CTools module that make it much easier to build a variety of dynamic interfaces in Drupal.</p>
<p><strong><a href="http://zroger.com/node/30">Ajax without Javascript</a></strong><br />
AJAX Responder allows passing commands back and forth between JS and PHP.<br />
Drupal7 AJAX framework based off of this CTools implementation.</p>
<p><strong><a href="http://zroger.com/node/31">Ajax modal windows, the easy way</a></strong><br />
Modal Dialogs &#8211; built on top of the AJAX responder. Allows building modal windows with forms all with just a bit of PHP. Handles form validation and submission.</p>
<p>Object cache &#8211; non-volatile cache useful for &#8216;unsaved states&#8217; during multi-step forms.</p>
<p>Form Wizard &#8211; makes it much easier to create multi-step forms. Conceptually, it&#8217;s a workflow of separate single-page forms. Gives you back, finish, cancel, save buttons and controlling widgets and code.</p>
<p>CSS Tools &#8211; disassemble, reassemble, filter by properties/values, reassemble/render-css, compress CSS.</p>
<p>Dependent Fields &#8211; add two additional properties to form fields and the form will be dynamically changed based on choices.</p>
<p>Drop-down links &#8211; basically a single theme function, theme(&#8216;ctools_dropdown&#8217;, &#8230;), to create drop-down js menus like the contextual options  menus in D7 (panels &#8216;cogs&#8217;).</p>
<h3>Drupal in Education</h3>
<p>Before lunch Ian and I both went to a &#8220;Birds of a Feather&#8221; discussion on Drupal usage at Colleges and Universities. I split off with a sub-group to discuss the potential of Drupal as an LMS platform. To kick off efforts in this area, we formed <a href="http://groups.drupal.org/lms-learning-management-system">a new LMS group at groups.drupal.org</a> to discuss what features are needed in Drupal for it to replace Blackboard, Moodle, Sakai, and other LMS systems.</p>
<p>In conversations with Amherst developers over lunch we were reminded that their Ed-Tech group has already built a Gradebook and a Quiz module for Drupal. While these modules are currently tied somewhat to Amherst&#8217;s ERP system (Datatel), with some work they could likely be generalized to work with our Drupal installation as well as those at other schools.</p>
<p>There is another <a href="http://drupal.org/project/quiz">Quiz Module</a> available for Drupal as well.</p>
<h3>Keynote: Tim O&#8217;Reilly</h3>
<p>The keynote today was a talk by Tim O&#8217;Reilly called <a href="http://sf2010.drupal.org/conference/sessions/open-source-cloud-era">Open Source in the Cloud Era</a>. This was a nice talk, but not earth-shattering if you&#8217;ve heard Tim speak at his Web 2.0 conference or elsewhere. Good stuff, familiar theme.</p>
<h3>Searching with Apache Solr</h3>
<p>Ian and I both attended the <a href="http://sf2010.drupal.org/conference/sessions/apache-solr-search-mastery">Apache Solr Search Mastery</a> session. We recently set up a test instance of the Apache Solr search engine and Ian tried to get it operational for doing faceted searching of custom content types on our site. Unfortunately the documentation on how to do this is scattered all over the internet and an operational system wasn&#8217;t created. This session answered all of our questions and should allow us to proceed with setting up a faceted search system as well as other custom search abilities (like section-scoped search) in the future.</p>
<p>Blog posts from the presenters:<br />
<a href="http://acquia.com/blog/advanced-apache-solr-example-ip-based-access">http://acquia.com/blog/advanced-apache-solr-example-ip-based-access</a><br />
<a href="http://evolvingweb.ca/story/apache-solr-mastery-how-add-custom-search-paths-hookmenu">http://evolvingweb.ca/story/apache-solr-mastery-how-add-custom-search-paths-hookmenu</a><br />
<a href="http://acquia.com/blog/understanding-apachesolr-cck-api">http://acquia.com/blog/understanding-apachesolr-cck-api</a></p>
<h4>Notes</h4>
<p><strong>Fixed Fields:</strong><br />
Use the site &amp; hash fields to enable using a single search index for multiple sites.<br />
String type is for exact-matched strings like taxonomy terms rather than partial-matched text.</p>
<p><strong>Dynamic Fields:</strong><br />
Allows you to avoid customizing the schema for custom content fields. These are set up by having a wild-card field for each data-type used in CCK.<br />
<em>CopyFields</em> for strings allow sorting on string fields.<br />
<em>NodeAccess</em> dynamic field allows restricting results based on permissions for most common node-access modules. Not sure if this will work with MM.</p>
<p><strong>APIs:</strong><br />
<code>hook_apachesolr_update_index</code>: Allows adding extra data (such as thumbnail image URLs) to the search index. </p>
<p><code>hook_apachesolr_node_exclude</code>: Allows custom logic for excluding nodes from search results.</p>
<p><strong>Custom search paths:</strong><br />
Use <code>hook_menu</code> to build up the nice search paths.<br />
Use <code>hook_menu_alter</code> to change the layout of the search page.</p>
<p>Theme search results with custom theme functions.<br />
<em>Note: solr doesn&#8217;t do any security filtering of results.</em></p>
<p><strong>Indexing CCK field info</strong><br />
<a href="http://acquia.com/blog/understanding-apachesolr-cck-api">http://acquia.com/blog/understanding-apachesolr-cck-api</a><br />
6.1 branch &#8212; Fields captured by default: strings in select/options fields<br />
6.2 branch &#8212; Adds date fields.</p>
<p><code>hook_apachesolr_cck_fields_alter(&amp;$mappings)</code>: Used to add/change which fields are indexed and how they are indexed.</p>
<h3>Recent MySQL happenings</h3>
<p>The session <a href="http://sf2010.drupal.org/conference/sessions/future-mysql-forks-patches-and-decisions">The Future Of MySQL: Forks, Patches And Decisions</a> was a good overview of the state of the MySQL database world now that pluggable storage engines are getting more common and Oracle has bought Sun (and by extension MySQL-AB), among other developments.</p>
<p>The Oracle InnoDB plugin<br />
<a href="http://www.innodb.com/products/innodb_plugin/">http://www.innodb.com/products/innodb_plugin/</a><br />
- Higher performance version of the InnoDB engine. &#8220;Amazing&#8221;. Upgrade if at all possible. </p>
<p>XtraDB plugin from Percona<br />
<a href="http://www.mysqlperformanceblog.com/2008/12/16/announcing-percona-xtradb-storage-engine-a-drop-in-replacement-for-standard-innodb/">http://www.mysqlperformanceblog.com/2008/12/16/announcing-percona-xtradb&#8230;</a><br />
- A fork of the Oracle InnoDB plugin.<br />
- A big benefit from splitting Buffer Pool Mutex into typed mutexes for each operation so that non-conflicting operations won&#8217;t lock.<br />
- Rewrite of RW Locks.<br />
- More configuration for IO Thread Numbers, IO Capacity.</p>
<p>Ourdelta/Open Query &#8211; provides builds of MySQL with patch-sets from various sources (Google, etc).</p>
<p>MySQL 5.1 &#8211; What&#8217;s new?<br />
- Row level Replication rather than SQL-based replication. Removes the need for a lot of strange locks that were required to get SQL-based replication working. Removes the need for repeatability of SQL statements during replication. Still many stability issues, but hopefully they will be fixed soon.<br />
- InnoDB Plugin</p>
<p>MariaDB/MontyProgram<br />
- Pool of threads like apache rather than forking.</p>
<h4>Upcoming stuff:</h4>
<p>MySQL5.5 &#8211; SemiSynch Replication. Allows you to know that the data has been replicated to at least one slave.</p>
<p>Upcoming in MariaDB 5.2: varchar/blob for heap to prevent temporary tables from going to disk.</p>
<h4>Other Notes:</h4>
<p>Do not put a UNION inside a view! &#8211; Performance nightmare.</p>
]]></content:encoded>
			<wfw:commentRss>http://sites.middlebury.edu/lis/2010/04/21/drupalcon2010-day2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>DrupalCon 2010 Trip Report &#8211; Day 0</title>
		<link>http://sites.middlebury.edu/lis/2010/04/21/drupalcon-2010-day0-performance/</link>
		<comments>http://sites.middlebury.edu/lis/2010/04/21/drupalcon-2010-day0-performance/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 08:22:38 +0000</pubDate>
		<dc:creator>Adam Franco</dc:creator>
				<category><![CDATA[LIS Staff Interest]]></category>
		<category><![CDATA[Areas and Workgroups]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[Conference Reports]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Enterprise Applications]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Web Application Development]]></category>

		<guid isPermaLink="false">http://sites.middlebury.edu/lis/?p=22861</guid>
		<description><![CDATA[Here is an overview and some notes from the Drupal Scalability and Performance Workshop I attended before the start of the DrupalCon conference that Ian and I are attending in San Francisco. As the title suggests, this workshop was focused &#8230; <a href="http://sites.middlebury.edu/lis/2010/04/21/drupalcon-2010-day0-performance/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Here is an overview and some notes from the <a href="http://sf2010.drupal.org/conference/pre-conference-trainings/drupal-scalability-and-performance-workshop">Drupal Scalability and Performance Workshop</a> I attended before the start of the DrupalCon conference that Ian and I are attending in San Francisco. As the title suggests, this workshop was focused on making Drupal (and web-applications in general) run fast. Really fast. I hope to apply the techniques learned in this workshop over the next weeks and months to make our sites run fast enough to handle any traffic load that might be thrown at them, even were an event to occur that would send major public traffic to our sites.</p>
<p><em>Read on if you are interested in the performance and scalability of Drupal, MySQL databases, and web applications in general.</em></p>
<p><span id="more-22861"></span></p>
<p>After a two-hour overview of the various areas of application, server, database, and proxy configurations that can affect performance we spent much of the rest of the day deploying a basic Drupal installation on virtual machines and walking through the entirety of the discussed performance optimizations, running benchmarks at each stage.</p>
<h3>Benchmarking</h3>
<p>While many tools such as <a href="http://jakarta.apache.org/jmeter/">Apache JMeter</a> can be used to build complex test plans that more accurately simulate loads seen in production, the <a href="http://httpd.apache.org/docs/2.0/programs/ab.html">Apache Bench (<code>ab</code>)</a> tool is a very simple way to benchmark the effect a change has on the performance of a single page. This is critical for determining if a change that has performance implications increases or decreases the speed at which pages are served. Apache Bench can be run from the command line with arguments for the number of requests to make, the number of concurrent requests to run at the same time, and the URL of the page to test:</p>
<pre>ab -n 1000 -c 50 http://host.example.edu/drupal/</pre>
<p>As well, a cookie (copied from an authenticated session) can be used to measure the responsiveness for pages that require authentication:</p>
<pre>ab -n 1000 -c 50 -C 'SESSd1e57edd95f34cfedb49c219e55ddf26=f35afe5e50caf15cba72c448afe72f81'  http://host.example.edu/drupal/some/page/</pre>
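<p>When comparing runs before and after a change, the number I mostly look at is the &#8220;Requests per second&#8221; line of the <code>ab</code> report. The following Python sketch pulls that figure out of two saved reports and computes the ratio; the function names are hypothetical, and it assumes the report text has already been captured to a string (for example by redirecting <code>ab</code>&#8217;s output to a file).</p>

```python
import re

# ab prints a summary line of the form "Requests per second:  NN.NN ...".
RPS_PATTERN = re.compile(r"^Requests per second:\s+([0-9.]+)", re.MULTILINE)


def requests_per_second(ab_output):
    """Extract the mean requests/second figure from ab's text report."""
    match = RPS_PATTERN.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))


def speedup(before_output, after_output):
    """Ratio of throughput after a change to throughput before it."""
    return requests_per_second(after_output) / requests_per_second(before_output)
```

<p>A ratio above 1.0 means the change made the page faster to serve; a ratio below 1.0 means it made things worse. A single run is noisy, so it is worth repeating each benchmark a few times before trusting the comparison.</p>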
<h3>Code Profiling</h3>
<p>Once slow pages are identified, the location of performance issues can be determined by code profiling.</p>
<p>The FourKitchens wiki has <a href="https://wiki.fourkitchens.com/display/PF/Code+profiling+and+load+testing+Pressflow+on+CentOS+5">a good page</a> on how to profile Drupal code using XDebugToolkit and generate traces like the one below that help us developers to easily see where in the PHP code most of the processing time is spent.<br />
<div id="attachment_22869" class="wp-caption aligncenter" style="width: 310px"><a href="http://sites.middlebury.edu/lis/files/2010/04/cg-1.png"><img src="http://sites.middlebury.edu/lis/files/2010/04/cg-1-300x250.png" alt="A code trace weighted by execution time." title="xdebug trace" width="300" height="250" class="size-medium wp-image-22869" /></a><p class="wp-caption-text">A code trace weighted by execution time.</p></div></p>
<h3>MySQL Database Tuning</h3>
<p>The first step in setting up a high-performing MySQL database server is to swap out the built-in InnoDB engine with the <a href="http://www.innodb.com/wp/products/innodb_plugin/">plug-in version</a> from InnoBase (a subsidiary of Oracle). The InnoDB Plugin performs two or more times faster than the built-in InnoDB engine and is a fully compatible drop-in replacement. (Packages of the InnoDB Plugin for RHEL are available as part of the <a href='http://blog.famillecollet.com/pages/Config-en'>Remi YUM Repository</a>). Other InnoDB engine implementations such as <a href="https://launchpad.net/percona-xtradb">XtraDB from Percona</a> add additional performance improvements and configuration abilities, but the InnoBase InnoDB plugin is itself a big win.</p>
<p>With the InnoDB Plugin installed, next up is tuning the configuration parameters of the database. In general, all of the configuration tuning is done to ensure that appropriate caches, data, indexes, and table-metadata all remain resident in system RAM rather than requiring the database server to load these from hard disk. For details, see my example <a href='http://sites.middlebury.edu/lis/files/2010/04/my.cnf_.txt'>my.cnf </a> configuration file with notes as well as these step-by-step <a href="https://wiki.fourkitchens.com/display/PF/MySQL+configuration+and+tuning">tuning instructions from Four Kitchens</a>.</p>
<p>One side note that came up in the workshop was the performance difference between the MyISAM engine and the InnoDB engine. The &#8216;conventional wisdom&#8217; is that the older, simpler MyISAM engine performs better for read queries than the InnoDB engine since it spends less overhead on ensuring data integrity. What was discussed in the workshop is that this supposed edge of the MyISAM engine only applies in a tiny subset of cases: where the server is severely memory constrained (such as on a small slice of a shared host) or where the data set is massive and cannot reasonably be held in RAM (such as some multi-terabyte reporting databases). Because the MyISAM engine is designed as a file-based engine and the InnoDB engine is designed to hold data in memory, it was said that in cases where much or all of the data and indexes can be held in memory the InnoDB engine will always out-perform the MyISAM engine, even on simple select statements.</p>
<h3>Opcode Caching</h3>
<p>One performance improvement we already have in place for most of our web applications is to cache compiled PHP &#8216;opcodes&#8217; in RAM using the <a href="http://php.net/manual/en/book.apc.php">PHP APC extension</a> so that PHP scripts don&#8217;t need to be read off of the hard disk and compiled on each page load.</p>
<p>Having APC enabled also allows very lightweight page-load statistics monitoring to be done with the Drupal Devel-Performance module, giving us summary statistics of the actual page-load times that users are getting from the production webservers.</p>
<p><em>Note: APC &lt;= 3.0 has an issue where the entire cache is cleared if it becomes full.</em></p>
<h3>Memcache</h3>
<p>The second performance improvement we already have in place is using <a href="http://memcached.org/">Memcache</a> and the <a href="http://drupal.org/project/memcache">Drupal Memcache Module</a> so that cached data lives in memory on the webservers and does not require queries to the database server. Since the [MySQL] database server is generally the performance bottleneck and is hard to scale horizontally, even moving simple cache look-ups off of it can be a big performance improvement.</p>
<p><em>Note: Use the dev version of the Drupal Memcache Module so that one large memory bin (possibly spanning multiple machines) can be used rather than having to separate cache &#8216;tables&#8217; into multiple bins. The &#8216;stable&#8217; version of the module didn&#8217;t prefix the cache &#8216;tables&#8217;, resulting in excess clearing of cached data when unrelated data was cleared.</em></p>
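<p>To illustrate why prefixing matters when several cache &#8216;tables&#8217; share one bin, here is a minimal Python sketch. It is not the Memcache module&#8217;s code: the <code>PrefixedBin</code> class is hypothetical, a plain dict stands in for the memcached pool, and the per-table generation counter is just one common way to namespace keys so that clearing one table leaves the others alone.</p>

```python
class PrefixedBin:
    """One shared cache bin holding several logical cache 'tables'.

    Hypothetical illustration: a dict stands in for a memcached pool.
    Keys are prefixed with the table name and a per-table generation
    number, so clearing a table only bumps that table's generation.
    """

    def __init__(self):
        self._store = {}       # stand-in for the shared memory bin
        self._generation = {}  # table name -> current generation number

    def _key(self, table, key):
        generation = self._generation.get(table, 0)
        return "{}:{}:{}".format(table, generation, key)

    def set(self, table, key, value):
        self._store[self._key(table, key)] = value

    def get(self, table, key, default=None):
        return self._store.get(self._key(table, key), default)

    def clear(self, table):
        # Entries written under the old generation become unreachable;
        # entries belonging to other tables are not affected.
        self._generation[table] = self._generation.get(table, 0) + 1


cache = PrefixedBin()
cache.set("cache_menu", "primary", ["Home", "About"])
cache.set("cache_page", "/about", "rendered page")

cache.clear("cache_menu")
print(cache.get("cache_menu", "primary"))  # the menu entry is gone
print(cache.get("cache_page", "/about"))   # the page entry survives
```

<p>Without the table prefix in the key, the only safe way to clear one table would be to flush the whole bin, which is the excess clearing described in the note above.</p>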
<h3>PressFlow</h3>
<p><a href="http://pressflow.org/">PressFlow</a> is a Drupal distribution designed for high performance that has several differences from the main release of Drupal:</p>
<ul>
<li>Legacy support for PHP4 is dropped, allowing better performance in PHP5</li>
<li>Support for databases other than MySQL is dropped, allowing PressFlow to take advantage of higher-performance features than the lowest common denominator would support</li>
<li>Rather than being created on the first web request, sessions are only created when data is stored in them. This &#8220;Lazy Session Creation&#8221; allows proxies like Squid and Varnish to differentiate between anonymous and authenticated traffic and only cache content for anonymous users</li>
<li>It allows splitting of read-queries to read-only slave databases, taking load off of the read-write master database.</li>
</ul>
<p>With these changes noted, PressFlow is a drop-in replacement for Drupal Core.<br />
<a href="http://fourkitchens.com/pressflow-makes-drupal-scale">See this page for more information on PressFlow.</a></p>
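<p>The &#8220;Lazy Session Creation&#8221; idea from the list above can be sketched in a few lines of Python. This is an illustration of the concept only, not PressFlow&#8217;s PHP implementation; the <code>LazySession</code> class and the <code>SESS</code> cookie name are made up for the example.</p>

```python
import uuid


class LazySession:
    """Session that only gets an ID (and so a cookie) once data is stored."""

    def __init__(self):
        self.session_id = None
        self._data = {}

    def __setitem__(self, key, value):
        if self.session_id is None:
            # First write: only now does the visitor get a session.
            self.session_id = uuid.uuid4().hex
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def response_headers(self):
        """Headers to add to the response; empty for untouched sessions."""
        if self.session_id is None:
            return {}
        # "SESS" is a hypothetical cookie name used for illustration.
        return {"Set-Cookie": "SESS={}; Path=/".format(self.session_id)}


anonymous = LazySession()
print(anonymous.response_headers())  # no cookie for a visitor who stored nothing

logged_in = LazySession()
logged_in["uid"] = 42
print("Set-Cookie" in logged_in.response_headers())
```

<p>Because a visitor who never stores session data never receives a session cookie, a proxy in front of the site can treat cookie-less requests as anonymous and cacheable.</p>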
<h3>Varnish</h3>
<p><a href="http://varnish-cache.org/">Varnish</a> is a reverse proxy* that can be run on the web-host in front of the Apache webserver to cache content destined for anonymous users and return that cached data to other anonymous users from memory, without the cached requests ever having to be processed by Drupal code.</p>
<p>* A &#8220;Forward Proxy&#8221; is a proxy-server that usually sits close to the clients (often at an ISP) and caches responses to requests going from those clients out to the broader internet. Varnish, as a &#8220;Reverse Proxy&#8221;, sits close to the webserver (even on the same machine) and caches responses to requests coming into it from the broader internet.</p>
<p>Since anonymous traffic is often orders of magnitude greater than authenticated traffic, serving pages for anonymous users from a cache before PHP code is even touched can result in huge speed improvements for anonymous users. The cache performance can be as high as ~5,000-10,000 pages per second compared to a max of about 100-300 pages per second for executed Drupal code, even with its internal page caching turned on. Authenticated users get a speed boost as well since the total traffic being processed through the Drupal PHP code drops.</p>
<p><a href="https://wiki.fourkitchens.com/display/PF/Configure+Varnish+for+Pressflow">This wiki page</a> has a configuration file for Varnish working with Drupal. The basic idea is that the presence of Google Analytics cookies is ignored for the purpose of determining if the traffic is authenticated or anonymous. If any other cookies are set, then Varnish assumes that the traffic is for an authenticated user and it passes the request through to Apache/Drupal and doesn&#8217;t cache the result when sending it back to the client. If no additional cookies are present, then a result from cache is returned if found.</p>
<p>One downside of Varnish is that caching is based on Expires headers, meaning that anonymous users may see slightly stale content (often set to be refreshed every 5 minutes or so). The Drupal Varnish module includes options that allow much longer expiration times to be set and pages to be actively cleared from cache when they are changed, but at the cost of additional complexity. Were active cache clearing desired, it would likely involve a lot of testing to ensure that all caches are appropriately cleared in all cases of content change.</p>
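<p>The trade-off between time-based expiry and active clearing can be seen in a small Python model. This is a sketch under stated assumptions, not how Varnish or the Drupal Varnish module is implemented: <code>TTLCache</code> is a hypothetical class, the 300-second default mirrors the 5-minute refresh mentioned above, and the clock is injectable only so the behaviour can be demonstrated without waiting.</p>

```python
import time


class TTLCache:
    """Tiny time-based cache: entries are served until they expire."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (expires_at, body)

    def get(self, url):
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the next request goes to the backend.
            del self._entries[url]
            return None
        return body

    def set(self, url, body):
        self._entries[url] = (self._clock() + self.ttl_seconds, body)

    def purge(self, url):
        """Active clearing: remove a page as soon as its content changes."""
        self._entries.pop(url, None)


now = [0.0]  # fake clock so the example runs instantly
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.set("/news", "old headline")

now[0] = 120.0
print(cache.get("/news"))  # still the old copy, even if the node was edited

now[0] = 300.0
print(cache.get("/news"))  # expired, so the next request is rebuilt
```

<p>With expiry alone, an edit can stay invisible to anonymous users for up to the full time-to-live. Calling <code>purge()</code> on every content change removes that delay, but every code path that changes content then has to remember to call it, which is the added complexity noted above.</p>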
<h3>Hudson</h3>
<p><a href="http://wiki.hudson-ci.org/display/HUDSON/Meet+Hudson">Hudson</a> is a continuous integration tool that can be used as a kind of cron-on-steroids, allowing nice features like statistics of cron-run execution times and emails sent to admins if cron-runs have failures. It also prevents cron-runs from stepping on each other if previous executions haven&#8217;t completed yet.</p>
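<p>For readers unfamiliar with what such a job runner adds over plain cron, the following Python sketch models the three behaviours mentioned above: skipping a run while a previous one is still active, timing each run, and recording failures. It has nothing to do with how Hudson itself is built; <code>run_exclusively</code> and the lock-file approach are assumptions made for illustration.</p>

```python
import os
import time


def run_exclusively(job, lock_path):
    """Run job() unless another run currently holds lock_path.

    Returns a dict with "status" ("skipped", "ok" or "failed"); runs that
    started also report elapsed "seconds", and failures include "error".
    """
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return {"status": "skipped"}

    started = time.monotonic()
    result = {"status": "ok"}
    try:
        job()
    except Exception as error:
        # A real runner would email the admins here.
        result = {"status": "failed", "error": str(error)}
    finally:
        os.close(fd)
        os.remove(lock_path)
    result["seconds"] = time.monotonic() - started
    return result
```

<p>A caller would pass the cron task as <code>job</code> and a path such as a file in a temporary directory as <code>lock_path</code>. One limitation of this simple approach is that a crashed process can leave a stale lock file behind, which a dedicated tool handles for you.</p>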
<h3>Other modules and notes related to performance</h3>
<ul>
<li><a href="https://launchpad.net/mv">Materialized Views</a></li>
<li>Boost &#8211; Writes out Drupal pages as static HTML. Varnish is a vastly better solution, but it is not available on some web hosting providers (not an issue for Middlebury).</li>
<li>Panels &#8211; Can allow caching of blocks and other content pieces even for authenticated users</li>
<li>Views &#8211; Can cache results; just be careful with views where node-access restrictions may apply</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://sites.middlebury.edu/lis/2010/04/21/drupalcon-2010-day0-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
