Tag Archives: search

DrupalCon 2010 Trip Report – Day 3

After attending a conference, I usually think, “Wow, we’re so far ahead here at Middlebury!” Not this time! DrupalCon was incredibly helpful in demonstrating all of the ways we can improve our site with better performance, better search, better content, and better code. I’m also really excited about the upcoming release of Drupal 7 and both confident we can move our site onto this new version and eager to use all the new features.

Here are the highlights from the last day:

DrupalCon 2010 Trip Report – Day 2

Here is an overview and some notes from day 2 of the DrupalCon conference that Ian and I are attending in San Francisco. As Ian mentioned in yesterday’s report, day 1 of DrupalCon was mostly focused on the future of Drupal, specifically on the changes and improvements in the upcoming Drupal 7. Today’s sessions dealt much more with the current Drupal release, as well as with version-neutral topics.

Read on for more on the following topics:

  • Drupal deployment strategies
  • The Chaos tools for Drupal module development
  • Drupal in Education
  • Searching with Apache Solr
  • Recent MySQL happenings


Website Improvements #5: Search

When Middlebury first started using a Content Management System to organize its site in 2003, we added a local search engine for the site, operated by Atomz. This search engine wasn’t very popular: people weren’t finding the information they needed. At a meeting a couple of years later, Barbara Merz remarked, “Why don’t we just get Google!?” So we purchased a Google Search Appliance (GSA) and set that up as our local search engine. Going into the Web Makeover Project, we thought we were safe on this subject. After all, the GSA was a Google project, it indexed all of our site’s content, and we had put in Key Matches for the most relevant pages. Surely people must be satisfied with this as our search engine.

Nope.

The Strategy

After “the font is too small” and “it’s too hard to edit”, search results were the top complaint about our old site during the web makeover’s requirements gathering phase. We heard that people got better results about our site from Google.com than they did from the GSA. The designers we worked with to build the new site proposed a solution in three parts:

Search Statistics from the GSA

The Google Search Appliance lets us create “collections” of portions of the site that can be searched. These collections are what you see in the drop-down field on http://search.middlebury.edu. The LIS collection is also the one being searched if you enter a query on the LIS or Library home pages on the CMS. This collection is searched much, much less frequently than the main search, but we may still find the results interesting.

Here are the top 15 queries of the LIS search collection in the last year:

jstor 51
tigercat 15
eres 11
psychinfo 10
special collections 9
mla 8
citation 8
JSTOR 8
oxford english dictionary 8
lexis nexis 7
thesis 7
printing 7
library hours 7
music library 7
segue 6

And here is the same for the LIS Wiki collection:

novell 2
server 2
tigercat 2
computer upgrade schedule 1
inurl:pdf 1
proprietary name 1
freeze 1
video 1
antivirus software 1
Novell, Tigercat 1
How to use Novell and Tigercat 1
wordpress 1
blog 1
after graduating 1
Controller’s Office 1

For comparison, and so you can see how infrequently the LIS search collection is used, here are the top 15 LIS-related queries from the “All” collection, which as you might expect indexes all the other collections:

segue 1152
library 388
email 306
tigercat 246
webmail 214
bannerweb 163
library hours 129
INB 119
jstor 108
inb 103
eres 102
banner web 83
netstorage 77
banner 72
computers 71

(Note on this last one: midcat is the 16th term with 70 queries).
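The lists above show some queries more than once when they differ only in capitalization (“jstor” and “JSTOR”, “INB” and “inb”). As a minimal sketch, assuming we only want to fold case and surrounding whitespace, the counts from the “All” list could be merged like this. The function name is illustrative, and near-duplicates such as “bannerweb” and “banner web” are deliberately left separate.

```python
from collections import Counter

# Top 15 LIS-related queries from the "All" collection, copied from the list above.
ALL_COLLECTION = [
    ("segue", 1152), ("library", 388), ("email", 306), ("tigercat", 246),
    ("webmail", 214), ("bannerweb", 163), ("library hours", 129),
    ("INB", 119), ("jstor", 108), ("inb", 103), ("eres", 102),
    ("banner web", 83), ("netstorage", 77), ("banner", 72), ("computers", 71),
]


def merge_case_variants(rows):
    """Sum counts for queries that differ only in letter case or outer whitespace."""
    merged = Counter()
    for query, count in rows:
        merged[query.strip().lower()] += count
    return merged


merged = merge_case_variants(ALL_COLLECTION)

# Print the five most frequent queries after merging.
for query, count in merged.most_common(5):
    print(query, count)
```

With the variants folded together, “inb” totals 222 queries (119 + 103), which moves it ahead of “webmail” in this list.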

You can see the full reports for each, which include more information, at these on-campus-only links:

Report for LIS Collection 07/24/2008-07/24/2009

Report for LIS Wiki Collection 07/24/2008-07/24/2009

Report for All Collection 07/24/2008-07/24/2009

I also sent an email to Chris Norris asking for assistance getting us some information from Google Analytics. He was out yesterday and today, but appears to be in his office next week, just booked straight through with meetings. I’ll keep you posted on this item. Here is the list of questions I sent him:

1. What are the top 5 search terms within the LIS? (I’ll get this from the GSA)

2. What are the top 5 pages on the LIS site?

3. What is the most common click path from /academics/lis to the LISt blog and the LIS Wiki?

4. What links on /academics/lis (the landing page) are clicked on the least?

5. Same as (4), but applied to /academics/lis/lib, /academics/lis/help, and /academics/lis/about.

Directory Updates for September

The first meeting to review the feedback posted to the Web Makeover Blog for the online Directory occurred this afternoon. We agreed to a set of updates that will be made to the Directory prior to the start of the academic year in anticipation of the retirement of the print version of the Directory. These changes will focus on ensuring that the information included in the print Directory is accessible in the online version and on small improvements to the online interface. We won’t be completely revising how the Directory works at this time, as we expect further changes to occur as a result of the Web Redo Project’s revisions to our overall Search strategy.

Here is what I’ll be working on changing:

  • Add the Department contact information to the Directory as a downloadable PDF. This is already done! We heard loud and clear that the information in the front of the Directory needed to be accessible online, and we’ve added a link to a PDF containing this information. The added advantage of this being in PDF form, for now, is that we recognize that this is the type of contact information people might need when they are not able to access a computer, either because the power’s out, or they’re working in a location without access to a machine, or traveling. You can print out this information at your leisure as a quick contact list for these situations.
  • Add A-Z links at the top of the Directory interface. Clicking on a letter will show a list of people whose last name begins with that letter and you can click on their name to see their record. This will give you a quick way to glance through the Directory.
  • We will add a field to the search form that lets you specify whether you want to search for only Faculty, Staff, Students, Language School personnel, MMLA, etc.
  • We will add a field to let you search by just first name, or just last name.
  • Approval of new photos will transfer to HR, and possibly other departments as makes sense.
  • In coordination with HR, we’ll review the current display settings for each field. There may be changes to how display settings permissions are handled in the Directory.
  • We’ll investigate a way to provide access to an online form year round that lets people update their Directory information.

This may not appear particularly ambitious, but we wanted to focus on what could be completed by the start of the academic year and not set ourselves up to be in a position where we’d have to redo this all depending on the work we’ll be doing for the rollout of the new website in January. I personally think this list helps address many of the concerns raised in the comments on the Web Redo Blog.

My Response to the Digital Media Tutors' Search Interface Feedback

Joe Antonioli recently asked the DMTs to evaluate the search interface prototype I blogged about in a previous post. The responses I received from the DMTs were extremely thoughtful and helpful. I want to take this opportunity to provide some additional information about the search service and respond to their suggestions. First, there are two things about the new search service that may not have been entirely clear in the assignment.

1. The interface choices (colors, fonts, placement, images) on the prototype were chosen for expedience rather than good design. I reused several elements of past design to make something that would functionally work. Several DMTs responded and said they liked aspects of the design, which I certainly appreciate. However, we’ll be receiving new font, color and framing suggestions from White Whale as part of the Web Redo Project and when we do those choices will overwrite what is currently displayed on the search prototype.

2. The search engine is purposefully stupid and minimal. There will very likely be a box in the upper right hand corner of our new website where you enter search terms, click a search button and then are taken to a page with search results. This field won’t have additional selection boxes or drop downs or anything. That’s not to say that there can’t be an advanced search as well, but that the search needs to do *something* with very little information.

Those two things said, here are the main points I took from the DMTs’ responses and my feedback on them.

You don’t always want to search all the search engines all the time. If I’m looking for a person, I might just want to search the directories and the website. If I’m looking for a book, I may only want to search the library catalog.

Excellent observation. There were two suggestions for improving this: (1) have the search results from each search engine appear in different tabs or pages so that you only have to look at one at a time, and (2) allow the user to select which search engines they want to use. I definitely want to do (2) and then see if that improves the experience enough or if we need to do (1) as well. We can implement (2) by adding a set of checkboxes under the search box that allows the user to choose which engines they want to use as one of the DMTs suggested.

There was also the suggestion that the labels of these selection checkboxes be hyperlinks that, when clicked, take you to the results from that engine. I would prefer not to do this because the suggested action for clicking on a checkbox label on a page is to check or uncheck the box associated with that label. This would break that design convention for sites. I’d rather add another row of labels under the selection area that act as these hyperlinks, if the navigation column on the left is insufficient to carry this action.
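A minimal sketch of the checkbox approach in Python, assuming the form submits one value per checked box. The engine identifiers are hypothetical, since the prototype’s real names aren’t given here. An empty selection falls back to every engine, so the plain search box still returns something.

```python
# Hypothetical engine identifiers; the prototype's real names are not given in this post.
AVAILABLE_ENGINES = ("google", "directory", "library")


def engines_to_search(checked):
    """Return the engines to query, in a fixed display order.

    `checked` is the list of checkbox values submitted with the form.
    Unknown values are ignored, and an empty selection falls back to
    every engine.
    """
    wanted = set(checked)
    selected = [name for name in AVAILABLE_ENGINES if name in wanted]
    return selected or list(AVAILABLE_ENGINES)
```

For example, `engines_to_search(["library"])` returns only the library catalog, while `engines_to_search([])` returns all three.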

Show more results from Google, show fewer results from the library and the directories.

I agree that the Google results should list at least ten pages. I like the suggestion of limiting the results from the directories and the library by adding pagination to those results. I will not be able to effectively paginate the results from Google because their API only returns a small number of results to me – Google obviously doesn’t want you to be able to replicate their search engine on your site without displaying any of the ads they use to generate revenue. There are other restrictions with using results from Google: I cannot change the order of the results and I cannot intermix results from other search engines with results from Google. I will, however, change the script to show the top 10 results from Google, rather than the top 4.
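Pagination for the directory and library results could look something like the following sketch. It assumes the full result list is already in memory, which may not hold for the real services, and the function name and defaults are illustrative.

```python
def paginate(results, page=1, per_page=10):
    """Return one page of results plus the total number of pages."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    total_pages = max(1, -(-len(results) // per_page))  # ceiling division
    page = min(max(page, 1), total_pages)               # clamp out-of-range pages
    start = (page - 1) * per_page
    return results[start:start + per_page], total_pages
```

Clamping the page number means a stale or hand-edited page link still shows the nearest valid page rather than an empty one.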

Show more relevant results from the directory search. If I search for Ian McBride, I don’t want to have to scroll through all the Ians and all the McBrides to find him, I just want to see the information about the person I searched for.

As I said, the search is intentionally stupid. When you search for “Ian McBride” in the directory it doesn’t know if that’s a person’s name, job title, department, building location, or telephone number. So the directory search looks through an index of all those fields and spits back whatever it finds. Ideally, the directory search would list this information in order of relevance, so you’d see my entry at the top of the list, then the rest of the McBrides and then the rest of the Ians. This is really hard to do, but that doesn’t mean we shouldn’t try to do it.
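The ordering described above (exact person first, then matching last names, then matching first names) can be sketched in Python for the simple case where the query contains name words. The record layout with `first` and `last` keys is an assumption, and every sample name except Ian McBride is made up. This does nothing for job titles, departments, or phone numbers, which is where the hard part lies.

```python
def rank_directory_results(query, records):
    """Order records: full-name matches, then last-name, then first-name, then others."""
    terms = set(query.lower().split())

    def tier(record):
        first_hit = record["first"].lower() in terms
        last_hit = record["last"].lower() in terms
        if first_hit and last_hit:
            return 0
        if last_hit:
            return 1
        if first_hit:
            return 2
        return 3

    # sorted() is stable, so records keep their original order within a tier.
    return sorted(records, key=tier)


# Hypothetical sample data for illustration only.
sample = [
    {"first": "Ian", "last": "Placeholder"},
    {"first": "Pat", "last": "McBride"},
    {"first": "Ian", "last": "McBride"},
]
ranked = rank_directory_results("Ian McBride", sample)
```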

Show fewer results from the library and tell me whether that resource is available or not and the branch location or whether it’s at another library.

The library search is really problematic. In order to search the current catalog system, I have to send a request to the catalog website with the search terms, parse the response looking for bibliography numbers and then send an additional request to the catalog for each bibliography number I found to get its author, title, etc. Our library staff has been looking at next generation library catalog front ends, like Scriblio, with the intention of providing a better search interface to the catalog. When we have a system like this set up that allows me greater programmatic access to the catalog information, I’ll be able to greatly improve the results from that system. I will change the current interface so that it only lists the first ten or so results, rather than the first 50.
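The two-step lookup described above can be sketched as follows. The link pattern is hypothetical, since the catalog’s actual markup isn’t described here, and the two fetch functions are passed in as stand-ins for the real HTTP requests. Capping the list before the second step keeps the number of per-record requests down.

```python
import re

# Hypothetical pattern: assumes record links in the results page look like
# href="/record=b1234567". The real catalog's markup is not described in this post.
BIB_NUMBER = re.compile(r'href="/record=(b\d+)"')


def search_catalog(terms, fetch_results_page, fetch_record, limit=10):
    """Two-step lookup: one request for the result list, then one per record."""
    page = fetch_results_page(terms)
    bib_numbers = []
    for bib in BIB_NUMBER.findall(page):
        if bib not in bib_numbers:       # drop duplicate links to one record
            bib_numbers.append(bib)
    # Capping before the per-record requests limits the number of round trips.
    return [fetch_record(bib) for bib in bib_numbers[:limit]]
```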

Tell me what format the result document is in. If I’m opening a Word document by clicking on a link, I want to know.

Excellent suggestion. This should be handled by icons (with appropriate alt and tooltip text of course) for each file or media type.
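As a rough sketch, the label used for each icon’s alt and tooltip text could be derived from the result URL’s file extension. The mapping below is a hypothetical starting list, and a real version might need to check the content type instead, since not every document URL ends in an extension.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the text used for alt and tooltip.
FILE_TYPE_LABELS = {
    ".pdf": "PDF document",
    ".doc": "Word document",
    ".docx": "Word document",
    ".xls": "Excel spreadsheet",
    ".ppt": "PowerPoint presentation",
}


def file_type_label(url):
    """Return a label for a result URL, or None for ordinary web pages."""
    path = urlparse(url).path            # ignores any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return FILE_TYPE_LABELS.get(extension)
```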

Allow me to choose the number of results.

Absolutely, but with the recognition that this will be part of an advanced search interface.

The Back button on my browser doesn’t work with this search engine.

This is a usability issue which I need to correct. Clicking Back should bring you to the list of your last search results.

There were also two questions asked by the DMTs that deserve answers. The first was a compliment from one who liked the graphic of the mountains in the title background of the search result bubbles and wanted to know who came up with that design. The answer is Mark Zelis, the Web Producer for College Communications. Mark developed that design for the News Portal. You might also recognize Mark’s work on the new Institutional Diversity site and many email campaigns for the College.

The other question was, “I’m curious about Search. What is it, exactly? What search feature is in the works?” The answer to this can be found in the Strategic Recommendations Document from White Whale, which can be read by Middlebury community members at the Web Redo Blog.

I want to again thank the Digital Media Tutors for their feedback and encourage them to continue to test our systems and provide feedback on them. The more eyes there are on these interfaces, the more we can improve them and make them easier to use.

Can we get rid of the paper Directory?

Here’s an extract from an email I sent out recently in response to this question. Some of the suggestions here would also help us improve how we structure user permissions, return search results generally, and consolidate how we display information about people across multiple institutions:

There are both programmatic and cultural issues with the current Directory. Here are the things I think we’d need to change to really be able to get rid of the print Directory. By the way, I think the suggestion to have a PDF (or plain HTML with no search) version of the Directory is a really good one. This would deal with the issue of printing costs and provide a usable alternative to the search interface.

1. Cultural: You can hide information from the Directory. My phone number isn’t listed because I don’t like receiving phone calls and the Directory, ever since its first online version, has allowed people to hide whatever information they like (including their whole record). If we eliminate the print Directory, we need to reach an understanding on campus that certain information will always be displayed (which fields are default will differ between faculty, staff, and students) and that people can’t choose to hide their records. It’d be great, too, if we could encourage people to have a visible Directory photo, but I won’t push my luck.

2. Programmatic: There are no numbers to call for departments. Robert Armstrong in Public Safety feels the pain here, since he’s the first result when you search for that department and gets all the calls directly, instead of people calling a central line. Bob Clagett gets a lot of this too: since there’s no contact information for “Admissions”, people just assume that they should send emails right to him. This is a rather easy problem to fix: I just need to compile a list of department contact information and program the directory so that you see it as the first listing if you search for only a department.

3. Programmatic: The search algorithm kind of stinks. It is incorrect, by the way, that you can’t browse all the T’s. Enter T into the “Person’s Name” field and see for yourself (please don’t actually do this). Realistically, we should be using the GSA: providing it an RSS feed of Directory information to crawl and letting it handle Directory search results. I’m not sure how well the results would come out, but we could have this system operate side-by-side with the current search to provide more options.

4. Cultural/Programmatic: The org chart at Middlebury is a closely guarded secret, for reasons I’ve never fully understood. It would seem to make sense on the Directory that some level of organization is presented. For instance, I should be able to search the Directory for the Web Services (or whatever we’re calling ourselves) group and see Joe listed at the top as manager with Adam’s, mine, and Travis’s profiles listed below in alphabetical order. Ideally, the page would also have a link to the ETI listing, with Jeff at the top. This is a bit trickier to implement than the others because none of this information is tracked in the AD.

I’ve written four versions of the Directory application so far while working here and these same issues keep coming up. Another thing to note: the Directory is the only design template we received from Big Bad that was implemented in an application outside of the CMS (it was actually in the CMS for one of those four versions). I would expect WW to include some form of design for the Directory in their deliverables.

Note: since I wrote that email, I’ve found ways to improve how the Directory search works and implemented some of those ideas in the custom search interface referred to in my post yesterday. I no longer strictly believe that we should feed Directory search results through the GSA.
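Going back to item 4 in the email, the group listing it describes (manager at the top, everyone else alphabetically) is a small sort once the data exists. This sketch assumes a hypothetical `is_manager` flag on each record, which, as noted above, is not something tracked in the AD today.

```python
def order_group(members):
    """Managers first, then everyone else alphabetically by name."""
    return sorted(members, key=lambda m: (not m["is_manager"], m["name"].lower()))


# Names come from the example in item 4; the is_manager flag is a hypothetical field.
web_services = [
    {"name": "Travis", "is_manager": False},
    {"name": "Ian", "is_manager": False},
    {"name": "Joe", "is_manager": True},
    {"name": "Adam", "is_manager": False},
]
listing = [member["name"] for member in order_group(web_services)]
```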

New Search Interface

One of the strategic recommendations we’ve heard back from White Whale is that our search interface can be improved by incorporating search results from multiple search sources, rather than just the GSA’s internal index. Additionally, they’ve recommended that we use Google.com’s search results scoped to our site, rather than the on-campus tool. Joe Antonioli asked me to spend some time this week building out a rough prototype of how this system could work. The first version of the prototype is available (on-campus or via VPN only) at: http://chisel.middlebury.edu/search/

Please try out this prototype and let me know how it could be improved. Don’t worry about the look-and-feel right now since those changes will be delivered by White Whale as part of the design specification. Instead, try some different searches, examine the results, and think of things the service could do to make the results more understandable to you. If the results of your search are way off-base for the search terms you used, let me know about that too. And, of course, this is such a rough prototype that there are bound to be bugs, which I’d be happy to fix.

Here are some things I already know (but if you agree or disagree, let me know):

  1. The Directory search is *really* slow. This is because we’re doing a simple search, rather than the advanced search you see on the general Directory interface. I don’t know if you’re searching for a name, a phone number, or a department, so I search all the fields for all the terms. As you might imagine, this isn’t a very fast search algorithm. What might we do to improve this?
  2. You only get 4 Google search results. I’m using Google’s RESTful search API, rather than the AJAX search API. I wanted to avoid use of JavaScript on this interface so that it would be as accessible as possible for all users. The RESTful API is much less flexible in terms of what results you can receive. They’re only giving me four at a time and I suspect this is because it’s not really in Google’s interest to let us create a version of their search results page without any of their advertising. I could improve this by using the AJAX version, which would require using JavaScript. But if I start using JS for this, I could use it for all the searches and do them asynchronously, which would improve response time on the page (you wouldn’t need to wait for the Directory searches to load to see the library catalog results, for example). What are people’s thoughts on this?
  3. I want to extend this to include other search sources. GO is the next up, but it doesn’t have a search interface yet, so I have to write that before I can include it here. What other services might we include in this search interface?
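The asynchronous idea in item 2 is about browser-side JavaScript, but the same principle can be shown server-side. This is a sketch in Python using a thread pool, not the prototype’s code: each source is a hypothetical function that takes the search terms and returns results, and results are handed back as soon as each source finishes, so a slow Directory lookup doesn’t hold up the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_all(terms, sources, timeout=None):
    """Run every search source concurrently and yield (name, results) as each finishes.

    `sources` maps a display name to a function that accepts the search terms.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {pool.submit(fn, terms): name for name, fn in sources.items()}
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                yield name, future.result()
            except Exception as error:   # one failing source should not sink the page
                yield name, error
```

A caller would loop over `search_all(terms, sources)` and render each block of results as it arrives, showing an error message in place of any source that failed.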

You can comment on this at my blog in the comments field for this entry (http://chisel.middlebury.edu/wordpress/imcbride/2009/06/02/new-search-interfacenew-search-interface/) or by emailing me directly. I’d prefer comments to be posted to the blog so that we can have a public discussion about this, but I’ll respect the confidence of anyone who would prefer not to have their thoughts published.