Beyond the ACLS Report: An interview with John Unsworth

by Kevin Guthrie, Ithaka

I sat down with John Unsworth for 90 minutes at the American Library Association’s conference in Washington, DC. John is Dean and Professor of the Graduate School of Library and Information Science at the University of Illinois, Urbana-Champaign and chaired the ACLS Commission on Cyberinfrastructure for the Social Sciences and the Humanities. Its report, Our Cultural Commonwealth, was released in December 2006. See Gary Wells’s review of the report in this issue. While the Commission’s report does not focus directly on liberal arts colleges, the development of a working cyberinfrastructure for the humanities and social sciences would definitely affect them, presumably in positive ways that would enhance their teaching and learning capacities.

Kevin Guthrie: I have to start with the somewhat obvious question. You have been thinking about this and talking about this for a long time: what exactly is cyberinfrastructure?
John Unsworth: We worked hard on the definition of cyberinfrastructure for the report and there’s a pretty good one there that builds on the one established in the 2003 National Science Foundation (NSF) report, Revolutionizing Science and Engineering Through Cyberinfrastructure (to which the ACLS Report was one of many responses). I like to think of cyberinfrastructure as the middle layer of a cake. The base layer is all of the hardware and basic operating systems-level technology on the network. Fiber optic cables, storage devices, things like that. The icing is made up of specific applications to serve a particular purpose. Software applications and tools that can be shared by different people for different purposes represent the middle layer of the cake and are what we mean by cyberinfrastructure. It is important to point out that cyberinfrastructure is not just equipment or software, it also includes the human interactions, protocols, standards, work processes, and so on, needed to make the system work and to structure collaborative or related activities. So, in the case of highways, the infrastructure is not just the roads, it is the maintenance crews that keep them functional, the speed limits and police cruisers that ensure safety, the understanding that you pass on the left. All of these elements are part of the “infrastructure.”

One of the compelling quotes from the NSF report is “if infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy.”[1] Your definition is helpful, but I do sometimes feel challenged to get my arms around how to build such an infrastructure on a system-wide basis. Are there any real-world examples of cyberinfrastructure yet? Would you say that eBay is a kind of cyberinfrastructure that facilitates the online exchange of goods? It has software applications, services for exchanging money (such as PayPal), acceptable mechanisms for posting items for auction, social protocols for evaluating the quality and reliability of sellers and buyers. Is that a reasonable example?
Yes, I think that’s a good one. One of the other ways of thinking about the development of cyberinfrastructure is to recognize that it is not something that gets established in one fell swoop. It’s so broad and encompassing that it tends to get built incrementally. We’ve seen the first wave in the building of the cyberinfrastructure that will transform scholarship in the form of digitized content. Over the last decade or so, libraries, publishers, individuals and nonprofits have created great quantities of digitized content with tools and applications to facilitate their use. When that content is combined with the born-digital content being created every day on the Web you have a huge layer of content on which to build valuable tools. Search engines are a layer on top of that content. Together these establish a base component of cyberinfrastructure upon which we can build.

Your example highlights the fast pace and relentless nature of technological progress. What starts out as new and innovative becomes a commodity layer so fast. A publisher might have built unique content and a home-grown search engine in the late 1990s and could have built a loyal and growing following based on the value of searching that content. Then, enterprises like Google enter the search business and their tools become widely used so that publishers will have to add a new layer to the cyberinfrastructure to continue to be useful and valuable.

Let me step back for a moment to the process of creating the cyberinfrastructure report. One thing I noticed in the description of that process was that there was international representation on the commission. Was there a difference in how representatives from other parts of the world viewed the challenge as compared to here in the U.S.?
I was struck by the difference in the funding structures for higher education and scholarship. For the most part, in other parts of the world, there is very little private philanthropy. There are some exceptions, for example the Wellcome Trust in the U.K., but for the most part there are no major foundations providing recurring grant funding into the environment, and government funding may not distinguish between the sciences and the humanities. I have to admit that initially the international model struck me as the better model, because it means that (apparently) you are competing in the same larger funding environment as your computer-science colleagues, and when funding is dominated by government-funded agencies, especially a single agency as is the case in many places, there is a huge opportunity to ensure that activities are coordinated and even integrated. But over the course of watching reactions to our report play out in the community, I’ve come to appreciate the flexibility and vibrancy of the system here in the U.S. and the value in having diverse funding options and opportunities. Yes, there is less vertical integration, but there is some protection in that. The announcement earlier this year of a reduction of funds to the Library of Congress’s NDIIPP program[2] is one example of the perils in relying too much on single-source government funding, as are the cutbacks to the Arts and Humanities Data Service in the U.K.[3] So I have come to believe that one system is not necessarily better than the other; each has its pros and cons.

Continuing on the international theme, did the commission engage in issues related to cyberinfrastructure in under-resourced parts of the world, such as in developing countries?
You might be surprised at what counts as an under-resourced part of the world, with respect to technology. It’s not just countries in the developing world that have a long way to go in building cyberinfrastructure. For example, when I was at the University of Virginia, we brought in a number of American Studies scholars from around the world to ask what they would like to have networked access to in our library’s special collections. Their response was that the collections were great, but in Ireland, for one example, students wouldn’t have the lab or classroom facilities to make working with networked resources practical.

More than just the base layer, though, what is definitely in short supply at some institutions are the human resources that are needed to make cyberinfrastructure work. There are simply not enough skilled people out there, and this is very true of small colleges in this country as well as overseas. And even if you are fortunate at a small place to have some capability to help enable digital scholarship, that human capacity is mobile. You can invest a lot in it and it can leave you. If a key person leaves a small place, you can be totally back to square one. That is why it is important to remember that cyberinfrastructure includes people.

So there was the cyberinfrastructure report for the sciences, and then there is this report for the humanities and social sciences. Is there a difference? Why is there a need for more than one report?
In many ways there is convergence and the distinctions are blurring. But there are some very real differences in the way people do their work across disciplines that have to be taken into account in the resources and tools they need and will use. In some ways, this is less about the discipline per se, and more about the nature of the resources they depend on. So, for example, in areas of the sciences that depend heavily on massive quantities of observed data (say, from the Hubble telescope) there is already considerable collaboration and tools are needed to facilitate that. In areas of the humanities there are areas of research that are more singular.

Having said that, though, I do believe we are moving to a world where there will be massive amounts of data that humanists will need to sort and understand. Like the data from the telescope, it will have to be processed in a way that will require large-scale data mining activity and that will promote more collaboration. There is going to be much more “distant reading” of texts.[4] By that I mean computers will “read” texts and process them in a variety of ways, in accordance with more sophisticated semantic comparison and search tools, and prepare them for higher level analysis by scholars. The mass digitization projects are going to accelerate this process.

Speaking of the mass digitization projects, Google announced its Google Books Library Project while the commission was writing up the report. Did that affect your deliberations and your final conclusions?
It didn’t impact our report that much. We saw the potential and pointed to Google’s effort and the Open Content Alliance as important examples. I think Google’s announcement had a bigger impact on the report’s reception, as humanities scholars could actually imagine that millions of books would be available in digital form. The idea of cyberinfrastructure became much more real to them. Prior to those projects such a thing was only the stuff of dreams.

Some concluding thoughts: what other impacts has the report had?
We have been encouraged by the way the report has been received as a framework to think about these issues, and maybe even to help inform foundations’ and government agencies’ grantmaking strategies. I have been told on numerous occasions that the report has been helpful in this regard. Last spring, the Council on Library and Information Resources convened a group of federal funding agencies and foundations to discuss follow-up to the report and there was some good progress. We then had a meeting to discuss the need and potential for centers of excellence to take components of cyberinfrastructure forward. One of the things that I learned through those conversations is that the funding agencies tend to operate like any other enterprise (colleges and universities included), and collaboration is hard. He who starts it, owns it. If the report can provide a structure that helps guide even a small amount of collective action or coordination of resources, that would be a very good thing. The requirements of cyberinfrastructure are beyond the means of any single funding agency; in fact they exceed the resources of those agencies combined. We require sustained investment from the private, governmental, nonprofit, and university sectors to realize the great potential that digital and network technologies offer for scholarship for the next century.

Kevin Guthrie is President of Ithaka.
[1] National Science Foundation (NSF), Revolutionizing Science and Engineering Through Cyberinfrastructure (2003), 5.

[2] $47 million was rescinded from the budget of the National Digital Information Infrastructure and Preservation Program, Feb 17, 2007. See, for example, “LC Hit By $47 Million Cut in Digital Preservation Funds,” Library Journal (March 20, 2007), (accessed September 2, 2007).

[3] The Arts and Humanities Research Council announced that it would cease funding the Arts and Humanities Data Service as of March 31, 2008. The JISC is engaged in a review to determine the future of the AHDS and has stated that it will not fund the service in its current form alone. See (accessed September 2, 2007).

[4] See Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (Verso, 2005).
