Web Archive Cooperative (WAC) at Harding University Research | Home

The Web Archive Cooperative (WAC) is a three-year NSF-sponsored project (1008492) whose goal is to address the barriers to accessing and sharing disparate archives of web data (e.g., archived web pages, Twitter updates, user-generated tags, and newsgroup postings). WAC is a joint venture with Hector Garcia-Molina and Andreas Paepcke at Stanford University, Michael L. Nelson at Old Dominion University, and Frank McCown at Harding University. See the Stanford WAC website for work performed at Stanford University and the Web Science and Digital Libraries Research Group blog to keep up to date with work performed at ODU.

Introduction to Web Science

Dr. McCown taught a course entitled Introduction to Web Science in Spring 2011. This was a survey course for junior and senior Computer Science majors that covered some of the fundamental concepts of Web Science: web architecture, web characterization and analysis, web archiving, Web 2.0, web search engines, analyses of social networks, collective intelligence, recommender systems, and clustering algorithms.

See the class web page for slides, homework problems, and project descriptions.

Dr. McCown presented a talk entitled Teaching an Introduction to Web Science Course to CS Undergraduates at the Teaching the Web with Web Science workshop at the Web Science 2012 conference (June 21, 2012 in Evanston, IL).

Summer Research

Harding Computer Science majors are hired as researchers each summer to work on WAC projects. Below is a summary of the research performed over the past few summers.

Vivens Ndatinya

Vivens worked on a project to automatically locate music videos that are no longer accessible on YouTube. For example, if someone posted a video to YouTube that was later taken down (because of copyright violations or other reasons), Vivens's project would help locate the same or a similar video elsewhere on YouTube.

Date: Summer 2011

Richard Schneider

Richard examined the Mobile Web from a web archiving perspective. He built a program that found web pages designed for mobile devices and measured their similarity to standard web pages. He also built a classifier that used web page features to determine whether a page was designed for a mobile device or a desktop.

JCDL 2013 paper: First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites

Date: Summer 2012
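The page does not say which similarity measure Richard's program used. As a minimal sketch of how a mobile page and a standard page might be compared, the following assumes Jaccard similarity over word tokens; the function names `tokens` and `jaccard_similarity` are hypothetical and are not taken from the project.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric word tokens in a page's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(mobile_text, desktop_text):
    """Jaccard similarity of two pages' token sets (assumed measure, not the project's)."""
    mobile, desktop = tokens(mobile_text), tokens(desktop_text)
    if not mobile and not desktop:
        return 1.0  # two empty pages are treated as identical
    return len(mobile & desktop) / len(mobile | desktop)
```

A score near 1.0 would suggest that the mobile page carries largely the same text as the standard page, while a score near 0.0 would suggest mostly different content.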

Daniel Sebastian

Daniel continued the work performed by Vivens the year before. He designed a Firefox add-on called Volitrax that automatically redirects the user to a new location of a music video on YouTube if the video is removed. The add-on communicates with a web server that stores information about music videos in Twitter, Tumblr, and Delicious, so the data is likely to survive long after the application itself is gone.

JCDL 2013 poster: Semi-Automated Rediscovery of Lost YouTube Music Videos

Date: Summer 2012

Heather Tweedy

Heather completed work on an iOS version of the Memento Browser. The web browser uses the Memento protocol to allow users to see archived versions of web pages in a seamless manner. The initial browser code was developed by Dr. Steve Baber and completed by Heather in August 2011. It is available for download from iTunes and the Google Play Store, and you can download the source code from here.

JCDL 2013 poster: A Memento Web Browser for iOS

Date: Summer 2012
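The Memento protocol (RFC 7089) lets a client ask for an archived version of a page by sending an `Accept-Datetime` request header containing the desired date in HTTP date format. The page does not describe how the Memento Browser is implemented, so the following is only a sketch of building that header; the function name `memento_request_headers` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_request_headers(when):
    """Build the Accept-Datetime header for a Memento datetime negotiation request.

    `when` must be a timezone-aware datetime; it is converted to UTC because
    the header value is expressed in GMT.
    """
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


headers = memento_request_headers(datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))
print(headers)  # {'Accept-Datetime': 'Thu, 31 May 2007 20:35:00 GMT'}
```

A client would send these headers with a request for the original page's URL to a Memento TimeGate, which responds with or redirects to the archived copy closest to the requested date.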

Monica Yarbrough

Monica continued Richard's work and built a web service called MobileFinder that returns the mobile website URL for a given desktop URL. Monica also ran an experiment to see how many popular websites follow Google's guidelines for mobile-optimized websites.

Date: Summer 2013
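The page does not explain how MobileFinder maps a desktop URL to its mobile counterpart. One plausible first step, shown here purely as an assumption, is to generate candidate URLs from common naming conventions such as an `m.` or `mobile.` subdomain and then check which candidates exist. The function `candidate_mobile_urls` and its `prefixes` parameter are hypothetical and are not part of the MobileFinder API.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_mobile_urls(desktop_url, prefixes=("m", "mobile")):
    """Guess possible mobile URLs for a desktop URL (assumed heuristic).

    Strips a leading "www." from the host and prepends each prefix as a
    subdomain. Any port or fragment in the input URL is dropped.
    """
    parts = urlsplit(desktop_url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return [
        urlunsplit((parts.scheme, f"{prefix}.{host}", parts.path, parts.query, ""))
        for prefix in prefixes
    ]


print(candidate_mobile_urls("http://www.example.com/news"))
# ['http://m.example.com/news', 'http://mobile.example.com/news']
```

Each candidate would still need to be requested and verified, since many sites use other conventions or serve mobile content from the same URL.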

Keith Enlow

Keith worked on archiving mobile websites using Heritrix. He modified Heritrix so the user could indicate whether they wanted the desktop or mobile version of a site. The modified Heritrix crawler uses the MobileFinder web service.

Date: Summer 2013

Exploring the WAC: Challenges in Providing Access to the World's Web Archives

The WAC Summer Workshop was held at Stanford University on June 29-30, 2012. Twenty-one graduate and undergraduate students from a number of universities attended the workshop free of charge (thanks to NSF grant 1009916). The workshop featured speakers from Stanford, Los Alamos National Laboratory, Internet Archive, UC Berkeley School of Law, California Digital Library, Microsoft Research, and other research labs.

You can read Scott Ainsworth's summary of the workshop here and my write-up about the trip here.

Web Archiving and Digital Libraries Workshop (WADL 2013)

Keith Enlow, Monica Yarbrough, and Frank McCown attended WADL 2013 in Indianapolis immediately following JCDL 2013. McCown gave an overview of archiving the mobile web, followed by Monica, who shared her work developing the MobileFinder web service, which web archivists can use to automatically locate mobile websites. Keith concluded the talk by showing how he had used the web service with Heritrix to archive several mobile websites. Here are the slides from the talk.

This material is based upon work supported by the National Science Foundation under Grant No. 1008492 (WAC). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).