
Communications of the ACM

ACM TechNews

Putting the Web in a Spreadsheet



IBM researchers have developed BigSheets, a Hadoop-based data analysis tool designed to help users analyze large Web data sets. BigSheets uses Hadoop to comb through Web pages, looking for key terms and other data. It then organizes the information in a very large spreadsheet, which users can analyze with ordinary spreadsheet software.
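The general idea of scanning pages for key terms and laying the results out as a spreadsheet can be sketched in a few lines of Python. This is only an illustration in a map/reduce style, not BigSheets' actual implementation: the function names (`map_page`, `reduce_counts`, `to_sheet`), the sample pages, and the CSV layout are all assumptions made for the example, and it runs in a single process rather than on a Hadoop cluster.

```python
import csv
import io
import re
from collections import Counter


def map_page(url, text, terms):
    """Map step (hypothetical): emit ((url, term), count) for each key term found in one page."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [((url, term), words[term]) for term in terms if words[term] > 0]


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the counts emitted for each (url, term) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def to_sheet(totals, terms):
    """Lay the totals out as CSV text: one row per page, one column per term.

    Pages in which no term was found emit nothing in the map step,
    so they do not appear in the sheet.
    """
    urls = sorted({url for url, _ in totals})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", *terms])
    for url in urls:
        writer.writerow([url, *(totals[(url, term)] for term in terms)])
    return out.getvalue()


# Made-up sample data, for illustration only.
pages = {
    "a.example": "Hadoop and more Hadoop",
    "b.example": "Spreadsheet data, data.",
}
terms = ["hadoop", "data"]

pairs = [pair for url, text in pages.items() for pair in map_page(url, text, terms)]
sheet = to_sheet(reduce_counts(pairs), terms)
print(sheet)
```

The resulting CSV text can be opened in any spreadsheet program. In a real Hadoop job the map and reduce steps would be distributed across the cluster's machines instead of running in one loop.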

BigSheets also works with an IBM visualization tool called Many Eyes, as well as other visualization software.

IBM first tested BigSheets at the British Library, which has been working to create an archive of about eight million U.K. Web sites. In less than eight hours, BigSheets took 4.5 terabytes of archived files and processed them using a Hadoop cluster of four machines.

University of Michigan professor Eytan Adar says BigSheets is useful because it compares data across many different pages as well as over time. He says effective visualizations are "crucial for letting users quickly understand large collections of data."

View a video of IBM Vice President of Emerging Technology Rod Smith discussing BigSheets.

From Technology Review

Abstracts Copyright © 2010 Information Inc., Bethesda, Maryland, USA