Use Cases
Articles
sclachar
Aysun Akarsu
Paola Villarrela
Kalev Leetaru
Julien Nioche
Jed Sundwall, Sebastian Nagel, Dave Rocamora
Alexander Bezzubov
Robert Meusel, Christian Bizer
Introduction of the distributed, parallel extraction framework provided by the Web Data Commons project.
Dave Lester
Overview of Common Crawl with some example use cases.
Gulliame LeBourgeois
Mapping French open data actors on the web with Common Crawl.
Oskar Singer
Description of using Common Crawl data and NLP techniques to improve grammar and spelling correction, specifically homophones.
Ahad Rana
Overview of the original Common Crawl crawler (in use 2008-2013) discussing the Hadoop data processing pipeline, PageRank implementation, and the techniques used to optimize Hadoop.
Jesse Wang, Chris Bizer, Oliver Grisel, Soren Auer
Overview of Web Science, including a primer on the basic semantic web and Linked Open Data, followed by DBpedia, the Linked Data Integration Framework (LDIF), the Common Crawl Database, and Web Data Commons.
Stephen Merity
Using the Common Crawl data to perform wide-scale analysis over billions of web pages to investigate the impact of Google Analytics and what this means for privacy on the web at large.
Amazon Web Services
Discussion of how open, public datasets can be harnessed using the AWS cloud. Covers large data collections (such as the 1000 Genomes Project and the Common Crawl) and explains how you can process billions of web pages and trillions of genes to find new insights into society.
Primal Pappachan
Centipede: Analyzing web crawl data for context of a location
Open Analytics
A tutorial on democratizing data development that references Common Crawl.
Lisa Green
Common Crawl: An Open Repository of Web Data
Joe Griffin
Learn how iAcquire scaled identification of credible content producers, with credibility based on authorship proliferation. Common Crawl was used as the seed source.
Hannes Mühleisen
AWS Summit Berlin 2012 Talk on Web Data Commons. Large-Scale Web Analysis now possible with Common Crawl datasets
Chris Bizer
Large focus on Common Crawl Corpus and Web Data Commons Project
Andreas Maletti
References Common Crawl Corpus
Steve Salevan
In this screencast, we’ll show you how to go from having no prior experience with scale data analysis to being able to play with 40TB of web crawl information, and we’ll do it in five minutes.
Stephen Merity
C205: Efficiently Tackling Common Crawl Using MapReduce & Amazon EC2
Sebastian Spiegler
Sebastian Spiegler, leader of the data team at SwiftKey, talks about the value of web crawl data, his research, and why open data is important.
Lisa Green
Lisa Green, “Digital Preservation for Machine-Scale Access and Analysis”
Lisa Green
“Data Track” Keynote at Data Days 2012 by Lisa Green from Common Crawl Foundation, recorded in Berlin, October 1st 2012.
Lisa Green
“Data Track” Panel at Data Days 2012 with Stephan Baumann (German Science Institute for Artificial Intelligence), Daniel Dietrich (Open Data Foundation), Lisa Green (Common Crawl Foundation, San Francisco), Christopher Steiner (Best Selling Author, Chicago), Matt Turck (Bloomberg Ventures, NYC)
Prashant Sharma
A demo of how to process big data on Spark in a shell, using N-gram data (with N=6) from the Common Crawl corpus and showing some interesting possibilities with queries.
Jordan Mendelson
The general topic will be around utilizing open data and cloud computing resources so that everyone can benefit from modern big data methods.
Lisa Green and Jordan Mendelson
Lisa Green and Jordan Mendelson present Common Crawl, a Web crawl made publicly accessible for further research and dissemination. In a second talk, Peter Adolphs introduces MIA, a Cloud-based platform for analyzing Web-scale data sets with a toolbox of natural language processing algorithms.