Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
The Common Crawl team attended the 2nd Conference on Language Modeling in Montréal, organizing a workshop, giving invited talks, and strengthening links with the research community.
Malte Ostendorff
Malte is a Senior Research Engineer at Common Crawl, based in Berlin, Germany. He holds a Ph.D. in computer science from the University of Göttingen.