Common Crawl maintains a free,open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non–profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
Thom Vaughan and Pedro Ortiz Suarez discussed the power of Common Crawl’s open web data in driving research and innovation during two notable presentations last week.