Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
A recent article in The Atlantic makes several false and misleading claims about the Common Crawl Foundation, including the accusation that our organization has “lied to publishers” about our activities.
Rich Skrenta
Rich is the Executive Director of the Common Crawl Foundation. He is an experienced technologist and serial entrepreneur with a background in the search and social spaces.