Read about the Increase of Common Crawl citations in academic research
Research Papers
Web Graph Strategies Against Unreliable News
Peter Carragher, Evan M. Williams, Kathleen M. Carley
Misinformation Resilient Search Rankings with Webgraph-based Interventions
Analyzing the Australian Web with Web Graphs
Xian Gong, Paul X. McCarthy, Marian-Andrei Rizoiu, Paolo Boldi
Harmony in the Australian Domain Space
The Dangers of Hijacked Hyperlinks
Kevin Saric, Felix Savins, Gowri Sankar Ramachandran, Raja Jurdak, Surya Nepal
Hyperlink Hijacking: Exploiting Erroneous URL Links to Phantom Domains
Enhancing Computational Analysis
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Computation and Language
Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé, David Griol, Zoraida Callejas
esCorpius: A Massive Spanish Crawling Corpus
The Web as a Graph (Master's Thesis)
Marius Løvold Jørgensen, UiT The Arctic University of Norway
BacklinkDB: A Purpose-Built Backlink Database Management System
Internet Censorship
Sadia Nourin et al., University of Maryland
Measuring and Evading Turkmenistan’s Internet Censorship
Internet Security: Phishing Websites
Asadullah Safi, Satwinder Singh
A Systematic Literature Review on Phishing Website Detection Techniques
More on Google Scholar
Curated BibTeX Dataset