December 17, 2024

Common Crawl Foundation at NeurIPS 2024: Expanding Horizons and Building Connections

Note: this post has been marked as obsolete.
The Common Crawl Foundation attended NeurIPS 2024, connecting with organizations, hosting a social event on tech and social impact, and showcasing contributions to AI research and data access.
Stephen Burns
Stephen Burns is an accomplished marketing leader with a comprehensive background in digital and event marketing.

Last week, members of the Common Crawl Foundation team—Chris, Greg, Jason, Rich, Sam, Stephen, and Wayne—attended the Neural Information Processing Systems (NeurIPS) Conference at the Vancouver Convention Centre in downtown Vancouver, BC. Set against the backdrop of Vancouver’s stunning waterfront, with snow-capped mountains and vibrant cityscape, the conference drew over 7,000 attendees from around the world.

Attendees at the NeurIPS 2024 conference in Vancouver BC.

Meaningful Connections and Opportunities

We attended NeurIPS with the goal of understanding potential partnerships and learning from the AI research community. During the conference, we had the opportunity to meet with people from over 40 organizations, each conversation offering insights into potential collaborations and ways we might support the broader AI ecosystem.

Common Crawl and Wikimedia Social: Bridging Tech and Social Impact

Our signature event at NeurIPS was a compelling social gathering titled "Nonprofits Bridging Tech and Social Impact." This two-hour event brought together over 60 participants from academia and industry, showcasing the critical work of nonprofit technology organizations.

Common Crawl’s CTO Greg Lindahl presenting at NeurIPS 2024.

Presentations

  • An introduction to Wikimedia and Common Crawl, outlining our respective missions
  • An exploration of Common Crawl's dataset quality and the complexities of web crawling, presented by Greg Lindahl
  • A deep dive into Wikipedia's editing landscape and community dynamics, presented by Chris Petrillo
  • An interactive Q&A session that sparked robust discussion

The event then moved into roundtable discussions, which also served as a networking opportunity. Participants from various backgrounds exchanged ideas about AI, technology, and social impact.

Additional Conference Highlights

We were excited to support our colleague Professor Ludwig Schmidt, who delivered a highly effective tutorial titled "Advancing Data Selection for Foundation Models: From Heuristics to Principled Methods." His presentation explored critical approaches to data selection in foundation model training, discussing everything from algorithmic foundations to practical data curation techniques. The session delved into attribution-based approaches, diversity-based methods, and emerging strategies for optimizing model performance through intelligent data selection.

Prof. Ludwig Schmidt presenting his slide featuring his annotations on the celebrated xkcd 2347.

We were also lucky to meet Dr. Fei-Fei Li, a key industry leader often referred to as the "Godmother of AI" and co-founder of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). She reiterated how critical Common Crawl’s work is to the industry. Dr. Li also gave one of the Key Invited Talks at the conference and highlighted Common Crawl on Slide #104 of her presentation.

Dr. Fei-Fei Li presenting at NeurIPS 2024.

Team members also took part in several standout events. Rich Skrenta and Greg Lindahl attended a dinner with MLCommons, sponsored by Tola Capital. The evening featured an impressive lineup of speakers, including Lora Aroyo from Google, Sara Hooker from Cohere, Rishi Bommasani from Stanford University, and Peter Mattson from MLCommons and Google.

Left-to-right: Sam Reddy, Stephen Burns, Wayne Yamamoto, Jason Grey, Greg Lindahl, Chris Tolles, Rich Skrenta

Looking Forward

The NeurIPS conference was a resounding success, strengthening our connections and highlighting Common Crawl’s role in the AI research community. We look forward to building on these partnerships and continuing to provide high-quality, open-access web data to support innovation in AI.
