April 16, 2025

IIPC General Assembly & Web Archiving Conference 2025

Note: this post has been marked as obsolete.
The Common Crawl team attended the 2025 IIPC General Assembly and Web Archiving Conference in Oslo, presenting recent work and participating in discussions on web preservation.
Thom Vaughan
Thom is Principal Technologist at the Common Crawl Foundation.

Last week, members of the Common Crawl Foundation team (Sebastian Nagel, Pedro Ortiz Suarez, and Thom Vaughan) attended the 2025 IIPC General Assembly (GA) and Web Archiving Conference (WAC), hosted by the National Library of Norway in Oslo. As new members of the IIPC, we are thrilled to join a global community of organizations committed to preserving the web for future generations, and to have had the chance to present some of our work to colleagues in the web archiving space.

National Library of Norway: the home of Norwegian knowledge and heritage in Oslo

Common Crawl delivered a range of contributions including poster presentations, lightning talks, and a workshop. These were very well received, and we appreciated the many conversations that followed.

Sebastian Nagel presenting for Common Crawl at the IIPC General Assembly 2025 at the National Library of Norway

We had the opportunity to (re)connect with representatives from several national libraries, including those of Norway, Sweden, Denmark, France, and the Netherlands, as well as researchers and professionals from industry and academia.

Pedro Ortiz Suarez presenting at the IIPC WAC 2025 for Common Crawl on ARC and WARC formats

Among our lightning talks, posters, and workshops, our team gave presentations during the General Assembly and Web Archiving Conference on:

Thom Vaughan presenting at the IIPC WAC 2025 for Common Crawl on the Robots Exclusion Protocol

Our team also met with Stephan Oepen from the University of Oslo, and colleagues from the End of Term Archive project with whom we’ve collaborated on the EOT 2024: Ilya Kreymer of Webrecorder, Sawood Alam of the Internet Archive, and Mark Phillips of the University of North Texas Libraries.

Left to right: Thom Vaughan, Ilya Kreymer, Sawood Alam, Mark Phillips, Pedro Ortiz Suarez, Sebastian Nagel: members of the End of Term Archive team met at the IIPC WAC 2025

We’re looking forward to more discussions with our friends (new and old) from IIPC in the near future.


Erratum: Content is truncated


Some archived content is truncated due to fetch size limits imposed during crawling. This is necessary to handle infinite or exceptionally large data streams (e.g., radio streams). Prior to March 2025 (CC-MAIN-2025-13), the truncation threshold was 1 MiB. From the March 2025 crawl onwards, this limit has been increased to 5 MiB.

For more details, see our truncation analysis notebook.
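
As a rough illustration of the thresholds described above, here is a minimal Python sketch that maps a crawl ID to the fetch size limit that would have applied. It assumes that CC-MAIN-2025-13 is the first crawl with the 5 MiB limit, and the function names (`truncation_limit_bytes`, `may_be_truncated`) are hypothetical, not part of any Common Crawl tooling. The length comparison is only a heuristic; the WARC format also defines a `WARC-Truncated` header, which is likely a more reliable signal when it is present on a record.

```python
import re

MIB = 1024 * 1024

# Assumption: CC-MAIN-2025-13 (March 2025) is the first crawl with the 5 MiB limit.
FIRST_5_MIB_CRAWL = (2025, 13)


def truncation_limit_bytes(crawl_id: str) -> int:
    """Return the assumed fetch size limit in bytes for an ID like 'CC-MAIN-2025-13'."""
    match = re.fullmatch(r"CC-MAIN-(\d{4})-(\d{2})", crawl_id)
    if match is None:
        raise ValueError(f"unrecognized crawl ID: {crawl_id!r}")
    year, week = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders crawls by year first, then by week number.
    return 5 * MIB if (year, week) >= FIRST_5_MIB_CRAWL else 1 * MIB


def may_be_truncated(payload_length: int, crawl_id: str) -> bool:
    """Heuristic: a payload at or above the limit was probably cut off."""
    return payload_length >= truncation_limit_bytes(crawl_id)
```

For example, a 1 MiB payload would be flagged as possibly truncated in a 2024 crawl, but not in CC-MAIN-2025-13 or later, where the limit is assumed to be 5 MiB.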