Erratum

WARC-Target-URI May Include Non-ASCII Characters

Originally reported by 
.

The WARC-Target-URI header in WARC records, as well as in the corresponding WAT, WET, and URL index records, may include non-ASCII characters that are not encoded using percent-encoding or Punycode. The issue has been fixed for the June 2024 crawl (CC-MAIN-2024-26). Additional information is provided in the corresponding Crawler Issue Report.
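Consumers of affected crawls may want to detect and normalize such URIs themselves. The following is a minimal sketch in Python, not the fix applied in the crawler. The function names `has_non_ascii` and `to_ascii_uri` are hypothetical. The sketch assumes that non-ASCII host names should be converted to Punycode and that all other non-ASCII characters should be percent-encoded as UTF-8. It uses Python's built-in `idna` codec, which implements IDNA 2003 and may differ from other IDNA implementations on some host names.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding. "%" is kept so that
# existing percent-escapes are not encoded a second time.
_SAFE_PATH = "/%:@!$&'()*+,;="
_SAFE_QUERY = _SAFE_PATH + "?"


def has_non_ascii(uri: str) -> bool:
    """Return True if the URI contains any character outside ASCII."""
    return not uri.isascii()


def to_ascii_uri(uri: str) -> str:
    """Return an ASCII-only form of the URI (hypothetical helper).

    Host: Punycode via the built-in "idna" codec (IDNA 2003).
    Path, query, fragment: UTF-8 percent-encoding.
    Non-ASCII userinfo is not handled in this sketch.
    """
    if not has_non_ascii(uri):
        return uri

    parts = urlsplit(uri)
    netloc = parts.netloc
    host = parts.hostname

    if host and not host.isascii():
        ascii_host = host.encode("idna").decode("ascii")
        # Split "user@host:port" so that only the host is replaced.
        userinfo, at, hostport = netloc.rpartition("@")
        _, colon, port = hostport.rpartition(":")
        netloc = userinfo + at + ascii_host + (colon + port if colon else "")

    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE_PATH),
        quote(parts.query, safe=_SAFE_QUERY),
        quote(parts.fragment, safe=_SAFE_QUERY),
    ))


if __name__ == "__main__":
    print(to_ascii_uri("https://bücher.example/straße?q=café"))
    # https://xn--bcher-kva.example/stra%C3%9Fe?q=caf%C3%A9
```

URIs that are already ASCII-only are returned unchanged, so the helper can be applied uniformly to every WARC-Target-URI value read from an affected crawl. Note that the normalized form may not match the URI string stored in the URL index of an affected crawl, so lookups may need to try both forms.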

Affected Crawls
Affected Web Graphs
No items found.