Erratum
WARC-Target-URI May Include Non-ASCII Characters
Originally reported by
.
The WARC-Target-URI header in WARC records, as well as in the corresponding WAT, WET, and URL index records, may include non-ASCII characters that are not encoded using percent-encoding or Punycode. The issue has been fixed for the June 2024 crawl (CC-MAIN-2024-26). Additional information is provided in the corresponding Crawler Issue Report.
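For consumers of affected data, the following is a minimal sketch of how such URIs might be detected and brought into an ASCII-only form. It is not the fix applied by the crawler. The function name `normalize_target_uri` and the sets of characters left unescaped are assumptions made for illustration. The host is converted with Python's built-in `idna` codec (IDNA 2003), and the remaining components are percent-encoded as UTF-8.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding (an assumption for this
# sketch). "%" is kept so already-encoded sequences are not encoded twice.
_SAFE_PATH = "/%:@!$&'()*+,;="
_SAFE_QUERY = "/?%:@!$&'()*+,;="


def normalize_target_uri(uri: str) -> str:
    """Return an ASCII-only form of `uri` (hypothetical helper)."""
    if uri.isascii():
        # Nothing to do: the URI is already ASCII-only.
        return uri

    parts = urlsplit(uri)

    # Split optional userinfo from host[:port].
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host = parts.hostname or ""
    if not host.isascii():
        # Internationalized host name: convert each label to Punycode.
        port = f":{parts.port}" if parts.port is not None else ""
        hostport = host.encode("idna").decode("ascii") + port
    netloc = quote(userinfo, safe=_SAFE_PATH) + at + hostport

    # Percent-encode non-ASCII characters in the other components as UTF-8.
    path = quote(parts.path, safe=_SAFE_PATH)
    query = quote(parts.query, safe=_SAFE_QUERY)
    fragment = quote(parts.fragment, safe=_SAFE_QUERY)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(normalize_target_uri("https://bücher.example/straße?q=é"))
```

A URI that is already ASCII-only is returned unchanged, so the function can be applied to every WARC-Target-URI value without first checking which crawl a record comes from.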
Affected Crawls
Affected Web Graphs
No items found.