ISO 2022 JIS Japanese encoding fails #17
Hi, thanks very much for your work on this repository, it's incredibly useful. We use it as the main character encoding library for CyberChef.
We've recently noticed an issue when trying to encode into ISO 2022 JIS Japanese, where only null bytes are returned. The affected CP numbers are 50220, 50221 and 50222.
Example code
Expected output
Actual output
Can you shed any light on this behaviour?
Another example that also fails:
Code
Expected output
Actual output
Thanks for sharing! The ISO 2022 codepages 5022{0,1,2,5,7} are definitely incorrect -- hiragana require a control sequence and those are not currently supported. Based on ECMA-35, the first kana "こ" should be encoded as 1B 24 42 24 33 (1B 24 42 to switch to the JIS double-byte encoding, 24 for the Hiragana subset and 33 for the actual character). This will require a direct implementation of control sequences and a new set of LUTs for the various character subsets.
PS: All of the generated codepages with source listed as "Windows 7" are assumed to either be single-byte or double-byte. Clearly that wasn't the case here.
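For illustration, here is a minimal Python sketch of the switching logic described above. It is not this library's implementation. It assumes Python's built-in euc_jp codec can stand in for the JIS X 0208 lookup table (EUC-JP stores the same two bytes with the high bit set), and the function name encode_iso2022_jp_sketch is hypothetical. It only handles ASCII and JIS X 0208 characters; half-width katakana and the other subsets used by 50221/50222 are out of scope.

```python
# Escape sequences that designate the active character set.
ESC_JIS_X_0208 = b"\x1b$B"  # 1B 24 42: switch to JIS double-byte encoding
ESC_ASCII = b"\x1b(B"       # 1B 28 42: switch back to ASCII


def encode_iso2022_jp_sketch(text: str) -> bytes:
    """Encode ASCII + JIS X 0208 text with explicit escape sequences.

    Assumption: the euc_jp codec is used as the lookup table; clearing
    the high bit of each EUC-JP byte yields the 7-bit JIS X 0208 code.
    """
    out = bytearray()
    in_double_byte = False
    for ch in text:
        if ord(ch) < 0x80:
            # ASCII: make sure the ASCII set is active, then emit the byte.
            if in_double_byte:
                out += ESC_ASCII
                in_double_byte = False
            out.append(ord(ch))
            continue
        euc = ch.encode("euc_jp")  # raises UnicodeEncodeError if unmappable
        # JIS X 0208 characters are exactly two bytes, both >= 0xA1.
        # (0x8E / 0x8F prefixes mark half-width katakana / JIS X 0212.)
        if len(euc) != 2 or euc[0] < 0xA1:
            raise ValueError(f"{ch!r} is outside JIS X 0208 in this sketch")
        if not in_double_byte:
            out += ESC_JIS_X_0208
            in_double_byte = True
        out += bytes(b & 0x7F for b in euc)
    # Return to ASCII at the end of the stream.
    if in_double_byte:
        out += ESC_ASCII
    return bytes(out)


print(encode_iso2022_jp_sketch("こ").hex(" "))
# → 1b 24 42 24 33 1b 28 42
```

The output begins with the 1B 24 42 24 33 sequence quoted above, followed by 1B 28 42 to switch back to ASCII. The point of the sketch is that the encoder has to track which character set is currently active, which a plain single-byte or double-byte table lookup cannot do.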