+ACM Codepages for JS +AFs-Codepages+AF0(https://en.wikipedia.org/wiki/Codepage) are character encodings. In many contexts, single- or double-byte character sets are used in lieu of Unicode encodings. The codepages map between characters and numbers. +AFs-unicode.org+AF0(http://www.unicode.org/Public/MAPPINGS/) hosts lists of mappings. The build script automatically downloads and parses the mappings in order to generate the full script. The +AGA-pages.csv+AGA description in +AGA-codepage.md+AGA controls which codepages are used. +ACMAIw Setup In node: var cptable +AD0 require('codepage')+ADs In the browser: +ADw-script src+AD0AIg-cptable.js+ACIAPgA8-/script+AD4 +ADw-script src+AD0AIg-cputils.js+ACIAPgA8-/script+AD4 Alternatively, use the full version in the dist folder: +ADw-script src+AD0AIg-cptable.full.js+ACIAPgA8-/script+AD4 The complete set of codepages is large due to some Double Byte Character Set encodings. A much smaller file that just includes SBCS codepages is provided in this repo (+AGA-sbcs.js+AGA), as well as a file for other projects (+AGA-cpexcel.js+AGA) If you know which codepages you need, you can include individual scripts for each codepage. The individual files are provided in the +AGA-bits/+AGA directory. For example, to include only the Mac codepages: +ADw-script src+AD0AIg-bits/10000.js+ACIAPgA8-/script+AD4 +ADw-script src+AD0AIg-bits/10006.js+ACIAPgA8-/script+AD4 +ADw-script src+AD0AIg-bits/10007.js+ACIAPgA8-/script+AD4 +ADw-script src+AD0AIg-bits/10029.js+ACIAPgA8-/script+AD4 +ADw-script src+AD0AIg-bits/10079.js+ACIAPgA8-/script+AD4 +ADw-script src+AD0AIg-bits/10081.js+ACIAPgA8-/script+AD4 All of the browser scripts define and append to the +AGA-cptable+AGA object. To rename the object, edit the +AGA-JSVAR+AGA shell variable in +AGA-make.sh+AGA and run the script. The utilities functions are contained in +AGA-cputils.js+AGA, which assumes that the appropriate codepage scripts were loaded. +ACMAIw Usage The codepages are indexed by number. To get the unicode character for a given codepoint, use the +AGA-dec+AGA property: var unicode+AF8-cp10000+AF8-255 +AD0 cptable+AFs-10000+AF0.dec+AFs-255+AF0AOw // +Asc To get the codepoint for a given character, use the +AGA-enc+AGA property: var cp10000+AF8-711 +AD0 cptable+AFs-10000+AF0.enc+AFs-String.fromCharCode(711)+AF0AOw // 255 There are a few utilities that deal with strings and buffers: var +bEdgOw +AD0 cptable.utils.decode(936, +AFs-0xbb,0xe3,0xd7,0xdc+AF0)+ADs var buf +AD0 cptable.utils.encode(936, +bEdgOw)+ADs var sushi+AD0 cptable.utils.decode(65001, +AFs-0xf0,0x9f,0x8d,0xa3+AF0)+ADs // +2DzfYw var sbuf +AD0 cptable.utils.encode(65001, sushi)+ADs +AGA-cptable.utils.encode(CP, data, ofmt)+AGA accepts a String or Array of characters and returns a representation controlled by +AGA-ofmt+AGA: - Default output is a Buffer (or Array) of bytes (integers between 0 and 255). - If +AGA-ofmt +AD0APQ 'str'+AGA, return a String where +AGA-o.charCodeAt(i)+AGA is the ith byte - If +AGA-ofmt +AD0APQ 'arr'+AGA, return an Array of bytes +ACMAIw Known Excel Codepages A much smaller script, including only the codepages known to be used in Excel, is available under the name +AGA-cpexcel+AGA. It exposes the same variable +AGA-cptable+AGA and is suitable as a drop-in replacement when the full codepage tables are not needed. In node: var cptable +AD0 require('codepage/dist/cpexcel.full')+ADs +ACMAIw Rolling your own script The +AGA-make.sh+AGA script in the repo can take a manifest and generate JS source. Usage: bash make.sh path+AF8-to+AF8-manifest output+AF8-file+AF8-name JSVAR where - +AGA-JSVAR+AGA is the name of the exported variable (generally +AGA-cptable+AGA) - +AGA-output+AF8-file+AF8-name+AGA is the output file (e.g. +AGA-cpexcel.js+AGA, +AGA-cptable.js+AGA) - +AGA-path+AF8-to+AF8-manifest+AGA is the path to the manifest file. The manifest file is expected to be a CSV with 3 columns: +ADw-codepage number+AD4,+ADw-source+AD4,+ADw-size+AD4 If a source is specified, it will try to download the specified file and parse. The file format is expected to follow the format from the unicode.org site. The size should be +AGA-1+AGA for a single-byte codepage and +AGA-2+AGA for a double-byte codepage. For mixed codepages (which use some single- and some double-byte codes), the script assumes the mapping is a prefix code and generates efficient JS code. Generated scripts only include the mapping. +AGA-cat+AGA a mapping with +AGA-cputils.js+AGA to produce a complete script like +AGA-cpexcel.full.js+AGA. +ACMAIw Building the complete script This script uses +AFs-voc+AF0(npm.im/voc). The script to build the codepage tables and the JS source is +AGA-codepage.md+AGA, so building is as simple as +AGA-voc codepage.md+AGA. +ACMAIw Generated Codepages The complete list of hardcoded codepages can be found in the file +AGA-pages.csv+AGA. Some codepages are easier to implement algorithmically. Since these are hardcoded in utils, there is no corresponding entry (they are +ACI-magic+ACI) +AHw CP+ACM +AHw Information +AHw Description +AHw +AHw --: +AHw :----------: +AHw :---------- +AHw +AHw 37+AHw unicode.org +AHw-IBM EBCDIC US-Canada +AHw 437+AHw unicode.org +AHw-OEM United States +AHw 500+AHw unicode.org +AHw-IBM EBCDIC International +AHw 620+AHw NLS +AHw-Mazovia (Polish) MS-DOS +AHw 708+AHw-MakeEncoding.cs+AHw-Arabic (ASMO 708) +AHw 720+AHw-MakeEncoding.cs+AHw-Arabic (Transparent ASMO)+ADs Arabic (DOS) +AHw 737+AHw unicode.org +AHw-OEM Greek (formerly 437G)+ADs Greek (DOS) +AHw 775+AHw unicode.org +AHw-OEM Baltic+ADs Baltic (DOS) +AHw 850+AHw unicode.org +AHw-OEM Multilingual Latin 1+ADs Western European (DOS) +AHw 852+AHw unicode.org +AHw-OEM Latin 2+ADs Central European (DOS) +AHw 855+AHw unicode.org +AHw-OEM Cyrillic (primarily Russian) +AHw 857+AHw unicode.org +AHw-OEM Turkish+ADs Turkish (DOS) +AHw 858+AHw-MakeEncoding.cs+AHw-OEM Multilingual Latin 1 +- Euro symbol +AHw 860+AHw unicode.org +AHw-OEM Portuguese+ADs Portuguese (DOS) +AHw 861+AHw unicode.org +AHw-OEM Icelandic+ADs Icelandic (DOS) +AHw 862+AHw unicode.org +AHw-OEM Hebrew+ADs Hebrew (DOS) +AHw 863+AHw unicode.org +AHw-OEM French Canadian+ADs French Canadian (DOS) +AHw 864+AHw unicode.org +AHw-OEM Arabic+ADs Arabic (864) +AHw 865+AHw unicode.org +AHw-OEM Nordic+ADs Nordic (DOS) +AHw 866+AHw unicode.org +AHw-OEM Russian+ADs Cyrillic (DOS) +AHw 869+AHw unicode.org +AHw-OEM Modern Greek+ADs Greek, Modern (DOS) +AHw 870+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Multilingual/ROECE (Latin 2) +AHw 874+AHw unicode.org +AHw-Windows Thai +AHw 875+AHw unicode.org +AHw-IBM EBCDIC Greek Modern +AHw 895+AHw NLS +AHw-Kamenick+AP0 (Czech) MS-DOS +AHw 932+AHw unicode.org +AHw-Japanese Shift-JIS +AHw 936+AHw unicode.org +AHw-Simplified Chinese GBK +AHw 949+AHw unicode.org +AHw-Korean +AHw 950+AHw unicode.org +AHw-Traditional Chinese Big5 +AHw 1026+AHw unicode.org +AHw-IBM EBCDIC Turkish (Latin 5) +AHw 1047+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Latin 1/Open System +AHw 1140+AHw-MakeEncoding.cs+AHw-IBM EBCDIC US-Canada (037 +- Euro symbol) +AHw 1141+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Germany (20273 +- Euro symbol) +AHw 1142+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Denmark-Norway (20277 +- Euro symbol) +AHw 1143+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Finland-Sweden (20278 +- Euro symbol) +AHw 1144+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Italy (20280 +- Euro symbol) +AHw 1145+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Latin America-Spain (20284 +- Euro symbol) +AHw 1146+AHw-MakeEncoding.cs+AHw-IBM EBCDIC United Kingdom (20285 +- Euro symbol) +AHw 1147+AHw-MakeEncoding.cs+AHw-IBM EBCDIC France (20297 +- Euro symbol) +AHw 1148+AHw-MakeEncoding.cs+AHw-IBM EBCDIC International (500 +- Euro symbol) +AHw 1149+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Icelandic (20871 +- Euro symbol) +AHw 1200+AHw magic +AHw-Unicode UTF-16, little endian (BMP of ISO 10646) +AHw 1201+AHw magic +AHw-Unicode UTF-16, big endian +AHw 1250+AHw unicode.org +AHw-Windows Central Europe +AHw 1251+AHw unicode.org +AHw-Windows Cyrillic +AHw 1252+AHw unicode.org +AHw-Windows Latin I +AHw 1253+AHw unicode.org +AHw-Windows Greek +AHw 1254+AHw unicode.org +AHw-Windows Turkish +AHw 1255+AHw unicode.org +AHw-Windows Hebrew +AHw 1256+AHw unicode.org +AHw-Windows Arabic +AHw 1257+AHw unicode.org +AHw-Windows Baltic +AHw 1258+AHw unicode.org +AHw-Windows Vietnam +AHw 1361+AHw-MakeEncoding.cs+AHw-Korean (Johab) +AHw-10000+AHw unicode.org +AHw-MAC Roman +AHw-10001+AHw-MakeEncoding.cs+AHw-Japanese (Mac) +AHw-10002+AHw-MakeEncoding.cs+AHw-MAC Traditional Chinese (Big5) +AHw-10003+AHw-MakeEncoding.cs+AHw-Korean (Mac) +AHw-10004+AHw-MakeEncoding.cs+AHw-Arabic (Mac) +AHw-10005+AHw-MakeEncoding.cs+AHw-Hebrew (Mac) +AHw-10006+AHw unicode.org +AHw-Greek (Mac) +AHw-10007+AHw unicode.org +AHw-Cyrillic (Mac) +AHw-10008+AHw-MakeEncoding.cs+AHw-MAC Simplified Chinese (GB 2312) +AHw-10010+AHw-MakeEncoding.cs+AHw-Romanian (Mac) +AHw-10017+AHw-MakeEncoding.cs+AHw-Ukrainian (Mac) +AHw-10021+AHw-MakeEncoding.cs+AHw-Thai (Mac) +AHw-10029+AHw unicode.org +AHw-MAC Latin 2 (Central European) +AHw-10079+AHw unicode.org +AHw-Icelandic (Mac) +AHw-10081+AHw unicode.org +AHw-Turkish (Mac) +AHw-10082+AHw-MakeEncoding.cs+AHw-Croatian (Mac) +AHw-12000+AHw magic +AHw-Unicode UTF-32, little endian byte order +AHw-12001+AHw magic +AHw-Unicode UTF-32, big endian byte order +AHw-20000+AHw-MakeEncoding.cs+AHw-CNS Taiwan (Chinese Traditional) +AHw-20001+AHw-MakeEncoding.cs+AHw-TCA Taiwan +AHw-20002+AHw-MakeEncoding.cs+AHw-Eten Taiwan (Chinese Traditional) +AHw-20003+AHw-MakeEncoding.cs+AHw-IBM5550 Taiwan +AHw-20004+AHw-MakeEncoding.cs+AHw-TeleText Taiwan +AHw-20005+AHw-MakeEncoding.cs+AHw-Wang Taiwan +AHw-20105+AHw-MakeEncoding.cs+AHw-Western European IA5 (IRV International Alphabet 5) 7-bit +AHw-20106+AHw-MakeEncoding.cs+AHw-IA5 German (7-bit) +AHw-20107+AHw-MakeEncoding.cs+AHw-IA5 Swedish (7-bit) +AHw-20108+AHw-MakeEncoding.cs+AHw-IA5 Norwegian (7-bit) +AHw-20127+AHw magic +AHw-US-ASCII (7-bit) +AHw-20261+AHw-MakeEncoding.cs+AHw-T.61 +AHw-20269+AHw-MakeEncoding.cs+AHw-ISO 6937 Non-Spacing Accent +AHw-20273+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Germany +AHw-20277+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Denmark-Norway +AHw-20278+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Finland-Sweden +AHw-20280+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Italy +AHw-20284+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Latin America-Spain +AHw-20285+AHw-MakeEncoding.cs+AHw-IBM EBCDIC United Kingdom +AHw-20290+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Japanese Katakana Extended +AHw-20297+AHw-MakeEncoding.cs+AHw-IBM EBCDIC France +AHw-20420+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Arabic +AHw-20423+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Greek +AHw-20424+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Hebrew +AHw-20833+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Korean Extended +AHw-20838+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Thai +AHw-20866+AHw-MakeEncoding.cs+AHw-Russian Cyrillic (KOI8-R) +AHw-20871+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Icelandic +AHw-20880+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Cyrillic Russian +AHw-20905+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Turkish +AHw-20924+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Latin 1/Open System (1047 +- Euro symbol) +AHw-20932+AHw-MakeEncoding.cs+AHw-Japanese (JIS 0208-1990 and 0212-1990) +AHw-20936+AHw-MakeEncoding.cs+AHw-Simplified Chinese (GB2312-80) +AHw-20949+AHw-MakeEncoding.cs+AHw-Korean Wansung +AHw-21025+AHw-MakeEncoding.cs+AHw-IBM EBCDIC Cyrillic Serbian-Bulgarian +AHw-21027+AHw NLS +AHw-Extended/Ext Alpha Lowercase +AHw-21866+AHw-MakeEncoding.cs+AHw-Ukrainian Cyrillic (KOI8-U) +AHw-28591+AHw unicode.org +AHw-ISO 8859-1 Latin 1 (Western European) +AHw-28592+AHw unicode.org +AHw-ISO 8859-2 Latin 2 (Central European) +AHw-28593+AHw unicode.org +AHw-ISO 8859-3 Latin 3 +AHw-28594+AHw unicode.org +AHw-ISO 8859-4 Baltic +AHw-28595+AHw unicode.org +AHw-ISO 8859-5 Cyrillic +AHw-28596+AHw unicode.org +AHw-ISO 8859-6 Arabic +AHw-28597+AHw unicode.org +AHw-ISO 8859-7 Greek +AHw-28598+AHw unicode.org +AHw-ISO 8859-8 Hebrew (ISO-Visual) +AHw-28599+AHw unicode.org +AHw-ISO 8859-9 Turkish +AHw-28600+AHw unicode.org +AHw-ISO 8859-10 Latin 6 +AHw-28601+AHw unicode.org +AHw-ISO 8859-11 Latin (Thai) +AHw-28603+AHw unicode.org +AHw-ISO 8859-13 Latin 7 (Estonian) +AHw-28604+AHw unicode.org +AHw-ISO 8859-14 Latin 8 (Celtic) +AHw-28605+AHw unicode.org +AHw-ISO 8859-15 Latin 9 +AHw-28606+AHw unicode.org +AHw-ISO 8859-15 Latin 10 +AHw-29001+AHw-MakeEncoding.cs+AHw-Europa 3 +AHw-38598+AHw-MakeEncoding.cs+AHw-ISO 8859-8 Hebrew (ISO-Logical) +AHw-50220+AHw-MakeEncoding.cs+AHw-ISO 2022 JIS Japanese with no halfwidth Katakana +AHw-50221+AHw-MakeEncoding.cs+AHw-ISO 2022 JIS Japanese with halfwidth Katakana +AHw-50222+AHw-MakeEncoding.cs+AHw-ISO 2022 Japanese JIS X 0201-1989 (1 byte Kana-SO/SI) +AHw-50225+AHw-MakeEncoding.cs+AHw-ISO 2022 Korean +AHw-50227+AHw-MakeEncoding.cs+AHw-ISO 2022 Simplified Chinese +AHw-51932+AHw-MakeEncoding.cs+AHw-EUC Japanese +AHw-51936+AHw-MakeEncoding.cs+AHw-EUC Simplified Chinese +AHw-51949+AHw-MakeEncoding.cs+AHw-EUC Korean +AHw-52936+AHw-MakeEncoding.cs+AHw-HZ-GB2312 Simplified Chinese +AHw-54936+AHw-MakeEncoding.cs+AHw-GB18030 Simplified Chinese (4 byte) +AHw-57002+AHw-MakeEncoding.cs+AHw-ISCII Devanagari +AHw-57003+AHw-MakeEncoding.cs+AHw-ISCII Bengali +AHw-57004+AHw-MakeEncoding.cs+AHw-ISCII Tamil +AHw-57005+AHw-MakeEncoding.cs+AHw-ISCII Telugu +AHw-57006+AHw-MakeEncoding.cs+AHw-ISCII Assamese +AHw-57007+AHw-MakeEncoding.cs+AHw-ISCII Oriya +AHw-57008+AHw-MakeEncoding.cs+AHw-ISCII Kannada +AHw-57009+AHw-MakeEncoding.cs+AHw-ISCII Malayalam +AHw-57010+AHw-MakeEncoding.cs+AHw-ISCII Gujarati +AHw-57011+AHw-MakeEncoding.cs+AHw-ISCII Punjabi +AHw-65000+AHw magic +AHw-Unicode (UTF-7) +AHw-65001+AHw magic +AHw-Unicode (UTF-8) Note that MakeEncoding.cs deviates from unicode.org for some codepages. In the case of direct conflicts, unicode.org takes precedence. In cases where the unicode.org listing does not prescribe a value, MakeEncoding.cs value is used. NLS refers to the National Language Support files supplied in various versions of Windows. In older versions of Windows (e.g. Windows 98) these files followed the pattern +AGA-CP+AF8AIw.NLS+AGA, but newer versions use the pattern +AGA-C+AF8AIw.NLS+AGA. +ACMAIw Sources - +AFs-Unicode Consortium Public Mappings+AF0(http://www.unicode.org/Public/MAPPINGS/) - +AFs-Code Page Enumeration+AF0(http://msdn.microsoft.com/en-us/library/cc195051.aspx) - +AFs-Code Page Identifiers+AF0(http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756.aspx) +ACMAIw Badges +AFsAIQBb-Build Status+AF0(https://travis-ci.org/SheetJS/js-codepage.svg?branch+AD0-master)+AF0(https://travis-ci.org/SheetJS/js-codepage) +AFsAIQBb-Coverage Status+AF0(https://coveralls.io/repos/SheetJS/js-codepage/badge.png)+AF0(https://coveralls.io/r/SheetJS/js-codepage) +AFsAIQBb-Analytics+AF0(https://ga-beacon.appspot.com/UA-36810333-1/SheetJS/js-codepage?pixel)+AF0(https://github.com/SheetJS/js-codepage)