wrong currency parsing
#78
Open
opened 9 years ago by sysarchitect
·
23 comments
No Branch/Tag Specified
master
sankhavaramsaitulasiram/feat-fix-2752
maybeswapnil/issue2737
gh-pages
scottysseus/2560_2
ivan-trusov/fix-basedate
nandanv2702/issue_1300
protobi/master
ThomasChan/master
grantfayvor/master
tom-groves/bug-1105/rounding-error
mgreter/master
v0.87
v0.9.9
v0.9.8
v0.9.7
v0.9.6
v0.9.5
v0.9.4
v0.9.3
v0.9.2
v0.9.13
v0.9.12
v0.9.11
v0.9.10
v0.9.1
v0.9.0
v0.8.8
v0.8.7
v0.8.6
v0.8.5
v0.8.4
v0.8.3
v0.8.2
v0.8.1
v0.8.0
v0.7.9
v0.7.7
v0.7.6-i
v0.7.6-h
v0.7.6-a
v0.7.6
v0.7.5
v0.7.4
v0.7.3
v0.7.2
v0.7.11
v0.7.10
v0.7.1
v0.5.9
v0.5.8
v0.5.7
v0.5.10
v0.5.0
v0.4.3
v0.18.6
v0.18.5
v0.18.4+deno
v0.18.4
v0.18.3
v0.18.2
v0.18.1
v0.18.0+deno
v0.17.5
v0.17.0
v0.16.8
v0.16.7
v0.16.6
v0.16.5
v0.16.3
v0.16.2
v0.16.1
v0.16.0
v0.15.6
v0.15.5
v0.15.2
v0.14.0
v0.13.5
v0.13.4
v0.13.3
v0.13.1
v0.13.0
v0.12.9
v0.12.8
v0.12.7
v0.12.6
v0.12.5
v0.12.4
v0.12.3
v0.12.2
v0.12.13
v0.12.12
v0.12.11
v0.12.10
v0.12.1
v0.12.0
v0.11.9
v0.11.8
v0.11.7
v0.11.6
v0.11.5
v0.11.4
v0.11.3
v0.11.2
v0.11.19
v0.11.18
v0.11.17
v0.11.16
v0.11.15
v0.11.14
v0.11.13
v0.11.12
v0.11.11
v0.11.10
v0.11.1
v0.11.0
v0.10.9
v0.10.8
v0.10.7
v0.10.6
v0.10.5
v0.10.4
v0.10.3
v0.10.2
v0.10.1
v0.10.0
No Label
DBF
Dates
Defined Names
Features
Formula
HTML
Images
Infrastructure
Integration
International
ODS
Operations
Performance
PivotTables
Pro
Protection
Read Bug
SSF
SYLK
Style
Write Bug
good first issue
Milestone
Set milestone
Clear milestone
No items
No Milestone
Projects
Set Project
Clear projects
No project
Assignees
Assign users
Clear assignees
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.
No due date set.
Dependencies
No dependencies set.
Reference: sheetjs/sheetjs#78
Reference in New Issue
There is no content yet.
Delete Branch '%!s(<nil>)'
Deleting a branch is permanent. It CANNOT be undone. Continue?
No
Yes
Hello,
Cell formatted as shown (Russian currency format) parses as:
Can you fix please? Thank you.
Based on some small tests (you can play around with the number formatting library: http://oss.sheetjs.com/ssf/) I think the issue here boils down to the codepage encoding of the number format and the lack of localization (to be sure, the number 12345.6789 in the US version renders as
12,346p.
, which is definitely not what your version shows).I would like to take a peek at those files and figure out the best strategy for the localization. Can you make a sample sheet with a cell from that format and save it as (in this particular order):
If it warns before saving in any of the formats, that's OK. If you could share the files with me (either email or putting them somewhere) I can take a look.
I deeply appreciate your help :)
See attached files.
I tested on XLS and XLSX only:
Column H, I - wrong parsing.
Column J - ok.
"Ñ." instead of local currency symbol is not really big issue but it
appears only with XLSX format.
Note that there are two additional sheets with objects map - to be ignored.
Original file was in XLS 2003 format.
Thank you.
On 01.07.2014 11:35, SheetJSDev wrote:
Unfortunately github doesn't support attachments in the email. Not to worry though -- I was able to produce a file by changing my computer settings to Russian.
The XLSX
"Ñ."
issue is a very simple fix: the utf characters have to be processed in the number format:https://github.com/SheetJS/js-xlsx/blob/master/bits/47_styxml.js#L57
The other half of the problem (thousands separator and decimal character) is locale specific. To see this, in Windows "Region and Language" settings, the separators are in the "Additional Settings" pane.
If you look at the file (unzip the xlsx and look at the file xl/styles.xml), you'll see the format is stored assuming that the comma is the thousand separator and the dot is the decimal. The fix probably looks like this:
The formatting library should take some parameters to localize the output
The parsers should accept locale options that would control the output separators
If possible, the parsers should try to guess the locale of the person who last saved the file.
Attempt to resolve a conflict between east asian versions of excel (some number formats like 56 are custom formats in the US version but have different meanings in the various east asian languages)
Run down the list of currencies and make sure that every currency is properly formatted
@elad In Hebrew, are numbers/currencies written right-to-left or left-to-right?
Numbers are written left-to-right, and currency unit appears on the left:
@sysarchitect you've opened a big can of worms. I took a look at the number format test, testing with different location settings, and found:
the month names should be in the specified locale
the VBA Format function is not fully localized
the Excel understanding of built-in formats is broken: formats which show up as standard in the US locale are treated as custom formats in Russian
There appears to be no mention of the locale in the XLSX file.
Good news is that this setting can be controlled very easily in Excel 2013, so it shouldn't be too hard to create a list for each of the locales.
Hello,
Glad to hear.
Understanding is the half of solution )
I do believe js-xlsx is the only really working client tool. Hope issues
can be solved.
For most purposes 2013 is enough I think.
Thank you.
On 04.07.2014 8:00, SheetJSDev wrote:
@sysarchitect @elad when you save as "CSV", does Excel save with semicolons or commas?
For example, this is what I saw when saving the number_format baseline as csv in Russian: https://github.com/SheetJS/test_files/blob/master/number_format_russian.0.csv
For me the default is commas, but I don't use a localized version of Excel, so I don't know if that bit of information is of any help. :)
@elad I should have tested Hebrew before asking -- Excel is horribly inconsistent. For example, this is the date
18/10/1933 16:17:37
(number12345.6789
under formatdd-mmm-yy
) under Excel 2013 when the computer location is set to Israel and the language is set to Hebrew:Excel saves it as
18-???-33
in the CSV. This is the unicode text:Which is the correct rendering? The US date is "October 18, 1933"
For Hebrew CSV files, I think I found out you had to have a BOM character first. At least that's what I had to do in my code for the files to actually have readable contents.
The first rendering (
18-???-33
) is obviously wrong, the second is correct but written in reverse. So the middle word, which should read "oct," actually reads "tco." As you can see the day and year are also in the wrong sides.The first thing I would try would be to put in a BOM character, if that doesn't work we can debug further.
@elad mystery solved: when saving a file as CSV, Excel attempts to use the local codepage. That's actually controlled by a different setting (for "non-Unicode applications", strange since Excel is clearly unicode aware).
So when I generated the baseline using codepage 1252 (the standard US codepage) the hebrew characters are invalid (so they were rendered as
?
). When the local codepage is 1255 (which is what the Hebrew actually requires), then it saves as random ascii characters (likeéåí øáéòé 18 àå÷èåáø 1933
). Incidentally, if you convert from the Hebrew codepage 1255 to UTF8 things look better:Hello,
In Excel:
1234567 1234567
1234567,00 1234567,00
1 234 567,00 р. 1 234 567,00 р.
Saved in CSV as:
1234567;1234567
1234567,00;1234567,00
1 234 567,00 р.;1 234 567,00 р.
On 08.07.2014 22:50, SheetJSDev wrote:
@sysarchitect I haven't forgotten this :)
With regards to the XLSX
Ñ
format character, can you check against the latest version (0.7.8)?With regards to the actual format processing, there are two sub-problems.
A) Determine the location information from the file. This is the status:
B) Use the location information to generate properly formatted text.
The current snag is that the date information is localized as well. For example, consider the month format
mmm
. In different locales (even if the language may be the same), they produce different month strings (on a side note, in many languages the text is a phonetic transcription of the english name). This database of names and related logic to switch locales needs to be stored somewhere, and currently take up nearly 100KB (which I'm working to shrink).Also, can you directly send me a throwaway set of files (XLS, XLSX, XLSB, XML) using the problematic formats that I can add to the test suite? Replying to this email unfortunately doesn't forward the attachments, so you have to send it to dev -- sheetjs -- com
Hello,
H.N.Y. is less expected then new version )
Green cells - OK
Yellow - OK but JS-XLSX adds " (recognizes number as text). This is
not critical.
Red - error
I tested on XLSX only.
All other formats you listed are just "save as" copies.
Saving in XML is not allowed somehow.
Currently I'm slightly modifying sheet_to_csv function to support
XML, but it is rewritten is each new JS-XLSX version )
If yes, I ready to propose you XML-output format. My projects uses
MSSQL which allow to send and parse data in XML - very powerful solution
with JS-XSLX.
Thank you.
On 30.07.2014 17:47, SheetJSDev wrote:
@sysarchitect Feel free to send the code so we can review :)
The "github way" to do this is to fork the repo (hit the fork button), commit changes, push them to your fork and create a "pull request". Alternatively, you can just paste the function body in a reply and we can take a look. If you want to add it a reply, add three backticks (
```
) before and after the code:Hello,
OK
Note that my code would not be applicable for you because it uses GUIDs
and tabs as row separators to prepare XML for some project reason.
The idea is that browser can read XML directly through DOM and MSSQL can
use XML as stored procedure input parameter.
In fact it is the only way to paste data table into SQL at one step.
Where ID is еру Excel row number. It is necessary for back messages to
show user is something wrong in some rows.
_1 _2 _3 etc are attributes for Excel columns (A, B, C, ...)
The ideal solution would be to implement XML function that returns the
format above at library level.
My modification of sheet_to_csv function:
Application calls handleFileSelect, where Caller is the HTML element
which fires event, XMLDoc is the empty DOM XML like
Script manipulations in handleFileSelect are all about avoiding
browser caching and using properly Excel library depending of input file.
sysarchitect
On 31.07.2014 16:57, SheetJSDev wrote:
@sysarchitect if I understand what you want to do correctly, you can make your own function that uses sheet_to_json (this function can be in its own script, so you don't have to change xlsx.js or xls.js to do this):
If you need UTF-16 encoding, then use codepage:
Hello,
We're going to production with our project in one two months.
Do we have to wait for currency formats fix or use current version?
(this is just a question :)
Best Regards, Ilya Loskutov
On 09.07.2014 10:14, Ilya Loskutov wrote:
@sysarchitect we are very close to pushing a boatload of logic to fix currency as well as date/time and other localization issues. Stay tuned :)
Long story short, the SSF module will mirror the C localization functions (e.g. setlocale, LC_NUMERIC). After checking every locale (Windows does not make it easy to switch regions and languages) I have a rough sense for how windows locale information affects the formatting, and it appears to mirror the C localization system.
Glad to hear )
Thank you
On 13.10.2014 9:44, SheetJSDev wrote:
Hello I have looked through this thread to find an answer, but semicolon to comma is the closest to what I have gotten because what happens to me is the following.
XLSX format output
Software,"$6,526",$227,"$45,797","$42,354","$7,931","$15,802","$45,210","$49,233","$32,049","$7,519","$49,916","$21,721"
CSV format output
Software,$6.526,$227,$45.797,$42.354,$7.931,$15.802,$45.210,$49.233,$32.049,$7.519,$49.916,$21.721
As you can see that the XLSX for some reason surrounds the cell that contains the "." with guotes (which I don't mind), but what my problem is that it changes it to comma which makes it really hard to split cell by cell so that I can recreate it.
Have you encountered this before and if yes do you know a possible reason/fix for the mentioned problem.
Any input would be highly appreciated,
Thank you :)
@mandros1 I've found same issue, any solution for this?
@MiqueiasGFernandes yikes, a long time has passed since I posted this, so I don't even remember what was the project/case I had issue with as described above. I do remember that I ultimately did a hacky fix by splitting and replacing, but don't remember what exactly. I also do know that I didn't solve it using this library, but I think this might be fixable by adapting some configurations or passing the format type for the parser to use surely (I just wasn't keen on reading the documentation back then).
Sorry I couldn't be of more help, good luck!