How to get data from xlsx as percentage ? #608

Closed
opened 2017-03-29 08:00:57 +00:00 by mjdb3d · 13 comments
mjdb3d commented 2017-03-29 08:00:57 +00:00 (Migrated from github.com)

Is it possible to get data from (xlsx) cell that formatted as percentage ?
I've tried and I got numbers divided by 100.

xlsx file 12.23%
readed data 0.12
Result required 12.23 or 12.23%

thanks in advance.

Is it possible to get data from (xlsx) cell that formatted as percentage ? I've tried and I got numbers divided by 100. xlsx file 12.23% readed data 0.12 Result required 12.23 or 12.23% thanks in advance.
SheetJSDev commented 2017-03-29 09:57:31 +00:00 (Migrated from github.com)

Can you share a file?

If you are reading from a file, the formatted text should be in the w field of the cell (the v field stores the raw numeric value)

Can you share a file? If you are reading from a file, the formatted text should be in the `w` field of the cell (the `v` field stores the raw numeric value)
mjdb3d commented 2017-03-29 10:16:43 +00:00 (Migrated from github.com)

file.xlsx

#include "shim.js";
#include "jszip.js";
var JSZip = app.JSZip;
#include "xlsx.flow.js";


var jsx_path = File($.fileName).parent+'/';

var infile = File(jsx_path + 'file.xlsx');
infile.open("r");
infile.encoding = "binary";
var data = infile.read();
var workbook = XLSX.read(data, {type:"binary"});
var sheet_name = workbook.SheetNames[0];
var worksheet = workbook.Sheets[sheet_name];
var data = XLSX.utils.sheet_to_json(worksheet, {header:1});


$.writeln(data);

result:
0.01,0.22,-0,-0,0.25,-0,0.224,0.132,0.1755,0.704,,,,,,,,,,,,,,,,,,,,

[file.xlsx](https://github.com/SheetJS/js-xlsx/files/878626/file.xlsx) ``` #include "shim.js"; #include "jszip.js"; var JSZip = app.JSZip; #include "xlsx.flow.js"; var jsx_path = File($.fileName).parent+'/'; var infile = File(jsx_path + 'file.xlsx'); infile.open("r"); infile.encoding = "binary"; var data = infile.read(); var workbook = XLSX.read(data, {type:"binary"}); var sheet_name = workbook.SheetNames[0]; var worksheet = workbook.Sheets[sheet_name]; var data = XLSX.utils.sheet_to_json(worksheet, {header:1}); $.writeln(data); ``` result: 0.01,0.22,-0,-0,0.25,-0,0.224,0.132,0.1755,0.704,,,,,,,,,,,,,,,,,,,,
SheetJSDev commented 2017-03-29 10:23:01 +00:00 (Migrated from github.com)

@mjdb3d The extendscript support is relatively new and we haven't stress tested every feature. Thanks for reporting, we will look into this

@mjdb3d The extendscript support is relatively new and we haven't stress tested every feature. Thanks for reporting, we will look into this
mjdb3d commented 2017-03-29 11:14:51 +00:00 (Migrated from github.com)

@SheetJSDev you’re welcome.
I hope it will be fixed soon.
Thank you.

@SheetJSDev you’re welcome. I hope it will be fixed soon. Thank you.
carcinocron commented 2017-04-11 18:51:57 +00:00 (Migrated from github.com)

This seems to be part of a larger problem (with apparently all excel readers) where the type information is available (I've read the source code), but for some unknown reason the data is abstracted away from the end-user and ignored for output. For example, for this spreadsheet in xlsx:

$40, 12%

I'm expecting one of these 2 outputs from sheet_to_json:

{[
  "$40",
  "12%",
]}
// or:
{[
  { type: "CURRENCY", value: "40" },
  { type: "PERCENT", value: "12" },
]}

The actual result is:

{[
  "40",
  "0.12",
]}

This also results in data-loss for sheet_to_csv.

Here we are checking the primary type t:

			switch(val.t){
				case 'e': continue;
				case 's': break;
				case 'b': case 'n': break;
				default: throw 'unrecognized type ' + val.t;
			}

but there is not any checking of the secondary type 's' (currency, percent, etc..)

This seems to be part of a larger problem (with apparently all excel readers) where the type information is available (I've read the source code), but for some unknown reason the data is abstracted away from the end-user and ignored for output. For example, for this spreadsheet in xlsx: ``` $40, 12% ``` I'm expecting one of these 2 outputs from `sheet_to_json`: ``` {[ "$40", "12%", ]} // or: {[ { type: "CURRENCY", value: "40" }, { type: "PERCENT", value: "12" }, ]} ``` The actual result is: ``` {[ "40", "0.12", ]} ``` This also results in data-loss for `sheet_to_csv`. Here we are checking the primary type `t`: ``` switch(val.t){ case 'e': continue; case 's': break; case 'b': case 'n': break; default: throw 'unrecognized type ' + val.t; } ``` but there is not any checking of the secondary type 's' (currency, percent, etc..)
SheetJSDev commented 2017-04-11 19:03:30 +00:00 (Migrated from github.com)

@InstanceOfMichael can you share a file? I just threw together a sample file:

608.xlsx

The CSV and JSON does exactly what's expected. This is from http://oss.sheetjs.com/js-xlsx/ (just drag and drop into the box):

screen shot 2017-04-11 at 14 55 15

This is from node:

> var XLSX = require('xlsx');
> var wb = XLSX.readFile('608.xlsx');
> XLSX.utils.sheet_to_csv(wb.Sheets.Sheet1);
'$40 ,12%\n' <-- formatted text
> XLSX.utils.sheet_to_json(wb.Sheets.Sheet1, {header:1})
[ [ '$40 ', '12%' ] ] <-- formatted text
> XLSX.utils.sheet_to_json(wb.Sheets.Sheet1, {header:1, raw:true});
[ [ 40, 0.12 ] ] <-- raw numbers

The problem covered in this issue is extendscript not playing nice with the formatting functions. We've run into issues in the past with regular expressions, for example https://github.com/davidchambers/Base64.js/issues/31#issue-216606310 , but if you are using the web browser or node you should get the correct formatted data or raw data.

@InstanceOfMichael can you share a file? I just threw together a sample file: [608.xlsx](https://github.com/SheetJS/js-xlsx/files/914378/608.xlsx) The CSV and JSON does exactly what's expected. This is from http://oss.sheetjs.com/js-xlsx/ (just drag and drop into the box): <img width="137" alt="screen shot 2017-04-11 at 14 55 15" src="https://cloud.githubusercontent.com/assets/6070939/24925655/e7744a54-1ec6-11e7-995c-9943aaa76bdd.png"> This is from node: ```js > var XLSX = require('xlsx'); > var wb = XLSX.readFile('608.xlsx'); > XLSX.utils.sheet_to_csv(wb.Sheets.Sheet1); '$40 ,12%\n' <-- formatted text > XLSX.utils.sheet_to_json(wb.Sheets.Sheet1, {header:1}) [ [ '$40 ', '12%' ] ] <-- formatted text > XLSX.utils.sheet_to_json(wb.Sheets.Sheet1, {header:1, raw:true}); [ [ 40, 0.12 ] ] <-- raw numbers ``` The problem covered in this issue is extendscript not playing nice with the formatting functions. We've run into issues in the past with regular expressions, for example https://github.com/davidchambers/Base64.js/issues/31#issue-216606310 , but if you are using the web browser or node you should get the correct formatted data or raw data.
carcinocron commented 2017-04-11 20:54:09 +00:00 (Migrated from github.com)

using { raw: false }, I am now getting this from my imaginary spreadsheet:

{[
  "40",
  "12%",
]}

Which is good enough for my purposes. Thank you!

using `{ raw: false }`, I am now getting this from my imaginary spreadsheet: ``` {[ "40", "12%", ]} ``` Which is good enough for my purposes. Thank you!
SheetJSDev commented 2017-04-11 21:00:56 +00:00 (Migrated from github.com)

That still looks wrong. Can you share the worksheet object (JSON.stringify(worksheet))? If the cell has formatted text (the cell object for 40 has a w key with value "$40 ") you should see the same text value in the output.

That still looks wrong. Can you share the worksheet object (`JSON.stringify(worksheet)`)? If the cell has formatted text (the cell object for `40` has a `w` key with value `"$40 "`) you should see the same text value in the output.
carcinocron commented 2017-04-11 21:42:29 +00:00 (Migrated from github.com)
https://gist.github.com/InstanceOfMichael/220f5eb3b74ff3770c4d309ccf8be16f
SheetJSDev commented 2017-04-11 22:15:18 +00:00 (Migrated from github.com)

@InstanceOfMichael thanks for sharing! I can reproduce, it's a bug in the number formatter.

At a high level, Excel doesn't distinguish between "currency" and "percent" and "date" and other cell types. There is a number cell type, and the number format dictates the interpretation.

If you want to see the number format, you can set cellNF:true in the options object. Then every cell will have a z key corresponding to the raw number format. If you look at the end of the output you'll see:

"164":"General",
"165":"[$$-409]#,##0.00;[RED]\\-[$$-409]#,##0.00",
"166":"0.00%"

Those are the formats that LibreOffice stored. Normally Excel would store a format like $#,##0.00 for the currency.

http://oss.sheetjs.com/ssf/ lets you test the formats:

using Excel's normal currency:
screen shot 2017-04-11 at 18 08 49

Using the LO currency format:
screen shot 2017-04-11 at 18 09 22

@InstanceOfMichael thanks for sharing! I can reproduce, it's a bug in the number formatter. At a high level, Excel doesn't distinguish between "currency" and "percent" and "date" and other cell types. There is a number cell type, and the number format dictates the interpretation. If you want to see the number format, you can set `cellNF:true` in the options object. Then every cell will have a `z` key corresponding to the raw number format. If you look at the end of the output you'll see: ``` "164":"General", "165":"[$$-409]#,##0.00;[RED]\\-[$$-409]#,##0.00", "166":"0.00%" ``` Those are the formats that LibreOffice stored. Normally Excel would store a format like `$#,##0.00` for the currency. http://oss.sheetjs.com/ssf/ lets you test the formats: using Excel's normal currency: <img width="165" alt="screen shot 2017-04-11 at 18 08 49" src="https://cloud.githubusercontent.com/assets/6070939/24933003/15569e20-1ee2-11e7-8460-77504222119d.png"> Using the LO currency format: <img width="235" alt="screen shot 2017-04-11 at 18 09 22" src="https://cloud.githubusercontent.com/assets/6070939/24933015/217bf880-1ee2-11e7-9ced-87f17fb6c1cd.png">
SheetJSDev commented 2017-05-10 03:12:20 +00:00 (Migrated from github.com)

@mjdb3d @firas3d thanks for reporting! Finally figured out the issue, and it's a really really strange ExtendScript bug! Consider the following expression:

1 && 2 || 3 && 4 && 5 && 6

In JS this is interpreted as

(1 && 2) || (((3 && 4) && 5) && 6)     // --> 2

In ExtendScript Toolkit (you can test in the JavaScript console) and Photoshop/Illustrator the expression is interpreted as

((((1 && 2) || 3) && 4) && 5) && 6 // --> 6

The formatter library uses a complex expression of the same form:

o = c; while(++i < fmt.length && "0#?.,E+-%".indexOf(c=fmt.charAt(i)) > -1 || c=='\\' && fmt.charAt(i+1) == "-" && i < fmt.length - 2 && "0#".indexOf(fmt.charAt(i+2))>-1) o += c;

Since we use similar expressions throughout the codebases we have to make some more changes, but based on some simple tests I'm fairly confident that is the origin of the problem

@mjdb3d @firas3d thanks for reporting! Finally figured out the issue, and it's a really really strange ExtendScript bug! Consider the following expression: ```js 1 && 2 || 3 && 4 && 5 && 6 ``` In JS this is interpreted as ```js (1 && 2) || (((3 && 4) && 5) && 6) // --> 2 ``` In ExtendScript Toolkit (you can test in the JavaScript console) and Photoshop/Illustrator the expression is interpreted as ```js ((((1 && 2) || 3) && 4) && 5) && 6 // --> 6 ``` The formatter library uses a [complex expression of the same form](https://github.com/SheetJS/ssf/blob/master/ssf.flow.js#L658): ```js o = c; while(++i < fmt.length && "0#?.,E+-%".indexOf(c=fmt.charAt(i)) > -1 || c=='\\' && fmt.charAt(i+1) == "-" && i < fmt.length - 2 && "0#".indexOf(fmt.charAt(i+2))>-1) o += c; ``` Since we use similar expressions throughout the codebases we have to make some more changes, but based on some simple tests I'm fairly confident that is the origin of the problem
praveen-wal commented 2020-04-29 09:22:38 +00:00 (Migrated from github.com)

@SheetJSDev Still getting the same issue in the 0.15.6 version. Any idea what to do? @mjdb3d @InstanceOfMichael

@SheetJSDev Still getting the same issue in the `0.15.6` version. Any idea what to do? @mjdb3d @InstanceOfMichael
anoop-chauhan commented 2021-12-01 11:23:55 +00:00 (Migrated from github.com)

Hi @praveen-wal @mjdb3d @InstanceOfMichael , have you got any solution for this?

Hi @praveen-wal @mjdb3d @InstanceOfMichael , have you got any solution for this?
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#608
No description provided.