hyperlink cannot be parsed correctly #2860
I've tried using FileReader and sheetjs to read an xlsx file. When I parse an external link in a cell object, it turns out to be different from what is displayed in the Excel file.
code snippet to read excel:
code snippet to get hyperlink:
expected output for external link:
https://mail.google.com/mail/u/0/#inbox/FMfcgzGrcFjbSrPQLvwvqSDgJHkZStDv&test=true
actual output for external link:
https://mail.google.com/mail/u/0/#inbox/FMfcgzGrcFjbSrPQLvwvqSDgJHkZStDv&amp;test=true
For now I just use a regex replacement as a quick workaround. However, I'm not sure whether any other characters are being encoded incorrectly.
Can you help me find a way to get the raw external link from the Excel file? Thanks very much.
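On the worry about other characters: XML predefines only five named entities, plus numeric character references, so a stopgap can decode all of them in one pass instead of patching a single sequence. The following is a minimal sketch under that assumption. `decodeXmlEntities` is a hypothetical helper written for illustration and is not part of the SheetJS API.

```javascript
// Hypothetical stopgap helper (not a SheetJS function): decode the five
// predefined XML entities and numeric character references in a single pass.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

function decodeXmlEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left as they are.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

const escapedTarget =
  "https://mail.google.com/mail/u/0/#inbox/FMfcgzGrcFjbSrPQLvwvqSDgJHkZStDv&amp;test=true";
console.log(decodeXmlEntities(escapedTarget));
```

Because the replacement runs in one pass, a value such as `&amp;lt;` decodes to `&lt;` rather than `<`. A helper like this should only be applied to values known to be escaped, since decoding an already-correct link twice could alter it.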
Are you sure some other step is not automatically encoding?
https://jsfiddle.net/ohwnbxqr/ is a small demo with a simple file element:
JS code prints out every link:
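A minimal sketch of such a loop follows. It assumes the default worksheet layout, where cells are keyed by A1-style address, metadata keys start with `!`, and a link is stored under the cell's `l.Target` field. `collectLinks` and `sampleSheet` are hypothetical names, and the sheet is built by hand here rather than parsed from a file.

```javascript
// Collect every hyperlink target from a SheetJS-style worksheet object.
// Assumption: non-dense layout with cells keyed by address ("A1", "B2", ...).
function collectLinks(worksheet) {
  const links = [];
  for (const [address, cell] of Object.entries(worksheet)) {
    if (address.startsWith("!")) continue; // skip metadata such as "!ref"
    if (cell && cell.l && typeof cell.l.Target === "string") {
      links.push({ address, target: cell.l.Target });
    }
  }
  return links;
}

// Hand-built stand-in for a parsed worksheet (hypothetical data).
const sampleSheet = {
  "!ref": "A1:A2",
  A1: {
    t: "s",
    v: "inbox",
    l: { Target: "https://mail.google.com/mail/u/0/#inbox/FMfcgzGrcFjbSrPQLvwvqSDgJHkZStDv&test=true" },
  },
  A2: { t: "s", v: "plain text" },
};

for (const { address, target } of collectLinks(sampleSheet)) {
  console.log(address, target);
}
```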
To see if this is working properly, we can look at the demos page. At https://docs.sheetjs.com/docs/csf/features/hyperlinks, go to the first live editor and just change the Target to the expected output. The screenshot shows what the code looked like, and the attached XLSX file is what the script created. To confirm Excel read the file correctly, open it, right click cell A1 and select "Edit Hyperlink" to see https://mail.google.com/mail/u/0/#inbox/FMfcgzGrcFjbSrPQLvwvqSDgJHkZStDv&test=true.
The reason to suspect some process on your end is encoding text is in the fiddle itself. The second screenshot shows the result of selecting the file. You will see the page itself (assigned via innerText) shows the correct &. The actual browser console (right side) shows the correct &. However the JSFiddle fake console shows the encoded value. That, it seems, is a bug in JSFiddle.
If you run the same exact test in NodeJS:
You will see the correct value:
I tried your small demo and still got the same error. That's really weird. I need more time to figure out the Excel parsing issue. Thanks for your help.
here is the excel file with external link
here is the fiddle & console output
Thanks for sharing! There's a key difference between the two examples.
The URI fragment (the part after the #) is properly decoded and re-encoded. In your first example:
The underlined part is the fragment and that part is correctly round-tripped.
In your second example,
There is no fragment. The body is correctly escaped when writing (you can verify that by writing the file and opening in Excel) but the parser is not unescaping.
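The asymmetry can be illustrated with a toy model. This is a sketch only, not the SheetJS source: it assumes a link is stored as an XML-escaped body plus an optional XML-escaped fragment, and compares a reader that unescapes only the fragment with one that unescapes both parts. All function names and the `example.com` URL are hypothetical.

```javascript
// Toy model of the behavior described above (not SheetJS code).
const ESCAPES = [["&", "&amp;"], ["<", "&lt;"], [">", "&gt;"], ['"', "&quot;"], ["'", "&apos;"]];

function escapeXml(text) {
  // "&" is handled first so later replacements are not escaped twice.
  return ESCAPES.reduce((out, [raw, esc]) => out.split(raw).join(esc), text);
}

function unescapeXml(text) {
  // Reverse order: "&amp;" is decoded last so "&amp;lt;" becomes "&lt;".
  return [...ESCAPES].reverse().reduce((out, [raw, esc]) => out.split(esc).join(raw), text);
}

// Writer: split at the first "#" and escape both parts.
function writeLink(url) {
  const i = url.indexOf("#");
  const body = i === -1 ? url : url.slice(0, i);
  const fragment = i === -1 ? null : url.slice(i + 1);
  return { body: escapeXml(body), fragment: fragment === null ? null : escapeXml(fragment) };
}

function join(body, fragment) {
  return fragment === null ? body : body + "#" + fragment;
}

// Reader modeling the reported behavior: only the fragment is unescaped.
function readFragmentOnly({ body, fragment }) {
  return join(body, fragment === null ? null : unescapeXml(fragment));
}

// Reader that unescapes both parts.
function readBoth({ body, fragment }) {
  return join(unescapeXml(body), fragment === null ? null : unescapeXml(fragment));
}

const withFragment =
  "https://mail.google.com/mail/u/0/#inbox/FMfcgzGrcFjbSrPQLvwvqSDgJHkZStDv&test=true";
const noFragment = "https://example.com/search?q=sheet&page=2"; // hypothetical URL

console.log(readFragmentOnly(writeLink(withFragment)) === withFragment); // true
console.log(readFragmentOnly(writeLink(noFragment)) === noFragment);     // false
console.log(readBoth(writeLink(noFragment)) === noFragment);             // true
```

In this model the first link survives because its `&` sits after the `#`, while the second link has its `&` in the body and comes back escaped unless the reader unescapes that part too.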
The relevant code is at https://git.sheetjs.com/sheetjs/sheetjs/src/branch/master/bits/31_rels.js#L55 and the patch is:
Feel free to send a PR.
This was pushed in 0.19.2; please update. An updated version of the fiddle (using the new library version): https://jsfiddle.net/xhtdrscq/