Fix Cell Objects to always have html-encoded data for "h" key #629

Closed
xkr47 wants to merge 2 commits from patch-1 into master
xkr47 commented 2017-04-13 11:01:42 +00:00 (Migrated from github.com)

Some cases left characters like & and < unencoded in the "h" key, making it unreliable for direct embedding in html.

Some cases left characters like `&` and `<` unencoded in the "h" key, making it unreliable for direct embedding in html.
SheetJSDev commented 2017-04-13 15:37:33 +00:00 (Migrated from github.com)

What should happen with other entities? Here's a fun test case we threw together for testing entities in plaintext formats:

CODES.xlsx

Excel's XML uses a special form _xHHHH_ for encoding other entities, as demonstrated in B2 in Sheet1 from the attached workbook:

      <c r="B2" t="str">
        <f>CHAR(A2)</f>
        <v>_x0001_</v>
      </c>
What should happen with other entities? Here's a fun test case we threw together for testing entities in plaintext formats: [CODES.xlsx](https://github.com/SheetJS/js-xlsx/files/920106/CODES.xlsx) Excel's XML uses a special form `_xHHHH_` for encoding other entities, as demonstrated in B2 in Sheet1 from the attached workbook: ```xml <c r="B2" t="str"> <f>CHAR(A2)</f> <v>_x0001_</v> </c> ```
SheetJSDev commented 2017-04-13 18:39:24 +00:00 (Migrated from github.com)

Squashed the two commits and added a few tests in 616d2e534f

The open question is whether more HTML entities should be supported. For example, take the unicode BLACK HEART SUIT ♥ There are two acceptable HTML encodings:

  • &hearts; <html>♥</html>
  • &#x2665; <html>♥</html>
Squashed the two commits and added a few tests in https://github.com/SheetJS/js-xlsx/commit/616d2e534fc8736040fbb4fbdeb5ff97a515aad4 The open question is whether more HTML entities should be supported. For example, take the unicode BLACK HEART SUIT ♥ There are two acceptable HTML encodings: - `&hearts;` <html>&hearts;</html> - `&#x2665;` <html>&#x2665;</html>
xkr47 commented 2017-04-14 17:14:51 +00:00 (Migrated from github.com)

@SheetJSDev Thanks for adding tests!
If we are only talking about encoding html, here's my opinion:

I don't think it makes sense to encode more entities than is strictly necessary. Other tools outside xlsx can be used if more encoding is needed. I think entities were more popular when people didn't use UTF-8 but some other charset. Special Excel XML entities like xHHHH should be decoded before encoding to HTML.

@SheetJSDev Thanks for adding tests! If we are only talking about *encoding* html, here's my opinion: I don't think it makes sense to encode more entities than is strictly necessary. Other tools outside xlsx can be used if more encoding is needed. I think entities were more popular when people didn't use UTF-8 but some other charset. Special Excel XML entities like _xHHHH_ should be decoded before encoding to HTML.
xkr47 commented 2017-04-16 18:42:51 +00:00 (Migrated from github.com)

Thanks!

Thanks!
SheetJSDev commented 2017-04-16 18:51:24 +00:00 (Migrated from github.com)

@xkr47 there are some more corner cases to work out, but that's for another day. 0.9.11 also integrates an HTML writer that properly handles merge cells as rowspan/colspan.

Even though the PR shows up as "closed" rather than "merged", you are marked as the author of commit 616d2e5

@xkr47 there are some more corner cases to work out, but that's for another day. 0.9.11 also integrates an HTML writer that properly handles merge cells as rowspan/colspan. Even though the PR shows up as "closed" rather than "merged", you are marked as the author of commit 616d2e5
xkr47 commented 2017-04-19 23:34:00 +00:00 (Migrated from github.com)

Ok, cool!
Yeah I noticed, 👍

Ok, cool! Yeah I noticed, :+1:

Pull request closed

Sign in to join this conversation.
No description provided.