Reading Excel HTML files will misalign cells in some cases #1621

Closed
opened 2019-09-12 08:06:23 +00:00 by KurtMar · 1 comment
KurtMar commented 2019-09-12 08:06:23 +00:00 (Migrated from github.com)

I noticed a bug in the _html_to_sheet_ function. When a cell is empty, the cell index is correctly incremented. However, if a cell contains HTML tags and turns out to be empty only after the tags are stripped, the cell index is not incremented. This results in misalignment of the subsequent cells:

https://github.com/SheetJS/js-xlsx/blob/e3c5eac99c3b2be6929adfca455c1be87fab792b/xlsx.js#L18956-L18960
SheetJSDev commented 2019-09-12 21:06:56 +00:00 (Migrated from github.com)

Thanks for reporting @KurtMar, indeed that line should be the same as the previous check. We'd accept a PR; please change the line in [`bits/79_html.js`](https://github.com/SheetJS/js-xlsx/blob/master/bits/79_html.js#L40).

Simple repro:

```html
<table>
	<tr>
		<td>abc</td>
		<td><b> </b></td>
		<td>def</td>
	</tr>
</table>
```
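To make the misalignment concrete, here is a minimal sketch of the failure mode using the repro row. It is not the SheetJS source: `parseRow` and its `fixed` flag are hypothetical names, and the tag stripping is a deliberately naive regex. It only assumes what the report describes, namely that a cell found to be empty after tag stripping is skipped without advancing the column index.

```javascript
// Simplified stand-in for a row parser; NOT the actual SheetJS code.
// `parseRow` and the `fixed` flag are hypothetical, for illustration only.
function parseRow(rowHtml, fixed) {
  const chunks = rowHtml.split(/<\/td>/i); // one chunk per closed cell
  const out = {};
  let C = 0; // current column index
  for (const chunk of chunks) {
    const start = chunk.search(/<td\b/i);
    if (start === -1) continue; // leftover markup after the last </td>
    // Drop the opening <td ...> tag, then strip any nested tags.
    const inner = chunk.slice(chunk.indexOf(">", start) + 1);
    const text = inner.replace(/<[^>]*>/g, "").trim();
    if (text.length === 0) {
      // Buggy variant: the cell is skipped but the column does not advance.
      // Fixed variant: advance the column, as for any other empty cell.
      if (fixed) ++C;
      continue;
    }
    out[C] = text;
    ++C;
  }
  return out;
}

const row = "<tr><td>abc</td><td><b> </b></td><td>def</td></tr>";
const buggy = parseRow(row, false); // "def" is stored under column 1
const good = parseRow(row, true);   // "def" is stored under column 2
console.log(buggy, good);
```

In the buggy variant `def` ends up in the second column instead of the third, which matches the behaviour reported above. In the fixed variant the tag-only cell still takes up a column.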
Reference: sheetjs/sheetjs#1621