Parsing HTML table string with XLSX.read ignores <th> elements #1090

Closed
opened 2018-05-03 09:01:12 +00:00 by GigiSan · 1 comment
GigiSan commented 2018-05-03 09:01:12 +00:00 (Migrated from github.com)

I'm trying to convert an HTML table to a CSV file. I have to do the conversion server-side so I pass the table's outerHTML as a string via an $.ajax request to the Node.js server.

It seems like the <th> tags are ignored and not transferred to the workbook. Is there a way to import them as well, or are they not handled by the library itself? I did a quick search of the codebase and couldn't find any "th", but I'm pretty new to GitHub and to how modules are structured, so I might be missing something.

The table looks something like this:

<table>
  <thead>
    <tr>
      <th>Row #</th>
      <th>Label</th>
      <th>Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>SAMPLE_TEXT</td>
      <td>SUCCESS</td>
    </tr>
    <tr>
      <td>2</td>
      <td>SAMPLE_TEXT_WITH_STRING</td>
      <td>ERROR</td>
    </tr>
  </tbody>
</table>

The parsing code, which is executed server-side with Node.js, is the following:

(...)
var workbook = XLSX.read(table, {
  type: "string"
});
return resolve(XLSX.write(workbook, {
  bookType: "csv",
  type: "buffer"
}));

The resulting CSV is the following:

1,SAMPLE_TEXT,SUCCESS
2,SAMPLE_TEXT_WITH_STRING,ERROR

UPDATE:
Replacing <th> with <td> works, even if inside <thead>

CSV:

Row #,Label,Result
1,SAMPLE_TEXT,SUCCESS
2,SAMPLE_TEXT_WITH_STRING,ERROR
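
For anyone who needs this workaround on the server before a fix is available, here is a minimal sketch of the replacement step. The helper name `thToTd` is hypothetical and not part of the library; it assumes the table arrives as a plain HTML string and that a simple regex rewrite of the tags is good enough for the markup involved.

```javascript
// Hypothetical helper (not part of SheetJS): rewrite <th> cells as <td> cells
// so that a parser which only recognizes <td> still picks up the header row.
function thToTd(html) {
  // Matches opening tags (<th>, <th class="x">) and closing tags (</th>),
  // case-insensitively. The \b word boundary leaves <thead> untouched.
  return html.replace(/<(\/?)th\b([^>]*)>/gi, "<$1td$2>");
}

// Assumed usage, mirroring the parsing code above:
//   var workbook = XLSX.read(thToTd(table), { type: "string" });
```

Attributes on the header cells are carried over unchanged, so only the tag name differs in what the parser sees.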

Still, it would be nice if <th> was parsed too. ☺

SheetJSDev commented 2018-05-03 20:03:19 +00:00 (Migrated from github.com)

Good catch, we're pushing a fix today

Reference: sheetjs/sheetjs#1090