Is it possible to start parsing not from the first row? #404

Closed
opened 2016-04-18 20:29:43 +00:00 by ronilitman · 3 comments
ronilitman commented 2016-04-18 20:29:43 +00:00 (Migrated from github.com)

Hey,

I am trying to import an excel file which the first 2 rows are not the content I need, and I somehow need to start from the 3rd row(my 3rd row is the columns, and so on).

Is it possible? If not, Deleting the entire first and second rows is a good solution as well..

Tought about putting it as an option on XLSX.read but with no success..

Any suggestions?

Thanks!

Hey, I am trying to import an excel file which the first 2 rows are not the content I need, and I somehow need to start from the 3rd row(my 3rd row is the columns, and so on). Is it possible? If not, Deleting the entire first and second rows is a good solution as well.. Tought about putting it as an option on XLSX.read but with no success.. Any suggestions? Thanks!
SysProfile commented 2016-05-11 14:33:44 +00:00 (Migrated from github.com)

Try this

function sheet_from_array_of_arrays(data, row_start) {
  if (row_start == undefined) {
    row_start = 0;
  }
  var row_num = data.length;
  var keys = Object.keys(data[0])
  var col_num = keys.length;
  var ws = {};
  var range = { s: { c: 0, r: 0 }, e: { c: col_num, r: row_start + row_num } };
  ws['!ref'] = XLSX.utils.encode_range(range);
  for (var R = 0; R < row_num; R++) {
    for (var C = 0; C < col_num; C++) {
      var cell_ref = XLSX.utils.encode_cell({ c: C, r: R + row_start });
      var cell = { v: data[R][keys[C]] };
      if (cell.v == null || cell.v == undefined) {
        cell.v = '';
        cell.t = 's';
      } else {
        if (typeof cell.v === 'number') {
          cell.t = 'n';
        } else if (typeof cell.v === 'boolean') {
          cell.t = 'b';
        } else {
          if (isNaN(cell.v) === true && cell.v.toString().substring(11, 10) == "T" && cell.v.toString().substring(19, 25) == ".000Z") {
            cell.t = 'n';
            cell.z = XLSX.SSF._table[14];
            cell.v = datenum(new Date(cell.v.toString().substring(0, 10)));
          } else {
            cell.t = 's';
          }
        }
      }
      ws[cell_ref] = cell;
    }
  }
  return ws;
}
Try this ``` function sheet_from_array_of_arrays(data, row_start) { if (row_start == undefined) { row_start = 0; } var row_num = data.length; var keys = Object.keys(data[0]) var col_num = keys.length; var ws = {}; var range = { s: { c: 0, r: 0 }, e: { c: col_num, r: row_start + row_num } }; ws['!ref'] = XLSX.utils.encode_range(range); for (var R = 0; R < row_num; R++) { for (var C = 0; C < col_num; C++) { var cell_ref = XLSX.utils.encode_cell({ c: C, r: R + row_start }); var cell = { v: data[R][keys[C]] }; if (cell.v == null || cell.v == undefined) { cell.v = ''; cell.t = 's'; } else { if (typeof cell.v === 'number') { cell.t = 'n'; } else if (typeof cell.v === 'boolean') { cell.t = 'b'; } else { if (isNaN(cell.v) === true && cell.v.toString().substring(11, 10) == "T" && cell.v.toString().substring(19, 25) == ".000Z") { cell.t = 'n'; cell.z = XLSX.SSF._table[14]; cell.v = datenum(new Date(cell.v.toString().substring(0, 10))); } else { cell.t = 's'; } } } ws[cell_ref] = cell; } } return ws; } ```
SheetJSDev commented 2017-02-23 06:30:02 +00:00 (Migrated from github.com)

@ronilitman if you want to skip the first two rows (so your header is in the third row and the data starts in the fourth row), rewrite the sheet's range:

function skip_first_n_rows(worksheet, n) {
    var old_range = worksheet['!ref'];  // <-- this is the old range, say A1:E1000
    var range = XLSX.utils.decode_range(old_range); // <-- this is an object e.g. {s:{r:0,c:0},e:{r:999,c:4}}
    range.s.r = n; // <-- this line sets the starting row
    worksheet['!ref'] = XLSX.utils.encode_range(range); // <-- this assigns the range back to the worksheet
}

Then all of the utility functions will automatically skip those rows.

@ronilitman if you want to skip the first two rows (so your header is in the third row and the data starts in the fourth row), rewrite the sheet's range: ```js function skip_first_n_rows(worksheet, n) { var old_range = worksheet['!ref']; // <-- this is the old range, say A1:E1000 var range = XLSX.utils.decode_range(old_range); // <-- this is an object e.g. {s:{r:0,c:0},e:{r:999,c:4}} range.s.r = n; // <-- this line sets the starting row worksheet['!ref'] = XLSX.utils.encode_range(range); // <-- this assigns the range back to the worksheet } ``` Then all of the utility functions will automatically skip those rows.
SheetJSDev commented 2017-06-02 00:53:27 +00:00 (Migrated from github.com)

As stated in the earlier comment, you can always directly manipulate the worksheet range and the various utilities do the right thing.

However, there are certain features like merge cells which make it really difficult to do properly. For example, suppose you had a worksheet where the merge extends from a row you want to skip into a row you want to keep:

screen shot 2017-06-01 at 20 45 11

With this example, how do you handle that merge cell B1:C2 if you intend to skip the first row? Do you skip it? Do you retain it? Do you twist the cell to fit in the range (in this case, rewriting the merge cell as if it only merge B2:C2)? There is no good answer to the question. Manipulating the range is currently equivalent to the first case because any other cell in the range is undefined (ws.B1 is defined but ws.B2 and ws.C1 and ws.C2 are all undefined).

In absence of an obvious strategy, I'm inclined to close. @ronilitman if you can give a compelling argument in favor of one of the strategies, it's not too difficult to add in.

As stated in [the earlier comment](https://github.com/SheetJS/js-xlsx/issues/404#issuecomment-281907217), you can always directly manipulate the worksheet range and the various utilities do the right thing. However, there are certain features like merge cells which make it really difficult to do properly. For example, suppose you had a worksheet where the merge extends from a row you want to skip into a row you want to keep: <img width="318" alt="screen shot 2017-06-01 at 20 45 11" src="https://cloud.githubusercontent.com/assets/6070939/26706427/5985b436-470b-11e7-9a9c-bfba8e1bc18b.png"> With this example, how do you handle that merge cell B1:C2 if you intend to skip the first row? Do you skip it? Do you retain it? Do you twist the cell to fit in the range (in this case, rewriting the merge cell as if it only merge B2:C2)? There is no good answer to the question. Manipulating the range is currently equivalent to the first case because any other cell in the range is undefined (ws.B1 is defined but ws.B2 and ws.C1 and ws.C2 are all undefined). In absence of an obvious strategy, I'm inclined to close. @ronilitman if you can give a compelling argument in favor of one of the strategies, it's not too difficult to add in.
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#404
No description provided.