Unable to parse file, if first header cell starts with '<' symbol #1238

Closed
opened 2018-08-25 09:51:12 +00:00 by GienekGenin · 3 comments
GienekGenin commented 2018-08-25 09:51:12 +00:00 (Migrated from github.com)
  1. CSV file content:

untitled

  1. Code
// const workbook = XLSX.readFile(path);
const parseHeaders = workbook => {
	const sheet = workbook.Sheets[workbook.SheetNames[0]];
	const range = XLSX.utils.decode_range(sheet['!ref']);
	let headers = [];
	let C;
	const R = range.s.r;
	/* start in the first row */
	/* walk every column in the range */
	let countBreak = 0;
	let buffer = [];
	for (C = range.s.c; C <= range.e.c; C += 1) {
		const cell = sheet[XLSX.utils.encode_cell({ c: C, r: R })];
		let hdr = 'UNKNOWN'; // <-- replace with your desired default
		if (!cell || cell.v === '' || cell.v === ' ') {
			countBreak += 1;
			/* if 3 cell in a row undefined - break */
			if (countBreak === 3) {
				return renameFiles(headers);
			}
			buffer.push(hdr);
		}
		/* find the cell in the first row */
		if (cell && cell.t && cell.v !== '' && cell.v !== ' ') {
			countBreak = 0;
			hdr = XLSX.utils.format_cell(cell);
			if (buffer.length >= 1) {
				headers = headers.concat(buffer);
			}
			buffer = [];
			headers.push(hdr);
		}
	}
	return renameFiles(headers);
};
  1. Error: UnhandledPromiseRejectionWarning: TypeError: Cannot read property '!ref' of undefined
workbook.Sheets // { }
workbook.SheetNames // [ ]
  1. Without this < works just fine

untitled
1
Thanks for your work. This library is truly awesome.

1. CSV file content: ![untitled](https://user-images.githubusercontent.com/27743401/44617139-ea3b9b80-a85d-11e8-84e9-860c411fa94c.png) 2. Code ``` // const workbook = XLSX.readFile(path); const parseHeaders = workbook => { const sheet = workbook.Sheets[workbook.SheetNames[0]]; const range = XLSX.utils.decode_range(sheet['!ref']); let headers = []; let C; const R = range.s.r; /* start in the first row */ /* walk every column in the range */ let countBreak = 0; let buffer = []; for (C = range.s.c; C <= range.e.c; C += 1) { const cell = sheet[XLSX.utils.encode_cell({ c: C, r: R })]; let hdr = 'UNKNOWN'; // <-- replace with your desired default if (!cell || cell.v === '' || cell.v === ' ') { countBreak += 1; /* if 3 cell in a row undefined - break */ if (countBreak === 3) { return renameFiles(headers); } buffer.push(hdr); } /* find the cell in the first row */ if (cell && cell.t && cell.v !== '' && cell.v !== ' ') { countBreak = 0; hdr = XLSX.utils.format_cell(cell); if (buffer.length >= 1) { headers = headers.concat(buffer); } buffer = []; headers.push(hdr); } } return renameFiles(headers); }; ``` 3. Error: UnhandledPromiseRejectionWarning: TypeError: Cannot read property '!ref' of undefined ``` workbook.Sheets // { } workbook.SheetNames // [ ] ``` 4. Without this ```<``` works just fine ![untitled](https://user-images.githubusercontent.com/27743401/44617132-d09a5400-a85d-11e8-931c-904cb0e8b6de.png) ![1](https://user-images.githubusercontent.com/27743401/44617113-95982080-a85d-11e8-8299-349a675294b6.png) Thanks for your work. This library is truly **```awesome```**.
SheetJSDev commented 2018-08-25 10:46:20 +00:00 (Migrated from github.com)

When the file starts with < the library currently guesses that the file is either HTML or Flat ODS or SpreadsheetML 2003).

If you have access to Excel, try saving the contents of that file to test.csv and test.xls

<Title,New Title.
text,some text

Excel's interpretation is unexpected!

CSV (as you would expect)

screen shot 2018-08-25 at 6 41 03 am

XLS (surprising)

screen shot 2018-08-25 at 6 41 21 am

It's even more amusing if you put an actual HTML tag in there, like <table>,New Title. (blank file)

The current behavior is clearly incorrect, but what is the correct behavior?

When the file starts with `<` the library currently guesses that the file is either HTML or Flat ODS or SpreadsheetML 2003). If you have access to Excel, try saving the contents of that file to `test.csv` and `test.xls` ``` <Title,New Title. text,some text ``` Excel's interpretation is unexpected! CSV (as you would expect) <img width="142" alt="screen shot 2018-08-25 at 6 41 03 am" src="https://user-images.githubusercontent.com/6070939/44617527-f3af0e80-a831-11e8-8f5e-86af803a9b87.png"> XLS (surprising) <img width="108" alt="screen shot 2018-08-25 at 6 41 21 am" src="https://user-images.githubusercontent.com/6070939/44617528-f90c5900-a831-11e8-9930-97e1d0da2355.png"> It's even more amusing if you put an actual HTML tag in there, like `<table>,New Title.` (blank file) The current behavior is clearly incorrect, but what is the correct behavior?
GienekGenin commented 2018-08-25 14:27:04 +00:00 (Migrated from github.com)

I would expect to get string '<Title' header.

I use parseHeaders function to get headers from CSV and Excel files wich user can parse from file or link, and subsequently form a payload wich you can see above, but apparently if first header's cell contains symbol < or any <tag>, the XLSX.readFile(path) function can't properly create workbook object.

What can You advise on this?

I would expect to get string `'<Title'` header. I use `parseHeaders` function to get headers from CSV and Excel files wich user can parse from file or link, and subsequently form a payload wich you can see above, but apparently if first header's cell contains symbol `<` or any `<tag>`, the `XLSX.readFile(path)` function can't properly create workbook object. What can You advise on this?
aaronArinder commented 2019-04-15 14:00:33 +00:00 (Migrated from github.com)

Ran into this problem, too.

Ran into this problem, too.
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/sheetjs#1238
No description provided.