crc-32 not working for >5 GB file. Support for large files #8

Closed
opened 2017-04-17 21:21:41 +00:00 by ghost · 4 comments
ghost commented 2017-04-17 21:21:41 +00:00 (Migrated from github.com)

I was able to calculate the crc-32 using this for smaller files. However, for larger files, it is unable to calculate the crc-32 value. I was trying to calculate it using a 5 GB file.

SheetJSDev commented 2017-04-17 22:07:10 +00:00 (Migrated from github.com)

How are you reading the file? The module itself does not allocate much memory, so the issue may be with loading the data in the first place.

ghost commented 2017-04-27 15:47:39 +00:00 (Migrated from github.com)

I am using the ng-file-upload directive for picking the file and getting its details.
https://github.com/danialfarid/ng-file-upload

SheetJSDev commented 2017-04-27 17:21:23 +00:00 (Migrated from github.com)

@idnesh would it be possible to share a small demo using ng-file-upload and the crc-32 script? I can generate a large file manually, but I can't seem to load a file larger than 2 GB (Chrome gives the "Aw, Snap!" error). Also, can you check whether you get the correct CRC32 for a 1 GB file?

If you are getting the data in chunks, make sure you pass the previous CRC32 value into each subsequent call, as in the example from https://github.com/SheetJS/js-crc32#usage :

```js
// full data
CRC32.buf([ 83, 104, 101, 101, 116, 74, 83 ]) // -1647298270

// first two bytes
crc32 = CRC32.buf([83, 104]) // -1826163454
// then next 3 bytes (second argument is the previous crc32 value)
crc32 = CRC32.buf([101, 101, 116], crc32) // 1191034598
// finally last two bytes (second argument is the previous crc32 value)
crc32 = CRC32.buf([74, 83], crc32) // -1647298270
```
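When the data arrives as an arbitrary number of chunks, the same seeding pattern can be written as a loop. The sketch below is not the library's code: `crc32buf` is a minimal stand-in for `CRC32.buf` (standard CRC-32, signed result, optional seed) so that the block runs on its own, and `crc32OfChunks` is a hypothetical helper name.

```js
// Stand-in for CRC32.buf(bytes, seed): table-driven standard CRC-32
// (reflected polynomial 0xEDB88320), returning a signed 32-bit integer.
const TABLE = new Int32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  TABLE[n] = c;
}

function crc32buf(bytes, seed) {
  let crc = (seed | 0) ^ -1; // a missing seed behaves like 0
  for (let i = 0; i < bytes.length; i++) {
    crc = (crc >>> 8) ^ TABLE[(crc ^ bytes[i]) & 0xFF];
  }
  return crc ^ -1;
}

// Hypothetical helper: feed each chunk in order, passing the running CRC
// back in as the seed for the next call.
function crc32OfChunks(chunks) {
  let crc = 0;
  for (const chunk of chunks) crc = crc32buf(chunk, crc);
  return crc;
}

const whole = crc32buf([83, 104, 101, 101, 116, 74, 83]);
const pieces = crc32OfChunks([[83, 104], [101, 101, 116], [74, 83]]);
console.log(whole === pieces); // true
```

The chunk boundaries do not matter as long as the chunks are fed in order and every call after the first receives the previous result.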
SheetJSDev commented 2017-04-27 22:00:53 +00:00 (Migrated from github.com)

@idnesh I tested against larger files and it appears to work correctly. A new web demo http://oss.sheetjs.com/js-crc32/large.html now reads the data in chunks. Using the IE8 Win7 Modern.IE VMware VM image as a test file:

https://az412801.vo.msecnd.net/vhd/VMBuild_20141027/VMware/IE8/Windows/IE8.Win7.For.Windows.VMware.zip
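Reading a large file in chunks, rather than loading it whole, might look like the sketch below. It is not the demo's actual code: `checksumBlob`, `update`, and `chunkSize` are hypothetical names, and the only platform features assumed are the standard `Blob.prototype.slice` and `Blob.prototype.arrayBuffer` methods. `update` is expected to behave like `CRC32.buf`, taking the next bytes and the previous value.

```js
// Hypothetical helper: fold update(bytes, previous) over a Blob or File,
// reading one slice at a time so only chunkSize bytes are held in memory.
async function checksumBlob(blob, update, chunkSize = 8 * 1024 * 1024) {
  let crc; // undefined until the first chunk has been processed
  for (let start = 0; start < blob.size; start += chunkSize) {
    // slice() clamps the end offset to the blob size
    const slice = blob.slice(start, start + chunkSize);
    const bytes = new Uint8Array(await slice.arrayBuffer());
    crc = update(bytes, crc);
  }
  // An empty blob never enters the loop, so checksum zero bytes instead.
  return crc === undefined ? update(new Uint8Array(0)) : crc;
}
```

In a browser this could be called as `checksumBlob(file, (bytes, seed) => CRC32.buf(bytes, seed))`, assuming `file` comes from a file input and the `CRC32` global is loaded on the page.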

You can verify against the webpage:

[Image: crc32vm]

and the command line tool:

```bash
## using the 1.0.2 version of the command line tool
$ crc32 IE8.Win7.For.Windows.VMware.zip
1891069052
## using the OSX built-in cksum command
$ cksum -o 3 ~/Downloads/IE8.Win7.For.Windows.VMware.zip
1891069052 4161613172 ...
```

Keep in mind that the result is a **signed 32-bit integer**, which may differ from what you expect if you are dealing with unsigned values. To convert to unsigned, use `crc >>> 0`; [the demo performs that shift to show the unsigned value](https://github.com/SheetJS/js-crc32/blob/master/demo/worker.flow.js#L19-L21).
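The signed/unsigned difference only shows up when the top bit of the CRC is set. A small sketch using values quoted in this thread; the helper names `toUnsigned` and `toHex` are hypothetical and not part of the library:

```js
// Reinterpret a signed 32-bit CRC as an unsigned 32-bit value.
function toUnsigned(crc) {
  return crc >>> 0;
}

// Format a CRC as 8 lowercase hex digits, as many checksum tools print it.
function toHex(crc) {
  return toUnsigned(crc).toString(16).padStart(8, "0");
}

console.log(toUnsigned(-1647298270)); // 2647669026
console.log(toUnsigned(1891069052));  // 1891069052 (already non-negative)
console.log(toHex(-1647298270));      // 9dd03922
```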

Reference: sheetjs/js-crc32#8