Result is a Signed 32bit value, is this known and wanted? #4

Closed
opened 2015-12-18 00:25:34 +00:00 by halfbyte · 2 comments
halfbyte commented 2015-12-18 00:25:34 +00:00 (Migrated from github.com)

I just spent a good hour trying to figure out why the results of the JS implementation did not match other implementation (I've specifically tested it against ruby's zLib wrapper) and then I understood: All the bitwise operators in JavaScript treat the variables as 32bit signed values. This is not a problem per se, as the bitwise operations are usually not dependent on the signs, but unfortunately the output is also 32bit signed. I've looked at various implementations now and it seems to me that usually, the output value of the crc process should be an unsigned 32 bit value.

Here's a simple test case (the numbers are not very relevant, it just takes a certain combination to trigger the sign problem):

# ruby example
a  =[ 68,69,77,79,220,187,0,0,0,0,0,0,0,0,0,
0,68,101,109,111,32,83,101,115,115,105,
111,110,32,32,32,32,32,32,32,32,32,32,32,
32,32,32,32,32,32,32,32,32,158,50,0,0,0,0,
0,0,0,0,0,0,0,7,0,0,0,3,0,0,0,3,0,0,1,1,0,0,
15,102,0,0,72,13,0,0,69,13,0,0,77,13,0,0,65,
13,0,0,0,0,0,0,0,0,0,0,0,96,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0]

a_string = a.map(&:chr).join('')

ZLib::crc32(a_string)
=> 2795502179
// JS example
a  =[ 68,69,77,79,220,187,0,0,0,0,0,0,0,0,0,
0,68,101,109,111,32,83,101,115,115,105,
111,110,32,32,32,32,32,32,32,32,32,32,32,
32,32,32,32,32,32,32,32,32,158,50,0,0,0,0,
0,0,0,0,0,0,0,7,0,0,0,3,0,0,0,3,0,0,1,1,0,0,
15,102,0,0,72,13,0,0,69,13,0,0,77,13,0,0,65,
13,0,0,0,0,0,0,0,0,0,0,0,96,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0];

CRC.buf(a)
=> -527573939

If you convert this value to an unsigned value (Stack Overflow never disappoints) by CRC.buf(a) >>> 0, you get the correct result.

I've also checked this against the crc32 utility of Mac OS X by converting the numbers into a text file and the result matches.

I''ll do a pull request if this makes things easier.

I just spent a good hour trying to figure out why the results of the JS implementation did not match other implementation (I've specifically tested it against ruby's zLib wrapper) and then I understood: All the bitwise operators in JavaScript treat the variables as 32bit signed values. This is not a problem per se, as the bitwise operations are usually not dependent on the signs, but unfortunately the output is also 32bit signed. I've looked at various implementations now and it seems to me that usually, the output value of the crc process should be an unsigned 32 bit value. Here's a simple test case (the numbers are not very relevant, it just takes a certain combination to trigger the sign problem): ``` ruby # ruby example a =[ 68,69,77,79,220,187,0,0,0,0,0,0,0,0,0, 0,68,101,109,111,32,83,101,115,115,105, 111,110,32,32,32,32,32,32,32,32,32,32,32, 32,32,32,32,32,32,32,32,32,158,50,0,0,0,0, 0,0,0,0,0,0,0,7,0,0,0,3,0,0,0,3,0,0,1,1,0,0, 15,102,0,0,72,13,0,0,69,13,0,0,77,13,0,0,65, 13,0,0,0,0,0,0,0,0,0,0,0,96,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0] a_string = a.map(&:chr).join('') ZLib::crc32(a_string) => 2795502179 ``` ``` js // JS example a =[ 68,69,77,79,220,187,0,0,0,0,0,0,0,0,0, 0,68,101,109,111,32,83,101,115,115,105, 111,110,32,32,32,32,32,32,32,32,32,32,32, 32,32,32,32,32,32,32,32,32,158,50,0,0,0,0, 0,0,0,0,0,0,0,7,0,0,0,3,0,0,0,3,0,0,1,1,0,0, 15,102,0,0,72,13,0,0,69,13,0,0,77,13,0,0,65, 13,0,0,0,0,0,0,0,0,0,0,0,96,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0]; CRC.buf(a) => -527573939 ``` If you convert this value to an unsigned value ([Stack Overflow](http://stackoverflow.com/questions/14890994/javascript-c-style-type-cast-from-signed-to-unsigned) never disappoints) by `CRC.buf(a) >>> 0`, you get the correct result. I've also checked this against the crc32 utility of Mac OS X by converting the numbers into a text file and the result matches. I''ll do a pull request if this makes things easier.
SheetJSDev commented 2016-01-08 16:41:56 +00:00 (Migrated from github.com)

This is indeed known, the behavior is documented in the README:

The return value is a signed 32-bit integer.

Is it desired? There were a few reasons for settling on the signed version:

  1. There is a tangible performance penalty for handling unsigned 32 bit integers in general. Consider the following reduced test cases:
/* Function input normally within range of a signed integer, try passing unsigned */
function f(x) { return x - 1; }
console.log("warming up");
for(var i = 0; i < 10000; ++i) f(i);
f(3);
f(-3);
f(2147483647);
console.log("passing a number larger than 2**32 - 1");
console.log(f(2147483648));

/* Function output normally within range of a signed integer, try returning unsigned */
function g(x) { return x + 1; }
console.log("warming up");
for(var i = 0; i < 10000; ++i) g(i);
g(3);
g(-3);
g(2147483646);
console.log("returning a number larger than 2**32 - 1");
console.log(g(2147483647));

If you run this in node with the flags --trace-opt --trace-deopt --trace-bailout you will see both functions deoptimize when dealing with a number that isn't a 32 bit signed integer.

  1. python 2.x zlib.crc32, which was used for generating the baselines, returns signed integers:
$ python --version
Python 2.7.5
$ python -c 'from zlib import crc32; print crc32("foobar")'
-1628037227
This is indeed known, the behavior is documented in the README: > The return value is a signed 32-bit integer. Is it desired? There were a few reasons for settling on the signed version: 1) There is a tangible performance penalty for handling unsigned 32 bit integers in general. Consider the following reduced test cases: ``` /* Function input normally within range of a signed integer, try passing unsigned */ function f(x) { return x - 1; } console.log("warming up"); for(var i = 0; i < 10000; ++i) f(i); f(3); f(-3); f(2147483647); console.log("passing a number larger than 2**32 - 1"); console.log(f(2147483648)); /* Function output normally within range of a signed integer, try returning unsigned */ function g(x) { return x + 1; } console.log("warming up"); for(var i = 0; i < 10000; ++i) g(i); g(3); g(-3); g(2147483646); console.log("returning a number larger than 2**32 - 1"); console.log(g(2147483647)); ``` If you run this in node with the flags `--trace-opt --trace-deopt --trace-bailout` you will see both functions deoptimize when dealing with a number that isn't a 32 bit signed integer. 2) python 2.x zlib.crc32, which was used for generating the [baselines](https://github.com/SheetJS/js-crc32/blob/master/misc/bits.js), returns signed integers: ``` $ python --version Python 2.7.5 $ python -c 'from zlib import crc32; print crc32("foobar")' -1628037227 ```
halfbyte commented 2016-01-08 20:01:14 +00:00 (Migrated from github.com)

@SheetJSDev Thanks for clarifying, especially regarding the performance penalty. I didn't think of that. will try to move my code to do the unsigned conversion at the very end of the checksum calculation.

the python example is interesting, as this seems to use the same binding as my ruby example. You never cease to learn :)

I'm currently using a somewhat patched version of your lib in my code anyway as I needed continuous summing, so I'll close this issue.

@SheetJSDev Thanks for clarifying, especially regarding the performance penalty. I didn't think of that. will try to move my code to do the unsigned conversion at the very end of the checksum calculation. the python example is interesting, as this seems to use the same binding as my ruby example. You never cease to learn :) I'm currently using a somewhat patched version of your lib in my code anyway as I needed continuous summing, so I'll close this issue.
Sign in to join this conversation.
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sheetjs/js-crc32#4
No description provided.