545 lines
19 KiB
Plaintext
545 lines
19 KiB
Plaintext
WORKSHEET FILE FORMAT
|
||
FROM LOTUS
|
||
|
||
APPENDIX B - THE FORMULA COMPILER
|
||
|
||
Copyright(c) 1984, Lotus Development Corporation
|
||
161 First Street
|
||
Cambridge, Massachusetts 02142
|
||
(617) 492-7171
|
||
Electronic Edition, December, 1984
|
||
All Rights Reserved
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
APPENDIX B: The Formula Compiler
|
||
|
||
This appendix describes the internal workings of the formula compiler. The
|
||
compiler transforms an ASCII string of characters representing a formula to
|
||
its Reverse Polish code. The basic algorithm utilizes and SR parser (SR =
|
||
shift and reduce). The aim of the parser is to apply a set of reduction
|
||
rules which embody the syntax of the compiler to an input string. Formula
|
||
code is compiled to a temporary buffer.
|
||
|
||
Lexicon Analysis
|
||
|
||
A lexical analyzer breaks up the input string into lexical units called
|
||
tokens. A token is a substring of the original input string operand,
|
||
operator, or special symbol (such as comma, parentheses, etc.) In addition,
|
||
the lexical analyser supplies two special tokens, "beginning of formula"
|
||
(boform) and "end of formula" (eoform), to facilitate the compilation
|
||
process. The lexical analyzer identifies and processes literals (both
|
||
number and string), cell and range references, operators, and function
|
||
calls. It assigns a unique code to each distinct operator, function, or
|
||
type of operand.
|
||
|
||
A function with no arguments is treated like a number.
|
||
|
||
Syntax Analysis
|
||
|
||
The syntactical analysis of a formula is accomplished by processing a list
|
||
of tokens in left-to-right order. A stack called the syntax is also used
|
||
during the syntactical scan. The basic algorithm is as follows:
|
||
|
||
Repeat the following steps:
|
||
|
||
1) Get the next token
|
||
|
||
2) If the token is a literal or cell reference:
|
||
a) Push the number code on the syntax stack
|
||
b) Push the number code on the syntax stack
|
||
|
||
3) If the token is a range reference:
|
||
a) Compile code to push the range reference
|
||
b) Push the range code on the syntax stack
|
||
|
||
4) Otherwise push the token code for the token on the syntax stack.
|
||
|
||
For each syntax rule, if the pattern on the top of the syntax matches the
|
||
rule pattern take the action associated with the rule and start scanning
|
||
from the beginning for any additional rules which may apply.
|
||
|
||
When a token code is pushed on the syntax stack, an additional word of
|
||
zeros is also pushed on the stack. This is used when compiling function
|
||
calls to hold the function's argument count.
|
||
|
||
|
||
|
||
|
||
|
||
Rule Matching
|
||
|
||
A relatively small number of rules are used to process formulas of arbitrary
|
||
complexity. If a rule matches the top of the syntax stack, then the
|
||
compiler takes a specific action and rule scanning starts again with the
|
||
first rule. Each rule matches certain patterns on the syntax stack. A
|
||
typical rule might be: if the top of the stack is the token for right
|
||
parenthesis, and the next-to-top is a number, and the second form the top
|
||
is a left parenthesis, then pop the top three items from the syntax stack
|
||
and push the number on the syntax stack.
|
||
|
||
This rule can be more succinctly represented as:
|
||
|
||
Stack
|
||
|
||
Before After Action
|
||
)
|
||
number
|
||
( number none
|
||
|
||
|
||
|
||
The Rules
|
||
|
||
|
||
The following are the syntax rules used to process formulas. Note that the
|
||
order of the rules is important. The rules for compilation of operators
|
||
used additional tables which assign a precedence number and opcode to each
|
||
legal unary and binary operator. Thus, for example, there is a single
|
||
token code for minus sign (-), but there are two opcodes one for unary
|
||
minus and one for binary minus. In addition, these two operators, while
|
||
lexically identical, also have different precedence. In general, operators
|
||
of higher precedence will be performed before operators of lower precedence
|
||
are performed left-to-right. All special operators (boform, eoform,
|
||
parentheses, comma, etc.) are implicitly assigned a precedence of zero.
|
||
|
||
Rule 1 Termination test
|
||
|
||
Stack
|
||
|
||
Before After Action
|
||
eoform Output a return code to compile buffer
|
||
number Return, indicating successful compile
|
||
boform
|
||
|
||
Rule 2 Function argument processing
|
||
|
||
Stack
|
||
Before After Action
|
||
' Error if range argument illegal for
|
||
number or range function.
|
||
( ( Increment argument count on stack
|
||
function function
|
||
|
||
Rule 3 Process final function argument
|
||
|
||
Stack
|
||
Before After Action
|
||
) Error if range argument illegal for
|
||
number or range function.
|
||
( Increment argument count on stack
|
||
function number Compile function opcode
|
||
If list function, compile argument
|
||
count; otherwise error is wrong
|
||
argument count.
|
||
|
||
|
||
|
||
|
||
Rule 4 Parenthesis removal
|
||
|
||
Stack
|
||
Before After Action
|
||
) Compile parenthesis opcode
|
||
number
|
||
( number
|
||
operator operator
|
||
|
||
|
||
|
||
Rule 5 Binary operators
|
||
|
||
Stack
|
||
Before After Action
|
||
op2 If binary op<binary op, rule does
|
||
number not match. Otherwise, compile opcode
|
||
op1 op2 for operator op1.
|
||
|
||
|
||
Rule 6 Unary operators
|
||
|
||
Stack
|
||
Before After Action
|
||
op2 I unary op<binary op, rule does
|
||
number op2 not match. Otherwise, compile opcode.
|
||
op1 number for operator op 1.
|
||
|
||
|
||
Rule 7 Error detection
|
||
|
||
Stack
|
||
Before After Action
|
||
eoform Return indicating unsuccessful compile
|
||
|
||
|
||
|
||
|
||
|
||
Table 9 Operator Precedence Table
|
||
|
||
Operator Unary Precedence Binary Precedence
|
||
+ 6 4
|
||
- 6 4
|
||
* na 5
|
||
/ na 7
|
||
^ na 3
|
||
= na 3
|
||
< > na 3
|
||
< = na 3
|
||
> = na 3
|
||
< na 3
|
||
> na 3
|
||
#and# na 1
|
||
#or# na 1
|
||
#not# 2 na
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Example:
|
||
|
||
Using the above rules, we can now see how a particular formula is
|
||
compiled. Let us consider the following formula:
|
||
|
||
3+5*6
|
||
|
||
This is broken up by the lexical analyzer into seven tokens.
|
||
|
||
boform
|
||
3
|
||
+
|
||
5
|
||
*
|
||
6
|
||
eoform
|
||
|
||
The syntax scans proceed as follows until a matching rule is found:
|
||
|
||
Stack
|
||
|
||
boform number + number
|
||
boform number +
|
||
boform number
|
||
boform
|
||
|
||
Compile buffer
|
||
|
||
push 3 push 3 push 3
|
||
push 5
|
||
|
||
At this point, rule 5 is invoked, but since the precedence of boform is
|
||
zero, no action is taken.
|
||
|
||
Stack
|
||
|
||
* number
|
||
number *
|
||
+ number
|
||
number +
|
||
boform number
|
||
boform
|
||
|
||
Compile buffer
|
||
|
||
push 3 push 3
|
||
push 5 push 5
|
||
push 6
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
At this point, since the binary precedence of + is lower than the binary
|
||
precedence of *, rule 5 does apply, and the opcode for * is compiled. The
|
||
stack is reduced by replacing number * number by number and scan is made,
|
||
but no further rule applies.
|
||
|
||
|
||
Stack
|
||
|
||
number eoform
|
||
+ number
|
||
number +
|
||
boform number
|
||
boform
|
||
|
||
Compile buffer
|
||
|
||
push 3 push 3
|
||
push 5 push 5
|
||
push 6 push 6
|
||
|
||
|
||
|
||
Rule 5 applies again, and the opcode for + is compiled, reducing the stack
|
||
to boform, number, eoform. Rescanning finds a match on rule 1 which
|
||
compiles a return opcode and terminates. The final compiled code is thus:
|
||
|
||
push 3
|
||
push 5
|
||
push 6
|
||
*
|
||
+
|
||
return
|
||
|
||
A Note on the Decompiler
|
||
|
||
The algorithm for the formula decompiler was taken verbatim from:
|
||
|
||
Writing Interactive Compilers and Interpreters, P.J. Brown, John Wiley and
|
||
Sons, 1979. See chapter 6.2. The algorithm itself is described on pages
|
||
216 and 217.
|
||
|
||
This algorithm is also described in the following article.
|
||
|
||
More on the Re-creation of Source Code from Reverse Polish, P.J. Brown,
|
||
Software Practice and Experience, Vol 7, 545-551 (1977).
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
WORKSHEET COLUMN DESIGNATORS
|
||
|
||
Most records within the 1-2-3 Condensed Worksheet format are specified
|
||
with column/row designators (for example, column 0, row 0 equals A1). When
|
||
determining the column designator, the table below will help make
|
||
conversion easier.
|
||
|
||
|
||
Column Hex Dec Column Hex Dec Column Hex Dec
|
||
A 0 1 BA 34 52 DA 68 104
|
||
B 1 1 BB 35 53 DB 69 105
|
||
C 2 2 BC 36 54 DC 6A 106
|
||
D 3 3 BD 37 55 DD 6B 107
|
||
E 4 4 BE 38 56 DE 6C 108
|
||
F 5 5 BF 39 57 DF 6D 109
|
||
G 6 6 BG 3A 58 DG 6E 110
|
||
H 7 7 BH 3B 59 DH 6F 111
|
||
I 8 8 BI 3C 60 DI 70 112
|
||
J 9 9 BJ 3D 61 DJ 71 113
|
||
K A 10 BK 3E 62 DK 72 114
|
||
L B 11 BL 3F 63 DL 73 115
|
||
M C 12 BM 40 64 DM 74 116
|
||
N D 13 BN 41 65 DN 75 117
|
||
O E 14 BO 42 66 DO 76 118
|
||
P F 15 BP 43 67 DP 77 119
|
||
Q 10 16 BQ 44 68 DQ 78 120
|
||
R 11 17 BR 45 69 DR 79 121
|
||
S 12 18 BS 46 70 DS 7A 122
|
||
T 13 19 BT 47 71 DT 7B 123
|
||
U 14 20 BU 48 72 DU 7C 124
|
||
V 15 21 BV 49 73 DV 7D 125
|
||
W 16 22 BW 4A 74 DW 7E 126
|
||
X 17 23 BX 4B 75 DX 7F 127
|
||
Y 18 24 BY 4C 76 DY 80 128
|
||
Z 19 25 BZ 4D 77 DZ 81 129
|
||
AA 1A 26 CA 4E 78 EA 82 130
|
||
AB 1B 27 CB 4F 79 EB 83 131
|
||
AC 1C 28 CC 50 80 EC 84 132
|
||
AD 1D 29 CD 51 81 ED 85 133
|
||
AE 1E 30 CE 52 82 EE 86 134
|
||
AF 1F 31 CF 53 83 EF 87 135
|
||
AG 20 32 CG 54 84 EG 88 136
|
||
AH 21 33 CH 55 85 EH 89 137
|
||
AI 22 34 CI 56 86 EI 8A 138
|
||
AJ 23 35 CJ 57 87 EJ 8B 139
|
||
AK 24 36 CK 58 88 EK 8C 140
|
||
AL 25 37 CL 59 89 EL 8D 141
|
||
AM 26 38 CM 5A 90 EM 8E 142
|
||
AN 27 39 CN 5B 91 EN 8F 143
|
||
AO 28 40 CO 5C 92 EO 90 144
|
||
AP 29 41 CP 5D 93 EP 91 145
|
||
AQ 2A 42 CQ 5E 94 EQ 92 146
|
||
AR 2B 43 CR 5F 95 ER 93 147
|
||
AS 2C 44 CS 60 96 ES 94 148
|
||
AT 2D 45 CT 61 97 ET 95 149
|
||
AU 2E 46 CU 62 98 EU 96 150
|
||
AV 2F 47 CV 63 99 EV 97 151
|
||
AW 30 48 CW 64 100 EW 98 152
|
||
AX 31 49 CX 65 101 EX 99 153
|
||
AY 32 50 CY 66 102 EY 9A 154
|
||
AZ 33 51 CZ 67 103 EZ 9B 155
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
(CONTINUED)
|
||
|
||
|
||
|
||
|
||
Column Hex Dec Column Hex Dec
|
||
|
||
FA 9C 156 HA DO 208
|
||
FB 9D 157 HB D1 209
|
||
FC 9E 158 HC D2 210
|
||
FD 9F 159 HD D3 211
|
||
FE AO 160 HE D4 212
|
||
FF A1 161 HF D5 213
|
||
FG A2 162 HG D6 214
|
||
FH A3 163 HH D7 215
|
||
FI A4 164 HI D8 216
|
||
FJ A5 165 HJ D9 217
|
||
FK A6 166 HK DA 218
|
||
FL A7 167 HL DB 219
|
||
FM A8 168 HM DC 220
|
||
FN A9 169 HN DD 221
|
||
FO AA 170 HO DE 222
|
||
FP AB 171 HP DF 223
|
||
FQ AC 172 HQ EO 224
|
||
FR AD 173 HR E1 225
|
||
FS AE 174 HS E2 226
|
||
FT AF 175 HT E3 227
|
||
FU BO 176 HU E4 228
|
||
FV B1 177 HV E5 229
|
||
FW B2 178 HW E6 230
|
||
FX B3 179 HX E7 231
|
||
FY B4 180 HY E8 232
|
||
FZ B5 181 HZ E9 233
|
||
GA B6 182 IA EA 234
|
||
GB B7 183 IB EB 235
|
||
GC B8 184 IC EC 236
|
||
GD B9 185 ID ED 237
|
||
GE BA 186 IE EE 238
|
||
GF BB 187 IF EF 239
|
||
GG BC 188 IG FO 240
|
||
GH BD 189 IH F1 241
|
||
GI BE 190 II F2 242
|
||
GJ BF 191 IJ F3 243
|
||
GK CO 192 IK F4 244
|
||
GL C1 193 IL F5 245
|
||
GM C2 195 IM F6 246
|
||
GN C3 195 IN F7 247
|
||
GO C4 196 IO F8 248
|
||
GP C5 197 IP F9 249
|
||
GQ C6 198 IQ FA 250
|
||
GR C7 199 IR FB 251
|
||
GS C8 200 IS FC 252
|
||
GT C9 201 IT FD 253
|
||
GU CA 202 IU FE 254
|
||
GV CB 203 IV FF 255
|
||
GW CC 204
|
||
GX CD 205
|
||
GY CE 206
|
||
GZ CF 207
|
||
|
||
|
||
|
||
|
||
ANALYSIS OF 1-2-3 WORKSHEET FILE
|
||
|
||
The worksheet shown below was created in 1-2-3 and saved to disk.
|
||
|
||
|
||
|
||
Key:
|
||
|
||
A2..A5 Named Range (code 11)
|
||
EXAMPLE A2: Label (code 15)
|
||
100 A3: Integer (code 13)
|
||
12.5 A4: Number (code 14)
|
||
87.5 A5: Formula (+A3-A4)
|
||
(code 16)
|
||
|
||
|
||
The example shown below is a partial hex dump of this worksheet file. By
|
||
reading each record header, you can determine the type of record you are
|
||
encountering. The record header will also tell you the length of that
|
||
follows the header. By analyzing the record header, you can read the
|
||
records you want and skip unrelated records.
|
||
|
||
|
||
362B:0100 06 00 08 00 00 00 00 00 00 00
|
||
362B:0110 04 00 2F 00 01 00 01 02 00 01 00 FF 03 00 01 00
|
||
362B:0120 00 04 00 01 00 00 05 00 01 00 FF 07 00 1F 00 00
|
||
362B:0130 00 01 00 71 00 09 00 08 00 14 00 00 00 00 00 00
|
||
362B:0140 00 00 00 00 00 00 00 04 00 04 00 48 00 00 0B 00
|
||
362B:0150 18 00 54 45 53 54 00 00 00 00 00 00 00 00 00 00
|
||
362B:0160 00 00 00 00 01 00 00 00 04 00 18 00 19 00 00 FF
|
||
362B:0170 FF 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF
|
||
362B:0180
|
||
|
||
|
||
362B:05C0
|
||
362B:05D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
||
362B:05E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
||
362B:05F0 00 00 00 00 71 71 01 00 0F 00 0E 00 FF 00 00 01
|
||
362B:0600 00 27 45 58 41 4D 50 4C 45 00 0D 00 07 00 FF 00
|
||
362B:0610 00 02 00 64 00
|
||
362B:0620 10 00 1B 00 FF 00 00 04 00 00
|
||
362B:0630 00 00 00 00 E0 55 40 0C 00 01 00 80 FE BF 01 00
|
||
362B:0640 80 FF BF 0A 03
|
||
|