notes/lotus/WSFF4.TXT

WORKSHEET FILE FORMAT
FROM LOTUS
APPENDIX B - THE FORMULA COMPILER
Copyright(c) 1984, Lotus Development Corporation
161 First Street
Cambridge, Massachusetts 02142
(617) 492-7171
Electronic Edition, December, 1984
All Rights Reserved
APPENDIX B: The Formula Compiler
This appendix describes the internal workings of the formula compiler. The
compiler transforms an ASCII string of characters representing a formula to
its Reverse Polish code. The basic algorithm utilizes an SR parser (SR =
shift and reduce). The aim of the parser is to apply a set of reduction
rules which embody the syntax of the compiler to an input string. Formula
code is compiled to a temporary buffer.
Lexical Analysis
A lexical analyzer breaks up the input string into lexical units called
tokens. A token is a substring of the original input string that represents
an operand, an operator, or a special symbol (such as a comma or a
parenthesis). In addition, the lexical analyzer supplies two special tokens,
"beginning of formula" (boform) and "end of formula" (eoform), to facilitate
the compilation process. The lexical analyzer identifies and processes
literals (both number and string), cell and range references, operators, and
function calls. It assigns a unique code to each distinct operator, function,
or type of operand.
A function with no arguments is treated like a number.
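The tokenizing step described above can be sketched as follows. This is only an illustration: the token category names, the regular expressions, and the function name tokenize are assumptions made for the example, since the text does not give the real lexer's token codes or its exact rules for literals and references.

```python
import re

# Hypothetical token categories; the real lexer's codes are not given in the text.
TOKEN_SPEC = [
    ("number",   r"\d+(?:\.\d+)?"),                      # numeric literal
    ("string",   r'"[^"]*"'),                            # string literal
    ("range",    r"[A-Z]{1,2}\d+\.\.[A-Z]{1,2}\d+"),     # range reference, e.g. A2..A5
    ("cell",     r"[A-Z]{1,2}\d+"),                      # cell reference, e.g. A3
    ("function", r"@[A-Z]+"),                            # function name, e.g. @SUM
    ("operator", r"<>|<=|>=|#and#|#or#|#not#|[-+*/^=<>]"),
    ("special",  r"[(),]"),                              # parentheses and comma
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(formula):
    """Split a formula string into (kind, text) pairs, framed by boform/eoform."""
    tokens = [("boform", "")]
    pos = 0
    while pos < len(formula):
        if formula[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character {formula[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    tokens.append(("eoform", ""))
    return tokens
```

Under these assumptions, tokenize("3+5*6") yields seven tokens: boform, three numbers separated by two operators, and eoform.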
Syntax Analysis
The syntactical analysis of a formula is accomplished by processing a list
of tokens in left-to-right order. A stack called the syntax stack is also
used during the syntactical scan. The basic algorithm is as follows:
Repeat the following steps:
1) Get the next token
2) If the token is a literal or cell reference:
   a) Compile code to push the literal or cell reference
   b) Push the number code on the syntax stack
3) If the token is a range reference:
   a) Compile code to push the range reference
   b) Push the range code on the syntax stack
4) Otherwise push the token code for the token on the syntax stack.
For each syntax rule, if the pattern on the top of the syntax stack matches
the rule pattern, take the action associated with the rule and start scanning
from the beginning for any additional rules which may apply.
When a token code is pushed on the syntax stack, an additional word of
zeros is also pushed on the stack. This is used when compiling function
calls to hold the function's argument count.
Rule Matching
A relatively small number of rules are used to process formulas of arbitrary
complexity. If a rule matches the top of the syntax stack, then the
compiler takes a specific action and rule scanning starts again with the
first rule. Each rule matches certain patterns on the syntax stack. A
typical rule might be: if the top of the stack is the token for right
parenthesis, and the next-to-top is a number, and the second from the top
is a left parenthesis, then pop the top three items from the syntax stack
and push the number on the syntax stack.
This rule can be more succinctly represented as:
Stack
Before            After       Action
)
number
(                 number      none
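As an illustration, the sample rule above can be written as a pattern match against the top of a list-based stack. The helper names match_top and reduce_parentheses, and the use of plain strings as token codes, are assumptions made for this sketch.

```python
def match_top(stack, pattern):
    """Return True if the top of `stack` matches `pattern` (listed top first)."""
    if len(stack) < len(pattern):
        return False
    return stack[::-1][:len(pattern)] == list(pattern)


def reduce_parentheses(stack):
    """Apply the sample rule  ) number (  ->  number.  Returns True if it fired."""
    if match_top(stack, [")", "number", "("]):
        del stack[-3:]            # pop the three matched items
        stack.append("number")    # push the number back on the stack
        return True
    return False


stack = ["boform", "(", "number", ")"]   # bottom of the stack first
fired = reduce_parentheses(stack)        # stack is now ["boform", "number"]
```

Keeping the pattern in top-first order mirrors the way the rule tables are drawn, with the top of the stack on the first row.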
The Rules
The following are the syntax rules used to process formulas. Note that the
order of the rules is important. The rules for compilation of operators
use additional tables which assign a precedence number and opcode to each
legal unary and binary operator. Thus, for example, there is a single
token code for minus sign (-), but there are two opcodes, one for unary
minus and one for binary minus. In addition, these two operators, while
lexically identical, also have different precedence. In general, operators
of higher precedence will be performed before operators of lower precedence;
operators of equal precedence are performed left-to-right. All special
operators (boform, eoform, parentheses, comma, etc.) are implicitly assigned
a precedence of zero.
Rule 1 Termination test
Stack
Before            After       Action
eoform                        Output a return code to compile buffer
number                        Return, indicating successful compile
boform
Rule 2 Function argument processing
Stack
Before            After       Action
,                             Error if range argument illegal for
number or range               function.
(                 (           Increment argument count on stack
function          function
Rule 3 Process final function argument
Stack
Before            After       Action
)                             Error if range argument illegal for
number or range               function.
(                             Increment argument count on stack
function          number      Compile function opcode
                              If list function, compile argument
                              count; otherwise error if wrong
                              argument count.
Rule 4 Parenthesis removal
Stack
Before            After       Action
)                             Compile parenthesis opcode
number
(                 number
operator          operator
Rule 5 Binary operators
Stack
Before            After       Action
op2                           If binary op1 < binary op2, rule does
number                        not match. Otherwise, compile opcode
op1               op2         for operator op1.
number            number
Rule 6 Unary operators
Stack
Before            After       Action
op2                           If unary op1 < binary op2, rule does
number            op2         not match. Otherwise, compile opcode
op1               number      for operator op1.
Rule 7 Error detection
Stack
Before            After       Action
eoform                        Return indicating unsuccessful compile
Table 9 Operator Precedence Table
Operator    Unary Precedence    Binary Precedence
+           6                   4
-           6                   4
*           na                  5
/           na                  5
^           na                  7
=           na                  3
<>          na                  3
<=          na                  3
>=          na                  3
<           na                  3
>           na                  3
#and#       na                  1
#or#        na                  1
#not#       2                   na
Example:
Using the above rules, we can now see how a particular formula is
compiled. Let us consider the following formula:
3+5*6
This is broken up by the lexical analyzer into seven tokens.
boform
3
+
5
*
6
eoform
The syntax scans proceed as follows until a matching rule is found:
Stack
boform      number      +           number
            boform      number      +
                        boform      number
                                    boform
Compile buffer
            push 3      push 3      push 3
                                    push 5
At this point, rule 5 is invoked, but since the precedence of boform is
zero, no action is taken.
Stack
*           number
number      *
+           number
number      +
boform      number
            boform
Compile buffer
push 3      push 3
push 5      push 5
            push 6
At this point, since the binary precedence of + is lower than the binary
precedence of *, rule 5 does apply, and the opcode for * is compiled. The
stack is reduced by replacing number * number by number and a rescan is
made, but no further rule applies.
Stack
number      eoform
+           number
number      +
boform      number
            boform
Compile buffer
push 3      push 3
push 5      push 5
push 6      push 6
*           *
Rule 5 applies again, and the opcode for + is compiled, reducing the stack
to boform, number, eoform. Rescanning finds a match on rule 1 which
compiles a return opcode and terminates. The final compiled code is thus:
push 3
push 5
push 6
*
+
return
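The worked example can be reproduced with a small shift-and-reduce sketch. It is a simplification under stated assumptions, not the actual 1-2-3 compiler: it implements only rules 1, 5 and 7, handles only literals and the binary operators +, - and *, and reads rule 5 as matching op2 / number / op1 / number, which is inferred from the example's "replacing number * number by number". The names compile_formula, precedence and BINARY_PRECEDENCE are hypothetical.

```python
# Binary precedences for a subset of the operators in Table 9.
BINARY_PRECEDENCE = {"+": 4, "-": 4, "*": 5}
SPECIAL = ("boform", "eoform")   # special tokens, implicitly precedence zero


def precedence(token):
    return BINARY_PRECEDENCE.get(token, 0)


def compile_formula(tokens):
    """Compile a token list such as ["boform", "3", "+", "5", "eoform"]."""
    stack = []   # syntax stack; the top is the end of the list
    code = []    # compile buffer
    for token in tokens:
        if token in BINARY_PRECEDENCE or token in SPECIAL:
            stack.append(token)
        else:
            code.append("push " + token)   # compile code to push the literal
            stack.append("number")         # push the number code
        while True:   # after each match, rescan starting from the first rule
            # Rule 1: eoform / number / boform ends the compile successfully.
            if stack[-3:] == ["boform", "number", "eoform"]:
                code.append("return")
                return code
            # Rule 5: op2 / number / op1 / number, unless op1 < op2.
            if (len(stack) >= 4
                    and stack[-1] != "number"
                    and stack[-2] == "number"
                    and stack[-3] in BINARY_PRECEDENCE
                    and stack[-4] == "number"
                    and precedence(stack[-3]) >= precedence(stack[-1])):
                op1, op2 = stack[-3], stack[-1]
                code.append(op1)                # compile the opcode for op1
                stack[-4:] = ["number", op2]    # reduce, keeping op2 on top
                continue
            # Rule 7: eoform on top and nothing else matched.
            if stack[-1] == "eoform":
                raise ValueError("unsuccessful compile")
            break
    raise ValueError("token list did not end with eoform")
```

Calling compile_formula(["boform", "3", "+", "5", "*", "6", "eoform"]) returns the same six instructions as the listing above. Because the reduction fires when the two precedences are equal, operators of equal precedence come out left-to-right.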
A Note on the Decompiler
The algorithm for the formula decompiler was taken verbatim from:
Writing Interactive Compilers and Interpreters, P.J. Brown, John Wiley and
Sons, 1979. See chapter 6.2. The algorithm itself is described on pages
216 and 217.
This algorithm is also described in the following article.
More on the Re-creation of Source Code from Reverse Polish, P.J. Brown,
Software Practice and Experience, Vol 7, 545-551 (1977).
WORKSHEET COLUMN DESIGNATORS
Most records within the 1-2-3 Condensed Worksheet format are specified
with column/row designators (for example, column 0, row 0 equals A1). When
determining the column designator, the table below will help make
conversion easier.
Column Hex Dec Column Hex Dec Column Hex Dec
A 0 0 BA 34 52 DA 68 104
B 1 1 BB 35 53 DB 69 105
C 2 2 BC 36 54 DC 6A 106
D 3 3 BD 37 55 DD 6B 107
E 4 4 BE 38 56 DE 6C 108
F 5 5 BF 39 57 DF 6D 109
G 6 6 BG 3A 58 DG 6E 110
H 7 7 BH 3B 59 DH 6F 111
I 8 8 BI 3C 60 DI 70 112
J 9 9 BJ 3D 61 DJ 71 113
K A 10 BK 3E 62 DK 72 114
L B 11 BL 3F 63 DL 73 115
M C 12 BM 40 64 DM 74 116
N D 13 BN 41 65 DN 75 117
O E 14 BO 42 66 DO 76 118
P F 15 BP 43 67 DP 77 119
Q 10 16 BQ 44 68 DQ 78 120
R 11 17 BR 45 69 DR 79 121
S 12 18 BS 46 70 DS 7A 122
T 13 19 BT 47 71 DT 7B 123
U 14 20 BU 48 72 DU 7C 124
V 15 21 BV 49 73 DV 7D 125
W 16 22 BW 4A 74 DW 7E 126
X 17 23 BX 4B 75 DX 7F 127
Y 18 24 BY 4C 76 DY 80 128
Z 19 25 BZ 4D 77 DZ 81 129
AA 1A 26 CA 4E 78 EA 82 130
AB 1B 27 CB 4F 79 EB 83 131
AC 1C 28 CC 50 80 EC 84 132
AD 1D 29 CD 51 81 ED 85 133
AE 1E 30 CE 52 82 EE 86 134
AF 1F 31 CF 53 83 EF 87 135
AG 20 32 CG 54 84 EG 88 136
AH 21 33 CH 55 85 EH 89 137
AI 22 34 CI 56 86 EI 8A 138
AJ 23 35 CJ 57 87 EJ 8B 139
AK 24 36 CK 58 88 EK 8C 140
AL 25 37 CL 59 89 EL 8D 141
AM 26 38 CM 5A 90 EM 8E 142
AN 27 39 CN 5B 91 EN 8F 143
AO 28 40 CO 5C 92 EO 90 144
AP 29 41 CP 5D 93 EP 91 145
AQ 2A 42 CQ 5E 94 EQ 92 146
AR 2B 43 CR 5F 95 ER 93 147
AS 2C 44 CS 60 96 ES 94 148
AT 2D 45 CT 61 97 ET 95 149
AU 2E 46 CU 62 98 EU 96 150
AV 2F 47 CV 63 99 EV 97 151
AW 30 48 CW 64 100 EW 98 152
AX 31 49 CX 65 101 EX 99 153
AY 32 50 CY 66 102 EY 9A 154
AZ 33 51 CZ 67 103 EZ 9B 155
(CONTINUED)
Column Hex Dec Column Hex Dec
FA 9C 156 HA D0 208
FB 9D 157 HB D1 209
FC 9E 158 HC D2 210
FD 9F 159 HD D3 211
FE A0 160 HE D4 212
FF A1 161 HF D5 213
FG A2 162 HG D6 214
FH A3 163 HH D7 215
FI A4 164 HI D8 216
FJ A5 165 HJ D9 217
FK A6 166 HK DA 218
FL A7 167 HL DB 219
FM A8 168 HM DC 220
FN A9 169 HN DD 221
FO AA 170 HO DE 222
FP AB 171 HP DF 223
FQ AC 172 HQ E0 224
FR AD 173 HR E1 225
FS AE 174 HS E2 226
FT AF 175 HT E3 227
FU B0 176 HU E4 228
FV B1 177 HV E5 229
FW B2 178 HW E6 230
FX B3 179 HX E7 231
FY B4 180 HY E8 232
FZ B5 181 HZ E9 233
GA B6 182 IA EA 234
GB B7 183 IB EB 235
GC B8 184 IC EC 236
GD B9 185 ID ED 237
GE BA 186 IE EE 238
GF BB 187 IF EF 239
GG BC 188 IG F0 240
GH BD 189 IH F1 241
GI BE 190 II F2 242
GJ BF 191 IJ F3 243
GK C0 192 IK F4 244
GL C1 193 IL F5 245
GM C2 194 IM F6 246
GN C3 195 IN F7 247
GO C4 196 IO F8 248
GP C5 197 IP F9 249
GQ C6 198 IQ FA 250
GR C7 199 IR FB 251
GS C8 200 IS FC 252
GT C9 201 IT FD 253
GU CA 202 IU FE 254
GV CB 203 IV FF 255
GW CC 204
GX CD 205
GY CE 206
GZ CF 207
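The table follows a regular pattern (A through Z are 0 through 25, AA is 26, and IV is 255), so the conversion can also be computed. The following is a minimal sketch; the function names column_name and column_index are illustrative and do not come from the 1-2-3 documentation.

```python
def column_name(index):
    """Convert a 0-based column number (0..255) to its letter designator."""
    if not 0 <= index <= 255:
        raise ValueError("columns run from 0 (A) to 255 (IV)")
    first, second = divmod(index, 26)
    # Columns 0..25 have a single letter; later columns get a leading letter.
    prefix = chr(ord("A") + first - 1) if first else ""
    return prefix + chr(ord("A") + second)


def column_index(name):
    """Convert a letter designator such as "BA" back to its 0-based number."""
    index = 0
    for letter in name.upper():
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index - 1
```

For example, column_name(52) gives "BA" and column_index("IV") gives 255, in agreement with the table.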
ANALYSIS OF 1-2-3 WORKSHEET FILE
The worksheet shown below was created in 1-2-3 and saved to disk.
                Key:
                A2..A5  Named Range (code 11)
EXAMPLE         A2:     Label (code 15)
    100         A3:     Integer (code 13)
   12.5         A4:     Number (code 14)
   87.5         A5:     Formula (+A3-A4)
                        (code 16)
The example shown below is a partial hex dump of this worksheet file. By
reading each record header, you can determine the type of record you are
encountering. The record header will also tell you the length of the data that
follows the header. By analyzing the record header, you can read the
records you want and skip unrelated records.
362B:0100 06 00 08 00 00 00 00 00 00 00
362B:0110 04 00 2F 00 01 00 01 02 00 01 00 FF 03 00 01 00
362B:0120 00 04 00 01 00 00 05 00 01 00 FF 07 00 1F 00 00
362B:0130 00 01 00 71 00 09 00 08 00 14 00 00 00 00 00 00
362B:0140 00 00 00 00 00 00 00 04 00 04 00 48 00 00 0B 00
362B:0150 18 00 54 45 53 54 00 00 00 00 00 00 00 00 00 00
362B:0160 00 00 00 00 01 00 00 00 04 00 18 00 19 00 00 FF
362B:0170 FF 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF
362B:0180
362B:05C0
362B:05D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
362B:05E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
362B:05F0 00 00 00 00 71 71 01 00 0F 00 0E 00 FF 00 00 01
362B:0600 00 27 45 58 41 4D 50 4C 45 00 0D 00 07 00 FF 00
362B:0610 00 02 00 64 00
362B:0620 10 00 1B 00 FF 00 00 04 00 00
362B:0630 00 00 00 00 E0 55 40 0C 00 01 00 80 FE BF 01 00
362B:0640 80 FF BF 0A 03
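Reading records by their headers, as described above, can be sketched as follows. The layouts used here are assumptions that this section does not spell out, although they are consistent with the bytes in the dump: a header of two little-endian 16-bit words (record code, then body length), and a cell body that begins with a format byte, a column word and a row word. The sample bytes are the label, integer and formula records copied from the dump; the number record is omitted because its bytes are not shown. The names iter_records and decode_cell are hypothetical.

```python
import struct


def iter_records(data):
    """Yield (code, body) for each record in a run of consecutive records."""
    pos = 0
    while pos + 4 <= len(data):
        code, length = struct.unpack_from("<HH", data, pos)   # assumed header layout
        body = data[pos + 4:pos + 4 + length]
        if len(body) < length:
            raise ValueError("truncated record")
        yield code, body
        pos += 4 + length   # skip to the next header


def decode_cell(code, body):
    """Decode the column, row and value of a label, integer or formula record."""
    _fmt, col, row = struct.unpack_from("<BHH", body, 0)     # assumed cell layout
    if code == 15:      # label: NUL-terminated text, including its prefix character
        value = body[5:].split(b"\x00", 1)[0].decode("ascii")
    elif code == 13:    # integer: signed 16-bit value
        (value,) = struct.unpack_from("<h", body, 5)
    elif code == 16:    # formula: 8-byte floating point value precedes the code
        (value,) = struct.unpack_from("<d", body, 5)
    else:
        value = None    # other record types are skipped in this sketch
    return col, row, value


sample = bytes.fromhex(
    "0F 00 0E 00 FF 00 00 01 00 27 45 58 41 4D 50 4C 45 00"
    " 0D 00 07 00 FF 00 00 02 00 64 00"
    " 10 00 1B 00 FF 00 00 04 00"
    " 00 00 00 00 00 E0 55 40"
    " 0C 00 01 00 80 FE BF 01 00 80 FF BF 0A 03"
)
cells = [(code,) + decode_cell(code, body) for code, body in iter_records(sample)]
```

With these assumptions the three records decode to column 0, rows 1, 2 and 4 (cells A2, A3 and A5), holding 'EXAMPLE, 100 and 87.5 respectively.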