PP description
==============
By Przemyslaw Czerpak (druzus/at/priv.onet.pl)
Hi All,
I collected this text from notes I made while analyzing the
Clipper PP, and then updated it in a few places. Sorry, but
I do not have enough energy to check and update it. It's much
shorter than I planned and it does not contain many important
things I encoded in the new PP code. Sorry, you will have to look at
the new code, because right now I do not want to think about PP any more.
After the last few days I hate PP and I'd be very happy if I could forget
about it for at least a few days. I spent much more time on it than
I planned and I'm really frustrated with the brain-off job I was
doing in the last days.
-----------------------------------------------------------------------------
1. Clipper's PP is a lexer which divides source code into tokens
and then operates on these tokens and not on text data. This is the
main reason why the current [x]Harbour PP cannot be Clipper compatible.
Tokenization is the fundamental condition which implies a lot of
Clipper PP behavior, and as long as we do not replicate it we
will never be Clipper compatible.
Even such simple code cannot be properly preprocessed and compiled by
the current [x]Harbour PP:
#define a -1
? 1-a
and it cannot be fixed in the current code without breaking some
other things, e.g. match markers which depend on the number of spaces
between tokens. So at the start we have to forget about updating the
current PP. It will never be Clipper compatible, and cannot be, because
it's not a lexer.
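The difference can be sketched in a few lines of Python. This is only an illustration, not the PP code: the token pattern, `tokenize`, `substitute_text` and `substitute_tokens` are hypothetical names, and the pattern knows just identifiers, integers, the `--` operator and single characters.

```python
import re

# Hypothetical, heavily simplified token pattern for this sketch only.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|--|\S")

def tokenize(text):
    return TOKEN_RE.findall(text)

def substitute_text(line, name, body):
    # Text based replacement: the #define body is pasted into the line.
    return re.sub(r"\b%s\b" % re.escape(name), body, line)

def substitute_tokens(line, name, body):
    # Token based replacement: the name token is replaced by the body tokens.
    out = []
    for tok in tokenize(line):
        out.extend(tokenize(body) if tok == name else [tok])
    return out

text_result = substitute_text("? 1-a", "a", "-1")
token_result = substitute_tokens("? 1-a", "a", "-1")
print(text_result)            # ? 1--1
print(tokenize(text_result))  # ['?', '1', '--', '1']
print(token_result)           # ['?', '1', '-', '-', '1']
```

In the text based variant the two '-' characters merge into one `--` operator when the line is read again, while in the token based variant they stay separate tokens.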
2. While dividing input data into tokens, and later when finding match
patterns, Clipper PP always tries to allocate the biggest possible set of
input data as a given type, even if that rules out some other possible way
of interpreting the input data. This can be seen in the wild match marker
<*marker*> behavior, optional clauses in match patterns, operator
tokenization, etc. It greatly simplifies the code, though it introduces
some limitations, e.g.:
#xcommand CMD [FROM] FROM <*x*> => ? #<x>
or:
#xcommand CMD <x,...> , <id> => ? #<x>
or:
#xcommand CMD <*x*> END => ? #<x>
are accepted by Clipper PP but they cannot match any line.
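A minimal Python sketch of why the last rule can never match, assuming a matcher that never gives tokens back. `match_greedy` and the `("lit", ...)` / `("wild", ...)` pattern encoding are hypothetical and only model literal tokens and the wild marker.

```python
def match_greedy(pattern, tokens):
    """Match without backtracking; return captured markers or None."""
    pos = 0
    captured = {}
    for kind, value in pattern:
        if kind == "lit":
            if pos >= len(tokens) or tokens[pos].upper() != value.upper():
                return None
            pos += 1
        elif kind == "wild":
            # Wild marker: take every remaining token and never give one back.
            captured[value] = tokens[pos:]
            pos = len(tokens)
    return captured if pos == len(tokens) else None

rule_end = [("lit", "CMD"), ("wild", "x"), ("lit", "END")]
rule_ok = [("lit", "CMD"), ("wild", "x")]
line = ["CMD", "a", "b", "END"]
print(match_greedy(rule_end, line))  # None
print(match_greedy(rule_ok, line))   # {'x': ['a', 'b', 'END']}
```

The wild marker has already taken END, so the literal END in the pattern has nothing left to match.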
3. The preprocessor should extract all quoted strings and create separate
tokens from them. The contents of string tokens cannot be modified later
by any rules. Quoting by [] creates string tokens when the '[' is not just
after a keyword, macro or one of the closing brackets: ) } ]
We will have to change it to keep already existing extensions working,
like accessing string characters with the [] operator, so I suggest changing
this condition to also not create a string token when '[' follows a constant
value of any type - not only strings. It will be usable for scalar
classes and overloading the [] operator, e.g. someone can create a LOGICAL
class where:
.T.[1] => ".T.", .T.[2] => "TRUE", .T.[3] => "YES"
The opening square bracket '[' has to be closed with ']' in the same line.
Such quoting has very high priority, like normal string quoting, e.g.:
? [ ; // /* ]
should generate:
QOut( " ; // /* " )
This implies one important thing: PP has to read the whole physical
line from the file, then convert it to tokens and, if necessary (';' is the
last token after preprocessing), read the next line(s).
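The basic condition for '[' can be written down as a small Python predicate. It is a sketch only: `bracket_opens_string`, the token type labels and the `extended` flag (standing for the suggested rule for constant values) are hypothetical, and no special cases beyond the rule stated above are modeled.

```python
CLOSING_BRACKETS = (")", "}", "]")
CONSTANT_TYPES = ("string", "number", "logical")

def bracket_opens_string(prev_type, prev_value, extended=False):
    """Decide whether '[' starts a string token.

    prev_type / prev_value describe the token just before '['
    (prev_type is None at the start of a line). The type names are
    labels invented for this sketch.
    """
    if prev_type in ("keyword", "macro"):
        return False                  # a[1]: array index
    if prev_value in CLOSING_BRACKETS:
        return False                  # f()[1], {1,2}[1], a[1][2]
    if extended and prev_type in CONSTANT_TYPES:
        return False                  # suggested extension: "abc"[1], .T.[1]
    return True                       # ? [text]: string token

print(bracket_opens_string("operator", "?"))         # True
print(bracket_opens_string("keyword", "a"))          # False
print(bracket_opens_string("operator", ")"))         # False
print(bracket_opens_string("logical", ".T."))        # True
print(bracket_opens_string("logical", ".T.", True))  # False
```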
There is also one exception to the above. When Clipper PP finds a '['
character and the previous token is a keyword or macro, it always checks
for the closing bracket, and if the scanned text contains an odd number
of other text delimiters (' or ") then it ignores the type of the previous
token and always creates a string. This behavior breaks some valid code. E.g.
Clipper cannot compile code like:
x := a[ f("]") ] $ "test"
or:
x := a[ f( "'" ) ] $ "test"
If it finds the closing ']' without an odd number of other text delimiters
then it creates a different token than for other opening square brackets '[':
open_array_index, which has a different meaning in later preprocessing
and allows the compiler to convert the group of tokens inside to a string.
If something is not recognized by the preprocessor as a string token or
open_array_index then it should never become a string token. It doesn't
matter how it will be preprocessed later, e.g.:
#define O1 [
#define O2 ]
? O1 b O2
should generate:
QOut( [ b ] )
not:
QOut( " b " )
but:
#command A <x> => ? <x>
A [ b ]
also generates:
QOut( [ b ] )
and in this case the Clipper compiler makes the conversion to string.
It means that only at initial line preprocessing does the preprocessor
decide what can or cannot be a string token. I think that we do not have to
replicate this behavior exactly and we should allow string conversion
also when '[' is not marked as open_array_index, in a final preprocessor
pass which will create a string token from the group of tokens inside the '['
and ']' tokens using the initial stringify condition which checks the type
of the preceding token.
In fact with the new PP such an operation will be done by the still existing
lexer after preprocessing and converting the preprocessed tokens to a string,
which is then once again divided into tokens by FLEX or SIMPLEX. It's
redundant, and because neither FLEX nor SIMPLEX is MT safe and both
have limitations like maximum line size, we will not be able to fully
benefit from the new code (read below about it).
4. Tokenization of # directives.
In a #define directive, strings in the result pattern cannot be quoted by [].
They will always be used as an array index or (in #[x]command
and #[x]translate) as an optional expression (when not quoted by '\').
Characters like [] are not allowed in a #define match pattern.
Quoting by [] in a #[x]command or #[x]translate match pattern
produces an optional clause. The left square bracket can be quoted by \ to
disable this special meaning, and in such a case Clipper PP
generates array tokens, but they are not marked as open_array_index
while in the code they are. As a result, in code like
#command A B\[C] => QOut("A B[C]")
A B[C]
A B[C] is not preprocessed, because in the #command match pattern '[' is
not open_array_index and PP cannot find matching tokens.
Anyhow, it's possible to create a working match pattern which uses [].
It's enough to create a match pattern for code which has [] not
translated to a string and not bound to a keyword. As I wrote above,
that is possible when '[' is after one of the closing brackets:
')', '}' or ']', e.g.:
#command A }\[C] => QOut("A }[C]")
A }[C]
will be translated perfectly. To me it seems to be a limitation of the
Clipper PP implementation (probably a side effect of some internal
solutions) or a bug, not something intentionally designed. It's highly
possible that it's a hack to pass some additional information about
preprocessed tokens to the compiler, because Clipper PP seems to also be
the compiler lexer. I do not think that we should try to keep strict
compatibility in PP translation and also introduce the array ID tokens
before preprocessing. Such an operation can be done after preprocessing,
but this is a different subject.
The important conclusion is that # directives should be preprocessed
in a different way than normal lines. In general, # as the first token in
a line disables using [] as string delimiters.
5. Clipper also allows quoting strings using a back apostrophe (`) as the
string begin marker and a normal apostrophe as the string end marker, e.g.:
? `Hello World'
works perfectly.
6. String tokens can be part of a match pattern like any other tokens. They
are not case sensitive during preprocessing, so it's important to detect
and convert data inside [] to a string token early.
7. NIL is preprocessed as a keyword rather than as a constant value. At least
it behaves like a keyword and I cannot find anything that suggests
something different.
8. Numbers are not converted; they are stored like other tokens in literal
form. It's important not to change the number representation or the
compiler will have problems with calculating the declared size and
decimal places. Number tokens have the following form:
[0-9]*[\.[0-9]+]. The token ends at the first character which does not
fit the above expression, and this is the first character of the next
token. This behavior will interact with the Harbour extensions for
hexadecimal numbers 0x[0-9A-Z] and date constants 0d[0-9], and we
will have to generate separate tokens for them. For strict compatibility
we can disable it and create the final tokens after preprocessing, but I do
not think we have to be so strictly compatible.
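The number form above translates directly into a regular expression. A minimal Python sketch, where `NUMBER_RE` and `scan_number` are hypothetical names and the outer [] of the form is read as "optional":

```python
import re

# [0-9]*[\.[0-9]+] : digits, optionally followed by '.' and more digits.
NUMBER_RE = re.compile(r"[0-9]*(?:\.[0-9]+)?")

def scan_number(text, pos=0):
    """Return the literal number text starting at pos ('' if none)."""
    return NUMBER_RE.match(text, pos).group(0)

print(scan_number("12.50+x"))  # 12.50  (kept in literal form)
print(scan_number("1.5.2"))    # 1.5    ('.2' starts the next token)
print(scan_number("1."))       # 1      ('.' is not followed by a digit)
print(scan_number(".5"))       # .5
print(scan_number("0x1F"))     # 0      (hex extension needs its own token)
```

The last call shows the interaction with the hexadecimal extension: with only the Clipper form, 0x1F would be split after the leading 0.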
9. A logical value is a single token; .N. is translated to .F. and
.Y. to .T.
10. Multi-character operators are parsed as a single token. It's important
to keep the list of such operators and parse them properly at the beginning,
or later we will have problems.
11. Clipper PP allows only characters in the ASCII range from 32 to 125
and some control codes with special meaning:
'\n' is the line terminator
'\t' when not inside a quoted string is converted to 4 spaces
'\r' is always stripped, also from quoted strings
'\0' stops line processing, like '\n', but the rest of the line is ignored
^z (Chr(26)) works _exactly_ like '\n'
All other characters are illegal.
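The control code rules can be sketched as a small line splitter in Python. `split_lines` is a hypothetical helper; it only knows ' and " as string delimiters (not [] or the back apostrophe), assumes strings never span lines, and does not reject illegal characters.

```python
def split_lines(raw):
    """Split raw input into logical lines using the control code rules."""
    lines, out = [], []
    quote = None       # closing delimiter while inside a quoted string
    skipping = False   # True after '\0': ignore the rest of the line
    for ch in raw:
        if ch in ("\n", "\x1a"):       # ^Z works exactly like '\n'
            lines.append("".join(out))
            out, quote, skipping = [], None, False
        elif skipping or ch == "\r":   # '\r' is always stripped
            continue
        elif ch == "\0":
            skipping = True
        elif quote:
            if ch == quote:
                quote = None
            out.append(ch)             # tabs inside strings are kept
        elif ch == "\t":
            out.append("    ")         # tab outside a string: 4 spaces
        else:
            if ch in "\"'":
                quote = ch
            out.append(ch)
    if out:
        lines.append("".join(out))
    return lines

print(split_lines('a\tb "c\td"\r\nx\0junk\x1ay'))
# ['a    b "c\td"', 'x', 'y']
```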
12. All characters which are not keywords, strings, numbers or known
operators are used as a kind of pseudo binary operator token. We allow
characters with ASCII codes greater than 125, so I suggest defining
a new token type called TEXT for these characters, so they will not be
pseudo operators and we will still be able to use them.
13. Clipper has a special macro token which marks all input data of the
following form: [&<keyword>[.[<nextidchars>]]]+[&]
It's a single token which has special meaning in preprocessing and
we have to replicate it.
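Read as a regular expression, the macro token form looks roughly like this in Python. The sketch assumes that a keyword is a letter or underscore followed by letters, digits or underscores, and that <nextidchars> are the same identifier characters; `MACRO_RE` and `scan_macro` are hypothetical names.

```python
import re

# [&<keyword>[.[<nextidchars>]]]+[&]
MACRO_RE = re.compile(r"(?:&[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z0-9_]*)?)+&?")

def scan_macro(text, pos=0):
    """Return the macro token starting at pos, or '' if there is none."""
    m = MACRO_RE.match(text, pos)
    return m.group(0) if m else ""

print(scan_macro("&name.x + 1"))  # &name.x
print(scan_macro("&a.&b.c&"))     # &a.&b.c&
print(scan_macro("&1"))           # (empty: '&' here is a plain operator)
```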
14. An expression is a list of keywords, macros and constant values
separated by one or more other tokens. If the other token is
a binary operator which is marked in some internal PP table as needing
a valid expression after it, then PP checks whether the next token is a
keyword, number, string, or an operator marked as left unary followed by a
non-operator token, and if it's not then the expression ends.
AFAIK only the -, --, ++ and & operators are marked as left unary operators
and +, !, @ are not, which can break some expressions.
The above behavior also means that '-' cannot be repeated many times
as a left unary operator (multiple negation), which can break some valid
expressions too. The following tokens are marked as binary operators
which need a valid expression as the next token: +, -, *, /, %, ^
An expression can be grouped in (), {} or [], and in such a case
PP looks for the corresponding closing bracket, but it does not respect
other types of brackets: it does not update counters for other nested
brackets, only for the currently processed pair. As long as the expression
is not part of some other preprocessor rule which changes the number
of different brackets, this seems to be safe, because at this level
all strings should be separate tokens and each valid Clipper expression
closes its brackets correctly. The user should only be careful with using []
for string quoting - see the conversion to strings above. '[' works like
a group operator only if it's not the first token. See below for the
operators which cannot match a regular match marker.
A grouped expression list also ends when the end of line or a ';'
token is reached. The ',' token and the closing brackets ), ], } end the
expression when the bracket counter is 0.
Some operators like :=, +=, -=, /=, *=, %=, ^=, **=, =, ==, and [ if it
is not marked as open_array_index, cannot match a regular match marker as
the first token; it seems that they are marked as needing a left side
expression. Of course the closing tokens ',', ';', ')', '}', ']' also cannot
match a regular match marker as the first token.
Tokens are equal only if after preprocessing they have the same type and
the same value. It means that "!=" is equal to "<>" but not equal to "#".
There is an exception to this rule in the restricted match marker and the
macro token.
15. Match markers.
<idMarker>      regular match marker, matches a non-empty expression; it
                cannot match a single closing parenthesis and some
                operators, see above.
<idMarker,...>  list match marker, matches the maximal number of comma
                separated regular match markers. If the last token in the
                parsed expression is an operator which needs a valid right
                expression and the next token is not such a valid expression,
                then it stops checking for further expressions even if the
                next token is a comma. It accepts empty comma separated
                regular expressions but cannot be empty itself. It also
                cannot match anything starting with a closing bracket or
                some operators, see above; it's the same behavior as in the
                regular match marker.
<idMarker:...>  restricted match marker, checks if the next token(s) is/are
                exactly the same as one of the words in the pattern. Words
                are comma separated expressions. A word can be empty, but
                like both markers above it cannot match anything starting
                with a closing bracket or some chosen operators (see above).
                If the last token in one of the restricted expressions in
                the marker is '&' then it has a special meaning: it will
                match any macro token. But only in such a case. If it's not
                the last token in one of the comma separated expressions
                then it works like any other token.
<*idMarker*>    wild match marker, matches all tokens to the end of the
                input line; the expression is not stopped by the ; token
                or any other one. It's the only marker which will match
                expressions starting with closing brackets and operators
                which need a left side expression.
<(idMarker)>    extended expression match marker, matches any number of
                tokens which do not have leading spaces, until the end of
                the token list, a comma (,) or a semicolon (;). Empty
                expressions are not allowed. It cannot match the closing
                brackets ')' and ']' but can match '}' and some operators.
                It cannot match expressions starting with the '[' token, and
                if the expression starts with the '(' token then it drops
                the rule which checks spaces and matches the same tokens as
                the regular match marker.
When the expression is created from tokens and the match marker is followed
by non-optional token(s), the expression is finished immediately when
the first token following the match marker is found and the parentheses
counter is 0. This additional stop condition does not work for wild
match markers: <*idMarker*>, which can be used only as the last part
of a match pattern or the pattern will never match anything. We can add
some extension here to allow defining a stop condition for wild match
markers in the future. It will not interfere with Clipper compatibility
because rules which have some additional tokens in the match pattern after
a wild marker do not work with Clipper PP at all.
The non-optional token which can stop the expression is also passed as
a stop condition to all nested optional match expressions which
are just before it, and this token is used instead of other stop tokens
which can exist inside the nested optional match pattern. This code
illustrates it. Clipper does not preprocess TR2.
#xtranslate TR1 [<x,...> D] => ! [#<x>] !
#xtranslate TR2 [<x,...> D] C => ! [#<x>] !
#xcommand CMD <*x*> => QOut( #<x> )
proc main()
CMD $ TR1 a + b + c + d c
CMD $ TR2 a + b + c + d c
return
There is also a hidden aspect of match markers, defined by the result
pattern. Each match marker can have one of four possible states:
1. ignore the matched expression - when it's not part of the result pattern.
   We do not need any special case to implement this - it will be
   enough to not define a result holder for such markers.
2. accept only one matched expression and refuse to accept any other
   - when it's used at least once inside a non-optional part of the result
   pattern.
3. accept multiple matched expressions - when it's used only in an optional
   part of the result pattern.
4. accept the first matched expression and ignore the others - this is how
   repeated markers work in a #define directive with a pseudo function.
   Harbour PP does not allow repeating the same match marker in a #define
   pseudo function and generates an error, so such a situation never happens.
   In the new PP we can keep the current behavior or simply not define a
   result holder for repeated markers, just like in point 1 above.
PP tries to allocate as many expressions for each match marker as possible
and finally checks whether point 2 above was broken, and if it was then
it refuses to accept the whole rule even if it was possible to find a valid
match in a different way.
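States 1 to 3 and the final check can be sketched in Python as follows. This is an illustration under simplifying assumptions: the result pattern is reduced to a list of (marker name, inside optional part) pairs, state 4 (#define pseudo functions) is not modeled, and `marker_states` and `rule_accepted` are hypothetical names.

```python
def marker_states(match_markers, result_uses):
    """result_uses: list of (marker_name, inside_optional_part) pairs."""
    states = {}
    for name in match_markers:
        uses = [optional for used, optional in result_uses if used == name]
        if not uses:
            states[name] = "ignore"     # state 1: not in the result pattern
        elif not all(uses):
            states[name] = "single"     # state 2: used in a non-optional part
        else:
            states[name] = "multiple"   # state 3: used only in optional parts
    return states

def rule_accepted(states, match_counts):
    """Final check: a 'single' marker must not have matched more than once."""
    return all(match_counts.get(name, 0) <= 1
               for name, state in states.items() if state == "single")

states = marker_states(["x", "y", "z"],
                       [("x", False), ("x", True), ("y", True)])
print(states)  # {'x': 'single', 'y': 'multiple', 'z': 'ignore'}
print(rule_accepted(states, {"x": 1, "y": 3, "z": 2}))  # True
print(rule_accepted(states, {"x": 2, "y": 1}))          # False
```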
16. Result markers.
<idMarker>      Regular result marker - inserts the matched result as is,
                without any modifications. The first token inherits the
                number of leading spaces from the result pattern.
#<idMarker>     Dumb stringify result marker - converts all matched
                tokens to a single string token, even if they are comma
                separated expressions. It clears the number of leading
                spaces for the first token before creating the string. If
                there are no matching tokens then it creates an empty string
                token. Finally it copies the number of leading spaces from
                the result pattern to the new string token and inserts it.
<"idMarker">    Normal stringify result marker - converts each comma
                separated expression in the matched result into a string
                token using the same rules as dumb stringify, with the
                exception of macro tokens and expressions starting with '&'
                followed by '('. Macro tokens are stringified in a different
                way. If the macro does not have any internal '&' characters
                and has at most one '.' as the last character then a
                non-quoted keyword is generated as the result. Otherwise it
                generates a string with the first '&' character stripped.
                If the expression starts with an '&' token followed by a
                single '(' then the '&' token is stripped and the rest of
                the tokens are copied as is.
<(idMarker)>    Smart stringify result marker - converts each comma separated
                expression in the matched result into a string token using
                the same rules as normal stringify, with the exception of
                expressions which start with a string or '(' token. In such
                a case it does not make any conversion to string and copies
                the expression as is.
<{idMarker}>    Blockify result marker - converts each comma separated
                expression in the matched result into a codeblock token by
                simply adding the "{||" prefix and "}" suffix. The expression
                is not modified at all. Leading spaces in the first '{' token
                are inherited from the result pattern. If the expression
                starts with a '{' token followed by '|' then Clipper PP
                recognizes it as a codeblock and does not add the prefix and
                suffix.
<.idMarker.>    Logify result marker - unlike what the Clipper documentation
                says, it only checks if the match pattern passed the test and
                is not empty, and then inserts the logical token .T.,
                otherwise .F. Leading spaces in the new token are inherited
                from the result pattern.
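The conversions can be sketched in Python on a list of already separated comma expressions. This is a simplified illustration: every expression is a plain source string, leading space counters and the macro token rules are left out, `quote` assumes no embedded double quotes, and all function names are hypothetical.

```python
def quote(text):
    # Assumption for this sketch: text contains no double quote characters.
    return '"' + text + '"'

def dumb_stringify(exprs):
    # #<x>: everything, commas included, becomes one string token.
    return quote(",".join(exprs))

def normal_stringify(exprs):
    # <"x">: each comma separated expression becomes its own string.
    return ",".join(quote(e) for e in exprs)

def smart_stringify(exprs):
    # <(x)>: like normal stringify, but strings and (...) are copied as is.
    return ",".join(e if e[:1] in ('"', "'", "(") else quote(e) for e in exprs)

def blockify(exprs):
    # <{x}>: wrap in {|| }, unless the expression already starts with '{' '|'.
    def one(e):
        return e if "".join(e.split()).startswith("{|") else "{||" + e + "}"
    return ",".join(one(e) for e in exprs)

def logify(exprs):
    # <.x.>: .T. when something was matched, otherwise .F.
    return ".T." if exprs else ".F."

print(dumb_stringify(["a+b", "(c)"]))    # "a+b,(c)"
print(normal_stringify(["a+b", "(c)"]))  # "a+b","(c)"
print(smart_stringify(["a+b", "(c)"]))   # "a+b",(c)
print(blockify(["x>1", "{|| y}"]))       # {||x>1},{|| y}
print(logify([]))                        # .F.
```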
The dumb stringify result marker format is a little bit different from all
the others. It needs a special token '#' before '<'. Clipper PP strips all
'#' tokens which are before the result marker token '<', and if the result
marker was a regular one then it's converted to dumb stringify; otherwise
the marker type is unchanged.
When substitution is done, optional parts are repeated as many times
as the biggest number of accepted multiple matched expressions among the
match markers which are in the processed optional part. After each repetition
the tokens are shifted, but only if the marker accepted more than one value.
This is the only condition. The type or state of the marker is unimportant.
The above shows that there is no correlation between the type of match
marker and the type of result marker. The type of conversion depends only
on the contents of the marked expression(s) and the type of result marker.
Clipper does not support nested optional result patterns. I can add such
support but I do not know if it's necessary. To keep the base rules used
by Clipper PP, the external optional pattern should be repeated as many
times as the maximum number of repetitions in one of its nested optional
patterns. It can be useful in some rare cases for someone who knows
what will happen, but IMHO in most cases it will create problems, so
refusing such expressions is probably the best choice.
In optional clauses you can observe one Clipper bug I do not want to
replicate. When Clipper PP finds '[' then it takes all other tokens
until the first unquoted ']'. If it finds it then it preprocesses the tokens
inside as a new result pattern but sets a flag that other nested clauses are
forbidden. But when it extracts the tokens for the new optional result
pattern it strips quote characters, so when the optional pattern is
preprocessed all '[' tokens, even those properly quoted in the source code,
will cause a C2073 error. Clipper also does not respect the context of
preprocessed tokens when it looks for an optional pattern, so it will break
restricted match markers which contain the ']' token. For me it's nothing
more than a poor implementation which should be fixed.
Some deeper tests also show other bugs in Clipper PP when the matched tokens
end with ','.
In such a case the blockify result marker does not create an empty codeblock
for the last token, while for all empty expressions before it, it does.
The same happens with normal and smart stringify result markers, but here
there is yet another problem when there are more commas at the end. The last
one is converted to a string token with a comma inside: "," ;-)
I do not think we should replicate such behaviors, though it seems to
be quite easy, because they look like simple bugs which can appear in
the most trivial implementation of some conditions.
In general I think that many Clipper PP behaviors, even the documented
ones, were not intentionally designed. Simply, someone in the past
created the preprocessor and then the same person, or probably someone else,
documented - more or less precisely - some side effects and even bugs
of this implementation as expected behavior.
17. Storing real expression strings for later stringify operations in PP
output and stringify result patterns.
* Tabs are replaced by 4 spaces.
* Only one leading space is left in lines concatenated with ;
* Each token should have a counter with its number of leading spaces.
* When a result pattern is created, all repeated spaces are replaced by
  a single one.
  In #define pseudo functions there is a small difference to the above.
  In the result pattern the number of spaces between parameter(s) and the
  token before is significant and stored with the pattern definition. The
  maximum number of spaces between keywords is not 1 but 2.
* During result marker substitution the original number of leading
  spaces in the match marker token should overwrite the number of leading
  spaces in the first substituted token.
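A Python sketch of tokens that carry a leading space counter. It is only an illustration: the token pattern is reduced to words and single characters, tabs are replaced everywhere (strings are not treated specially), the #define pseudo function case is not modeled, and all names are hypothetical.

```python
import re

# Each match is (leading spaces, token text); the token pattern is simplified.
TOKEN_RE = re.compile(r"( *)(\w+|\S)")

def tokenize_with_spaces(line):
    line = line.replace("\t", "    ")           # tabs become 4 spaces
    return [(len(sp), tok) for sp, tok in TOKEN_RE.findall(line)]

def render(tokens):
    return "".join(" " * n + tok for n, tok in tokens)

def normalize_result_pattern(tokens):
    # Repeated spaces in a result pattern collapse to a single one.
    return [(min(n, 1), tok) for n, tok in tokens]

def insert_matched(marker_spaces, matched):
    # The first substituted token takes the space count of the marker token.
    if not matched:
        return []
    return [(marker_spaces, matched[0][1])] + list(matched[1:])

tokens = tokenize_with_spaces("x\t:=  1")
print(tokens)                                    # [(0, 'x'), (4, ':'), (0, '='), (2, '1')]
print(render(normalize_result_pattern(tokens)))  # x := 1
print(insert_matched(1, [(3, "a"), (2, "+"), (0, "b")]))
# [(1, 'a'), (2, '+'), (0, 'b')]
```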
18. TEXT [TO [PRINTER | FILE <(fileName)>]] / ENDTEXT
It enables a special stream output mode in Clipper PP. It works in a
different way than our implementation. Clipper PP preprocesses whole lines.
When it finds the:
TEXT <keyword>,<keyword>
command, it sets a special mode for the next lines, so they will not be
divided into tokens in the standard way; instead whole lines will
be converted to string tokens until the special marker (ENDTEXT at the
beginning of a line) is found. But if the line with the TEXT token has some
other commands after ';' then they are preprocessed in the normal way. The
new mode will affect _ONLY_ the next lines which will be read from the file,
not the currently preprocessed one. So we are not Clipper compatible here
and I will change it. The above means that Clipper PP already supports the
starting function Ryszard implemented in #pragma __text. It's simply
enough to add it to TEXT <keyword>,<keyword> after the ';' token.
19. Optional match patterns can be nested, and each nested submatch
pattern is a fully functional match pattern which only operates on the
same markers as the parent pattern. If an optional match pattern is followed
by other ones then they can match expressions which are any
combination of these patterns that passes aggressive allocation
(see point 2 above), with one exception. Clipper PP tries to detect
optional match patterns which contain only match markers and always
gives them the lowest priority, and if it detects more than one
such pattern in a series of non-separated optional patterns it generates
an error.
Optional match patterns are one of the weakest points of the current PP.
Even such simple code:
#xcommand CMD <x> [IN [GET] [PUT]] => ? #<x>
CMD something IN PUT GET
is not preprocessed properly.
20. A rule has to begin with a non-empty token or the rule will never be
used. Generate a warning for such rules? Or maybe add support for such rules
to implement some language extensions, e.g. clasfunc{p1,p2,p3}
21. Translation algorithm used by Clipper PP
   Initiate token list
   Do
      get line stripping comments and dividing line to tokens
   While last token in list is ;
   Do While not empty token list
      Do
         If the first token is # then
            parse # directive and remove all line tokens
            break
         EndIf
         Do
            Do
               For each keyword token check if it match:
                  #define
                     If token(s) can be substituted then substitute
               Next
            While anything substituted
            Do
               For each token check if it match:
                  #[x]translate
                     If token(s) can be substituted then substitute
               Next
            While anything substituted
            If anything substituted
               continue
            Do While 1st token match some #[x]command pattern
               substitute
            EndDo
         While anything substituted
         Output processed token until the last one or ; token
         If 1st token is '#'
            continue
         Remove all tokens in the list until the last one or ; token
         break
      While True
   EndDo
   Output EOL
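The substitution order of the inner loop above can be sketched in Python. The sketch is heavily simplified and rests on assumptions: each rule maps one token to a list of tokens (no markers, no patterns), # directives and ';' handling are left out, "If anything substituted" is read as referring to the #[x]translate pass, and the `limit` guard against cyclic rules is an addition of this sketch. `apply_once` and `preprocess` are hypothetical names.

```python
def apply_once(tokens, rules, first_only=False):
    """Replace the first token that has a rule; return True if replaced."""
    last = 1 if first_only else len(tokens)
    for i in range(min(last, len(tokens))):
        if tokens[i] in rules:
            tokens[i:i + 1] = rules[tokens[i]]
            return True
    return False

def preprocess(tokens, defines, translates, commands, limit=1000):
    tokens = list(tokens)
    steps = 0

    def run(rules, first_only=False):
        nonlocal steps
        changed = False
        while apply_once(tokens, rules, first_only):
            changed = True
            steps += 1
            if steps > limit:          # guard added for this sketch only
                raise RuntimeError("cyclic rules")
        return changed

    while True:
        run(defines)                   # #define until nothing changes
        if run(translates):            # #[x]translate until nothing changes
            continue                   # something changed: back to #define
        if not run(commands, first_only=True):
            return tokens              # #[x]command only on the 1st token

defines = {"N": ["10"], "R": ["from_define"]}
translates = {"SHOW": ["?"], "R": ["from_translate"]}
commands = {"CMD": ["SHOW", "N"], "R": ["from_command"]}
print(preprocess(["CMD"], defines, translates, commands))  # ['?', '10']
print(preprocess(["R"], defines, translates, commands))    # ['from_define']
```

In this model a #define always gets the first chance, a #[x]translate is tried only when no #define applies, and a #[x]command only when the line is stable with respect to both.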
The above algorithm is different from the one used by [x]Harbour and this is
another reason why we are not Clipper compatible in substitution precedence.
This code illustrates the problem:
#define RULE( p ) ? "define value", p
#translate RULE(<p>) => ? "translate value", <p>
#command RULE(<p>) => ? "command value", <p>
#define DEF( p ) RULE( p )
#translate TRS(<p>) => RULE(<p>)
#command CMD(<p>) => RULE(<p>)
proc main()
DEF("def")
TRS("trs")
CMD("cmd")
return
Compile it with Clipper and [x]Harbour and compare the results.
The next important thing is that Clipper preprocesses the whole body of an
indirect # directive. It means that in Clipper it is not possible to execute
an indirect #undef DEFNAME, because if DEFNAME is already defined then it
will be preprocessed and as a result we will have #undef <DEFNAME_value>
before PP executes this # directive. We can replicate this behavior but
personally I do not like it. For me it's a limitation, not a feature, and I
do not want to replicate it. So I would like to define an additional stop
condition for line token preprocessing: ';' followed by '#'.
I do not want to make every ';' a stop condition like in the current
[x]Harbour PP, because the same stop condition has to be used in the wild
match marker. In Clipper it matches the text to the end of line. In the new
PP it will match the text to the end of line or the next # directive. I think
it will give a reasonable compatibility level and the body of an indirect #
directive will not be preprocessed. Please note that the programmer will
still be able to force preprocessing of an indirect # directive body using
additional preprocessor rule(s) and even control the preprocessing level,
e.g.:
#define PREPROCESS_DIRECTIVE DO_DIRECTIVE
#xcommand HASH_DIRECTIVE [<*x*>] => PREPROCESS_DIRECTIVE <x>
#xcommand DO_DIRECTIVE [<*x*>] => \# <x>
#define NEWCMD MYCMD
#xcommand CREATEDIRECTIVE => HASH_DIRECTIVE xcommand NEWCMD \<x> => ;;
QOut( "INDIRECT # DIRECTIVE", #\<x> )
CREATEDIRECTIVE
MYCMD Hello
The second problem is the stop condition in a # directive body. When PP
finds # as the first token then it always removes all tokens to the end of
line and takes them as part of the # directive, or ignores them. It does not
respect the ';' token as a command separator. This also causes unpleasant
side effects, e.g. it's not possible to insert an indirect # directive
without breaking the commands after it, because they will always be used by
PP as part of the inserted # directive. Here I strongly prefer to define the
following behavior: direct #define, #[x]translate and #[x]command always
accept tokens to the end of line, ignoring ';'. Just like in Clipper. All
other # directives will respect ';' as the end of the # directive - this
means that ';' cannot be used in #error and #stdout. If that's a problem
then I can add support for quoting ';' by '\' for these commands, and either
keep Clipper compatibility by default for unquoted ';' and end the directive
on quoted ones, or by default use unquoted ';' as the end of command and
display the others. The current Harbour PP always stops #error and #stdout
on ';', which is not Clipper compatible, and so far I haven't seen people
report it as a bug, so it's probably not a big problem.
Indirect #define, #[x]translate and #[x]command will also respect ';'
as the end of command. If the user needs to use multiple commands in the
result pattern of an indirect # directive then it will be enough to define
';' as some other preprocessor rule, e.g.:
#define EOC ;;
#xcommand CREATECMD => #xcommand NEWCMD => QOut("1") EOC QOut("2")
CREATECMD
NEWCMD
This will give the programmer full control over preprocessed data, whereas
in Clipper the indirect # directive seems to be a hack added later and
can be used only in a very limited way, e.g. in Class(y), as a workaround
for the Clipper PP behavior, #include is used to execute # directives
directly from included files.
22. #define exception. I do not understand why Clipper has it.
During substitution, if the substituted token is:
'#' 'define' <otherTokens,...>
then it replaces all tokens from the current position to the end of line.
It's quite possible that it's a workaround for some side effects of the
indirect directives in Clipper PP described above. Anyhow, it's not
a complex solution, so I will not replicate it.
23. Conditional compilation.
a. The #if[n]def directive pushes the current conditional compilation flag
on the conditional statement stack and creates a new one. If the flag
already disabled preprocessing then the new flag has a condition which
cannot be changed by #else.
b. The #endif directive pops this value. If the conditional statement stack
was empty then an error is generated: "Error C2069 #endif does not match #ifdef".
c. The #else directive inverts the current conditional statement flag if its
status allows modification by #else.
If the conditional statement stack was empty then an error is generated:
"Error C2070 #else does not match #ifdef"
If the conditional compilation flag is set then Clipper PP ignores all
parsed tokens except the #if, #endif and #else directives.
The conditional compilation flag and stack are global for all included
files, so one file can set a new condition and another can change or pop it.
In Clipper #if[n]def <define> has to be a separate statement in a line;
additional tokens or line concatenators (;) are not allowed. #endif
and #else have to be the first token in the line; all tokens after them are
ignored to the end of line, and the command separator ';' is also ignored.
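Points a, b and c can be sketched as a small Python class. This is an illustration under stated assumptions: `CondState`, its method names and the `locked` field are hypothetical, the error texts are shortened, and only the stack logic is modeled (no parsing of the directives themselves).

```python
class CondState:
    def __init__(self):
        self.stack = []       # saved (skip, locked) pairs
        self.skip = False     # current flag: True means tokens are ignored
        self.locked = False   # True means #else must not change the flag

    def ifdef(self, name, defined, negate=False):
        """#ifdef name (negate=False) or #ifndef name (negate=True)."""
        self.stack.append((self.skip, self.locked))
        if self.skip:
            self.locked = True            # outer block is already disabled
        else:
            self.locked = False
            self.skip = (name in defined) == negate

    def else_(self):
        if not self.stack:
            raise ValueError("C2070: #else without #ifdef")
        if not self.locked:
            self.skip = not self.skip

    def endif(self):
        if not self.stack:
            raise ValueError("C2069: #endif without #ifdef")
        self.skip, self.locked = self.stack.pop()

s = CondState()
s.ifdef("A", {"B"})     # A is not defined: skipping starts
s.ifdef("B", {"B"})     # nested inside a skipped block: locked
s.else_()               # has no effect, the flag is locked
print(s.skip)           # True
s.endif()
s.else_()               # outer #else: skipping ends
print(s.skip)           # False
s.endif()
```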
24. NOTE is not an instruction keyword but works like a whole line comment
and has to be stripped at the beginning. It has higher priority than /* */
It does not have its special meaning when it is used after ;
25. Suggested extensions:
- Higher priority for stripping multi line comments /* */ than in Clipper,
where line concatenation (;) is interpreted before /* */. Just like now.
IMHO we should not replicate the exact Clipper behavior here.
a. Add a #pragma operator directive to define new multi-character operators.
Such a feature will allow removing from FLEX/SIMPLEX the hack for the ::
translation to Self and safely defining other operators which will be
used as a single token. Now it's impossible, so things like @: match
@ <any_number_of_blank_characters> :
b. Token concatenation with a new PP operator/marker, or automatic in some
chosen cases, e.g. no spaces between two tokens and both tokens are
valid keywords.
c. Something to stop the result pattern definition in # directives and begin
a new command. ; does not interrupt it but is included in the result pattern.
The result pattern works like a wild match marker: <*resultPattern*>
It should also work for #define. Maybe it should be a global token to
stop all wild markers. We can add a special status for the ; token. Such
a token will also have to break the loops with #define and #[x]translate
processing, or we will never be able to make #undef SOME_DEFINE into an
indirect PP rule used when SOME_DEFINE exists - SOME_DEFINE will simply
be preprocessed earlier to the defined value.
Such a special status can be added automatically when the ; token is
followed by # or when ; is quoted by \
d. already existing xHarbour extensions:
#[x]untranslate, #[x]uncommand
but modified to locate match pattern which can cover exactly the same
data.
#if
but working with integers, to allow using 64bit ones, which are broken
due to conversion to double. The semantics of expressions will be
similar to the C ones, with the exception of the ! (not) operator precedence.
I do not think that Clipper/xBase users are familiar with the exact
not operator precedence in C, which is different from the one in the xBase
world.
e. modified version of Harbour's
#pragma {__text, __stream, __cstream, __endtext}
f. other, see in the code.
-----------------------------------------------------------------------------
Other things you can see in the new PP code. I was adding comments or
using the HB_CLP_STRICT macro to mark the most important things. In a few
places I had to break Clipper compatibility to keep FLEX working. I simply
cannot generate the preprocessed line in exactly the same form as Clipper
does because FLEX or SIMPLEX will not be able to decode it.
Update:
The new Harbour lexer is a simple translator between PP tokens and Harbour
grammar terminal symbols, so it's not necessary to convert preprocessed
code to strings to pass them to FLEX or SIMPLEX. It works faster and it's
fully compatible with Clipper behavior, which fixes the above problem.
best regards,
Przemek
2006-11-08
-----------------------------------------------------------------------------