DataStructure representing SyntaxDiagrams?

Discussion:

(too old to reply)

A***@gmail.com

2012-10-20 17:10:45 UTC

In the 70's there was a 2 or 3 issue article in the BYTE periodical,
developing a PASCAL compiler in BASIC. Since I had just then learned about
Finite-State-Machines represented by diagrams equivalent to syntax-diagrams
and since I wanted to make the HP [IEEE488-instrument-driver] desktop
calculator into a PASCAL-like language controller for a suite of
IEEE488 instruments, I was able to use the FSM ideas.

It's amazingly simple, and very maintainable.
You just let the source code 'walk through the syntax diagrams'
[which are nested: blok, statement, exprn.. -- I just guessing now]
and at the relevant nodes [like stations on a railwayline] you
do the action: push, pop, access SymblTbl, generate-code ...

I used to 'code' mainly from my paper-written syntax diagrams IIRC.
Even later when I ported this 'PASCAL' to 8 & 16 bit microprocessors,
running 'its own PASCAL' I prefered to use the FSM method.

Since the HP machine had a BASIC-like [ROM based] language, it had only
numbers and characters and 1-dimensional arrays of both.
Consequently, I never learned to use records. I can read them, but
I've never written any and totally lack a feeling for their use.

The FSM was represented by arrays. IIRC, one per each column of
my paper diagram. BTW this is one BIG reason why I oppose
Oberon's abandonment of <data-arrays>.

This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Can someone suggest a suiatble data stucture for the syntax diagrams?
The correspondence between the code and the paper diagrams must be
easy and obvious, for ease of maintenance, because the intended application
is structured editors for 'your next new languages'.
Fill in the table and let it guide you, when writing code,
instead of having to remember a new syntax.

BTW, for google: what is the name [L, LR, ?] of the family of grammars
like PASCAL which need only one-token look-ahead to parse?

== TIA

Richard

2012-10-21 10:27:57 UTC

Permalink

Post by A***@gmail.com
This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Some productions of the grammar are usually very similar. E.g., there
is probably one called Factor for multiplication and division, and one
called Term for addition and subtraction. Both have the same number of
subexpressions; thus, a single record type containing a flag
specifying the actual operation will be sufficient for all binary
operations.

Richard

A***@gmail.com

2012-10-23 09:42:22 UTC

Permalink

Post by Richard

Post by A***@gmail.com
This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Seeking structures which are similar, in order to factorise,
is too much like low-level optimising.

It's not the computer that matters, it's the human.
A 2-dimensional representastion [spreadsheet-like], with perhaps
an extra dimension given by color seems impossible to beat.

With the system that I used, I could immediately see if/what
code was generated at the last N nodes/tokens before the
error token % in the code:
Total := Max(BestOf(A,B,C), Min(B,C)) + % Y ;

Post by Richard
Richard

BTW I'm ad-libbing this from memory.

The reaon why ETH-Oberon & V4 are marvelous, is that they
are visual: allowing you to recognise instead of needing to
remember. Plus the minimalist cognitive load of eg.
each TextFrame is labeled with what-it-can-do.

Richard

2012-10-23 17:35:24 UTC

Permalink

Post by A***@gmail.com
Seeking structures which are similar, in order to factorise,
is too much like low-level optimising.

I think, it is an interesting challenge to find common patterns and to
limit complexity this way.

Post by A***@gmail.com
A 2-dimensional representastion [spreadsheet-like], with perhaps
an extra dimension given by color seems impossible to beat.

Well, you could store this information in a text file and read it into
appropriate data structures at runtime. The Oberon System already
provides a text scanner which makes this easy.

Of course, this is not as elegant as built-in array literals, but such
is the price of simplicity.

Richard

A***@gmail.com

2012-10-23 09:42:38 UTC

Permalink

So you already have the railroad diagrams? For original Pascal, they are
also found in Wirth's _Algorithms and Data Structures_ (1976 edition)
in the appendix.

Well no. I want to apply the same principle to a completely
different application:
<traversing syntax diagrams to parse other languages,
eg. to make a syntax-aware editor, so that it shows you, what's
the next possible valid entry, like a menu shows you,
so that you just need to recognise, instead of remember>.

I don't offhand know of any online on the web except the Pascal-ish
http://www.xpl0.org/syntax.html

Ok, that may be good to have on-file.

Post by A***@gmail.com
The FSM was represented by arrays. IIRC, one per each column of
my paper diagram. BTW this is one BIG reason why I oppose
Oberon's abandonment of <data-arrays>.

I don't understand what this means. AFAICT, Oberon arrays are the same as
Pascal. Sure, for dynamic structured data, you'd be better off with POINTER
TO RECORD, but ARRAY OF CHAR, INTEGER, etc. is still available.
Or maybe you mean Oberon-07 restrictions? Dunno.

IIRC in Turbo-PASCAL [and definitely in my PLO-derivative] I could write:
CharY[ "B","S","A","I","R"];
IntR[1,4,7,3,5];
and by suitably spacing/formatting the source code, it looked like
a 2-dim-array. Which is a massively powerfull tool. Which is why
spreadseets were the killier-app. because you can easily map
cause-and-effect, from eg. a bus/train time-table.
Oberon doesn't allow this.

Post by A***@gmail.com
This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Just try it out and see. If it doesn't work, use something else. There's no=
fixed answer, just use whatever works.

Post by A***@gmail.com
Can someone suggest a suiatble data stucture for the syntax diagrams?

In Wirth's 1976 A+D book, chapter five was the mini PL/0 compiler example. =
Later he ripped that out in lieu of a separate book entirely: _Compiler Con=
struction_
http://www.inf.ethz.ch/personal/wirth/books/CBEAll.pdf

Post by A***@gmail.com
The correspondence between the code and the paper diagrams must be
easy and obvious, for ease of maintenance, because the intended
application
is structured editors for 'your next new languages'.
Fill in the table and let it guide you, when writing code,
instead of having to remember a new syntax.
BTW, for google: what is the name [L, LR, ?] of the family of grammars
like PASCAL which need only one-token look-ahead to parse?

I believe Pascal's grammar is called LL(1), but most other languages use
more complicated ones, e.g. LALR or LR (or even GLR). I suppose Pat Terry's
adaptation of Coco/R is probably the easiest non-YACC/LEX toolset to use.
http://en.wikipedia.org/wiki/LL_parser
http://www.scifac.ru.ac.za/compilers/
http://www.scifac.ru.ac.za/coco/

I've looked at coco previously. It lacks: what I had was MAGIC!
You just look at the paper, and at any node you can see immediately:
= what the valid Next-Tokens are,
= what action is done at that node.

So you don't need to write a top-down-compiler, with the
syntax of the langauge interwoven in the compiler's code.

You write a utility to read the syntax diagrams, where the reader
is completely separate from the particular syntax. The syntax is in the
diagrams. And the code-generation is also separate, and represented
by notation next the the corresponding node.

So, IIRC, I needed a dummy node, so that when exiting from an
assignment statement: X := 2+3
the action at the dummy-node was:
pop [the SymbtblPointer (which was pushed at node "ID=X"]
and use it to show where to store [block & offset] the
value of the expression: 2+3.
==============

A***@gmail.com

2012-10-23 15:07:49 UTC

Permalink

Post by A***@gmail.com
and at the relevant nodes [like stations on a railwayline] you
do the action: push, pop, access SymblTbl, generate-code ...

So you already have the railroad diagrams? For original Pascal, they are
also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in
the appendix.

http://www.freepascal.org/docs-html/ref/refse90.html#x190-20000016.2

Is it SOO difficult to understand me?!

I'm asking people who know about the syntax diagram method of
syntax specification, if they have ideas about suitable data-structures
for representing such <railroad diagrams> ?

And I'm noting that a previous very successfull method which I used
decades ago [so I'd have to re-invent the details now] used several
arrays: representing [IIRC] one-per-column of my diagrams.
Instead of left to right, I drew them top-to-bottom.

And then I'm commenting on the inconvenience that Oberon,
does NOT provide for such data-arrays.

-> ipNews.Send *

Marco van de Voort

2012-10-23 19:24:08 UTC

Permalink

Post by A***@gmail.com

also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in
the appendix.

http://www.freepascal.org/docs-html/ref/refse90.html#x190-20000016.2

Well, let's do a first attempt:

Basically it is EBNF. The devil is in the possible recursion.

If you require that a repititeve term always has a separate definition, one
could probably structure each production as an array of record.

The record would have tokentype, a variant part (to store literal values),
and some flags (to indicate repetition)

Post by A***@gmail.com
And I'm noting that a previous very successfull method which I used
decades ago [so I'd have to re-invent the details now] used several
arrays: representing [IIRC] one-per-column of my diagrams.
Instead of left to right, I drew them top-to-bottom.

I think if you really want a comment on such vague descriptions, better work
out a detailed example.

Post by A***@gmail.com
And then I'm commenting on the inconvenience that Oberon,
does NOT provide for such data-arrays.

Well, I have never seen a requirement of "Oberon", since you crossposted
this to M2 and Pascal.misc. And afaik both support multi dimensional arrays
just fine (but not ragged ones).

However I couldn't make heads or tails of your example, that was most
definitely not Pascal or M2.

r***@gmail.com

2012-10-24 01:11:15 UTC

Permalink

Hi,

Post by Marco van de Voort

Post by A***@gmail.com
I'm asking people who know about the syntax diagram method of
syntax specification, if they have ideas about suitable data-structures
for representing such <railroad diagrams> ?

Check Wirth's example in _Compiler Construction_ for Oberon-0, which is a superset of PL/0 [sic] anyways.

http://en.wikipedia.org/wiki/PL/0

"The language Oberon-0 is much more complex than PL/0. For example. Oberon-0 offers arrays, records, type declarations and procedure parameters."

Post by Marco van de Voort

Post by A***@gmail.com
And then I'm commenting on the inconvenience that Oberon does NOT provide for such data-arrays.

Well, I have never seen a requirement of "Oberon", since you crossposted
this to M2 and Pascal.misc. And afaik both support multi dimensional arrays just fine (but not ragged ones).

He must mean constant initialization of structured types.

Post by Marco van de Voort
CharY[ "B","S","A","I","R"];
IntR[1,4,7,3,5];
and by suitably spacing/formatting the source code,
it looked like a 2-dim-array.

ISO Modula-2 supports it, and others like Extended Pascal, Turbo Pascal, Delphi, Modula-3, Ada too, but definitely not PIM Modula-2 nor classic (ISO 7185) Pascal.

======================
MODULE blah; (* gm2 -fiso *)
IMPORT STextIO, SWholeIO;
TYPE rec = RECORD i: CARDINAL; c: CHAR; s: ARRAY [1..32] OF CHAR END;
arr = ARRAY [1..20] OF INTEGER;
VAR r: rec; a: arr;
PROCEDURE WriteLn; BEGIN STextIO.WriteLn END WriteLn;
BEGIN
a := arr{41B BY 20};
r := rec{5150,'!',"Somebody?"};

STextIO.WriteChar(r.c); WriteLn;
SWholeIO.WriteInt(ORD(a[10]),1); WriteLn;
STextIO.WriteString(r.s); WriteLn
END blah.
======================

Post by Marco van de Voort
However I couldn't make heads or tails of your example,
that was most definitely not Pascal or M2.

It's not totally clear which host or target languages he is using or wanting to use. He seems more slanted towards Pascal despite some obvious experience with Oberon. Nothing wrong with any of it, just lots of minor nits to sort out.

Martin Brown

2012-10-24 07:24:36 UTC

Permalink

Post by r***@gmail.com
Hi,

Post by Marco van de Voort

Check Wirth's example in _Compiler Construction_ for Oberon-0, which is a superset of PL/0 [sic] anyways.
http://en.wikipedia.org/wiki/PL/0
"The language Oberon-0 is much more complex than PL/0. For example. Oberon-0 offers arrays, records, type declarations and procedure parameters."

Post by Marco van de Voort

Post by A***@gmail.com
And then I'm commenting on the inconvenience that Oberon does NOT provide for such data-arrays.

Well, I have never seen a requirement of "Oberon", since you crossposted
this to M2 and Pascal.misc. And afaik both support multi dimensional arrays just fine (but not ragged ones).

He must mean constant initialization of structured types.

Post by Marco van de Voort
CharY[ "B","S","A","I","R"];
IntR[1,4,7,3,5];
and by suitably spacing/formatting the source code,
it looked like a 2-dim-array.

ISO Modula-2 supports it, and others like Extended Pascal, Turbo Pascal, Delphi, Modula-3, Ada too, but definitely not PIM Modula-2 nor classic (ISO 7185) Pascal.

JPI/TopSpeed dialect of Modula2 also allowed constant initialisation of
structured types - slightly more clunky way. But amenable to easy
compilation. ISTR you declared an ARRAY type and then used it to CAST
the list of constants into the required ARRAY OBJECT.

Other M2/PASCAL dialects you usually had to do external assembler based
declaration blocks of static constants and tortuously link them together
during the build - hard to get right the first time.

The omission of this was always a PITA for embedded programming back
then since by default you ended up with a copy in precious ram!

Seems to be more than a bit of thread drift though - it is titled
"Data structures for syntax diagrams" !

Perhaps the OP could clarify their question. I missed it.

--
Regards,
Martin Brown

Marco van de Voort

2012-10-24 07:41:36 UTC

Permalink

Post by r***@gmail.com
TYPE rec = RECORD i: CARDINAL; c: CHAR; s: ARRAY [1..32] OF CHAR END;
arr = ARRAY [1..20] OF INTEGER;
VAR r: rec; a: arr;
PROCEDURE WriteLn; BEGIN STextIO.WriteLn END WriteLn;
BEGIN
a := arr{41B BY 20};
r := rec{5150,'!',"Somebody?"};

Afaik that way it doesn't work for the Borland derived dialects either. One
would have to define a separate constant manually.

(and even then one would have to name the field in the "rec" case)

trijezdci

2012-11-30 10:15:52 UTC

Permalink

Post by A***@gmail.com
Can someone suggest a suiatble data stucture for the syntax diagrams?

The data structure you are looking for is called a tree. For that, the most suitable elementary data types available in Modula-2 are RECORDs and POINTERs. For more details on how to build trees in Modula-2 you might want to see if you can find Webre's book "Data Structures in Modula-2".

One way to go about the transformation is to convert your productions/diagrams into S-expressions, then transform them into tree nodes using records and pointers to records. There are two extremes at either end of a spectrum of approaches. At the one end there would be one tree type per node variant, at the other end there would be a single universal tree node type that covers all node variants. The sweet spot is usually somewhere in between.

For an S-expression like representation of syntax diagrams you may want to take a look at our syntax diagram generator script at:

https://bitbucket.org/trijezdci/m2r10/src/tip/_GRAMMAR/modula2_syntax_diagrams.tcl

The comment section at the beginning describes how to convert from EBNF to S-expressions.

The corresponding EBNF grammar is also online at:

https://bitbucket.org/trijezdci/m2r10/src/tip/_GRAMMAR/Modula2.g

Post by A***@gmail.com
what is the name [L, LR, ?] of the family of grammars
like PASCAL which need only one-token look-ahead to parse?

The class of grammars you are looking for is called LL(1).

Note that it is not strictly a language property since even languages which have not been designed with LL(1) in mind can often be described by an LL(1) grammar and languages designed for LL(1) can be described by a non-LL(1) grammar. More often than not, non-LL(1) grammars can be refactored into LL(1) grammars. Although this can lead to the grammar becoming less human readable, the fact remains that LL(1) is a property of grammars, not a property of languages.

hope this helps,

let us know when your diagram generator is ready to try ;-)

r***@gmail.com

2012-12-03 21:44:47 UTC

Permalink

Hi,

Post by trijezdci

Post by A***@gmail.com
Can someone suggest a suiatble data stucture for the syntax diagrams?

The data structure you are looking for is called a tree.
https://bitbucket.org/trijezdci/m2r10/src/tip/_GRAMMAR/Modula2.g

I think line 138 has a typo:

FORWARD = 'FOWARD'; /* pragma only */

Anyways, I'm now thinking that Wirth's Pascal-S subset interpreter would be a good place for the OP (and others, e.g. me) to begin to study, esp. the article (PDF). There are also syntax diagram .GIFs available.

http://www.moorecad.com/standardpascal/pascals.html

trijezdci

2012-12-03 22:38:10 UTC

Permalink

Post by r***@gmail.com
FORWARD = 'FOWARD'; /* pragma only */

thanks, fixed now in:

https://bitbucket.org/trijezdci/m2r10/commits/c90afee61303d01f0f31dad28f63891f38e3f940

Bernhard Treutwein

2013-06-08 14:01:02 UTC

Permalink

Post by A***@gmail.com
Can someone suggest a suiatble data stucture for the syntax diagrams?
The correspondence between the code and the paper diagrams must be
easy and obvious, for ease of maintenance, because the intended application
is structured editors for 'your next new languages'.
Fill in the table and let it guide you, when writing code,
instead of having to remember a new syntax.

yet another month later, I was able to track down my memories ...
there is an EBNF visualizer, which creates syntax diagrams from
EBNF code at http://dotnet.jku.at/applications/visualizer/. It is
implemnted in C# and has been generated with the help of Coco
(the Compiler Compiler Toolkit) (see
http://www.scifac.ru.ac.za/cspt/modula2.htm#Compilertools) and/or
http://www.ssw.uni-linz.ac.at/coco/).

Bernhard

Richard

2013-06-08 20:16:26 UTC

Permalink

Post by Bernhard Treutwein
there is an EBNF visualizer, which creates syntax diagrams from
EBNF code at http://dotnet.jku.at/applications/visualizer/. It is
implemnted in C# and has been generated with the help of Coco
(the Compiler Compiler Toolkit) (see
http://www.scifac.ru.ac.za/cspt/modula2.htm#Compilertools) and/or
http://www.ssw.uni-linz.ac.at/coco/).

A similar tool, which also works in other operating systems than
Windows, is the Java-based ANTLRWorks:

http://www.antlr3.org/works

It contains an editor for ANTLR grammars (slightly different syntax
than EBNF: e.g. (...)? instead of [...]) and can show syntax diagrams
immediately during editing.

Richard