trijezdci
2015-08-17 11:49:37 UTC
This may be of some interest to readers of this group.
Years ago I contributed Modula-2 plug-ins and language description files for a variety of source code rendering frameworks and other syntax highlighting tools. To me the most important of these is the Pygments framework, because Bitbucket uses it and I have most of my Modula-2 sources on Bitbucket.
More recently, it became apparent that it would be useful to render source code depending on the dialect in which the sources are written rather than settling for lowest common denominator capability. To this end I replaced my earlier Modula-2 plug-in for Pygments with a new one that has multi-dialect support. The multi-dialect plug-in is now part of the official Pygments distribution. It can also be found at:
https://bitbucket.org/trijezdci/m2r10/src/tip/_GRAMMAR/pygments.lexers.modula2.py
Detailed documentation can be found within the source of the file above.
Although it is in principle possible to determine the dialect automatically, doing it right would require a full syntax analysis of the input source, and most of these frameworks are not designed for that. Also, most of these frameworks are written in inefficient languages such as Python and Ruby, where a full syntax analysis prior to rendering would add considerable server-side load. That may cause some large sites like Bitbucket to choke, and the site operators might then remove plug-ins they consider wasteful.
I thus decided on a design where the dialect is selected by a special comment tag within the source, ideally placed at the top, alongside the copyright notice. I invited the maintainers of Modula-2 compilers, as far as their contact details are known to me, to a discussion to agree on what the special comment tags should look like. The comment tags the plug-in recognises are an outcome of that discussion.
A dialect tag is a special comment, defined by the following EBNF:
dialectTag :
    OpeningCommentDelim Prefix dialectOption ClosingCommentDelim ;
dialectOption :
    baseDialect ( '+' languageExtension )? ;
baseDialect : 'm2pim' | 'm2iso' | 'm2r10' | 'objm2' ;
languageExtension :
    'gm2' | 'mocka' | 'aglet' | 'gpm' | 'p1' | 'sbu' | 'xds' ;
Prefix : '!' ;
OpeningCommentDelim : '(*' ;
ClosingCommentDelim : '*)' ;
No whitespace is permitted between the tokens of a dialect tag.
The following is an example of such a dialect tag at the beginning of a Modula-2 source file:
(*!m2pim+gm2*) DEFINITION MODULE FooLib;
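For readers who want to experiment with the tag format outside of Pygments, here is a minimal Python sketch of a recogniser for the grammar above. It is not the code used in the plug-in. The names DIALECT_TAG and find_dialect_tag are made up for this illustration, and the choice to scan the whole source for the first tag is an assumption of the sketch.

```python
import re

# Terminal symbols copied from the EBNF above.
BASE_DIALECTS = ("m2pim", "m2iso", "m2r10", "objm2")
LANGUAGE_EXTENSIONS = ("gm2", "mocka", "aglet", "gpm", "p1", "sbu", "xds")

# '(*' '!' baseDialect ( '+' languageExtension )? '*)'
# The pattern contains no whitespace classes, so a tag with blanks
# between its tokens does not match.
DIALECT_TAG = re.compile(
    r"\(\*!"
    r"(?P<base>" + "|".join(BASE_DIALECTS) + r")"
    r"(?:\+(?P<ext>" + "|".join(LANGUAGE_EXTENSIONS) + r"))?"
    r"\*\)"
)


def find_dialect_tag(source):
    """Return (base_dialect, extension_or_None) for the first dialect tag
    found in source, or None if there is no well-formed tag."""
    match = DIALECT_TAG.search(source)
    if match is None:
        return None
    return match.group("base"), match.group("ext")


if __name__ == "__main__":
    print(find_dialect_tag("(*!m2pim+gm2*) DEFINITION MODULE FooLib;"))
```

A comment such as (* !m2pim *) is not accepted by this sketch because of the embedded blanks, which matches the no-whitespace rule stated above.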
A distinct benefit of dialect selection by embedded special comment tag is that the tag states the intent of the author. Syntax used within the source file that is not supported by the specified dialect can then be rendered in red to indicate errors. The plug-in supports this via default style sheets.
Another feature of the plug-in is a special rendering mode called Algol Publication mode. In this mode, source text is rendered for publication in scientific papers and academic texts following the format of the Revised Algol-60 Language Report, which set a de facto standard for rendering algorithms in scientific publications. Many scientific texts use it to this day.
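The idea behind the error rendering can be illustrated with a small Python sketch. This is not how the plug-in is implemented. The word sets are deliberately tiny, illustrative subsets rather than complete reserved word lists, and the names RESERVED_BY_DIALECT and classify_words are hypothetical.

```python
import re

# Illustrative subsets only, NOT the complete reserved word lists.
COMMON_WORDS = {"MODULE", "BEGIN", "END", "IMPORT", "PROCEDURE"}
ISO_ONLY_WORDS = {"EXCEPT", "FINALLY", "RETRY"}  # reserved in ISO, not in PIM

RESERVED_BY_DIALECT = {
    "m2pim": COMMON_WORDS,
    "m2iso": COMMON_WORDS | ISO_ONLY_WORDS,
}


def classify_words(source, dialect):
    """Label each word as 'keyword', 'error' or 'name'.

    A word that is reserved in some other known dialect, but not in the
    selected one, is labelled 'error' so that a style sheet could render
    it in red.
    """
    reserved = RESERVED_BY_DIALECT[dialect]
    reserved_anywhere = set().union(*RESERVED_BY_DIALECT.values())
    labelled = []
    for word in re.findall(r"[A-Za-z][A-Za-z0-9]*", source):
        if word in reserved:
            labelled.append((word, "keyword"))
        elif word in reserved_anywhere:
            labelled.append((word, "error"))
        else:
            labelled.append((word, "name"))
    return labelled


if __name__ == "__main__":
    print(classify_words("BEGIN x FINALLY END", "m2pim"))
```

With the dialect set to m2pim the word FINALLY is labelled as an error in this sketch, while with m2iso it is labelled as a keyword.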
When rendering Modula-2 source text in Algol Publication mode, reserved words are rendered lowercase boldface (optionally underlined) and built-in identifiers are rendered lowercase boldface italic. In other words, the capitalisation of reserved words and built-in identifiers is then considered to be a form of stropping (see the Algol report for a definition of stropping).
Algol Publication mode is activated by a command-line switch when invoking a local Pygments installation. It cannot be activated by special comment tags.
Some example PDFs of rendered output can be downloaded below:
https://bitbucket.org/trijezdci/m2r10/downloads/M2LexerTestReport.ISOplusGM2.pdf
https://bitbucket.org/trijezdci/m2r10/downloads/M2LexerTestReport.PIMplusGM2.pdf
https://bitbucket.org/trijezdci/m2r10/downloads/M2LexerTestReport.M2R10.pdf
The Pygments rendering framework can be downloaded from
http://pygments.org
We have also looked into supporting GitHub, but they have recently started a migration to their own rendering framework. While in transition they use different renderers depending on the language, which makes it extremely difficult to provide support. We'll just have to wait until they have completed the migration.
I have also made a multi-dialect plug-in for Algol and one for Wirth's LOLA-2 HDL but these have not been committed yet to the Pygments distribution. The maintainer is only doing these commits sporadically. At a later date, I might also make a plug-in for Oberon but currently I have other priorities.
I hope this is useful to somebody out there.