Discussion:
M2C identifier translation -- What naming conventions do you generally use?
trijezdci
2016-03-14 07:12:31 UTC
With work on M2C having moved on to the code generator, the question of how to translate identifiers has come up. Since the intermediate C code should be human readable, it should also look familiar to those who code in C but not in Pascal or M2. But it is also desirable to accommodate M2/Pascal practitioners.

In other words, it is desirable that M2C can generate both C and M2 style in its intermediate C output, and the style should be user selectable by compiler switch. For C style, translations are all-lowercase with lowline "_" separators (foo_bar_baz); for M2 style, they are title-case for module and type identifiers and camel-case for other user defined identifiers. In either style, language defined all-uppercase names are translated case by case, as there may be 1:1 C equivalents (INTEGER => integer).

The current name translation library covers all scenarios documented in

https://bitbucket.org/trijezdci/m2c-rework/downloads/m2c-ident-conversions.txt

This assumes that the Modula-2 input sources also use one of the two styles.

If somebody used fooType and fooVar (instead of FooType and fooVar), that alone wouldn't cause an issue, but if there were both FooType and fooType or both fooVar and FooVar in the same scope, that would cause name conflicts in the output.

VAR foo : Foo; => foo_t foo; or Foo foo;

but VAR foo, Foo : Bar; => bar_t foo, foo; or Bar foo, foo;
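
The collision can be reproduced with a minimal sketch of the C style translation. The function names to_c_style and c_style_collides are hypothetical and are not taken from the M2C sources; the sketch only assumes that a new word starts at an uppercase letter that follows a lowercase one.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the M2C sources: converts a title-case or
 * camel-case identifier to all-lowercase with lowline separators.
 * Returns false if dst is too small to hold the result. */
bool to_c_style(const char *src, char *dst, size_t size) {
  size_t n = 0;
  if (size == 0) return false;
  for (size_t i = 0; src[i] != '\0'; i++) {
    unsigned char ch = (unsigned char)src[i];
    /* a new word starts at an uppercase letter following a lowercase one */
    if (i > 0 && isupper(ch) && islower((unsigned char)src[i - 1])) {
      if (n + 1 >= size) return false;
      dst[n++] = '_';
    }
    if (n + 1 >= size) return false;
    dst[n++] = (char)tolower(ch);
  }
  dst[n] = '\0';
  return true;
}

/* Two identifiers of the same category collide in the output
 * if their C style translations are identical. */
bool c_style_collides(const char *a, const char *b) {
  char ta[128], tb[128];
  return to_c_style(a, ta, sizeof ta)
      && to_c_style(b, tb, sizeof tb)
      && strcmp(ta, tb) == 0;
}
```

With this sketch, c_style_collides("foo", "Foo") is true, which is the conflict shown in the second VAR declaration.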

We discussed a scheme whereby

* module identifiers not starting with an uppercase letter would be rejected
* type identifiers not starting with an uppercase letter would be considered prefixed with T
* constant, variable and procedure identifiers not starting with a lowercase letter would be considered prefixed with c, v and fn respectively.

This would then bring such identifiers in line with the presumed title-case/camel-case name convention. If the input uses all lowercase with _ separators, it can simply be copied verbatim for C style or converted to M2 style in the output.
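
A rough sketch of that normalisation step, under the assumption that the identifier category is already known when the name is processed. The enum ident_kind_t and the function normalize_ident are made up for illustration and do not exist in M2C.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical identifier categories, for illustration only. */
typedef enum { ID_MODULE, ID_TYPE, ID_CONST, ID_VAR, ID_PROC } ident_kind_t;

/* Applies the discussed scheme: reject module identifiers that do not start
 * with an uppercase letter, otherwise prepend T, c, v or fn when the first
 * letter does not match the title-case/camel-case convention.
 * Returns false if the identifier is rejected or dst is too small. */
bool normalize_ident(ident_kind_t kind, const char *ident,
                     char *dst, size_t size) {
  unsigned char first = (unsigned char)ident[0];
  const char *prefix = "";

  if (first == '\0') return false;

  switch (kind) {
    case ID_MODULE: if (!isupper(first)) return false; break;
    case ID_TYPE:   if (!isupper(first)) prefix = "T";  break;
    case ID_CONST:  if (!islower(first)) prefix = "c";  break;
    case ID_VAR:    if (!islower(first)) prefix = "v";  break;
    case ID_PROC:   if (!islower(first)) prefix = "fn"; break;
  }

  int len = snprintf(dst, size, "%s%s", prefix, ident);
  return len >= 0 && (size_t)len < size;
}
```

For example, a variable FooBar would be treated as vFooBar and a type fooType as TfooType, while a module fooBar would be rejected.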

However, it would be interesting to know what kind of naming styles are used. What are your naming conventions in your M2 source code? Thanks in advance.
tbreeden
2016-03-15 13:34:28 UTC
Post by trijezdci
* module identifiers not starting with an uppercase letter would be rejected
* type identifiers not starting with an uppercase letter would be considered prefixed with T
* constant, variable and procedure identifiers not starting with a lowercase letter would be considered prefixed with c, v and fn respectively.
Seems quite reasonable.
Post by trijezdci
However, it would be interesting to know what kind of naming styles are used. What are your naming conventions in your M2 source code? Thanks in advance.
I tend to go with mixed case names, mostly avoiding underscore chars.

I don't find the encoding of type/var/const properties of the name nearly as essential
as using a name that correctly implies the meaning or purpose of the item within the
context of the program.

That, however, might be a bit difficult to handle with a text process :) .

Even a small hesitation between reading the name and grokking the place of it within
the gestalt can interfere with understanding the code.

Tom
trijezdci
2016-03-15 15:19:48 UTC
Post by tbreeden
Post by trijezdci
* module identifiers not starting with an uppercase letter would be rejected
* type identifiers not starting with an uppercase letter would be considered prefixed with T
* constant, variable and procedure identifiers not starting with a lowercase letter would be considered prefixed with c, v and fn respectively.
Seems quite reasonable.
Thanks for the feedback.

Unfortunately there are some issues with the scheme.

If a variable FooBar is considered to be the same as vFooBar and converted to v_foo_bar in C style output, then what happens if we actually have a variable vFooBar also in the same scope in the M2 source?

We would have to do a conversion from FooBar to vFooBar in the parser when building the AST, so that any identifier vFooBar that might also be there will cause a name conflict that can be reported then. That would however violate the principle of least astonishment. If you have a variable FooBar and another vFooBar, you would not expect the two to be the same.

Another issue is that I have spoken to at least one M2/Oberon practitioner who likes to be able to use lowercase module identifiers. I am not sure how representative this preference is or whether it is an outlier, but since this is not the only issue, we tried to come up with alternative ideas.

One which looks good is to follow any uppercase letter that is not itself followed by a lowercase letter with a lowline in the C style output, so that all such uppercase letters are surrounded by lowlines. This works for all but the first letter, because in C, identifiers starting with a lowline are reserved for internal use by the C compiler itself.

Thus, for the first character we need to prepend something that cannot otherwise occur. If the C output is all uppercase, as is the case with macros, then a lowercase character cannot otherwise occur, so we could prepend a lowercase x, as in eXception, followed by a lowline for visual consistency. Likewise, if the output is all lowercase, an uppercase character cannot otherwise occur, so we can prepend an uppercase X followed by a lowline.

Following this scheme ...

DEFINITION MODULE fooBar;

would then generate an include guard macro

x_FOO_BAR_H

and

VAR FooBar; or
PROCEDURE FooBar;

would generate an identifier

X_foo_bar


These x_ and X_ prefixed identifiers do not match the proper C naming convention, but they are minimally intrusive and seem like an acceptable compromise.

One thing is certain, the implementation of the conversion is fairly simple. We only have to check the first character of an identifier to determine if it should get an x or X prefix. And we only need to check a character's successor's case to determine if a lowline should follow.
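
To illustrate how little is involved, here is a minimal sketch of the lowercase output variant only (the all-uppercase macro variant with the x_ prefix is left out). The function name to_c_ident and its parameter upper_first_expected are assumptions made for this example, not part of M2C; the caller is assumed to know whether the naming convention expects a leading uppercase letter (modules, types) or a leading lowercase letter (everything else).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not from the M2C sources.
 * Translates ident to lowercase C style:
 *  - "X_" is prepended when the first character does not match the convention,
 *  - a lowline precedes an uppercase letter unless one is already in place,
 *  - a lowline follows an uppercase letter unless a lowercase letter follows.
 * Returns false on empty input or if dst is too small. */
bool to_c_ident(const char *ident, bool upper_first_expected,
                char *dst, size_t size) {
  size_t n = 0;
  if (ident[0] == '\0' || size == 0) return false;

  /* only the first character decides whether the X_ prefix is needed */
  if ((isupper((unsigned char)ident[0]) != 0) != upper_first_expected) {
    if (n + 2 >= size) return false;
    dst[n++] = 'X';
    dst[n++] = '_';
  }

  for (size_t i = 0; ident[i] != '\0'; i++) {
    unsigned char ch = (unsigned char)ident[i];
    if (isupper(ch)) {
      if (i > 0 && dst[n - 1] != '_') {
        if (n + 1 >= size) return false;
        dst[n++] = '_';
      }
      if (n + 1 >= size) return false;
      dst[n++] = (char)tolower(ch);
      /* only the successor's case decides whether a lowline follows */
      if (!islower((unsigned char)ident[i + 1])) {
        if (n + 1 >= size) return false;
        dst[n++] = '_';
      }
    } else {
      if (n + 1 >= size) return false;
      dst[n++] = (char)ch;
    }
  }
  dst[n] = '\0';
  return true;
}
```

Called with upper_first_expected set to false, FooBar yields X_foo_bar as above, while fooBar yields plain foo_bar. Any suffix such as _t for types would be appended in a separate step that this sketch does not cover.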
Post by tbreeden
I tend to go with mixed case names,
By mixed case, do you mean title-case?

like FooBarBaz
Post by tbreeden
mostly avoiding underscore chars.
I have never heard anyone say they use lowlines in Modula-2 proper. The only use case is generally interfacing to C APIs with identifiers that have lowlines in them, but in M2 R10 we defined a pragma FFIDENT to handle this and we could adopt that pragma into M2C as it seems practical to do so.

PROCEDURE doFoo (*$FFIDENT="do_foo"*) ( bar : Baz );
Post by tbreeden
I don't find the encoding of type/var/const properties of the name nearly as essential
as using a name that correctly implies the meaning or purpose of the item within the
context of the program.
Actually, I am NOT a fan of Hungarian notation, although I do find the often used C convention to append _t to a type identifier very helpful in the context of C which has little punctuation.

But the prepending of c, v and fn wasn't about encoding the category at all. The sole objective was to bring the identifier in line with the naming convention and if you prepend something, then what do you prepend? You can use a random char, or you could use a char that has some meaning. We tried lowercase l (ell) but this looks visually very odd and can easily be confused for digit 1. The next best thing to use was the initial of the identifier's category. Anyway, that scheme turned out unworkable.
trijezdci
2016-03-15 15:27:36 UTC
Post by trijezdci
One which looks good is to follow any uppercase letter that is not itself followed by a lowercase letter with a lowline in the C style output, so that all such uppercase letters are surrounded by lowlines.
Here are some examples for this rule set:

module identifier ASCII

include guard
=> A_S_C_I_I__H

public identifier prefix
=> a_s_c_i_i__

constant identifier FOOBAR

=> F_O_O_B_A_R_

type identifier FooBAR

=> foo_b_a_r_t

procedure identifier convertToPDF

=> convert_to_p_d_f_
trijezdci
2016-03-15 15:34:33 UTC
Post by trijezdci
constant identifier FOOBAR
=> F_O_O_B_A_R_
Actually, that should be

x_F_O_O_B_A_R_

because constant identifiers should start with a lowercase letter and FOOBAR doesn't, so it gets an x_ prepended.
m***@pascalprogramming.org
2016-03-16 10:50:05 UTC
Post by trijezdci
This would then bring such identifiers in line with the presumed title-case/camel-case name convention. If the input uses all lowercase with _ separators, it can simply be copied verbatim for C style or converted to M2 style in the output.
However, it would be interesting to know what kind of naming styles are used. What are your naming conventions in your M2 source code? Thanks in advance.
Depends on what you want m2c for. For bootstrapping of one dataset, I guess enforcing some input style is not that bad. But in general not accepting the full spec of the language is quite bad.

I guess I would look at whether it is possible to simply have two styles, a less readable but highly compatible one and a nice output with limited input, selectable with a cmdline parameter?
trijezdci
2016-03-16 11:47:08 UTC
Post by m***@pascalprogramming.org
Depends on what you want m2c for. For bootstrapping of one dataset, I guess enforcing some input style is not that bad. But in general not accepting the full spec of the language is quite bad.
The M2C compiler has two objectives:

(1) provide a means to compile and run programming examples from Modula-2 books written before ISO M2, in particular Wirth's books.
(2) provide a modernised subset dialect for new development, in particular building a compiler for M2 R10.

The PIM modes are provided for objective #1; M2C's own dialect mode is provided for objective #2.

I am unaware of any book ever published in which a programming example used a module identifier that started lowercase. I might be wrong, but I believe we can safely assume that this never happened.

But in any event, as per my later comments on the thread, we have come up with a different translation scheme by which it is possible to permit lowercase starting module identifiers.

The idea is to violate the C naming conventions in a MINIMALLY INTRUSIVE way when Modula-2 identifiers are used that do not comply with our naming convention.

A module identifier FooBar would lead to a C include guard FOO_BAR_H and comply with C conventions, whilst a module identifier foobar would lead to a C include guard x_FOOBAR_H which does not comply with C conventions but the transgression is minimal.

Likewise a type identifier FooBar would become foo_bar_t in C but a type identifier fooBar would become X_foo_bar_t, again not following convention but again the transgression is minimal.

However, we will impose restrictions on the use of lowline in Modula-2 identifiers.

When lowline is enabled via compiler switch, a Modula-2 identifier may contain lowlines, BUT ...

(1) it may NOT start with a lowline,
(2) it may NOT end with a lowline,
(3) it may NOT contain consecutive lowlines.

Furthermore, ...

(4) if a Modula-2 identifier is mixed case, it may not contain any lowline.

#1 is necessary because C reserves a leading lowline for internal use by the C compiler.
#2 is necessary because qualified identifier conversion uses two lowlines to separate module and name.
#3 is necessary because the name translation for mixed case identifiers may lead to a trailing lowline.
#4 is necessary because generated C include guards (all-caps) could otherwise become ambiguous.
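
The four restrictions can be checked in a single pass over the identifier. The following is a minimal sketch; the function name lowline_ident_ok is hypothetical, and the sketch assumes the lexer has already verified that the identifier consists only of letters, digits and lowlines.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check, not from the M2C sources. Returns true if ident
 * satisfies restrictions (1) to (4) for identifiers when lowlines are
 * enabled. It does not check that the other characters are legal. */
bool lowline_ident_ok(const char *ident) {
  bool has_upper = false, has_lower = false, has_lowline = false;
  size_t i;

  if (ident[0] == '\0') return false;
  if (ident[0] == '_') return false;              /* (1) no leading lowline */

  for (i = 0; ident[i] != '\0'; i++) {
    unsigned char ch = (unsigned char)ident[i];
    if (ch == '_') {
      if (ident[i + 1] == '_') return false;      /* (3) no consecutive lowlines */
      has_lowline = true;
    } else if (isupper(ch)) {
      has_upper = true;
    } else if (islower(ch)) {
      has_lower = true;
    }
  }

  if (ident[i - 1] == '_') return false;          /* (2) no trailing lowline */

  /* (4) a mixed case identifier may not contain any lowline */
  if (has_upper && has_lower && has_lowline) return false;

  return true;
}
```

Under these rules foo_bar, FOO_BAR and fooBar are accepted, while _foo, foo_, foo__bar and Foo_Bar are rejected.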

Of course the PIM specification does not permit lowlines in identifiers at all, so this won't narrow compliance of the PIM modes with the specification.


Nevertheless, the nicest C output will result from following the recommended M2 naming convention: module and type identifiers in title-case, all other identifiers in camel-case.
trijezdci
2016-03-16 18:03:08 UTC
The identifier conversion specification document has been updated with the modified translation scheme.

https://bitbucket.org/trijezdci/m2c-rework/downloads/m2c-ident-conversions.txt
trijezdci
2016-03-17 06:30:02 UTC
Post by trijezdci
The identifier conversion specification document has been updated with the modified translation scheme.
https://bitbucket.org/trijezdci/m2c-rework/downloads/m2c-ident-conversions.txt
I discovered two more scenarios where a name collision is possible for input identifiers with lowlines when the output mode converts identifiers to C conventions. This, too, could be worked around by manipulating the output in some way, but I think it is worrisome that the translation scheme keeps accumulating exceptional rules.

When this sort of thing is happening repeatedly, it is time to reject features and go back to a simpler scheme. We are not going to sacrifice any lamb to the evil religion of consumerism and its feature gods.

The right thing to do at this point is to make the lowline and readable C output options mutually exclusive. Nevertheless, even when lowlines are enabled, the restrictions of no leading, no trailing and no consecutive lowlines will remain.
trijezdci
2016-03-17 15:39:23 UTC
Version 3 of identifier conversion specification

https://bitbucket.org/trijezdci/m2c-rework/downloads/m2c-ident-conversions.txt
trijezdci
2016-03-21 15:18:15 UTC
Version 3.2 of identifier conversion specification:

https://bitbucket.org/trijezdci/m2c-rework/downloads/m2c-ident-conversion.txt

This changes conversions for

- builtin identifiers to get a BUILTIN__ or builtin__ prefix
- private identifiers to get a PRIVATE__, Private__ or private__ prefix
- local identifiers (except variables) to get a LOCAL__, Local__ or local__ prefix

and thereby makes the output more explicit and intuitive. It also avoids accidental collisions with arbitrary names imported from C libraries.

This brings the possibility of name conflicts with Modula-2 libraries when BUILTIN, Private or Local are used as module identifiers in the Modula-2 source. We have not yet decided whether we will make BUILTIN, Private and Local forbidden identifiers for module names or whether we will name-mangle them when they are used as module identifiers.
trijezdci
2016-03-22 12:10:57 UTC
Post by trijezdci
This brings the possibility of name conflicts with Modula-2 libraries when BUILTIN, Private or Local are used as module identifiers in the Modula-2 source. We have not yet decided whether we will make BUILTIN, Private and Local forbidden identifiers for module names or whether we will name-mangle them when they are used as module identifiers.
Since PIM3 and PIM4 permit the conflict-causing identifiers as module names, we will need to name-mangle them anyway, at least in PIM3 and PIM4 mode.

They will be mangled as follows:

static const char *collision_replacement[] = {
/* BUILTIN => */ "__4255",
/* Builtin => */ "__4275",
/* builtin => */ "__6275",
/* LOCAL => */ "__4C4F",
/* Local => */ "__4C6F",
/* local => */ "__6C6F",
/* PRIVATE => */ "__5052",
/* Private => */ "__5072",
/* private => */ "__7072",
}; /* collision_replacement */

As for M2O mode, we may keep it this way or we may forbid the identifiers as module names or make it user selectable by compiler switch.

In any event, if anyone will use these names as module identifiers, the resulting identifiers in the C output will look like dog poo, which is perfectly intentional to discourage their use. If you want nice output, stick to our recommendations.
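
One observation about the replacement table: each entry appears to be two lowlines followed by the hexadecimal ASCII codes of the first two characters of the name (B = 0x42, U = 0x55, and so on). Whether M2C computes the replacements or simply stores them as literals is not stated. The following sketch, with the hypothetical function mangle_colliding_module and assuming an ASCII character set, reproduces the table entries by computation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The nine module identifiers listed in the replacement table. */
static const char *const colliding_names[] = {
  "BUILTIN", "Builtin", "builtin",
  "LOCAL",   "Local",   "local",
  "PRIVATE", "Private", "private"
};

/* Hypothetical helper, not from the M2C sources. If ident is one of the
 * colliding names, writes "__" plus the hex codes of its first two
 * characters to dst (7 bytes needed) and returns true. Returns false for
 * any other identifier or if dst is too small. Assumes ASCII. */
bool mangle_colliding_module(const char *ident, char *dst, size_t size) {
  size_t count = sizeof colliding_names / sizeof colliding_names[0];
  for (size_t i = 0; i < count; i++) {
    if (strcmp(ident, colliding_names[i]) == 0) {
      int len = snprintf(dst, size, "__%02X%02X",
                         (unsigned)(unsigned char)ident[0],
                         (unsigned)(unsigned char)ident[1]);
      return len >= 0 && (size_t)len < size;
    }
  }
  return false;
}
```

For example, BUILTIN gives __4255 and local gives __6C6F, matching the table, while any other module identifier is left alone.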
Xin Wang
2016-03-23 11:04:00 UTC
When skimming through `_STANDARD_LIBRARY` in the M2R10 repo, I found that the names of procedures are not quite consistent. For example, procedure names in `Scanner.def` start with capital letters. Will they be updated according to the rules described in `m2c-ident-conversion.txt`?
trijezdci
2016-03-23 11:40:24 UTC
Post by Xin Wang
When skimming through `_STANDARD_LIBRARY` in the M2R10 repo, I found that the names of procedures are not quite consistent. For example, procedure names in `Scanner.def` start with capital letters. Will they be updated according to the rules described in `m2c-ident-conversion.txt`?
Some of this code predates a written down name convention. Initially we had a loose understanding that we would probably use the naming convention Rick used/uses in his Modula-2 courses at university and also in his book. That is to say modules, types and functions (except math functions) start with a capital letter, all else with a lowercase letter.

That convention is based on the observation that procedures often resemble verbs while functions often resemble nouns. The general idea is that verbs should be lowercase and nouns capitalised.

However, this rationale does not seem to be all that useful. Variables for example could be either verbs or nouns, and if they are of a function type, then they will be declared as variables but invoked like functions, so should they be lowercase or capitalised?! Furthermore, does it even have any practical value to mark identifiers based on whether they are verbs or nouns?!

As a result, we have lately been discussing the name convention again.

One alternative practical approach might be to capitalise all functions and procedures simply because it appears to look nicer in qualified names, e.g. FooLib.Bar() and FooLib.DoBaz versus FooLib.bar() and FooLib.doBaz.

Another alternative practical approach might be to use lowercase procedures and functions all the way but capitalise bindings. This way an extension module that extends an ADT library would stay clear of risking a name collision with a binding without needing to know which names are used within a binding, since identifiers in the extension module all start lowercase and bindings would be capitalised. This is an issue because in M2 R10 one does not call bound procedures by their names, but they are invoked through the built-in syntax they are bound to. Consequently one doesn't usually know what the bound procedures are called.

At this point, this is an ongoing discussion. Keep in mind that a naming convention is just a convention. The library will of course be made to conform to whatever we settle for.

The document that specifies the translations for M2C is specific to M2C, even if it informs and influences our ongoing discussion.
Xin Wang
2016-03-24 03:26:17 UTC
Post by trijezdci
Another alternative practical approach might be to use lowercase procedures and functions all the way but capitalise bindings. This way an extension module that extends an ADT library would stay clear of risking a name collision with a binding without needing to know what the names are used within a binding since identifiers in the extension module all start lowercase and bindings would be capitalised. This is an issue because in M2 R10 one does not call bound procedures by their names, but they are invoked through the built-in syntax they are bound to. Consequently one doesn't usually know what the bound procedures are called.
The M2R10 report[1] does not say much about `binding`; I found another link[2]. Is that related to `blueprint`?

[1] http://modula-2.info/m2r10/pmwiki.php/Spec/LanguageReport
[2] http://modula-2.info/m2r10/pmwiki.php/Spec/BINDINGS
trijezdci
2016-03-24 04:19:47 UTC
Post by Xin Wang
The M2R10 report[1] does not say much about `binding`
It says "(To do: transfer missing content from PDF version and update)".

Nevertheless, you can find examples of bindings in the library section of the repo. For example, in module BCD.def you will find

PROCEDURE [+] add ( a, b : BCD ) : BCD;

You could call this procedure as usual, using BCD.add(x, y) but the purpose of binding is to use built-in syntax instead, thus x+y. Therefore, you won't become familiar with their names.

An extension module inserts identifiers defined in the extension into the name space of the module it extends, therefore, user defined extension modules might cause name conflicts when accidentally defining a name that matches a bound name. Of course the compiler will reject and report such conflicts, but a name convention as described could rule this out if followed.

However, none of this is relevant to M2C as M2C will not support bindings.
Xin Wang
2016-03-24 05:40:59 UTC
Post by trijezdci
An extension module inserts identifiers defined in the extension into the name space of the module it extends, therefore, user defined extension modules might cause name conflicts when accidentally defining a name that matches a bound name. Of course the compiler will reject and report such conflicts, but a name convention as described could rule this out if followed.
However, none of this is relevant to M2C as M2C will not support bindings.
Sorry if I missed something, but I cannot find info about *extension module* in the wiki or that old PDF[1].

[1] https://bitbucket.org/trijezdci/m2r10/downloads/M2R10.2014-01-31.pdf
trijezdci
2016-03-24 07:26:13 UTC
Post by Xin Wang
Sorry if I miss something, but I can not find info about *extension module* in wiki or that old PDF[1].
The history of the PDF and our Wiki has been explained before, I won't go into it again.

Eventually missing pieces will be added to the online specification on our wiki but at this point in time there are other priorities. Patience!
trijezdci
2016-03-24 07:33:53 UTC
Here is an example for an extension module that extends module BCD with IO operations.

https://bitbucket.org/trijezdci/m2r10/src/tip/_STANDARD_LIBRARY/BCDIO.def

The FOR BCD part in the module header targets module BCD.
Xin Wang
2016-03-24 08:47:40 UTC
Post by trijezdci
Here is an example for an extension module that extends module BCD with IO operations.
https://bitbucket.org/trijezdci/m2r10/src/tip/_STANDARD_LIBRARY/BCDIO.def
The FOR BCD part in the module header targets module BCD.
Thank you for your patience. Blueprint and binding are considerably more complex than other parts of the language. I feel that I still cannot grasp all the ideas after reading Rick's intro article[1]. I'll visit those concepts again once the documentation is richer or an implementation is out.

No hurry, and thank you again for all your efforts!

[1] http://thenorthernspy.com/spyAug2015.htm
trijezdci
2016-03-24 09:29:38 UTC
Post by Xin Wang
Blueprint and binding are considerably more complex than other parts of the language.
I don't think the concepts are complex, they are just unfamiliar, that's all.

A binding is simply a means to bind a user defined function to built-in syntax, such as built in functions, operators, the NEW statement, the FOR loop etc.

The following definition ...

PROCEDURE [+] add ( a, b : BCD ) : BCD;

... binds procedure add to built-in operator +.

This facility alone would be sufficient to allow user defined types to be usable just like built-in types.

However, it requires a lot of attention to detail to maintain consistency with built-in types or consistency among library defined types that belong to the same category.

For example, a library defined real number type should act in every aspect just like a built-in real number type. Whatever built-in syntax can be used with a built-in real number type should also be provided by a library defined real-number type, nothing should be left out, nothing should be added.

A blueprint is a compilation unit that defines which built-in syntax a type of a given category must provide. It is like a blueprint for a mechanical component to be produced. The part must follow the blueprint in every detail, it must not add anything that isn't in the blueprint.

The binding facility is only available when the library declares conformance to a blueprint in its module header. No declaration of blueprint conformance -- no binding. The compiler then verifies that the library actually conforms with the blueprint. This ensures consistency and referential integrity.

The library contains a rich set of blueprints but it is also possible to write your own blueprints. By doing so, the user of a library is alerted by the conformance declaration in the module header that the library is using a non-standard blueprint which likely means non-standard behaviour.
Xin Wang
2016-03-24 11:24:53 UTC
Post by trijezdci
A blueprint is a compilation unit that defines which built-in syntax a type of a given category must provide. It is like a blueprint for a mechanical component to be produced. The part must follow the blueprint in every detail, it must not add anything that isn't in the blueprint.
It sounds similar to Rust's traits[1], if I understand correctly.

- *blueprint* vs. *trait*
- *extension module* vs. *impl*

[1] https://doc.rust-lang.org/book/traits.html
trijezdci
2016-03-24 15:46:51 UTC
Post by Xin Wang
It sounds similar to Rust's traits[1], if I understand correctly.
- *blueprint* vs. *trait*
- *extension module* vs. *impl*
A blueprint has similarities with Smalltalk's protocol. Everybody else, whatever they may call it, got the basic idea from Smalltalk, directly or indirectly.

The facility Java calls interface and Scala and others call trait is 100% plagiarised from Smalltalk. A blueprint has significant overlap with a Smalltalk protocol but it is not a 100% overlap.

There are some properties that a library needs to obtain that simply do not make sense to "implement" in the library every time, and sometimes it is not even possible to implement in the library because the property is built into the compiler.

One such example is the compatibility of a library defined type with literals. Most languages do not even bother to provide literal assignment compatibility for library defined types. Our blueprints allow just that.

A blueprint can specify compatibility with a literal of the language and any library that conforms to that blueprint automatically becomes assignment compatible with that literal, and anything that needs to be there to facilitate that compatibility is automatically enforced then.

This way you can define a library defined scalar type for very large integers and then use ordinary integer literals as legal values to be assigned to instances of the type.

IMPORT VeryLargeInteger;

VAR n : VeryLargeInteger;

n := 123'456'789'012'345'678'901'234'567'890'123'456'789'012'345'678'901'234'567'890;

The compiler accepts such assignments if the blueprint of the type specifies compatibility with integer literals. There is no type specific conversion involved. Instead the compiler converts the value to the language defined internal representation called scalar exchange format.

Scalar types are required to implement two conversion procedures, one to convert from scalar exchange format to the type's own representation, another for the opposite. The former is then used to convert a literal that has already been converted to scalar exchange format by the compiler at compile time. The conversion procedures are also used for safe type conversions between any two given scalar types.

The conversion procedures are implemented by the type, but the literal compatibility is implemented in the compiler itself thanks to the standardised scalar exchange format.
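
The round trip through an exchange format described above can be sketched in Rust. This is only an illustration under stated assumptions, not M2 R10 code: the `Sxf` struct, the `Scalar` trait and the helpers `literal_to_sxf` and `convert` are hypothetical names, and a sign plus a list of decimal digits stands in for the real scalar exchange format, whose actual layout is not given in this thread.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the scalar exchange format:
/// a sign plus decimal digits, most significant digit first.
#[derive(Debug, Clone, PartialEq)]
struct Sxf {
    negative: bool,
    digits: Vec<u8>,
}

/// What a conforming scalar type provides: one conversion in each direction.
trait Scalar: Sized {
    fn from_sxf(sxf: &Sxf) -> Option<Self>;
    fn to_sxf(&self) -> Sxf;
}

/// The "compiler" side: turn an integer literal with ' digit separators
/// into exchange format, without knowing anything about the target type.
fn literal_to_sxf(literal: &str) -> Option<Sxf> {
    let digits: Vec<u8> = literal
        .chars()
        .filter(|&c| c != '\'')
        .map(|c| c.to_digit(10).map(|d| d as u8))
        .collect::<Option<Vec<u8>>>()?;
    if digits.is_empty() {
        None
    } else {
        Some(Sxf { negative: false, digits })
    }
}

/// Helper shared by the two example types below; fails on overflow.
fn sxf_to_i128(sxf: &Sxf) -> Option<i128> {
    let mut acc: i128 = 0;
    for &d in &sxf.digits {
        acc = acc.checked_mul(10)?.checked_add(i128::from(d))?;
    }
    Some(if sxf.negative { -acc } else { acc })
}

fn i128_to_sxf(value: i128) -> Sxf {
    let digits = value
        .unsigned_abs()
        .to_string()
        .bytes()
        .map(|b| b - b'0')
        .collect();
    Sxf { negative: value < 0, digits }
}

impl Scalar for i64 {
    fn from_sxf(sxf: &Sxf) -> Option<i64> {
        i64::try_from(sxf_to_i128(sxf)?).ok()
    }
    fn to_sxf(&self) -> Sxf {
        i128_to_sxf(i128::from(*self))
    }
}

impl Scalar for u8 {
    fn from_sxf(sxf: &Sxf) -> Option<u8> {
        u8::try_from(sxf_to_i128(sxf)?).ok()
    }
    fn to_sxf(&self) -> Sxf {
        i128_to_sxf(i128::from(*self))
    }
}

/// Safe conversion between any two scalar types via the exchange format.
fn convert<A: Scalar, B: Scalar>(value: &A) -> Option<B> {
    B::from_sxf(&value.to_sxf())
}

fn main() {
    let sxf = literal_to_sxf("123'456").expect("valid literal");
    let n = i64::from_sxf(&sxf);
    println!("{:?}", n);
    let small: Option<u8> = convert(&200i64);
    println!("{:?}", small);
}
```

The point of the sketch is that `literal_to_sxf` and `convert` never mention a concrete type: each type only knows how to get to and from the exchange format, and a value that does not fit the target type is rejected rather than truncated.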

In this way, a blueprint not only enforces that a library's implementation conforms, but it also gives built-in properties to conforming types. This makes blueprints different from other instances where the Smalltalk protocol concept has been adopted.

As for extension modules, they are not comparable with what other languages call implementation.

Like any other Modula-2 library, an extension library consists of two compilation units, a definition part (DEFINITION MODULE) and a corresponding implementation part (IMPLEMENTATION MODULE).

An extension library is simply a piece of a library module that has been stuck into a separate set of files.

Typically library defined types in M2 R10 will follow a pattern where the main part is in a library that has the same name as the type itself, for example type BCD has BCD.def and BCD.mod. However, the IO for the type is placed in an extension library represented by BCDIO.def and BCDIO.mod. The math library is placed in another extension library represented by BCDMath.def and BCDMath.mod.

When BCD is imported, the type itself and all basic operations such as built-in operators are available but not IO operations such as READ, WRITE, WRITEF, nor math library operations like sin, cos, tan etc.

For the IO operations, the type's IO library must be imported; for the math library operations, the math library must be imported. However, these imports insert the added identifiers into the type's own name space, thus

IMPORT BCD, BCDMath;

VAR x, y : BCD;

y := BCD.sin(x);

instead of

y := BCDMath.sin(x);

Also, an implementation module may mark certain low level procedures as private. These procedures are not visible to clients of the library, but they are visible to extension libraries targeting the library.

The ability to insert new "methods" into an existing library originated in Smalltalk as well, but in Smalltalk there is no specific compilation unit associated with it; it can be done anywhere. Of course such anarchy is entirely unwanted in a disciplined language like Modula-2, hence the extension module.
Xin Wang
2016-03-25 02:26:20 UTC
Rust's traits are different from interfaces in some other languages. They can be used to extend existing types.

Also, Rust has specific traits used to extend built-in operators and loops, see the docs of `std::ops`[1] and `std::iter`[2] for details.

Further, implementation modules in Rust need to be imported to make extended methods available, which is also similar to M2R10's use of extension modules.

[1] https://doc.rust-lang.org/std/ops/
[2] https://doc.rust-lang.org/std/iter/
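
A minimal Rust sketch of the two points above, assuming a toy type: `Bcd`, the modules `bcd` and `bcd_math`, and the trait `BcdMath` are hypothetical names chosen to mirror the BCD / BCDMath example earlier in the thread, and an `f64` stands in for a real decimal representation.

```rust
mod bcd {
    use std::ops::Add;

    /// Toy stand-in for a library-defined BCD type (an f64, not real BCD digits).
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Bcd(pub f64);

    /// Implementing std::ops::Add makes the built-in `+` operator work on Bcd.
    impl Add for Bcd {
        type Output = Bcd;
        fn add(self, rhs: Bcd) -> Bcd {
            Bcd(self.0 + rhs.0)
        }
    }
}

mod bcd_math {
    use super::bcd::Bcd;

    /// Extension trait: adds a `sin` method to the existing Bcd type.
    pub trait BcdMath {
        fn sin(self) -> Self;
    }

    impl BcdMath for Bcd {
        fn sin(self) -> Bcd {
            Bcd(self.0.sin())
        }
    }
}

use bcd::Bcd;
// Without this import the trait is not in scope and `x.sin()` does not resolve.
use bcd_math::BcdMath;

fn main() {
    let x = Bcd(0.0);
    let y = x.sin(); // method supplied by the extension trait
    let z = Bcd(1.5) + Bcd(2.0); // operator supplied by std::ops::Add
    println!("{:?} {:?}", y, z);
}
```

The method is called on the value itself (`x.sin()`) rather than through the `bcd_math` module, which is roughly the counterpart of writing `BCD.sin(x)` instead of `BCDMath.sin(x)`.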
trijezdci
2016-03-25 06:38:50 UTC
Xin Wang
2016-03-25 09:19:27 UTC
Got it.

I wasn't aware that "method categories" in Squeak are a kind of specification.
trijezdci
2016-03-25 09:30:15 UTC
Post by Xin Wang
I wasn't aware that "method categories" in Squeak are a kind of specification.
As I said, the roles are not that clearly separated in Smalltalk. A protocol fulfils the role of specification and interface, but in some Smalltalk systems they are only informal in which case they can take on the role of specification, interface and implementation. The lines are blurred.
Chris Burrows
2016-03-25 11:58:14 UTC
Post by trijezdci
Very very very little new has been invented in language design since Smalltalk. We are all standing on the shoulders of the giants at Xerox PARC.
If you had said "... in object-oriented language design ..." instead of "language design" you might have had a valid point. However, there are many other programming languages in existence that are equally significant in their own domain and bear no resemblance to Smalltalk. Also, you talk as though Smalltalk was created out of thin air. Don't forget the significant influence that Simula had on Smalltalk.
trijezdci
2016-03-25 17:03:27 UTC
Post by Chris Burrows
Post by trijezdci
Very very very little new has been invented in language design since Smalltalk. We are all standing on the shoulders of the giants at Xerox PARC.
If you had said "... in object-oriented language design ..." instead of "language design" you may have a valid point.
All that falls under "very very very little". Considering how many leaps were made in the 1960s and 1970s, very very very few such leaps, IF ANY AT ALL, have been made since. Milner et al. may be the one exception to that.
Post by Chris Burrows
However, there are many other programming languages in existence that are equally significant in their own domain that bear no resemblance to Smalltalk.
I am talking about concepts, not resemblance.
Post by Chris Burrows
Also, you talk as though Smalltalk was created out of thin air.
No, I don't, because I said "since Smalltalk". I didn't say "since Smalltalk and any time before Smalltalk".
Post by Chris Burrows
Don't forget the significant influence that Simula had on Smalltalk.
The influence of Simula on Smalltalk is greatly exaggerated. Read up on Kay's history of Smalltalk. You will find he credits an unknown designer of an IBM file system dating back to the 1950s for the OO design pattern. He further states that he had seen this pattern many times afterwards and that he was puzzled by how everybody drew conclusions from Simula that didn't seem important to him. In fact, that difference runs like a common thread through the OOP terrain to this day. Also, Smalltalk has had an influence beyond OOP (by whichever definition) while Simula has not. Kay says OOP doesn't require inheritance, but that late binding and genericity are key. It is precisely through those two concepts that Smalltalk has had a strong influence in areas not usually considered OOP.

Furthermore, Smalltalk wasn't the only thing that came out of Xerox PARC. Modula-2, for example, also came from there in the form of Mesa. Mesa had bizarre-looking syntax, and its transformation to Modula-2 was a facelift to make its syntax look more like Pascal. Oberon is a subset with a minimal OO facility that Wirth himself has credited to Smalltalk.

Naturally, even the folks at Xerox PARC were building upon earlier research, but they made very serious leaps back then which we have not really seen much of since.
Chris Burrows
2016-03-25 23:23:45 UTC
Post by trijezdci
The influence of Simula on Smalltalk is greatly exaggerated.
Your claim is in direct conflict with Alan Kay's own words:

"It is not too much of an exaggeration to say that most of my ideas from then on took their roots from Simula--but not as an attempt to improve it. It was the promise of an entirely new way to structure computations that took my fancy."

Refer to "The Early History Of Smalltalk" by Alan C. Kay, Apple Computer. ACM SIGPLAN Notices Volume 28 Issue 3, March 1993 Pages 69 - 95.

http://dl.acm.org/citation.cfm?id=155364
trijezdci
2016-03-26 02:00:49 UTC
Post by Chris Burrows
Post by trijezdci
The influence of Simula on Smalltalk is greatly exaggerated.
"It is not too much of an exaggeration to say that most of my ideas from then on took their roots from Simula--but not as an attempt to improve it. It was the promise of an entirely new way to structure computations that took my fancy."
Don't cherry pick, read the whole thing. My statement is 100% correct.