request for feedback: making foreign function calls to C printf-style functions safer
trijezdci
2012-07-26 14:09:50 UTC
We have been thinking about a pragma that would tell a Modula-2 compiler to statically check the format string in a foreign function call to a C printf-style function and generate warnings or errors if the format string and its variadic argument list do not match up.

Ideally, such a pragma would be general enough to verify not only the format strings of the printf family of C functions but also other kinds of format strings. In principle this could be achieved by using an EBNF grammar or regular expression syntax within the pragma to tell the compiler which components of a format string are parameter specifiers, thereby allowing it to determine (a) how many arguments are required and (b) what types the arguments need to be.

For example:

PROCEDURE printf ( fmt : ARRAY OF CHAR; arglist : UNSAFEARGLIST ) <* FFI = "C" *>
<* VARARGFMTSTR = fmt : <regexp1> = <type1>, <regexp2> = <type2>, ... *>;
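
To make the regexp-to-type idea concrete, here is a minimal sketch in Python of what a compiler might do with such a rule list. The rule table, the type names and the function `expected_types` are illustrative assumptions, not part of any proposal, and the patterns cover only a small subset of printf syntax.

```python
import re

# Hypothetical rule table: each entry pairs a regular expression for one kind
# of argument specifier with the type name its argument must have.
RULES = [
    (re.compile(r"%[-+ 0#]*\d*(?:\.\d+)?[di]"), "INTEGER"),
    (re.compile(r"%[-+ 0#]*\d*(?:\.\d+)?[fFeEgG]"), "REAL"),
    (re.compile(r"%-?\d*(?:\.\d+)?s"), "ARRAY OF CHAR"),
    (re.compile(r"%-?\d*c"), "CHAR"),
]

def expected_types(fmt):
    """Return the argument types implied by fmt, in order of appearance."""
    types = []
    pos = 0
    while pos < len(fmt):
        if fmt[pos] != "%":
            pos += 1
            continue
        if fmt.startswith("%%", pos):  # literal percent sign, consumes no argument
            pos += 2
            continue
        for pattern, type_name in RULES:
            m = pattern.match(fmt, pos)
            if m:
                types.append(type_name)
                pos = m.end()
                break
        else:
            # No rule matched: the specifier is unknown to the rule table.
            raise ValueError(f"unrecognised specifier at offset {pos}")
    return types
```

Calling `expected_types("%d items, %5.2f%% done")` yields one INTEGER and one REAL requirement; the length of the list is the required argument count.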

This looks easier than it is, because the printf format string syntax alone is horribly convoluted. It would be relatively easy if every % accounted for one distinct argument in order. But there are so-called numbered argument specifiers that can refer to any argument, even one already referred to by an unnumbered argument specifier, and several numbered specifiers can refer to the same argument regardless of where they appear in the string. What a bloody horrible piece of stinking dog poo. The designers should be ashamed of themselves!
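
The numbered-specifier complication can be illustrated with a short Python sketch. It assumes a drastically simplified specifier syntax (an optional `n$` followed by one conversion letter), and the names `SPEC`, `TYPE_OF` and `argument_types` are made up for the illustration. It tolerates mixing numbered and unnumbered specifiers as described above, although C libraries may not define that behaviour.

```python
import re

# Simplified pattern: optional "n$" argument number, then a conversion letter.
# Flags, width and precision are deliberately left out of this sketch.
SPEC = re.compile(r"%(?:(\d+)\$)?([dsfc])")

TYPE_OF = {"d": "INTEGER", "s": "ARRAY OF CHAR", "f": "REAL", "c": "CHAR"}

def argument_types(fmt):
    """Return the required type of each argument, ordered by argument index."""
    required = {}          # 1-based argument index -> required type
    next_unnumbered = 1    # index the next unnumbered specifier refers to
    for m in SPEC.finditer(fmt.replace("%%", "")):
        number, letter = m.groups()
        if number is not None:
            index = int(number)
        else:
            index = next_unnumbered
            next_unnumbered += 1
        wanted = TYPE_OF[letter]
        # Several specifiers may share one argument, but must agree on its type.
        if required.setdefault(index, wanted) != wanted:
            raise ValueError(
                f"argument {index} used as both {required[index]} and {wanted}")
    highest = max(required, default=0)
    missing = [i for i in range(1, highest + 1) if i not in required]
    if missing:
        raise ValueError(f"arguments {missing} are never referenced")
    return [required[i] for i in range(1, highest + 1)]
```

The point of the sketch is that the argument count is the highest index referenced, not the number of `%` signs, and that every specifier pointing at the same argument has to be checked for agreement.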

Unfortunately, if we are going to allow foreign function calls to C from a safe language such as Modula-2, then we ought to try to make them as safe as we reasonably can, even if the function itself is not as safe. GCC, for example, statically checks the format strings of the printf family of functions. We should at least match that level of safety.

Perhaps the idea of using regular expressions to identify the argument specifiers is good, perhaps it is not, perhaps there are other approaches worth looking into. We'd appreciate any kind of feedback or crazy ideas on how one might go about this. Thanks in advance.
Marco van de Voort
2012-07-27 16:17:53 UTC
We've discussed this on IRC once. IMHO drawing such runtime constructs into
the standard goes against the spirit of the M2 language, if only because
it will probably break at the first new GNU extension to the syntax. That
means you will need some setting to turn checking off; otherwise the
standard might already be broken by design before the first compiler comes out.

Do you have a safe open array type of your own? Not having a safe construct
will only force people to use the dirty one, even for non-C interfacing
usage.
trijezdci
2012-07-27 22:08:32 UTC
Post by Marco van de Voort
We've discussed this on IRC once.
Yes, I vaguely remember ;-)
Post by Marco van de Voort
IMHO it goes against the spirit of the M2 language
drawing such runtime constructs into the standard.
I think you misunderstood. There is nothing runtime here. It says "to statically check". This means that if the pragma is present in the scope of the foreign function definition and you make a call to the foreign function, then one of two things would happen:

- if the format string in your call is not a compile-time expression, the compiler would report that.

- if the format string in your call is a compile-time expression, the compiler would then use the rules in the pragma to identify argument specifiers, the number of required arguments and their types, then check that against the actual parameters passed and if there is a mismatch, report the mismatch.

As for what the consequences should be, this should be under user control. For example, a compiler switch would determine whether the report is a warning, which can be ignored, or an error, which prevents any executable code from being generated. Also, a user may want the first case to be treated as a warning but the second case to be treated as an error.
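
The two cases and the user-controlled severity can be sketched in Python as follows. This is only an illustration under stated assumptions: the settings `NON_CONSTANT_FORMAT` and `ARGUMENT_MISMATCH` stand in for hypothetical compiler switches, the specifier syntax is reduced to three conversion letters, and `check_call` is an invented name.

```python
import re
from enum import Enum

class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"

# Hypothetical settings, standing in for compiler switches.
NON_CONSTANT_FORMAT = Severity.WARNING   # first case
ARGUMENT_MISMATCH = Severity.ERROR       # second case

SPEC = re.compile(r"%([dsf])")
TYPE_OF = {"d": "INTEGER", "s": "ARRAY OF CHAR", "f": "REAL"}

def check_call(fmt, actual_types):
    """Return a list of (severity, message) diagnostics for one call.

    fmt is the format string if it is known at compile time, else None.
    actual_types lists the types of the variadic arguments actually passed.
    """
    if fmt is None:
        return [(NON_CONSTANT_FORMAT,
                 "format string is not a compile-time expression")]
    expected = [TYPE_OF[m.group(1)]
                for m in SPEC.finditer(fmt.replace("%%", ""))]
    diagnostics = []
    if len(expected) != len(actual_types):
        diagnostics.append((ARGUMENT_MISMATCH,
                            f"format expects {len(expected)} argument(s), "
                            f"call passes {len(actual_types)}"))
    # Compare types pairwise for the arguments that are present on both sides.
    for position, (want, got) in enumerate(zip(expected, actual_types), start=1):
        if want != got:
            diagnostics.append((ARGUMENT_MISMATCH,
                                f"argument {position}: expected {want}, got {got}"))
    return diagnostics
```

A driver would then refuse to generate code only if some diagnostic carries `Severity.ERROR`.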

Well, perhaps for pedantic folks, there might be a compiler switch that makes the compiler even reject the call because the format string is not a compile-time expression, but that would be under the user's control and shouldn't be the default.

It is very much in the spirit of Wirthian languages to do as much as possible during compile-time to increase safety at runtime. In this respect such a feature would be within the spirit of the language.

I'd agree that it would be against the style of the language to add proper syntax for such a feature, but we are talking about an optional pragma, that is to say, a recommendation to implementors: "if you feel strongly enough about checking this type of dangerous FFI call, then here is how you should do it".
Post by Marco van de Voort
it will probably break at the first new GNU extension to the syntax, so it
means you will have to have some setting to turn checking off, otherwise the
standard might already be broken by design before the first compiler comes out.
No, that wouldn't be the case for a couple of reasons.

First, any extensions that might be added are extremely unlikely to break existing format strings. Therefore, any existing calls that the compiler verified and cleared as a result of the pragma would continue to compile fine even after the format specifier syntax had been extended. The only impact at that point would be that if you wanted to make use of the additional format syntax in your foreign function calls, then the compiler would likely report a mismatch.

Second, the whole reason why we have been thinking about a grammar or regular expression based pragma is that it is under user control what the compiler should check for in the format string. So, if extensions are added to the format string syntax of a C function you want to call, and you would like to make use of this additional format string syntax, then you would want to update the grammar or regular expression definitions in the pragma to teach the compiler the additional syntax.
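
As a rough Python illustration of that point, the sketch below checks the same format string against a base rule set and against a user-extended one. Everything here is hypothetical: the rule dictionaries, the function names, and in particular the `%B` specifier, which is invented for the example and is not part of any real printf.

```python
import re

# Base rules for a tiny subset of printf; patterns and type names are illustrative.
BASE_RULES = {
    r"%d": "INTEGER",
    r"%s": "ARRAY OF CHAR",
}

# A made-up extension: "%B" takes a BOOLEAN. It is not part of any real printf.
EXTENDED_RULES = {**BASE_RULES, r"%B": "BOOLEAN"}

def compile_rules(rules):
    """Combine the rule patterns into one regex with a named group per rule."""
    parts = [f"(?P<r{i}>{pattern})" for i, pattern in enumerate(rules)]
    return re.compile("|".join(parts)), list(rules.values())

def expected_types(fmt, rules):
    """Return expected argument types; raise ValueError on unknown specifiers."""
    matcher, type_names = compile_rules(rules)
    types, pos = [], 0
    while (pos := fmt.find("%", pos)) != -1:
        if fmt.startswith("%%", pos):  # literal percent sign
            pos += 2
            continue
        m = matcher.match(fmt, pos)
        if m is None:
            raise ValueError(f"unknown specifier at offset {pos}")
        # The name of the group that matched encodes the rule index.
        types.append(type_names[int(m.lastgroup[1:])])
        pos = m.end()
    return types
```

With `BASE_RULES`, the string `"flag=%B"` is reported as a mismatch; after the user adds the one extra rule, the same call passes the check and requires a BOOLEAN argument.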

Last but not least, we have conditional compilation pragmas, so you can always do something like this:

<* IF GCC_EXTENDED_PRINTF_FMTSTR *>
PROCEDURE printf ( fmt : ARRAY OF CHAR; arglist : UNSAFEARGLIST ) <* FFI = "C" *>;
<* MSG = WARN : "*** CALLS TO C PRINTF FUNCTION ARE UNCHECKED ***" *>
<* ELSE *>
PROCEDURE printf ( fmt : ARRAY OF CHAR; arglist : UNSAFEARGLIST ) <* FFI = "C" *>
<* VARARGFMTSTR = fmt : <regexp1> = <type1>, <regexp2> = <type2>, ... *>;
<* MSG = INFO : "*** Calls to C printf function are checked ***" *>
<* ENDIF *>
Post by Marco van de Voort
Do you have a safe open array type of your own?
We have native support for type safe variadic procedures.
Post by Marco van de Voort
Not having a safe construct will only force people
to use the dirty one, also for non C interfacing
No, because support for unsafe variadic parameters is limited to foreign function interfaces only.

If you want to implement a variadic procedure in Modula-2, then you have to use a type safe variadic formal parameter in the procedure header. Type safe variadic formal parameters can be used for foreign function interfaces as well, but they do not cover the printf use case. So if you want to call printf-style C functions, the only way to do so is via the unsafe facility, which must be imported from SYSTEM, btw.

Hope this clarifies.
Marco van de Voort
2012-08-02 11:26:04 UTC
Post by trijezdci
Post by Marco van de Voort
IMHO it goes against the spirit of the M2 language
drawing such runtime constructs into the standard.
I think you misunderstood. There is nothing runtime here. It says "to
statically check". This means, if the pragma is present in the scope of
the foreign function definition and you make a call to the foreign
I do understand that. I mean to allow it for general use.
Post by trijezdci
- if the format string in your call is not a compile-time expression, the
compiler would report that.
For the "pedantics", that should be switchable again, preferably from the
command line (I sometimes use pedantic functionality to get to know the
strengths and weaknesses of a large body of source, just like turning on
runtime checks).
Post by trijezdci
It is very much in the spirit of Wirthian languages to do as much as
possible during compile-time to increase safety at runtime. In this
respect such a feature would be within the spirit of the language.
That is true. Maybe I should state more directly what I would prefer:

It is just not in the Wirthian spirit to call such constructs from the main
modules. They should be confined to the implementation of a low-level
interfacing module only, and such a symbol should not be exportable to
non-low-level modules.

IOW you can only call a proper M2 version of an open array type, which can
then in turn try to transform it to printf. This avoids littering unsafe
calls all through the main module, and reduces the number of calls (AND their
usage) by confining such constructs to a handful of low-level modules'
implementations.
Post by trijezdci
I'd agree that it would be against the style of the language to add proper
syntax for such a feature, but we are talking about an optional pragma,
that is to say, a recommendation to implementors "if you feel strong
enough about checking this type of dangerous FFI call, then here is how
you should do it".
Avoid having too many of them in the first place. I think the time can be
better spent making that possible. Automated checking only makes sense if
you assume it will be a regularly used construct in general code, instead of
an isolated feature to interface some more awkward parts of the system.
Post by trijezdci
Post by Marco van de Voort
it will probably break at the first new GNU extension to the syntax, so
it means you will have to have some setting to turn checking off,
otherwise the standard might already be broken by design before the first
compiler comes out.
No, that wouldn't be the case for a couple of reasons.
First, any extensions that might be added are extremely unlikely to break
existing format strings.
Therefore, any existing calls that the compiler
verified and cleared as a result of the pragma would continue to compile
fine even after the format specifier syntax had been extended. The only
impact at that point would be that if you wanted to make use of the
additional format syntax in your foreign function calls, then the compiler
would likely report a mismatch.
I was thinking more that the first use of such an extension (like an extra
value before or after the letter) would leave a user no choice but to
turn off the checking. Most will do so globally.

(added later: I see the grammar specification feature fixes that, see below
for comment on that)
Post by trijezdci
Second, the whole reason why we have been thinking about a grammar or
regular expression based pragma is that it is under user control what the
compiler should check for in the format string. So, if extensions are
added to the format string syntax of a C function you want to call, and
you would like to make use of this additional format string syntax, then
you would want to update the grammar or regular expression definitions in
the pragma to teach the compiler the additional syntax.
Scary, and a terrible burden on the compiler builder (for doubtful benefit
IMHO), but it does fix the above concern, so that the checking feature at
least is not broken by default as I had feared. I hope you make it an
optional part to enforce this.
Post by trijezdci
Post by Marco van de Voort
Not having a safe construct will only force people
to use the dirty one, also for non C interfacing
No, because support for unsafe variadic parameters is limited to foreign
function interfaces only.
True.
Post by trijezdci
If you want to implement a variadic procedure in Modula-2, then you have
to use a type safe variadic formal parameter in the procedure header.
Type safe variadic formal parameters can be used for foreign function
interfaces as well, but they do not cover the printf use case, so if you
want to call printf style C functions then the only way to do so is via
the unsafe facility which must be imported from SYSTEM btw.
hope this clarifies
The worst bits are defanged, yes. I still don't like it or consider it an
elegant solution, but so be it.
trijezdci
2012-08-02 20:37:52 UTC
Wirth defined a simple rule: any facility that bypasses the otherwise strict safety rules of the language belongs in module SYSTEM. As a consequence, once a facility is imported from SYSTEM, safety may no longer be guaranteed, so an import from SYSTEM works as an indicator, too.

I know some people do not like this rule set. They believe SYSTEM is evil and should be removed or at least it should be crippled. I find Wirth's rule set for SYSTEM perfectly sufficient and I do not share those notions on SYSTEM. This is not something I feel like discussing either. I take SYSTEM for granted.


The suggested pragma takes effect only in combination with a facility that is provided by SYSTEM. Therefore, in order to use the pragma, one would first need to use the facility and in order to use the facility, one would first need to import it from SYSTEM. It thus satisfies Wirth's rule set for SYSTEM.

Did we consider revising Wirth's rule set for SYSTEM to put restrictions on what kinds of modules can import from what other kinds of modules? Yes, we did, and it turned out to be a silly idea that we soon abandoned. I am not going to comment any further on this.
Post by Marco van de Voort
a terrible burden on the compiler builder (for doubtful benefit
IMHO)
No, you have it upside down. The pragma is suggested as a recommendation so that compiler implementors who feel they want such a check will be able to use a blueprint that avoids rendering source code non-portable, which is a risk if nothing is defined.
Post by Marco van de Voort
I hope you make it an optional part to enforce this.
Post by trijezdci
we are talking about an optional pragma, that is to say,
a recommendation to implementors "if you feel strong
enough about checking this type of dangerous FFI call,
then here is how you should do it".
j***@gmail.com
2012-08-04 15:32:49 UTC
I think that any error or warning message that could increase safety is always a good thing.
Calling the C printf function from Modula-2 is useful, but programmers could use it in the wrong way, so a pragma that prints an error or warning on the use of this particular feature is a good idea.
Julian Miglio