trijezdci
2012-07-26 14:09:50 UTC
We have been thinking about a pragma that would tell a Modula-2 compiler to statically check the format string in a foreign function call to a C printf-style function and generate warnings or errors if the format string and its variadic argument list do not match up.
Ideally, such a pragma would be general enough to verify not only the format strings of the printf series of C functions but also other kinds of format strings. In principle this could be achieved by using an EBNF grammar or regular expression syntax within the pragma to tell the compiler which components of a format string are parameter specifiers, thereby allowing it to determine (a) how many arguments are required and (b) what types those arguments need to be.
For example:
PROCEDURE printf ( fmt : ARRAY OF CHAR; arglist : UNSAFEARGLIST ) <* FFI = "C" *>
<* VARARGFMTSTR = fmt : <regexp1> = <type1>, <regexp2> = <type2>, ... *>;
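To make the idea concrete, here is a rough sketch in Python (not Modula-2) of what a compiler might do with such a table of regexp/type pairs. The table contents, the type names and the function name expected_types are all made up for illustration, and the patterns cover only a small subset of printf (no length modifiers, no * widths).

```python
import re

# Hypothetical table in the spirit of the <regexp> = <type> pairs in the pragma.
# Patterns and type names are illustrative assumptions, not a real specification.
SPEC_TABLE = [
    (r"%[-+ #0]*\d*(?:\.\d+)?[di]", "INTEGER"),
    (r"%[-+ #0]*\d*(?:\.\d+)?[fFeEgG]", "REAL"),
    (r"%-?\d*(?:\.\d+)?s", "ARRAY OF CHAR"),
    (r"%-?\d*c", "CHAR"),
    (r"%%", None),  # literal percent sign, consumes no argument
]


def expected_types(fmt):
    """Return the argument types a format string requires, in order."""
    types = []
    pos = 0
    while pos < len(fmt):
        if fmt[pos] != "%":
            pos += 1
            continue
        # Try each table entry at the current position; first match wins.
        for pattern, type_name in SPEC_TABLE:
            m = re.compile(pattern).match(fmt, pos)
            if m:
                if type_name is not None:
                    types.append(type_name)
                pos = m.end()
                break
        else:
            raise ValueError(f"unrecognised specifier at offset {pos}")
    return types
```

With this table, expected_types("%d items cost %5.2f%%") yields INTEGER followed by REAL, and the compiler would then compare that list against the actual variadic arguments at the call site.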
This looks easier than it is, because the printf format string syntax alone is horribly convoluted. It would be relatively easy if every % accounted for one distinct argument in order. But there are so-called numbered argument specifiers, which can refer to any argument, even one that was already referred to by an earlier unnumbered specifier, and several numbered specifiers can refer to the same argument regardless of where they appear in the string. What a bloody horrible piece of stinking dog poo. The designers should be ashamed of themselves!
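To illustrate why numbered specifiers break simple counting, here is a rough Python sketch of a checker that maps each specifier to an argument slot rather than counting % signs. The names SPEC, TYPE_OF and argument_slots are hypothetical, only the d, f, s and c conversions are handled, and treating unreferenced slots as an error is my own design choice (the type of a skipped argument cannot be known). As far as I know, POSIX leaves the result of mixing numbered and unnumbered specifiers undefined, so a stricter checker might simply reject that case.

```python
import re

# Optional "n$" argument number, flags, width, precision, conversion.
# Deliberately covers only a small subset of printf conversions.
SPEC = re.compile(r"%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d+)?([dfsc%])")
TYPE_OF = {"d": "INTEGER", "f": "REAL", "s": "ARRAY OF CHAR", "c": "CHAR"}


def argument_slots(fmt):
    """Return the required type of each argument, indexed by position."""
    slots = {}            # argument number (1-based) -> required type
    next_unnumbered = 1   # slot the next unnumbered specifier will consume
    for m in SPEC.finditer(fmt):
        number, conv = m.group(1), m.group(2)
        if conv == "%":
            continue      # "%%" is a literal percent sign
        if number is None:
            index = next_unnumbered
            next_unnumbered += 1
        else:
            index = int(number)
            if index < 1:
                raise ValueError("argument numbers start at 1")
        wanted = TYPE_OF[conv]
        # Several specifiers may name the same slot, but they must agree.
        if slots.setdefault(index, wanted) != wanted:
            raise ValueError(f"argument {index} used as {slots[index]} and {wanted}")
    highest = max(slots, default=0)
    missing = [i for i in range(1, highest + 1) if i not in slots]
    if missing:
        raise ValueError(f"arguments never referenced: {missing}")
    return [slots[i] for i in range(1, highest + 1)]
```

For "%2$s has %1$d items; again %2$s" this gives two arguments, INTEGER then ARRAY OF CHAR, even though the string contains three specifiers. A string like "%1$d %1$s" is rejected because the same argument is requested with two different types.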
Unfortunately, if we are going to allow foreign function calls to C from a safe language such as Modula-2, then we ought to make them as safe as we reasonably can, even if the called function itself is not safe. GCC, for example, statically checks the format strings of the printf series of functions. We should at least match that level of safety.
Perhaps using regular expressions to identify the argument specifiers is a good idea, perhaps it is not; perhaps there are other approaches worth looking into. We'd appreciate any kind of feedback or crazy ideas on how one might go about this. Thanks in advance.