Check at compile time if a going-to-be-stringified symbol is a member of a struct

I wanted a string literal that holds the name of a struct member, with the compiler checking the spelling at compile time.

At last I have found a possible reason to use the comma operator…

A straightforward solution could be as follows:

#define STR_MEMBER(S,X)  ((dummy_ ## S).X, #X)

This exploits the comma operator: the left operand is evaluated and its “result” discarded, and then the right operand becomes the result of the whole expression.

I want to use the macro like this:

/* ... */
struct l1st {
    const char *name;
    uint16_t flags;
};

// it would be another struct, indeed
struct l1st dummy_l1st;

struct l1st els[] = {
    { STR_MEMBER(l1st, name), 0 },
    { STR_MEMBER(l1st, flags), 0 },
    { NULL }
};
/* ... */

This isn't nice because you need a dummy_X object for each struct you want to use the macro with.

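In C, compilation fails when els is a file-scope variable, because initializers of objects with static storage duration must be constant expressions, and a constant expression may not contain an evaluated comma operator; it's fine if els is a local variable. C++ is different and accepts it in both places. Wherever it compiles, it works as expected, but we still have those dummy structs we'd like to get rid of.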
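To make the file-scope versus local distinction concrete, here is a minimal C sketch. The function first_member_name and the (void) cast inside the macro are my own additions for illustration (the cast is only there to silence "left operand has no effect" warnings); the rest follows the macro and struct shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct l1st {
    const char *name;
    uint16_t flags;
};

/* The dummy object the macro relies on. */
static struct l1st dummy_l1st;

/* Same macro as in the text, plus a (void) cast (my addition). */
#define STR_MEMBER(S, X) ((void)(dummy_ ## S).X, #X)

/* At file scope a C compiler may reject the following line, because the
 * initializer is not a constant expression:
 *
 *   static const char *file_scope_name = STR_MEMBER(l1st, name);
 */

/* Hypothetical helper: inside a function the same initializer is fine. */
static const char *first_member_name(void)
{
    const char *local_name = STR_MEMBER(l1st, name);
    assert(strcmp(local_name, "name") == 0);
    return local_name;
}
```

A misspelled member, for example STR_MEMBER(l1st, nmae), should be rejected at compile time, since dummy_l1st has no such member.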

The comma operator

The macro uses the comma operator.

A typical usage of the comma operator is this: you write the left operand because of its side effects, but you are not interested in the return value, if there's any.
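A small C sketch of that typical usage follows. The names log_count, note_event and comma_demo are made up for the example; the point is only that the left operand runs for its side effect and its value is thrown away.

```c
#include <assert.h>

static int log_count = 0;

/* Hypothetical function that is called only for its side effect. */
static int note_event(void)
{
    return ++log_count;
}

static void comma_demo(void)
{
    int before = log_count;

    /* note_event() is evaluated first and its return value is discarded;
       the value of the whole parenthesized expression is 42. */
    int value = (note_event(), 42);

    assert(value == 42);
    assert(log_count == before + 1);
}
```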

I thought that if the left operand visibly has no side effects, it could be taken for granted that the compiler would emit no code for it: it “does” nothing, it isn't an “action” at all, just a value that is discarded before something else is put into the struct. Whether the value can be computed at compile time shouldn't matter, since it is discarded and computing it triggers no side effects.

Thus I thought the left operand would disappear at compile time, whatever it was, because the compiler can see that its value isn't used and that it has no side effects.[1]

It happens that compilers may indeed behave this way, but you can't rely on it, because the C and C++ standards don't require this kind of optimization (I've been told[2]).

It isn't a problem for the macro I've given above, but it is for the first mindless attempt to avoid the need for the dummy structs, which was just something like this (C++):

#define STR_MEMBER(S,X) (((S*)nullptr)->X, #X)

This compiles[3] and works, but I've been told it could fail. Indeed, when inspecting the generated code I haven't seen any instruction trying to access X at nullptr (“0”), so that particular binary will always work. But you can't be sure that the same source code, compiled by another compiler (or another version of the same one, maybe on a different machine…), will produce an executable that works. All the experiments I've done so far suggest that a decent compiler won't emit the dangerous memory access.

However the standard doesn't shield us from a simpler compiler, so this solution isn't acceptable.

In C++ a good alternative could be something like:

#define STR_MEMBER(S, X) (sizeof (S::X), #X)

In C it looks like:

#define STR_MEMBER(S, X) (sizeof ((struct S *)NULL)->X, #X)
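The sizeof forms avoid the problem because, as far as I know, the operand of sizeof is not evaluated (variable-length arrays aside): only its type is inspected, so the null pointer is never dereferenced. A minimal C sketch, where sizeof_demo and the (void) cast are my own additions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct l1st {
    const char *name;
    uint16_t flags;
};

/* The C macro from the text. The (void) cast is my addition: it only
   silences "left operand of comma has no effect" warnings. */
#define STR_MEMBER(S, X) ((void)sizeof ((struct S *)NULL)->X, #X)

static void sizeof_demo(void)
{
    /* No struct l1st object exists here and none is needed:
       sizeof only looks at the type of ((struct l1st *)NULL)->flags. */
    const char *member = STR_MEMBER(l1st, flags);
    assert(strcmp(member, "flags") == 0);

    /* STR_MEMBER(l1st, flag) would not compile, because struct l1st
       has no member called 'flag'. */
}
```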

I slightly prefer offsetof (from stddef.h) because, to my mind, it makes it clearer that I want to check something about the relationship between a struct and its member:

#define STR_MEMBER(S, X) (offsetof(struct S, X), #X)

and in C++ (including cstddef) we can drop the struct keyword, if we like:

#define STR_MEMBER(S, X) (offsetof(S, X), #X)
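Putting the C offsetof variant together with the table from the beginning gives something like the sketch below. The function count_member_entries and the (void) cast are hypothetical additions of mine; the array is a local variable so that the initializers don't have to be constant expressions in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct l1st {
    const char *name;
    uint16_t flags;
};

/* offsetof-based check; no dummy object is needed any more.
   The (void) cast is my addition, only to silence warnings. */
#define STR_MEMBER(S, X) ((void)offsetof(struct S, X), #X)

/* Hypothetical helper: builds the table and counts the entries
   that come before the NULL terminator. */
static size_t count_member_entries(void)
{
    struct l1st els[] = {
        { STR_MEMBER(l1st, name), 0 },
        { STR_MEMBER(l1st, flags), 0 },
        { NULL, 0 }
    };
    size_t n = 0;

    while (els[n].name != NULL)
        n++;

    assert(strcmp(els[0].name, "name") == 0);
    assert(strcmp(els[1].name, "flags") == 0);
    return n;
}
```

Writing STR_MEMBER(l1st, flag) instead should make the compilation fail, which is the whole point of the macro.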

However, in both cases (sizeof and offsetof) the code reads as if I were interested in a size or an offset, which I am not.

Once you spot the comma operator, the intent becomes clear; nonetheless I note that there is no direct way to ask the compiler such a simple question: does that symbol (which I'm going to stringify) exist?

In the worst case, this compile-time check will produce actual, undesired code. All I wanted was for the compiler to do (at compile time) some checks for me; it's part of its job, after all.

  1. Reading from a memory location usually has no side effects, unless it's a special memory location or we count the physical effects as side effects we're interested in. To keep the compiler from optimizing away such an important memory access we can always use volatile. In fact, if I add it, the compiler emits the code to access that address, hence you get a segfault when you run the executable.

  2. See my SO question “In the comma operator, is the left operand guaranteed not to be actually executed if it hasn't side effects?”. There was a little confusion about the expression “to be actually executed”. By that I mean that the compiler produces (at compile time) assembly code that will run when you run the program. In this case the code would perform a memory access to address 0, which isn't what we want and usually gives a segmentation fault.

  3. Compiled both with g++ 6.3.0 and with clang++ 3.0-6.2, with option -O0.
