2020-10-02

Chaining mistakes

You shall never do the same mistake again.

Unless you don’t recognize that it’s the same error.

First, let’s summarize what I’ve learned when you work on old, cluttered codebases which should be rewritten from scratch but nobody dares.

  1. Do not trust functions’ or variables’ names
  2. Do not trust functions’ arguments to do what’s suggested by their names
  3. Do not trust comments

Pretty absurd, right? But sometimes it’s necessary.

Somewhere in the past

I had to do some development on C code the roots of which date back to 2005 or so. Unfortunately the code uses a lot of global variables. But that’s not the only problem.

I had to add behaviour tied to Something, which is — well, something. Think of it as a status persisted on a database. Hence, all begins with reading the status from the database.

    /* ... */
    ReadSomethingFromDb();

Exactly: no arguments. Because the function uses globals. Which is a horror show, but that’s what’s in the code you must work on.

Now, I thought I had the status and could work on it to add the behaviour I needed. But during tests clues suggested that somewhere between reading Something and my code there was code that updated Something.

Of course I didn’t identified ReadSomethingFromDb as the culprit because, you know, the name of the function says what it does, doesn’t it?

But after inspecting here and there without finding the updating bits, I had to take a look at that function too. And well, as you can imagine already, ReadSomethingFromDb did more than just reading.

This is, indeed, the risk of using globals instead of functions’ parameters. I don’t know how and why it happened — I wasn’t there yet.

Now the function is renamed as ReadAndUpdateSomethingOnDb, but it still uses globals as before: refactoring this means to change a lot of code — if the owner doesn’t understand the importance of fixing this kind of problems in the code of its product, she wouldn’t pay for that refactoring full rewrite of the code. And nobody works for free, and anyway nobody would take the risk.

Such a code will become a little bit worse at every new development, because new piece of code must either cause a change in almost everything else, or adapt to the craziness of the current code. Guess what’s the most common choice.

Almost in the present

Now let us suppose we have this function:

    void do_whatever(int x, int y, bool use_next, char *next);

I would write it so that if use_next is false, next can be NULL. You can ask: what’s the point of use_next? In fact you can pass NULL for next and do_whatever won’t use next, of course. You need to understand that this isn’t how the things are for real: there’s no use_next boolean parameter. Nonetheless, the name of the function and comments about what it does suggest that it doesn’t need next in some use cases (depending on other parameters).

Thus naïvely I passed NULL. And boom, segmentation fault. Stupid me: I had to imagine I can trust no contract, simply because there isn’t any.

The problem is that even if do_whatever shouldn’t use next, because it doesn’t need it when used that way, it indeed uses it at the very beginning of code, like this:

    *next = '\0';

And then it won’t touch it again.

Easy fix:

    if (next) *next = '\0';

I think this is better than modifying the client like this:

    char just_in_case[N];
    do_whatever(a, b, false, just_in_case);

This is a second strike: I trusted again something I shouldn’t have trusted; that is, common sense… There’s no common sense to be trusted in programming: either you found your choices on clearly stated binding contracts, or you don’t assume anything happy and reasonable. Especially when you work on ancient, cluttered code full of globals.

Never again…

And do not trust comments!

This isn’t a third strike: it’s indeed a bonus tale I find funny.

Dated comments that state a property which doesn’t hold anymore (but you don’t know when the property was removed), or a needed FIX/TODO or improvement, are the most funny.

One of those goes like this:

    // remove as soon as possible because
    // it's hardwired. XX 6/10/2009
    if (x == SPECIAL_CONSTANT) {
       // code with hardwired constants here and there
    }

When you read such a comment in 2020, you start wondering if the code it refers to is up to date. Also, what’s the comment suggesting? What should be removed? The entire condition? Unclear, and we are in 2020, and … well, what’s hardwired? Was there something hardwired I can’t see anymore in the condition, but there are other stuffs which could be described as “hardwired”.

Commits log can tell a story and make it clear, if you have time to dig. I think many of us just decided to leave things as they are: it was some XX’s responsability to update code and comment. But XX is not in the project since 2010 or so…

One could think that the whole if statement is something hardwired; hence, is the comment suggesting a new desiderable development where the SPECIAL_CONSTANT is taken from a configuration?

It’s unclear, and of course there isn’t a reference to requirements. Indeed, there is no requirements’ document dating back to 2009… That is, the code is the only “normative” source about itself…

If you have to rewrite it from scratch, you must be sure the new version behaves the very same way of the original, defects and bugs included.

Conclusion

When you work on such a code for too long, you start wishing it explodes1, so that finally they will understand that it needs to be rewritten from the ground up.


  1. It must be something that nobody understands how to fix. Programmers of the Thing are too good and patient: they always try and find a way to make the Thing stands up again, and step by step it goes, so that the Top doesn’t care. Also they aren’t scared by heisenbugs (which the Thing already exhibits). Arguments about how much more it costs to maintain such a Thing than if it were well-written, don’t work — also because to make it well-written you must invest now a lot of money: better to patch here and there as problems come. If only they would find programmers so desperate that they do it in their spare time…! But there’s a big testing-time need which scares everybody. It just can’t be done at once, and nobody on Top seems to reason in term of “let’s start today to have this Better Thing in 2 years or so, and then let it live altogether with the Old Thing until we see they are indeed the same”. It costs, it costs a lot, it is also a big risk (even if it can be mitigated), and there’s poor visibility of the Thing from the place where Top Heads, those who decide of the money flow, live. I could also say that The Thing looks a so small gear among huge gears, therefore everyone thinks it could be replaced at the last moment, if needs should arise. Indeed fews know what the Thing does, and even less clear are its responsabilities. The Thing is in the intersection of several different areas owned by more powerful player, and fews see the Thing as something of its own: rather it must be part of one of those bigger player. Part of the problem comes from a sort of political game: the Thing got responsabilities which shouldn’t be its; but it has “stolen” those in order to drive the flow of money and power towards a specific team, given that the core functionality of the Thing Of the Origin wasn’t something that needed to be evolved; if you own a code which works (mostly because it does few things well), and you are a team leader of a small team and everybody needs a wage and a justification for it, what do you do? You maintain the code, ok: you do bug fixes and alike, ok. You implement new features as required, ok — but it happens, say, three or four times per year… The Thing’s world is very small after all (when compared to those huge gears…) Enter the politics: you maneuver (because you can and you are good at it) so that new responsabilities are transferred into the Thing. Responsabilities which the code isn’t very much prepared to accomodate, but you and your team is good enough to plug them without rethinking the Thing. Good enough? Maybe it should have been “reckless enough”. And short-sighted — I’m sorry because I know many of them and I respect them (mostly)… but living on laurels (so to speak) of a much simpler system (the Thing of the Origin) made the Thing such a mess that every small addition can be painful, and it’s risky, full of unintended and unexpected consequences. Moreover, year after year, it’s become harder to straighten crooked lines, til the point that I don’t see how it could be done anymore — and in fact, I am adding my own shit, because that’s the only way you can cope with the Thing without doing a major rewrite — nobody would pay for, and it’s risky, so that doing it “underground” is out of question, even if it would be interesting. Yes, rewriting the Thing would be interesting a lot. The first step would be “reverse engineering” the requirements from code, and with this I’ve said already everything (I hope) to explain why it won’t happen.↩︎

No comments:

Post a Comment