2011-05-17

Languages can't be compared simplistically

Through the CodeProject mailing list I landed on the kind of article that is frustrating to read, since it is totally pointless and, in some respects, false. So this post will be too.

Reasonable note

Probably the only thing worth remembering is that every existing programming language has its own advocates. All of them have strong reasons and subtle arguments for saying that their language of choice is better than language X or language Y.

So the most important thing to stress is that they are all wrong, and at the same time they are all right. It just depends on the point of view, programming habits and purposes, knowledge bases, and other elements I forget.

Ok, stop with the democracy.

Everything is Number

One thing that must be remembered is that computers understand only one language: machine code. This means that, no matter which language you use, it must somehow "map" to machine instructions (assembly, in its human-readable form). Taken to the extreme, high-level languages are just syntactic sugar for the Only Language.

Of course, this fact does not mean that for a human it is the same thing to write code in assembly and, say, Ada. This only means that each language potentially can do everything "machine language" can, and no more.

However, the way humans can express an algorithm in a computer language matters, and this is a reason to create high-level languages. For an average human being, modelling a complex program in assembly can be considered practically impossible, even though, as said before, assembly can do everything.

Moreover, there are other advantages when we stick to high-level languages: readability (other programmers can tell what your code does), maintainability (you can add features and drive out bugs faster and more easily), portability (high-level languages can provide abstraction from the hardware), ...

But all these features, and more that I forget, come at a price: other programmers did the hard job of writing a lot of code (at the very beginning, in a very low-level way). And this happens again and again, at different "layers", e.g. when we use a new expressive syntax of an evolving language, or "new" classes or methods of the STL or Boost libraries, parts of which are being absorbed into standard C++.

One consequence of this reasoning is that, even though each language deserves its niche and outside of it another language can likely be considered more powerful, we can pick our language of choice and extend it as we want to reach the expressiveness of another language, using (ad hoc) pre-processors / pre-compilers and writing proper libraries of functions. (Don't forget that C++ was once a sort of pre-compiler that produced C code, and the GNU Sather compiler works this way too...)

Of course, why would you create handmade tools/libraries when libraries already exist that do what you want, or why should you not switch to another language? Switching sometimes can't be done easily (I am thinking about jobs and tons of already written code), but even when it can be done, it does not prove that the language you are leaving is inferior to the brand new one you're going to use.

But this is, briefly, the argument of the article: C++ is superior since something can be done easily in it (while in C it can't be done at all... and we have already said this can't be true).

Expressiveness matters

I agree that "expressiveness" matters, as said above... However, a language proves to be more or less expressive in a particular context, not per se. So again, expressiveness (in a particular context) does not prove that a language is superior to or more useful than another.

Taking this argument seriously and picking a specific problem to be solved, we can show that e.g. Matlab is superior to C++... and repeating the procedure we can show that language X is superior to Matlab, and Y to X and so on.

This escalation theoretically ends with the language Z, the most expressive and powerful language, superior to any other language. Then, why are we wasting our time with inferior languages like C, C++, Matlab...?

Focusing on the specific features that "show" C's badness or shortcomings, we can find another language that has those features plus a set of other features that C++ lacks; so, we have proved that C++ is vastly inferior.

Here comes the article

Now let me go a little mad over some sentences of the article.

I will not concentrate on the actual implementation of this matrix type functions, as that's rather trivial in both C and C++. What I will concentrate on, however, is the usage of such a type in each language.

Usage is even more trivial in those languages that support matrix manipulation directly: you don't need to implement anything, just use it! Anyway, is it really trivial to implement such a thing, in both C++ and C? I suspect it is not as easy as it seems.

And as for C, we don't need to reimplement the GSL (GNU Scientific Library) (still waiting for sparse matrices; maybe one day we'll see them. In the meantime, we can stick to Matlab or Octave... of course, someone has written that code, in C++ or even C...); in C we could wrap some operations to provide the wanted laziness feature; and of course we can wrap GSL matrix operations in a C++ class. We obtain syntactic sugar, basically; I would stick to GNU Octave or R, anyway.

The implementation, as you say, is also important... since it dictates how you can use the matrices and the kind of errors you can make if you use the API in the wrong way... Programming errors are possible in any language, not only in C.

The C implementation, however, is not as straightforward because C does not offer a direct paradigm for this. If the C version wishes to match the C++ version in speed and memory usage, it will have to basically simulate the C++ implementation using a struct and functions which handle objects of that struct type (including things like initialization and destruction)

C++ does not offer a direct paradigm for matrix computation either. Again, Matlab/GNU Octave win over C++. Are you talking about operator overloading? Yes, it is an interesting thing that makes the code easier to read... but how do performance (speed) and memory usage enter the discussion? They are totally unrelated, if we must ignore the implementation; the C++ implementation can't be magically faster and less memory-hungry just because it is C++, or because it allows operator overloading... If anything, it is the opposite. The class/object machinery consumes (a bit of) CPU time and a little more memory; if we wrap a smart C implementation, properly translated, into a C++ class, the C++ version will still eat more CPU time and memory!

If you don't like the operator overloading of the Matrix class for whatever strange reason, you could always write the equivalent named functions instead. It's not really the main point here.

Indeed, I thought it was one of the most important points! So you are saying that instead of
Matrix c = 2*a + b;
we can write
Matrix c = a.mult(2).plus(b);
Another example could be
Matrix c = a.plus(b).mult(2);
How do you translate this? Isn't this a possible source of error?! Do you think it is not so because C++ is superior? And what about C++ programmers? Wasn't it really an important point??

Your C equivalent is not complicated; it is simply "strongly procedural" (so to say); nothing complicated. Anyway I would have done something like
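int foo(void)
{
    /* matrix and matrix_ctx are opaque pointer types */
    matrix a = NULL, b = NULL, c = NULL;
    matrix_ctx ctx = NULL;

    /* pass the address so that the library can set up ctx */
    matrix_start(&ctx);

    a = matrix_new(ctx, 1000, 1000);
    b = matrix_new(ctx, 1000, 1000);

    /* c = 2*a + b */
    c = matrix_assign(ctx, c, a);
    c = matrix_mult(ctx, c, matrix_unit(ctx, 2));

    c = matrix_sum(ctx, c, b);

    /* releases every matrix tracked by ctx */
    matrix_end(ctx);

    return some_value;
}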

The context is there with multithreading in mind; if that is not a concern, a hidden static context can be used inside the matrix library, avoiding the extra argument. Here there's no need for a cleanup label: all functions must be safe when null pointers are given. Of course, if you need to, you can free a single matrix with matrix_free(ctx, m).
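To make the idea of a tracking context less abstract, here is a minimal sketch of how it could look. It is only my guess at one possible implementation: the struct layouts, the singly linked tracking list and the exact signatures (matrix_start taking the address of the context, matrix_end returning the number of freed matrices) are assumptions made for the illustration, not an existing library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types: none of these names come from a real library. */
typedef struct matrix_s {
    size_t rows, cols;
    double *data;
    struct matrix_s *next;   /* link in the context's tracking list */
} *matrix;

typedef struct matrix_ctx_s {
    matrix head;             /* every matrix allocated through this context */
} *matrix_ctx;

/* Create an empty context in *out; returns 0 on success, -1 on failure. */
int matrix_start(matrix_ctx *out)
{
    assert(out != NULL);
    *out = calloc(1, sizeof **out);
    return *out != NULL ? 0 : -1;
}

/* Allocate a zero-filled rows x cols matrix and register it in ctx. */
matrix matrix_new(matrix_ctx ctx, size_t rows, size_t cols)
{
    if (ctx == NULL) return NULL;          /* safe when given a null pointer */
    matrix m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->data = calloc(rows * cols, sizeof *m->data);
    if (m->data == NULL) { free(m); return NULL; }
    m->rows = rows;
    m->cols = cols;
    m->next = ctx->head;                   /* track it for matrix_end */
    ctx->head = m;
    return m;
}

/* Free every tracked matrix, then the context; returns how many were freed. */
size_t matrix_end(matrix_ctx ctx)
{
    if (ctx == NULL) return 0;
    size_t freed = 0;
    while (ctx->head != NULL) {
        matrix next = ctx->head->next;
        free(ctx->head->data);
        free(ctx->head);
        ctx->head = next;
        freed++;
    }
    free(ctx);
    return freed;
}
```

With something like this, the caller never frees a matrix by hand: it calls matrix_start(&ctx) once, allocates as many matrices as it likes with matrix_new, and a single matrix_end(ctx) releases everything, including matrices it lost track of.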

coding conventions need to be obeyed to avoid memory leaks, something which was not necessary in the C++ version

Are memory leaks impossible in C++? What happens if your implementation uses new and you never call delete? If C++ had a pool for newly allocated memory that is released at the end of the program, ... then implement a similar pool for the C matrix "object": you never allocate the object yourself, so how it is done is hidden (and tracked) inside the implementation; you only have to remember to call matrix_end(ctx) somewhere, maybe in an atexit function. (There could be a super-context, accessed properly through a mutex, so that a multithreaded application only has to call something like matrix_terminate() to destroy all contexts.)

The C version is also easy to break accidentally. How easy it would be to simply write something like:
Matrix c = a


... Real programmers do not accidentally write such a thing once the API clearly mandates the use of a function like matrix_assign or whatever... In my implementation, with that code c would simply be an "alias" for a that the reference counter does not track. And anyway, even your innermost "simple" implementation of the Matrix class could be accidentally broken by some easy-to-make mistake...

The assignment works in C++, once you've correctly implemented the needed method to do it. And if you accidentally are able to write Matrix c = a, ...

Error handling

Proper error handling is always a problem in complex code; sure, languages that allow try-catch blocks or similar (and raising/throwing exceptions in general) are a great help... but proper error handling existed before these "tricks" became widespread.

In my "imagined" implementation, foo() does not need any error check except the one needed to know whether the result of the computation is valid. If the result of the computation is, as in foo(), just an integer, and all integer values are possible, other methods must of course be used... We could use the super-context, which can hold (per thread) the result of the last operation; since a failure in one operation causes all subsequent operations to fail, we don't need to add any code to handle anything, except at the exit of foo(), where we check the "super-context per-thread error code" just to know whether the integer holds a correct value or not.

It must also be stressed that try-catch presupposes correct point-by-point error checking inside the class in order to throw the exception (i.e. to catch all errors you still need if-then-else or whatever); that throwing an exception is, after all, jumping; and that having C mimic the C++ way of doing error handling is a bad idea: to do correct and easy error handling in C, you just have to think differently.
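The "failure propagates, check once at the end" idea can be sketched in a few lines. To keep it short I use a tiny fixed-size vector instead of a matrix and a plain local context instead of a per-thread super-context; calc_ctx, vec, vec_add, vec_scale and two_a_plus_b are all hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes and context; not part of any real library. */
enum calc_error { CALC_OK = 0, CALC_DIM_MISMATCH, CALC_NULL_ARG };

typedef struct {
    enum calc_error err;   /* first error seen; sticky from then on */
} calc_ctx;

typedef struct {
    size_t n;              /* number of used entries, at most 4 */
    double v[4];
} vec;

/* dst += src; becomes a no-op once the context holds an error. */
void vec_add(calc_ctx *ctx, vec *dst, const vec *src)
{
    assert(ctx != NULL);
    if (ctx->err != CALC_OK) return;                 /* earlier failure: skip */
    if (dst == NULL || src == NULL) { ctx->err = CALC_NULL_ARG; return; }
    if (dst->n != src->n) { ctx->err = CALC_DIM_MISMATCH; return; }
    for (size_t i = 0; i < dst->n; i++) dst->v[i] += src->v[i];
}

/* dst *= k, with the same sticky-error behaviour. */
void vec_scale(calc_ctx *ctx, vec *dst, double k)
{
    assert(ctx != NULL);
    if (ctx->err != CALC_OK) return;
    if (dst == NULL) { ctx->err = CALC_NULL_ARG; return; }
    for (size_t i = 0; i < dst->n; i++) dst->v[i] *= k;
}

/* Computes 2*a + b into out; no check between the steps, one at the end. */
enum calc_error two_a_plus_b(const vec *a, const vec *b, vec *out)
{
    calc_ctx ctx = { CALC_OK };
    if (a == NULL || out == NULL) return CALC_NULL_ARG;
    *out = *a;
    vec_scale(&ctx, out, 2.0);
    vec_add(&ctx, out, b);
    return ctx.err;        /* tells the caller whether out is valid */
}
```

The body of two_a_plus_b reads as a straight sequence of operations, much like the exception-based version would; the difference is that the caller inspects one return value instead of catching something.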

Initialization and destruction

Derived types embedding the C matrix "object" need a whole new set of functions to handle them. Then they can be used without worrying about what's inside... C++ classes hide the work in constructors and destructors; a class using Matrix objects allocated with new needs to deallocate them in its destructor. The Matrix class hides the initialization and destruction details, and so do the matrix_* functions; my implementation vastly simplifies the usage, so that the user need not worry too much about allocations.

Each time I need to make an "instance" of SomeCompoundStruct, I just need to call the proper function. It will do all the work, exactly as the code produced by the C++ compiler frees the memory used by an object when it goes out of scope (or the program ends). This is just because we are avoiding "new".
typedef struct
{
    Matrix* mainMatrix;
    Matrix* secondaryMatrix;
    int someValue;
    int anotherValue;
} SomeCompoundStruct;
Would this work without additional code? And in the other case, would this prove C is inferior??!

Anyway, C indeed does not have "compound types" that can be handled like primitive types, since it lacks the ability to embed "behaviours" into a new type. You discovered that C is not an OO language (though such features could be provided even by non-OO languages).

The final question for this section is: do we need such a SomeCompoundStruct?

Data containers

void bar(int amount)
{
    std::vector<Matrix> matrices(amount, Matrix(1000, 1000));
    //...
}

The above line creates an array which contains 1000x1000-sized unit matrices. Since, as specified, Matrix uses copy-on-write, each matrix in the array shares the same matrix data, so there's only one such data block allocated after the array has been created. This can be an enormous memory saver if not all of those matrices are modified.

Small note: the fact that Matrix(1000, 1000) gives a unit matrix is just an assumption about an imagined implementation; nowhere is it made clear that Matrix(1000, 1000) means anything other than "give me a void 1000x1000 matrix"... I would rather implement a sort of factory to provide usefully initialized matrices, and would keep Matrix(1000, 1000) for the void matrix (void here means made of all 0s or null objects).

The code translated into C is
void bar(int amount)
{
    vector matrices;
    matrix a;
    matrix_ctx ctx = NULL;

    /* pass the address so that the library can set up ctx */
    matrix_start(&ctx);
    a = matrix_new(ctx, 1000, 1000);
    a = matrix_assign(ctx, a, matrix_unit(ctx, 1));

    /* amount entries, each initialized from a */
    matrices = vector_new(amount, a, NULL);

    // ...
    vector_free(matrices, NULL);
    matrix_end(ctx);
}
When I need basic containers and useful functions, I use GLib; here I imagined a possible implementation of a "vector object" that mimics (from a prototype point of view) the way an STL vector can be initialized (and how it is used in this example).

I don't need to go into the details of the implementation of those functions, exactly as you did not need to go into the details of the implementation of the STL vector class.
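Since the copy-on-write sharing is what saves memory in this example, it may help to see that it needs nothing more exotic than a reference counter. The following is a rough sketch under my own assumptions: buf, handle and the handle_* functions are invented names, the buffer holds plain doubles rather than a full matrix, and the counter is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer with a reference count. */
typedef struct {
    size_t refs;
    size_t n;
    double *data;
} buf;

/* A handle is what user code holds; several handles may share one buf. */
typedef struct { buf *b; } handle;

/* New zero-filled buffer of n doubles; h.b is NULL on failure. */
handle handle_new(size_t n)
{
    handle h = { NULL };
    buf *b = malloc(sizeof *b);
    if (b == NULL) return h;
    b->data = calloc(n, sizeof *b->data);
    if (b->data == NULL) { free(b); return h; }
    b->refs = 1;
    b->n = n;
    h.b = b;
    return h;
}

/* Cheap copy: share the buffer and bump the count. */
handle handle_share(handle src)
{
    assert(src.b == NULL || src.b->refs > 0);
    if (src.b != NULL) src.b->refs++;
    return src;
}

/* Write access: clone the buffer first if anyone else still shares it. */
int handle_set(handle *h, size_t i, double value)
{
    if (h == NULL || h->b == NULL || i >= h->b->n) return -1;
    if (h->b->refs > 1) {
        handle own = handle_new(h->b->n);
        if (own.b == NULL) return -1;
        memcpy(own.b->data, h->b->data, h->b->n * sizeof *own.b->data);
        h->b->refs--;          /* detach from the shared buffer */
        *h = own;
    }
    h->b->data[i] = value;
    return 0;
}

/* Drop one reference; the buffer is freed when the last one goes. */
void handle_release(handle *h)
{
    if (h == NULL || h->b == NULL) return;
    if (--h->b->refs == 0) { free(h->b->data); free(h->b); }
    h->b = NULL;
}
```

A vector filled through handle_share would hold many handles to a single data block, and only the entries that are actually written to through handle_set would get a private copy.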

Basically one line of C++ code requires a dozen of lines of C code.

This is just because you are using libraries that already do what you need, while imagining that you have to implement what you need in C from scratch; of course C code, lacking some syntactic sugar and runtime help, will be "more verbose", but not necessarily that much. Are you saying it depends on where you "move" the complexity, i.e. "where" you delegate the actual "computation"? Oh, well, this is almost the whole point of top-down/bottom-up approaches, maybe.

The reason C doesn't offer such an utility is because it can't, and that's one of the major problems with the language

This is a totally wrong perception of what a language is. As said far above, C can. How it can is, at most, the difference; and at what cost, if it were you who had to write the code from scratch. Not so for std::vector: someone else wrote the class for you and made it a standard class for C++ (the existence of the Boost library shows that even C++ with the STL lacks utilities programmers would like to have). Here, however, you are talking about standard libraries, confusing the richness of one and the poverty of the other with the abilities and features of the language.

Implement the code needed to run my snippets, add more glue code, invent syntactic sugar (handled by a pre-processor/pre-compiler), ... and you'll have another C++... This does not prove C++ is superior; it just proves that C and C++ are equivalent.

Linked list

Linked lists were implemented in C eons before C++, templates, ... were in a human mind... The same, on the other hand, is true for assembly... Generalization (i.e. a set of functions to handle lists of any kind of object) is possible.

That's because C cannot offer any rational generic linked list implementation which would work with any user-defined type

"Opaque pointers" are what could be used here. Proper casting and well-thought-out functions that use user-defined functions to handle the creation and destruction of the objects in the list can be used to build generic list-handling functions that can "link" any kind of "object" (represented as a pointer/reference to the actual object).
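A minimal sketch of such a generic list, assuming the list owns its items and the user hands over a destructor when creating it, could look like the following. The names (list, list_new, list_push, list_each, list_free, sum_ints) are mine and purely illustrative; GLib offers a far more complete version of the same idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical generic list: items are opaque pointers owned by the list. */
typedef void (*item_free_fn)(void *item);

typedef struct node {
    void *item;
    struct node *next;
} node;

typedef struct {
    node *head;
    size_t length;
    item_free_fn free_item;   /* user-supplied destructor, may be NULL */
} list;

/* Create an empty list that releases its items through free_item. */
list *list_new(item_free_fn free_item)
{
    list *l = calloc(1, sizeof *l);
    if (l != NULL) l->free_item = free_item;
    return l;
}

/* Push item at the front; returns 0 on success, -1 on failure. */
int list_push(list *l, void *item)
{
    if (l == NULL) return -1;
    node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->item = item;
    n->next = l->head;
    l->head = n;
    l->length++;
    return 0;
}

/* Call fn(item, user) on every item, front to back. */
void list_each(const list *l, void (*fn)(void *item, void *user), void *user)
{
    assert(fn != NULL);
    for (const node *n = l ? l->head : NULL; n != NULL; n = n->next)
        fn(n->item, user);
}

/* Destroy the list, releasing each item through the user's destructor. */
void list_free(list *l)
{
    if (l == NULL) return;
    while (l->head != NULL) {
        node *next = l->head->next;
        if (l->free_item != NULL) l->free_item(l->head->item);
        free(l->head);
        l->head = next;
    }
    free(l);
}

/* Example visitor: adds the int behind each item to the total in user. */
void sum_ints(void *item, void *user)
{
    *(int *)user += *(int *)item;
}
```

The same four functions work for a list of ints allocated with malloc (created with list_new(free)) and for a list of matrices created with a matrix destructor; the price is the cast inside each visitor, which the compiler cannot check for us.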

The amount of code required for that is quite significant, and the code will be complicated, error-prone and hard to follow

It would not be so big; anyway, once written, you hide it in a library and just use the API... and it won't be more error-prone than the implementation of an STL class. And you don't need to follow it, just use it... exactly as you do with the STL classes.

Nested containers

Once you have the right code for C vectors (of opaque objects) and C lists (of opaque objects), and given a set of functions to handle all these objects, the rest is straightforward. Longer than in C++ (not counting the code of the STL classes, of course, nor the code to handle vectors and lists in C, which we built into a library), but not as complicated and inefficient as you believe.

Moreover, the above code is completely safe and efficient.

The efficiency of the code depends on the implementation. Even if I can totally trust (and why should I?) the implementation of the STL classes, should I magically agree that the Matrix class is inherently efficient?

Yet the executable binary produced by the compiler will be as efficient as it can get.


With respect to...?

This is the beauty of C++. When modules are properly designed, it makes it extremely easy to take a module and reuse it with something user-defined

This is the beauty of many languages. In particular, as you recalled at the beginning, C++ is not much appreciated by some with respect to its OO features. And I largely agree, but this is another topic.

The point here is that the sentence is good for C too. If modules (libraries) are properly designed, they can be reused with something user-defined easily.

The functionality in the example above can be reproduced in C, but it will be extremely complicated and hard to follow, and very error-prone

Like writing a compiler for a language like C++, or implementing the STL classes... I bet their code would be hard to follow... and very error-prone; but we know how complex code evolves: functions/modularity hide complexity, and bugs get fixed, don't they?

Copying containers

This single line of code translates to hundreds of lines of complicated and unsafe C code, which requires minute attention to detail and strict following of coding conventions

Again, you are forgetting that there's actual code behind the scenes, and that it was written by someone. So, I will imagine there's a set of libraries/modules that provides C containers and all the needed stuff, so that I can simply write
array2 = vector_copy(array2, array);
// ...
vector_destroy_all();
matrix_destroy_all();
// ... and so on...
This basically means: the API can be clean and simple. There's a lot of code to be written (if it has not been written already), OK, as there was for the STL... But this does not prove the language is inferior to this or that (in the worst case it means it lacks a powerful library for containers and the like). Moreover, as said at the beginning, each language has its usage domain. If you need such a "complex" structure and you can't find a good library out there that does what you need, then you can switch to another language. I claim that you rarely need such a complex "hierarchy", and that 70% of common "problems" can be solved with C, thinking the C way (not trying to force C to imitate OO languages like C++, as you've done).

Finally

if not even thousands of lines of C code, but an experienced C programmer is used to that.

Experience and bloat? Experienced programmers working with bare bricks (no special libraries in sight) know that first they have to write their "tools" to approach the problem easily. Once done, these tools are set aside in a library, and we just need to use them, so that the code (the actual code doing the actual thing) is kept short (likely still longer than in C++, but not that much longer).

About templates: it is true, C does not have templates, and templates are powerful, really. Anyway, again, with a smart design and a bit of preprocessor magic, the wanted final result can be reached.

We must not forget the question, anyway: does it always make sense, or have we just picked a problem (from a subset of the set of all problems) particularly suited to C++, just to show that with C it would be harder? (Again, I have to stress that even if it is true that in C it would be so much harder, this still does not prove C++ is superior.)
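As a small taste of that preprocessor magic, here is one common trick, sketched under my own naming: a macro that stamps out a typed growable array for whatever element type you give it. DEFINE_VEC, int_vec and double_vec are hypothetical names, and this is a long way from what templates can do, but it gives type-checked containers without writing the code once per type.

```c
#include <assert.h>
#include <stdlib.h>

/* DEFINE_VEC(T, NAME) generates a growable array type NAME holding T,
   plus NAME_push and NAME_free. All names here are hypothetical. */
#define DEFINE_VEC(T, NAME)                                            \
    typedef struct { T *data; size_t len, cap; } NAME;                 \
                                                                       \
    int NAME##_push(NAME *v, T value)                                  \
    {                                                                  \
        assert(v != NULL);                                             \
        if (v->len == v->cap) {                                        \
            size_t cap = v->cap ? v->cap * 2 : 4;                      \
            T *p = realloc(v->data, cap * sizeof *p);                  \
            if (p == NULL) return -1;                                  \
            v->data = p;                                               \
            v->cap = cap;                                              \
        }                                                              \
        v->data[v->len++] = value;                                     \
        return 0;                                                      \
    }                                                                  \
                                                                       \
    void NAME##_free(NAME *v)                                          \
    {                                                                  \
        assert(v != NULL);                                             \
        free(v->data);                                                 \
        v->data = NULL;                                                \
        v->len = v->cap = 0;                                           \
    }

/* Two "instantiations": one container of int, one of double. */
DEFINE_VEC(int, int_vec)
DEFINE_VEC(double, double_vec)
```

A vector starts out as `int_vec v = { NULL, 0, 0 };` and grows through int_vec_push(&v, 42); passing a double_vec to int_vec_push is rejected by the compiler, which is more than the opaque-pointer approach can promise.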

And let's focus on the chosen problem: it is easy to use the very same problem to show that another language (e.g. let's consider a functional language, just to be "original") is superior to C++.

Many of the arguments you used can also "prove" that Fortran (even without the object-oriented features of the 2003 standard) is currently superior to C... and, because of the chosen example, since you don't need to implement anything to handle matrices in Fortran, we can argue that Fortran is superior to C++; instead of Fortran, we could talk about languages like Matlab, GNU Octave, or maybe more evidently Python or similar... all of these are superior to C++ for almost the same reasons why C++ is superior to C.

Even if you try to struggle by making the matrix type and all of its functions as preprocessor macros (something which would make it some of the most horrible pieces of code ever created), it would still fall short because it wouldn't work with the GMP, MPFR and other similar libraries which do not act as primitive types.

I believe there's an elegant and smart way of doing it, so the code would not be one of the most horrible pieces of code ever created. Even C++ won't work by magic with non-primitive types: you have to overload operators and use functions and pointers the C way; if there's a class that wraps everything, it just means that you've delegated the (small) "complexity" to someone else. (BTW, this made me think about the C++ ABI and name mangling, which raise several problems...)

And at last, the last sentence...

C++ is inferior to C (or, if you prefer, C is superior to C++) since it is much easier to write bad code in C++ than in C. This is a simple truth.