ShinTakezou's Blog: 2013-04

Occasionally (sometimes often, sometimes rarely) I do my best to answer questions on Stack Overflow.

Recently I answered this question, but at first I misread the OP's request: my first proposed solution told him to use grep to eliminate the lines ending with an asterisk. But the user wanted the asterisk to be ignored, not the line to be deleted. So I fixed my answer to use sed to delete the asterisk and let diff do its work.
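The idea (drop the final asterisk from each line, then compare what is left) can be sketched in Python as well. This is not the command line from the answer: the function names are invented for the illustration, and the standard difflib module stands in for diff.

```python
import difflib


def strip_trailing_asterisk(line: str) -> str:
    # Same effect as the sed expression s/\*$// : remove one final asterisk.
    return line[:-1] if line.endswith("*") else line


def diff_ignoring_asterisk(old_text: str, new_text: str) -> list[str]:
    """Unified diff of two texts, after removing a trailing '*' from each line."""
    old = [strip_trailing_asterisk(line) for line in old_text.splitlines()]
    new = [strip_trailing_asterisk(line) for line in new_text.splitlines()]
    # lineterm="" because splitlines() already removed the newlines.
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


if __name__ == "__main__":
    # The second lines really differ; the asterisks alone would not count.
    for row in diff_ignoring_asterisk("alpha*\nbeta\n", "alpha\ngamma*\n"):
        print(row)
```

Two texts that differ only in their trailing asterisks give an empty diff, which is what the user was asking for.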

The edit has timestamp 2013-04-23 13:28:19Z.

Soon after I posted it, I saw another answer that proposed the same "double pipe" solution I gave, but with the correct sed. Its timestamp was 2013-04-23 13:28:50Z, i.e. it came after the fix to my answer; at the same time it cites my previous wrong answer (the one with grep instead of sed), so it was built on the shoulders of that one.

That was interesting and made me think about a reason to be strongly against some uses of patents (and maybe the whole patent industry), and about how they can indeed stop independent innovation and ideas. Patent advocates state exactly the opposite, but of course they are defending their interests (and/or protégés): it is a matter of money and not of innovation, progress, expanding human knowledge and so on.

The point is this: we proposed the right sed solution independently. (He credited me just for the structure of the command line, i.e. the way the sed command is invoked in order to obtain the diff.) In other words, the functional part of the answer (the sed trick) was produced by each of us independently.

This shows that facing the same problem with similar knowledge, similar shared information and a similar set of tools can lead two minds to propose solutions that are similar if not identical.

That's why patents meant to "protect" intellectual property are a very dangerous lie and stop innovation: they hand the key to it to whoever "came first", in order to make money, and prohibit other minds from arriving at similar solutions independently.

The simpler the "object" of the patent, the higher the probability that two or more people produce the same solution. So allowing patents can prevent others from using shared, common, widely available human knowledge to reach a result. It can lock the consumers of a solution to one provider (for a long time) and so promote inequality, cutting the less rich out of the game. And it makes price control possible, maximizing earnings regardless of the intrinsic value of the "object" (when applicable), to the detriment of consumers.

I will never tire of repeating it: word-processors are among the most overrated and misused pieces of software on the computers of too many common users.

It's not a battle against the infamous Microsoft Word, even if, from the freedom point of view, office suites like OpenOffice or LibreOffice are a better choice. The argument holds for that software as well: the problem is with that kind of software, with what is currently called word-processing.

In short, the main point is that in specific contexts textual, human-readable formats are better than any binary format. It should not be hard to agree when the content is itself textual: an article, our next best-selling book, our thesis about political science, our screenplay, notes of several kinds (a shopping list, names and numbers you need to remember), a blog entry and so on. They are all primarily made of text.

Unfortunately people are accustomed to WYSIWYG, which is often evil unless you are an artist doing some kind of visual elaboration of the text that is part of the message you want to convey, or you have to arrange the text together with a lot of graphics. Users should emancipate themselves from this visual approach and learn to focus on the content, when the content is what they have to deal with.

Word-processors make users believe they are in control of how things will be presented and that it is their responsibility. But often it's not their duty. Often it's someone else's duty, and the users must focus on the content and on the role of each segment of this content: they have to mark a piece of text e.g. as a chapter title, but forget about how that chapter title will be rendered.

Often the way things must be presented is codified; there are rules describing page size, font styles, font sizes, spacing and so on. Once you get that you really need to deal only with the meaning or role of the things you write, you are ready to throw away your preferred word-processor (or use it in a totally different way; modern word-processors make that possible, but they don't work as well as other tools).

So you should prefer human-readable formats: the file containing the text you wrote does not need a specific piece of software. A text editor is enough. You can use a language that lets you describe the actual content by marking it in some way. And that's the point of a markup language.

When I talk about human-readable formats and markup languages I am not thinking about the several XML (eXtensible Markup Language) formats produced and consumed by machines: in fact even the evil new Microsoft Office formats, together with the OpenDocument standard for documents (the standard you should use when you won't follow the suggestions of this article), are XML-based formats. So you could read the contents, but you are unlikely to benefit from this possibility, and it would be even harder, if not impossible, to write that content by hand. Moreover, everything is packaged into a zip archive, so those formats appear as binary.

Some reasons why purely textual, simple formats are better than binary formats or complex textual formats meant for machines:

contents can be understood with little or no knowledge of the format (this does not mean it's easy, or that they are as usable as they would be when the format is known and interpreted correctly);
a script to parse the data can be easily written by a programming-enabled mind. Although the finest art is not for everyone, simple grepping or data extraction should be, almost;
the only program needed to view or edit the file is a text editor: an application that must be on every computer with any operating system installed. The data are independent of the operating system and of the software (not 100% true, but it is in the common modern computing world);
merging and splitting can be done easily; if there's some sort of structure, only a minimal knowledge of it is required to let the split files or the single merged file keep an independent meaning;
diff-ing and comparing can be done with standard common tools;
versioning tools can track changes more easily (without relying on the specific tools of the specific software that can handle the binary format).
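To give an idea of how little is needed for the "simple grepping or data extraction" mentioned above, here is a minimal Python sketch. The notes, the "Name: number" convention and the function name are all made up for the example; real notes would need a pattern adapted to them.

```python
import re

# Hypothetical plain-text notes: free-form lines, some following "Name: number".
NOTES = """\
Shopping: bread, milk
Alice: 555-0101
Bob: 555-0199
Remember to call the plumber
"""

# One name, a colon, then a number written as three digits, a dash, four digits.
PHONE_LINE = re.compile(r"^(?P<name>[^:]+):\s*(?P<number>\d{3}-\d{4})\s*$")


def extract_numbers(text: str) -> dict[str, str]:
    """Return a name -> number mapping for the lines that look like phone entries."""
    found = {}
    for line in text.splitlines():
        match = PHONE_LINE.match(line)
        if match:
            found[match.group("name").strip()] = match.group("number")
    return found


if __name__ == "__main__":
    for name, number in extract_numbers(NOTES).items():
        print(name, number)
```

Lines that do not follow the convention (the shopping list, the reminder) are simply skipped, so the file stays free-form for the human who writes it.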

Simple markup languages are particularly well suited to benefit from these features. This article was written using a subset of HTML, the markup language of the World Wide Web, and stored in a directory managed by Mercurial, a distributed source control management tool, which makes it possible to track the changes and the history and (pushing to and pulling from a remote server) even to collaborate with myself, without using special software to handle this particular format.

The format of course was chosen according to a specific need: in this specific example the format is suitable for direct web publishing. Other requirements would have made me choose other formats. E.g. since I am an Emacs user, the format I choose for notes and other casual writings is often org-mode.

By interpreting the markup language (which could be easy and at hand for a lot of computer geeks), it is possible to transform it (into another markup language or into anything else) and process it in several mechanical ways.

Another classic example is when you think about stuff you want to see printed, for example a book or an article on paper; then one of the most suitable formats is LaTeX.
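As an illustration of such a mechanical transformation, the following Python sketch turns a tiny org-mode-like subset into HTML. It is a toy under explicit assumptions: only headings written as stars followed by a space are recognized, every other non-empty line becomes its own paragraph, and the function name is invented. Real org-mode has far more syntax than this.

```python
import html


def org_to_html(text: str) -> str:
    """Convert a tiny org-like subset (star headings, plain lines) to HTML."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no content in this subset
        # Count the leading stars; "* Title" is level 1, "** Title" level 2...
        stars = len(stripped) - len(stripped.lstrip("*"))
        if 0 < stars <= 6 and stripped[stars:stars + 1] == " ":
            title = html.escape(stripped[stars + 1:].strip())
            out.append(f"<h{stars}>{title}</h{stars}>")
        else:
            # Assumption of this sketch: one line, one paragraph.
            out.append(f"<p>{html.escape(stripped)}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(org_to_html("* Title\nSome text & more\n** Sub"))
```

The source only says what each piece is (a heading, a paragraph); how a heading looks is decided later, by whatever consumes the HTML.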

Even if you need a specific application (a set of applications and data, indeed) in order to produce the final document ready to be printed (e.g. a PDF), the format is textual and structured, it was designed to be written by hand (with just a text editor), and it focuses on content, not on presentation. From a single source you can produce e.g. PostScript, PDF, HTML documents or (virtually) any kind of document and format. And yet you can read it with a text editor. And you can even add meta-information in the guise of comments.
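The meta-information hidden in comments can be read back by a few lines of script. The sketch below assumes a purely hypothetical convention, "% key: value" on a line of its own; LaTeX itself gives no meaning to such comments, and the names used here are invented.

```python
import re

# Hypothetical convention: a comment line of the form "% key: value".
META_LINE = re.compile(r"^%\s*(?P<key>[A-Za-z][\w-]*)\s*:\s*(?P<value>.+?)\s*$")

SAMPLE = r"""% status: draft
% reviewer: nobody yet
\documentclass{article}
\begin{document}
Hello. % an ordinary comment, ignored because it does not start the line
\end{document}
"""


def read_comment_metadata(latex_source: str) -> dict[str, str]:
    """Collect key/value pairs from whole-line comments in a LaTeX source."""
    meta = {}
    for line in latex_source.splitlines():
        match = META_LINE.match(line)
        if match:
            meta[match.group("key")] = match.group("value")
    return meta


if __name__ == "__main__":
    print(read_comment_metadata(SAMPLE))
```

LaTeX ignores these lines when it builds the document, so the same file serves both the typesetter and the script.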

For sure everything is clearer if you use a modern powerful text editor able to highlight the syntax of such languages. But it is not something you can't live without.

Once you learn to separate the meaning from the way it is presented, you see the value of using anything but a word-processor for the vast majority of the things you can imagine (when the value is in the content and not in the way it is shown).

To say it another way: the WYSIWYG approach is largely overrated. You must learn to separate the actual content from the way you want it to be seen on a screen, on paper or on other media. Learn that a lot of these ways are codified (or should be), and usually you are not the one who codified them. So you must focus on content, since the description of how to show it sits elsewhere; maybe it's even someone else's duty, so you need not worry about it.

Finally: before firing up a word-processor, consider other approaches that can make your life easier, even if it doesn't seem so at first (in this article I have left out several anecdotes about word-processor users driven crazy trying to obtain what they wanted from their software, losing more time on these efforts than on producing their content). In general, if you force yourself to drop all word-processors, you will discover how rarely they are useful and how beneficial other workflows can be.

ShinTakezou's Blog

2013-04-24

Just another thought about how patent can be silly

2013-04-19

Against the misuse of word-processors
