2010-12-12

Acer again

At last I've bought it: a new Acer Extensa 5235... Once I said I won't buy an Acer again because it was unsupportive with GNU/Linux and too much sort of "Microsoft is the best, buy it with my computers", but it happened this one came without Microsoft Windows... Maybe (surely) not the best hardware I could get, but it is a lot more than I've got before, and cheaper than I planned at the beginning.

It came with "Linpus", that I wiped out after discovering a part of it (if not the whole system) was compiled in 2008, and that it was badly installed (in fact the computer started only selecting "start with the last working config" from grub menu), and that it wasted a lot of hd giving not too much... thus I've rejected the dual boot idea and reinstalled Ubuntu 10.10, and I am still "tuning" it (again:().

Having this powerful hardware, and a lot of room on the hd (going from 30 Gbyte shared among MS Windows and GNU/Linux to 250 Gbyte made me ask myself what I can do with all that free space...), I've installed many things I had not installed before, e.g. and notably FlightGear.

While installing software and software, I've realized I have badly partitioned the hd, and soon I'll have to change something; anyway this should not be an issue I hope.

Leaving some notes that could be useful:


  • To be able to control brightness from Ubuntu, I had to add acpi_osi="Linux" into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (you have to escape the " and write indeed \"), and so in the same time I become aware of the fact that grub is changed: no more menu.lst, it's built on the run now...

  • If you install from "third parties" or, even worse, from files that are not .deb (like the Adobe Acrobat Reader which give an executable) and you are not able to edit your menus anymore, you could find interesting to know that the program for menu editing is called alacarte (from french à la carte, I suppose) and if you run it from the command line you'll be informed of what went bad; in my case, it happened that some directories and files in my ~/.config/menus folder had root as owner, instead of me... of course, a consequence of a badly written installer called with sudo; solved with sudo chown -R ME:ME .config/menus; check for symlinks before.

  • Webcam and microphone works without any effort, and wireless too...

  • ZynAddSubFx had problems and I've uninstalled it; though it is an interesting piece of software, I am now "devoted" to Csound and other tools, so I don't need it; however I think the knot that made it not work could be back when running other apps. I hope I am wrong.

  • Emacs started with a too high window (frame), I had to apply a fix in the .emacs; currently I've used a solution found in a blog, but I do not like it since it change the size after it was opened, so the user can see the change; however the solution tries to get the best size at "runtime"

  • Though this notebook was not sold with MS Windows, the keyboard has its logo on it... as usual, I've mapped it to Meta (mainly for Emacs usage); and of course the Menu key is bound to Compose

  • Euro and Dollar key near the "direction keys" do not work; xev does not "listen" to them; I will find a solution one day; but I find those keys mostly unuseful, it would be more interesting to map them to something more interesting than €$, even if their position suggests the binding can't be "critic"



I haven't copied data from the old good hd into this one yet, so I feel still al little bit "naked". Since the old laptop is out of order, I'll need to put its hd in a IDE-usb "bay" or something like that. I'll do.

2010-12-08

Traffic stats

I've just seen that there are stats about blog traffic! And I've just discovered a lot of traffic came from StackOverflow, a place I've almost discontinued to frequent — good place for practical solutions to practical problems, but absolutely deaf and blind for reasoning, "theoretical" analysis of hypothesis, exploring knowledge and so on (like any other Q&A site, no matter how much technical they call themselves)...

Anyway since it can produce traffic to this very "useful" blog, the piece of me allured by net popularity (!) is suggesting me not to forget the place... Now that I am thinking about it, I've left alone a war-speech about the output of a (intentionally) wrong piece of C code and distinction between "undefined" and "unspecified" behaviour in the standard paper for the language. It's enough for today, but in the weekend, if I'll own a brand new laptop, I'll poke the fire until someone can explain for real and competently why that code outputs what many people expect, as if it would be "unspecified behaviour", and what should realistically happen under the cover of the compiler in order to manifest the "undefined behaviour" — my position was that the specific code will output always what indeed it outputs, despite the "undefinition", and so, in the specific case, for the specific code, no "undefined behaviour" can manifest itself (and indeed even the unspecified behaviour would be hidden by the way the code was written, it was about passing a "problematic" argument to a printf with a format string that would take only one argument, details on SO, but I'll be back on the topic, don't be afraid).

Wo, two posts in a day, I am growing the statistics!

How many days in a month?

A coworker proposed to me a little challenge sometimes I think about. My first solution, though correct, was rejected since it used the ternary operator ?: and there's a way to succeed without.

Today is holiday here in Italy and I am using my spare time pushing the button of my almost-dead laptop just to hear the one-long-beep-two-short-beeps sequence (hoping not to hear it), then hold the button to turn it off and then stick hitting keys on the job laptop — where I am now.

After I've read a lot of pages from On the Edge - The Spectacular Rise and Fall of Commodore, almost finishing it, I decided to let my eyes be enlightened by the LCD of the job laptop, where there's nothing more interesting to do than surfing the net and using gcc (MinGW) in the unpleasant environment MS Windows XP provides.

Ok stop making the post longer than it could be...

The challenge was: using just an expression, get the number of days in a specified month (given as a number from 1 to 12). No-one told me about constraints for February so I assumed it has 28 days; moreover, initially there was no prohibition of the ternary operator...

My first solution was:

g = 31 - ((m < 9)? (m+1)%2 : m%2) - 2*(m==2);

which worked on all the system I've tested it (MS Windows, Sun Solaris on SPARC, and GNU/Linux on Intel) but obviously contains a possible bug dued to m==2 that does not need to be evaluated as 1 when true, though likely it does (As std C says, boolean expressions evaluate to 1 when true; unless we're dealing with non std compliant compiler, or if we translate the idea in other languages).

My second solution is:

g = "103012310321"[m-1] - 18 - (m&2);

This one is maybe tricky but is more robust (provided m is in [1,12] and we are on a system using ASCII), and can be easily modified in order to yield 29 for February. Moreover, what it makes me like it is that it uses something rarely many non-expert programmers are aware of about C (I mean using "array notation" for a literal string). The practice could be marked as bad, but it is great when codegolfing!

About codegolfing... If I am not wrong, that code in GolfRun would be "103012310321"m(=18-m2&- but currently I can't test it since my already-compiled (for GNU/Linux) work-in-progress GolfRun interpreter is on the Real Out-Of-Work Laptop, though I could checkout the SVN repository on this machine and try to compile it with MinGW on MS Windows — what a pity it uses Gtk+ and GMP extensively and I want not to go into the trouble of having them properly installed (ready for development usage) on this hateable system.

The third solution to the challenge that comes into my mind is about looking at the bits. The previous solution already does it... a bit! But the idea can be extended; for this purpose I've made the following table


XY ABCD
31 1F 1 11 11 1 0001
28 1C 1 11 00 2 0010
31 1F 1 11 11 3 0011
30 1E 1 11 10 4 0100
31 1F 1 11 11 5 0101
30 1E 1 11 10 6 0110
31 1F 1 11 11 7 0111
31 1F 1 11 11 8 1000
30 1E 1 11 10 9 1001
31 1F 1 11 11 10 1010
30 1E 1 11 10 11 1011
31 1F 1 11 11 12 1100


We have four bits in input and we need to "produce" just 4 "patterns" (two bits) in output. The boolean table old school (and semplification thereafter) will give us a solution (paper and pen here for a while...) (^ stands for an overbar on the following letter, i.e. for the logical not):


^X = A + B + ^C + D
^Y = ^A^D(B + C) + AD^B



That gave me the code:


g = 28 |
((((((((~m)>>2)&((~m)<<1)&((~m)>>1))^1) &
(((~m)^2)&((~m)>>3))) |
((m>>3)&((~m)>>2)&m&1))^3)&3);


I am rather dissatisfied with this: surely it can be optimised someway, e.g. by doing simple observations like: no need to handle higher bit in logic since it needs just an exception (when m is 2); try to work in "positive logic"; mix ideas from the previous solution with the "gate logic" stuff... Anyway, for now it is enough, I have to go and do better things: I have a button I must push.

2010-12-04

Bye bye TM250

Maybe 1 post per month is a very low activity for a personal blog, should I be more chatty? Ok, let's say so: car broken, a lot of money to make it work again but this is not hard to handle. The thing that is hard to handle is to accept that my laptop is old, and it started to stop working. Well, not exactly. It's hard to accept the fact that it indeed works still well, except for a small detail that makes it unworking most of the time. If the diagnosis is right, it is about the BIOS EPROM... CPU is ok. Hard disk is ok (good! no data loss for now), LCD works, fan spins, some keys is a little bit deaf but not too much, and I have an external usb keyboard... RAM is ok (I went from 256M to 1G, remember?), gfx card is ok... everything's ok, but that little piece, which decided to fool me (if the diagnosis is right, again).

It happened one day when instead of seeing the Acer logo screen, and then the Grub boot loader, a long annoying beep followed by two short beeps greeted me... What's up? A lot of BIOS says graphics card problems this way. But a deeper seek made me informed about the fact that 1L-2S is (should be!) a problem in the flash of the BIOS... Likely.

Somewhere I've found a solution: reflashing the ROM, twice or three times to be "safer", and everything should go... Not so for me (or this was not the problem...?) When luckly the PC started, after 20 or so attempt, I tried the procedure. It seemed ok for a while: turned off, restarted (after minutes), and it was ok. Hurrah... but then I waited longer and the problem showed itself again.

This time it took longer before a turn-on attempt was successful. Reflashed (using provided software) the Phoenix BIOS 1.15, ten times or so, but likely this won't solve... Now I am using the laptop, since I only rebooted between MS Windows and Ubuntu, never turned it off (yet...). And it's all ok. I know tomorrow it won't boot, I will try and try, and at some point I'll give up. And I will decide to buy a brand new laptop... In italian there's the slang local verb "rosicare"; it should exist a synonym I can then translate into english, but in this moment it comes not into my mind... it is the feeling you can taste e.g. when you are so close to a goal and then, suddenly, a small stupid detail goes wrong and you lose the goal.

My pre-euro-era laptop still works... except for that stupid detail that if fixed would make this piece of hardware last for many years still ... one year would be enough for me!

Useful post, isn't it?

2010-11-13

Drop a drop

Finally my badly over-tweaked "old" Mandriva's got wiped out by a fresh brand new Ubuntu 10.10. With the advent of a bit more of memory (from 256M to 1G!) I've decided to try again a full featured modern desktop and accept all the comforts of the defaults, though a little bit too much windowsish and even though accepting defaults is usually a bad idea. There will be time to tweak this system too and make it usable just by me...

In the meantime I was disappointed by a really annoying bug I've noticed while watching a flash video on YT. At first the guilty, to me, was the flash player... But soon I've discovered it was not. Submitted a bug report for "choppy audio", ... but recently I've realized it was not about it. The bug is bigger: the whole system is stuck "occasionally".

Today, I've timed the choppiness and surprise, it's very regular: one "stuck" every ten seconds. I've searched the net simply with "ubuntu stuck every ten seconds" and found this link very useful. Then I tried the nomodeset and woah now it is ok. On an Ubuntu wiki I've found how to disable permanently it by adding a file in /etc/modprobe.d containing simply an option for the i915 driver: options i915 modeset=0. Indeed I've not restarted the system yet... however, if the boot option worked, the permanent solution should too.

This is however still an annoying bug. I don't know if KMS is really good, but its problem with Intel chipset should be fixed.

Another thing I did on Mandriva was to kill the tslash from the key t, and put in its place the þ. I don't know way in the "latin" layout (type) for danish, norwegian (etc) the þ is where it should be, while otherwise it is bound graphically (instead of phonetically) to the p key.

After reading here and there about how to use xmodmap, and the .Xmodmap file in the home, I've decided to fix it my way, as I did in Mandriva: editing the file /usr/share/X11/xkb/symbols/latin (maybe on Mandriva was located elsewhere, I can't remember). Did it and now that I've mapped (this time using the provided configuration tool) the Menu key to the Multi key and the unuseful Windows key to Meta key, I feel better.

The next step will be to fix a bit Nautilus: I don't like the Windows-like behaviour when you double-click on files it "does not know". I was used to be able to open all files (known or not) with a text editor (Emacs usually) or an hex editor (ghex2), but it seems that I have to pass to the "windows-steps" to do it: if a filetype is unknown, pick a program from a list, remember the association (no, never!) and of course if the file is "known", the context menu will show the item to open it with the program that "understands" it. It can be ok, but I want also always the items to open it with an editor (text and hex). How do I do it?

From the packages point of view, I was able to install everything I needed; in particular I was worried about interpreters/compilers I had installed on Mandriva. Luckly I was able to install maybe all what I had, from C (of course!) to Algol 68 Genie, with apt-get or compiling it (MMIX, MMIXX and Algol 68G only, currently, while the rest was all found in .debs in the repo or on the net). Rebol being proprietary is still not installed, but I will; and also fasm (flat assembler) is missing (but of course there's nasm).

2010-10-09

My own thoughts about software development methodologies

work in progress draft
and it always will be


I wish to scatter some thoughts about the subject. There's no a particular order, deep analysis or claim for correctness.


  • Defining the project and its requirements


    • Do not let the developers define the requirements! Requirements come first, then it comes developing code (with feedback, if needed/required).


  • Documenting APIs (if appropriate)


    • Be brief and effective, do not add unuseful redundancy; it must be techinical, even when it is a "high level" API

    • Let the dirty work be done automagically: force programmers to document into the code, using a special, fixed syntax (well-known or created ad hoc), but no literate programming — maybe with exceptions to be defined

    • APIs' docs must not expose internals; if you need internals, put them in a separate document

    • References and links, if appropriate; no copy-pasting

    • If there's a protocol to access the API, describe it in a formal way (if it is a standard, just reference the standard document describing it, avoid giving your own incomplete simplistic description); if needed, train whoever has to understand the formal description, rather than sticking to a informal loosy description — which however is suitable as introduction and to give a general picture, if appropriate

    • Do not use a format which is not mechanically parseable and that it is hard to be generated automatically. The best choice is still plain text with an agreed encoding (e.g. UTF-8) and line ending convention (e.g. LF, the "unix" way to terminate lines); however, for documents generated by source code, ASCII is likely the safer choice. TeX-based or HTML-based conventions for formatting are good ideas.

    • Deploy the doc to the "rest of the world" in a read-only format, like PDF (which is likely also the best choice) — this implies the ability to generate the PDF from the "private" doc.

    • Avoid translations: if your doc must be read "internationally", do it directly in english; in Europe, if it would be widespread, esperanto could be a fine choice... but alas it is still english that it is better to use! If people working in the project can't write/read "technical english" (to be distinguished from "literary english" and real-world english), teach it to them!


  • Writing the code...


    • Conventions! Train programmers to use an agreed set of conventions (tabs, spaces, indenting, symbols' and variables' names, comments' style...)

    • Comment, but not too much; avoid "line by line" comments, prefer "logical block" comments

    • This is not always true but... more data structures, less code!

    • Break functionalities into as small as possible functions/objects (depending on programming paradigm in use)

    • Think functions as reusable bricks, leave room for enhancements without breaking existing working code (this is easier in OO programming, nonetheless it is not impossible in non-OO programming)

    • Refactor your code; possibly before it becomes a mess; even when this happens, refactor it anyway, it is not a waste of time/money (at least not in the long run), it is needed to keep your code manageable and...

    • Refactor before your code becomes unmanageable, and refactor/redesign if your code is already hardly manageable; this sometimes could be done in small steps while developing other features

    • Use well-known (in the community of your language) libraries, rather than your own implementation (if there are no licensing issues); but of course if you can't find what you need, take your time to implement it rather than sticking to other possible but poor solutions

    • Don't waste time for optimizations, if your bottle-necks are elsewhere! (Profile your code to see where it is slow, and optimize there!)

    • If your language misses a hashtable and you need to choose actions according to a "pattern" given as input, ... find a hashtable implementation/library rather than using a long list of if-then-elseif or a C-like switch. Match this suggestion with the optimization topic if appropriate; if not, consider that the hash-approach produces clea(n/r)er code, as "more data structures, less code" does! (In fact the specific hashtable example is about following the suggestion "more data structures, less code", avoiding linear search.)

    • The previous reasoning on the hashtable is applicable to other features too, with the appropriate changes

    • Avoid copy-pasting; sometimes it is not so bad, but if you do it or need it too often, it means your code is badly designed/implemented... time for refactoring!

    • Test functionalities apart; e.g. write "toy code" and check its output given a specific input

    • Do not use so called Hungarian Notation; if to you it's an issue, use a good IDE to help you!

    • Use meaningful names for symbols/variables; but iterations in language having no special syntax to iterate over a "collection" (or a slice of it) can (should!) use short common names for indexes (like i j k n m).


2010-09-12

The "basic" thing

For a project with friends of mine I was requested to write a C# class. Ok, it is a good opportunity to learn in practice some bits of that language, I thought. But at the end what happened was that large part of the C# code has to be generated automatically by a VBA script (I wanted to do it in Perl, but since the code must be generated from MS Excel data and formulae, VBA was suggested as a easier and more microsoft-friendly solution), and so currently I've written more VBA code than C# code!

At the very beginning I was almost disgusted, but now I've realized that after all it was funny. It remembered me ancient days of "basic" structured programming, and having limitations that push you to find "workarounds" is funny too — though scaring if you have milestones!

I would like to give more time to C# enhancing the google video downloader, which is still a horrorful immature crap (though working for what it's worth); nonetheless now I can add VBA to the list of languages I've frequented someway... just another brick in the wall.

2010-07-20

Syntax and efficiency in C

Time for a brand new post.

In my last job interview I was asked to write a function to copy a string. Thinking not about compactness, I wrote on the fly the following code


void strcpy(const char *s, char *d)
{
while( *s != '\0' ) {
*d = *s; s++; d++;
}
*d = '\0';
}


That of course works, but it is longer than the need (and by the way, it is not compliant with what standard says about real strcpy, but this is not interesting since making a standard-compliant function was not the point of the question).

As noticed by the person making the interview, such a copy can be written in one line. When I've finished the interview and went back with the mind to what I've said and what I could have said, I realized that I could have defended my on-the-fly implementation: it is clearer (isn't it?), it has the same efficiency of the "compact" version, being this one possible just because of the great flexibility of the C syntax... which is, however, a boomerang if abused.

Second thought. The "long" code is clearer. Maybe (usually, not using C compactness ability makes code clearer; but likely in this particular case the one-line version is even clearer). But, my doubt is now: is it really the same with respect to efficiency?

Let us first see how to "convert" the long version to the short one. First, post-increments can be done "in place".


*d++ = *s++;


Then, assignment can be put into the while:


while( (*d++ = *s++) != '\0' ) ;


And here's the trap. We don't need the final "terminator-missing fix". Apparently, the short version looks also more efficient. But if we unroll the loops, we have the same amount of assignments, of course. Moreover, the short version assign the terminator reading it from the source; this gives two memory access (one for reading from s, one for writing into d).

On the other hand, the long version, does not need the extra reading access to memory and can be optimized on many processor with a single instruction writing a byte constant (read at the instruction fetch time).

So, it seems that is is the opposite: long version is (slightly) more efficient!

Of course modern compilers' optimization can cancel this apparent advantage. But let's look what gcc does with the basic "implicit" optimization level.

The "short version" (S) x86 code (with added comments):


mstrcpy:
pushl %ebp # prolog
movl %esp, %ebp
.L2:
movl 8(%ebp), %eax # get s in eax M
movzbl (%eax), %edx # *s in edx (dl) M
movl 12(%ebp), %eax # get d in eax M
movb %dl, (%eax) # *d = *s (dl) M
movl 12(%ebp), %eax # M
movzbl (%eax), %eax # take char*d in eax M
testb %al, %al # I
setne %al # is not 0? in the same al I
addl $1, 12(%ebp) # these are the post M
addl $1, 8(%ebp) # increments of d and s M
testb %al, %al # check al I
jne .L2 # loop if not 0 ('\0') JT
popl %ebp # epilog
ret


The "long version" (L) x86 code (with added comments)


mstrcpy2:
pushl %ebp # prolog
movl %esp, %ebp
jmp .L5 # uncond. jump
.L6:
movl 8(%ebp), %eax # get s in eax M
movzbl (%eax), %edx # *s in edx (dl) M
movl 12(%ebp), %eax # get d in eax M
movb %dl, (%eax) # *d = *s M
addl $1, 12(%ebp) # d++ M
addl $1, 8(%ebp) # s++ M
.L5:
movl 8(%ebp), %eax # get s M
movzbl (%eax), %eax # *s in eax (al) M
testb %al, %al # I
jne .L6 # exec body if not 0 JT
movl 12(%ebp), %eax # get d M
movb $0, (%eax) # *d = '\0' M
popl %ebp # epilog
ret


The M marks instructions that do memory access; let's count only the in-loop instructions first. S has 8 memory access instrs in loops; L has 8 memory access instrs in loop but S has 3 "register" instrs in loop, while L has none. So for S we have 8+3 in loop, for L only 8.

In S, when the '\0' is reached, the whole 8+3 instruction are performed; then the jump is not taken (JT stands for Jump Taken most of the times). In L, when the '\0' is found, (2+2)+1 instructions are executed (2 M after the .L5 label, and 2 M after the conditional jump). In fact S performs all the "actions" of the C code (post-increments inclusive) in the while; but L does only the test on *s and the final termination of d. (This means also that if we access s and d in the S version, they point one char beyond '\0', while in the L version we do not increment them, since we exit the loop, so s and d will point to the '\0').

On modern processors it is hard to say which is really more efficient. Likely, the difference in performance is negligible. Moreover, optimizations done by the compilers may level down the difference, if any.

It is however worth noting that L version does not produce a so longer code (considering also loop unrolling), and it could be even more efficient than S version!

2010-06-17

Back to hardware

When I was very little I was used to copy a lot of circuits from electronics magazines my father bought, and also from his own diagrams. To me they meant nothing at all, but I found them beautiful and I sensed that there was a meaning in all those symbols. Growing, strangely my interests shifted more towards software, even after I studied those symbols and assigned their proper meanings. At some point in time, I can't remember exactly where, I thought it would have been cool to be able to create a computer; I focused this idea when I studied more deeply digital electronics, but indeed I never planned to realize it for real.

Now it is almost time to get back the idea... It is renewed as something a little bit different: since to create the hardware for real is pragmatically hard (maybe FPGA could do, but...), what if I create a software simulation? The simulation "models" the digital components that "are" the microprocessor, properly interconnected.

Excited by this approach, that seemed possible and not so hard, I've created a new task for Rosetta Code, Four bit adder. The C "model" seemed to me to work greatly, I thought "It can be done", but the task is just a very short step towards the final aim, proved to be not so well written, and the code showed some limits this morning, when I've taken a look at how I could implement a latch using basicall the very same "ideas" already used in the four bit adder.

A little bit demoralized, I began searching the net for similar stuffs... And of course I've found very interesting things, and more "realistic" and done by people more gifted for hardware — I used SPICE once upon a time, but indeed never ever gone into hardware projecting for real, universitary laboratory apart, where however we did not too much complex circuits, in fact we have not done any ambitious project at all; and I know of the existance of languages like VHDL and Verilog, but never used them (I started to take a look at them since today).

I am swinging between two position. In one position, I want just a "simulator" realized in a known language like C (even though OO languages in this case could make things easier). In the other position, I am taking into account the hypothesis of using VHDL for real... and this second position started to exist when I've installed FreeHDL, which has a VHDL to C++ "translator"!!

So even my idea of writing the simulator in code is not so new... Well, the original part could be that I would like to realize an emulator for the processor, using as core the finished simulation. In practice, I would like to run something like myemul ROM where ROM is a file that provides the ROM, and myemul does not realize the emulation as Bochs or QEMU (which have performance needs), but realizes it at a low-level, through logic gates emulation...

Ok, too much for now. I've installed FreeHDL, gEDA, GtkWave and also ngspice. I think I have everything I can have for free (and mostly free, too). But I still does not know VHDL and Verilog, and I have not the slightest idea from where to begin. Luckly, the net is full of informations and experiments, like The Harp Project.

Note: discussing several arguments on Rosetta Code about the task (and a replacement for its description, since the current does not satisfy me) I've discovered that there could be a certain interest in having Brainfuck code running in gate-logic; maybe it would be not a bad idea to start projecting the processor so that it can "host" BF easily.

2010-06-10

Q/A sites

Question/Answer sites are always the same. I mean, there they happen always the same kind of things that can be summarized with few words: biased subjectivity, misunderstandings, personal battles using the voting system. I am not saying I don't fall in those traps too, except for the last one and the first one(!): I do not use the voting or flagging system to fight my personal battles or to "destroy" an answer that does not convince me, but which I recognize as having its own "rightness". Instead, I use often the possibility of commenting (our freedom of speech).

My english is far from being perfect, perfectly unambiguous and grammatically correct, maybe sometimes the hurry makes it even worse...; nonetheless it seems to me it is not so bad. But it seems to me sometimes I get misunderstood because of it.

Recently I am losing a little bit of my time on StackOverflow. Appearing as a site for technical questions, it seems free from all those absolute crap often existing in many low level Q/A sites (I mean question like: where I can get The Program registration key, how can I cancel a file, how can I program a word-processor like Commercial-Wordprocessor in few lines of code, ...), and in fact it is.

But as said, it is run by people, who are defective by design and so the typical aspects of social interactions can arise, altogether with all kind of possible misunderstandings and arbitrariness.

I have a good example where of course I am the protagonist.

The StackOverflow (SO) question is here. My original answer was downvoted by someone (later the downvoter declared himself and it was the same asker), and this was disappointing to me since another answer, containing basically the same kind of informations/proposals, was voted up (I don't know by who).

I loose time, ok, my choice, but I am not prepared to loose time for an answer that is not appreciated and moreover I can't understand why, since compared to others (voted up) it appears of the same quality. So I wrote a comment, knowing no the voter nor understanding the reason.

The comment was:

can I vote myself up? I find this -1 really stupid, expecially since basically I am saying the same thing of an answer that's got +1 and say no word at all about refactorying the if with a subroutines. I've already said that: no matter the techinicalities, Q/A sites are frequented by the same kind of voters.


and I started writing it before the asker comment (As stated, CAN'T CHANGE RULE) appeared. The long "dialog" that follows in the comments came from two facts: my distraction, and their distraction. As usual I am not interested in pointless debates, I just try to understand.

My recognized distraction was that he stated already he could not modify Rule class, while unluckly I focused my "defense" on the fact that he did not, instead of focusing the attention on the fact that, no matter if he specified it or no, my answer contained already a possible solution that fit the requirement, exactly as another answer. Indeed, better than another answer, the one that was upvoted anyway!!

The original asker felt uncomfortable with my comments and decided to remove the downvote (since it was causing "issues"), later pointing out however that the requirement my answer did not talk about was in the original post. Unluckly I noticed I proposed an answer also for that requirement, but not focused it in time, and another thing that happens on Q/A sites is the lost of attention where things get complicated or too long to read.

As the following Original Poster comment shows, he focused on the "first" part of my answer, which talked about changing Rule class (exactly how another answer did, but was upvoted... again)

My downvote is because that your answer ignores my question. Yes, refactoring the Rule class to use an int would be useful, and in fact, I could keep the "else if(flag == "EXCLUDE") { included = false" given your example. So it's a very good answer to a completely different question


The fact is that my original answer contained already at the beginning the following excerpt:

Now if you want the whole [CODE] as a subroutine, you can make it return boolean value and check for it. The caller "decides" if break/return or not. Of course the caller must pass i also (if looping over i for get_op changed as described before).


This shows that indeed I've not totally ignored the requirement. My if looping over i for get_op changed a described before means exactly that: if (i.e. you are not required to) you change get_op as described, you have to pass i. If you do or not, no matter: you can turn the [CODE] into a subroutine returning true or false, and according to this returned value, the caller decides if to return directly or continue looping.

Moreover the OPer edited the question and added a code that basically do what I suggested, wanting as arguments get_opN and get_niN instead of N to allow the subroutine to pick the right get_op(N) and get_ni(N), but apart this detail (due to the requirement) the code is what one programmer can take from the suggestion I quoted before.

While another downvoter chimed in my answer, another up vote was given to the answer that contained same suggestion of mine (and no citation at all to the subroutine possibility); the author of this answer "defended" me some way, thanks. But of course my critics are not against him, but the illogical incoherence of the voters...

The brand new downvoter wanted also to explain its -1. And someone put +1 to its comment.

@Shin: I think the OP's downvote was fully justified. You wrote an answer entirely based on an assumption that the question clearly specifies is invalid. That means your answer doesn't solve the problem.


This confirms that jalf (the nick of the previous commentator) coherently should have downvoted another answer too (the one similar to the mine), but he did not. He downvoted mine instead, because of comments. Talking about what it can be just or not, it is a subjective thing. But also an imperfect "justice" needs to appear coherent: if A-by-X is wrong, it must be wrong also A-by-Y. An answer can't deserve a downvote where another answer, saying the same thing, deserve an upvote. A coherent "just" person can't vote down an answer A-by-X and let untouched A-by-Y (and A-by-Z and so on...). If he does, the only explanation brings us to the fight personal battles using the voting system, which is unfair. In this case, my "complainings" likely was what have driven jalf to the downvote. He did not evaluate the answer deeply, otherwise he wouldn't have used fully and entirely, and he would have noticed that I talk about subroutine with an if that allows for the other explicitly missing case. At least another answer doesn't solve the problem (here of course it should have been "does not try to solve the problem", or I should see that jalf has a huge amount of downvotes!), but is ignored, meaning this is not the real point.

Unluckly when I read that biased comment I have not realized what's wrong in all the "defense line" (i.e. the fact that indeed I treated also the case matching the requirement)... and when I did, it was too late: attention focus was lost and likely jalf won't realize and recognize his error, and this is normally what makes me put an user on my personal black list (in the sense I have to watch out when they are around, to avoid wasting time and get also downvoted unjustly).

Talking about incoherence, it could seem incoherent the fact that I am focusing on jalf and not the asker. It is not because he retired the downvote (I did not pretend him to do so, but to explain, at least why he did not downvote similar answer); it is because he interfered in the dialog like if he was the Knight of Justice (Restoring the Justice, i.e. the just downvote by the asker), unaware of his incoherence and moreover without reading carefully what was already written. People like that rarely come back and re-ponder, it would be wasting too much time on a single question, or even worse: on comments about an answer to a question.

I am not saying jalf is a bad guy to me of course. I am saying he bahaved the way I don't like people to behave on Q/A sites. Commenting is fine. Voting up or down should be more pondered on the current answer, not biased, and coherent. Since these people could make my "wasted time" less pleasant, I mark them for my black list.

The irony is that the asker mixing up answers or by himself produced a code that is already an answer, and that is contained, as idea, already in my criticized and downvoted answer (and not in the one similar to mine and upvoted)... so that my last vote mixed everything and proposed a solution, which fits fully the requirement. But jalf has not wasted time checking for updates, and no other readers have (currently) upvoted me, except who did it at the beginning (and gave the upvote purposely because of the part of the answer that does not meet the requirement, which is a poor bad requirement after all!)... while as said the other answer, talking no about refactoring in subroutine at all, has 2 votes!

2010-05-27

About a gfortran bug

Few weeks ago I have marked as incorrect an implementation of a task on RosettaCode in Fortran. It has struck me since it appeared elegant and short, so I wanted to run the code and... alas obtained a wrong result (AAAA). So without further analysing the code I have marked it as incorrect.

Yesterday I was looking for recent activities on RosettaCode and task to work on, when noticed that my "mark" was removed and in the talk page they've shown a gfortran bug... Unbelievable, I use gfortran to ru my computation, such a bug, as interpreted there, was horrorful!
(Even though recent gfortran versions fixed it).

Further investigation has shown that the bug is more subtle but less horrorful; first let us see how the bug manifests itself with the guilty fragment of code.


character(2), dimension(2), parameter :: list = (/ 'ab', 'ba' /)
integer :: i

i = 1
write(*, *) list(:)(1:1) == list(1)(1:1) ! ref. (A)
write(*, *) list(:)(1:1) == list(1)(i:i) ! ref. (B)


We expect the same output, but we get something different! So at a glance the bug is in how gfortran compute expression with variables in slices, like if it does not grab the right value of i.

The problem is instead another. When constant slices of a known (at compile time) parameter array are requested, the slices are indeed computed at compile time. So that where there's no expression to be evaluated (even the simple i), the result is computed at compile time and no bug is shown.

Compiling with -S option allows us to study assembly output (on x86 machine):


.section .rodata
.align 4

.type A.2.545, @object
.size A.2.545, 8
A.2.545:
.long 1
.long 0
.align 4


This is a piece of the readonly data section produced compiling a program containing only the line (A). What are those .long 1 and .long 0. Without breaking my head into debugging for real the code, I've suspected they are the result computed at compile time (True, False) and to verify it I simply changed 1 to 0, assembled/linked and run the code to obtain F F as output; if I change it into 1 1, I obtain T T, quod erat demonstrandum.

So the bug does not manifest itself since it is a runtime bug (maybe; we shall see it is not exactly so). At compile time, slicing is done correcly and so the comparing.

Let us see the code produced when only line (B) is in use:


.Ltext0:
.section .rodata
.LC0:
.ascii "a"
.LC1:
.ascii "b"
.align 4

.type A.2.545, @object
.size A.2.545, 8
A.2.545:
.long .LC0
.long .LC1
.align 4


Now the result (True and False values, i.e. 1 and 0) is missing since list(1)(i:i) must be computed at runtime. Nonetheless, list(:)(1:1) can still be computed at compile time and in fact it is: label A.2.545 point to pointers to two strings, and the "object" represents the array (/ "a", "b" /) which is the result of evaluation of list(:)(1:1).

Now list(1)(i:i) must be computed at runtime and the comparison must be carried at runtime, in fact later we find the fragment:

 
movl A.2.545(,%eax,4), %eax
movl %edx, 12(%esp)
movl $1, 8(%esp)
movl %eax, 4(%esp)
movl $2, (%esp)
call _gfortran_compare_string


which is missing from the case (A) (since de facto the comparison is done all at compiletime, and only the output routine appears). The problem is somewhere here or there... Again, since I am lazy, before going deep to debug the code let me see if there's a simpler way to grab a spark of enlightening.

Analysis I've done before and that you can find on talk page, made me think that the problem could be related someway with how Fortran handles "strings". I say, it is not so good at:D! They are indeed characters of given length and this is fixed. Let's see some code to understand better:

  character(2)  :: a
character(3) :: b
character :: c

c = 'a'
write(*,*) len(a)
b = 'cd'
write(*,*) len(b)
a = 'b'
write(*,*) len(c)
write(*, "(A,'|',A,'|',A,'|')") a, b, c


The output of this code (reformatted) is

2
3
1
b |cd |a|


And it shows that in Fortran the length of the "string" is not determined by the real "string" we put in but by the size of the container. Moreover, unused room is padded with spaces (we need trim to remove the padding spaces). Nonetheless, comparing functions seem to consider this oddity and

'a' == 'a    '


is True! And also


a = 'a '
b = 'a '
write(*,*) a == b


gives of course True, which is odd (indeed, considered wrong!) in any other language I know. So comparing function seems to ignore trailing spaces. But it sometimes could make some confusion and how trailing spaces are really ignored? With this awareness/doubt in mind, I modified


.LC0:
.ascii "a"
.LC1:
.ascii "b"
.align 4


into


.LC0:
.ascii "a "
.LC1:
.ascii "b "
.align 4


Then I "assembled and linked" (sounds cool, indeed I've just written gfortran modifiedcode.s!), run the code and BINGO the right output appears!! So now we think that the problem is at compiletime, since a trailing space is missing... But wait we've said trailing spaces are ignored so the comparing function should not consider them for real! Counterexample: let us add two spaces instead of one, to have "a ", or more, to have "a "... The right result does not change!1

So, it seems at least one space must be present. Is this a bug of the comparing function, or a bug in the code producing the static strings? Remember that those strings are the representation of the array (/ "a", "b" /) obtained with list(:)(1:1). Since list is an array of character(2), we could expect that the slice inherits the char(2) size. Maybe. But how it should be implemented in code? Maybe padding with spaces for real... But if this a requirement, then it means the comparing function is not buggy. If it is not a requirement and the rank of the "content" of a volatile object like list(:)(1:1) needs no to be inherited, then the string comparing function seems buggy... But why then it works when used "directly", in code as 'a' == 'a '?

Something is missing still.

Let us do another test. Let us "copy" the object list(:)(1:1) in another array, i.e. add this code


character(1), dimension(2) :: t

t = list(:)(1:1)
print *, t == list(1)(i:i)


Now the result is correct!! So the problem as stated before is not in the usage of an expression (simply i:i in this case) but it is subtly bound to how object are evaluated at compiletime and stored in the produced code. Copying in a non "static" array fix the problem!

So another way to fix the problem should be to avoid that list(:)(1:1) is computed at compiletime and (we're supposing currently) stored "badly" in a way that confund the comparing function... I.e. we use list(:)(i:i) too... And in fact, doing so, fix the problem too! We're nearer.

The "static" word used before made me think about the parameter attribute... What if we remove it... Let's try... and wow, without it, everything is fine, always!! So the bug is in how parameter object are handled. At this point I am curious to know how the bug was fixed in recent version of gfortran!

But first I take a look at what is changing when copying list(:)(1:1) in a "dimension" made of character(N). (Changing N from 1 to whatever, does not change the output, which is always right).
When doing so, in the readonly data section we find only the representation of the initial array.


.type list.544, @object
.size list.544, 4
list.544:
.ascii "ab"
.ascii "ba"


The rest is done at runtime (slicing, copying and so on). And so, everything works.

Now let us see what changes when the parameter attribute is removed. What can be noticed is that now everything is done at runtime, always. It is like that: parameter object are not supposed to change, so things done with them, when possible, can be computed at compile time (if there's an expression that must be computed, then the operations on a "parameter" array are done "deferred" at runtime). If no parameter attribute is given, the object may change at runtime, so the compiler does not try to do anything at compiletime.

So now we are more convinced that the bug is in the part of gfortran dealing with compile time "optimization" (for "immutable" objects). To let things work, we can remove the parameter attribute, or force the compiler to operate at runtime, e.g. using an expression that must be evaluated at runtime... Or we must be sure everything is computed at compiletime... Basically, if we mix things evaluated at compile time with things that must be evaluated at runtime, the bug shows up.

Note 1: look better at the code calling the comparing function... we see it appears a 2... hmm, what if this is the length of the string to be compared? Let's change


movl $2, (%esp)
call _gfortran_compare_string


into


movl $1, (%esp)
call _gfortran_compare_string


keeping "a" and "b" without spaces... Let's assembly and run and... bingo again. The fortran compare string function gets the length of the string to be compared as argument... So the bug is in here, since it should be $1 instead of $2... or in the static data... or ... hmmm

2010-05-24

The YT case (part one)

It is a problem that goes and returns back periodically: how can I download YouTube videos? It's not a problem that has a forever solution since YouTube changes things. Maybe it does so also to make it harder to download videos: we must "pass" through them, so they can control contents better (DRM!) and earn through traffic and data we produce using their service... Even supposing we're on the dark side of believing their shameless lies (rather than on the bright side of thinking they are just taming us turning our bodies in batteries that produce electro-money only for them), we could wish to download a video the author released after some permissive license.

Searching for already-made mass solutions we are just caught in the ads-worlds. So we seek a little bit differently, not too much, and we find interesting sites giving real solutions. I've started my alone researches but dropped them since I've found these working solutions (this is one of the cases you're happy to find people smarter and more efficient than you). Nonetheless I will write some of my researches here, they could be of interest for someone now or in the future.

But first let's see working solutions I've tried (my video target was always the same but I've no reasons why to think they should not work with other videos). Not everybody is able to use these solution and because of this I am working on a C# porting of one of these. Hope I will finish it (I am not a C# programmer, but it's time I start to taste it a bit...)


  • GAWK solution. This was the second found code I've tried and the first to work. Unluckly perl scripts (one-liners) by the same author (see his other post) failed. I suspect it could be only because of the User-Agent, but I've not done tests yet. If it would be so, there's an easy fix.

  • Youtube-dl.py; this one looks cool, it seems to support several sites (despite its name), it looks well-written, and the -g option can be specified if one wants to use his/her custom downloader: I've tried wget (without spoofing) and worked!



And now, the part no-one is interested in: my researches. Read at your own risk (if you can waste your time I suggest studying the code of the gawk or python solutions, rather than reading what follows). If you want to read it, consider it as a muddle of scattered thoughts; possible audience maybe should be a little bit computer literate.

Discontinued analysis/study of the YT case


Request URL is simply /watch?v=ID where ID is a video_id identifying the video. This brings us to /v/ID, through the browser it sends back a compressed swf file. Disassembling the file with flasm we see defining a set of variables; one seems to hold the URL of the skin of the player (another swf file at http://s.ytimg.com/yt/swf/cps-vfl165272.swf at least in this case); video_id set to the same value of ID; a variable sk holds a key, it changes every time I download the file. They may appear other variables "mirroring" URL "parameters". Searching it seems like the work this swf file does was indeed done by a simple html file in ancient ages...

At some point this swf contains code that seems to construct a URL, so I followed it a bit and wrote the following pieces. Not so interesting after all :( A a little bit more readable form of the flasm-flavoured flash disassembled code is

main = function ('clip') ( ... )
{
loadClip = createEmptyMovieClip ...........
r:2 = new MovieClipLoader .................
clip.addCallback .......

. (nothing interesting here...)
.
.

r:3 = clip.swf
i.e. "http://s.ytimg.com/yt/swf/cps-vfl165272.swf"
1 2 3 4
0123456789012345678901234567890123456789012

r:4 = clip.swf.split("/")[2]
gets the domain ("s.ytimg.com")

r:4 == "s.ytimg.com"
branchIfTrue label5

i.e. is the domain s.ytimg.com? Yes, in this case, so
branch

.
. (see CASE B code)
.
label5:
loadClip.loadClip(r:3, r:2)

where r:3 is clip.swf, see above
and r:2 is an instance of MovieClipLoader

}


In case the domain is not s.ytimg.com, it builds an URL; this is not the case but it could be interesting.

* CASE B *

r:5 = clip.swf.indexOf("-vfl")
r:5 is 29 in my case
r:6 = clip.swf.indexOf(".swf")
r:6 is 39 in my case
r:7 = clip.swf.indexOf("/swf/") + 5
r:7 is 21 + 5 (index pointing to cps- or whatever
comes after /swf/ part

r:8 = "cps"
if not (r:5 > -1) then
// -vfl not found in the string clip.swf
r:8 = clip.swf.substring(r:7, r:6)
i.e. everything after /swf/, less the three letters ext
end if

r:9 = loadClip._url.split("/")[2]
r:3 = "http://" + r:9 + "/swf/" + r:8 + ".swf"
i.e. builds
http://domain of _url/swf/thing.swf


The interesting part seems to be when the domain is not s.ytimg.com. But it appears the _url variable of clip, which is not set nowhere here... in this case at least. The domain matchs so no need to have _url set, but I wonder when it does not match. I suspect indeed this one is the wrong swf to look for. Maybe URL parameters may change things and this very same code serves other "purposes" too. Interesting to note that this flash says

System.security.allowDomain("*")


So theoretially this swf is usable also externally; this is obvious thinking about embedding. The set of variables the code assigns is:

// the "cover" of the video
iurl = 'http://i4.ytimg.com/vi/ID/hqdefault.jpg'
el = 'embedded' // embedded where? in the default flash player?
fs = '1' // full screen
title = '...'
avg_rating = '4.7547...' // wow how many digits for the rating!
video_id = 'ID'
length_seconds = '..' // number for its length
allow_embed = '1' // interesting...
swf = 'http://s.ytimg.com/yt/swf/cps-vfl165272.swf'
// Security Key? ?
sk = 'TK755pvEYU-oGqmzRTwz7fq1dipYreRnC' // or alike
rel = '1'
cr = 'US'
eurl = ''


I am wondering what happens if allow_embed is 0 and I modify it into 1 and use this as "embedding" trampoline.

In the html page of the video (the one we get with /watch?v=ID there are alternate addresses, serving informations in JSON+OMBED or XML+OEMBED, oembed, flying around these we can find an address to use with the RTS Protocol, tried this road, mplayer understand the protocol, but it seems the Google RTSP Server does not like too much it and stop the connection. (SDP used too).

Once upon a time it existed a so called get_video API, it seems to work still but it is different the way we can get the needed parameters (see youtube-dl.py with -g option), which are different too. In the URL given by youtube-dl.py appear video_id (which is ID), t (token... could it be sk? They are not the same, it seems), eurl (null...), el (detailpage), ps (default), gl (US), hl (en); some appears in the analysed swf too. But the most important is for sure the token (t).

Discontinued. Youtube-dl.py works, I'll look how it acts and write something runnable on Windows machine by people not interested in installing python on their system (bad very bad).

2010-05-16

If you know it, you can...

I confess it: I often creep in forums and "ask your question" sites, sometimes I add my answers to solve a problem or manifest an idea (and often I think it's the best answers of course). Mostly it's boring but it is also a cheap pastime and I find it relaxing.

Few hours ago someone asked help for the following problem: I have a JPG file N kbytes long, and I want to make it M kbytes long (M > N). A typical trait of these questions is an almost total inability of the asker to express what they want in a non-ambiguous way.

The already written answers talked about increasing JPG quality and/or resizing image's width and height; these actions have as consequence to make the file bigger.

The common answerer focused on the fact that the file was an image (that can be resized) in a specific format the "quality" can be set of. Since the asker talked about file sizes, I focused on the file instead and swifly thought of cat and dd: dd can be used to "cut" a stream at the desired size, while cat allows the creation of a stream bigger than M. Since the simplest way to make a file bigger is to pad it with zeros, I thought of /dev/zero too, and wrote the line

cat file.jpg /dev/zero |
dd of=modified.jpg bs=1 count=800k


if we want the final file to be 800k bytes (800*1024) long. Neat and easy, isn't it? At this point someone can ask: but this way the JPG gets corrupted while the other answers have the merit not to waste it! And at this point it is useful a little bit of knowledge about JPG files: you can append data to a well-terminated JPG file without problem, since decoders stop to a marker which says the JPG file is finished. If you put data after the stop-marker, you do no harm.

The story taught me that a little bit of knowledge about file formats and classical "*nix" tools (in their GNU versions in my case) can solve a lot of problems easily, and that who does not know these things have as only option the use of softwares that do the job as needed; if there are not such softwares then you've not bricks you can build the solution with. If you don't have a bigger picture of what computers are for, your only idea will be to load you preferred image manipulation software and try to match the file size changing "parameters", and maybe you'll get it after several trial-error cycles... only to notice that now it looks different and you don't want it to look so different!!

So the suggestion is read read read read and learn learn learn learn (and install GNU coreutils:D)

And now a brief explanation of what the line does. cat A B concatenate two streams (e.g. two files) one after the other. The /dev/zero can be considered as a "virtual" infinite file made of all zero bytes. For this reason we need dd to select the right amount of bytes from the stream, copying to "output file" only the quantity we want (see e.g. this Wikipedia link about dd).

2010-05-10

PDF (Portable Document Format)

Few days ago I've found PDFs of interesting books (basically the same), old (and marked obsolete!) editions: Numerical Recipes in C, Numerical Recipes in Fortran 77, Numerical Recipes in Fortran 90, obsoleted by Numerical Recipes in C++ (not downloadable).

These PDFs are encrypted through a hateable system, handled by the FileOpen plugin (they have a site and purchase their system to "protect" PDF). You have also to install the plugin, and the only usable viewer will be the one that can use the plugin... as far as I know, only the original Adobe Reader (and related products).

I understand, but I am absolutely adverse to, the wrong idea behind DRM (basically it's what FileOpen is about). The dangers in that idea can be discussed longly, and we also can remember this funny comics about DRM in music. But this is not the major point for FileOpen for these PDFs.

The problem here is that in order to do its job, the plugin needs to connect to somewhere, and this means that a working connection is always needed to read those PDFs! Thus the P in PDF becomes untrue, since I can't "port" those files everywhere: if there's no a working internet connection, those files become nothing but a bunch of unusable data.

Some friends of mine dream about the future: computers are terminals, the operating system being just a life support for a browser, and "in" it you do everything (cloud computing for the whole set of what a computer could be and do). This would work until there's a (fast!) connection everywhere. Otherwise the "terminal" would become really unuseful hardware only.

I hate this idea (cloud computing software as services for everything and for masses; it can't be "instead of", while it's ok to be "altogether with", according to "needs" and possibilities). It creates a dependency which is not desiderable at all. It is basicaly a threat for freedom of usage. I want to do the things I like to do with a computer without the need of an internet connection. There are things (like this blog) which implies the presence of internet, ok. But if I don't want to connect, I can, and I still can use the computer as an important tool or as recreational pastime.

What is the solution? (There's no solution since we are not the one who will decide). ... But I've removed the FileOpen plugin and deleted those PDFs (unreadable data without the plugin), after all I was just curious, I can wrote algorithms by myself and I can find freely accessible literature which does not depend on the net to be read ("download once, own forever" and public libraries).

By the way, RE or so on FileOpen was done (issues about avoiding connection, key expirations... exist). But it would mean to give too much importance to them; rather, we should simply reject every FileOpen-protected ebook/PDF, in favor of freely accessible one. So they would be forced to drop that silly technology and use instead a more user-friendly (internet-independent) solution to their problems ("protecting" their stuffs).

2010-05-06

Another lazy one-liner

Today I was back on twitter (guess, shintakezou) and found among people I am following an interesting link. I wanted to download almost everything and lazyly built the following line:

lynx http://address -dump |
egrep -v thumb |
egrep "[0-9]+\. http:" |
egrep "mp3|jpg" |sed -e 's/[0-9]\+\.//;s/^ \+//' |
xargs -n 1 wget


I am almost sure there's a better way, but I've just finished eating.

Since people may think this is mainly a p0rn scraping line (mixed with few good musical files), I claim it is not even though it could be used for that purpose too. In that address I've found many Bilibin images — I know this artist only because his paintings are often used on Fravia's site (R.I.P.) — and a mp3 file I am going to listen, the title is Nuclear weapons are morally infensible.

Flash must die!!

I've read this and felt better: I simply hate Flash from the very beginning. Moreover current GNU/Linux version I am using is resource hungry and sometimes crashs. I don't know if all the good things said in the article are true (on the Apple part), but I would really like to delete definitively Flash and Flash plugins from my system. Hope I can do it soon!

Post scriptum: I love PostScript and like PDF, so I also hope Adobe won't be destroyed by loosing their poor Flash technology market!

2010-04-26

Languages, OOP, Sather et al.

I used to dislike OO programming paradigm. I suppose the main reasons were that I started programming with imperative languages and loved so much mc68k assembly, and that people trying to convince me about the advantages of OOP failed —often talking about the wrong feature in the wrong way (e.g. magnifying functions overloading... which is not a OOP-only thing, e.g. Fortran 95 and later has it!)

Moreover, because of reasons I don't know, C++ does not light my enthusiasm. And so it is for Java (recently I've deleted Google Android SDK... tried to get interest into J2ME for a brand new phone I now own, but it is always... well... Java). I changed a little bit my mind about OOP when I first came in contact with Objective-C. I feel it is the right language that exhibits the right amount of OO features; OO concepts are insinuated in the "C core" better than C++, and Objective-C can at least one thing that currently C++ (AFAIK) can't: extending classes we have not the code of.

Objective-C takes something from Smalltalk, and this language too can extend a class without touching its code. I like Smalltalk too, and luckly GNU Smalltalk can be used as an interpreter, allowing file-based programming. Smalltalk is more "coherent", I mean: everything is really an object. It is not so for Objective-C and C++, which inherited (!) from C primitive types that are not objects. However C is still my preferrend imperative language (when I don't want to stick to Fortran and interpreted languages like Perl are inappropriate), so Objective-C win against Smalltalk, that, by the way, is also not so efficient (read: it is slow).

Got over the primitive "hate" for OOP and OO languages thanks to Objective-C and Smalltalk, there are other details that can make me dislike a programming language. E.g. I don't like that "white spaces" have syntactical meaning. Thus I could like Python (and Haskell), but indeed I ended to dislike them. Another detail is when upper-case or lower-case letters have syntactical meaning too. This detail would rule out other languages (even though I admit they can be cool and powerful).

But recently I decided to take a look at Sather. I wanted just to taste it. Sather uses all upper-case letters for classes, but other OO peculiarities are again more elegant than how C++ or Java take it. The syntax looks to me clear, I like how iterations are performed (I've seen something similar in Ruby), and ... we can create superclasses.

Creating a superclass could mean we can add "behaviours" to a "middle" class. Let us suppose we want the INT class respond to a factorial method (if not already defined into INT). We can create a superclass of INT holding the method. When it is called, the INT class, having no that method, "passes" it to the superclass... where our new method is defined. The result is that we can write

4.factorial


to obtain the factorial of 4, where 4 is an instance of INT which alone has no factorial method. It sounds good, but it does not work. I've tried to understand what's going wrong, but failed. Sather language documentation is spartan, lacks details (to know builtin methods of standard library I had to take a look at the code!). The tutorial uses superclassing, subclassing and abstract classes for typebounds (in generics), even though a sentece let me hope it is possible to use superclasses to obtain what I want. Likely I've not understood something important, but surely "extending" a class is not so natural and easy as it is in Objective-C
and Smalltalk.

I think I won't go too much far with Sather, but first I would like to try to write something really "OO" in Sather. I feel that, from the OOP point of view, Sather has some oddities; I would like to focus them to compare it better with other OO "solutions" (i.e. languages), in other words to confirm that the worst implementations of the OO concepts are those of the most used and widespread languages.

2010-03-30

One-liners and why to get it harder than it could be

Since I've updated kernel and X server, my computer became slower... new "softwares" require new hardware to do the same things. This is a fact that always upset me. Anyway, I abandoned KDE (unusable... and half-mixed KDE3-4!) and installed WindowMaker. I also abandoned full graphics file-manager (avoid the loading of Qt libs and KDE services, being used to use Konqueror, uncomparable to Dolphin), tried lightweight Gtk file-managers but none satisfied me. I used midnight commander for a while and tried also gnu-git ... But now mainly I use simply the command line, in a simple xterm of course...

So it happens that something easy and fast with a filemanager becomes a little bit harder. But still the command line, with all GNU tools around and interpreters of languages like perl, is powerful.

When I download pictures from the cam, I need renaming them according to EXIF date; so I've created a script (exifrename.sh), which uses exif the extract the date and rename the object the way I want. I use it like this

exifrename.sh *.jpg |bash


The pipe is because the script outputs the commands rather than executing them (it could be useful to check everything's fine).

My cam creates a directory for each day, like NUMcanon where NUM is a number not related to the day. What I want is to copy just some days, not all, into the local hard disk and in the same time I want to strip the "canon" part. It would be easier maybe to copy the dirs and then rename them. But less fun for me. So what I do is

for el in 101 102 105 ; do mkdir $el ; 
pushd $el ;
cp -r /media/usbpen/dcim/${el}canon/* . ;
popd ; done


Ok not so funny but useful to... Recently I had to convert several scattered .doc (alas Microsoft stuff) into HTML, I've used this

find . -iname "*.doc" |
(
while read line;
do wvHtml "$line" "${line%.doc}.html ; done
)


As last example, a "oneliner" to read FASTA headers / sequences and sort them (just the header). Don't ask why, I've found the request on Yahoo Answer.

perl -e '%p=();
while(<>) {
if (/^>/) {
s/^>//; chomp;
foreach $v (split("\x01", $_)) {
$p{$v} = 1 if !exists $p{$v} ;
}
}
}
print join(" ", sort keys %p)' <fasta.txt


A perl guru can make surely better (and what about perl6?), anyway the solution worked .

Now we discover that there are things that can't be done simply by a graphics interface but... These examples are taken directly from the history of the shell; there are other two similar examples. The history has 1045 entries. Of these, 1039 are mainly simple commands like cd, rm, cp, ls, mkdir (and sometimes mount and umount). Actions that would have been a lot easier using a graphical file manager!

So the problem said at the beginning is back: I need keeping running few processes/programs. ... I'm going to hate the computer business, because I can't see a reason why to do the same thing I did before now I need more memory or a more powerful processor... or downgrading the software (meaning I should compile by myself a lot of codes since oldest packaged pre-compiled programs are already too recent, or there are too many dependencies issues I don't want to cope with!)

I hope things will be better, but looking around I see things indeed get worse: we can say Berlusconi is the winner, and this means there's no hope to change for Italy (it seems OT, but is not...).



Ah, the exifrename.sh code:
for el in "$@"; do
pathpart=$( dirname "$el" )
namepart=$( basename "$el" )
data=$( exif "$el" |egrep "Date and Time \(orig" \
|sed -e 's/^.*|\([0-9]\{4\}\):\([0-9][0-9]\):\([0-9][0-9]\) \([0-9][0-9]\):\([0-9][0-9]\):.*$/\1\2\3-\4\5/' )
echo "mv \"$el\" \"$pathpart/$data-$namepart\""
done

2010-03-26

Just to Begin

What should I write just to introduce myself or rather this blog? I really don't know, thus I won't try to be smart, cool or whatever. Basically I've nothing to say, so I can say everything, though in an English that could make someone shudder. And I'll begin from the very end. Few days ago (maybe already a week or more!) I was back from hospital. Recently I underwent a surgical operation, something like this (indeed, precisely it). Has this fact changed my view of life and universe? Nothing at all. Has this fact brought to me a new awareness about something? Maybe yes. Did I see a light, a bright tunnel, a whirl and hear a voice telling me my time is not come yet? No. Simply turned off for a while. No vision, no memory, no thought, no pain. Satori? I don't know but it was not unpleasant, and this is the outstanding fact. Unpleasant were the painful awakenings (two).

Will I talk about this in the blog? Not at all, luckly! It was just a way to begin and explain why I haven't been online for several days, a month and an half, more or less. Since January, 30, until a day in March. So you're wondering what this blog will be about... I think, but I am not sure, I will put in it words people can disregard, maybe opinions, maybe witticisms, maybe codes, ... ideas coming into my mind, without necessarily a logic tie, nor the claim to be interesting for a supposed audience. And of course without the promise for a perfect English... The only language I can write in correctly is Italian. Why doesn't I write in italian? Because I prefer to imagine a supposed larger audience.

That's all for this first post...