2014-12-23

Coroutines or goroutines?

I have already skimmed the topic in two posts:

The promised next post was supposed to tackle the same matter using D. E. Knuth's MMIX, mainly because of its GO and PUSHGO instructions.

2014-12-13

Something about certificates, CA, SSL and the like

Disclaimer: this article is not about security best practices, nor does it contain advice on how to set up secure, trusted communication between two peers or a correct, working PKI. Its aim is to give a general picture of a broad topic, whose surface this text barely scratches.

2014-09-21

Swiftly, Swift

And so, Apple is pushing Swift, “an innovative new programming language for Cocoa and Cocoa Touch”.

2014-05-18

From blog posts to "html" to editable document

I received the following request: “I have a blog, and I want to put all its posts together into a single word-processor document, keeping just the title, the publication date and the content. Can you help me?”

Here is what I did.
  • Retrieved the full Atom feed of the blog. Since the blog was hosted on Blogspot, this link was helpful, but I had to add “?max-results=500” to the URL, since otherwise the feed stops at 50 posts.
  • The feed is nothing but XML, so a suitable XSLT stylesheet should be enough. And in fact it was: I built upon this one, removing everything I didn't need and adding the publication date. The date was the only reason for a post-processing step: I had no idea how to format it the way I wanted, so I put it almost raw into the output HTML (generated by xsltproc), and then…
  • wrote a few lines of Perl to turn every date in the generated HTML from YYYYMMDD into “Weekday-name DD Month-name YYYY”;
  • loaded the HTML into LibreOffice Writer and exported it to ODT.
The result is not perfect, but that's mainly the content's fault: some entries were written in Microsoft Word and then copy-pasted into the Blogger editor.

Just to make this post longer than it needs to be, here are the few lines of Perl I wrote to reformat the dates.
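#! /usr/bin/perl
use strict;
use warnings;
# Each date in the generated html is marked as ##YYYYMMDD##.
while (<>) {
    while (/##(\d{8})##/) {        # handle every marker on the line
      my $d = $1;
      my $r = `LC_TIME="it_IT.utf8" date -d$d +"%A %d %B %Y"`;
      chomp $r;                    # drop the newline printed by date
      s/##$d##/$r/;
    }
    print $_;
}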
In the generated HTML the date, extracted as YYYYMMDD with the XSLT substring() function, is wrapped between ## markers. I had to set LC_TIME because my locale is usually en_GB.utf8 (I try to keep my system consistent in its language and avoid the mixture you get when using both locale-aware and locale-unaware software), but I needed Italian names for weekdays and months.
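For comparison, the same rewriting can be done without spawning date once per marker. What follows is a minimal C++ sketch, not what I actually ran: format_date and expand_dates are names made up here, every marker on a line is replaced, and the names come from the C locale, so they are English unless LC_TIME is switched (e.g. to it_IT.utf8, assuming that locale is installed).

```cpp
#include <cstddef>
#include <ctime>
#include <regex>
#include <string>

// Turn "YYYYMMDD" into "Weekday DD Month YYYY" in-process.
// Names come from the current C locale ("C", i.e. English, unless
// std::setlocale has been called); Italian names would need something like
// std::setlocale(LC_TIME, "it_IT.utf8"), assuming that locale is installed.
std::string format_date(const std::string &yyyymmdd) {
    std::tm t{};
    t.tm_year = std::stoi(yyyymmdd.substr(0, 4)) - 1900;
    t.tm_mon = std::stoi(yyyymmdd.substr(4, 2)) - 1;
    t.tm_mday = std::stoi(yyyymmdd.substr(6, 2));
    t.tm_hour = 12;     // midday keeps DST transitions out of the way
    t.tm_isdst = -1;    // let mktime work out DST
    std::mktime(&t);    // normalizes the fields and computes tm_wday
    char buf[64];
    std::size_t n = std::strftime(buf, sizeof buf, "%A %d %B %Y", &t);
    return std::string(buf, n);
}

// Replace every ##YYYYMMDD## marker in a line, not only the first one.
std::string expand_dates(const std::string &line) {
    static const std::regex marker("##(\\d{8})##");
    std::string out;
    std::size_t last = 0;
    for (std::sregex_iterator it(line.begin(), line.end(), marker), end; it != end; ++it) {
        std::size_t pos = static_cast<std::size_t>(it->position());
        out += line.substr(last, pos - last);   // text before the marker
        out += format_date((*it)[1].str());     // the formatted date
        last = pos + static_cast<std::size_t>(it->length());
    }
    out += line.substr(last);                   // text after the last marker
    return out;
}
```

With the default locale, expand_dates("<p>##20140518##</p>") gives "<p>Sunday 18 May 2014</p>".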

Simple, even silly, but now this post can come to an end. (No, not yet. Why not use the export feature? Because I had no access to the blog; I was only able to ask for the blogID needed for the feed.)


2014-05-10

Particles of coroutines

Following the very same idea as the previous post, I've implemented the same thing for x86. Don't worry about the details (I am no x86 fan or lover), except that there are a few checks I didn't do in the m68k version (the latter assumed the compressed stream was not corrupted). But that's just noise, not worth considering.

Intel x86 assembly instructions suck, though I admit I don't know the architecture very well: likely there are cool features I haven't used, and I don't know any that, once learnt, would change my mind. End of rant.

Since x86 doesn't have many registers, since I've used the C library (assembled with nasm, linked with gcc/ld) so that the x86 calling conventions apply, and since I wanted to avoid using special-purpose registers (ECX, ESI, EDI…) as “global” variable storage, there are extra pushes and pops to preserve values across calls to library functions; each coroutine also assumes that the registers it cares about are not trashed.

EBP can be used to point to “shared” (or global) storage; I've used that storage for the pointer to the token buffer and for the “continuation” address (the latter at [ebp]).

The yield/resume mechanism is implemented by this code (wrapped in a macro):

    mov ebx,[ebp]
    mov dword [ebp],$+9
    jmp ebx

First the continuation saved at [ebp] is loaded into EBX; then it is replaced by the address of the instruction following the jump (mov dword [ebp],imm32 encodes in 7 bytes and jmp ebx in 2, hence $+9); finally the code jumps to the address in EBX.
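C has no jump to a saved address, but the idea can be emulated. The following C++ sketch is a variant of Simon Tatham's switch-based trick, with a made-up run-length format and made-up names: the resume point lives in a state field, the counterpart of the slot at [ebp]. Unlike the assembly version, only the decompressor is resumable; the consumer is a plain caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical compressed format, assumed only for this sketch:
// the byte 0xFF followed by COUNT and BYTE stands for COUNT copies of BYTE;
// any other byte stands for itself.
constexpr int RLE_ESC = 0xFF;

struct Decompressor {
    std::vector<unsigned char> in;
    std::size_t pos = 0;
    int state = 0;   // where to resume: the role played by the slot at [ebp]
    int rep = 0;     // everything that must survive a "yield" lives here,
    int ch = 0;      // not in locals, since next() really returns

    // Returns the next decompressed byte, or -1 at the end of the input.
    // Each call continues right after the return that ended the previous one.
    int next() {
        switch (state) {
        case 0:
            while (pos < in.size()) {
                ch = in[pos++];
                // a truncated escape at the end is passed through as a literal
                if (ch == RLE_ESC && pos + 1 < in.size()) {
                    rep = in[pos++];
                    ch = in[pos++];
                    while (rep-- > 0) {
                        state = 1;
                        return ch;      // "yield"
        case 1:;                        // "resume" lands here
                    }
                } else {
                    state = 2;
                    return ch;          // "yield"
        case 2:;
                }
            }
        }
        state = 3;                      // no case 3: every later call ends here
        return -1;
    }
};

// The consumer is an ordinary caller: it just asks for the next byte.
std::string drain(Decompressor &d) {
    std::string out;
    for (int c = d.next(); c != -1; c = d.next())
        out += static_cast<char>(c);
    return out;
}
```

For instance, with in = {'a', 0xFF, 3, 'b', 'c'}, drain returns "abbbc".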

That's all. Readers interested in the whole code can find it at this gist, but I doubt it's worth it. It would be far more interesting to study an implementation that could be used for real, e.g. as the output of a compiler for a high-level language.

Different calling conventions could make it easier, but then you need extra code to call external functions: sticking to a system's common C calling convention is the key to accessing a lot of code without any kind of glue (almost). End of rant, again; guess when it began.

This works fine for two coroutines. What about a third one? It doesn't work. If you need another cooperation, e.g. between the parser which extracts tokens (a “lexical scanner”, really) and a grammar parser (i.e. a proper parser), you're screwed.

E.g. at some point our parser, instead of got_token, needs to give control to another routine, namely the one which understands the grammar. Thus, for each pair of coroutines we need a “slot” similar to the one at [ebp]. A hypothetical JCONT macro would be more complex, and would take into account at least the coroutine we want to give control to. E.g.:

 parse:
     JCONT   parse,getc
     test    eax,eax
     ...
 .wend:
     mov     eax,TWORD
     ; the grammar_parser'd like to have ptr to buffer too
     ; ... but this could be a global, as it is
     JCONT   parse,grammar_parser
     ...

If there's a hashmap for each coroutine, then we need to initialize it first, and somewhere we will likely need a reset too. In pseudocode, the macro could look something like:

    get_slot_of    %2
    mov            ebx,[ebp]
    mov            dword [ebp],%%resume   ; $+9 no longer fits once store_slot_for sits before jmp
    store_slot_for %1
    jmp            ebx
    %%resume:

Just an idea, at a very late hour.
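To make the slot-per-coroutine idea a bit more concrete, here is a hypothetical C++ sketch. Since C++ cannot jump into the middle of another function, each coroutine saves its own resume point in its slot and returns the id of the coroutine it wants to give control to; a small dispatcher loop then performs the jump, playing the part of get_slot_of and jmp. All names are invented for the example.

```cpp
#include <array>
#include <string>

// Each coroutine has its own continuation slot and, instead of jumping,
// returns the id of the coroutine it hands control to; run() is the
// dispatcher that performs the "jump". All names are made up.
enum Co { PRODUCER = 0, CONSUMER = 1, HALT = 2 };

struct Machine {
    std::array<int, 2> slot{};  // one saved resume point per coroutine
    int shared = 0;             // value passed along, like the storage via EBP
    int i = 0;                  // producer's counter: must survive transfers
    std::string out;

    // Hands 1, 2, 3 to the consumer, one at a time.
    Co producer() {
        switch (slot[PRODUCER]) {
        case 0:
            for (i = 1; i <= 3; ++i) {
                shared = i;
                slot[PRODUCER] = 1;   // store_slot_for PRODUCER
                return CONSUMER;      // transfer to CONSUMER
        case 1:;
            }
        }
        return HALT;
    }

    // Records each value, then gives control back to the producer.
    Co consumer() {
        switch (slot[CONSUMER]) {
        case 0:
            for (;;) {
                out += std::to_string(shared);
                slot[CONSUMER] = 1;   // store_slot_for CONSUMER
                return PRODUCER;      // transfer to PRODUCER
        case 1:;
            }
        }
        return HALT;
    }

    void run() {
        for (Co next = PRODUCER; next != HALT;)
            next = (next == PRODUCER) ? producer() : consumer();
    }
};
```

After run(), out holds "123". A grammar parser as a third coroutine would mean one more slot, one more id and one more branch in the dispatcher, rather than a more complex macro.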

Crumbs of coroutines

Playing with handmade lexical scanners and parsers, you soon discover how cool it would be to use coroutines; unfortunately, languages like C and C++ don't have such a feature, nor do they have a general mechanism to manage continuations. setjmp/longjmp can be thought of as a starting point, but they may not take you all the way, at least not always.
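A small illustration of where setjmp/longjmp do help: they give an escape continuation, i.e. a way back up to a frame that is still active, which is handy for error recovery in a hand-written parser. They cannot resume a frame that has already returned, which is what switching between coroutines requires (longjmp into a function that has returned is undefined behaviour). The C++ sketch below uses a made-up grammar of digit sums and made-up names.

```cpp
#include <cctype>
#include <csetjmp>

// setjmp/longjmp as an escape continuation: control can jump back to a frame
// that is still active (parse_sum), never into one that has already returned.
// Hypothetical mini-grammar: sums of single digits such as "1+2+3".
static std::jmp_buf on_error;   // one global buffer: not reentrant

static int digit(const char *&p) {
    if (!std::isdigit(static_cast<unsigned char>(*p)))
        std::longjmp(on_error, 1);   // unwind straight back to parse_sum
    return *p++ - '0';
}

static int sum(const char *&p) {
    int total = digit(p);
    while (*p == '+') {
        ++p;
        total += digit(p);
    }
    return total;
}

// Returns the value of the sum, or -1 if the input is malformed.
// Only trivially destructible objects live in the skipped frames, as
// longjmp in C++ requires.
int parse_sum(const char *text) {
    const char *p = text;
    if (setjmp(on_error) != 0)
        return -1;                   // we got here through longjmp
    int value = sum(p);
    return *p == '\0' ? value : -1;
}
```

parse_sum("1+2+3") returns 6, while parse_sum("1+x") returns -1 after jumping straight out of digit and sum.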

2014-04-13

Bison

Today I decided to take a bite of Bison. In my distant past I had already experimented with it (though very much in yacc fashion), and by that I mean I had read a few quick and simple tutorials: nothing more than the omnipresent infix calculator with frills, which by the way is also the main example in the manual, with the enhancements found in mfcalc (hopefully kept up to date in the future). To me the subject (using the tool as well as understanding bits of its inner workings) is vast, deep and really interesting, but this is also why, when experimenting in this world, I was always pushed towards stack-based languages. Stack-based languages can smooth away complexity, which is largely undesired in toy languages. But at the same time they make these toy languages almost alien. Stack-based or not, beyond a certain point a computer language can't do without a tool like Bison, unless you want to do a lot of handcraft yourself: there could be good reasons to do so, but I can't imagine one that fits the world of toy languages.

So, maybe just to make some noise and a vibration here and there, here's the result and, ladies and gentlemen, it is… hold on tight… the omnipresent basic infix calculator! More or less. In fact, you can assign the result of an expression to a symbol and use it later. The lexical scanner reads only from standard input, and… again, hold on tight… you can write 0.5a instead of 0.5*a! I admit it: the MetaFONT book was very influential on me, and so was the MetaFONT language, which is the only language I know that accepts a more natural notation for multiplication. Think about it: 2a is a syntax error in the vast majority of programming languages, at least among the best known. Even languages designed to handle math, mainly R and Octave (and Maxima too!), disallow this syntactic sugar. No big deal, but my very simple infix calculator makes it possible! This is an incredible feature!! (Irony here, of course.)
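One way to get 0.5a: the lexer reads the longest number it can, so 0.5a splits into the number 0.5 and the symbol a, and the grammar treats a factor directly followed by another factor as a product. Here is a hypothetical recursive-descent sketch in C++ of that rule, plus assignment to symbols; it is not the Bison grammar from the gist, and Calc and its methods are names of my own.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

//   line   := NAME '=' expr | expr
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor | factor)*   <- juxtaposition multiplies
//   factor := NUMBER | NAME | '(' expr ')'
// No error recovery: an unknown name throws std::out_of_range,
// a malformed number std::invalid_argument.
class Calc {
public:
    std::map<std::string, double> vars;   // symbol table

    double eval(const std::string &line) {
        s = line;
        i = 0;
        std::size_t eq = s.find('=');
        if (eq == std::string::npos) return expr();
        skip();
        std::string name = ident();       // NAME before '='
        s = s.substr(eq + 1);
        i = 0;
        double v = expr();
        vars[name] = v;
        return v;
    }

private:
    std::string s;
    std::size_t i = 0;

    void skip() { while (i < s.size() && s[i] == ' ') ++i; }
    char peek() { skip(); return i < s.size() ? s[i] : '\0'; }
    static bool alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }

    std::string ident() {
        std::size_t start = i;
        while (i < s.size() && alpha(s[i])) ++i;
        return s.substr(start, i - start);
    }

    double factor() {
        char c = peek();
        if (c == '(') {
            ++i;
            double v = expr();
            if (peek() == ')') ++i;
            return v;
        }
        if (alpha(c)) return vars.at(ident());
        std::size_t used = 0;
        double v = std::stod(s.substr(i), &used);   // "0.5a": reads just "0.5"
        i += used;
        return v;
    }

    double term() {
        double v = factor();
        for (;;) {
            char c = peek();
            if (c == '*') { ++i; v *= factor(); }
            else if (c == '/') { ++i; v /= factor(); }
            else if (alpha(c) || c == '(') v *= factor();   // 0.5a, 2(a - 1)
            else return v;
        }
    }

    double expr() {
        double v = term();
        for (;;) {
            char c = peek();
            if (c == '+') { ++i; v += term(); }
            else if (c == '-') { ++i; v -= term(); }
            else return v;
        }
    }
};
```

After eval("a = 4"), eval("0.5a + 1") gives 3 and eval("2(a - 1)") gives 6.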

If you are interested in these basic things for beginners, and in a complete, messy but working example to play with, you can take it from a gist of mine. I have avoided full C++ style (other examples show the C++ “driver class” approach) and used C++ only where it made things easier (the STL map class; for the rest, C would have sufficed). The next idea will maybe be similar, and it will be about lambda. Indeed, since Easter is near, I started these tests in order to build a toy tool to play with Church's lambda calculus (shame on you, there's already the cool Xyz that you can use very profitably! OK, my plan isn't to be profitable or whatever; I am just playing to keep my last two surviving neurons alive), but I suppose I will be late, as usual.

Final note: on gist, if you give the file a name, you can't choose the highlighting you want. So, since .yy is an unknown extension, I couldn't select C++ highlighting. We live in a world ruled by file extensions rather than by users' will.

2014-02-22

Plans for Golfrun

The C Golfrun interpreter is still stuck. I think there's no reason to hold on to the idea of a full formalization of the language. I still want to rewrite it in C++11, but I don't want to find myself shoehorning this or that C++11 feature in for its own sake. Rather, I will use bits of C++11 when and if they fit; otherwise plain C++, and not too C++-ish either: C++ for the STL rather than for heavy object orientation. Likely the STL will just be a cozy replacement for GLib, and that'll be all.

In the meantime, in my mind, the language has drifted away from GolfScript, and from the current Golfrun too! The changes I have in mind are:
  • no comments: golfed code can do without them; and if you need them, you can use a string that you then drop from the stack. There's a difference, of course: a comment is discarded by the lexical analyzer and never results in a token, while the string becomes a token that is pushed at run time and then dropped. Contrary to common good practice, the rule is: avoid comments! The change frees the symbol # for other magic;
  • strings: use only " as delimiter; strings without escape-sequence interpretation can be added through another syntax, like _"string". This frees another symbol, ';
  • case-insensitive symbols: a change of case can be used to separate symbols, e.g. thisIS yields two tokens, THIS and IS. It could be useful to save some space. A sequence like ThIs will produce 4 tokens: T, H, I, S;
  • maybe, rational numbers: a syntax like 0r13/3 could be used. A number like 0.123 would be written as 0r123/1000, which is 10 characters long against 5. Cumbersome, and bad for golfing in few strokes. Maybe I should accept that a new symbol must be used for this, e.g. 0'123. No: I don't want to make it impossible to duplicate a number without adding extra space(s); e.g. 12.13++ (12, dup, 13, then two additions) must leave 37 on the stack rather than give a stack underflow. So maybe dup must become ' and the dot must go back to its usual meaning as part of number syntax.
    • Or rather, ' could be used as part of the syntax introducing some kind of literal, e.g. '0.123 (where the dot is the decimal separator);
  • the underscore can no longer be part of a symbol, so hey_ and _hey will both result in two symbol tokens. But when followed by a number it will be the unary minus, as in J; so you can write 5_5+ instead of 5 -5+, and - will only be a dyadic operator;
  • the assignment syntax can't be used to assign to single-character non-alphabetic symbols, e.g. {5}:* won't work. Instead, the syntax could denote some sort of symbol modifier, i.e. be read as the token :*. For such symbols the longer assignment syntax will be used (to be defined; it will be similar to the lookup system service).
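Two of these rules are purely lexical and easy to try out. Here is a hypothetical C++ tokenizer sketch covering only the case-splitting of symbols and the J-style underscore; normalizing symbols to upper case is my assumption of how case-insensitivity would be stored, and all names are invented.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

//  * a symbol is a maximal run of letters of the same case, and symbols are
//    case-insensitive (normalized here to upper case): thisIS -> THIS IS;
//  * '_' immediately followed by a digit is the unary minus of that number
//    (5_5+ -> 5 -5 +); otherwise '_' is a one-character symbol.
// Strings, blocks, ':' and the rest of the syntax are left out.
struct Token {
    enum Kind { NUMBER, SYMBOL } kind;
    std::string text;
};

std::vector<Token> tokenize(const std::string &src) {
    auto uc = [&](std::size_t k) { return static_cast<unsigned char>(src[k]); };
    auto digit_at = [&](std::size_t k) { return k < src.size() && std::isdigit(uc(k)); };
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = uc(i);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isdigit(c) || (c == '_' && digit_at(i + 1))) {
            std::string num;
            if (c == '_') { num = "-"; ++i; }      // J-style negative literal
            while (digit_at(i)) num += src[i++];
            out.push_back({Token::NUMBER, num});
        } else if (std::isalpha(c)) {
            bool upper = std::isupper(c) != 0;
            std::string sym;                       // same-case run only
            while (i < src.size() && std::isalpha(uc(i)) && (std::isupper(uc(i)) != 0) == upper)
                sym += static_cast<char>(std::toupper(uc(i++)));
            out.push_back({Token::SYMBOL, sym});
        } else {
            out.push_back({Token::SYMBOL, std::string(1, src[i++])});  // e.g. + _ {
        }
    }
    return out;
}
```

With it, thisIS gives THIS IS, ThIs gives T H I S, and 5_5+ gives the number 5, the number -5 and the symbol +.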
Other syntax changes have already been made, in particular those that allow pushing complex numbers onto the stack and writing the colon symbol (simply by doubling it). The data types are, or will be:
  • numbers
    • integers (arbitrary precision using the GNU Multiprecision arithmetic library)
    • complex integers, i.e. there's a real (integer) part and an imaginary (integer) part
    • rationals (complex or not), maybe
  • strings (of bytes; not C strings, so that they can contain zero bytes as well)
  • blocks (strings after all, but with a different syntax; they can trigger different behaviour in operators)
  • arrays (collections of heterogeneous objects)
  • hashmaps (keys are only strings; these strings can be the string representation of an object)
There will be two stacks: the operand stack and the context stack. Each context “contains” an operand stack and a symbol table. Currently Golfrun can restore the original symbol table through a specific “system service”; this won't be needed anymore, since the context mechanism can be used instead. Basically, contexts provide local variables and local stacks.
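A possible shape for the two stacks, as a hypothetical C++17 sketch: Interp, Context and the policy on leaving a context are assumptions, and only two value types are modelled. Leaving a context automatically brings back the outer symbol table, which is what the current “system service” is needed for.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// A stack of contexts, each owning an operand stack and a symbol table.
// long long stands in for GMP integers; other types are left out.
using Value = std::variant<long long, std::string>;

struct Context {
    std::vector<Value> operands;            // this context's operand stack
    std::map<std::string, Value> symbols;   // this context's definitions
};

class Interp {
public:
    Interp() { contexts.emplace_back(); }   // the global context

    std::vector<Value> &stack() { return contexts.back().operands; }
    void push(Value v) { stack().push_back(std::move(v)); }
    void define(const std::string &name, Value v) {
        contexts.back().symbols[name] = std::move(v);
    }

    // Innermost definitions shadow outer ones.
    const Value &lookup(const std::string &name) const {
        for (auto it = contexts.rbegin(); it != contexts.rend(); ++it) {
            auto found = it->symbols.find(name);
            if (found != it->symbols.end()) return found->second;
        }
        throw std::out_of_range("undefined symbol: " + name);
    }

    void enter() { contexts.emplace_back(); }   // fresh locals and stack

    // Leaving drops the locals; whatever is left on the inner stack is moved
    // onto the outer one (just one possible policy, assumed here).
    void leave() {
        if (contexts.size() == 1) throw std::logic_error("cannot leave the global context");
        Context inner = std::move(contexts.back());
        contexts.pop_back();
        for (auto &v : inner.operands) push(std::move(v));
    }

private:
    std::vector<Context> contexts;
};
```

Defining X as 1, entering a context, redefining X as 2 and leaving again makes X evaluate to 1 once more, while whatever was pushed inside ends up on the outer stack.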

Some extra built-ins will be kept, e.g.
  • dd (as 2dup in Forth); shorter synonym: D
  • sys (“system service”); single symbol ":" (written :: in the syntax) as synonym
  • stack (mainly for debugging: dumps the stack, i.e. the operand stack of the topmost context)
  • sqrt (now it could return rational numbers approximating the result); shorter synonym: ST
  • type (returns the type of the object on top of the stack, without dropping it); shorter synonym: T
Others will be added, e.g. 2swap (Forth) as SS, and rotation of more than 3 objects (a generalized @) as R.
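The semantics of these stack built-ins can be pinned down with a tiny sketch. The C++ below is hypothetical: plain integers stand in for Golfrun objects, and R takes its count as a function argument, whereas Golfrun would presumably take it from the stack.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Three built-ins on a stack of plain integers (top of stack = back):
//   dd, D : a b     -> a b a b    (Forth 2dup)
//   SS    : a b c d -> c d a b    (Forth 2swap)
//   R     : bring the n-th item from the top to the top; with n = 3 it
//           behaves like GolfScript's @ (a b c -> b c a).
using Stack = std::vector<long long>;

static void need(const Stack &s, std::size_t n) {
    if (s.size() < n) throw std::runtime_error("stack underflow");
}

void dd(Stack &s) {
    need(s, 2);
    long long a = s[s.size() - 2], b = s.back();
    s.push_back(a);
    s.push_back(b);
}

void two_swap(Stack &s) {
    need(s, 4);
    std::rotate(s.end() - 4, s.end() - 2, s.end());
}

void rot_n(Stack &s, std::size_t n) {
    need(s, n);
    if (n < 2) return;   // rotating zero or one item changes nothing
    auto first = s.end() - static_cast<std::ptrdiff_t>(n);
    std::rotate(first, first + 1, s.end());
}
```

For example, rot_n(s, 4) on 1 2 3 4 5 leaves 1 3 4 5 2.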

A lot of symbols are now “free”, and others need a defined behaviour for some kinds of arguments. Coercion (implicit conversion) needs clear, easy-to-remember rules.

An example of unclear behaviour: [97 98]""+1/ results in an array of two single-character strings, "a" and "b". How do you go back? With something more elaborate, like {(\;}%. This means that ( (or )) applied to a string behaves as if the string were an array of integers, except that the rest stays a string while the "head" (or "tail") is pushed as an integer. The same happens in GolfScript, where "ab") results in two objects on the stack, the string "a" and the integer 98. To get the head or the tail as a single-character string, some extra work is needed. In the eyes of some operators, a string is an array of integers. That's not wrong, but sometimes the original interpretation is lost, and the coercion makes sense only if I am going to add the result to an integer. Something like Erlang's $a could be desirable (using something other than $).

OK, I think it's time I started coding, without having all this planned in detail; otherwise I'll never start again.