2014-02-22

Plans for Golfrun

The C Golfrun interpreter is stuck, as it was before. I see no reason to hold on to the idea of a full formalization of the language. I still want to rewrite it in C++11, but I don't want to find myself forcing in this or that feature of the language just for the sake of it. Rather, I will use bits of C++11 when and if they fit; otherwise plain C++, and not even too C++-ized: it will be C++ for the STL rather than for heavy object orientation. Likely the STL will just be a cozy replacement for GLib, and that'll be all.

In the meantime the language has drifted and diverged from GolfScript in my mind. And from the current Golfrun too! The changes I have thought about are:
  • no comments: golfed code can do without comments; and if you need one, you can use a string that you then drop from the stack. There's a difference of course: the string is a real token that gets pushed and then dropped at run time, while a comment is discarded while reading the source and never results in a token. In contrast with common good practice, the rule is: avoid comments! The change makes the symbol # available for other magic;
  • strings: use only "; strings without escape-sequence interpretation can be added through another syntax, like _"string". This frees another symbol, ';
  • case-insensitive symbols: a change of case can be used to separate symbols; e.g. thisIS has two tokens, THIS and IS. It could be useful to save some extra space. A sequence like ThIs will produce 4 tokens: T, H, I, S;
  • maybe, rational numbers: a syntax like 0r13/3 could be used. A number like 0.123 would then be written as 0r123/1000, which has length 10 against 5. Cumbersome, and bad for code golfing in few strokes. Maybe I should accept the fact that a new symbol must be spent on this; e.g. 0'123. No, I don't want to make it impossible to duplicate a number without adding extra space(s), e.g. 12.13++ must leave 37 on the stack and not give a stack underflow instead. So maybe dup must become ' and the dot must go back to its common meaning as part of the syntax for numbers.
    • Rather the ' could be used as part of the syntax to introduce some kind of literals, e.g. '0.123 (where the dot is the decimal separator)
  • underscore can't be part of a symbol anymore, so hey_ and _hey will both result in two symbol tokens. But if followed by a number, it will be the unary minus, as in J; so you can write 5_5+ instead of 5 -5+ and the minus will only be a dyadic operator.
  • the assignment syntax can't be used to assign to single-character non-alphabetic symbols, e.g. {5}:* won't work. Instead, the syntax could be used to mean some sort of symbol modifier, i.e. it would be interpreted as the token :*. For those symbols a longer assignment syntax will be used (to be defined; it will be similar to the lookup system service).
Other syntax changes were already made, in particular those that allow pushing complex numbers on the stack and writing the colon symbol (simply by doubling it). Data types are or will be:
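The case-change and underscore rules from the list above could be sketched in C++11 roughly as follows. This is only an illustration under stated assumptions: the function name tokenize, the choice of returning tokens as plain strings, and the decision to ignore strings, blocks and every other part of the syntax are made up here and are not part of Golfrun.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Golfrun: splits a source string into
// tokens using only the case-change rule and the underscore rule.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    // True when position p exists and holds a decimal digit.
    auto digit_at = [&src](std::size_t p) {
        return p < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[p])) != 0;
    };
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens
        } else if (std::isalpha(c)) {
            // A symbol is a run of letters of the same case,
            // stored in upper case (symbols are case-insensitive).
            const bool upper = std::isupper(c) != 0;
            std::string sym;
            while (i < src.size()) {
                const unsigned char d = static_cast<unsigned char>(src[i]);
                if (!std::isalpha(d) || (std::isupper(d) != 0) != upper) break;
                sym += static_cast<char>(std::toupper(d));
                ++i;
            }
            tokens.push_back(sym);
        } else if (digit_at(i) || (c == '_' && digit_at(i + 1))) {
            // An underscore directly followed by a digit is the unary minus.
            std::string num;
            if (c == '_') { num += '-'; ++i; }
            while (digit_at(i)) num += src[i++];
            tokens.push_back(num);
        } else {
            // Anything else (including a lone underscore) is a one-character symbol.
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

Under these assumptions thisIS yields THIS and IS, ThIs yields T, H, I and S, and 5_5+ yields 5, -5 and +.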
  • numbers
    • integers (arbitrary precision using the GNU Multiprecision arithmetic library)
    • complex integers, i.e. there's a real (integer) part and an imaginary (integer) part
    • rationals (complex or not), maybe
  • strings (of bytes; not C strings, so that they can contain zero bytes as well)
  • blocks: they are strings after all, but with a different syntax, and they can trigger different behaviour in operators
  • arrays (collections of heterogeneous objects)
  • hashmaps (keys are only strings; these strings can be the string representation of an object)
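As a rough illustration, the data types listed above could be held in a single tagged value. Everything named here (Type, Value, ValuePtr, the make_* helpers, type_name) is hypothetical, long long stands in for a GMP arbitrary precision integer so that the sketch needs no external library, and rationals are left out since they are only a "maybe".

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

enum class Type { Integer, Complex, String, Block, Array, Hash };

struct Value;
typedef std::shared_ptr<Value> ValuePtr;

// Hypothetical tagged value; only the fields matching `type` are meaningful.
struct Value {
    Type type = Type::Integer;
    long long re = 0;   // stand-in for a GMP integer (assumption)
    long long im = 0;   // imaginary part, used only by Type::Complex
    std::string bytes;  // String and Block payload; may contain zero bytes
    std::vector<ValuePtr> items;            // Array payload (heterogeneous)
    std::map<std::string, ValuePtr> table;  // Hash payload (string keys only)
};

ValuePtr make_value(Type t) {
    ValuePtr v = std::make_shared<Value>();
    v->type = t;
    return v;
}

ValuePtr make_integer(long long n) {
    ValuePtr v = make_value(Type::Integer);
    v->re = n;
    return v;
}

ValuePtr make_complex(long long re, long long im) {
    ValuePtr v = make_value(Type::Complex);
    v->re = re;
    v->im = im;
    return v;
}

ValuePtr make_string(const std::string& s) {
    ValuePtr v = make_value(Type::String);
    v->bytes = s;
    return v;
}

// A block stores its source like a string; only the tag differs.
ValuePtr make_block(const std::string& code) {
    ValuePtr v = make_value(Type::Block);
    v->bytes = code;
    return v;
}

ValuePtr make_array(const std::vector<ValuePtr>& items) {
    ValuePtr v = make_value(Type::Array);
    v->items = items;
    return v;
}

const char* type_name(const Value& v) {
    switch (v.type) {
        case Type::Integer: return "integer";
        case Type::Complex: return "complex";
        case Type::String:  return "string";
        case Type::Block:   return "block";
        case Type::Array:   return "array";
        case Type::Hash:    return "hash";
    }
    return "unknown";
}
```

Because std::string carries its own length, make_string(std::string("a\0b", 3)) keeps all three bytes, which is the property asked of strings above.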
There will be two stacks: the operand stack and the context stack. The context stack "contains" the operand stack and the symbol table, i.e. each context carries its own. Currently, Golfrun can restore the original symbol table using a specific “system service”. This won't be needed anymore, since the context mechanism can be used instead. The context basically provides local variables and a local stack.
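One possible reading of the context stack, as a sketch only: each context owns an operand stack and a symbol table, entering a context gives a fresh pair, and leaving it restores the previous one. The names Context, ContextStack, enter, leave, define and lookup are hypothetical, values are simplified to long long and std::string, and the rule that lookup falls back to outer contexts is an assumption the text above does not settle.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical context: a local operand stack plus a local symbol table.
struct Context {
    std::vector<long long> operands;             // simplified operand stack
    std::map<std::string, std::string> symbols;  // symbol name -> definition
};

class ContextStack {
public:
    // Start with one global context that can never be left.
    ContextStack() : contexts_(1) {}

    void enter() { contexts_.push_back(Context()); }

    // Drops the topmost context; returns false if only the global one remains.
    bool leave() {
        if (contexts_.size() <= 1) return false;
        contexts_.pop_back();
        return true;
    }

    Context& top() { return contexts_.back(); }

    // Definitions always go into the topmost context (local variables).
    void define(const std::string& name, const std::string& body) {
        top().symbols[name] = body;
    }

    // Assumption: lookup walks from the topmost context down to the global one.
    bool lookup(const std::string& name, std::string& out) const {
        for (auto it = contexts_.rbegin(); it != contexts_.rend(); ++it) {
            auto found = it->symbols.find(name);
            if (found != it->symbols.end()) {
                out = found->second;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<Context> contexts_;
};
```

With this layout, restoring the original symbol table is just a matter of leaving the context in which a symbol was redefined.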

Some extra built-ins will be kept, e.g.
  • dd (as 2dup in Forth); shorter synonym: D
  • sys (“system service”); single symbol ":" (written :: in the syntax) as synonym
  • stack (mainly for debugging: dumps the stack, i.e. the one associated with the topmost context)
  • sqrt (now it could return rational numbers approximating the result); shorter synonym: ST
  • type (returns the type of the object on the stack, without dropping it); shorter synonym: T
Others will be added, e.g. 2swap (as in Forth) as SS, and R, a rotation of more than 3 objects (a generalization of @).
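The stack shuffling words mentioned above can be sketched on a plain vector. The function names dup2, swap2 and rotate_n are hypothetical, the stack holds long long instead of real Golfrun objects, and passing the rotation depth as a C++ argument is an assumption, since how R receives it is not defined here.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef std::vector<long long> Stack;  // simplified operand stack, top at the back

// Throws when fewer than n objects are available.
void require(const Stack& s, std::size_t n) {
    if (s.size() < n) throw std::runtime_error("stack underflow");
}

// dd / D, like Forth 2dup: ( a b -- a b a b )
void dup2(Stack& s) {
    require(s, 2);
    const long long a = s[s.size() - 2];
    const long long b = s[s.size() - 1];
    s.push_back(a);
    s.push_back(b);
}

// SS, like Forth 2swap: ( a b c d -- c d a b )
void swap2(Stack& s) {
    require(s, 4);
    std::swap_ranges(s.end() - 4, s.end() - 2, s.end() - 2);
}

// R: moves the n-th object from the top to the top, so that
// n == 3 behaves like @ : ( a b c -- b c a )
void rotate_n(Stack& s, std::size_t n) {
    if (n < 2) return;  // nothing to rotate
    require(s, n);
    const std::ptrdiff_t k = static_cast<std::ptrdiff_t>(n);
    std::rotate(s.end() - k, s.end() - k + 1, s.end());
}
```

For instance, rotate_n with depth 4 turns 1 2 3 4 into 2 3 4 1.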

A lot of symbols are now "free", and others still need a defined behaviour for certain kinds of arguments. Coercion and implicit conversions need a clear, easy-to-remember rule.

An example of unclear behaviour is: [97 98]""+1/ will result in an array with two single-character strings, "a" and "b". How do you go back? With something more elaborate, such as {(\;}%. This means that ( (or )) over a string will behave as if the string were an array of integers, except that the "head" (or "tail") is pushed as an integer, while the rest of the string remains a string. This is the same in GolfScript, where "ab") will result in two objects on the stack, the string "a" and the integer 98. To get the "head" or the "tail" as a single character, we need some extra work. In the eyes of some operators, a string is an array of integers. Which is not wrong, but sometimes the information about the original interpretation is lost, and the coercion makes sense only if I am going to sum the result with an integer. Something like Erlang's $a could be desirable (not using $ but something else instead).
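The behaviour described above can be mimicked in a few lines, just to make the loss of the "character" interpretation visible. All names (pop_head, pop_tail, split_chars, heads, pop_tail_char) are hypothetical, and pop_tail_char is only one guess at what a character-preserving variant could return.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Like ")" on a string: the shortened string plus the last byte as an integer.
std::pair<std::string, int> pop_tail(const std::string& s) {
    if (s.empty()) throw std::invalid_argument("pop_tail: empty string");
    const int code = static_cast<unsigned char>(s[s.size() - 1]);
    return std::make_pair(s.substr(0, s.size() - 1), code);
}

// Like "(" on a string: the shortened string plus the first byte as an integer.
std::pair<std::string, int> pop_head(const std::string& s) {
    if (s.empty()) throw std::invalid_argument("pop_head: empty string");
    const int code = static_cast<unsigned char>(s[0]);
    return std::make_pair(s.substr(1), code);
}

// Like "1/" on a string: an array of single-character strings.
std::vector<std::string> split_chars(const std::string& s) {
    std::vector<std::string> parts;
    for (std::size_t i = 0; i < s.size(); ++i) {
        parts.push_back(std::string(1, s[i]));
    }
    return parts;
}

// Like mapping the block {(\;} over that array: take the head of each
// string, keep the integer and drop the remaining (empty) string.
std::vector<int> heads(const std::vector<std::string>& parts) {
    std::vector<int> codes;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        codes.push_back(pop_head(parts[i]).second);
    }
    return codes;
}

// Hypothetical variant that keeps the tail as a one-character string.
std::pair<std::string, std::string> pop_tail_char(const std::string& s) {
    if (s.empty()) throw std::invalid_argument("pop_tail_char: empty string");
    return std::make_pair(s.substr(0, s.size() - 1), s.substr(s.size() - 1));
}
```

Here heads(split_chars("ab")) gives back the integers 97 and 98, while pop_tail("ab") gives the pair "a" and 98, matching the GolfScript behaviour quoted above.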

Ok, I think it's time I started coding, without having all of this planned in detail first; otherwise I will never start again.
