2019-04-14

Nested functions

Some languages allow nested functions (or procedures, or subroutines, or whatever you prefer to call them), others don't.

I think those can be useful and I wonder why they aren't embraced by all living programming languages.

The example are made thinking of this scenario: you have a structure (a compound type, a record, a…), and an array of this structure/type, and you want to count the unique elements of this array using a very specific idea of equality (and less than…) which holds only for that purpose. That is, outside this specific count unique elements function, a different rule to compare elements exists (where two elements are equal if each field of the elements is equal).

What about C?

ISO C forbids nested functions. GNU C allows them, unless you specify the -pedantic or -Wpedantic option, in which case you have a warning that you can turn into an error by adding -Werror.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int x[] = {1, 2, 3, 4, 5, 10, -4};
    const size_t n = sizeof (x) / sizeof (x[0]);

    const int shift = 1;

    int invcmp(const void *a, const void *b)
    {
        const int *i = a;
        const int *j = b;
        printf("INSIDE %zu\n", n);
        return *j - *i + shift;
    }

    printf("** %d\n", __STDC__);
    qsort(x, n, sizeof (x[0]), invcmp);

    size_t i = 0;
    for (i = 0; i < n; ++i)
        printf("%d\n", x[i]);

    return 0;
}

In this simple code shift and n are used just to show that the nested function can see them. Anything inside the function invcmp, instead, is out of scope outside it, of course. (Beware: it's not a closure! We don't need closures for this, anyway.)

The following example shows how you would do it in C, without nested functions.

This is the main:

#include "elements.h"

#include <stdio.h>

int main(void)
{
    element s[] = {
                   { "zak", 2, 0 },
                   { "mac", 3, 1 },
                   { "abc", 4, 2 },
                   { "zak", 5, 0 },
                   { NULL, -1, -1 }
    };

    printf("%zu\n", count_unique(s));

    return 0;
}

In order to avoid “exposing” the specific idea of equality needed by the count_unique() function to the code actually using count_unique(), you put it in a file of its own (a compilation unit).

In the place where you use count_unique(), the compare() function (see elements.c) isn't visible. You can't achieve this if you put count_unique() in the same file of the function using it — namely the file containing main(), in this case.

In elements.c, even if the two functions compare() and count_unique() are very close and work together, compare() can't access to anything visible to count_unique() — except for global variables and static symbols in the same compilation unit (I don't remember how it is called… top scope? outermost scope of the compilation unit?).

Files elements.h and elements.c can be found in this gist on github.

What about C++?

C++ suffers from the same problems as C. But modern C++ has lambda functions, which provide exactly for what we are looking for, even if with a different syntax.

size_t count_uniques(const std::vector<SElement> v)
{
    size_t c = 0;
    bool found;

    auto compare = [&v](int i, int j) -> int {
                       if (v[i].e2 == v[j].e2)
                           return 0;
                       else if (v[i].e1 != v[j].e1)
                           return v[j].e1 - v[i].e1;
                       else
                           return v[j].e2 - v[i].e2;
                   };
    
    for (size_t i = 0; i < v.size(); ++i) {
        found = false;
        for (size_t j = 0; j < i; ++j) {
            if (compare(i, j) == 0) {
                found = true;
                break;
            }
        }
        if (!found) ++c;
    }
    return c;    
}

With this tool we can write compare() using indexes instead of references to the elements.

The SElement is something like this (from the header file):

//...

struct SElement
{
    std::string s;
    int e1;
    int e2;
};

size_t count_uniques(const std::vector<SElement> v);

// ...

The complete example can be found as a gist on github.

What about Fortran?

Fortran allows for one level of nested functions or subroutine, and only in a contains of a module. For the common usage of a nested function, it makes sense — I suppose that too much nesting can be a sign of bad programming practice.

The idea is the same already used for the C++ example:

! ...
  function count_unique(s) result(c)
    implicit none
    integer :: c
    type(element), dimension(:), intent(in) :: s


    integer :: i, j
    logical :: found

    c = 0
    do i = 1, size(s)
       found = .false.
       do j = 1, i-1
          if (comp(i, j) == 0) then
             found = .true.
             exit
          end if
       end do
       if (.not. found) c = c + 1
    end do
    
  contains
  
    function comp(i, j) result(d)
      implicit none
      integer :: d
      integer, intent(in) :: i, j

      if (s(i)%e2 == s(j)%e2) then
         d = 0
      else if (s(i)%e1 /= s(j)%e1) then
         d = s(j)%e1 - s(i)%e1
      else
         d = s(j)%e2 - s(i)%e2
      end if
    end function comp
  end function count_unique

! ...

The complete example can be found as a gist on github. It won't compile with Fortran 95 or older versions.

The code for count_unique() must stay in its module file, so that from the point of view of the file organizations, it is more like C. The advantage here is the ability of the nested function to see the variables of its container, hence allowing us to use indexes (as in the C++ version) instead of references or pointers (as in the C version).

What about Haskell?

Haskell is a different beast, but anyway it is useful to take a look at it. Of course, it allows what we could call nested functions — but I believe this isn't the correct terminology.

The example can be written like this:

countUnique :: [Element] -> Int
countUnique [] = 0
countUnique (x:xs) | x `foundIn` xs = countUnique xs
                   | otherwise = 1 + countUnique xs
  where foundIn :: Element -> [Element] -> Bool
        foundIn x [] = False
        foundIn x xs = any (== 0) $ map (compare x) xs 
          where compare :: Element -> Element -> Int
                compare a b | (e2 a) == (e2 b) = 0
                            | (e1 a) /= (e1 b) = (e1 b) - (e1 a)
                            | otherwise = (e2 b) - (e2 a)

The where can be thought as the starts of the definition of the “nested” function(s). Here we have already two levels, and I suppose there's no limits.

You can also use let … in, even if this is an expression (see Let vs. Where), or mix let … in and where as in the next example:

countUnique2 :: [Element] -> Int
countUnique2 [] = 0
countUnique2 (x:xs) = let foundIn :: Element -> [Element] -> Bool
                          foundIn x [] = False
                          foundIn x xs = any (== 0) $ map (compare x) xs
                            where compare :: Element -> Element -> Int
                                  compare a b | (e2 a) == (e2 b) = 0
                                              | (e1 a) /= (e1 b) = (e1 b) - (e1 a)
                                              | otherwise = (e2 b) - (e2 a)
                      in
                        case () of _
                                     | x `foundIn` xs -> countUnique2 xs
                                     | otherwise -> 1 + countUnique2 xs

In general, I think you choose what it seems more expressive to explain what the code does.

In this Haskell example you can read a

           deriving (Show, Eq)

That is, the type has “embedded” its own obvious idea of equality. In C++ we could so something similar overloading the required operators, and in Fortran too. In C, we can't: we need to be more explicit and write other functions.

The test code is:

import Counter

main = do print $ countUnique2 [ Element "zak" 2 0,
                                 Element "mac" 3 1,
                                 Element "abc" 4 2,
                                 Element "zak" 5 0 ]

and the full code can be found in this gist.

Note: the functions in where can “look back”; e.g., foundIn could be written with a single argument, and compare too, because the first argument of these two functions is in fact the x which is the head of the list passed as argument to countUnique (see the unexported countUnique3 in the gist). The functions in let … in too can “look back”, but of course they can't “look forward”.

What about Ada?

Ada accepts nesting as you please.

   function Count_Unique (S : Elements) return Integer
   is
      Found : Boolean;
      C     : Integer := 0;
      
      function Compare (I, J : Integer) return Integer
      is
      begin
         if S(I).E2 = S(J).E2 then
            return 0;
         elsif S(I).E1 /= S(J).E1 then
            return S(J).E1 - S(I).E1;
         else
            return S(J).E2 - S(I).E2;
         end if;
      end;
      
   begin
      for I in S'Range loop
         Found := False;
         for J in S'First .. I - 1 loop
            if Compare(I, J) = 0 then
               Found := True;
               exit;
            end if;
         end loop;
         if not Found then
            C := C + 1;
         end if;
      end loop;
      return C;
   end Count_Unique;

This function could be put in the declarative part of the “main” procedure, hence making it an example of nesting with depth 3:

procedure CTest is

   function Count_Unique (S : Elements) return Integer
   -- ...
   end Count_Unique;   

   S : Elements := (
                    ("zak", 2, 0),
                    ("mac", 3, 1),
                    ("abc", 4, 2),
                    ("zak", 5, 0)
                   );
begin
   Put (Count_Unique (S));
   New_Line;
end CTest;

In the gist I've put it in its own file. Note that in the specification of course there's no clue about the fact that Count_Unique uses a helper function to achieve its purpose.

What about other languages?

Every language which has lambda but doesn't allow nested functions, can be considered like C++ (modern C++). If the language allows nested functions and has also lambda, which one to use? It depends.

Python

An example of a language that has lambdas and also allows nested functions defines is Python.

I don't know if those defs can be considered functions or they are methods of an anonymous class, like in Ruby if I remember correctly.

Anyway they can “see” the variables of the outer scopes.

Maybe using lambdas is better. But Python's lambda must be defined as expressions, so we need to write a thing like this:

    comp = lambda i, j: \
        0 \
        if a[i].e2 == a[j].e2 \
        else ( \
              a[j].e1 - a[i].e1 \
              if a[i].e1 != a[j].e1 \
              else a[j].e2 - a[i].e2 \
        )

Ruby

In Ruby it is possibile to nest defs, but the nested def behaves almost like a function defined outside. The scope of its name with respect to outside the “container” is different, but the problem here is that it doesn't see the variables of the container (said differently, it's not lexically scoped).

def a
  z = 1
  def b
    puts z # ERROR
  end
  b
  puts z
end

a

Luckly lambdas are nicer than Python and our example can be written with those:

# ...
def unique_count(a)
  comp = lambda do |i, j|
    if a[i].e2 == a[j].e2
      return 0
    elsif a[i].e1 != a[j].e1
      return a[j].e1 - a[i].e1
    else
      return a[j].e2 - a[i].e2
    end
  end
  # ...
  return c
end

The only quirk is that we need to write comp.call(i, j), not just comp(i, j).

Conclusions

For the purpose of this article, I would say that only the following languages have nested function (uniform syntax, lexical scoping):

  • C (only GNU extension)
  • Fortran
  • Ada
  • Python (?)

Those which can use lambda, can achieve the same expressiveness, maybe, but I don't consider them languages with nested functions.

  • C++
  • Ruby

Haskell belongs to another world, so I don't know if it makes sense to consider it for the exploration of this article — it was used to make an example.

No comments:

Post a Comment