2019-04-14

Nested functions

Some languages allow nested functions (or procedures, or subroutines, or whatever you prefer to call them), others don't.

I think those can be useful and I wonder why they aren't embraced by all living programming languages.

The example are made thinking of this scenario: you have a structure (a compound type, a record, a…), and an array of this structure/type, and you want to count the unique elements of this array using a very specific idea of equality (and less than…) which holds only for that purpose. That is, outside this specific count unique elements function, a different rule to compare elements exists (where two elements are equal if each field of the elements is equal).

ISO C forbids nested functions. GNU C allows them, unless you specify the -pedantic or -Wpedantic option, in which case you have a warning that you can turn into an error by adding -Werror.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
int x[] = {1, 2, 3, 4, 5, 10, -4};
const size_t n = sizeof (x) / sizeof (x[0]);

const int shift = 1;

int invcmp(const void *a, const void *b)
{
const int *i = a;
const int *j = b;
printf("INSIDE %zu\n", n);
return *j - *i + shift;
}

printf("** %d\n", __STDC__);
qsort(x, n, sizeof (x[0]), invcmp);

size_t i = 0;
for (i = 0; i < n; ++i)
printf("%d\n", x[i]);

return 0;
}

In this simple code shift and n are used just to show that the nested function can see them. Anything inside the function invcmp, instead, is out of scope outside it, of course. (Beware: it's not a closure! We don't need closures for this, anyway.)

The following example shows how you would do it in C, without nested functions.

This is the main:

#include "elements.h"

#include <stdio.h>

int main(void)
{
element s[] = {
{ "zak", 2, 0 },
{ "mac", 3, 1 },
{ "abc", 4, 2 },
{ "zak", 5, 0 },
{ NULL, -1, -1 }
};

printf("%zu\n", count_unique(s));

return 0;
}

In order to avoid “exposing” the specific idea of equality needed by the count_unique() function to the code actually using count_unique(), you put it in a file of its own (a compilation unit).

In the place where you use count_unique(), the compare() function (see elements.c) isn't visible. You can't achieve this if you put count_unique() in the same file of the function using it — namely the file containing main(), in this case.

In elements.c, even if the two functions compare() and count_unique() are very close and work together, compare() can't access to anything visible to count_unique() — except for global variables and static symbols in the same compilation unit (I don't remember how it is called… top scope? outermost scope of the compilation unit?).

Files elements.h and elements.c can be found in this gist on github.

C++ suffers from the same problems as C. But modern C++ has lambda functions, which provide exactly for what we are looking for, even if with a different syntax.

size_t count_uniques(const std::vector<SElement> v)
{
size_t c = 0;
bool found;

auto compare = [&v](int i, int j) -> int {
if (v[i].e2 == v[j].e2)
return 0;
else if (v[i].e1 != v[j].e1)
return v[j].e1 - v[i].e1;
else
return v[j].e2 - v[i].e2;
};

for (size_t i = 0; i < v.size(); ++i) {
found = false;
for (size_t j = 0; j < i; ++j) {
if (compare(i, j) == 0) {
found = true;
break;
}
}
if (!found) ++c;
}
return c;
}

With this tool we can write compare() using indexes instead of references to the elements.

The SElement is something like this (from the header file):

//...

struct SElement
{
std::string s;
int e1;
int e2;
};

size_t count_uniques(const std::vector<SElement> v);

// ...

The complete example can be found as a gist on github.

Fortran allows for one level of nested functions or subroutine, and only in a contains of a module. For the common usage of a nested function, it makes sense — I suppose that too much nesting can be a sign of bad programming practice.

The idea is the same already used for the C++ example:

! ...
function count_unique(s) result(c)
implicit none
integer :: c
type(element), dimension(:), intent(in) :: s

integer :: i, j
logical :: found

c = 0
do i = 1, size(s)
found = .false.
do j = 1, i-1
if (comp(i, j) == 0) then
found = .true.
exit
end if
end do
if (.not. found) c = c + 1
end do

contains

function comp(i, j) result(d)
implicit none
integer :: d
integer, intent(in) :: i, j

if (s(i)%e2 == s(j)%e2) then
d = 0
else if (s(i)%e1 /= s(j)%e1) then
d = s(j)%e1 - s(i)%e1
else
d = s(j)%e2 - s(i)%e2
end if
end function comp
end function count_unique

! ...

The complete example can be found as a gist on github. It won't compile with Fortran 95 or older versions.

The code for count_unique() must stay in its module file, so that from the point of view of the file organizations, it is more like C. The advantage here is the ability of the nested function to see the variables of its container, hence allowing us to use indexes (as in the C++ version) instead of references or pointers (as in the C version).

Haskell is a different beast, but anyway it is useful to take a look at it. Of course, it allows what we could call nested functions — but I believe this isn't the correct terminology.

The example can be written like this:

countUnique :: [Element] -> Int
countUnique [] = 0
countUnique (x:xs) | x foundIn xs = countUnique xs
| otherwise = 1 + countUnique xs
where foundIn :: Element -> [Element] -> Bool
foundIn x [] = False
foundIn x xs = any (== 0) $map (compare x) xs where compare :: Element -> Element -> Int compare a b | (e2 a) == (e2 b) = 0 | (e1 a) /= (e1 b) = (e1 b) - (e1 a) | otherwise = (e2 b) - (e2 a) The where can be thought as the starts of the definition of the “nested” function(s). Here we have already two levels, and I suppose there's no limits. You can also use let … in, even if this is an expression (see Let vs. Where), or mix let … in and where as in the next example: countUnique2 :: [Element] -> Int countUnique2 [] = 0 countUnique2 (x:xs) = let foundIn :: Element -> [Element] -> Bool foundIn x [] = False foundIn x xs = any (== 0)$ map (compare x) xs
where compare :: Element -> Element -> Int
compare a b | (e2 a) == (e2 b) = 0
| (e1 a) /= (e1 b) = (e1 b) - (e1 a)
| otherwise = (e2 b) - (e2 a)
in
case () of _
| x foundIn xs -> countUnique2 xs
| otherwise -> 1 + countUnique2 xs

In general, I think you choose what it seems more expressive to explain what the code does.

           deriving (Show, Eq)

That is, the type has “embedded” its own obvious idea of equality. In C++ we could so something similar overloading the required operators, and in Fortran too. In C, we can't: we need to be more explicit and write other functions.

The test code is:

import Counter

main = do print \$ countUnique2 [ Element "zak" 2 0,
Element "mac" 3 1,
Element "abc" 4 2,
Element "zak" 5 0 ]

and the full code can be found in this gist.

Note: the functions in where can “look back”; e.g., foundIn could be written with a single argument, and compare too, because the first argument of these two functions is in fact the x which is the head of the list passed as argument to countUnique (see the unexported countUnique3 in the gist). The functions in let … in too can “look back”, but of course they can't “look forward”.

   function Count_Unique (S : Elements) return Integer
is
Found : Boolean;
C     : Integer := 0;

function Compare (I, J : Integer) return Integer
is
begin
if S(I).E2 = S(J).E2 then
return 0;
elsif S(I).E1 /= S(J).E1 then
return S(J).E1 - S(I).E1;
else
return S(J).E2 - S(I).E2;
end if;
end;

begin
for I in S'Range loop
Found := False;
for J in S'First .. I - 1 loop
if Compare(I, J) = 0 then
Found := True;
exit;
end if;
end loop;
C := C + 1;
end if;
end loop;
return C;
end Count_Unique;

This function could be put in the declarative part of the “main” procedure, hence making it an example of nesting with depth 3:

procedure CTest is

function Count_Unique (S : Elements) return Integer
-- ...
end Count_Unique;

S : Elements := (
("zak", 2, 0),
("mac", 3, 1),
("abc", 4, 2),
("zak", 5, 0)
);
begin
Put (Count_Unique (S));
New_Line;
end CTest;

In the gist I've put it in its own file. Note that in the specification of course there's no clue about the fact that Count_Unique uses a helper function to achieve its purpose.

Every language which has lambda but doesn't allow nested functions, can be considered like C++ (modern C++). If the language allows nested functions and has also lambda, which one to use? It depends.

Python

An example of a language that has lambdas and also allows nested functions defines is Python.

I don't know if those defs can be considered functions or they are methods of an anonymous class, like in Ruby if I remember correctly.

Anyway they can “see” the variables of the outer scopes.

Maybe using lambdas is better. But Python's lambda must be defined as expressions, so we need to write a thing like this:

    comp = lambda i, j: \
0 \
if a[i].e2 == a[j].e2 \
else ( \
a[j].e1 - a[i].e1 \
if a[i].e1 != a[j].e1 \
else a[j].e2 - a[i].e2 \
)

Ruby

In Ruby it is possibile to nest defs, but the nested def behaves almost like a function defined outside. The scope of its name with respect to outside the “container” is different, but the problem here is that it doesn't see the variables of the container (said differently, it's not lexically scoped).

def a
z = 1
def b
puts z # ERROR
end
b
puts z
end

a

Luckly lambdas are nicer than Python and our example can be written with those:

# ...
def unique_count(a)
comp = lambda do |i, j|
if a[i].e2 == a[j].e2
return 0
elsif a[i].e1 != a[j].e1
return a[j].e1 - a[i].e1
else
return a[j].e2 - a[i].e2
end
end
# ...
return c
end

The only quirk is that we need to write comp.call(i, j), not just comp(i, j).

Conclusions

For the purpose of this article, I would say that only the following languages have nested function (uniform syntax, lexical scoping):

• C (only GNU extension)
• Fortran