Geek Blight - Wrong fact about calling C code from Ada

Posted on 2016-06-21T17:24Z. Updated on 2020-07-25T19:13Z.

I’ve been working on Ada code for several years now and I noticed I rarely blog about it, if at all. There are several reasons for that. The main one is that I don’t consider myself an Ada expert. I don’t use it for personal projects, and when I have to work with it, it’s about maintaining or adapting an existing huge code base that started its life as Ada 83. Modernizing it is usually out of the question due to budget and time constraints, which also limits my knowledge of modern Ada. Second, I’m totally out of touch with the Ada community.

Anyway, today I’ll write about a very specific detail of Ada that some important websites and resources get factually wrong, in the hope that search engines will pick up this page and show it to anyone searching for details on the topic. For everybody else, this post is very obscure. So here’s the thing: you cannot, in general, ignore the return value of C functions when calling them from Ada by importing them as procedures.

But before I dive into it, let me give you a very brief opinion on Ada. What I don’t like: its syntax, the tooling support, its IO library, and its weird object system that’s being slowly fixed and dragged into mainline with each new version of the language. What I like: the runtime checks for almost everything, which slow execution a bit but provide a lot of safety that’s missing in C or C++. In Ada, speed is important but not as important as safety. I also like the great way you can pass information between tasks (i.e. threads). It’s very powerful and expressive and probably deserves its own post so people who don’t know Ada can get an idea.

Back to the main topic. A lesser-known part of the Ada language allows you to call C code from your Ada code. You can do the reverse too. The main purpose is to keep your codebase smaller. For example, you may want to have a library of routines that you could use from Ada programs and C++ programs. It’s easy to write the common part in careful C and use that same code from both Ada and C++. Or sometimes you want to access a low-level feature of the operating system that’s not exposed in any Ada package, standard or not, but is, of course, exposed as C. You’d like to write a wrapper to access those routines from Ada.

The general way is to declare a function or procedure (in Ada they’re separated but I consider the difference impractical) with an appropriate prototype. Then, you tell the compiler it’s actually implemented in C, indicating the symbol it should look for. An example: let’s suppose you have the following C function.

int add(int a, int b);

The equivalent Ada prototype is:

function Add (a : in Integer; b : in Integer) return Integer;

Below the declaration, you indicate it’s actually implemented in C.

pragma Import(C, Add, "add");

And you’re basically done as long as you include the object code for “add” in the executable during the final link step. Do note that the code above generally works but to be 100% safe, you should use a slightly different prototype.

with Interfaces.C;
function Add (a : in Interfaces.C.int;
              b : in Interfaces.C.int)
    return Interfaces.C.int;
pragma Import(C, Add, "add");

The Interfaces.C package provides types that match the size and range of their C counterparts, so Interfaces.C.int on the Ada side is guaranteed to correspond to int on the C side.
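For reference, the C side of this example could be as small as the following sketch. The file name add.c is an assumption made for illustration; the post only shows the prototype.

```c
/* add.c: hypothetical implementation of the "add" function imported above.
   The symbol name "add" must match the string given to pragma Import. */
int add(int a, int b)
{
   return a + b;
}
```

Compiling it with gcc -c add.c should produce an add.o object file that can be handed to the linker with -largs add.o, the same way the test program later in this post is built.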

Now let’s suppose the function does something significant but the return value is not important to you. For example, some C functions return values by writing them to output arguments and, at the same time, as the return value of the function. Or, for example, see the “dangerous” (by Ada standards) memmove() C function, which always returns its first argument to easily chain calls should you need to do it.
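To illustrate the memmove() case, here is a minimal sketch in C. The helper names are hypothetical; the only library call used is the standard memmove() from string.h.

```c
#include <stddef.h>
#include <string.h>

/* Copies n bytes from src to dst and reports whether memmove() handed
   back its first argument, as the C standard says it does. */
int memmove_returns_dest(char *dst, const char *src, size_t n)
{
   void *result = memmove(dst, src, n);
   return result == (void *) dst;
}

/* The same copy with the return value silently dropped, which is
   perfectly legal in C. */
void copy_ignoring_result(char *dst, const char *src, size_t n)
{
   memmove(dst, src, n);
}
```

The second helper is the kind of call that has no direct equivalent in Ada when the routine is declared as a function.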

With our previous “add” function, we could write the following in C:

add(2, 3);

However, in Ada you cannot write the same code. The compiler won’t just spit out a warning. It will be considered an error.

interfaces_test.adb:8:04: cannot use function "add" in a procedure call
gnatmake: "interfaces_test.adb" compilation error

This would force you into declaring a variable to hold the return value. If you don’t use it for anything else, the compiler pesters you about it as soon as you compile with warnings enabled. See this example code:

$ cat interfaces_test.adb
procedure interfaces_test is
   function add (a, b: integer) return integer is
   begin
      return (a+b);
   end add;

   ret : integer;
begin
   ret := add(2, 3);
end interfaces_test;

$ gnatmake -Wall interfaces_test.adb
gcc -c -Wall interfaces_test.adb
interfaces_test.adb:7:04: warning: variable "ret" is assigned but never read
interfaces_test.adb:9:04: warning: useless assignment to "ret", value never referenced
gnatbind -x interfaces_test.ali
gnatlink interfaces_test.ali

Some notable resources you can find on the Internet, like the examples at the end of Annex B, section B.3 of the Ada Reference Manual (essentially the official language specification) or section 16.1 of The Big Book of Linux Ada Programming, seem to indicate you can ignore the returned value by declaring the function as a procedure from Ada, like this:

procedure Add (a, b : in Interfaces.C.int);

The first time I read it, it sounded suspicious at best. The Ada compiler will not check what the proper function or procedure prototype is because the symbol is only looked for when linking the final executable. I guess it would be technically doable but it would rely on second-guessing what the machine code for the function is doing. The Ada compiler relies on you to declare the right prototype.

Being barely familiar with assembly code and only having some general ideas about the low-level details of argument passing and value returning in C, I decided to investigate. I found a very clear explanation of the stack layout in C and Linux running on x86, and I supposed it was very similar for x86_64, replacing EAX with RAX among other minor changes. Compiling on x86_64 Linux using GNAT is, I guess, a very common scenario for modern Ada code.

Long story short, when a C function returns a value that’s small enough to fit in a processor register, it comes back in the EAX register (or RAX, I guess). Normally, it can be ignored. EAX is not preserved across calls: no subroutine promises to restore it, so if the caller has anything valuable in that register, it has to save it before the call and restore it afterwards. That holds even if you made a mistake and somehow a piece of code ended up believing the function returns void (which is essentially what declaring it as a procedure will do). If the register was used to return a value, the value is effectively ignored. No problem, all good.

However, as the page I linked above explains, if the return value is too big to fit in a register, an address to store the result will be passed as if it were the function’s first argument. In other words, something like:

x = foo(a, b, c);

is actually transformed into something similar to

foo(&x, a, b, c);

when the size of “x” is too big. That means code supposing the function returns void will not place the arguments where the function expects them (on the stack on 32-bit x86, mostly in registers on x86_64), and the function code will try to interpret some value as an address when it may actually be one of the other function parameters, potentially resulting in a crash or other wrong behavior. I decided to test the scenario with a small program, mixing Ada and C (actually, a workmate did it for me):

$ cat my_c_function.c
struct my_struct
{
   long x;
   long y;
   long z;
};

struct my_struct my_c_function(long a, long b)
{
   struct my_struct ret;
   ret.x = a + b;
   ret.y = a - b;
   ret.z = a * b;
   return ret;
}

$ cat interface_test.adb
with interfaces.c;

procedure interface_test is
   procedure my_ada_procedure(a, b: interfaces.c.long);
   pragma import(c, my_ada_procedure, "my_c_function");
begin
   my_ada_procedure(2, 3);
end interface_test;

Sure enough, the program crashes when it tries to return the structure.

$ gcc -Wall -g -c my_c_function.c
$ gnatmake -g -Wall interface_test.adb -largs my_c_function.o
gnatbind -x interface_test.ali
gnatlink interface_test.ali -g my_c_function.o
$ ./interface_test

raised STORAGE_ERROR : stack overflow (or erroneous memory access)

We can see the crash happens in the return statement when running it with GDB:

$ gdb ./interface_test
...
(gdb) r
Starting program: /tmp/interfaces-test/interface_test

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401cf2 in my_c_function (a=3, b=6380384) at my_c_function.c:14
14         return ret;
(gdb)

Still, I’m pretty sure the specific details are more complicated than that. The program doesn’t crash if the structure only has 2 long integers instead of 3, and looking at the disassembly of the code it seems “my_c_function” is using both RAX and RDX to return values in that case. Anyway, I’ve proved you cannot simply ignore the return value in general by declaring the function as a procedure from Ada. If you don’t want warnings about unused variables, you’ll have to settle for the GNAT-specific “Unreferenced” pragma to silence the compiler.
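The difference between the two-long and three-long cases can be sketched in C. This assumes the System V x86-64 calling convention that GCC uses on Linux, where a structure of integers larger than 16 bytes is returned in memory through a hidden pointer and smaller ones come back in registers. All the names below are hypothetical, and the size rule is a simplification that only covers plain integer structures like the ones in this post.

```c
#include <stddef.h>

struct two_longs   { long x; long y; };
struct three_longs { long x; long y; long z; };

/* What the programmer writes: a structure returned by value. */
struct three_longs by_value(long a, long b)
{
   struct three_longs ret = { a + b, a - b, a * b };
   return ret;
}

/* Roughly what the compiler arranges when the structure is returned in
   memory: the caller supplies the storage as an extra first argument. */
void by_hidden_pointer(struct three_longs *out, long a, long b)
{
   out->x = a + b;
   out->y = a - b;
   out->z = a * b;
}

/* Assumption: System V x86-64 rule of thumb for structures made only of
   integers. Above 16 bytes the result goes through the hidden pointer. */
int returned_in_memory(size_t size)
{
   return size > 16;
}
```

With 8-byte longs, returned_in_memory(sizeof(struct two_longs)) is false and returned_in_memory(sizeof(struct three_longs)) is true, which would be consistent with the test program only crashing for the three-field structure. A caller that believes the function returns nothing never passes the extra pointer, so every real argument ends up shifted by one position.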
