LanguagesArchitecture

Vale 0.2 is out, and it includes the beginnings of a feature we like to call Fearless FFI.

This is part of Vale's goal to be the safest native language. 0 Most languages compromise memory safety in some way which can lead to difficult bugs and security vulnerabilities.

Vale takes a big step forward here, by isolating unsafe and untrusted code and keeping it from undermining the safe code around it.

This page describes the proof-of-concept we have so far, plus the next steps. It involves some borderline-insane acrobatics with inline assembly, bitwise xor and rotate, and two simultaneous stacks. Buckle up!

The Challenge: Leaky Unsafe

Most languages have a Foreign Function Interface (FFI) to enable calling into another language's code.

Normally, when a safe language's code calls functions written in an unsafe language, any bugs in the unsafe language can cause problems in the safe language. For example:

  • When Python code sends a Python object into C, if the C code doesn't correctly call Py_INCREF, it will corrupt Python's memory and cause some mysterious behavior later on in the Python code.
  • When Rust hands a reference into C, the C code can type-cast it at will to write arbitrary memory to the Rust object, causing confounding bugs later on in the safe Rust code.
  • When Javascript hands the wrong kind of object to a Typescript function, it causes bugs down the line deep in the Typescript code, even though Typescript has static typing.

This is called "leaky safety", and its bugs are very difficult to track down, because their symptoms manifest so far from their cause.

This can also happen when a language has unsafe escape hatches. If some unsafe code corrupts some memory, it can cause undefined behavior in safe code. For example, see this Rust snippet where an unsafe block corrupts some memory that's later used by the safe code.

In all these cases, we know that the unsafe language was involved somewhere in the chain of events, but since the bugs actually happen later on, in supposedly safe code, there's no easy way to identify which unsafe code was the original culprit.

Worse, some people take advantage of these intentionally, by introducing these vulnerabilities into your dependencies. For example, the GitHub Advisory Database describes thousands of vulnerabilities in dependencies, even ones written in normally safe languages like Go and Rust.

Side Notes
(interesting tangential thoughts)
0

By native, we mean not running in a VM.

The Goals

We have two goals here:

  • Fearlessly call into our own C code without risking accidental corruption in our Vale code's data, by separating the Vale memory from C.
  • Fearlessly call into others' C code without risking malicious corruption in our Vale code's data, by using sandboxing.

This doesn't just apply to C code, but any code written in an unsafe language.

Fearless FFI

Vale protects against these bugs with a handful of different mechanisms, which combine to form Fearless FFI:

  • Separate the safe memory from the unsafe memory (such as the memory managed by C). This includes:
    • Not allowing safe objects to contain unsafe objects.
    • Not allowing unsafe objects to contain safe objects.
    • Using a different stack for the unsafe code.
  • Allowing references between the two:
    • A safe object can contain a reference to an unsafe object.
    • An unsafe object can contain a reference to a safe object, and it's automatically scrambled.
  • Enable passing memory between the two by copying, also known as message passing.
  • Whitelist dependencies that use FFI.
  • (Optionally) Sandbox the unsafe code for extra security, using either webassembly compilation or a subprocess.

Let's explore each of these mechanisms!

Copying Data Between Vale and C

In Vale, we often designate objects as immutable. When we send immutable data between C and Vale, we're actually sending a copy.

The C code can do whatever it likes with this copy, and there's no risk of corrupting Vale objects.

Here, a Vale main function is sending an immutable Vec3 struct into C.

vale
exported struct Vec3 imm { 1
x int; y int; z int;
}
exported func main() {
v = Vec3(10, 11, 12);
s = sum(v);
println(s);

}

extern func sum(v Vec3) int;
c
#include <stdint.h>
#include "mvtest/Vec3.h"
#include "mvtest/sum.h"
extern int mvtest_sum(mvtest_Vec3* v) {
  int result = v->x + v->y + v->z;
  free(v);
  return result;
}
stdout
33
1

We use the exported keyword to make it visible to the C code, and automatically generate the headers (like mvtest/Vec3.h).

References Between C and Vale

Copying data back and forth is great for most use cases, but we also want to be able to hand C some pointers to our Vale data. That way, C code can call methods on our Vale objects.

For example, we might want to wrap a C HTTP server and have it call into Vale to handle requests. 2

However, in most languages, it's risky to make a pointer to an object and hand it to unsafe code. The unsafe code could corrupt the Vale object, like in this example:

vale
exported struct Engine { fuel int; }
exported struct Ship { engine ^Engine; }
exported func main() {
s = Ship(^Engine(42));
halveFuel(&s);

}

extern func halveFuel(s &Ship) int;
c
extern int myproject_halveFuel(myproject_Ship* s) {
  // Whoops, accidentally overwrote a pointer!
  *(int*)ship->engine = ship->engine->fuel / 2;

  // Should have been:
  // ship->engine->fuel = ship->engine->fuel / 2;
}

Here, the C function myproject_halveFuel accidentally overwrites a pointer with an integer.

Luckily, Vale prevents this. Vale doesn't give C a pointer that it can use to dereference our Vale objects.

It instead gives a wide generational reference, which contains:

For those unfamiliar, a generational reference is a reference containing a pointer to an object, and a "remembered generation" which is an integer that matched the "actual generation" integer from the object itself. When we free the object, the object's actual generation is incremented. Before dereferencing a generational reference, Vale asserts that the generations matched, to ensure we haven't deallocated the object since then. By our last measurements, generational references are over twice as fast as reference counting, and could get even faster when we add our planned region borrow checker and hybrid-generational memory features.

The "wide" generational reference is then compressed into 32 bytes 3 and then scrambled:

  • We xor the entire 32 bytes by a constant factor.
  • We rotate the entire 32 bytes by a constant factor.

These constant factors are randomly generated at compile time, different for every build. 4 This is mainly to prevent C from accidentally dereferencing the contained pointers. 5

How does C read the data then?

The C code will need to hand it back to a Vale function, like the example here.

When a Vale function receives a scrambled reference, it will unscramble it and generation-check the region. If the C code gave an invalid reference, it will be detected right then.

2

I did something like this recently in RocketVale to implement a game server for a roguelike game.

3

There are quite a few unused bits in pointers, that can be repurposed for other things.

4

Generations are different every run (and so are addresses, thanks to Address Space Layout Randomization), so the entire wide generational reference is random. This randomness is useful further below, when giving these references to sandboxed subprocesses.

5

This "pointer obfuscation" isn't for security, it's just to prevent accidental memory corruption from C. We'll talk about additional measures for security below.

Separating Memory

One of the reasons it's risky to call into an unsafe language is because they can do buffer overruns on the stack, like this C snippet:

c
void badCFunction() {
  int myArray[10];
  myArray[-5] = 7;
  myArray[15] = 7;
}

This function is particularly sinister, because it will overwrite its caller's memory. What if our caller was this Vale function?

vale
struct Ship {
engine ^Engine; // heap-allocated Engine
}
func myValeFunction() {
ship Ship = Ship(^Engine(42));

// Call the C function
badCFunction();

}

Here, the C function is reaching backwards in the stack, into the caller's memory, and changing something there. This might make ship.engine point to address 0x7, because ship lives on the stack.

To solve this particular problem, the Vale compiler runs the C code on a secondary stack. Basically, it sets the stack pointer register to a new chunk of memory, to serve as our new stack.

This involves some inline assembly, which will set the stack pointer to some new memory, and then call a "wrapper" function using that new stack.

c
    asm volatile(
        // Set the stack pointer to new_stack_top.
        "mov %[rs], %%rsp \n"
        // Call badCFunction_wrapper function.
        "call *%[bz] \n"
      : [ rs ] "+r" (new_stack_top), [ bz ] "+r" (badCFunction_wrapper) ::
    );

As you can see, Vale calls some sort of badCFunction_wrapper on the "new stack".

This automatically generated wrapper function will call badCFunction, and when it's done, it will jump back to our original stack.

Here is the wrapper:

c
void badCFunction_wrapper() {
  // Extract args from thread local storage:
  size_t original_stack_state_scrambled =
      thread_local_current_wrapper_args->original_stack_state_scrambled;
  // If badCFunction had any arguments, we'd read them here.

  // Call the actual badCFunction.
  badCFunction();

  // Jump back to the safe stack.
  // This will undo the stack pointer to what it was when we called setjmp.
  // Supplying the 1 will send it into its else block.
  longjmp(*(jmp_buf*)unscramblePtr(original_stack_state_scrambled), 1);
}

Notice the longjmp, which is another way to switch stacks. Here we're using it to switch back to our original stack. Our "original stack state" was stored (and scrambled) in thread local storage.

Remember that assembly code we saw above? Below we see it in context. This C code sets up the original stack state, puts a pointer to it in thread local storage, and then uses the assembly code to switch to the new stack.

c
  // Set up the original_stack_state, which the other stack will use
  // to switch back to here.
  jmp_buf original_stack_state;
  if (setjmp(original_stack_state) == 0) {

    // Put the return destination into the thread-local "wrapper args".
    SinisterCwrapperFunctionArgs args = {
      scramblePtr(&original_stack_state),
      // If badCFunction had any arguments, they would go here.
    };
    thread_local_current_wrapper_args = &args;

    asm volatile(
        // Set the stack pointer to new_stack_top.
        "mov %[rs], %%rsp \n"
        // Call badCFunction_wrapper function.
        "call *%[bz] \n"
      : [ rs ] "+r" (new_stack_top), [ bz ] "+r" (badCFunction_wrapper) ::
    );
  } else {
    // Continue on
  }

We also make sure that Vale will never reuse memory previously freed by the C code. This can happen if C calls free and then Vale calls malloc. Instead, we plan to use a separate address range for all Vale allocations, using mimalloc (note that this is not implemented yet, only planned).

Protection from Accidental Corruption

So far, these mechanisms protect us from accidental corruption. With the system above, it's pretty much impossible to accidentally corrupt Vale objects. With this, projects or teams can call into their own C code, and have confidence that they aren't causing memory unsafety in their Vale objects.

This system also protects us from accidental memory safety in our dependencies. Dependencies' unsafety is a big problem in projects nowadays.

With these mechanisms, we can be confident that any problems in our dependencies won't cause memory unsafety in our code.

We implemented a proof-of-concept of this! The proof of concept works for macOS by simply supply --enable_side_calling true to the valec invocation. The other measures (copying data, references, scrambling) already work on all platforms and are enabled by default.

Vulnerabilities and Supply-Chain Attacks

There are two more problems to solve here:

  • A dependency intentionally reads or writes our safe data. This is known as a kind of supply-chain attack.
  • An accidental vulnerability in a dependency could allow an attacker to read or write from our Vale objects.

We'll need something additional to protect against these problems.

We believe these problems need to be solved at the language level (or below), so our plan is to introduce sandboxing for untrusted third-party C code. Basically:

  • We can run the C code in a subprocess. The C code will run at top speed, though FFI calls to it might be slow.
  • We can run the C code in a webassembly sandbox. The C code must be portable, and might not run as fast, but FFI calls to it would be fast.
  • We can compile through webassembly as a build step using something called wasm2c, as described in Bobby Holley's WebAssembly and Back Again: Fine-Grained Sandboxing in Firefox 95. This allows the C code to run even faster.
Note: Sending copies, stack switching, and reference scrambling are implemented in Vale, but we've only started on sandboxing. We're describing it here to let you know the direction we're heading. We're starting with the last option, using wasm2c.

A Vale program might use different strategies per dependency:

  • A team writing a lot of Vale and C code could forego sandboxing, since they wrote the C themselves.
  • A team doing a lot of FFI calls into a library that does less calculation could use webassembly.
  • A team doing a few FFI calls into a library that does a lot of calculation, or a library in non-portable C, could use a subprocess.
  • If security isn't as much of a concern (such as in certain kinds of games or other client-side code) then one can turn off Fearless FFI completely.

As we implement the sandboxing portion, we'll be posting follow-up articles about the details and tradeoffs involved here. Stay tuned!

Security, Whitelisting, Permissions

Of course, memory unsafety isn't the only source of vulnerabilities. For example, a dependency could read and write files maliciously.

For this reason, we plan to have whitelisting. Specifically, every project must explicitly allow every dependency on a library which uses FFI.

For example, if we have:

  • MyProgram, which depends on:
    • ListDirectoryLib, which depends on:
      • AFileLib, which does some FFI calls.

Then MyProgram's valec will need to explicitly specify that ListDirectoryLib is allowed to depend on AFileLib which does FFI, with the flag --allow_ffi ListDirectoryLib:AFileLib.

This will also be required for standard library modules, such as stdlib.Subprocess, stdlib.File, stdlib.Network, etc. Any dependency using any of these will need to be explicitly whitelisted by the ultimate program.

This will, in effect, give Vale compile-time per-module capability-based security, which should help mitigate problems from dependencies.

Of course, that only applies to Vale code. For native code, we can sandbox the subprocesses, or use WebAssembly System Interface (WASI), which gives us capability-based security for the dependencies that we run in webassembly.

This could be a great boon to the software world. Capability-based security could help mitigate a lot of supply-chain attacks, like the ones explained in Supply chain attacks on open source software grew 650% in 2021 and Backdooring Rust Crates for Fun and Profit.

Putting It All Together

We've described five mechanisms to help protect our Vale data from problems in our C code:

  • Sending copies. (Done!)
  • Scrambled references. (Done!)
  • Separate stacks. (Prototyped in macOS!)
  • Sandboxing, with subprocesses or wasm2c. (In progress!)
  • Dependency whitelisting. (Planned!)

This should give Vale programs the tools to run their native code and dependencies with much more confidence.

Thanks for visiting, and we hope you enjoyed this article!

In the coming weeks, we'll be writing more about our "deterministic replayability" proof-of-concept which eliminates heisenbugs and helps us reproduce race conditions, so subscribe to our RSS feed, twitter, or the r/Vale subreddit, and come hang out in the Vale discord!