| jmcph4 |

State of the New Cairo 1.0 Compiler Stack

[2022-12-19 15:30:00 +1000]

Note: Starkware move at an extremely fast pace, and the Cairo 1.0 codebase that is the subject of this article changes very rapidly as a consequence. As such, the information in this article may well have a short shelf life, but I still feel it's valuable to distill and communicate my thoughts and findings on it. For posterity, I'm working off of 742bcc0.

Cairo 1.0 is the latest refactoring of the Cairo programming language and represents some significant technical changes to the Starkware ecosystem.

Apart from some blog posts, there's basically no documentation on it right now (there are docs in the monorepo, but they're very sparse). This is to be expected, as the Starkware team are notoriously sophisticated and are constantly shipping new technology. Until the documentation catches up (which it is doing near daily), some concerted manual effort is needed to grok everything going on.

The new stack is entirely contained within this monorepo. There's a decent number of crates in here, but I think this captures the modularity very well. I encourage you to read through the code crate-by-crate if you want a thorough understanding (after all, we have no docs right now).

Of particular significance are the test cases for both the Cairo parser and the Sierra intermediate representation. Also, there is a full LALRPOP specification for Sierra in the sierra crate.

Additionally, it looks like work has also begun on a standard library of sorts [1].

Using the Tools

I literally crave operation.

$ git clone git@github.com:starkware-libs/cairo.git
$ cd cairo
$ cargo run
error: `cargo run` could not determine which binary to run. Use the `--bin` option to specify a binary, or the `default-run` manifest key.
available binaries: cairo-compile, cairo-language-server, cairo-run, cairo-test, formatter_cli, generate_syntax, sierra-compile, starknet-compile, starknet-sierra-compile

So, we have a lot of binaries to choose from. Let's compile some Cairo 1.0 code?

$ cargo run --bin cairo-compile
error: The following required arguments were not provided:
  <PATH>

Usage: cairo-compile <PATH> [OUTPUT]

For more information try '--help'

Let's just write some quick Cairo code and save it as prog.cairo:

func main() {
    let x: felt = 12;
    let y: felt = x + 2;
}

$ cargo run --bin cairo-compile prog.cairo
type [0] = felt;
type [1] = Struct<ut@Tuple>;

libfunc [2] = felt_const<12>;
libfunc [3] = felt_const<2>;
libfunc [4] = store_temp<[0]>;
libfunc [0] = felt_add;
libfunc [7] = drop<[0]>;
libfunc [1] = struct_construct<[1]>;
libfunc [5] = store_temp<[1]>;
libfunc [6] = rename<[1]>;

[2]() -> ([0]);
[3]() -> ([1]);
[4]([0]) -> ([0]);
[0]([0], [1]) -> ([2]);
[7]([2]) -> ();
[1]() -> ([3]);
[5]([3]) -> ([3]);
[6]([3]) -> ([4]);
return([4]);

[0]@0() -> ([1]);

Woah! Spoiler alert: it turns out that this is what Sierra looks like.

Let's save that Sierra output as prog.sierra and turn it into CASM [2]:

$ cargo run --bin sierra-compile prog.sierra prog.casm
$ cat prog.casm
[ap + 0] = 12, ap++;
ret;

This output should look more familiar -- this will be directly interpreted by the Cairo VM (a "runner" in Starkware parlance).
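
To get a feel for what those two instructions do, here is a toy model in Rust. It is only a sketch under heavy assumptions: `Instr` and `run` are names I made up, values are plain `u64`s rather than field elements, memory is a fixed eight cells, and `ret` simply halts instead of restoring a call frame. It does capture the one property worth noticing, namely that an "assignment" is really an assertion against write-once memory.

```rust
/// The two instruction shapes that appear in the listing above (hypothetical encoding).
enum Instr {
    /// `[ap + offset] = value`, optionally followed by `ap++`.
    AssertImmediate { offset: usize, value: u64, inc_ap: bool },
    /// `ret`; in this toy model it just stops the machine.
    Ret,
}

/// Runs the instructions against a small write-once memory.
/// Returns the final memory and the final value of `ap`.
fn run(program: &[Instr]) -> Result<(Vec<Option<u64>>, usize), String> {
    let mut memory: Vec<Option<u64>> = vec![None; 8];
    let mut ap = 0usize;
    for instr in program {
        match instr {
            Instr::AssertImmediate { offset, value, inc_ap } => {
                let addr = ap + offset;
                let current = memory[addr];
                match current {
                    // An unset cell takes the value.
                    None => memory[addr] = Some(*value),
                    // A set cell must already agree with it.
                    Some(existing) if existing == *value => {}
                    Some(existing) => {
                        return Err(format!("cell {} holds {}, not {}", addr, existing, value));
                    }
                }
                if *inc_ap {
                    ap += 1;
                }
            }
            Instr::Ret => break,
        }
    }
    Ok((memory, ap))
}

fn main() {
    // [ap + 0] = 12, ap++;
    // ret;
    let program = [
        Instr::AssertImmediate { offset: 0, value: 12, inc_ap: true },
        Instr::Ret,
    ];
    let (memory, ap) = run(&program).expect("toy program should run");
    println!("memory[0] = {:?}, ap = {}", memory[0], ap);
}
```

Note that the CASM only ever writes the constant 12: the addition and the unused `y` from the Cairo source do not show up in the output at all.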

Crate Walkthrough

The syntax_codegen Crate

Follow along.

This crate is really interesting: it allows us to write syntax specifications -- for any language -- entirely in Rust and then generate a parser according to that specification. In short, it's a parser-generator. This is what a specification looks like and this is the specific specification for Cairo 1.0. This is the code that generates a parser from a specification.
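
The general "describe the syntax as Rust data, then emit Rust source from it" idea can be sketched in a few lines. To be clear, this is not the syntax_codegen API: `FieldSpec`, `NodeSpec` and `generate_struct` are hypothetical names of my own, and the sketch only emits a node type definition rather than anything resembling a full parser.

```rust
/// Hypothetical description of one child of a syntax node.
struct FieldSpec {
    name: &'static str,
    kind: &'static str,
}

/// Hypothetical description of one syntax node.
struct NodeSpec {
    name: &'static str,
    fields: Vec<FieldSpec>,
}

/// Emits Rust source text for a struct that mirrors the node specification.
fn generate_struct(spec: &NodeSpec) -> String {
    let mut out = format!("pub struct {} {{\n", spec.name);
    for field in &spec.fields {
        out.push_str(&format!("    pub {}: {},\n", field.name, field.kind));
    }
    out.push_str("}\n");
    out
}

fn main() {
    // A made-up specification for a binary expression node.
    let spec = NodeSpec {
        name: "ExprBinary",
        fields: vec![
            FieldSpec { name: "lhs", kind: "Expr" },
            FieldSpec { name: "op", kind: "BinaryOperator" },
            FieldSpec { name: "rhs", kind: "Expr" },
        ],
    };
    print!("{}", generate_struct(&spec));
}
```

The appeal of this style is that the specification is ordinary Rust data, so it can be type-checked, iterated over and reused by more than one generator.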

The sierra_generator Crate

Follow along.

This crate handles the Cairo to Sierra leg of the journey. The guts of it are here:

pub fn get_sierra_program(
    db: &dyn SierraGenGroup,
    requested_crate_ids: Vec<CrateId>,
) -> Maybe<Arc<sierra::program::Program>> {
    let mut requested_function_ids = vec![];
    for crate_id in requested_crate_ids {
        for module_id in db.crate_modules(crate_id).iter() {
            for (free_func_id, _) in db.module_data(*module_id)?.free_functions {
                requested_function_ids.push(free_func_id)
            }
        }
    }
    db.get_sierra_program_for_functions(requested_function_ids)
}

The sierra Crate

Follow along.

This crate is entirely automatically generated from the above crate, which is very cool. Despite being autogenerated, it's still very informative to read through manually.

The centrepiece of this crate is the parser-generator definition for Sierra, written in LALRPOP. This gives us automatic generation of a Rust-language LALR(1) parser.

The Program type is definitely the next most valuable takeaway from this crate, in my opinion:

/// A full Sierra program.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Program {
    /// Declarations for all the used types.
    pub type_declarations: Vec<TypeDeclaration>,
    /// Declarations for all the used library functions.
    pub libfunc_declarations: Vec<LibFuncDeclaration>,
    /// The code of the program.
    pub statements: Vec<Statement>,
    /// Descriptions of the functions - signatures and entry points.
    pub funcs: Vec<Function>,
}
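
To see how the four fields line up with the textual Sierra from earlier, here is a sketch that hand-encodes that listing. The `Program` fields are the ones shown above, but everything else is an assumption: the declaration, statement and function types are simplified stand-ins of my own (the real ones are richer), and `invoke`, `ids`, `render` and `example_program` are hypothetical helpers.

```rust
// Simplified stand-ins for the crate's types (assumptions, for illustration only).
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct TypeDeclaration { pub id: u64, pub long_id: String }

#[derive(Clone, Debug, Eq, PartialEq)]
pub struct LibFuncDeclaration { pub id: u64, pub long_id: String }

#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Statement {
    /// `[libfunc](args) -> (results);`
    Invocation { libfunc: u64, args: Vec<u64>, results: Vec<u64> },
    /// `return(vars);`
    Return(Vec<u64>),
}

#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Function { pub id: u64, pub entry_point: usize, pub ret_types: Vec<u64> }

#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Program {
    pub type_declarations: Vec<TypeDeclaration>,
    pub libfunc_declarations: Vec<LibFuncDeclaration>,
    pub statements: Vec<Statement>,
    pub funcs: Vec<Function>,
}

fn invoke(libfunc: u64, args: &[u64], results: &[u64]) -> Statement {
    Statement::Invocation { libfunc, args: args.to_vec(), results: results.to_vec() }
}

/// Formats a list of ids as `[a], [b], ...`.
fn ids(v: &[u64]) -> String {
    v.iter().map(|i| format!("[{}]", i)).collect::<Vec<_>>().join(", ")
}

/// Renders a statement back into the textual form used in the listing.
fn render(stmt: &Statement) -> String {
    match stmt {
        Statement::Invocation { libfunc, args, results } => {
            format!("[{}]({}) -> ({});", libfunc, ids(args), ids(results))
        }
        Statement::Return(vars) => format!("return({});", ids(vars)),
    }
}

/// Hand-encodes the Sierra listing produced for prog.cairo.
pub fn example_program() -> Program {
    let types = [(0, "felt"), (1, "Struct<ut@Tuple>")];
    let libfuncs = [
        (2, "felt_const<12>"), (3, "felt_const<2>"), (4, "store_temp<[0]>"),
        (0, "felt_add"), (7, "drop<[0]>"), (1, "struct_construct<[1]>"),
        (5, "store_temp<[1]>"), (6, "rename<[1]>"),
    ];
    Program {
        type_declarations: types.iter()
            .map(|&(id, s)| TypeDeclaration { id, long_id: s.to_string() }).collect(),
        libfunc_declarations: libfuncs.iter()
            .map(|&(id, s)| LibFuncDeclaration { id, long_id: s.to_string() }).collect(),
        statements: vec![
            invoke(2, &[], &[0]), invoke(3, &[], &[1]), invoke(4, &[0], &[0]),
            invoke(0, &[0, 1], &[2]), invoke(7, &[2], &[]), invoke(1, &[], &[3]),
            invoke(5, &[3], &[3]), invoke(6, &[3], &[4]),
            Statement::Return(vec![4]),
        ],
        // `[0]@0() -> ([1]);` as I read it: function 0 enters at statement 0 and returns type [1].
        funcs: vec![Function { id: 0, entry_point: 0, ret_types: vec![1] }],
    }
}

fn main() {
    let program = example_program();
    for t in &program.type_declarations { println!("type [{}] = {};", t.id, t.long_id); }
    for l in &program.libfunc_declarations { println!("libfunc [{}] = {};", l.id, l.long_id); }
    for s in &program.statements { println!("{}", render(s)); }
    for f in &program.funcs {
        println!("[{}]@{}() -> ({});", f.id, f.entry_point, ids(&f.ret_types));
    }
}
```

Each blank-line-separated group in the compiler output corresponds to one field: type declarations, libfunc declarations, statements, then function signatures with their entry points.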

Another great part of this crate is the collection of pure-Sierra examples, like the recursive Fibonacci one.

The parser Crate

Follow along.

Hands down the most valuable thing in this crate (until official Cairo 1.0 tutorials are published, anyway) is the set of test cases -- you can basically teach yourself the new language!

The top-level for parsing is in Parser::parse_syntax_file:

pub fn parse_syntax_file(mut self) -> SyntaxFileGreen {
    let items = ItemList::new_green(
        self.db,
        self.parse_list(Self::try_parse_top_level_item, is_of_kind!(), "item"),
    );
    // This will not panic since the above parsing only stops when reaches EOF.
    assert_eq!(self.peek().kind, SyntaxKind::TerminalEndOfFile);

    // Fix offset in case there are skipped tokens before EOF. This is usually done in
    // self.take_raw() but here we don't call self.take_raw as it tries to read the next
    // token, which doesn't exist.
    self.offset += self.current_width;

    let eof = self.add_trivia_to_terminal::<TerminalEndOfFile>(self.next_terminal.clone());
    SyntaxFile::new_green(self.db, items, eof)
}

One really fascinating technique is the use of red-green trees -- a data structure I was unaware of until encountering this part of the codebase. This is why so many types in this crate have the Green suffix.
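
For anyone else meeting the idea for the first time, here is a generic sketch of the technique as it is usually described (it was popularised by the Roslyn C# compiler). It is not the Cairo parser's implementation, and `Green`, `Red` and all their methods are names I chose for illustration. Green nodes are immutable and only know their own width, so identical subtrees can be shared; red nodes are thin wrappers built on demand that add an absolute offset and a parent pointer.

```rust
use std::rc::Rc;

/// Green node: immutable and position-independent, so it can be shared freely.
enum Green {
    Token(String),
    Node { kind: &'static str, children: Vec<Rc<Green>>, width: usize },
}

impl Green {
    fn token(text: &str) -> Rc<Green> {
        Rc::new(Green::Token(text.to_string()))
    }

    fn node(kind: &'static str, children: Vec<Rc<Green>>) -> Rc<Green> {
        // A node's width is computed once, when the node is built.
        let width: usize = children.iter().map(|c| c.width()).sum();
        Rc::new(Green::Node { kind, children, width })
    }

    fn width(&self) -> usize {
        match self {
            Green::Token(text) => text.len(),
            Green::Node { width, .. } => *width,
        }
    }

    fn kind(&self) -> &'static str {
        match self {
            Green::Token(_) => "Token",
            Green::Node { kind, .. } => *kind,
        }
    }
}

/// Red node: a view over a green node that adds position and parent information.
struct Red {
    green: Rc<Green>,
    offset: usize,
    parent: Option<Rc<Red>>,
}

impl Red {
    fn root(green: Rc<Green>) -> Rc<Red> {
        Rc::new(Red { green, offset: 0, parent: None })
    }

    /// Builds the red children of `node` on demand, accumulating offsets left to right.
    fn children(node: &Rc<Red>) -> Vec<Rc<Red>> {
        let mut offset = node.offset;
        let mut out = Vec::new();
        if let Green::Node { children, .. } = &*node.green {
            for child in children {
                out.push(Rc::new(Red {
                    green: Rc::clone(child),
                    offset,
                    parent: Some(Rc::clone(node)),
                }));
                offset += child.width();
            }
        }
        out
    }

    /// Number of ancestors, found by walking the parent pointers.
    fn depth(&self) -> usize {
        match &self.parent {
            Some(parent) => 1 + parent.depth(),
            None => 0,
        }
    }
}

fn main() {
    // Source text: "let y = x + x". The green token for `x` is created once and used twice.
    let x = Green::token("x");
    let sum = Green::node("Binary", vec![Rc::clone(&x), Green::token(" + "), Rc::clone(&x)]);
    let root = Red::root(Green::node("Let", vec![Green::token("let y = "), sum]));
    let expr = Rc::clone(&Red::children(&root)[1]);
    for operand in Red::children(&expr) {
        println!("{} at offset {} (depth {})", operand.green.kind(), operand.offset, operand.depth());
    }
}
```

The two occurrences of `x` share a single green node, yet their red wrappers report offsets 8 and 12 respectively. That split is what makes the structure attractive for incremental tooling: unchanged green subtrees can be reused after an edit, while positions are recomputed cheaply on the way down.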

[1]

I started reading through the codebase when it was initially open sourced and it looks like the core library stuff is very recent.

[2]

I really love that the Cairo compiler outputs to stdout by default; I don't get the inconsistency with the Sierra compiler though. Additionally, the Sierra compiler doesn't seem to recognise - as an output file either.