What actually happens between writing code and running it?

Before the nerd stuff: happy 4th of July. America turns 250 today, which is a genuinely wild number. I'm not a citizen, but I live here and I get to build things here, and I don't take that lightly. Big day for this country. Enjoy the fireworks, then come learn about compilers.

A few weeks ago I asked what I was sure was a dumb question. I was learning Rust, and everyone kept telling me to install Clippy, which is a linter. I couldn't figure out why. The Rust compiler is the strictest piece of software I have ever interacted with. It checks my types, it tracks who owns every piece of memory, it rejects code for reasons I need two browser tabs to understand. If that thing already approves my code, what exactly is left for a linter to check?

That question had an answer. The answer had a hole in it, so I asked another question. That one led to another. Six or seven questions later I was reading about CPU instruction sets at 1am, and somewhere along the way, one of the first "facts" I was ever taught about programming had quietly fallen apart.

The thing every tutorial teaches

You've seen this taxonomy. C and Rust are compiled languages. Python and JavaScript are interpreted languages. Java is somewhere awkward in the middle. Every intro course teaches it, every interviewer half expects it, and it's genuinely useful for about five minutes.

Then you look slightly closer and it starts leaking. Python ships with a compiler. You can literally ask it to show you the bytecode it compiles your functions into. JavaScript engines like V8 compile your code into actual machine code while your page is running. And Java gets compiled to bytecode before it ever runs, then interpreted, then compiled again into machine code while it runs. The neat little buckets don't survive contact with how any of these tools actually work.
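You don't have to take the Python claim on faith. Here's a minimal sketch using the standard library's dis module. It assumes CPython, the reference implementation, and the function add is just an example. The exact instruction listing differs between CPython versions, so treat the printed bytecode as illustrative.

```python
import dis

def add(a, b):
    return a + b

# Before this file ran at all, CPython compiled it to bytecode.
# The def statement only wrapped the already-compiled body in a function object.
code = add.__code__
print(type(code).__name__)  # code
print(code.co_varnames)     # ('a', 'b')
print(len(code.co_code), "bytes of bytecode")

# Ask for the bytecode in human-readable form. Opcode names change between
# CPython versions (BINARY_ADD became BINARY_OP in 3.11), so the listing will too.
dis.dis(add)
```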

[Figure: the tutorial taxonomy. "Compiled languages": C, C++, Rust. "Interpreted languages": Python (ships with a compiler), JavaScript (compiles to machine code at runtime). Java: compiled AND interpreted AND JIT compiled, in one run.]
The taxonomy every tutorial teaches. It holds up right until you look at how any of these languages actually run.

Here's the thing though. The labels aren't exactly wrong. They're answers to a question that was framed wrong. "Compiled or interpreted" sounds like a fact about a language, like whether it uses curly braces. It isn't. Compiling and interpreting are things that tools do to your program, and most modern languages get some mix of both depending on which tool is running them and when. Asking "is Python compiled?" is a bit like asking "is bread toasted?" Depends on what you do with it.
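Here's a toy sketch that makes "things tools do" concrete. The tiny language (nested tuples for numbers, addition and multiplication) and every function name are made up for illustration. The same program goes through two different tools. The first walks the tree and computes the answer directly. The second translates the tree into a flat list of stack instructions and hands them to a separate little virtual machine. Both produce the same answer, and nothing about the program itself decides which tool runs it.

```python
# A program in a made-up tiny language:
# ("num", n) is a number, ("add", l, r) and ("mul", l, r) are operations.
program = ("add", ("num", 40), ("mul", ("num", 1), ("num", 2)))  # 40 + 1 * 2


def tree_walk(node):
    """Tool 1, an interpreter: walk the tree and compute the answer directly."""
    kind = node[0]
    if kind == "num":
        return node[1]
    left, right = tree_walk(node[1]), tree_walk(node[2])
    return left + right if kind == "add" else left * right


def compile_to_stack_code(node, out=None):
    """Tool 2, a compiler: translate the tree into flat stack instructions.
    Nothing is computed here."""
    if out is None:
        out = []
    kind = node[0]
    if kind == "num":
        out.append(("PUSH", node[1]))
    else:
        compile_to_stack_code(node[1], out)  # code for the left operand
        compile_to_stack_code(node[2], out)  # code for the right operand
        out.append(("ADD",) if kind == "add" else ("MUL",))
    return out


def run_stack_code(code):
    """A tiny virtual machine that executes the compiled instructions."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


code = compile_to_stack_code(program)
print(code)  # [('PUSH', 40), ('PUSH', 1), ('PUSH', 2), ('MUL',), ('ADD',)]
print(tree_walk(program), run_stack_code(code))  # 42 42
```

Notice that compile_to_stack_code never produces 42 itself. Compiling only translates. Something else still has to run the result.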

The model that actually holds up

So here's the picture that survived all my questions, and every part of this series is one stop on it. No matter the language, your program starts its life as text and ends as instructions executing on a physical CPU. Between those two points, three broad things happen.

First, the tools have to understand the text. Your file gets chopped into meaningful pieces, the pieces get assembled into a structure, and the structure gets checked to make sure it actually means something. Then that structure gets transformed, step by step, into forms that are less and less like what you wrote and more and more like what a machine wants. And finally something has to run it, and there are really only three strategies: translate everything to machine code ahead of time, have a program step through it on the machine's behalf, or start stepping through it and translate the busiest parts on the fly.
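Python's standard library happens to expose each of these stages, so you can watch them happen to a single line. This is a minimal sketch that assumes CPython, and the one-line program is my own example. In it, tokenize plays the lexer, ast.parse plays the parser, compile() does the lowering to bytecode, and exec() takes the virtual machine exit. The printed token and tree formats vary a little between Python versions.

```python
import ast
import io
import tokenize

source = "answer = 40 + 2\n"

# Stage 1, understand it. The lexer chops the text into tokens...
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(tokens)

# ...the parser assembles those tokens into a tree...
tree = ast.parse(source)
print(ast.dump(tree))

# ...and in CPython some semantic checks only happen later, inside compile():
# "return 1" is a well-formed tree, but it is not a meaningful module.
ast.parse("return 1")  # no error
try:
    compile("return 1", "<example>", "exec")
    rejected = None
except SyntaxError as err:
    rejected = err.msg
print("compile() rejected it:", rejected)

# Stage 2, lower it. compile() turns the tree into a bytecode code object.
code = compile(tree, "<example>", "exec")

# Stage 3, run it. exec() hands that bytecode to CPython's virtual machine.
namespace = {}
exec(code, namespace)
print(namespace["answer"])  # 42
```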

[Figure: the pipeline, with "let answer: i32 = 40 + 2;" as the example source code.
1 · understand it: the lexer finds the pieces, the parser builds the tree, semantic checks make sure it all means something.
2 · lower it: the tree becomes simpler, more machine-shaped forms, one step at a time (intermediate representations, bytecode).
3 · run it, through one of three exits that all end at the CPU: ahead of time, compile everything, then run it (Rust, C, Go); interpret, a virtual machine walks through it step by step (CPython); just in time, interpret, but compile the hot parts mid-run (V8, JVM).]
Every language you've ever used goes through some version of this pipeline. The main difference between them is which exit they take at step 3.
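The just-in-time exit is the least intuitive of the three, so here's a toy simulation of the idea behind it. Everything in it is hypothetical: the tuple-based language, the names, and the threshold of 3 calls. "Compiling" here also means translating the tree once into nested Python closures, not emitting machine code the way V8 or the JVM do. The sketch only shows the policy: interpret first, then compile the busy parts mid-run.

```python
# Made-up tiny language: ("num", n), ("var", name), ("add", l, r), ("mul", l, r)


def interpret(node, env):
    """Slow path: walk the tree on every single call."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left, right = interpret(node[1], env), interpret(node[2], env)
    return left + right if kind == "add" else left * right


def compile_to_closure(node):
    """Stand-in for a JIT's code generator: translate the tree ONCE into
    nested Python closures, so later calls skip the tree walking."""
    kind = node[0]
    if kind == "num":
        value = node[1]
        return lambda env: value
    if kind == "var":
        name = node[1]
        return lambda env: env[name]
    left, right = compile_to_closure(node[1]), compile_to_closure(node[2])
    if kind == "add":
        return lambda env: left(env) + right(env)
    return lambda env: left(env) * right(env)


class TieredFunction:
    """Interpret a function body until it gets hot, then compile it mid-run."""

    def __init__(self, body, hot_threshold=3):
        self.body = body
        self.hot_threshold = hot_threshold  # made-up number; real engines use their own heuristics
        self.calls = 0
        self.compiled = None
        self.last_tier = None

    def __call__(self, env):
        self.calls += 1
        if self.compiled is None and self.calls >= self.hot_threshold:
            self.compiled = compile_to_closure(self.body)  # happens only once
        if self.compiled is not None:
            self.last_tier = "compiled"
            return self.compiled(env)
        self.last_tier = "interpreted"
        return interpret(self.body, env)


# f(x) = x * x + 2
f = TieredFunction(("add", ("mul", ("var", "x"), ("var", "x")), ("num", 2)))
for x in range(5):
    result = f({"x": x})
    print(x, result, f.last_tier)
# 0 2 interpreted
# 1 3 interpreted
# 2 6 compiled
# 3 11 compiled
# 4 18 compiled
```

The first two calls pay the tree-walking cost. From the third call on, the cached translation runs instead.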

Once you have this picture, the old labels shrink into shorthand. "Rust is compiled" just means Rust's toolchain takes the ahead-of-time exit. "Python is interpreted" means the standard Python implementation, CPython, takes the virtual machine exit, right after compiling your code to bytecode, which is the part the label conveniently hides. The labels describe exits from a pipeline. The tutorials taught us the exits and skipped the pipeline.
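That hidden compile step even leaves files behind. When CPython imports a module, it saves the compiled bytecode as a .pyc file under __pycache__ so it can skip recompiling next time. Here's a minimal sketch that uses the standard library's py_compile module to trigger just that step. The module name greet.py is made up, and the exact cache file name depends on your Python version.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # A throwaway module; the name greet.py is made up for this example.
    source_path = os.path.join(folder, "greet.py")
    with open(source_path, "w") as f:
        f.write("print('hello')\n")

    # Compile without running anything. Importing greet would perform this same
    # step automatically and cache the result; py_compile just exposes it.
    pyc_path = py_compile.compile(source_path, doraise=True)

    relative = os.path.relpath(pyc_path, folder)
    matches_import_cache = pyc_path == importlib.util.cache_from_source(source_path)
    size = os.path.getsize(pyc_path)

print(relative)              # something like __pycache__/greet.cpython-<version>.pyc
print(matches_import_cache)  # True
print(size, "bytes of cached bytecode")
```

Notice that "hello" never prints. The file was compiled, but nothing ran it.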

The series

The reason this is a series and not one post is that every stage of that pipeline is something I had a real, specific question about, and each one deserves a proper answer instead of a paragraph. Six parts, in the order the questions actually happened to me:

  1. Why does Rust need a linter if the compiler is so smart? What compilers actually promise, what linters are actually for, and how a linter reads your code.
  2. The compiler sees characters, not code. Lexers, tokens, parsers, and the tree that everything else is built on.
  3. What "compiling" actually means. The pipeline inside the compiler, and the middle layers nobody mentions.
  4. Bytecode is not machine code. Virtual machines, interpreters, and what JIT actually is.
  5. Is Python compiled? Yes. Is it interpreted? Also yes. Languages versus implementations, and why the old question was broken.
  6. Why your program won't run on my machine. Machine code, executables, and what cross-platform really means.

My rule for the whole series: no step gets skipped, and no step gets explained using a word I haven't already explained. If I do my job right, by the end "compiled versus interpreted" won't be confusing anymore. It'll just look small.

Start with part 1. It begins, like most good rabbit holes, with me being confused about a tool everyone else seemed perfectly fine with.