The Best Way for a Programming Language to Do Variables (so far): The ML Way

Max Heiber
Mar 19, 2023

The design space for variables in programming languages is large, but so far (afaik) only one good solution has been found, alongside many bad ones. I’ll try to make this post somewhat objective by giving reasonable criteria for how variables should work and showing how the bad choices fail to meet them.

It’s hopefully uncontroversial that a programming language’s variable design should be:

  • Safe. That is, avoid footguns.
  • Predictable. That is, avoid confusion.

A variable design needs to answer at least the following questions:

  • (A). What kinds of shadowing to allow
  • (B). Whether to allow reassignment
  • (C). Whether to distinguish declaration from use

I’ll go through A through C and then summarize. The TL;DR is that the way OCaml and similar languages do it is the best that’s been discovered so far.

A: What kinds of shadowing to allow

Shadowing is when a new variable is introduced such that a previous variable with the same name is no longer visible. Here are two examples of shadowing:

// javascript
let x = true;
function foo() {
  let x = 3;
  // the first `x` is not accessible here
}
foo();
console.log(x); // true

(* ocaml *)
let () =
  let x = true in
  let x = 3 in
  (* the first `x` is not accessible here *)
  print_int x (* prints 3 *)

It’s rare to find a language with variables that doesn’t allow some form of shadowing. Where languages differ is where shadowing is allowed.

If we port the OCaml example to JavaScript, we get an error:

// javascript
let x = true;
let x = 3; // SyntaxError: redeclaration of let x
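
By contrast, OCaml accepts shadowing in all the usual binding positions. Here is a minimal sketch (the names add_one and first_or_default are made up for illustration) showing a top-level binding shadowed at module level, a function parameter shadowing a top-level name, and a pattern variable shadowing a parameter:

```ocaml
(* Module top level: the second [x] shadows the first.
   This is the case that JavaScript rejects with a SyntaxError. *)
let x = true
let x = if x then 3 else 4

(* Hypothetical helper: the parameter [x] shadows the top-level [x]
   inside the function body. *)
let add_one x = x + 1

(* Hypothetical helper: the [x] bound by the pattern shadows the
   parameter [x] in that branch only. *)
let first_or_default x pair =
  match pair with
  | Some (x, _) -> x
  | None -> x

let () =
  Printf.printf "%d %d %d\n" x (add_one 10)
    (first_or_default 0 (Some (7, "a"))) (* prints "3 11 7" *)
```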

Restricting where shadowing is allowed is less safe than allowing it everywhere. The problem comes up most when doing “functional updates”, which can produce a lot of intermediate variables. It’s common to see Erlang code like this:

% erlang
example() ->
    State1 = init(),
    State2 = foo(State1),
    State3 = bar(State2),
    io:format("State3 is ~p~n", [State3]),
    State2. % implicit return. The author probably meant to return `State3`

It’s quite easy to accidentally reference the wrong State variable. I’ve seen this kind of bug in both Erlang and Scala.

Shadowing improves safety because variables you no longer care about become inaccessible, so there is no danger of accidentally using them. Here’s the example directly ported to OCaml:

(* ocaml, but not very idiomatic *)
let example () =
  let state = State.init () in
  let state = foo state in
  let state = bar state in
  Printf.printf "state is %s\n" (State.show state);
  state
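
As the comment says, that port isn’t very idiomatic. A more typical OCaml style pipes the state through each update with |>, so the intermediate states never get names at all. Below is a minimal, self-contained sketch; the State module and the foo and bar functions are hypothetical stand-ins (here the state is just a list of strings), since the snippet above doesn’t define them:

```ocaml
(* Hypothetical stand-in for the State module used above: the state
   is just a list of strings recording which updates have run. *)
module State = struct
  type t = string list
  let init () : t = []
  let show (s : t) : string = String.concat "," s
end

(* Hypothetical update functions; each returns a new state. *)
let foo (s : State.t) : State.t = s @ [ "foo" ]
let bar (s : State.t) : State.t = s @ [ "bar" ]

(* The intermediate states are never named, so none of them
   can be referenced by mistake. *)
let example () =
  let state = State.init () |> foo |> bar in
  Printf.printf "state is %s\n" (State.show state);
  state

let () = ignore (example ()) (* prints "state is foo,bar" *)
```

Shadowing is still handy when a step needs the state more than once; the pipeline just removes names where nothing needs them.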

Allowing shadowing everywhere seems to be better w.r.t. safety.

B: Whether to allow reassignment

In some languages reassignment looks a lot like shadowing, so first I’ll illustrate the difference.

Here the x in the loop is distinct from the x outside the loop. This is shadowing:

// javascript
let x = 1;
for (const i of [1, 2, 3]) {
  let x = 20 + i;
}
console.log(x); // 1

Here the loop reassigns the x that is outside the loop. This is reassignment:

// javascript
let x = 1;
for (const i of [1, 2, 3]) {
  x = 20 + i;
}
console.log(x); // 23

Reassignment is subtle: it mutates the mapping from names to values within a scope. Misunderstanding reassignment is surprisingly common.

If a language already has mutable values, it doesn’t need to also have mutable mappings of names to values within scopes. One concept is enough, and a lot less confusing. Here are the two JS examples above, ported to OCaml:

(* ocaml: shadowing *)
let () =
  let x = 1 in
  for i = 1 to 3 do
    let x = 20 + i in
    let _ = x in (* avoid unused variable warning *)
    ()
  done;
  print_int x (* prints 1 *)

Compared with the JS version of the shadowing example, it’s much more obvious what’s happening in the OCaml version, since there is no reassignment.

Here’s how to get the same expressiveness and convenience of reassignment, without the footguns:

(* ocaml: references *)
let () =
  let x = ref 1 in
  for i = 1 to 3 do
    x := 20 + i
  done;
  print_int !x (* prints 23. `!` is the dereferencing operator *)

OCaml references aren’t special-cased; they’re just mutable records. Unfolding the definitions of ref, := and ! gives us:

(* ocaml: desugared references *)
type 'a ref = { mutable contents: 'a }
let ref a = { contents = a }

let () =
  let x = ref 1 in
  for i = 1 to 3 do
    x.contents <- 20 + i
  done;
  print_int x.contents (* prints 23 *)

Refs are so useful that they show up even in languages with reassignment, for example Hack and JavaScript.

Not having reassignment is better w.r.t. avoiding confusion. It’s also simpler: we can get the same expressivity with fewer concepts.
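
Refs aren’t the only way to recover that expressivity. When a loop exists only to compute a final value, the updated variable can disappear entirely: each step returns the next value instead of overwriting the old one. A minimal sketch using the standard library’s List.fold_left (a swapped-in technique, not something from the examples above), computing the same 23 as the JavaScript reassignment loop:

```ocaml
(* Same result as the JS reassignment loop, with no mutation:
   [List.fold_left f init xs] threads an accumulator through [xs];
   here each step ignores the previous value and returns [20 + i],
   just as the loop body overwrote x. *)
let final_x = List.fold_left (fun _previous i -> 20 + i) 1 [ 1; 2; 3 ]

let () = print_int final_x (* prints 23 *)
```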

C: Whether to distinguish declaration from use

Declaring a variable creates a new variable in the scope. Using a variable includes referencing it and, if the language allows reassignment, reassigning it.

This distinction comes “for free” if a language does not have reassignment and disallows referencing undeclared variables, so if you’re designing a language and have already made the right choice for (B), you can stop here.

Not distinguishing declaration from use is dangerous and confusing. I’ll explain by example. For each of the following Python programs, try to figure out what is printed. I’ll give the answers below.

v = [1]
def foo():
    v.append(2)
    v = []
foo()
print(v) # python q1: what is printed?

def foo():
    x = 1
    def bar():
        x = 2
    bar()
    print(x) # python q2: what is printed?
foo()

def foo():
    x = 1
    def bar():
        nonlocal x
        x = 2
    bar()
    print(x) # python q3: what is printed?
foo()

x = 1
def bar():
    nonlocal x
    x = 2
bar()
print(x) # python q4: what is printed?

from random import choice
if choice([True, False]):
    x = 3
print(x) # python q5: what is printed?

from random import choice
# adapted from https://twitter.com/elfprince13/status/1572944786763530241
def main():
    e = None
    while e is None:
        try:
            if choice([True, False]):
                raise Exception("")
        except Exception as e:
            pass
main()
x = 3
print(x) # python q6: what is printed?

Here are the answers, tested using Python 3.8.9:

  • q1: nothing. There is an UnboundLocalError. Even though Python usually behaves like a line-by-line interpreter, the v = [] on the last line of the function has the effect of reaching back in time and changing the meaning of the first line of the function.
  • q2: “1”
  • q3: “2”
  • q4: nothing. There is a SyntaxError: "no binding for nonlocal 'x' found". Moving a function from inside another function to module level turns working code into an error if you forget to change nonlocal to global.
  • q5: Either “3” or a NameError depending on your luck.
  • q6: nothing. There is an UnboundLocalError: local variable 'e' referenced before assignment. Python implicitly deletes e at the end of the except clause, so the loop condition e is None then refers to an unbound local.
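
For comparison, here is a minimal OCaml sketch of what q2 through q4 are trying to do (q2 and q3 are just labels for the functions). Every new variable is introduced by let, and updating an outer variable means going through a ref, so the same code behaves the same whether it is nested in a function or sits at module top level:

```ocaml
(* q2: [let] inside [bar] always declares a new, shadowing [x];
   the outer [x] is untouched. *)
let q2 () =
  let x = 1 in
  let bar () =
    let x = 2 in
    ignore x
  in
  bar ();
  x

(* q3: to update the outer variable, make it a ref. [:=] only uses
   the existing binding; it never declares anything. *)
let q3 () =
  let x = ref 1 in
  let bar () = x := 2 in
  bar ();
  !x

(* q4: the same code at module top level needs no changes;
   there is no nonlocal/global distinction to get wrong. *)
let x = ref 1
let bar () = x := 2
let () = bar ()

let () = Printf.printf "%d %d %d\n" (q2 ()) (q3 ()) !x (* prints "1 2 2" *)
```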

Not distinguishing declaration from use doesn’t necessarily produce exactly Python’s weird behavior, but that’s arguably even worse: the rules for what counts as a declaration and what counts as a reassignment vary a lot across languages that fail to make this distinction.

Distinguishing between declaration and use is better than conflating declaration and use for both safety and predictability.
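
The same holds for q5 and q6. In this minimal OCaml sketch, a variable can’t be bound on only one branch of an if, so the “maybe missing” case has to be spelled out (here as an option), and an exception caught by a handler is in scope only inside that handler. Oops and retry_until_failure are hypothetical names:

```ocaml
(* q5: both branches must produce a value, so "x might not exist"
   becomes an explicit option that callers have to handle. *)
let () = Random.self_init ()

let x = if Random.bool () then Some 3 else None

let () =
  match x with
  | Some v -> Printf.printf "x is %d\n" v
  | None -> print_endline "no x"

(* q6: keep running a step that randomly raises until it does,
   then return the exception. [e] is bound only in its branch,
   so nothing can refer to it once the handler is done. *)
exception Oops

let rec retry_until_failure () =
  match (if Random.bool () then raise Oops) with
  | () -> retry_until_failure ()
  | exception e -> e

let () =
  match retry_until_failure () with
  | Oops -> print_endline "caught Oops"
  | e -> raise e
```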

Conclusion

It’s best for a programming language to:

  • Allow shadowing variables liberally.
  • Not allow reassigning to variables. Shadowing and refs get the job done with fewer concepts and greater clarity.
  • Distinguish declaration from use. This comes for free if you don’t have reassignment.
