Eric Paul

Some names of Lean tactics don’t work

2026-01-14T00:00:00+00:00

Take a look at this example:

syntax "test~" : tactic

example : True := by
  test~ -- error here!

We added a new parsing rule that says the token "test~" is a tactic. But we get the error unknown tactic, with test underlined. Why did this fail?

If we remove the tilde, then it does work. So we might guess that perhaps we can’t have tildes in names of things, but it does work basically everywhere else:

syntax "test~" : command

test~ -- no parsing error here

(This example adds a new parsing rule that says the token "test~" is a command. It then parses the following "test~" successfully as a command.)

So for some reason, our tilde is only failing in our tactic example. What’s going on?

Tactics are different

The names of tactics are indeed being registered differently inside of the parser. Here’s another way we can see this difference:

syntax "name" : tactic

def name := 3 -- no parsing error here

Here we’ve registered name as the name of a tactic, and we were still able to make a variable called name.

But if we do this with a term instead of a tactic, it fails:

syntax "name" : term

def name := 3 -- error here!

We get an error on the definition saying that it expected an identifier but got name.

This is in fact the core difference between tactic and other categories that caused our original error: the names of tactics do not become reserved keywords.

How do reserved keywords work?

The parser stores a set of reserved keywords. Every time we add a new syntax rule in a normal category, any words used in it become reserved keywords.

So when we wrote

syntax "test~" : command

test~ -- no parsing error here

a new reserved keyword "test~" was added to the parser. The parser then looks for the next token it needs to parse. The next token function looks for the longest identifier or reserved keyword it can find. The longest identifier is "test" because "~" is not a valid symbol within an identifier. The longest reserved keyword is "test~" and since this is longer than the longest identifier, the next token is "test~". Then because we added a syntax rule that says the token "test~" is a command, it is successfully parsed as a command.
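
To make the longest-match rule concrete, here is a toy model of the next token function. It is only a sketch under simplifying assumptions: Tok, isIdChar, longestIdent, longestKeyword and nextToken are names made up for this example, identifiers are reduced to letters, digits and underscores, and none of this is Lean’s real tokenizer. The tie rule (a keyword wins when it is at least as long as the identifier) is my reading of how def ends up as a keyword rather than an identifier.

```lean
/-- Toy token kinds for this sketch (not Lean's real syntax nodes). -/
inductive Tok where
  | ident (length : Nat)     -- an identifier covering `length` characters
  | keyword (text : String)  -- a reserved keyword
  deriving Repr

/-- Simplified identifier characters (assumption: letters, digits, `_` only). -/
def isIdChar (c : Char) : Bool := c.isAlphanum || c == '_'

/-- Length of the longest identifier at the start of `input`. -/
def longestIdent (input : String) : Nat :=
  (input.toList.takeWhile isIdChar).length

/-- Longest reserved keyword that is a prefix of `input`, if there is one. -/
def longestKeyword (reserved : List String) (input : String) : Option String :=
  reserved.foldl (fun (best : Option String) kw =>
    let bestLen := (best.map String.length).getD 0
    if kw.toList.isPrefixOf input.toList && bestLen < kw.length then some kw else best) none

/-- Pick whichever is longer; a keyword wins ties. -/
def nextToken (reserved : List String) (input : String) : Tok :=
  match longestKeyword reserved input with
  | some kw => if longestIdent input ≤ kw.length then .keyword kw else .ident (longestIdent input)
  | none => .ident (longestIdent input)

#eval nextToken ["test~"] "test~" -- the keyword "test~" wins
#eval nextToken [] "test~"        -- only the identifier "test" (length 4) is found
```

The first call models the command category, where "test~" was reserved. The second models the tactic category, where nothing was reserved and the tilde is left behind.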

But the tactic category is special and does not make the first word used in any of its rules a reserved keyword. Thus, when we wrote

syntax "test~" : tactic

example : True := by
  test~ -- error here!

we did not add a new reserved keyword. Once again, the parser then looks for the next token it needs to parse by finding the longest identifier or reserved keyword. The longest identifier is again "test" since "~" cannot be in an identifier. But now there is no longer any reserved keyword applicable and so the next token is "test". Since there is no parsing rule that says "test" is a valid tactic, we get an error saying that "test" is an unknown tactic.

So to recap, there are characters like "~" that can appear in keywords but not identifiers. Since the tactic category does not make the first word of any of its parsing rules a reserved keyword, if the first word contains a character that can’t appear in an identifier, it won’t be found by the next token parsing function and so will fail to be a valid tactic name.
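
By contrast, a character that is allowed inside identifiers causes no trouble. The sketch below assumes that ? counts as an identifier character in Lean 4 (the built-in exact? tactic relies on this), so the next token function reads test? as a single identifier and our tactic rule is tried. The tactic name test? and its expansion to trivial are made up for illustration.

```lean
-- Hypothetical tactic whose name ends in `?`, a character that is legal inside identifiers.
macro "test?" : tactic => `(tactic| trivial)

example : True := by
  test? -- read as the single identifier `test?`, then expanded to `trivial`
```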

And that’s our answer!

Some extra details

We can actually tell Lean to not reserve a keyword explicitly. For example, in the command category, the following does reserve a keyword:

syntax "test" : command

def test := 3 -- error here!

and so we get an error in the definition as the token test is now a reserved keyword instead of identifier.

If we add an ampersand before the text, it tells Lean to not reserve a keyword:

syntax &"test" : command

def test := 3 -- no error here

and so we do not get an error as test is not a reserved keyword.

The tactic category just automatically adds that ampersand to the first word in the syntax rule for us. But on top of automatically adding the ampersand, the tactic category has another special feature!

Take a look at this example:

syntax &"test" : command

test -- error here!

We are saying that "test" is a command and the ampersand is telling Lean not to make it a reserved keyword. We then get an error on test saying unexpected identifier; expected command. What went wrong here?

Well, when we write the normal line syntax "test" : command, it tells Lean that if the next token is the keyword "test", then it is a command and that "test" is now a reserved keyword. When we add an ampersand, it is no longer a reserved keyword, but the parsing rule still only triggers when the next token is the keyword "test", not the identifier "test".

So when the parser reaches the input test in the example, the next token function is called. As before, it tries to find the longest identifier or the longest reserved keyword. The longest identifier is "test" and there is no longest reserved keyword due to the ampersand. So the next token is the identifier "test" and not the keyword "test". Thus, our parsing rule is never triggered as it expected the keyword "test" and we get an error.

But this error doesn’t occur in the tactic category! The tactic category automatically adds ampersands to the first word in the parsing rule and so we’d expect the parsing rule to similarly never trigger and so see an error. However, on top of adding ampersands automatically, the tactic category also tells the parser to attempt the parsing rule on identifiers as well as keywords.

Knowing this, let’s walk through what happens in this successful example:

syntax "test" : tactic

example : True := by
  test -- no parsing error here

When the parser reaches test, it calls the next token function. The next token function tries to find the longest identifier or reserved keyword. Since we are in the tactic category, our new parsing rule did not make "test" a reserved keyword. Thus, the next token function says that the next token is the identifier "test". Now our added parsing rule only applies to the keyword "test", but since we are in the tactic category, parsing rules that apply to the keyword "test" are also attempted for the identifier "test". So our added parsing rule is run and it says that "test" is a tactic. So we succeed in parsing.

Lean calls this feature the leading identifier behavior.

Having default leading identifier behavior means that if the next token is an identifier, only parsing functions for identifiers are tried.
Having symbol leading identifier behavior means that if the next token is an identifier, then parsing functions for the corresponding keyword are not only tried, but also prioritized over any identifier parsing functions.
Having both leading identifier behavior means that if the next token is an identifier, then parsing functions for the corresponding keyword are tried and are on equal footing with identifier parsing functions.

And any category whose leading identifier behavior is not default also gets the ampersands automatically inserted for the first word in any of its parsing rules. So the complete and short description of what makes the tactic category different from other categories is that its leading identifier behavior is both instead of default.
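
We can try this out on a category of our own. The sketch below uses the behavior option of declare_syntax_cat; the category greeting, the word hello and the command #greet are all made up for this example, and the command simply expands to an #eval so that there is something to run.

```lean
-- `greeting` is a made-up category; `both` mirrors the behavior described for tactics.
declare_syntax_cat greeting (behavior := both)

-- The first word of the rule should not become a reserved keyword.
syntax "hello" : greeting

-- A made-up command so that we have a place to parse a `greeting`.
macro "#greet " greeting : command => `(#eval "hi")

def hello := 3 -- no parsing error expected: "hello" was not reserved

#greet hello -- the identifier `hello` is tried against the rule for the keyword "hello"
```

If the category is declared without the behavior option, I would expect the def to fail in the same way as the term example earlier.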

How Lean tracks your definitions

2025-12-08T00:00:00+00:00

The big idea is that Lean has a dictionary that maps names of previously defined constants to their definitions. When it encounters a constant, it looks up what the constant means inside of this dictionary. Very reasonable.

Here’s an example of it working:

import Lean
open Lean

def test := "hi"

run_meta
  let env : Environment ← getEnv
  
  let info := env.find? ``test |>.get!
  Lean.logInfo m!"{info.type}"

We define a new constant test which Lean stores in its dictionary. Then we get the environment (the environment has our dictionary), look up the name test inside of the environment dictionary, and print some of the information we get back. Great.

Now here’s a strange situation where things don’t work:

import Lean
open Lean

def test := "hi"

run_meta
  let env : Environment ← getEnv
  let kernelEnv : Kernel.Environment := env.toKernelEnv
  let env : Environment := Environment.ofKernelEnv kernelEnv

  let info := env.find? ``test |>.get!
  Lean.logInfo m!"{info.type}"

Here it fails to find the information about test. That is not ideal. But ok we’ve added some weird stuff at the start. What’s going on?

The two environments

We begin by getting the environment env which has type Environment. We then convert it into kernelEnv which has type Kernel.Environment. Lastly, we convert it into env which again has type Environment.

The type Kernel.Environment is the final actual environment that Lean produces. However, Lean doesn’t actually process each of our definitions one after another: it tries to process them asynchronously. And that’s where Environment comes into play. The Environment tracks all these asynchronous things happening and builds up Kernel.Environment as things are completed.

When we convert env into kernelEnv, it tells Lean to wait for all the asynchronous things to finish so that we can have a Kernel.Environment representing all the constants that have been defined so far. Then when we convert kernelEnv back into env, it just sets all the extra asynchronous tracking stuff to be empty since nothing asynchronous is going on.

So why would this conversion back and forth mean we can’t find our defined constant test anymore? One might wonder if perhaps our code made Lean forget about all the constants. But the following code does work:

import Lean
open Lean

run_meta
  let env ← getEnv
  let kernelEnv := env.toKernelEnv
  let env := Environment.ofKernelEnv kernelEnv

  let info := env.find? ``String |>.get!
  Lean.logInfo m!"{info.type}"

Here we just tried to find the information about the definition of String and it succeeds. So our messing with the environment is forgetting test but not String?

The staged map

It turns out that Lean doesn’t actually store all the constants we’ve defined in a single dictionary. It splits things into two dictionaries! The first dictionary stores all the constants defined in stuff that we’re importing while the second dictionary stores all the constants we’re defining locally. And so String is in the first dictionary and test is in the second. Somehow our environment conversions are losing everything in the second dictionary of local constants.
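
We can peek at the two dictionaries directly. This is a sketch that assumes env.constants exposes the staged map with fields map₁ (imported constants) and map₂ (local constants), which is how I understand the SMap structure to be laid out.

```lean
import Lean
open Lean

def test := "hi"

run_meta
  let env ← getEnv
  -- The staged map: `map₁` for imported constants, `map₂` for local ones (assumed field names).
  let consts := env.constants
  let stringImported := consts.map₁.contains ``String
  let testImported := consts.map₁.contains ``test
  let testLocal := consts.map₂.contains ``test
  logInfo s!"String in first dictionary: {stringImported}"
  logInfo s!"test in first dictionary: {testImported}"
  logInfo s!"test in second dictionary: {testLocal}"
```

Going by the description above, String should show up only in the first dictionary and test only in the second.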

But wait why does Lean split things up into two dictionaries? The answer is speed and the key is that the first dictionary is implemented as a hashmap while the second as a hash trie.

The first hashmap has type Std.HashMap. Let’s look at how that works. This is the hashmap implementation that one is typically going to use in Lean. Since things are immutable in a functional language like Lean, inserting into an Std.HashMap should be pretty slow as it has to create an entirely new array every time we insert. But Lean is clever. So long as we only have a single reference to an array, it will do the update mutably and thus quickly. This works great for reading in the constants from imported modules. They’ve already been processed so Lean just reads them all in quickly into the Std.HashMap doing mutable updates.

In the current file, each definition has to be fully processed before adding itself to the list of constants and that’s why Lean does all this asynchronous stuff. However, the asynchronous things happening means that there is more than one reference to the dictionary storing these constants and so using Std.HashMap would be slow. That’s where the PersistentHashMap comes into play. The way it is implemented is called a hash trie and it looks like a shallow and wide tree. Insertions just have to update a single path in the tree and all the rest of the tree can be reused. Thus, when there are many references to the dictionary and things need to be copied, the PersistentHashMap is faster.
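
Here is a small sketch showing that both maps behave like ordinary immutable values: inserting gives back a new map and the old one is unchanged. The difference described above is only in how much work that costs when the old map is still referenced. The names hashDemo and persistentDemo are made up for this example.

```lean
import Lean
open Lean

/-- Insert into a `Std.HashMap` while keeping the old map around. -/
def hashDemo : Bool × Bool :=
  let h1 : Std.HashMap String Nat := {}
  let h2 := h1.insert "test" 1
  -- `h1` is still referenced here, so the insert above cannot update it in place.
  (h1.contains "test", h2.contains "test")

/-- The same thing with a `PersistentHashMap`, which shares most of its tree. -/
def persistentDemo : Bool × Bool :=
  let p1 : PersistentHashMap String Nat := {}
  let p2 := p1.insert "test" 1
  (p1.contains "test", p2.contains "test")

#eval hashDemo       -- (false, true)
#eval persistentDemo -- (false, true)
```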

Ok so we understand that Lean stores imported constants in one dictionary and locally defined constants in another in order to be fast. But the question still remains: why does going from Environment to Kernel.Environment back to Environment lose all the constants in the dictionary used for locally defined constants?

Finding a constant in the maps

When you have a Kernel.Environment and try to look for a constant in it, Lean checks for the constant in both of the dictionaries it has. However, if you have an Environment there might be constants that have been defined but are still being processed asynchronously. In order to handle this, Lean has yet another dictionary! This dictionary is called asyncConstsMap. Each definition first adds itself immediately to the asyncConstsMap with its associated value being a promise (accessing a promise blocks until its asynchronous processing is completed). And then once the processing is complete, the constant is added to the dictionary in the Kernel.Environment that stores locally defined constants.

So when we try to find a constant in the Environment, it first checks the Kernel.Environment that it has built up so far for the constant and then checks to see if the constant is inside asyncConstsMap. But every constant that ends up in the second dictionary in Kernel.Environment was first added to asyncConstsMap. So in order to avoid looking through these constants twice, Lean does not check the second dictionary of the Kernel.Environment.

And so there is the root of our issue. Lean assumes that every locally defined constant that appears in the second dictionary in its current Kernel.Environment also appears in asyncConstsMap. When we turn our Environment into a Kernel.Environment and then back into an Environment, we lose everything inside of asyncConstsMap and break that invariant.
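
Here is a toy model of that lookup logic. It is a sketch and not Lean’s actual implementation: ToyKernelEnv, ToyEnv, roundTrip and toyEnv are made-up names, plain lists stand in for the dictionaries, and a string stands in for each constant’s information.

```lean
/-- Stand-in for `Kernel.Environment`: two dictionaries. -/
structure ToyKernelEnv where
  imported : List (String × String)
  localConsts : List (String × String)

/-- Stand-in for `Environment`: the kernel environment plus the async map. -/
structure ToyEnv where
  kernel : ToyKernelEnv
  asyncConsts : List (String × String)

/-- The kernel environment checks both of its dictionaries. -/
def ToyKernelEnv.find? (k : ToyKernelEnv) (n : String) : Option String :=
  List.lookup n k.imported <|> List.lookup n k.localConsts

/-- The environment skips the local dictionary and trusts the async map instead. -/
def ToyEnv.find? (e : ToyEnv) (n : String) : Option String :=
  List.lookup n e.kernel.imported <|> List.lookup n e.asyncConsts

/-- Models going to the kernel environment and back: the async map comes back empty. -/
def roundTrip (e : ToyEnv) : ToyEnv :=
  { kernel := e.kernel, asyncConsts := [] }

def toyEnv : ToyEnv :=
  { kernel := { imported := [("String", "Type")], localConsts := [("test", "String")] },
    asyncConsts := [("test", "String")] }

#eval toyEnv.find? "test"                    -- some "String"
#eval (roundTrip toyEnv).find? "test"        -- none
#eval (roundTrip toyEnv).kernel.find? "test" -- some "String"
```

After the round trip the constant is still stored in the model, but the lookup that relies on the async map can no longer see it.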

To drive the point home, let’s modify our failing example:

import Lean
open Lean

def test := "hi"

run_meta
  let env : Environment ← getEnv
  let kernelEnv : Kernel.Environment := env.toKernelEnv
  let env : Environment := Environment.ofKernelEnv kernelEnv

  -- Succeeds
  let info := kernelEnv.find? ``test |>.get!
  Lean.logInfo m!"{info.type}"

  -- Fails
  let info := env.find? ``test |>.get!
  Lean.logInfo m!"{info.type}"

If we look for test inside of kernelEnv it succeeds because it checks both dictionaries. So indeed the final env knows about test but fails to find it due to the broken invariant.

I hate it when my code parses

2025-11-21T00:00:00+00:00

Take a look at this Lean code:

import Lean

run_meta Lean.modifyEnv fun env =>
  Lean.Parser.parserExtension.modifyState env fun _ => {}

def test := "hello"

Lines 3-4 delete all the parsing rules. And then on line 6, we get a parsing error on the keyword def.

Wait what? Yep, after line 4 you will always get a parse error as we have deleted all the parsing rules.

Why would we want this? While you might not want to do exactly this, the ability to easily modify what Lean is able to parse is what lets people add nice mathematical notation.

How does this work?

Lean tracks all the state it cares about during the processing of your file inside of something it calls the environment. Among many other things, the environment stores the parsing rules.

Then we use the run_meta command and provide it with some code. Any code provided to run_meta will be given access to the current environment. The code we passed in to run_meta modifies the current environment by deleting all the information that the parser has.

And that’s it!

Perhaps a more typical usage

Normally we don’t directly interact with the environment but instead use nice higher level tools that do that for us. Take a look at this example:

def test := 3 + ~

Lean is quite reasonably complaining that ~ is not a token it recognizes. Now let’s modify the parsing rules so that it is a token:

syntax "~" : term

def test := 3 + ~

Ok we’re still getting an error. But it’s no longer a parsing error! Lean is successfully recognizing that ~ is a valid term. Lean is now complaining that it has no idea what ~ means.
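
If we wanted to get rid of that last error too, one option is to tell Lean what ~ means with a macro. This is just a sketch, and the choice of meaning (the number 4) is arbitrary and made up for illustration.

```lean
syntax "~" : term

-- Illustrative choice: make `~` expand to the number 4.
macro_rules
  | `(~) => `(4)

def test := 3 + ~

#eval test -- 7
```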

Although syntax is a nice high level command, we now know that it eventually becomes a function that takes in the current environment and adds ~ to the environment’s list of valid tokens.
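
We can check this from inside the file. The sketch below assumes that Parser.getTokenTable returns the environment’s token table and that the table has a find? lookup taking a string; treat both as assumptions if your Lean version differs.

```lean
import Lean
open Lean

syntax "~" : term

run_meta
  -- Assumption: the token table can be read out of the environment like this.
  let tokens := Parser.getTokenTable (← getEnv)
  let known := (tokens.find? "~").isSome
  logInfo s!"is ~ a token? {known}"
```

Moving the run_meta block above the syntax line should show the state of the token table before the new rule was added.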

You can modify hover info in Lean!

2025-10-28T00:00:00+00:00

When you hover over code in Lean, it shows some information. (For example, hovering over the text "hi" in an example command shows that its type is a String.)

I can modify what I do in the code defining #test and change what the hover says. I think that’s pretty neat.

How is this possible?

When you hover over something, some program is called whose job is to return the information about the text you are hovering above. That program needs to know what all the code in the file means in order to say what the type information is.

Lean’s metaprogramming means that in a Lean file we can control what a piece of syntax means. And thus, we control the hovers!

An example

Let’s see an example. Try hovering yourself in the online editor.

import Lean

elab "#test " e:term : command => return ()

#test "hi"

Here we define a new piece of syntax #test that takes in one argument e and overall it is a command. And then we provide what that code means, which here is nothing.

So on the next line #test "hi" does nothing. And when you hover over it, nothing is displayed because that code has no meaning. Even though it may look like "hi" is a string, it has no hover because we gave this code no meaning.

Now let’s give "hi" its normal meaning. Lean calls the process of giving code meaning elaboration. So we are going to elaborate e.

import Lean
open Lean Elab Command Term

elab "#test " e:term : command => liftTermElabM do
  let _ ← elabTerm e none

#test "hi"

Since we have called elabTerm e none, we can now hover over "hi" and see that its type is a String.
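
The elaborated term is also available to us as a value, so we can print the same information that the hover shows. This sketch is a variation on the command above (it would replace that definition rather than sit next to it); the logging is an addition for illustration.

```lean
import Lean
open Lean Elab Command Term

elab "#test " e:term : command => liftTermElabM do
  let x ← elabTerm e none    -- give `e` its normal meaning
  let t ← Meta.inferType x   -- ask for the type of the elaborated term
  logInfo m!"{x} : {t}"      -- for "hi" this should report the type String

#test "hi"
```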

Let’s make it so that hovering over "hi" shows instead what would happen if we were hovering over 5.

import Lean
import Qq
open Lean Elab Term Command Qq

elab "#test " e:term : command => liftTermElabM do
  addTermInfo' e q(5)

#test "hi"

And when we hover over "hi" we now see 5 : Nat.

What’s going on in this code? The program that handles the hover information is specifically getting information from what is called the InfoTree. We can add hover information to this tree with the function addTermInfo'. Here we said that the syntax e has the information for 5. And thus our hover shows 5 : Nat!

In more complicated code, the InfoTree is gradually built up as code is given meaning during elaboration. Since this example was simple, we were able to just make the InfoTree what we wanted it to be in one go.