Volker Schwaberow
A Lexer And Parser For A Simple Language

February 11, 2025

Have you ever paused to wonder what really happens under the hood when your code runs? I still remember the thrill and the confusion of my early coding days. I’d write something that, to my surprise, actually worked, but I never really understood how the computer made sense of my instructions. It felt a bit like whispering secrets to a machine that knew nothing about my intent.

Imagine giving directions to a friend who takes everything literally. You couldn’t just say, “Get to the store.” Instead, you’d have to map out every single move: “Walk two blocks, turn left at the corner, cross the street, and then you’ll find it.” That’s pretty much what a lexer and parser do for your code. The lexer takes your instructions apart, breaking them into tiny, digestible pieces, while the parser reassembles those pieces into a structured plan that the computer can follow. It’s a clever system that turns text into precise actions, and yes, it’s pretty neat when you think about it.

Step 1: Lexing

First step: breaking your code into bite-sized pieces. Just like you’d break a sentence into words, the lexer breaks your code into what we call ‘tokens’. Take MOVE 10 UP – it’s three distinct pieces, right? The lexer identifies MOVE as a command, 10 as a number, and UP as a direction. These tokens are the building blocks for what comes next.
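Because every token in MOVE 10 UP happens to be separated by spaces, you can preview the idea with a shortcut: split the input into words first, then classify each word. The sketch below is only an illustration under stated assumptions: it assumes whitespace always separates tokens, it assumes UP, DOWN, LEFT and RIGHT are the only directions, and the function name lex_words is made up. Because it relies on str::parse, it also accepts signed numbers such as -5.

```rust
// Mirrors the Token enum used throughout this article.
#[derive(Debug, PartialEq)]
pub enum Token {
    Command(String),
    Number(i64),
    Direction(String),
    EOF,
}

/// Hypothetical shortcut lexer: assumes tokens are always separated by whitespace.
pub fn lex_words(input: &str) -> Vec<Token> {
    let mut tokens: Vec<Token> = input
        .split_whitespace() // break the input into words, like a sentence
        .map(|word| {
            if word == "MOVE" {
                Token::Command(word.to_string())
            } else if let Ok(n) = word.parse::<i64>() {
                // str::parse also accepts a leading sign, e.g. "-5"
                Token::Number(n)
            } else if matches!(word, "UP" | "DOWN" | "LEFT" | "RIGHT") {
                // Assumed set of directions
                Token::Direction(word.to_string())
            } else {
                panic!("Unexpected word: {}", word)
            }
        })
        .collect();
    tokens.push(Token::EOF);
    tokens
}
```

lex_words("MOVE 10 UP") produces the same four tokens as the character-by-character lexer below. The catch is that MOVE10UP, written without spaces, makes this version panic, which is one reason real lexers read their input one character at a time.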

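We’ll build a simple lexer in Rust. First, we define the kinds of tokens it can produce, plus an EOF token that marks the end of the input: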

#[derive(Debug, PartialEq)]
pub enum Token {
    Command(String),
    Number(i64),
    Direction(String),
    EOF,
}

Now we’ll write the lexing function. It breaks down the input into tokens:

pub fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();

    while let Some(&ch) = chars.peek() {
        match ch {
            'A'..='Z' => {
                // Read a whole word of uppercase letters.
                let mut word = String::new();
                while let Some('A'..='Z') = chars.peek() {
                    word.push(chars.next().unwrap());
                }
                if word == "MOVE" {
                    tokens.push(Token::Command(word));
                } else if matches!(word.as_str(), "UP" | "DOWN" | "LEFT" | "RIGHT") {
                    // DOWN, LEFT and RIGHT are assumed alongside UP.
                    tokens.push(Token::Direction(word));
                } else {
                    // Reject unknown words instead of silently treating them as directions.
                    panic!("Unknown word: {}", word);
                }
            }
            '0'..='9' => {
                // Accumulate consecutive digits into a single number.
                let mut number: i64 = 0;
                while let Some('0'..='9') = chars.peek() {
                    number = number * 10 + chars.next().unwrap().to_digit(10).unwrap() as i64;
                }
                tokens.push(Token::Number(number));
            }
            c if c.is_whitespace() => {
                chars.next(); // Skip spaces, tabs and newlines
            }
            _ => panic!("Unexpected character: {}", ch),
        }
    }

    tokens.push(Token::EOF);
    tokens
}

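Here’s what happens during lexing: the lexer walks through the input one character at a time. A run of uppercase letters forms a word, so MOVE becomes a Token::Command and a direction such as UP becomes a Token::Direction. A run of digits is accumulated into a single Token::Number, so 10 is read as the number ten rather than as 1 followed by 0. Spaces are skipped, and any unexpected character makes the program panic with an error message. Finally, Token::EOF is appended to mark the end of the input.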
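Panicking is fine for a first experiment, but it stops the whole program and doesn’t say where the problem is. Here is a sketch of an alternative, assuming you’d rather return errors: the hypothetical try_lex returns a Result and uses char_indices to report the byte offset of the offending input. It also rejects unknown words (UP, DOWN, LEFT and RIGHT are assumed to be the valid directions) and uses checked arithmetic, so an oversized number becomes an error instead of an overflow panic.

```rust
#[derive(Debug, PartialEq)]
pub enum Token {
    Command(String),
    Number(i64),
    Direction(String),
    EOF,
}

/// Hypothetical error-returning lexer: reports problems with their byte offset.
pub fn try_lex(input: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    // char_indices yields (byte_offset, char), so errors can say where they happened.
    let mut chars = input.char_indices().peekable();

    while let Some(&(pos, ch)) = chars.peek() {
        if ch.is_ascii_uppercase() {
            let mut word = String::new();
            while let Some(&(_, c)) = chars.peek() {
                if !c.is_ascii_uppercase() {
                    break;
                }
                word.push(c);
                chars.next();
            }
            if word == "MOVE" {
                tokens.push(Token::Command(word));
            } else if matches!(word.as_str(), "UP" | "DOWN" | "LEFT" | "RIGHT") {
                tokens.push(Token::Direction(word));
            } else {
                return Err(format!("Unknown word '{}' at byte {}", word, pos));
            }
        } else if ch.is_ascii_digit() {
            let mut number: i64 = 0;
            while let Some(&(_, c)) = chars.peek() {
                let Some(digit) = c.to_digit(10) else { break };
                // checked_mul/checked_add return None on overflow; turn that into an error.
                number = number
                    .checked_mul(10)
                    .and_then(|n| n.checked_add(i64::from(digit)))
                    .ok_or_else(|| format!("Number too large at byte {}", pos))?;
                chars.next();
            }
            tokens.push(Token::Number(number));
        } else if ch.is_whitespace() {
            chars.next();
        } else {
            return Err(format!("Unexpected character '{}' at byte {}", ch, pos));
        }
    }

    tokens.push(Token::EOF);
    Ok(tokens)
}
```

For example, try_lex("MOVE 10 UP!") returns Err("Unexpected character '!' at byte 10"), which is much easier to act on than a crash.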

Step 2: Parsing

Now that our tokens are sorted, it’s time to turn them into something useful. Parsing organizes these tokens into a structure that represents what the program needs to do. For instance, it combines the MOVE command with its arguments, the distance and the direction, into a single representation of the action.

For our movement commands, we want a structure that represents the action and its parameters. Here’s how we define that structure:

#[derive(Debug)]
pub struct Command {
    pub action: String,
    pub value: i64,
    pub direction: String,
}
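Storing the direction as a String means any text could end up in that field. A common Rust alternative, sketched here as an assumption rather than part of the original design, is a Direction enum that implements the standard FromStr trait: invalid directions are rejected once, at parse time, and can never reach the rest of the program. The names Direction and TypedCommand are hypothetical.

```rust
use std::str::FromStr;

/// Hypothetical replacement for `direction: String`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction {
    Up,
    Down,
    Left,
    Right,
}

impl FromStr for Direction {
    type Err = String;

    // Converts a direction word produced by the lexer into the enum.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "UP" => Ok(Direction::Up),
            "DOWN" => Ok(Direction::Down),
            "LEFT" => Ok(Direction::Left),
            "RIGHT" => Ok(Direction::Right),
            other => Err(format!("Unknown direction: {}", other)),
        }
    }
}

/// Same shape as Command, but the direction can only hold a valid value.
#[derive(Debug, PartialEq)]
pub struct TypedCommand {
    pub action: String,
    pub value: i64,
    pub direction: Direction,
}
```

With this in place, "UP".parse::<Direction>() yields Ok(Direction::Up), and any later match on the direction is checked for exhaustiveness by the compiler.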

Next, we’ll create a parser that takes the tokens and builds the Command struct:

pub struct Parser {
    tokens: Vec<Token>,
    position: usize, // index of the next token to consume
}
 
impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, position: 0 }
    }
 
    // The lexer always appends Token::EOF, so parsing its output never indexes past the end:
    // reaching EOF fails one of the matches below before position moves further.
    fn current_token(&self) -> &Token {
        &self.tokens[self.position]
    }
 
    pub fn parse(&mut self) -> Command {
        // 1. The first token must be a command such as MOVE.
        let action = match self.current_token() {
            Token::Command(action) => {
                let action = action.clone();
                self.position += 1;
                action
            }
            _ => panic!("Expected a command, found {:?}", self.current_token()),
        };
 
        // 2. Next comes the distance.
        let value = match self.current_token() {
            Token::Number(value) => {
                let value = *value;
                self.position += 1;
                value
            }
            _ => panic!("Expected a number, found {:?}", self.current_token()),
        };
 
        // 3. Finally, a direction such as UP.
        let direction = match self.current_token() {
            Token::Direction(direction) => {
                let direction = direction.clone();
                self.position += 1;
                direction
            }
            _ => panic!("Expected a direction, found {:?}", self.current_token()),
        };
 
        Command {
            action,
            value,
            direction,
        }
    }
}

When you lex the input MOVE 10 UP and pass the resulting tokens to this parser, printing the result with {:#?} gives:

Command {
    action: "MOVE",
    value: 10,
    direction: "UP",
}

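This step checks that the tokens arrive in the expected order (a command, then a number, then a direction) and panics with a descriptive message if they don’t. The result is a Command value that the rest of the program can act on without looking at the raw text again.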
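If you’d rather report errors than panic, the parser can follow the same three steps and return a Result instead. The sketch below is one way to do it, under a few assumptions: the helper name next_token is made up, it uses Vec::get so that a missing token becomes an error rather than an out-of-bounds panic, and, unlike the version above, it rejects trailing tokens such as the extra 5 in MOVE 10 UP 5.

```rust
#[derive(Debug, PartialEq)]
pub enum Token {
    Command(String),
    Number(i64),
    Direction(String),
    EOF,
}

#[derive(Debug, PartialEq)]
pub struct Command {
    pub action: String,
    pub value: i64,
    pub direction: String,
}

pub struct Parser {
    tokens: Vec<Token>,
    position: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, position: 0 }
    }

    /// Hypothetical helper: returns the current token (if any) and advances.
    fn next_token(&mut self) -> Option<&Token> {
        let token = self.tokens.get(self.position);
        self.position += 1;
        token
    }

    /// Parses exactly one COMMAND NUMBER DIRECTION sequence followed by end of input.
    pub fn parse(&mut self) -> Result<Command, String> {
        let action = match self.next_token() {
            Some(Token::Command(action)) => action.clone(),
            other => return Err(format!("Expected a command, found {:?}", other)),
        };
        let value = match self.next_token() {
            Some(Token::Number(value)) => *value,
            other => return Err(format!("Expected a number, found {:?}", other)),
        };
        let direction = match self.next_token() {
            Some(Token::Direction(direction)) => direction.clone(),
            other => return Err(format!("Expected a direction, found {:?}", other)),
        };
        // Anything other than EOF at this point is leftover input.
        match self.next_token() {
            Some(Token::EOF) | None => Ok(Command { action, value, direction }),
            Some(extra) => Err(format!("Unexpected trailing token: {:?}", extra)),
        }
    }
}
```

The caller then decides what to do with an Err, for example print it and ask for new input, instead of the whole program stopping.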

Step 3: Executing the Command

Once we have a command, we can make it do something. Here’s a simple way to execute it:

impl Command {
    pub fn execute(&self) {
        println!("Executing: {} {} {}", self.action, self.value, self.direction);
        // Add actual logic here, like moving an object in a simulation.
    }
}
 
fn main() {
    let input = "MOVE 10 UP";
    let tokens = lex(input);
    let mut parser = Parser::new(tokens);
    let command = parser.parse();
 
    command.execute();
}

When you run this program, it prints:
Executing: MOVE 10 UP
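The execute method only prints the command. To give it something to do, here’s a sketch of one possible interpretation, which is an assumption rather than the article’s design: the command moves a point on a 2D grid, with y growing upward. The Position type and the apply method are hypothetical. apply returns a new position instead of mutating shared state, which keeps it easy to test.

```rust
/// Hypothetical 2D position; y is assumed to grow upward.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Position {
    pub x: i64,
    pub y: i64,
}

#[derive(Debug)]
pub struct Command {
    pub action: String,
    pub value: i64,
    pub direction: String,
}

impl Command {
    /// Returns the position after applying this command, or an error for input
    /// this simulation doesn't understand.
    pub fn apply(&self, pos: Position) -> Result<Position, String> {
        if self.action != "MOVE" {
            return Err(format!("Unsupported action: {}", self.action));
        }
        // Unit step for each direction.
        let (dx, dy) = match self.direction.as_str() {
            "UP" => (0, 1),
            "DOWN" => (0, -1),
            "LEFT" => (-1, 0),
            "RIGHT" => (1, 0),
            other => return Err(format!("Unknown direction: {}", other)),
        };
        Ok(Position {
            x: pos.x + dx * self.value,
            y: pos.y + dy * self.value,
        })
    }
}
```

Starting from Position::default() (the origin), applying MOVE 10 UP gives Position { x: 0, y: 10 }.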

What’s Next?

This is where things start to open up. You could expand the lexer to recognize additional commands, like ROTATE or STOP. You could also enhance the parser to handle sequences of commands such as MOVE 10 UP; ROTATE 90 LEFT, which means teaching the lexer about the semicolon as well. Adding error handling is another important step: returning a Result instead of calling panic! lets the program point users to their mistakes rather than simply crashing. The more you experiment, the more you’ll see how these tools can turn plain text into working functionality.
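As a starting point for those extensions, here’s a sketch that lexes and parses MOVE 10 UP; ROTATE 90 LEFT. It assumes ROTATE takes the same shape as MOVE (a number followed by a direction), adds a Semicolon token as the separator, and returns errors as Result values instead of panicking. The keyword lists and the name parse_program are illustrative choices, not a fixed design.

```rust
#[derive(Debug, PartialEq)]
pub enum Token {
    Command(String),
    Number(i64),
    Direction(String),
    Semicolon,
    EOF,
}

#[derive(Debug, PartialEq)]
pub struct Command {
    pub action: String,
    pub value: i64,
    pub direction: String,
}

/// Lexer extended with ROTATE and ';' (assumed keyword sets; overflow is not handled).
pub fn lex(input: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&ch) = chars.peek() {
        if ch.is_ascii_uppercase() {
            let mut word = String::new();
            while let Some(&c) = chars.peek() {
                if !c.is_ascii_uppercase() {
                    break;
                }
                word.push(c);
                chars.next();
            }
            let token = if matches!(word.as_str(), "MOVE" | "ROTATE") {
                Token::Command(word)
            } else if matches!(word.as_str(), "UP" | "DOWN" | "LEFT" | "RIGHT") {
                Token::Direction(word)
            } else {
                return Err(format!("Unknown word: {}", word));
            };
            tokens.push(token);
        } else if ch.is_ascii_digit() {
            let mut number: i64 = 0;
            while let Some(digit) = chars.peek().and_then(|c| c.to_digit(10)) {
                number = number * 10 + i64::from(digit);
                chars.next();
            }
            tokens.push(Token::Number(number));
        } else if ch == ';' {
            tokens.push(Token::Semicolon);
            chars.next();
        } else if ch.is_whitespace() {
            chars.next();
        } else {
            return Err(format!("Unexpected character: {}", ch));
        }
    }
    tokens.push(Token::EOF);
    Ok(tokens)
}

/// Parses `command (';' command)*` into a list of commands.
pub fn parse_program(tokens: &[Token]) -> Result<Vec<Command>, String> {
    let mut commands = Vec::new();
    let mut i = 0;
    loop {
        let action = match tokens.get(i) {
            Some(Token::Command(a)) => a.clone(),
            other => return Err(format!("Expected a command, found {:?}", other)),
        };
        let value = match tokens.get(i + 1) {
            Some(Token::Number(n)) => *n,
            other => return Err(format!("Expected a number, found {:?}", other)),
        };
        let direction = match tokens.get(i + 2) {
            Some(Token::Direction(d)) => d.clone(),
            other => return Err(format!("Expected a direction, found {:?}", other)),
        };
        commands.push(Command { action, value, direction });
        i += 3;
        match tokens.get(i) {
            Some(Token::Semicolon) => i += 1, // another command follows
            Some(Token::EOF) | None => return Ok(commands),
            other => return Err(format!("Expected ';' or end of input, found {:?}", other)),
        }
    }
}
```

Running parse_program on the tokens of MOVE 10 UP; ROTATE 90 LEFT yields two Command values, one per statement, which you could then execute in order.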

These skills can take you far, whether you’re building interpreters for scripting languages, processing data pipelines, or even creating the logic for game engines. Start small, try things out, and see where your experiments lead you.