
A Lexer And Parser For A Simple Language
Have you ever paused to wonder what really happens under the hood when your code runs? I still remember the thrill and the confusion of my early coding days. I'd write something that, to my surprise, actually worked, but I never really understood how the computer made sense of my instructions. It felt a bit like whispering secrets to a machine that knew nothing about my intent.
Imagine giving directions to a friend who insists on taking everything literally. You couldn't just say, "Get to the store." Instead, you'd have to map out every single move: "Walk two blocks, turn left at the corner, cross the street, and then you'll find it." That's pretty much what a lexer and parser do for your code. The lexer takes your instructions apart, breaking them into tiny, digestible pieces, while the parser reassembles those pieces into a plan that the computer can follow. It's a clever system that turns abstract ideas into precise actions, and yes, it's pretty neat when you think about it.
Step 1: Lexing
First step: breaking your code into bite-sized pieces. Just like you'd break a sentence into words, the lexer breaks your code into what we call "tokens". Take MOVE 10 UP: it's three distinct pieces, right? The lexer identifies MOVE as a command, 10 as a number, and UP as a direction. These tokens are the building blocks for what comes next.
Here's a simple lexer in Rust:
#[derive(Debug, PartialEq)]
pub enum Token {
    Command(String),
    Number(i64),
    Direction(String),
    EOF,
}
Now we'll write the lexing function. It breaks down the input into tokens:
pub fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&ch) = chars.peek() {
        match ch {
            'A'..='Z' => {
                let mut command = String::new();
                while let Some('A'..='Z') = chars.peek() {
                    command.push(chars.next().unwrap());
                }
                if command == "MOVE" {
                    tokens.push(Token::Command(command));
                } else {
                    tokens.push(Token::Direction(command));
                }
            }
            '0'..='9' => {
                let mut number = 0;
                while let Some('0'..='9') = chars.peek() {
                    number = number * 10 + chars.next().unwrap().to_digit(10).unwrap() as i64;
                }
                tokens.push(Token::Number(number));
            }
            ' ' => {
                chars.next(); // Skip spaces
            }
            _ => panic!("Unexpected character: {}", ch),
        }
    }
    tokens.push(Token::EOF);
    tokens
}
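Since the grammar is whitespace-delimited, you can sanity-check the idea with an even simpler pass built on split_whitespace. This is a sketch for comparison, not a replacement for the character-level lexer above; lex_simple and its word-based classification are illustrative names, not part of the code so far:

```rust
// A simplified, whitespace-based tokenizer that mirrors the same Token shape.
#[derive(Debug, PartialEq)]
enum Token {
    Command(String),
    Number(i64),
    Direction(String),
    EOF,
}

fn lex_simple(input: &str) -> Vec<Token> {
    let mut tokens: Vec<Token> = input
        .split_whitespace()
        .map(|word| {
            // A word that parses as an integer becomes a Number;
            // "MOVE" becomes a Command; everything else is a Direction.
            if let Ok(n) = word.parse::<i64>() {
                Token::Number(n)
            } else if word == "MOVE" {
                Token::Command(word.to_string())
            } else {
                Token::Direction(word.to_string())
            }
        })
        .collect();
    tokens.push(Token::EOF);
    tokens
}

fn main() {
    let tokens = lex_simple("MOVE 10 UP");
    assert_eq!(
        tokens,
        vec![
            Token::Command("MOVE".to_string()),
            Token::Number(10),
            Token::Direction("UP".to_string()),
            Token::EOF,
        ]
    );
    println!("{:?}", tokens);
}
```

Splitting on whitespace is less flexible than the streaming, character-by-character approach, but it makes a handy cross-check while experimenting.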
Here's what happens during lexing: characters from the input are processed one at a time. Commands like MOVE are categorized as Token::Command, numbers are grouped into Token::Number, and directions such as UP are labeled as Token::Direction. If an unexpected character appears, the program will panic and show an error message.
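Panicking on a bad character is fine for a first experiment, but the lexer can just as easily report the problem instead of crashing. Here is a minimal sketch of the same loop returning a Result; try_lex and the plain String error type are simplifications for illustration:

```rust
#[derive(Debug, PartialEq)]
pub enum Token {
    Word(String), // a command or direction, classified later
    Number(i64),
    EOF,
}

pub fn try_lex(input: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&ch) = chars.peek() {
        match ch {
            'A'..='Z' => {
                let mut word = String::new();
                while let Some('A'..='Z') = chars.peek() {
                    word.push(chars.next().unwrap());
                }
                tokens.push(Token::Word(word));
            }
            '0'..='9' => {
                let mut number = 0i64;
                while let Some('0'..='9') = chars.peek() {
                    number = number * 10 + chars.next().unwrap().to_digit(10).unwrap() as i64;
                }
                tokens.push(Token::Number(number));
            }
            ' ' => {
                chars.next(); // Skip spaces
            }
            // Instead of panicking, surface the offending character to the caller.
            _ => return Err(format!("unexpected character: {:?}", ch)),
        }
    }
    tokens.push(Token::EOF);
    Ok(tokens)
}

fn main() {
    assert!(try_lex("MOVE 10 UP").is_ok());
    assert!(try_lex("move 10 up").is_err()); // lowercase isn't in the grammar
    println!("fallible lexing works");
}
```

The caller can then decide whether to print the error, skip the line, or abort, rather than having the decision baked into the lexer.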
Step 2: Parsing
Now that our tokens are sorted, it's time to turn them into something useful. Parsing organizes these tokens into a structure that represents what the program needs to do. For instance, it would combine the MOVE command with its arguments to create a clear representation of the action.
For our movement commands, we want a structure that represents the action and its parameters. Hereâs how we define that structure:
#[derive(Debug)]
pub struct Command {
    pub action: String,
    pub value: i64,
    pub direction: String,
}
Next, we'll create a parser that takes the tokens and builds the Command struct:
pub struct Parser {
    tokens: Vec<Token>,
    position: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, position: 0 }
    }

    fn current_token(&self) -> &Token {
        &self.tokens[self.position]
    }

    pub fn parse(&mut self) -> Command {
        let action = match self.current_token() {
            Token::Command(action) => {
                let action = action.clone();
                self.position += 1;
                action
            }
            _ => panic!("Expected a command, found {:?}", self.current_token()),
        };
        let value = match self.current_token() {
            Token::Number(value) => {
                let value = *value;
                self.position += 1;
                value
            }
            _ => panic!("Expected a number, found {:?}", self.current_token()),
        };
        let direction = match self.current_token() {
            Token::Direction(direction) => {
                let direction = direction.clone();
                self.position += 1;
                direction
            }
            _ => panic!("Expected a direction, found {:?}", self.current_token()),
        };
        Command {
            action,
            value,
            direction,
        }
    }
}
When you feed the input MOVE 10 UP to this parser, you get:
Command {
    action: "MOVE",
    value: 10,
    direction: "UP",
}
This step ensures the program understands the intent behind the tokens, creating a clear and actionable structure.
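The three match arms in parse all follow the same expect-and-advance pattern: check the current token, move past it on success, complain otherwise. That pattern can be factored into a helper. Here is a self-contained sketch over a simplified token type; Tok, Cursor, and expect_num are hypothetical names for illustration, not part of the parser above:

```rust
#[derive(Debug, PartialEq)]
enum Tok {
    Num(i64),
    Word(String),
}

struct Cursor {
    toks: Vec<Tok>,
    pos: usize,
}

impl Cursor {
    // Expect a number at the current position; advance past it on success.
    fn expect_num(&mut self) -> Result<i64, String> {
        match self.toks.get(self.pos) {
            Some(Tok::Num(n)) => {
                let n = *n;
                self.pos += 1;
                Ok(n)
            }
            other => Err(format!("expected a number, found {:?}", other)),
        }
    }
}

fn main() {
    let mut c = Cursor {
        toks: vec![Tok::Num(10), Tok::Word("UP".to_string())],
        pos: 0,
    };
    assert_eq!(c.expect_num(), Ok(10));
    assert!(c.expect_num().is_err()); // next token is a word, not a number
    println!("expect-and-advance works");
}
```

Using get instead of indexing also means running off the end of the token stream becomes an Err rather than a panic.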
Step 3: Executing the Command
Once we have a command, we can make it do something. Here's a simple way to execute it:
impl Command {
    pub fn execute(&self) {
        println!("Executing: {} {} {}", self.action, self.value, self.direction);
        // Add actual logic here, like moving an object in a simulation.
    }
}

fn main() {
    let input = "MOVE 10 UP";
    let tokens = lex(input);
    let mut parser = Parser::new(tokens);
    let command = parser.parse();
    command.execute();
}
When you run this program, it prints:
Executing: MOVE 10 UP
What's Next?
This is where things start to open up. You could expand the lexer to recognize additional commands, like ROTATE or STOP. You could also enhance the parser to handle sequences of commands, making it capable of processing inputs like MOVE 10 UP; ROTATE 90 LEFT. Adding error handling is another critical step, ensuring the program can guide users through mistakes rather than simply crashing.
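One low-effort way to approach sequences, before teaching the lexer about the semicolon as a real token, is to split the input into statements first and run each one through the existing pipeline. A sketch of just the splitting step:

```rust
fn main() {
    let program = "MOVE 10 UP; ROTATE 90 LEFT";
    // Split into statements on ';' and trim surrounding whitespace.
    let statements: Vec<&str> = program.split(';').map(str::trim).collect();
    assert_eq!(statements, vec!["MOVE 10 UP", "ROTATE 90 LEFT"]);
    // Each statement could then be fed through lex() and Parser::parse() in turn.
    for stmt in &statements {
        println!("statement: {}", stmt);
    }
}
```

A more robust version would make the semicolon a token of its own, so the parser can loop until it reaches EOF instead of relying on string splitting.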
The possibilities are vast. The more you experiment, the more you'll see how these tools can transform plain text into powerful functionality.
These skills can take you far, whether you're building interpreters for scripting languages, processing data pipelines, or even creating the logic for game engines. Start small, try things out, and see where your experiments lead you.