Interpreter Architecture

AIScript's interpreter follows a traditional compilation pipeline, with enhancements aimed at flexibility, performance, and AI integration. This chapter helps new contributors understand the system's architecture, its key components, and how they interact.

Overview

The AIScript interpreter processes source code in five main stages:

Lexical Analysis → Parsing → Type Checking → Code Generation → Virtual Machine Execution

This design allows for clear separation of concerns while maintaining flexibility for language evolution. Let's explore each component in detail.

Lexical Analysis (Lexer)

The lexer is the first stage of compilation. It converts source code text into a stream of tokens, where each token represents a meaningful unit of the language, such as a keyword, identifier, operator, or literal.

// Example of how the lexer works:
// Input: "let x = 10 + 20;"
// Output: [Token(Let), Token(Identifier, "x"), Token(Equal), Token(Number, "10"), 
//          Token(Plus), Token(Number, "20"), Token(Semicolon)]

Key responsibilities:

Breaking source code into tokens
Handling string/numeric literals
Managing line numbers for error reporting
Skipping whitespace and comments
Recognizing keywords and operators

Important structures in the lexer:

TokenType enum: Defines all possible token types
Token struct: Contains the token type, lexeme (original text), and line number
Scanner struct: Manages the scanning state and provides methods for token consumption
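
To make the relationship between these three structures concrete, here is a minimal sketch of a scanner for the `let x = 10 + 20;` example above. It is an illustration under simplifying assumptions, not AIScript's actual lexer: the token set is a small subset, and the method names (`next_token`, `scan_all`) and the trailing `Eof` token are hypothetical.

```rust
// Illustrative sketch only: a tiny token subset, not AIScript's real lexer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenType { Let, Identifier, Number, Equal, Plus, Semicolon, Eof }

#[derive(Debug, Clone, PartialEq)]
pub struct Token<'a> {
    pub kind: TokenType,
    pub lexeme: &'a str, // original text of the token
    pub line: u32,       // line number, kept for error reporting
}

pub struct Scanner<'a> {
    source: &'a str,
    start: usize,   // byte offset where the current token begins
    current: usize, // byte offset of the next unread byte
    line: u32,
}

impl<'a> Scanner<'a> {
    pub fn new(source: &'a str) -> Self {
        Scanner { source, start: 0, current: 0, line: 1 }
    }

    fn peek(&self) -> Option<u8> {
        self.source.as_bytes().get(self.current).copied()
    }

    fn make(&self, kind: TokenType) -> Token<'a> {
        Token { kind, lexeme: &self.source[self.start..self.current], line: self.line }
    }

    // Consume bytes for as long as `pred` accepts them.
    fn advance_while(&mut self, pred: impl Fn(u8) -> bool) {
        while self.peek().map_or(false, |b| pred(b)) {
            self.current += 1;
        }
    }

    pub fn next_token(&mut self) -> Result<Token<'a>, String> {
        // Skip whitespace, counting newlines as we go.
        while let Some(b) = self.peek() {
            match b {
                b'\n' => { self.line += 1; self.current += 1; }
                b' ' | b'\t' | b'\r' => self.current += 1,
                _ => break,
            }
        }
        self.start = self.current;
        let b = match self.peek() {
            Some(b) => b,
            None => return Ok(self.make(TokenType::Eof)),
        };
        self.current += 1;
        match b {
            b'=' => Ok(self.make(TokenType::Equal)),
            b'+' => Ok(self.make(TokenType::Plus)),
            b';' => Ok(self.make(TokenType::Semicolon)),
            b'0'..=b'9' => {
                self.advance_while(|c| c.is_ascii_digit());
                Ok(self.make(TokenType::Number))
            }
            c if c.is_ascii_alphabetic() || c == b'_' => {
                self.advance_while(|c| c.is_ascii_alphanumeric() || c == b'_');
                // Keywords are identifiers whose text matches a reserved word.
                let kind = match &self.source[self.start..self.current] {
                    "let" => TokenType::Let,
                    _ => TokenType::Identifier,
                };
                Ok(self.make(kind))
            }
            other => Err(format!("line {}: unexpected byte 0x{:02x}", self.line, other)),
        }
    }
}

// Scan the whole input, ending the token list with an Eof token.
pub fn scan_all(source: &str) -> Result<Vec<Token<'_>>, String> {
    let mut scanner = Scanner::new(source);
    let mut tokens = Vec::new();
    loop {
        let token = scanner.next_token()?;
        let done = token.kind == TokenType::Eof;
        tokens.push(token);
        if done { return Ok(tokens); }
    }
}
```

Calling `scan_all("let x = 10 + 20;")` yields the seven tokens listed in the example, followed by the sketch's own `Eof` marker. Lexemes borrow from the source string, so no text is copied during scanning.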

Parsing

The parser converts the token stream into an Abstract Syntax Tree (AST), which represents the hierarchical structure of the program. AIScript uses a recursive descent parser with Pratt parsing for expressions.

Key components:

AST node definitions
Parsing functions for statements and expressions
Precedence handling for operators
Error recovery mechanisms
Type annotation handling

The parser also performs some early validation, such as:

Checking for valid syntax
Validating enum variants
Ensuring valid function declarations
Validating match patterns
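
Pratt parsing handles operator precedence by giving each infix operator a pair of binding powers and looping while the next operator binds at least as tightly as the caller requires. The sketch below shows the technique on a four-operator arithmetic grammar. The `Tok` and `Expr` types, the binding-power values, and the `eval` helper are all assumptions made for illustration; AIScript's real AST and precedence table are richer.

```rust
// Illustrative Pratt parser for arithmetic; not AIScript's actual parser.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok { Num(f64), Plus, Minus, Star, Slash, LParen, RParen, Eof }

#[derive(Debug, PartialEq)]
pub enum Expr {
    Number(f64),
    Unary { op: Tok, rhs: Box<Expr> },
    Binary { op: Tok, lhs: Box<Expr>, rhs: Box<Expr> },
}

pub struct Parser { tokens: Vec<Tok>, pos: usize }

impl Parser {
    pub fn new(tokens: Vec<Tok>) -> Self { Parser { tokens, pos: 0 } }

    fn peek(&self) -> Tok { self.tokens.get(self.pos).copied().unwrap_or(Tok::Eof) }

    fn advance(&mut self) -> Tok { let t = self.peek(); self.pos += 1; t }

    // (left, right) binding powers; a higher number binds tighter.
    // right > left makes the operator left-associative.
    fn infix_power(op: Tok) -> Option<(u8, u8)> {
        match op {
            Tok::Plus | Tok::Minus => Some((1, 2)),
            Tok::Star | Tok::Slash => Some((3, 4)),
            _ => None,
        }
    }

    pub fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
        // Prefix position: literals, unary minus, parenthesised groups.
        let mut lhs = match self.advance() {
            Tok::Num(n) => Expr::Number(n),
            Tok::Minus => {
                let rhs = self.parse_expr(5)?; // unary minus binds tighter than * and /
                Expr::Unary { op: Tok::Minus, rhs: Box::new(rhs) }
            }
            Tok::LParen => {
                let inner = self.parse_expr(0)?;
                if self.advance() != Tok::RParen {
                    return Err("expected ')'".to_string());
                }
                inner
            }
            t => return Err(format!("unexpected token {:?}", t)),
        };
        // Infix position: keep extending `lhs` while operators bind tightly enough.
        loop {
            let op = self.peek();
            let (l_bp, r_bp) = match Self::infix_power(op) {
                Some(bp) => bp,
                None => break,
            };
            if l_bp < min_bp { break; }
            self.advance();
            let rhs = self.parse_expr(r_bp)?;
            lhs = Expr::Binary { op, lhs: Box::new(lhs), rhs: Box::new(rhs) };
        }
        Ok(lhs)
    }
}

// Tree-walking evaluator, included only to make the parse result easy to check.
pub fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Unary { rhs, .. } => -eval(rhs),
        Expr::Binary { op, lhs, rhs } => match op {
            Tok::Plus => eval(lhs) + eval(rhs),
            Tok::Minus => eval(lhs) - eval(rhs),
            Tok::Star => eval(lhs) * eval(rhs),
            Tok::Slash => eval(lhs) / eval(rhs),
            _ => unreachable!("parser only builds Binary nodes for infix operators"),
        },
    }
}
```

With these binding powers, `1 + 2 * 3` parses as `1 + (2 * 3)` and `10 - 4 - 3` parses as `(10 - 4) - 3`. Adding a new operator only requires a new entry in `infix_power`, which is the main reason Pratt parsing is convenient for expression grammars.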

Type Checking and Resolution

AIScript includes a type checking system (ty/resolver.rs) that validates types at compile time where possible. This surfaces type errors before the program runs and enables better performance optimizations.

Main features:

Type annotation validation
Class and enum type checking
Validation of object literals against class definitions
Function parameter type checking
Error type validation

The type resolver is invoked early, during the parsing phase, so that type errors are caught as soon as possible.
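
As an example of one of the checks listed above, the following sketch validates an object literal against a class definition. It is a simplified illustration and does not reflect the contents of ty/resolver.rs: the `Type`, `Literal`, and `ClassDef` types and the `check_object_literal` function are hypothetical names, and only three primitive types are modelled.

```rust
// Illustrative sketch: checking an object literal against a class definition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type { Number, Str, Bool }

#[derive(Debug, Clone)]
pub enum Literal { Number(f64), Str(String), Bool(bool) }

impl Literal {
    // The static type of a literal is known without running any code.
    fn ty(&self) -> Type {
        match self {
            Literal::Number(_) => Type::Number,
            Literal::Str(_) => Type::Str,
            Literal::Bool(_) => Type::Bool,
        }
    }
}

pub struct ClassDef {
    pub name: String,
    pub fields: Vec<(String, Type)>, // declared fields, in declaration order
}

// Returns every problem found, so the user sees all errors in one pass.
pub fn check_object_literal(class: &ClassDef, literal: &[(&str, Literal)]) -> Vec<String> {
    let mut errors = Vec::new();
    // 1. Every supplied field must be declared and have the declared type.
    for (name, value) in literal {
        match class.fields.iter().find(|(declared, _)| declared.as_str() == *name) {
            Some((_, expected)) if *expected == value.ty() => {}
            Some((_, expected)) => errors.push(format!(
                "field '{}' of {}: expected {:?}, found {:?}",
                name, class.name, expected, value.ty()
            )),
            None => errors.push(format!("class {} has no field '{}'", class.name, name)),
        }
    }
    // 2. Every declared field must be supplied.
    for (declared, _) in &class.fields {
        if !literal.iter().any(|(name, _)| *name == declared.as_str()) {
            errors.push(format!("missing field '{}' of {}", declared, class.name));
        }
    }
    errors
}
```

An empty result means the literal is well-typed. Collecting errors into a list, rather than stopping at the first one, is a design choice of this sketch and may differ from how the real resolver reports problems.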

Code Generation

The code generator transforms the AST into bytecode that can be executed by the virtual machine. This phase also performs several optimizations.

Key aspects:

Generation of VM opcodes from AST nodes
Handling variable scope and closures
Managing function parameters and defaults
Implementing control flow (if/else, loops, etc.)
Error handling code generation
Enum and class compilation

The code generator produces a set of functions with associated bytecode chunks, which are then executed by the VM.
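
The core of AST-to-bytecode translation for a stack machine is a post-order walk: emit code for the operands first, then the operator. The sketch below shows this for addition and subtraction. The `Expr` and `Chunk` types and the `compile` function are assumptions for illustration, and the opcode subset borrows only a few names from the instruction set described in the OpCode System section; the real generator also handles scopes, closures, and control flow.

```rust
// Illustrative sketch of AST-to-bytecode compilation; not AIScript's code generator.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode {
    Constant(u8), // push constants[index]
    Add,
    Subtract,
    Return,
}

pub enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Default)]
pub struct Chunk {
    pub code: Vec<OpCode>,
    pub constants: Vec<f64>,
}

impl Chunk {
    // Constant operands are one byte wide, so a chunk holds at most 256 constants.
    fn add_constant(&mut self, value: f64) -> Result<u8, String> {
        if self.constants.len() > u8::MAX as usize {
            return Err("too many constants in one chunk".to_string());
        }
        self.constants.push(value);
        Ok((self.constants.len() - 1) as u8)
    }
}

// Post-order traversal: operands are emitted before the operator that consumes them.
fn compile_expr(expr: &Expr, chunk: &mut Chunk) -> Result<(), String> {
    match expr {
        Expr::Number(n) => {
            let index = chunk.add_constant(*n)?;
            chunk.code.push(OpCode::Constant(index));
        }
        Expr::Add(lhs, rhs) => {
            compile_expr(lhs, chunk)?;
            compile_expr(rhs, chunk)?;
            chunk.code.push(OpCode::Add);
        }
        Expr::Sub(lhs, rhs) => {
            compile_expr(lhs, chunk)?;
            compile_expr(rhs, chunk)?;
            chunk.code.push(OpCode::Subtract);
        }
    }
    Ok(())
}

pub fn compile(expr: &Expr) -> Result<Chunk, String> {
    let mut chunk = Chunk::default();
    compile_expr(expr, &mut chunk)?;
    chunk.code.push(OpCode::Return);
    Ok(chunk)
}
```

Compiling the expression `10 + 20` with this sketch produces `Constant(0), Constant(1), Add, Return` with the constant table `[10.0, 20.0]`.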

Virtual Machine

The virtual machine is a stack-based interpreter that executes the generated bytecode. It maintains execution state and provides runtime facilities like garbage collection.

Important components:

Call frames for function invocation
Value stack for computations
Global and local variable storage
Upvalue handling for closures
Garbage collection (via gc_arena)
Runtime error handling

The VM also handles built-in functions, modules, and AI operations.
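
The heart of a stack-based VM is a fetch-decode-execute loop over the bytecode. The following sketch shows that loop for a handful of instructions. It makes several simplifying assumptions that do not hold in the real VM: values are plain `f64`, there is a single implicit call frame whose locals start at stack slot 0, `SetLocal` is assumed to leave the assigned value on the stack, and the `run` function is a hypothetical name.

```rust
// Illustrative sketch of a stack-based execution loop; not AIScript's VM.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode {
    Constant(u8), // push constants[index]
    Add,          // pop b, pop a, push a + b
    Subtract,     // pop b, pop a, push a - b
    GetLocal(u8), // push a copy of the value in the given stack slot
    SetLocal(u8), // copy the top of the stack into the given slot
    Return,       // pop the result and stop
}

fn pop(stack: &mut Vec<f64>) -> Result<f64, String> {
    stack.pop().ok_or_else(|| "stack underflow".to_string())
}

pub fn run(code: &[OpCode], constants: &[f64]) -> Result<f64, String> {
    let mut stack: Vec<f64> = Vec::new();
    let mut ip = 0usize; // instruction pointer
    loop {
        // Fetch
        let op = *code.get(ip).ok_or("ran past the end of the bytecode")?;
        ip += 1;
        // Decode and execute
        match op {
            OpCode::Constant(index) => {
                let value = *constants.get(index as usize).ok_or("bad constant index")?;
                stack.push(value);
            }
            OpCode::Add => {
                let b = pop(&mut stack)?;
                let a = pop(&mut stack)?;
                stack.push(a + b);
            }
            OpCode::Subtract => {
                let b = pop(&mut stack)?;
                let a = pop(&mut stack)?;
                stack.push(a - b);
            }
            OpCode::GetLocal(slot) => {
                let value = *stack.get(slot as usize).ok_or("bad local slot")?;
                stack.push(value);
            }
            OpCode::SetLocal(slot) => {
                let value = *stack.last().ok_or("stack underflow")?;
                *stack.get_mut(slot as usize).ok_or("bad local slot")? = value;
            }
            OpCode::Return => return pop(&mut stack),
        }
    }
}
```

Note the operand order in `Subtract`: the right-hand operand was pushed last, so it is popped first. Running `Constant(0), Constant(1), Add, Return` with constants `[10.0, 20.0]` returns `30.0`.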

Value Representation

Values in AIScript are represented using a tagged union approach, allowing efficient storage and manipulation of different data types:

pub enum Value<'gc> {
    Number(f64),
    Boolean(bool),
    String(InternedString<'gc>),
    IoString(Gc<'gc, String>),
    Closure(Gc<'gc, Closure<'gc>>),
    NativeFunction(NativeFn<'gc>),
    Array(GcRefLock<'gc, Vec<Value<'gc>>>),
    Object(GcRefLock<'gc, Object<'gc>>),
    Enum(GcRefLock<'gc, Enum<'gc>>),
    EnumVariant(Gc<'gc, EnumVariant<'gc>>),
    Class(GcRefLock<'gc, Class<'gc>>),
    Instance(GcRefLock<'gc, Instance<'gc>>),
    BoundMethod(Gc<'gc, BoundMethod<'gc>>),
    Module(InternedString<'gc>),
    Agent(Gc<'gc, Agent<'gc>>),
    Nil,
}

This design allows for efficient operations while supporting garbage collection and reference semantics.
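
The reference semantics mentioned above follow from the fact that heap variants hold pointers, so copying a `Value` copies the pointer and not the underlying data. The sketch below demonstrates this with a cut-down enum that uses `Rc<RefCell<...>>` from the standard library as a stand-in for `GcRefLock`. This substitution is an assumption made so the example is self-contained; it is not how AIScript stores values, and the helper names are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Illustrative stand-in for the real Value enum: Rc<RefCell<..>> replaces GcRefLock.
#[derive(Debug, Clone)]
pub enum Value {
    Number(f64),
    Boolean(bool),
    Array(Rc<RefCell<Vec<Value>>>),
    Nil,
}

impl Value {
    pub fn new_array() -> Value {
        Value::Array(Rc::new(RefCell::new(Vec::new())))
    }

    // Dispatch on the tag to describe the value.
    pub fn type_name(&self) -> &'static str {
        match self {
            Value::Number(_) => "number",
            Value::Boolean(_) => "boolean",
            Value::Array(_) => "array",
            Value::Nil => "nil",
        }
    }
}

// Append to an array value; any other variant is a runtime type error.
pub fn push_to(array: &Value, item: Value) -> Result<(), String> {
    match array {
        Value::Array(items) => {
            items.borrow_mut().push(item);
            Ok(())
        }
        other => Err(format!("cannot push to a value of type {}", other.type_name())),
    }
}

pub fn len_of(value: &Value) -> Option<usize> {
    match value {
        Value::Array(items) => Some(items.borrow().len()),
        _ => None,
    }
}
```

If `b` is a clone of an array value `a`, pushing through `b` is visible through `a`, because both hold a pointer to the same vector. Primitive variants such as `Number` are copied by value.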

Memory Management

AIScript uses the gc_arena crate for memory management, which provides:

Tracing garbage collection
Memory safety through lifetime parameters
Efficient allocation and collection
Collection of cyclic references

All heap-allocated objects are wrapped in Gc or GcRefLock pointers, allowing the garbage collector to track and manage memory.
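
To show why a tracing collector reclaims cycles without special handling, here is a toy mark-and-sweep collector written with the standard library only. It does not use or imitate the gc_arena API, and it omits everything that makes gc_arena practical (incremental collection, lifetime-branded pointers); the `Heap` type and its methods are hypothetical. Objects are identified by index and hold a list of the indices they reference.

```rust
// Toy mark-and-sweep collector; illustrates tracing only, not gc_arena itself.
pub struct Heap {
    // Each slot is Some(outgoing references) for a live object, None once freed.
    objects: Vec<Option<Vec<usize>>>,
}

impl Heap {
    pub fn new() -> Self {
        Heap { objects: Vec::new() }
    }

    // Allocate an object with no outgoing references and return its id.
    pub fn alloc(&mut self) -> usize {
        self.objects.push(Some(Vec::new()));
        self.objects.len() - 1
    }

    // Record that object `from` holds a reference to object `to`.
    pub fn add_ref(&mut self, from: usize, to: usize) {
        if let Some(Some(refs)) = self.objects.get_mut(from) {
            refs.push(to);
        }
    }

    pub fn is_live(&self, id: usize) -> bool {
        matches!(self.objects.get(id), Some(Some(_)))
    }

    // Free everything unreachable from `roots`; returns the number of objects freed.
    pub fn collect(&mut self, roots: &[usize]) -> usize {
        // Mark phase: walk the object graph starting at the roots.
        let mut marked = vec![false; self.objects.len()];
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(id) = worklist.pop() {
            if id >= marked.len() || marked[id] {
                continue; // out of range, or already visited (this also ends cycles)
            }
            if let Some(refs) = &self.objects[id] {
                marked[id] = true;
                worklist.extend(refs.iter().copied());
            }
        }
        // Sweep phase: anything left unmarked is unreachable.
        let mut freed = 0;
        for (id, slot) in self.objects.iter_mut().enumerate() {
            if slot.is_some() && !marked[id] {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```

Two objects that reference each other but are unreachable from any root are never marked, so the sweep frees both. A reference-counting scheme would keep such a pair alive indefinitely, which is one reason a tracing collector suits a language with closures and mutually referencing objects.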

AI Integration

AIScript has special handling for AI operations:

prompt for sending requests to AI models
Agent system for complex AI interactions
AI function compilation and execution

OpCode System

AIScript uses a bytecode instruction set defined in chunk.rs:

pub enum OpCode {
    Constant(u8),     // Load constant value
    Return,           // Return from function
    Add, Subtract,    // Arithmetic operations
    GetLocal(u8),     // Get local variable
    SetLocal(u8),     // Set local variable
    // ... many more instructions
}

Each instruction operates on the VM's stack and affects program execution flow.
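
One way to reason about how instructions affect the stack is to record, for each opcode, how many values it pops and pushes. The sketch below does this for the instructions listed above and uses the result to compute the maximum stack depth of a straight-line instruction sequence. The pop and push counts are assumptions about plausible semantics (for example, that `SetLocal` leaves the assigned value on the stack), and `stack_io` and `max_stack_depth` are hypothetical helpers, not functions from chunk.rs.

```rust
// Illustrative sketch: per-instruction stack effects for a small opcode subset.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode {
    Constant(u8),
    Return,
    Add,
    Subtract,
    GetLocal(u8),
    SetLocal(u8),
}

// (values popped, values pushed) for each instruction. Assumed semantics.
pub fn stack_io(op: OpCode) -> (u32, u32) {
    match op {
        OpCode::Constant(_) | OpCode::GetLocal(_) => (0, 1),
        OpCode::Add | OpCode::Subtract => (2, 1),
        OpCode::SetLocal(_) => (1, 1), // reads the top value and leaves it in place
        OpCode::Return => (1, 0),
    }
}

// Simulate only the stack depth of straight-line code (no jumps) and
// report the deepest point, or the first instruction that would underflow.
pub fn max_stack_depth(code: &[OpCode]) -> Result<u32, String> {
    let mut depth = 0u32;
    let mut deepest = 0u32;
    for (index, op) in code.iter().enumerate() {
        let (pops, pushes) = stack_io(*op);
        if depth < pops {
            return Err(format!("instruction {} ({:?}) would underflow the stack", index, op));
        }
        depth = depth - pops + pushes;
        deepest = deepest.max(depth);
    }
    Ok(deepest)
}
```

A check like this can serve as a sanity test when adding a new instruction: if the code generator emits a sequence that underflows in this simulation, the bug is in the generator and not in the VM.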

How to Contribute

Now that you understand the architecture, here are some ways to contribute:

Start Small: Look for issues labeled "good first issue" in our GitHub repository.
Improve Error Messages: Clear error messages help users debug their code. The parser and VM error systems are good places to contribute improvements.
Add Language Features: New syntax features typically require changes to the lexer, parser, code generator, and VM.
Optimize Performance: Look for opportunities to improve bytecode generation or VM execution.
Enhance Type System: Contribute to the type resolver to improve static analysis capabilities.
Fix Bugs: Bug fixes are always valuable contributions.

Before making significant changes, please open a GitHub issue to discuss your approach with the community. This ensures your efforts align with the project's goals and direction.

Development Workflow

Fork the repository on GitHub
Create a feature branch
Make your changes, following our code style
Add tests for your changes
Run the existing test suite to ensure nothing breaks
Submit a pull request with a clear description of your changes

Code Organization Conventions

Each major component has its own module
Opt for composition over inheritance
Prefer immutable data where possible
Use descriptive naming
Document public APIs with comments
Follow Rust's naming conventions

The AIScript interpreter is designed to be modular and extensible, making it possible for contributors to work on different parts independently. We're excited to see what you'll build with us!
