Interpreter Architecture

AIScript's interpreter follows a traditional compilation pipeline (lexing, parsing, type checking, code generation, and bytecode execution), extended with built-in support for AI operations. This chapter helps new contributors understand the key components and how they interact.

Overview

The AIScript interpreter follows these main stages:

  1. Lexical Analysis
  2. Parsing
  3. Type Checking
  4. Code Generation
  5. Virtual Machine Execution

This design allows for clear separation of concerns while maintaining flexibility for language evolution. Let's explore each component in detail.

Lexical Analysis (Lexer)

The lexer is the first stage of compilation, responsible for converting source code text into a stream of tokens. Each token represents a meaningful unit in the language (like keywords, identifiers, operators, and literals).

// Example of how the lexer works:
// Input: "let x = 10 + 20;"
// Output: [Token(Let), Token(Identifier, "x"), Token(Equal), Token(Number, "10"), 
//          Token(Plus), Token(Number, "20"), Token(Semicolon)]

Key responsibilities:

  • Breaking source code into tokens
  • Handling string/numeric literals
  • Managing line numbers for error reporting
  • Skipping whitespace and comments
  • Recognizing keywords and operators

Important structures in the lexer:

  • TokenType enum: Defines all possible token types
  • Token struct: Contains the token type, lexeme (original text), and line number
  • Scanner struct: Manages the scanning state and provides methods for token consumption
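
The three structures above can be sketched as follows. This is a minimal illustration, not AIScript's actual lexer: the field names (kind, lexeme, line), the helper methods, and the tokenize function are assumptions, and only the handful of tokens from the example above are recognized.

```rust
// Hypothetical, simplified versions of TokenType, Token, and Scanner.
// The real AIScript definitions are not shown here and will differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenType { Let, Identifier, Number, Equal, Plus, Semicolon, Eof }

#[derive(Debug, Clone, PartialEq)]
pub struct Token<'a> {
    pub kind: TokenType,
    pub lexeme: &'a str, // original source text
    pub line: u32,       // line number for error reporting
}

pub struct Scanner<'a> {
    source: &'a str,
    start: usize,   // byte offset where the current token begins
    current: usize, // byte offset of the next unread byte
    line: u32,
}

impl<'a> Scanner<'a> {
    pub fn new(source: &'a str) -> Self {
        Scanner { source, start: 0, current: 0, line: 1 }
    }

    fn peek(&self) -> Option<u8> {
        self.source.as_bytes().get(self.current).copied()
    }

    // Consume bytes while `keep` returns true, counting newlines on the way.
    fn advance_while(&mut self, keep: impl Fn(u8) -> bool) {
        while let Some(b) = self.peek() {
            if !keep(b) { break; }
            if b == b'\n' { self.line += 1; }
            self.current += 1;
        }
    }

    fn make(&self, kind: TokenType) -> Token<'a> {
        Token { kind, lexeme: &self.source[self.start..self.current], line: self.line }
    }

    pub fn scan_token(&mut self) -> Result<Token<'a>, String> {
        self.advance_while(|b| b.is_ascii_whitespace()); // skip whitespace
        self.start = self.current;
        let c = match self.peek() {
            Some(c) => c,
            None => return Ok(self.make(TokenType::Eof)),
        };
        self.current += 1;
        let kind = match c {
            b'=' => TokenType::Equal,
            b'+' => TokenType::Plus,
            b';' => TokenType::Semicolon,
            b'0'..=b'9' => {
                self.advance_while(|b| b.is_ascii_digit());
                TokenType::Number
            }
            b'a'..=b'z' | b'A'..=b'Z' | b'_' => {
                self.advance_while(|b| b.is_ascii_alphanumeric() || b == b'_');
                match &self.source[self.start..self.current] {
                    "let" => TokenType::Let, // keywords are reserved identifiers
                    _ => TokenType::Identifier,
                }
            }
            other => {
                return Err(format!("line {}: unexpected character {:?}", self.line, other as char))
            }
        };
        Ok(self.make(kind))
    }
}

// Scan the whole input, ending with an Eof token.
pub fn tokenize(source: &str) -> Result<Vec<Token<'_>>, String> {
    let mut scanner = Scanner::new(source);
    let mut tokens = Vec::new();
    loop {
        let token = scanner.scan_token()?;
        let done = token.kind == TokenType::Eof;
        tokens.push(token);
        if done { return Ok(tokens); }
    }
}
```

Calling tokenize("let x = 10 + 20;") yields the seven tokens from the example above followed by an Eof token. Borrowing each lexeme from the source string avoids allocating per token; whether AIScript does the same is not stated here.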

Parsing

The parser converts the token stream into an Abstract Syntax Tree (AST), which represents the hierarchical structure of the program. AIScript uses a recursive descent parser with Pratt parsing for expressions.

Key components:

  • AST node definitions
  • Parsing functions for statements and expressions
  • Precedence handling for operators
  • Error recovery mechanisms
  • Type annotation handling

The parser also performs some early validation, such as:

  • Checking for valid syntax
  • Validating enum variants
  • Ensuring valid function declarations
  • Validating match patterns
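
To make the Pratt parsing technique mentioned above concrete, here is a minimal sketch for arithmetic expressions. The Tok and Expr types, the binding-power values, and the eval helper are all assumptions made for this illustration; AIScript's real parser covers far more syntax and has its own precedence table.

```rust
// Hypothetical token and AST types for a tiny arithmetic language.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok { Num(f64), Plus, Minus, Star, Slash, LParen, RParen }

#[derive(Debug, PartialEq)]
pub enum Expr {
    Number(f64),
    Negate(Box<Expr>),
    Binary { lhs: Box<Expr>, op: Tok, rhs: Box<Expr> },
}

// Binding power of an infix operator; None means "not an infix operator".
fn infix_power(tok: Tok) -> Option<u8> {
    match tok {
        Tok::Plus | Tok::Minus => Some(1),
        Tok::Star | Tok::Slash => Some(2),
        _ => None,
    }
}

// Unary minus binds tighter than any infix operator.
const UNARY_POWER: u8 = 3;

pub struct Parser<'a> { tokens: &'a [Tok], pos: usize }

impl<'a> Parser<'a> {
    pub fn new(tokens: &'a [Tok]) -> Self { Parser { tokens, pos: 0 } }

    fn peek(&self) -> Option<Tok> { self.tokens.get(self.pos).copied() }

    fn advance(&mut self) -> Option<Tok> {
        let tok = self.peek();
        self.pos += 1;
        tok
    }

    // Parse an expression, consuming only infix operators whose binding
    // power is strictly greater than `min_power`.
    pub fn expression(&mut self, min_power: u8) -> Result<Expr, String> {
        // Prefix position: a literal, a unary minus, or a parenthesized group.
        let mut lhs = match self.advance() {
            Some(Tok::Num(n)) => Expr::Number(n),
            Some(Tok::Minus) => Expr::Negate(Box::new(self.expression(UNARY_POWER)?)),
            Some(Tok::LParen) => {
                let inner = self.expression(0)?;
                match self.advance() {
                    Some(Tok::RParen) => inner,
                    other => return Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => return Err(format!("expected expression, found {:?}", other)),
        };
        // Infix position: keep folding operators into `lhs` while they bind
        // tightly enough. The strict comparison makes them left-associative.
        while let Some(op) = self.peek() {
            let power = match infix_power(op) {
                Some(p) if p > min_power => p,
                _ => break,
            };
            self.advance();
            let rhs = self.expression(power)?;
            lhs = Expr::Binary { lhs: Box::new(lhs), op, rhs: Box::new(rhs) };
        }
        Ok(lhs)
    }
}

// Tree-walking evaluator, included only so the parse result is easy to check.
pub fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Negate(inner) => -eval(inner),
        Expr::Binary { lhs, op, rhs } => {
            let (l, r) = (eval(lhs), eval(rhs));
            match op {
                Tok::Plus => l + r,
                Tok::Minus => l - r,
                Tok::Star => l * r,
                Tok::Slash => l / r,
                _ => unreachable!("the parser only builds binary nodes for + - * /"),
            }
        }
    }
}
```

With this sketch, the tokens for 1 + 2 * 3 parse as 1 + (2 * 3), and 10 - 4 - 3 parses as (10 - 4) - 3. Adding an operator means adding one entry to infix_power, which is the main reason Pratt parsing is a convenient fit for expression grammars.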

Type Checking and Resolution

AIScript includes a type checking system (ty/resolver.rs) that validates types at compile time when possible. This improves error detection before runtime and enables better performance optimizations.

Main features:

  • Type annotation validation
  • Class and enum type checking
  • Validation of object literals against class definitions
  • Function parameter type checking
  • Error type validation

Although type checking is listed as its own stage, the type resolver is invoked early, during the parsing phase, so that type errors are caught as soon as possible.
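
As an illustration of one feature from the list above, validating an object literal against a class definition might look roughly like this. Everything here (Type, Literal, ClassDef, check_object_literal) is hypothetical and is not the API of ty/resolver.rs; it only covers literal field values with three primitive types.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified type model.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type { Number, Str, Bool }

pub enum Literal { Number(f64), Str(String), Bool(bool) }

impl Literal {
    // The static type of a literal is known without running the program.
    fn ty(&self) -> Type {
        match self {
            Literal::Number(_) => Type::Number,
            Literal::Str(_) => Type::Str,
            Literal::Bool(_) => Type::Bool,
        }
    }
}

pub struct ClassDef {
    pub name: String,
    pub fields: HashMap<String, Type>, // declared field name -> declared type
}

// Check the fields of an object literal against a class definition,
// collecting every problem instead of stopping at the first one.
pub fn check_object_literal(
    class: &ClassDef,
    fields: &[(&str, Literal)],
) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();

    for (name, value) in fields {
        match class.fields.get(*name) {
            None => errors.push(format!("class {} has no field `{}`", class.name, name)),
            Some(expected) if *expected != value.ty() => errors.push(format!(
                "field `{}` of {} expects {:?}, found {:?}",
                name, class.name, expected, value.ty()
            )),
            Some(_) => {}
        }
    }

    // Report declared fields that the literal does not provide.
    // Sorted so the error order does not depend on HashMap iteration order.
    let mut missing: Vec<&String> = class
        .fields
        .keys()
        .filter(|k| !fields.iter().any(|(n, _)| *n == k.as_str()))
        .collect();
    missing.sort();
    for name in missing {
        errors.push(format!("missing field `{}` of {}", name, class.name));
    }

    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

For a class Person with a string field name and a numeric field age, a literal that sets name to a number and omits age produces two errors: a type mismatch and a missing field. Collecting all errors in one pass is a design choice of this sketch, not a documented property of AIScript's resolver.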

Code Generation

The code generator transforms the AST into bytecode that can be executed by the virtual machine. This phase also performs several optimizations.

Key aspects:

  • Generation of VM opcodes from AST nodes
  • Handling variable scope and closures
  • Managing function parameters and defaults
  • Implementing control flow (if/else, loops, etc.)
  • Error handling code generation
  • Enum and class compilation

The code generator produces a set of functions with associated bytecode chunks, which are then executed by the VM.
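
The core idea of this phase, walking the AST and emitting stack-machine instructions, can be sketched as below. The Expr, Stmt, Chunk, and CodeGen types are assumptions for illustration, as is the convention that a local variable lives in the stack slot where its initializer was evaluated. The opcode names are borrowed from the OpCode excerpt later in this chapter, but this is not AIScript's actual code generator.

```rust
// Hypothetical AST for a tiny subset of the language.
pub enum Expr {
    Number(f64),
    Local(String),
    Add(Box<Expr>, Box<Expr>),
    Subtract(Box<Expr>, Box<Expr>),
}

pub enum Stmt {
    Let(String, Expr),
    Return(Expr),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode { Constant(u8), Add, Subtract, GetLocal(u8), Return }

// A chunk pairs the instruction stream with its constant pool.
#[derive(Default)]
pub struct Chunk {
    pub code: Vec<OpCode>,
    pub constants: Vec<f64>,
}

#[derive(Default)]
pub struct CodeGen {
    chunk: Chunk,
    locals: Vec<String>, // index in this Vec == stack slot of the local
}

impl CodeGen {
    fn expr(&mut self, e: &Expr) -> Result<(), String> {
        match e {
            Expr::Number(n) => {
                // Operands are u8, so a chunk can hold at most 256 constants.
                let index = self.chunk.constants.len();
                if index > u8::MAX as usize {
                    return Err("too many constants in one chunk".to_string());
                }
                self.chunk.constants.push(*n);
                self.chunk.code.push(OpCode::Constant(index as u8));
            }
            Expr::Local(name) => {
                // Search from the end so the newest declaration wins.
                let slot = self
                    .locals
                    .iter()
                    .rposition(|l| l == name)
                    .ok_or_else(|| format!("undefined variable `{}`", name))?;
                self.chunk.code.push(OpCode::GetLocal(slot as u8));
            }
            // Operands are emitted first, then the operator (postfix order).
            Expr::Add(l, r) => {
                self.expr(l)?;
                self.expr(r)?;
                self.chunk.code.push(OpCode::Add);
            }
            Expr::Subtract(l, r) => {
                self.expr(l)?;
                self.expr(r)?;
                self.chunk.code.push(OpCode::Subtract);
            }
        }
        Ok(())
    }

    pub fn compile(mut self, program: &[Stmt]) -> Result<Chunk, String> {
        for stmt in program {
            match stmt {
                Stmt::Let(name, init) => {
                    if self.locals.len() > u8::MAX as usize {
                        return Err("too many local variables".to_string());
                    }
                    // The initializer's value stays on the stack and
                    // becomes the storage for the new local.
                    self.expr(init)?;
                    self.locals.push(name.clone());
                }
                Stmt::Return(value) => {
                    self.expr(value)?;
                    self.chunk.code.push(OpCode::Return);
                }
            }
        }
        Ok(self.chunk)
    }
}
```

Under these assumptions, the program "let x = 10 + 20; return x - 5;" compiles to Constant(0), Constant(1), Add, GetLocal(0), Constant(2), Subtract, Return, with the constant pool [10, 20, 5].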

Virtual Machine

The virtual machine is a stack-based interpreter that executes the generated bytecode. It maintains execution state and provides runtime facilities like garbage collection.

Important components:

  • Call frames for function invocation
  • Value stack for computations
  • Global and local variable storage
  • Upvalue handling for closures
  • Garbage collection (via gc_arena)
  • Runtime error handling

The VM also handles built-in functions, modules, and AI operations.
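
A stack-based execution loop can be sketched in a few dozen lines. This is a deliberately reduced model, assuming values are plain f64 numbers, a single call frame, and no garbage collection; the Vm and Chunk types and the exact behavior of each instruction are assumptions, not AIScript's implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode { Constant(u8), Add, Subtract, GetLocal(u8), SetLocal(u8), Return }

pub struct Chunk {
    pub code: Vec<OpCode>,
    pub constants: Vec<f64>,
}

#[derive(Default)]
pub struct Vm {
    stack: Vec<f64>, // value stack shared by all frames
}

impl Vm {
    fn pop(&mut self) -> Result<f64, String> {
        self.stack.pop().ok_or_else(|| "stack underflow".to_string())
    }

    // Execute one chunk as a single call frame and return its result.
    pub fn run(&mut self, chunk: &Chunk) -> Result<f64, String> {
        let base = self.stack.len(); // where this frame's locals begin
        let mut ip = 0;              // instruction pointer
        loop {
            let op = *chunk
                .code
                .get(ip)
                .ok_or_else(|| "ran past the end of the bytecode without Return".to_string())?;
            ip += 1;
            match op {
                OpCode::Constant(index) => {
                    let value = *chunk
                        .constants
                        .get(index as usize)
                        .ok_or_else(|| format!("bad constant index {}", index))?;
                    self.stack.push(value);
                }
                OpCode::Add => {
                    // The right operand was pushed last, so it is popped first.
                    let (b, a) = (self.pop()?, self.pop()?);
                    self.stack.push(a + b);
                }
                OpCode::Subtract => {
                    let (b, a) = (self.pop()?, self.pop()?);
                    self.stack.push(a - b);
                }
                OpCode::GetLocal(slot) => {
                    let value = *self
                        .stack
                        .get(base + slot as usize)
                        .ok_or_else(|| format!("bad local slot {}", slot))?;
                    self.stack.push(value);
                }
                OpCode::SetLocal(slot) => {
                    // Assumption: assignment leaves its value on the stack.
                    let value = *self
                        .stack
                        .last()
                        .ok_or_else(|| "stack underflow".to_string())?;
                    let target = self
                        .stack
                        .get_mut(base + slot as usize)
                        .ok_or_else(|| format!("bad local slot {}", slot))?;
                    *target = value;
                }
                OpCode::Return => {
                    let result = self.pop()?;
                    self.stack.truncate(base); // discard the frame's locals
                    return Ok(result);
                }
            }
        }
    }
}
```

Running the instruction sequence Constant(0), Constant(1), Add, GetLocal(0), Constant(2), Subtract, Return with constants [10, 20, 5] returns 25. A real VM would keep a stack of call frames, each with its own instruction pointer and base slot, instead of the single base used here.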

Value Representation

Values in AIScript are represented using a tagged union approach, allowing efficient storage and manipulation of different data types:

pub enum Value<'gc> {
    Number(f64),
    Boolean(bool),
    String(InternedString<'gc>),
    IoString(Gc<'gc, String>),
    Closure(Gc<'gc, Closure<'gc>>),
    NativeFunction(NativeFn<'gc>),
    Array(GcRefLock<'gc, Vec<Value<'gc>>>),
    Object(GcRefLock<'gc, Object<'gc>>),
    Enum(GcRefLock<'gc, Enum<'gc>>),
    EnumVariant(Gc<'gc, EnumVariant<'gc>>),
    Class(GcRefLock<'gc, Class<'gc>>),
    Instance(GcRefLock<'gc, Instance<'gc>>),
    BoundMethod(Gc<'gc, BoundMethod<'gc>>),
    Module(InternedString<'gc>),
    Agent(Gc<'gc, Agent<'gc>>),
    Nil,
}

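Numbers and booleans are stored inline, while arrays, objects, closures, and other heap values are held through Gc or GcRefLock pointers. Heap values are therefore garbage collected and have reference semantics: copying a Value copies the pointer, not the underlying data.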
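
The reference semantics of such a tagged union can be demonstrated without a garbage collector. The sketch below is an analogy only: it substitutes the standard library's Rc and RefCell for Gc and GcRefLock (so it drops the 'gc lifetime and cannot reclaim cycles), keeps just five variants, and introduces the hypothetical helpers type_name, array_push, and array_len.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Simplified stand-in for the Value enum above, using std types
// instead of gc_arena pointers.
#[derive(Debug, Clone)]
pub enum Value {
    Number(f64),                    // stored inline
    Boolean(bool),                  // stored inline
    String(Rc<str>),                // shared, immutable
    Array(Rc<RefCell<Vec<Value>>>), // shared, mutable
    Nil,
}

impl Value {
    // The enum tag identifies the runtime type.
    pub fn type_name(&self) -> &'static str {
        match self {
            Value::Number(_) => "number",
            Value::Boolean(_) => "boolean",
            Value::String(_) => "string",
            Value::Array(_) => "array",
            Value::Nil => "nil",
        }
    }
}

// Append to an array value in place; any other variant is a runtime error.
pub fn array_push(array: &Value, item: Value) -> Result<(), String> {
    match array {
        Value::Array(items) => {
            items.borrow_mut().push(item);
            Ok(())
        }
        other => Err(format!("cannot push onto a {}", other.type_name())),
    }
}

pub fn array_len(value: &Value) -> Option<usize> {
    match value {
        Value::Array(items) => Some(items.borrow().len()),
        _ => None,
    }
}
```

Cloning an Array value here clones only the pointer, so a push made through one copy is visible through the other. That is the behavior the Gc and GcRefLock wrappers give the real Value type, with the collector rather than reference counting deciding when the storage is freed.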

Memory Management

AIScript uses the gc_arena crate for memory management, which provides:

  • Tracing garbage collection
  • Memory safety through lifetime parameters
  • Efficient allocation and collection
  • Collection of reference cycles

All heap-allocated objects are wrapped in Gc or GcRefLock pointers, allowing the garbage collector to track and manage memory.

AI Integration

AIScript has special handling for AI operations:

  • prompt for sending requests to AI models
  • Agent system for complex AI interactions
  • AI function compilation and execution

OpCode System

AIScript uses a bytecode instruction set defined in chunk.rs:

pub enum OpCode {
    Constant(u8),     // Load constant value
    Return,           // Return from function
    Add, Subtract,    // Arithmetic operations
    GetLocal(u8),     // Get local variable
    SetLocal(u8),     // Set local variable
    // ... many more instructions
}

Each instruction operates on the VM's stack and affects program execution flow.
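
One way to make "operates on the VM's stack" precise is to record how many values each instruction pops and pushes. The sketch below does this for the opcodes shown above and uses it to check a straight-line instruction sequence. The stack_effect and max_stack_depth functions are hypothetical, and the effect listed for each opcode (in particular SetLocal leaving its value on the stack) is an assumption rather than documented AIScript behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode { Constant(u8), Return, Add, Subtract, GetLocal(u8), SetLocal(u8) }

// Returns (values popped, values pushed) for one instruction.
pub fn stack_effect(op: OpCode) -> (usize, usize) {
    match op {
        OpCode::Constant(_) | OpCode::GetLocal(_) => (0, 1), // push one value
        OpCode::Add | OpCode::Subtract => (2, 1),            // two operands, one result
        OpCode::SetLocal(_) => (1, 1),                       // reads the top, leaves it in place
        OpCode::Return => (1, 0),                            // consumes the return value
    }
}

// Simulate the stack depth of a straight-line sequence (no jumps).
// Returns the peak depth, or an error at the first instruction that
// would pop from an empty stack.
pub fn max_stack_depth(code: &[OpCode]) -> Result<usize, String> {
    let mut depth = 0usize;
    let mut peak = 0usize;
    for (index, op) in code.iter().enumerate() {
        let (pops, pushes) = stack_effect(*op);
        if depth < pops {
            return Err(format!("instruction {} ({:?}) underflows the stack", index, op));
        }
        depth = depth - pops + pushes;
        peak = peak.max(depth);
    }
    Ok(peak)
}
```

A check like this is useful in tests for the code generator, since a sequence that underflows points to a compiler bug rather than a user error. Handling jumps would require following each branch separately, which this sketch does not attempt.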

How to Contribute

Now that you understand the architecture, here are some ways to contribute:

  1. Start Small: Look for issues labeled "good first issue" in our GitHub repository.

  2. Improve Error Messages: Clear error messages help users debug their code. The parser and VM error systems are good places to contribute improvements.

  3. Add Language Features: New syntax features typically require changes to the lexer, parser, code generator, and VM.

  4. Optimize Performance: Look for opportunities to improve bytecode generation or VM execution.

  5. Enhance Type System: Contribute to the type resolver to improve static analysis capabilities.

  6. Fix Bugs: Bug fixes are always valuable contributions.

Before making significant changes, please open a GitHub issue to discuss your approach with the community. This ensures your efforts align with the project's goals and direction.

Development Workflow

  1. Fork the repository on GitHub
  2. Create a feature branch
  3. Make your changes, following our code style
  4. Add tests for your changes
  5. Run the existing test suite to ensure nothing breaks
  6. Submit a pull request with a clear description of your changes

Code Organization Conventions

  • Each major component has its own module
  • Opt for composition over inheritance
  • Prefer immutable data where possible
  • Use descriptive naming
  • Document public APIs with comments
  • Follow Rust's naming conventions

The AIScript interpreter is designed to be modular and extensible, making it possible for contributors to work on different parts independently. We're excited to see what you'll build with us!