AIScript's interpreter architecture follows a traditional compilation pipeline with modern enhancements for flexibility, performance, and AI integration capabilities. This chapter aims to help new contributors understand the system's architecture, key components, and how they interact.
The AIScript interpreter follows these main stages:
Lexical Analysis → Parsing → Type Checking → Code Generation → Virtual Machine Execution
This design allows for clear separation of concerns while maintaining flexibility for language evolution. Let's explore each component in detail.
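To make the stage boundaries concrete, here is a deliberately tiny pipeline for a toy language whose only programs are sums such as `1 + 2 + 3`. Every name in it (`Token`, `CompileError`, `lex`, `parse`, `check`, `compile`, `execute`, `run`) is hypothetical and does not come from the AIScript source. The point is only that each stage consumes the previous stage's output and can fail with its own kind of error, so a bad program stops at the first stage that notices the problem.

```rust
// Hypothetical five-stage pipeline; none of these types are AIScript's.
#[derive(Debug, Clone, PartialEq)]
enum Token { Int(i64), Bool(bool), Plus }

#[derive(Debug, PartialEq)]
enum CompileError { Lex(String), Parse(String), Type(String) }

enum Op { Push(i64), Add }

// Lexical analysis: source text -> tokens.
fn lex(src: &str) -> Result<Vec<Token>, CompileError> {
    src.split_whitespace()
        .map(|w| match w {
            "+" => Ok(Token::Plus),
            "true" => Ok(Token::Bool(true)),
            "false" => Ok(Token::Bool(false)),
            _ => w
                .parse::<i64>()
                .map(Token::Int)
                .map_err(|_| CompileError::Lex(format!("unexpected token '{w}'"))),
        })
        .collect()
}

// Parsing: tokens -> "AST" (here just the operands of `a + b + c`).
fn parse(tokens: Vec<Token>) -> Result<Vec<Token>, CompileError> {
    // A complete sum has an odd number of tokens: operand (+ operand)*.
    if tokens.len() % 2 == 0 {
        return Err(CompileError::Parse("incomplete expression".into()));
    }
    let mut operands = Vec::new();
    for (i, tok) in tokens.into_iter().enumerate() {
        match (i % 2 == 0, tok) {
            (true, Token::Plus) => return Err(CompileError::Parse("expected operand".into())),
            (true, t) => operands.push(t),
            (false, Token::Plus) => {}
            (false, _) => return Err(CompileError::Parse("expected '+'".into())),
        }
    }
    Ok(operands)
}

// Type checking: every operand of '+' must be an integer.
fn check(operands: Vec<Token>) -> Result<Vec<i64>, CompileError> {
    operands
        .into_iter()
        .map(|t| match t {
            Token::Int(n) => Ok(n),
            other => Err(CompileError::Type(format!("cannot add {other:?}"))),
        })
        .collect()
}

// Code generation: checked integers -> stack bytecode.
fn compile(values: Vec<i64>) -> Vec<Op> {
    let mut code = vec![Op::Push(values[0])];
    for v in &values[1..] {
        code.push(Op::Push(*v));
        code.push(Op::Add);
    }
    code
}

// Execution: a stack machine runs the bytecode.
fn execute(code: &[Op]) -> i64 {
    let mut stack = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
        }
    }
    stack.pop().unwrap()
}

// `?` stops at the first stage that reports an error.
fn run(src: &str) -> Result<i64, CompileError> {
    let code = compile(check(parse(lex(src)?)?)?);
    Ok(execute(&code))
}
```

For example, `run("1 + true")` gets through lexing and parsing but is rejected by the type-checking stage, so no bytecode is ever generated for it.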
The lexer is the first stage of compilation, responsible for converting source code text into a stream of tokens. Each token represents a meaningful unit in the language (like keywords, identifiers, operators, and literals).
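A scanner of this kind can be sketched as follows. It mirrors the three lexer structures named in this section (a `TokenType` enum, a `Token` struct carrying type, lexeme, and line, and a `Scanner` that tracks scanning state), but the specific variants, fields, and methods are illustrative assumptions, not the real definitions in AIScript's lexer.

```rust
// Illustrative token kinds only; a real language defines many more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenType { Let, Prompt, Identifier, Number, Equal, Plus, Eof }

#[derive(Debug, Clone, PartialEq)]
struct Token<'a> {
    kind: TokenType,
    lexeme: &'a str, // slice of the original source text
    line: usize,
}

struct Scanner<'a> {
    source: &'a str,
    start: usize,   // byte offset where the current token begins
    current: usize, // byte offset of the next unread character
    line: usize,
}

impl<'a> Scanner<'a> {
    fn new(source: &'a str) -> Self {
        Scanner { source, start: 0, current: 0, line: 1 }
    }

    fn peek(&self) -> Option<char> {
        self.source[self.current..].chars().next()
    }

    fn advance(&mut self) -> Option<char> {
        let c = self.peek()?;
        self.current += c.len_utf8();
        Some(c)
    }

    fn make(&self, kind: TokenType) -> Token<'a> {
        let source: &'a str = self.source;
        Token { kind, lexeme: &source[self.start..self.current], line: self.line }
    }

    /// Returns the next token, or an error message for an unexpected character.
    fn scan_token(&mut self) -> Result<Token<'a>, String> {
        // Skip whitespace, counting newlines so each token records its line.
        while let Some(c) = self.peek() {
            if !c.is_whitespace() {
                break;
            }
            if c == '\n' {
                self.line += 1;
            }
            self.advance();
        }
        self.start = self.current;
        let c = match self.advance() {
            None => return Ok(self.make(TokenType::Eof)),
            Some(c) => c,
        };
        match c {
            '=' => Ok(self.make(TokenType::Equal)),
            '+' => Ok(self.make(TokenType::Plus)),
            c if c.is_ascii_digit() => {
                while self.peek().map_or(false, |c| c.is_ascii_digit()) {
                    self.advance();
                }
                Ok(self.make(TokenType::Number))
            }
            c if c.is_alphabetic() || c == '_' => {
                while self.peek().map_or(false, |c| c.is_alphanumeric() || c == '_') {
                    self.advance();
                }
                // Keywords are identifiers that match a reserved word.
                let kind = match &self.source[self.start..self.current] {
                    "let" => TokenType::Let,
                    "prompt" => TokenType::Prompt,
                    _ => TokenType::Identifier,
                };
                Ok(self.make(kind))
            }
            other => Err(format!("line {}: unexpected character '{}'", self.line, other)),
        }
    }
}
```

Storing the lexeme as a slice of the source, rather than a copied `String`, keeps tokens cheap to create; the trade-off is that tokens cannot outlive the source text.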
Key responsibilities:
Important structures in the lexer:
TokenType enum: Defines all possible token types.
Token struct: Contains the token type, lexeme (original text), and line number.
Scanner struct: Manages the scanning state and provides methods for token consumption.
The parser converts the token stream into an Abstract Syntax Tree (AST), which represents the hierarchical structure of the program. AIScript uses a recursive descent parser with Pratt parsing for expressions.
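Pratt parsing handles operator precedence by giving each infix operator a pair of binding powers and letting one loop decide how far the right-hand side extends. The sketch below parses `+ - * /` over single-digit literals and renders the tree as an S-expression. The `Expr` type, `Parser`, and `infix_binding_power` are hypothetical and far simpler than AIScript's real AST and parser.

```rust
// Hypothetical AST; a real expression type has many more variants.
#[derive(Debug)]
enum Expr {
    Number(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

impl Expr {
    // Render as an S-expression, e.g. (+ 1 (* 2 3)).
    fn to_sexpr(&self) -> String {
        match self {
            Expr::Number(n) => n.to_string(),
            Expr::Binary(l, op, r) => format!("({} {} {})", op, l.to_sexpr(), r.to_sexpr()),
        }
    }
}

// (left, right) binding powers; left < right makes operators left-associative.
fn infix_binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        _ => None,
    }
}

struct Parser {
    tokens: Vec<char>, // single-character tokens keep the sketch short
    pos: usize,
}

impl Parser {
    fn new(src: &str) -> Self {
        Parser { tokens: src.chars().filter(|c| !c.is_whitespace()).collect(), pos: 0 }
    }

    fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
        // Prefix position: only single-digit literals in this sketch.
        let mut lhs = match self.tokens.get(self.pos) {
            Some(&c) if c.is_ascii_digit() => {
                self.pos += 1;
                Expr::Number(c.to_digit(10).unwrap() as i64)
            }
            other => return Err(format!("expected a number, found {:?}", other)),
        };
        // Infix loop: keep absorbing operators that bind at least as tightly as min_bp.
        while let Some(&op) = self.tokens.get(self.pos) {
            let (l_bp, r_bp) =
                infix_binding_power(op).ok_or_else(|| format!("unexpected {:?}", op))?;
            if l_bp < min_bp {
                break;
            }
            self.pos += 1;
            let rhs = self.parse_expr(r_bp)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}
```

Calling `Parser::new("1 + 2 * 3").parse_expr(0)` yields `(+ 1 (* 2 3))`, and `8 - 4 - 2` groups as `(- (- 8 4) 2)` because each operator's right binding power is higher than its left.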
Key components:
The parser also performs some early validation, such as:
AIScript includes a type checking system (ty/resolver.rs) that validates types at compile time where possible. Catching type errors before the program runs gives earlier, clearer diagnostics and can enable performance optimizations.
Main features:
The type resolver is invoked during parsing, so type errors are reported as early as possible.
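At its core, a compile-time checker computes a type for each expression and rejects operations whose operand types do not fit. The sketch below shows that idea for literals, addition, and a less-than comparison. The `Ty`, `Expr`, and `resolve` names and the numeric-widening rule are invented for illustration and do not reflect the actual contents of ty/resolver.rs.

```rust
// Hypothetical types and expressions for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty { Int, Float, Str, Bool }

enum Expr {
    Int(i64),
    Float(f64),
    Str(String),
    Bool(bool),
    Add(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
}

fn resolve(expr: &Expr) -> Result<Ty, String> {
    match expr {
        Expr::Int(_) => Ok(Ty::Int),
        Expr::Float(_) => Ok(Ty::Float),
        Expr::Str(_) => Ok(Ty::Str),
        Expr::Bool(_) => Ok(Ty::Bool),
        Expr::Add(l, r) => match (resolve(l)?, resolve(r)?) {
            (Ty::Int, Ty::Int) => Ok(Ty::Int),
            // Assumed rule: mixed numeric addition widens to Float.
            (Ty::Int | Ty::Float, Ty::Int | Ty::Float) => Ok(Ty::Float),
            (Ty::Str, Ty::Str) => Ok(Ty::Str),
            (a, b) => Err(format!("cannot add {:?} and {:?}", a, b)),
        },
        Expr::Less(l, r) => match (resolve(l)?, resolve(r)?) {
            (Ty::Int | Ty::Float, Ty::Int | Ty::Float) => Ok(Ty::Bool),
            (a, b) => Err(format!("cannot compare {:?} and {:?}", a, b)),
        },
    }
}
```

Because the check runs before any bytecode is emitted, an expression like `"a" + true` is reported as an error at compile time instead of failing only when the VM reaches it.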
The code generator transforms the AST into bytecode that can be executed by the virtual machine. This phase also performs several optimizations.
Key aspects:
The code generator produces a set of functions with associated bytecode chunks, which are then executed by the VM.
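As an illustration of turning an AST into bytecode, here is a compiler that walks an arithmetic tree in post-order, emits stack instructions into a chunk with a constant pool, and first folds operations whose operands are both constants. Constant folding is used only as a representative optimization; this section does not say which optimizations AIScript performs, and `Chunk`, `OpCode`, `fold`, `emit`, and `compile_expr` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum OpCode {
    Constant(usize), // push constants[index]
    GetLocal(usize), // push the local variable in this slot
    Add,
    Multiply,
}

#[derive(Debug, Default)]
struct Chunk {
    code: Vec<OpCode>,
    constants: Vec<f64>,
}

enum Expr {
    Num(f64),
    Local(usize), // value only known at run time
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Replace subtrees whose operands are both literals with a single literal.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Num(a), Expr::Num(b)) => Expr::Num(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Num(a), Expr::Num(b)) => Expr::Num(a * b),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        leaf => leaf,
    }
}

// Post-order walk: emit operands first, then the operator (stack discipline).
fn emit(expr: &Expr, chunk: &mut Chunk) {
    match expr {
        Expr::Num(n) => {
            chunk.constants.push(*n);
            chunk.code.push(OpCode::Constant(chunk.constants.len() - 1));
        }
        Expr::Local(slot) => chunk.code.push(OpCode::GetLocal(*slot)),
        Expr::Add(l, r) => {
            emit(l, chunk);
            emit(r, chunk);
            chunk.code.push(OpCode::Add);
        }
        Expr::Mul(l, r) => {
            emit(l, chunk);
            emit(r, chunk);
            chunk.code.push(OpCode::Multiply);
        }
    }
}

fn compile_expr(expr: Expr) -> Chunk {
    let mut chunk = Chunk::default();
    emit(&fold(expr), &mut chunk);
    chunk
}
```

For `x * (2 + 3)` with `x` in local slot 0, folding turns `2 + 3` into the single constant `5`, so the chunk contains three instructions instead of five.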
The virtual machine is a stack-based interpreter that executes the generated bytecode. It maintains execution state and provides runtime facilities like garbage collection.
Important components:
Memory management via the gc_arena crate
The VM also handles built-in functions, modules, and AI operations.
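The heart of a stack-based VM is a loop that fetches the next instruction, pops its operands from the value stack, and pushes the result. The sketch below shows that loop, with a conditional jump to illustrate how control flow becomes changes to the instruction pointer. The `Instr` and `Vm` names are hypothetical, and a real VM such as AIScript's also manages call frames, globals, and garbage-collected values.

```rust
#[derive(Debug, Clone, Copy)]
enum Instr {
    Push(i64),
    Add,
    Less,               // pops b, then a; pushes 1 if a < b, else 0
    JumpIfFalse(usize), // pops a condition; jumps to the absolute index if it is 0
    Return,             // pops and returns the top of the stack
}

struct Vm {
    stack: Vec<i64>,
    ip: usize, // index of the next instruction to execute
}

impl Vm {
    fn run(code: &[Instr]) -> Result<i64, String> {
        let mut vm = Vm { stack: Vec::new(), ip: 0 };
        loop {
            // Fetch, then advance the instruction pointer before executing.
            let instr = *code.get(vm.ip).ok_or("instruction pointer out of bounds")?;
            vm.ip += 1;
            match instr {
                Instr::Push(v) => vm.stack.push(v),
                Instr::Add => {
                    let (a, b) = vm.pop2()?;
                    vm.stack.push(a + b);
                }
                Instr::Less => {
                    let (a, b) = vm.pop2()?;
                    vm.stack.push((a < b) as i64);
                }
                Instr::JumpIfFalse(target) => {
                    if vm.pop()? == 0 {
                        vm.ip = target;
                    }
                }
                Instr::Return => return vm.pop(),
            }
        }
    }

    fn pop(&mut self) -> Result<i64, String> {
        self.stack.pop().ok_or_else(|| "stack underflow".to_string())
    }

    // Pops the right operand first, so the result is (left, right).
    fn pop2(&mut self) -> Result<(i64, i64), String> {
        let b = self.pop()?;
        let a = self.pop()?;
        Ok((a, b))
    }
}
```

An `if a < 2 { 10 } else { 20 }` expression could compile to `Push(a), Push(2), Less, JumpIfFalse(6), Push(10), Return, Push(20), Return`, where the jump skips the then-branch when the comparison yields 0.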
Values in AIScript are represented using a tagged union approach, allowing efficient storage and manipulation of different data types:
This design allows for efficient operations while supporting garbage collection and reference semantics.
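In Rust, a tagged union is an `enum`: the discriminant is the tag and each variant carries its payload. The sketch below shows the shape of such a type and the reference semantics described above. It uses `Rc<RefCell<...>>` as a standard-library stand-in for the `Gc`/`GcRefLock` pointers AIScript gets from gc_arena, and its variants and truthiness rule are assumptions rather than AIScript's actual `Value` definition.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical value type. Scalars are stored inline; heap objects sit
// behind a shared, mutable pointer, so cloning a Value aliases the object.
#[derive(Debug, Clone)]
enum Value {
    Nil,
    Boolean(bool),
    Number(f64),
    String(Rc<str>),
    Array(Rc<RefCell<Vec<Value>>>),
}

impl Value {
    fn is_truthy(&self) -> bool {
        // Assumed rule for this sketch: only nil and false are falsy.
        !matches!(self, Value::Nil | Value::Boolean(false))
    }

    fn type_name(&self) -> &'static str {
        match self {
            Value::Nil => "nil",
            Value::Boolean(_) => "bool",
            Value::Number(_) => "number",
            Value::String(_) => "str",
            Value::Array(_) => "array",
        }
    }
}
```

Cloning an `Array` value copies only the pointer, so a push through one copy is visible through the other, which is the reference semantics a script expects for arrays and objects.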
AIScript uses the gc_arena crate for memory management, which provides:
All heap-allocated objects are wrapped in Gc or GcRefLock pointers, allowing the garbage collector to track and manage memory.
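gc_arena's own API is not reproduced here. Instead, as a swapped-in illustration of what a tracing collector does, here is a toy mark-and-sweep heap: objects live in slots, the roots are the objects the program can reach directly, and anything not reachable from a root is freed. All names (`Object`, `Heap`, `alloc`, `collect`, `is_live`) are hypothetical.

```rust
// Toy mark-and-sweep heap; this is NOT the gc_arena API.
struct Object {
    refs: Vec<usize>, // indices of the objects this one points to
    marked: bool,
}

#[derive(Default)]
struct Heap {
    slots: Vec<Option<Object>>, // None marks a freed slot
}

impl Heap {
    fn alloc(&mut self, refs: Vec<usize>) -> usize {
        self.slots.push(Some(Object { refs, marked: false }));
        self.slots.len() - 1
    }

    /// Marks everything reachable from `roots`, frees the rest, and returns how many were freed.
    fn collect(&mut self, roots: &[usize]) -> usize {
        // Mark phase: iterative depth-first traversal starting at the roots.
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(i) = worklist.pop() {
            if let Some(obj) = self.slots[i].as_mut() {
                if !obj.marked {
                    obj.marked = true;
                    worklist.extend(obj.refs.iter().copied());
                }
            }
        }
        // Sweep phase: free unmarked objects and clear marks for the next cycle.
        let mut freed = 0;
        for slot in self.slots.iter_mut() {
            let reachable = match slot {
                Some(obj) => std::mem::replace(&mut obj.marked, false),
                None => continue,
            };
            if !reachable {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }

    fn is_live(&self, i: usize) -> bool {
        self.slots[i].is_some()
    }
}
```

Because reachability is traced from the roots, two objects that only point at each other are reclaimed once nothing else refers to them, a case that plain reference counting would leak.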
AIScript has special handling for AI operations:
prompt for sending requests to AI models
Agent system for complex AI interactions
AIScript uses a bytecode instruction set defined in chunk.rs:
Each instruction operates on the VM's stack and affects program execution flow.
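Because each instruction has a fixed effect on the stack, straight-line bytecode can be checked before it runs by simulating the stack depth and rejecting code that would pop from an empty stack. The opcode names below, including `Prompt`, are hypothetical stand-ins for whatever chunk.rs actually defines, and jumps are deliberately left out to keep the simulation linear.

```rust
#[derive(Debug, Clone, Copy)]
enum OpCode {
    Constant(usize), // push a constant from the pool
    Pop,
    Add,
    Negate,
    Prompt, // hypothetical: pop a prompt string, push the model's reply
    Return,
}

// (values popped, values pushed) for each instruction.
fn stack_effect(op: OpCode) -> (usize, usize) {
    match op {
        OpCode::Constant(_) => (0, 1),
        OpCode::Pop => (1, 0),
        OpCode::Add => (2, 1),
        OpCode::Negate | OpCode::Prompt => (1, 1),
        OpCode::Return => (1, 0),
    }
}

/// Returns the maximum stack depth, or the index of the first instruction that would underflow.
fn max_stack_depth(code: &[OpCode]) -> Result<usize, usize> {
    let (mut depth, mut max) = (0usize, 0usize);
    for (i, &op) in code.iter().enumerate() {
        let (pops, pushes) = stack_effect(op);
        if depth < pops {
            return Err(i);
        }
        depth = depth - pops + pushes;
        max = max.max(depth);
    }
    Ok(max)
}
```

A check like this also tells the VM how much stack space a chunk needs; handling jumps would require tracking the depth along every control-flow path.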
Now that you understand the architecture, here are some ways to contribute:
Start Small: Look for issues labeled "good first issue" in our GitHub repository.
Improve Error Messages: Clear error messages help users debug their code. The parser and VM error systems are good places to contribute improvements.
Add Language Features: New syntax features typically require changes to the lexer, parser, code generator, and VM.
Optimize Performance: Look for opportunities to improve bytecode generation or VM execution.
Enhance Type System: Contribute to the type resolver to improve static analysis capabilities.
Fix Bugs: Bug fixes are always valuable contributions.
Before making significant changes, please open a GitHub issue to discuss your approach with the community. This ensures your efforts align with the project's goals and direction.
The AIScript interpreter is designed to be modular and extensible, making it possible for contributors to work on different parts independently. We're excited to see what you'll build with us!