Choose the right structure at the start

May 01, 2025

The talk I gave covers the basics of making your own parser, lexer, LSP, extensions, auto-formatting, debugger, and playground! - Noah

I gave a talk this week on a new programming language I created specifically for that talk. Let me tell you about Tegan. Tegan means “toy” in Welsh, so it is literally a toy language. It's stack-based, so every operation manipulates either the main stack, short-lived local stacks, or named stacks. It has no way to define functions, only to use the ones that are built in. It doesn't support floating point, booleans, or objects - only single-line strings and integers. It doesn’t have types. By most standards, it’s not a good language.

It does, however, have an LSP. It has syntax highlighting in VS Code. It has a debugger. It has a REPL. It can run interpreted in Node, or in the browser. It can compile to native code. It has a web-based playground. It has an auto-formatter. It has inline errors.

The idea behind Tegan isn’t to be a good language, but to show how simple some features can be if you approach a problem from the right direction. Often developers, myself included, try to tackle too many problems all at once. Taking a minute to pause and think about the structure of code before starting often pays off later.

For compilers, getting the AST right, for both the tokens and the expressions, can vastly change how complex it is to later add errors, multiple targets, or editor support. If you think up front about line numbers, indexes, and a full representation of the tokens and syntax tree of a program, so many things become easier.
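As a rough sketch of what "thinking about indexes" can look like, here is a tiny lexer that records where every token starts and ends. This is not Tegan's actual code: the `Token` shape, the `tokenize` function, and the syntax it accepts (integers, double-quoted single-line strings, and bare words) are assumptions made up for illustration.

```typescript
// Hypothetical token shape, not Tegan's real definition.
// Every token carries its position in the source: start is inclusive, end is exclusive.
type Span = { start: number; end: number };

type Token =
  | ({ kind: "number"; value: number } & Span)
  | ({ kind: "string"; value: string } & Span)
  | ({ kind: "word"; value: string } & Span);

// Split source text into tokens, keeping the start and end index of each one.
function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  // A double-quoted single-line string, or any run of non-whitespace characters.
  const pattern = /"[^"\n]*"|\S+/g;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(source)) !== null) {
    const text = match[0];
    const start = match.index;
    const end = start + text.length;

    const isQuoted =
      text.length >= 2 &&
      text.charAt(0) === '"' &&
      text.charAt(text.length - 1) === '"';

    if (isQuoted) {
      // Strip the surrounding quotes from the value, but keep them in the span.
      tokens.push({ kind: "string", value: text.slice(1, -1), start, end });
    } else if (/^-?\d+$/.test(text)) {
      tokens.push({ kind: "number", value: Number(text), start, end });
    } else {
      tokens.push({ kind: "word", value: text, start, end });
    }
  }

  return tokens;
}
```

Because each token knows its own span, `source.slice(token.start, token.end)` always gives back the exact text it came from, which is the hook that error messages, highlighting, and formatting can build on later.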

The same thing applies to most programs. Before I write code, I ask myself: “what unknowns exist, and what do I already know that could be re-applied?” With most coding, your knowledge from different domains can be applied again. Humans (and AI, I guess) tend to make code in similar ways, over and over, with slight changes. When was the last time you encountered an API that worked in a way that was totally unfamiliar to you? It’s a pretty infrequent occurrence. The structure you choose to use, based on what past experiences have taught you, can reduce the amount of time it takes to integrate with an API.

A good data structure can simplify code. Data structures with too many optional fields, unclear nesting, or the ability to represent states that should be impossible tend to complicate code. Breaking things down into distinct, separate members of a union helps clear up code paths and reduces the complexity of introducing safe changes. It’s one reason I start with types before I start with code, and recommend that to others.
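To make the optional-fields point concrete, here is a small before-and-after sketch. The `LooseResult` and `Result` types and the `describeResult` function are hypothetical names invented for this example; they are not taken from Tegan.

```typescript
// Before: one shape with optional fields. Nothing stops a value from having
// both `value` and `error` set, or none of the fields at all.
type LooseResult = { value?: number; error?: string; loading?: boolean };

// This compiles, but it is unclear what state it is meant to describe.
const ambiguous: LooseResult = { value: 1, error: "boom" };

// After: each state is a distinct member of a union, tagged by `status`.
type Result =
  | { status: "loading" }
  | { status: "ok"; value: number }
  | { status: "error"; message: string };

// Each branch only sees the fields that exist for that state.
function describeResult(result: Result): string {
  switch (result.status) {
    case "loading":
      return "Loading...";
    case "ok":
      return `Value: ${result.value}`;
    case "error":
      return `Failed: ${result.message}`;
    default: {
      // If a new member is added to Result and not handled above,
      // this assignment stops compiling.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```

With the union version, adding a fourth state becomes a compile error at every `switch` that forgets to handle it, which is a large part of what makes later changes safer.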

Tegan itself

Tegan is mostly a side note for this article, but I figured I’d include further reading if people were interested. Tegan is a great example of how the right data structures simplify everything. Beyond the parsing and runtime logic, the majority of tooling features (LSP, etc.) came for free from the data structures used.

Here are some screenshots. You could also watch the talk, or just look at the slides or the repo.

Some screenshots of the playground:

The playground supports error highlighting + messages, which is simple if you store token start and end indexes

The playground can run the code, with print going to the console and the stack shown at the end

Because the AST + Tegan generators are written in JavaScript, they’re available in the playground as well as locally
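The error highlighting caption above is worth a quick sketch. Editors generally want a line and character position rather than a raw index, and the Language Server Protocol uses zero-based lines and characters. Given stored start and end indexes, the conversion is only a few lines. The `TextPosition` and `TextRange` types and the `positionAt` and `rangeOf` helpers below are illustrative names, not Tegan's real API.

```typescript
// Zero-based line and character, the convention LSP uses for positions.
interface TextPosition {
  line: number;
  character: number;
}

interface TextRange {
  start: TextPosition;
  end: TextPosition;
}

// Convert a character index into a line/character position by counting
// the newlines that appear before it. Offsets past the end are clamped.
function positionAt(source: string, offset: number): TextPosition {
  const limit = Math.max(0, Math.min(offset, source.length));
  let line = 0;
  let lineStart = 0;

  for (let i = 0; i < limit; i++) {
    if (source.charAt(i) === "\n") {
      line++;
      lineStart = i + 1;
    }
  }

  return { line, character: limit - lineStart };
}

// Turn a stored start/end index pair into a range an editor can underline.
function rangeOf(source: string, span: { start: number; end: number }): TextRange {
  return {
    start: positionAt(source, span.start),
    end: positionAt(source, span.end),
  };
}
```

For a two-line program like `"1 2 add\nfoo 3"`, a token stored with `start: 8` and `end: 11` maps to line 1, characters 0 to 3, which is the shape an editor needs to draw a squiggle under `foo`.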

Screenshots of VS Code:

