Compiler Internals
This section describes how the compiler works.
More than 12 passes are planned for the compiler. What happens when you compile a .yaka file?
Let’s look at a sample.
Input code
This section needs updating
def factorial(x: int) -> int:
    if x <= 0:
        return 1
    return x * factorial(x - 1)
def main() -> int:
    a: int = 10
    b: str = "b"
    while a > 0:
        print(factorial(a))
        print("\n")
        a = a - 1
        b = "a" + b
    print(b)
    print("\n")
    return 0
Output C code
// --yaksha header section--
#include "yk__lib.h"
int32_t yy__factorial(int32_t);
int32_t yy__main();
// --yaksha body section--
int32_t yy__factorial(int32_t yy__x) {
    if ((yy__x <= 0)) { return 1; }
    return (yy__x * yy__factorial((yy__x - 1)));
}
int32_t yy__main() {
    int32_t yy__a = 10;
    yk__sds t__0 = yk__sdsnew("b");
    yk__sds yy__b = yk__sdsdup(t__0);
    while (1) {
        if (!((yy__a > 0))) { break; }
        {
            printf("%d", (yy__factorial(yy__a)));
            yk__sds t__1 = yk__sdsnew("\n");
            printf("%s", (t__1));
            yy__a = (yy__a - 1);
            yk__sds t__2 = yk__sdsnew("a");
            yk__sds t__3 = yk__sdscatsds(yk__sdsdup(t__2), yy__b);
            yk__sdsfree(yy__b);
            yy__b = yk__sdsdup(t__3);
            yk__sdsfree(t__3);
            yk__sdsfree(t__2);
            yk__sdsfree(t__1);
        }
    }
    printf("%s", (yy__b));
    yk__sds t__4 = yk__sdsnew("\n");
    printf("%s", (t__4));
    yk__sdsfree(t__4);
    yk__sdsfree(t__0);
    yk__sdsfree(yy__b);
    return 0;
}
// --yaksha footer section--
int main(void) { return (int) yy__main(); }
- Notice that yk__sdsfree calls are generated automatically for simple string uses.
- This makes strings immutable, but it also does a lot of unnecessary processing. 😓
- Therefore, sr is useful for avoiding unnecessary memory allocations.
Phases of the compiler
Tokenizer
in-progress
- The tokenizer breaks the input down into individual tokens and identifies keywords.
- It parses numbers and strings and checks that they are valid according to the grammar.
If an error is detected at this point, compilation still continues up to the parser so that the maximum number of errors can be identified.
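The idea of collecting errors instead of stopping at the first one can be sketched in a few lines of Python. This is only an illustration, not the Yaksha tokenizer: the keyword set, the token kinds and the tokenize function are all made up for this example.

```python
import re
from typing import List, Tuple

# Illustrative subset of keywords (an assumption, not Yaksha's real keyword list).
KEYWORDS = {"def", "if", "while", "return"}

TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<string>"(?:[^"\\\n]|\\.)*")
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>->|<=|>=|==|!=|[-+*/<>=():,])
  | (?P<space>[ \t]+)
""", re.VERBOSE)


def tokenize(line: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (tokens, errors); errors are recorded and scanning continues."""
    tokens: List[Tuple[str, str]] = []
    errors: List[str] = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            # Record the problem and skip one character instead of aborting.
            errors.append(f"unexpected character {line[pos]!r} at column {pos}")
            pos += 1
            continue
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "space":
            continue
        if kind == "name" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens, errors


if __name__ == "__main__":
    print(tokenize("return x * factorial(x - 1)"))
    print(tokenize("a = 10 $ 2"))  # the '$' is reported, the rest is still tokenized
```

Returning the error list next to the tokens lets later stages decide when to print everything and exit.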
Block analyzer
in-progress
- Convert indentation to ba_indent, ba_dedent type tokens.
- Remove comments.
- Remove extra new lines.
- Only 2-spaces, 4-spaces or tab based indents are supported.
- Yaksha will try to guess indentation type.
- Still continue to parser even after errors.
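A rough Python sketch of the indentation step is shown below. It assumes the indent unit is already known (the guessing step is left out), emits plain strings rather than real token objects, and strips comments naively. The function name to_block_tokens is hypothetical.

```python
from typing import List


def to_block_tokens(lines: List[str], indent: str = "    ") -> List[str]:
    """Turn leading indentation into ba_indent / ba_dedent markers."""
    out: List[str] = []
    depth = 0
    for raw in lines:
        # Naive comment removal: a '#' inside a string literal would break this.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue  # drop blank and comment-only lines
        level = 0
        while line.startswith(indent, level * len(indent)):
            level += 1
        while depth < level:
            out.append("ba_indent")
            depth += 1
        while depth > level:
            out.append("ba_dedent")
            depth -= 1
        out.append(line[level * len(indent):])
    out.extend(["ba_dedent"] * depth)  # close blocks still open at end of input
    return out


if __name__ == "__main__":
    sample = [
        "def f(x: int) -> int:",
        "    if x <= 0:  # base case",
        "",
        "        return 1",
        "    return x",
    ]
    for item in to_block_tokens(sample):
        print(item)
```

Passing indent="  " or indent="\t" covers the other two supported indentation styles in this sketch.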
Parser
in-progress
- Parses tokens returned by the block analyzer into an AST.
- AST is represented as a std::vector of stmt*.
Any errors from previous stages and the parsing stage are printed here, and the program exits.
Import analyzer
in-progress
- Analyzes imports.
- Parses imported files.
- This step will use more instances of Tokenizer, Parser and Compiler objects.
Def-Struct-Const Visitor
in-progress
- Visit def statements and collect functions to a map.
- Visit class statements and collect structures to a map.
- Visit global constants.
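As a loose illustration of this pass, the sketch below collects top-level definitions into maps. It leans on Python's own ast module as a stand-in parser, which works only because the sample source happens to be valid Python syntax. Yaksha has its own parser and AST, and the MAX constant and Point class are invented for the example.

```python
import ast
from typing import Dict, Tuple


def collect_definitions(source: str) -> Tuple[Dict, Dict, Dict]:
    """Walk top-level statements and sort them into three maps by name."""
    functions: Dict[str, ast.FunctionDef] = {}
    structures: Dict[str, ast.ClassDef] = {}
    constants: Dict[str, ast.AnnAssign] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            functions[node.name] = node
        elif isinstance(node, ast.ClassDef):
            structures[node.name] = node
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            constants[node.target.id] = node
    return functions, structures, constants


if __name__ == "__main__":
    sample = (
        "MAX: int = 10\n"
        "class Point:\n"
        "    x: int\n"
        "    y: int\n"
        "def main() -> int:\n"
        "    return 0\n"
    )
    funcs, structs, consts = collect_definitions(sample)
    print(sorted(funcs), sorted(structs), sorted(consts))
```

Later passes can then look names up in these maps instead of walking the whole tree again.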
Return path analyzer
in-progress
- Analyzes return paths.
- Ensure all functions return.
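One simple way to check return paths is sketched below in Python. The tiny statement classes are made up for the example and are not the compiler's real AST nodes. The rule used here is an assumption about how such a check could work: a block returns if it contains a return statement, or an if whose both branches return. A while body may run zero times, so it never counts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Return:
    pass


@dataclass
class Expr:
    pass


@dataclass
class If:
    then: List[object]
    orelse: List[object] = field(default_factory=list)


@dataclass
class While:
    body: List[object]


def always_returns(block: List[object]) -> bool:
    """True if every path through the block reaches a Return."""
    for stmt in block:
        if isinstance(stmt, Return):
            return True
        if isinstance(stmt, If):
            if always_returns(stmt.then) and always_returns(stmt.orelse):
                return True
        # While and Expr statements never guarantee a return on their own.
    return False


if __name__ == "__main__":
    factorial_body = [If(then=[Return()]), Return()]
    missing_return = [If(then=[Return()]), Expr()]
    print(always_returns(factorial_body))
    print(always_returns(missing_return))
```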
Type checker
in-progress
- The type checker visits the AST and checks for invalid types.
- Checks for undefined variables.
- Checks for undefined functions.
- Checks for undefined structures.
- Checks that every return type matches that of the enclosing function.
Template Compiler
not-started
- Rewrite @template functions into normal functions based on what’s passed to them.
- Rewrite fncall expressions to use the newly created functions.
Optimizer
in-progress
- Remove dead code.
- Basic constant folding.
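Basic constant folding can be illustrated on a toy expression tree. In this Python sketch, expressions are nested tuples of the form (operator, left, right), integers are constants and strings are variable names. This representation is invented for the example and has nothing to do with the real AST.

```python
import operator

# Division is left out on purpose so folding can never hit a divide-by-zero.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Replace sub-expressions whose operands are both constants with their value."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


if __name__ == "__main__":
    print(fold(("*", "x", ("+", 2, 3))))  # ('*', 'x', 5)
    print(fold(("-", ("*", 4, 5), 1)))    # 19
```

Folding bottom-up means a constant produced by an inner expression can enable folding of the outer one.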
To-CL-Compiler
not-started
- Convert @device code to OpenCL program code.
- Copy necessary structures.
- Check validity - no generics, no str, no allocations.
To-C-Compiler
in-progress
- Writes C code from AST.
- Do any simple optimizations.
- Handle defer statements.
- Handle str deletions.
- Create struct and function declarations.
C-To-C Compiler
in-progress
- We generate code to a single C file which can then be further optimized.
- Parse and optimize a subset of the generated C code.
How does the Yaksha-lang library get packaged?
- Multiple sources are packaged into a single header file.
- Library functionality is exposed by prefixing with yk__ or YK__.
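To make the prefixing idea concrete, here is a minimal Python sketch that renames a list of identifiers in C source text. It is not the packer's implementation. The add_prefix helper is hypothetical, and it works on raw text, so it would also touch matching words inside string literals and comments.

```python
import re
from typing import Iterable


def add_prefix(source: str, identifiers: Iterable[str], prefix: str) -> str:
    """Prefix every whole-word occurrence of the given identifiers."""
    names = sorted(set(identifiers), key=len, reverse=True)
    if not names:
        return source
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: prefix + m.group(1), source)


if __name__ == "__main__":
    code = "sds sdsnew(const char *init); #define SDS_MAX 10"
    code = add_prefix(code, ["sds", "sdsnew"], "yk__")
    code = add_prefix(code, ["SDS_MAX"], "YK__")
    print(code)
```

Because the match is on word boundaries, an already prefixed name such as yk__sds is not prefixed a second time.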
Packer components
- packer.py - Runs packer DSL scripts and creates packaged single header libraries.
- inctree.py - Topologically sorts the #include DAG to determine the best order for combining headers.
  DAG - Directed Acyclic Graph.
- cids.exe - Uses the stb_c_lexer.h library to parse C code and extract identifiers.
- single_header_packer.py - ApoorvaJ’s single header C code packager. Repo
- python-patch - techtonik’s patch script. Repo
- fcpp - Frexx C Preprocessor by Daniel Stenberg. (Patch for Windows compilation was needed, failed to compile with MSVC, works with MingW with patch).
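The ordering step done by inctree.py can be approximated with Python's standard graphlib module (Python 3.9+). The sketch below is an assumption about the general technique, not the actual script: include_order is a hypothetical helper, and the header contents are invented.

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Matches local includes such as: #include "ini.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def include_order(sources: Dict[str, str]) -> List[str]:
    """Order headers so that each one comes after the headers it includes."""
    graph: Dict[str, Set[str]] = {
        name: {inc for inc in INCLUDE_RE.findall(text) if inc in sources}
        for name, text in sources.items()
    }
    # TopologicalSorter expects node -> predecessors; raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    headers = {
        "http.h": '#include "thread.h"\n#include "ini.h"\n#include <stdio.h>\n',
        "thread.h": '#include "ini.h"\n',
        "ini.h": "int ini_load(void);\n",
    }
    print(include_order(headers))
```

Concatenating the headers in this order means every definition appears before its first use in the combined file.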
Third Party Libraries
If you stack a few giants, you can stand very tall on top of them.
- sds - Salvatore Sanfilippo’s string library. (Needed a patch to support MSVC 2019)
- stb - Single header libraries by Sean Barrett.
- utf8proc - UTF-8 library - Jan Behrens, Public Software Group and Julia developers.
Note - currently only sds and stb_ds are used. This selection of libraries may change.
Packer DSL
import re
use_source("libs")
for lib in ["ini", "thread", "http"]:
    ids = extract_ids(lib + ".h")
    P = [x for x in ids if x.startswith(lib)]
    PU = [x for x in ids if x.startswith(lib.upper())]
    prefix(lib + ".h", PREFIX, P)
    prefix(lib + ".h", PREFIX_U, PU)
    rename(lib + ".h", [[re.escape(r'yk__http://'), 'http://'],
                        ["YK__THREAD_PRIORITY_HIGHEST", "THREAD_PRIORITY_HIGHEST"]])
    copy_file(lib + ".h", PREFIX + lib + ".h", is_temp=False)
    clang_format(PREFIX + lib + ".h", is_temp=False)
The packer DSL is just Python 3.x: scripts are evaluated with extra functions added. 🤫
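One plausible way to build such a DSL is to exec the script with the extra functions placed in its global namespace. The Python sketch below shows the idea only. The run_script helper, the logging stubs, the value of PREFIX and the is_temp default are assumptions for illustration and do not reflect how packer.py actually works.

```python
from typing import List, Tuple


def run_script(script: str) -> List[Tuple]:
    """Execute a DSL script; the stub functions only record what was called."""
    log: List[Tuple] = []

    def use_source(path: str) -> None:
        log.append(("use_source", path))

    def copy_file(src: str, dst: str, is_temp: bool = True) -> None:
        log.append(("copy_file", src, dst, is_temp))

    # Names visible to the script; PREFIX value is assumed from the yk__ convention.
    env = {"use_source": use_source, "copy_file": copy_file, "PREFIX": "yk__"}
    exec(compile(script, "<packer-dsl>", "exec"), env)
    return log


if __name__ == "__main__":
    demo = (
        'use_source("libs")\n'
        'for lib in ["ini", "thread"]:\n'
        '    copy_file(lib + ".h", PREFIX + lib + ".h", is_temp=False)\n'
    )
    for entry in run_script(demo):
        print(entry)
```

Since the script is ordinary Python, loops, list comprehensions and imports such as re come for free.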