Compiler Internals

This section describes how compiler works.

Total of 12+ passes are planned for the compiler.

What happens when you compile a .yaka file?

Let’s look at a sample

Input code

This section needs updating
def factorial(x: int) -> int:
    if x <= 0:
        return 1
    return x * factorial(x - 1)


def main() -> int:
    a: int = 10
    b: str = "b"
    while a > 0:
        print(factorial(a))
        print("\n")
        a = a - 1
        b = "a" + b
    print(b)
    print("\n")
    return 0

Output C code

// --yaksha header section--
#include "yk__lib.h"
int32_t yy__factorial(int32_t);
int32_t yy__main();
// --yaksha body section--
int32_t yy__factorial(int32_t yy__x) {
  if ((yy__x <= 0)) { return 1; }
  return (yy__x * yy__factorial((yy__x - 1)));
}
int32_t yy__main() {
  int32_t yy__a = 10;
  yk__sds t__0 = yk__sdsnew("b");
  yk__sds yy__b = yk__sdsdup(t__0);
  while (1) {
    if (!((yy__a > 0))) { break; }
    {
      printf("%d", (yy__factorial(yy__a)));
      yk__sds t__1 = yk__sdsnew("\n");
      printf("%s", (t__1));
      yy__a = (yy__a - 1);
      yk__sds t__2 = yk__sdsnew("a");
      yk__sds t__3 = yk__sdscatsds(yk__sdsdup(t__2), yy__b);
      yk__sdsfree(yy__b);
      yy__b = yk__sdsdup(t__3);
      yk__sdsfree(t__3);
      yk__sdsfree(t__2);
      yk__sdsfree(t__1);
    }
  }
  printf("%s", (yy__b));
  yk__sds t__4 = yk__sdsnew("\n");
  printf("%s", (t__4));
  yk__sdsfree(t__4);
  yk__sdsfree(t__0);
  yk__sdsfree(yy__b);
  return 0;
}
// --yaksha footer section--
int main(void) { return (int) yy__main(); }
Yaksha Programming Language Mascot
  • Notice yk__sdsfree is automatically generated for simple string uses.
  • This makes strings immutable. However, it is doing a lot of unnecessary processing. 😓
  • Therefore sr is useful to avoid unnecessary memory allocations.

Phases of the compiler

Tokenizer

in-progress
  • Tokenizer breaks down input to individual tokens and Identifies Keywords.
  • Parse numbers and strings and check if they are valid according to the grammar.
If any error is detected at this point we still continue up to parser so we can identify maximum number of errors.

Block analyzer

in-progress
  • Convert indentation to ba_indent, ba_dedent type tokens.
  • Remove comments.
  • Remove extra new lines.
Yaksha Programming Language Mascot
  • Only 2-spaces, 4-spaces or tab based indents are supported.
  • Yaksha will try to guess indentation type.
  • Still continue to parser even after errors.

Parser

in-progress
  • Parses tokens returned by block analyzer an AST.
  • AST is represented as a std::vector of stmt*.
Any errors from previous stages and parsing stage is printed here and program will exit.

Import analyzer

in-progress
Import Analyzer
  • Analyzes imports.
  • Parses imported files.
  • This step will use more instances of Tokenizer, Parser and Compiler objects.

Def-Struct-Const Visitor

in-progress
  • Visit def statements and collect functions to a map.
  • Visit class statements and collect structures to a map.
  • Visit global constants.

Return path analyzer

in-progress
  • Analyzes return paths.
  • Ensure all functions return.

Type checker

in-progress
  • Type checker visits AST and check for invalid types.
  • Checks for undefined variables.
  • Checks for undefined functions.
  • Checks for undefined structures.
  • Check all return types are same as that of the encapsulating functions.

Template Compiler

not-started
  • Rewrite @template to be normal functions based on what’s passed to them.
  • Rewrite fncall expressions to use newly created functions.

Optimizer

in-progress
  • Remove dead code.
  • Basic constant folding.

To-CL-Compiler

not-started
  • Convert @device code to OpenCL program code.
  • Copy necessary structures.
  • Check validity - no generics, no str, no allocations.

To-C-Compiler

in-progress
  • Writes C code from AST.
  • Do any simple optimizations.
  • Handle defer statements.
  • Handle str deletions.
  • Create struct and function declarations.

C-To-C Compiler

in-progress
  • We generate code to a single C file which can then be further optimized.
  • Parse and optimize subset of generated C code.

How does the Yaksha-lang library get packaged?

Yaksha Programming Language Runtime
Yaksha Programming Language Mascot
  • Multiple sources are packaged into a single header file.
  • Library functionality is exposed by prefixing with yk__ or YK__.

Packer components

  • packer.py - Run packer DSL scripts and create packaged single header libraries.
  • inctree.py - Topological sort #include DAG to determine best order for combining headers.
DAG - Directed Acyclic Graph.
  • cids.exe - Use stb_c_lexer.h library to parse C code and extract identifiers.
  • single_header_packer.py - ApoorvaJ’s single header C code packager. Repo
  • python-patch - techtonik’s patch script. Repo
  • fcpp- Frexx C Preprocessor by Daniel Stenberg. (Patch for Windows compilation was needed, failed to compile with MSVC, works with MingW with patch).

Third Party Libraries

If you stack few giants, you can stand very tall on top of them.
  • sds - Salvatore Sanfilippo’s string library. (Needed a patch to support MSVC 2019)
  • stb - Single header libraries by Sean Barrett.
  • utf8proc - UTF-8 library - Jan Behrens, Public Software Group and Julia developers.
Note - currently only sds and stb_ds is used. This selection of libraries may change.

Packer DSL

import re
use_source("libs")
for lib in ["ini", "thread", "http"]:
    ids = extract_ids(lib + ".h")
    P = [x for x in ids if x.startswith(lib)]
    PU = [x for x in ids if x.startswith(lib.upper())]
    prefix(lib + ".h", PREFIX, P)
    prefix(lib + ".h", PREFIX_U, PU)
    rename(lib + ".h", [[re.escape(r'yk__http://'), 'http://'],
                        ["YK__THREAD_PRIORITY_HIGHEST", "THREAD_PRIORITY_HIGHEST"]])
    copy_file(lib + ".h", PREFIX + lib + ".h", is_temp=False)
    clang_format(PREFIX + lib + ".h", is_temp=False)
It is just Python 3.x with extra functions added and evaluated.🤫