semantic analysis — Jemoka Knowledge Base

lower things into LLVM IR, HLO, etc. — “analyze the sentences” check type catch inconsistency scoping rules, etc. Typical, we implement multiple passes for this because optimizing for multiple passes is quite hard. We need this because most grammars are not actually context free (for instance, scoping). what to check declaration of identifiers types inheritance relationships class efined once methods in a class defined only once reserved identifiers are not misused (i.e., inherits, etc.) example type errors let y: String <- "abc" in y+3 variables that don’t exist let y : Int in x + 3 scope the scope of an identifier is the portion of a program in which a particular identifier is accessible. same identifier may refer to differet things different scopes for the same name don’t overlap an identifier may have a restricted scope static scope The scope you think scope means. dynamic scope g(y) = let a <- 4 in f(3); f(x) = a; so now the a within f differs due to the call stack. Older programming languages use this, modern languages don’t use this largely. introduction of names/scopes class declarations (class names) method definitions (method names) let expressions (objids) formal parameters (objids) attribute definitions (objids) case expressions (objids) Different classes have different scopes rules; classes, for instance, are globally visible. Attribute names are global within the class (this makes sense). Now, consider: x: Int <- y; y: Int <- x+1; this means x=0, y=1; why? this is because the operational semantics gives that the objects are first initialized to default before their values/initalizers are used. shadowing Needing to parse shadowing. int i = 3; { int i = 4; cout << i; } } implementing scope most semantic analysis is a recursive AST descent before: begin process AST node n recurse: process children of n after: finish processAST node n that is, we have to descend down and then reprocess up each node based on information gathered by the children. think: let x: Int <- 0 in .... to typecheck this node, we first have to type check the initializer, and then add x to a symbol table, then check that the initialized value is the same as that of x declared, and then the rest of the body can be typed. A symbol table, therefore, has a common API: enter_scope(); // start a new tested scope; when you are done you will pop it off find_symbol(x); // finds current x, or returns null add_symbol(x); // add a symbol x to the table check_scope(x); // true if x is defined in the current, top scope // we have this because we can thereby check for duplicated variables within the same scope exit_scope(x); // exit the current scope type There’s only one type, a class, for OOP languages. Many operations don’t make sense for a particular type: it doesn’t make sense to adding a function pointer and an integer it does make sense to add two integer so… you have to check for yourself. Type systems specifies which operations are valid for which types. sound (type system) a sound type system is such that \vdash e: T, then e evaluates to a value of type T. We want sound rules, but not all sound rules are made the same

\begin{equation} \frac{\text{i is an int literal}}{\vdash i: \text{Object}} \end{equation}

is technically a correct rule, but it isn’t the world’s most precise one. statically typed All or almost all type checking is a part of compilation. C, Rust, etc. Static typing makes stuff faster often, because type checking involves a lot of dynamic computation which can’t be branch predicted, which is quite expensive. Most of the time, most typed languages has a way to break the type system (i.e., unsafe casts) so its still sometimes run time checked types. dynamically typed Almost all checking is done as a part of program execution (Scheme, Python, JS, etc.) Optional type rules exists, too, in Python. untyped that is, assembly. types in COOL There are only two types of types in COOL, since COOL is an OOP language: class names SELF_TYPE you can only declare a new type by declaring a class. The cool compiler has 2 jobs check types infer types for expressions (i.e., we will insert into the AST, on every single node, the type of the node—all inferred) Two main parts: 1) type checking — making sure that the programmers are not doing stuff they are not supposed to do 2) type inference — inferring the types of some constructs based on other constructs. rules of COOL booleans

\begin{equation} \frac{}{\vdash \text{false}: \text{Bool} } \end{equation}

new

\begin{equation} \frac{}{\vdash \text{new T} : \text{T}} \end{equation}

negation

\begin{equation} \frac{\vdash e: \text{Bool}}{\vdash !e : \text{Bool}} \end{equation}

while loop:

\begin{equation} \frac{\vdash e_1: \text{Bool}, \vdash e_2: T}{\vdash \text{ while } e_1 \text{ loop } e_2 \text{ pool }: \text{ Object}} \end{equation}

we report an error if nothing matches. Environment lookups!

\begin{equation} \frac{O\left(x\right) = T}{O \vdash x : T} \end{equation}

Let expressions

\begin{equation} \frac{O[T_0 / x] \vdash e_1: T_1}{O \vdash \text{let } x : T_0 \text{ in } e_1: T_1} \end{equation}

the notation O[T_0 / x] says “if, in environment O, we set x to have type T_0”. This environment is implemented as a symbol table.

\begin{equation} \frac{O \vdash e_0: T_0, O[T / x] \vdash e_1: T_1}{O \vdash \text{let } x : T_0 \leftarrow e_0 \text{ in } e_1: T_1} \end{equation}

this is not quite great since subtyping is not allowed; let’s try again:

\begin{equation} \frac{O \vdash e_0: T_0, O[T / x] \vdash e_1: T_1, T_0 \leq T}{ O \vdash \text{let } x: T \leftarrow e_0 \text{ in } e_1: T_1} \end{equation}

now let’s think about assignments:

\begin{equation} \frac{O\left(x\right) = T_0, O \vdash e_1: T_1, T_1 \leq T_0}{O \vdash x \leftarrow e_1: T_1} \end{equation}

which looks similar to attribute initalizaiton

\begin{equation} \frac{O\left(x\right) = T_0, O \vdash e_1: T_1, T_1 \leq T_0}{O \vdash x \leftarrow e_1: T_1} \end{equation}

example \begin{equation} \left(e_1 : \text{Int} \wedge e_2: \text{Int}\right) \implies e_1 + e_2 : Int \end{equation} In the bar notation:

\begin{equation} \frac{\vdash e_1: \text{Int}, \vdash e_2: \text{Int}}{\vdash e_1 + e_2 : \text{Int}} \end{equation}