# HG changeset patch # User Adam Chlipala # Date 1228582872 18000 # Node ID 2bf2d0e0fb61df1a13748a4add56d8ef25b5950d # Parent 9eefa0cf3219cbbdadf730c6b6067df85ecab29c Type inference diff -r 9eefa0cf3219 -r 2bf2d0e0fb61 doc/manual.tex --- a/doc/manual.tex Sat Nov 29 15:04:57 2008 -0500 +++ b/doc/manual.tex Sat Dec 06 12:01:12 2008 -0500 @@ -358,7 +358,7 @@ \Gamma \vdash c \hookrightarrow C }$$ -\subsection{Definitional Equality} +\subsection{\label{definitional}Definitional Equality} We use $\mathcal C$ to stand for a one-hole context that, when filled, yields a constructor. The notation $\mathcal C[c]$ plugs $c$ into $\mathcal C$. We omit the standard definition of one-hole contexts. We write $[x \mapsto c_1]c_2$ for capture-avoiding substitution of $c_1$ for $x$ in $c_2$. @@ -530,7 +530,7 @@ We use an auxiliary judgment $\overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'$, expressing the enrichment of $\Gamma$ with the types of the datatype constructors $\overline{dc}$, when they are known to belong to datatype $x$ with type parameters $\overline{y}$. -This is the first judgment where we deal with type classes, for the $\mt{class}$ declaration form. We will omit their special handling in this formal specification. In the compiler, a set of available type classes and their instances is maintained, and these instances are used to fill in expression wildcards. +This is the first judgment where we deal with type classes, for the $\mt{class}$ declaration form. We will omit their special handling in this formal specification. Section \ref{typeclasses} gives an informal description of how type classes influence type inference. We presuppose the existence of a function $\mathcal O$, where $\mathcal(M, \overline{s})$ implements the $\mt{open}$ declaration by producing a context with the appropriate entry for each available component of module $M$ with signature items $\overline{s}$. Where possible, $\mathcal O$ uses ``transparent'' entries (e.g., an abstract type $M.x$ is mapped to $x :: \mt{Type} = M.x$), so that the relationship with $M$ is maintained. A related function $\mathcal O_c$ builds a context containing the disjointness constraints found in $S$. @@ -879,4 +879,40 @@ \mt{proj}(M, \mt{class} \; x = c \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\ \end{eqnarray*} + +\section{Type Inference} + +The Ur/Web compiler uses \emph{heuristic type inference}, with no claims of completeness with respect to the declarative specification of the last section. The rules in use seem to work well in practice. This section summarizes those rules, to help Ur programmers predict what will work and what won't. + +\subsection{Basic Unification} + +Type-checkers for languages based on the Hindly-Milner type discipline, like ML and Haskell, take advantage of \emph{principal typing} properties, making complete type inference relatively straightforward. Inference algorithms are traditionally implemented using type unification variables, at various points asserting equalities between types, in the process discovering the values of type variables. The Ur/Web compiler uses the same basic strategy, but the complexity of the type system rules out easy completeness. + +Type-checking can require evaluating recursive functional programs, thanks to the type-level $\mt{fold}$ operator. When a unification variable appears in such a type, the next step of computation can be undetermined. The value of that variable might be determined later, but this would be ``too late'' for the unification problems generated at the first occurrence. This is the essential source of incompletness. + +Nonetheless, the unification engine tends to do reasonably well. Unlike in ML, polymorphism is never inferred in definitions; it must be indicated explicitly by writing out constructor-level parameters. By writing these and other annotations, the programmer can generally get the type inference engine to do most of the type reconstruction work. + +\subsection{Unifying Record Types} + +The type inference engine tries to take advantage of the algebraic rules governing type-level records, as shown in Section \ref{definitional}. When two constructors of record kind are unified, they are reduce to normal forms, with like terms crossed off from each normal form until, hopefully, nothing remains. This cannot be complete, with the inclusion of unification variables. The type-checker can help you understand what goes wrong when the process fails, as it outputs the unmatched remainders of the two normal forms. + +\subsection{\label{typeclasses}Type Classes} + +Ur includes a type class facility inspired by Haskell's. The current version is very rudimentary, only supporting instances for particular types built up from abstract types and datatypes and type-level application. + +Type classes are integrated with the module system. A type class is just a constructor of kind $\mt{Type} \to \mt{Type}$. By marking such a constructor $c$ as a type class, the programmer instructs the type inference engine to, in each scope, record all values of types $c \; \tau$ as \emph{instances}. Any function argument whose type is of such a form is treated as implicit, to be determined by examining the current instance database. + +The ``dictionary encoding'' often used in Haskell implementations is made explicit in Ur. Type class instances are just properly-typed values, and they can also be considered as ``proofs'' of membership in the class. In some cases, it is useful to pass these proofs around explicitly. An underscore written where a proof is expected will also be inferred, if possible, from the current instance database. + +Just as for types, type classes may be exported from modules, and they may be exported as concrete or abstract. Concrete type classes have their ``real'' definitions exposed, so that client code may add new instances freely. Abstract type classes are useful as ``predicates'' that can be used to enforce invariants, as we will see in some definitions of SQL syntax in the Ur/Web standard library. + +\subsection{Reverse-Engineering Record Types} + +It's useful to write Ur functions and functors that take record constructors as inputs, but these constructors can grow quite long, even though their values are often implied by other arguments. The compiler uses a simple heuristic to infer the values of unification variables that are folded over, yielding known results. Often, as in the case of $\mt{map}$-like folds, the base and recursive cases of a fold produce constructors with different top-level structure. Thus, if the result of the fold is known, examining its top-level structure reveals whether the record being folded over is empty or not. If it's empty, we're done; if it's not empty, we replace a single unification variable with a new constructor formed from three new unification variables, as in $[\alpha = \beta] \rc \gamma$. This process can often be repeated to determine a unification variable fully. + +\subsection{Implicit Arguments in Functor Applications} + +Constructor, constraint, and type class witness members of structures may be omitted, when those structures are used in contexts where their assigned signatures imply how to fill in those missing members. This feature combines well with reverse-engineering to allow for uses of complicated meta-programming functors with little more code than would be necessary to invoke an untyped, ad-hoc code generator. + + \end{document} \ No newline at end of file