annotate doc/manual.tex @ 557:0d3db8d6a586

Building an application
author Adam Chlipala <adamc@hcoop.net>
date Tue, 09 Dec 2008 11:57:17 -0500
parents 5703a2ad5221
children 390cba747188
rev   line source
adamc@524 1 \documentclass{article}
adamc@554 2 \usepackage{fullpage,amsmath,amssymb,proof,url}
adamc@524 3
adamc@524 4 \newcommand{\cd}[1]{\texttt{#1}}
adamc@524 5 \newcommand{\mt}[1]{\mathsf{#1}}
adamc@524 6
adamc@524 7 \newcommand{\rc}{+ \hspace{-.075in} + \;}
adamc@527 8 \newcommand{\rcut}{\; \texttt{--} \;}
adamc@527 9 \newcommand{\rcutM}{\; \texttt{---} \;}
adamc@524 10
adamc@524 11 \begin{document}
adamc@524 12
adamc@524 13 \title{The Ur/Web Manual}
adamc@524 14 \author{Adam Chlipala}
adamc@524 15
adamc@524 16 \maketitle
adamc@524 17
adamc@540 18 \tableofcontents
adamc@540 19
adamc@554 20
adamc@554 21 \section{Introduction}
adamc@554 22
adamc@554 23 \emph{Ur} is a programming language designed to introduce richer type system features into functional programming in the tradition of ML and Haskell. Ur is functional, pure, statically-typed, and strict. Ur supports a powerful kind of \emph{metaprogramming} based on \emph{row types}.
adamc@554 24
adamc@554 25 \emph{Ur/Web} is Ur plus a special standard library and associated rules for parsing and optimization. Ur/Web supports construction of dynamic web applications backed by SQL databases. The signature of the standard library is such that well-typed Ur/Web programs ``don't go wrong'' in a very broad sense. Not only do they not crash during particular page generations, but they also may not:
adamc@554 26
adamc@554 27 \begin{itemize}
adamc@554 28 \item Suffer from any kinds of code-injection attacks
adamc@554 29 \item Return invalid HTML
adamc@554 30 \item Contain dead intra-application links
adamc@554 31 \item Have mismatches between HTML forms and the fields expected by their handlers
adamc@554 32 \item Attempt invalid SQL queries
adamc@554 33 \item Use improper marshaling or unmarshaling in communication with SQL databases
adamc@554 34 \end{itemize}
adamc@554 35
adamc@554 36 This type safety is just the foundation of the Ur/Web methodology. It is also possible to use metaprogramming to build significant application pieces by analysis of type structure. For instance, the demo includes an ML-style functor for building an admin interface for an arbitrary SQL table. The type system guarantees that the admin interface sub-application that comes out will always be free of the above-listed bugs, no matter which well-typed table description is given as input.
adamc@554 37
adamc@554 38 The Ur/Web compiler also produces very efficient object code that does not use garbage collection. These compiled programs will often be even more efficient than what most programmers would bother to write in C.
adamc@554 39
adamc@554 40 \medskip
adamc@554 41
adamc@554 42 The official web site for Ur is:
adamc@554 43 \begin{center}
adamc@554 44 \url{http://www.impredicative.com/ur/}
adamc@554 45 \end{center}
adamc@554 46
adamc@555 47
adamc@555 48 \section{Installation}
adamc@555 49
adamc@555 50 If you are lucky, then the following standard command sequence will suffice for installation, in a directory to which you have unpacked the latest distribution tarball.
adamc@555 51
adamc@555 52 \begin{verbatim}
adamc@555 53 ./configure
adamc@555 54 make
adamc@555 55 sudo make install
adamc@555 56 \end{verbatim}
adamc@555 57
adamc@555 58 Some other packages must be installed for the above to work. At a minimum, you need a standard UNIX shell, with standard UNIX tools like sed and GCC in your execution path; and MLton, the whole-program optimizing compiler for Standard ML. To build programs that access SQL databases, you also need libpq, the PostgreSQL client library. As of this writing, in the ``testing'' version of Debian Linux, this command will install the more uncommon of these dependencies:
adamc@555 59
adamc@555 60 \begin{verbatim}
adamc@555 61 apt-get install mlton libpq-dev
adamc@555 62 \end{verbatim}
adamc@555 63
adamc@555 64 It is also possible to access the modules of the Ur/Web compiler interactively, within Standard ML of New Jersey. To install the prerequisites in Debian testing:
adamc@555 65
adamc@555 66 \begin{verbatim}
adamc@555 67 apt-get install smlnj libsmlnj-smlnj ml-yacc ml-lpt
adamc@555 68 \end{verbatim}
adamc@555 69
adamc@555 70 To begin an interactive session with the Ur compiler modules, run \texttt{make smlnj}, and then, from within an \texttt{sml} session, run \texttt{CM.make "src/urweb.cm";}. The \texttt{Compiler} module is the main entry point.
adamc@555 71
adamc@555 72 To run an SQL-backed application, you will probably want to install the PostgreSQL server. Version 8.3 or higher is required.
adamc@555 73
adamc@555 74 \begin{verbatim}
adamc@555 75 apt-get install postgresql-8.3
adamc@555 76 \end{verbatim}
adamc@555 77
adamc@555 78 To use the Emacs mode, you must have a modern Emacs installed. We assume that you already know how to do this, if you're in the business of looking for an Emacs mode. The demo generation facility of the compiler will also call out to Emacs to syntax-highlight code, and that process depends on the \texttt{htmlize} module, which can be installed in Debian testing via:
adamc@555 79
adamc@555 80 \begin{verbatim}
adamc@555 81 apt-get install emacs-goodies-el
adamc@555 82 \end{verbatim}
adamc@555 83
adamc@555 84 Even with the right packages installed, configuration and building might fail to work. After you run \texttt{./configure}, you will see the values of some named environment variables printed. You may need to adjust these values to get proper installation for your system. To change a value, store your preferred alternative in the corresponding UNIX environment variable, before running \texttt{./configure}. For instance, here is how to change the list of extra arguments that the Ur/Web compiler will pass to GCC on every invocation.
adamc@555 85
adamc@555 86 \begin{verbatim}
adamc@555 87 GCCARGS=-fnested-functions ./configure
adamc@555 88 \end{verbatim}
adamc@555 89
adamc@555 90 Some OSX users have reported needing to use this particular GCCARGS value.
adamc@555 91
adamc@555 92 The Emacs mode can be set to autoload by adding the following to your \texttt{.emacs} file.
adamc@555 93
adamc@555 94 \begin{verbatim}
adamc@555 95 (add-to-list 'load-path "/usr/local/share/emacs/site-lisp/urweb-mode")
adamc@555 96 (load "urweb-mode-startup")
adamc@555 97 \end{verbatim}
adamc@555 98
adamc@555 99 Change the path in the first line if you chose a different Emacs installation path during configuration.
adamc@555 100
adamc@555 101
adamc@556 102 \section{Command-Line Compiler}
adamc@556 103
adamc@556 104 \subsection{Project Files}
adamc@556 105
adamc@556 106 The basic inputs to the \texttt{urweb} compiler are project files, which have the extension \texttt{.urp}. Here is a sample \texttt{.urp} file.
adamc@556 107
adamc@556 108 \begin{verbatim}
adamc@556 109 database dbname=test
adamc@556 110 sql crud1.sql
adamc@556 111
adamc@556 112 crud
adamc@556 113 crud1
adamc@556 114 \end{verbatim}
adamc@556 115
adamc@556 116 The \texttt{database} line gives the database information string to pass to libpq. In this case, the string only says to connect to a local database named \texttt{test}.
adamc@556 117
adamc@556 118 The \texttt{sql} line asks for an SQL source file to be generated, giving the commands to run to create the tables and sequences that this application expects to find. After building this \texttt{.urp} file, the following commands could be used to initialize the database, assuming that the current UNIX user exists as a Postgres user with database creation privileges:
adamc@556 119
adamc@556 120 \begin{verbatim}
adamc@556 121 createdb test
adamc@556 122 psql -f crud1.sql test
adamc@556 123 \end{verbatim}
adamc@556 124
adamc@556 125 A blank line always separates the named directives from a list of modules to include in the project; if there are no named directives, a blank line must begin the file.
adamc@556 126
adamc@556 127 For each entry \texttt{M} in the module list, the file \texttt{M.urs} is included in the project if it exists, and the file \texttt{M.ur} must exist and is always included.
adamc@556 128
adamc@556 129 A few other named directives are supported. \texttt{prefix PREFIX} sets the prefix included before every URI within the generated application; the default is \texttt{/}. \texttt{exe FILENAME} sets the filename to which to write the output executable; the default for file \texttt{P.urp} is \texttt{P.exe}. \texttt{debug} saves some intermediate C files, which is mostly useful to help in debugging the compiler itself. \texttt{profile} generates an executable that may be used with gprof.
adamc@556 130
adamc@557 131 \subsection{Building an Application}
adamc@557 132
adamc@557 133 To compile project \texttt{P.urp}, simply run
adamc@557 134 \begin{verbatim}
adamc@557 135 urweb P
adamc@557 136 \end{verbatim}
adamc@557 137
adamc@557 138 To time how long the different compiler phases run, without generating an executable, run
adamc@557 139 \begin{verbatim}
adamc@557 140 urweb -timing P
adamc@557 141 \end{verbatim}
adamc@557 142
adamc@556 143
adamc@529 144 \section{Ur Syntax}
adamc@529 145
adamc@529 146 In this section, we describe the syntax of Ur, deferring to a later section discussion of most of the syntax specific to SQL and XML. The sole exceptions are the declaration forms for tables, sequences, and cookies.
adamc@524 147
adamc@524 148 \subsection{Lexical Conventions}
adamc@524 149
adamc@524 150 We give the Ur language definition in \LaTeX $\;$ math mode, since that is prettier than monospaced ASCII. The corresponding ASCII syntax can be read off directly. Here is the key for mapping math symbols to ASCII character sequences.
adamc@524 151
adamc@524 152 \begin{center}
adamc@524 153 \begin{tabular}{rl}
adamc@524 154 \textbf{\LaTeX} & \textbf{ASCII} \\
adamc@524 155 $\to$ & \cd{->} \\
adamc@524 156 $\times$ & \cd{*} \\
adamc@524 157 $\lambda$ & \cd{fn} \\
adamc@524 158 $\Rightarrow$ & \cd{=>} \\
adamc@529 159 $\neq$ & \cd{<>} \\
adamc@529 160 $\leq$ & \cd{<=} \\
adamc@529 161 $\geq$ & \cd{>=} \\
adamc@524 162 \\
adamc@524 163 $x$ & Normal textual identifier, not beginning with an uppercase letter \\
adamc@525 164 $X$ & Normal textual identifier, beginning with an uppercase letter \\
adamc@524 165 \end{tabular}
adamc@524 166 \end{center}
adamc@524 167
adamc@525 168 We often write syntax like $e^*$ to indicate zero or more copies of $e$, $e^+$ to indicate one or more copies, and $e,^*$ and $e,^+$ to indicate multiple copies separated by commas. Another separator may be used in place of a comma. The $e$ term may be surrounded by parentheses to indicate grouping; those parentheses should not be included in the actual ASCII.
adamc@524 169
adamc@526 170 We write $\ell$ for literals of the primitive types, for the most part following C conventions. There are $\mt{int}$, $\mt{float}$, and $\mt{string}$ literals.
adamc@526 171
adamc@527 172 This version of the manual doesn't include operator precedences; see \texttt{src/urweb.grm} for that.
adamc@527 173
adamc@552 174 \subsection{\label{core}Core Syntax}
adamc@524 175
adamc@524 176 \emph{Kinds} classify types and other compile-time-only entities. Each kind in the grammar is listed with a description of the sort of data it classifies.
adamc@524 177 $$\begin{array}{rrcll}
adamc@524 178 \textrm{Kinds} & \kappa &::=& \mt{Type} & \textrm{proper types} \\
adamc@525 179 &&& \mt{Unit} & \textrm{the trivial constructor} \\
adamc@525 180 &&& \mt{Name} & \textrm{field names} \\
adamc@525 181 &&& \kappa \to \kappa & \textrm{type-level functions} \\
adamc@525 182 &&& \{\kappa\} & \textrm{type-level records} \\
adamc@525 183 &&& (\kappa\times^+) & \textrm{type-level tuples} \\
adamc@529 184 &&& \_\_ & \textrm{wildcard} \\
adamc@525 185 &&& (\kappa) & \textrm{explicit precedence} \\
adamc@524 186 \end{array}$$
adamc@524 187
adamc@524 188 Ur supports several different notions of functions that take types as arguments. These arguments can be either implicit, causing them to be inferred at use sites; or explicit, forcing them to be specified manually at use sites. There is a common explicitness annotation convention applied at the definitions of and in the types of such functions.
adamc@524 189 $$\begin{array}{rrcll}
adamc@524 190 \textrm{Explicitness} & ? &::=& :: & \textrm{explicit} \\
adamc@525 191 &&& \; ::: & \textrm{implicit}
adamc@524 192 \end{array}$$
adamc@524 193
adamc@524 194 \emph{Constructors} are the main class of compile-time-only data. They include proper types and are classified by kinds.
adamc@524 195 $$\begin{array}{rrcll}
adamc@524 196 \textrm{Constructors} & c, \tau &::=& (c) :: \kappa & \textrm{kind annotation} \\
adamc@530 197 &&& \hat{x} & \textrm{constructor variable} \\
adamc@524 198 \\
adamc@525 199 &&& \tau \to \tau & \textrm{function type} \\
adamc@525 200 &&& x \; ? \; \kappa \to \tau & \textrm{polymorphic function type} \\
adamc@525 201 &&& \$ c & \textrm{record type} \\
adamc@524 202 \\
adamc@525 203 &&& c \; c & \textrm{type-level function application} \\
adamc@530 204 &&& \lambda x \; :: \; \kappa \Rightarrow c & \textrm{type-level function abstraction} \\
adamc@524 205 \\
adamc@525 206 &&& () & \textrm{type-level unit} \\
adamc@525 207 &&& \#X & \textrm{field name} \\
adamc@524 208 \\
adamc@525 209 &&& [(c = c)^*] & \textrm{known-length type-level record} \\
adamc@525 210 &&& c \rc c & \textrm{type-level record concatenation} \\
adamc@525 211 &&& \mt{fold} & \textrm{type-level record fold} \\
adamc@524 212 \\
adamc@525 213 &&& (c^+) & \textrm{type-level tuple} \\
adamc@525 214 &&& c.n & \textrm{type-level tuple projection ($n \in \mathbb N^+$)} \\
adamc@524 215 \\
adamc@525 216 &&& \lambda [c \sim c] \Rightarrow c & \textrm{guarded constructor} \\
adamc@524 217 \\
adamc@529 218 &&& \_ :: \kappa & \textrm{wildcard} \\
adamc@525 219 &&& (c) & \textrm{explicit precedence} \\
adamc@530 220 \\
adamc@530 221 \textrm{Qualified uncapitalized variables} & \hat{x} &::=& x & \textrm{not from a module} \\
adamc@530 222 &&& M.x & \textrm{projection from a module} \\
adamc@525 223 \end{array}$$
adamc@525 224
adamc@525 225 Modules of the module system are described by \emph{signatures}.
adamc@525 226 $$\begin{array}{rrcll}
adamc@525 227 \textrm{Signatures} & S &::=& \mt{sig} \; s^* \; \mt{end} & \textrm{constant} \\
adamc@525 228 &&& X & \textrm{variable} \\
adamc@525 229 &&& \mt{functor}(X : S) : S & \textrm{functor} \\
adamc@529 230 &&& S \; \mt{where} \; \mt{con} \; x = c & \textrm{concretizing an abstract constructor} \\
adamc@525 231 &&& M.X & \textrm{projection from a module} \\
adamc@525 232 \\
adamc@525 233 \textrm{Signature items} & s &::=& \mt{con} \; x :: \kappa & \textrm{abstract constructor} \\
adamc@525 234 &&& \mt{con} \; x :: \kappa = c & \textrm{concrete constructor} \\
adamc@528 235 &&& \mt{datatype} \; x \; x^* = dc\mid^+ & \textrm{algebraic datatype definition} \\
adamc@529 236 &&& \mt{datatype} \; x = \mt{datatype} \; M.x & \textrm{algebraic datatype import} \\
adamc@525 237 &&& \mt{val} \; x : \tau & \textrm{value} \\
adamc@525 238 &&& \mt{structure} \; X : S & \textrm{sub-module} \\
adamc@525 239 &&& \mt{signature} \; X = S & \textrm{sub-signature} \\
adamc@525 240 &&& \mt{include} \; S & \textrm{signature inclusion} \\
adamc@525 241 &&& \mt{constraint} \; c \sim c & \textrm{record disjointness constraint} \\
adamc@525 242 &&& \mt{class} \; x & \textrm{abstract type class} \\
adamc@525 243 &&& \mt{class} \; x = c & \textrm{concrete type class} \\
adamc@525 244 \\
adamc@525 245 \textrm{Datatype constructors} & dc &::=& X & \textrm{nullary constructor} \\
adamc@525 246 &&& X \; \mt{of} \; \tau & \textrm{unary constructor} \\
adamc@524 247 \end{array}$$
adamc@524 248
adamc@526 249 \emph{Patterns} are used to describe structural conditions on expressions, such that expressions may be tested against patterns, generating assignments to pattern variables if successful.
adamc@526 250 $$\begin{array}{rrcll}
adamc@526 251 \textrm{Patterns} & p &::=& \_ & \textrm{wildcard} \\
adamc@526 252 &&& x & \textrm{variable} \\
adamc@526 253 &&& \ell & \textrm{constant} \\
adamc@526 254 &&& \hat{X} & \textrm{nullary constructor} \\
adamc@526 255 &&& \hat{X} \; p & \textrm{unary constructor} \\
adamc@526 256 &&& \{(x = p,)^*\} & \textrm{rigid record pattern} \\
adamc@526 257 &&& \{(x = p,)^+, \ldots\} & \textrm{flexible record pattern} \\
adamc@527 258 &&& (p) & \textrm{explicit precedence} \\
adamc@526 259 \\
adamc@529 260 \textrm{Qualified capitalized variables} & \hat{X} &::=& X & \textrm{not from a module} \\
adamc@526 261 &&& M.X & \textrm{projection from a module} \\
adamc@526 262 \end{array}$$
adamc@526 263
adamc@527 264 \emph{Expressions} are the main run-time entities, corresponding to both ``expressions'' and ``statements'' in mainstream imperative languages.
adamc@527 265 $$\begin{array}{rrcll}
adamc@527 266 \textrm{Expressions} & e &::=& e : \tau & \textrm{type annotation} \\
adamc@529 267 &&& \hat{x} & \textrm{variable} \\
adamc@529 268 &&& \hat{X} & \textrm{datatype constructor} \\
adamc@527 269 &&& \ell & \textrm{constant} \\
adamc@527 270 \\
adamc@527 271 &&& e \; e & \textrm{function application} \\
adamc@527 272 &&& \lambda x : \tau \Rightarrow e & \textrm{function abstraction} \\
adamc@527 273 &&& e [c] & \textrm{polymorphic function application} \\
adamc@527 274 &&& \lambda x \; ? \; \kappa \Rightarrow e & \textrm{polymorphic function abstraction} \\
adamc@527 275 \\
adamc@527 276 &&& \{(c = e,)^*\} & \textrm{known-length record} \\
adamc@527 277 &&& e.c & \textrm{record field projection} \\
adamc@527 278 &&& e \rc e & \textrm{record concatenation} \\
adamc@527 279 &&& e \rcut c & \textrm{removal of a single record field} \\
adamc@527 280 &&& e \rcutM c & \textrm{removal of multiple record fields} \\
adamc@527 281 &&& \mt{fold} & \textrm{fold over fields of a type-level record} \\
adamc@527 282 \\
adamc@527 283 &&& \mt{let} \; ed^* \; \mt{in} \; e \; \mt{end} & \textrm{local definitions} \\
adamc@527 284 \\
adamc@527 285 &&& \mt{case} \; e \; \mt{of} \; (p \Rightarrow e|)^+ & \textrm{pattern matching} \\
adamc@527 286 \\
adamc@527 287 &&& \lambda [c \sim c] \Rightarrow e & \textrm{guarded expression} \\
adamc@527 288 \\
adamc@527 289 &&& \_ & \textrm{wildcard} \\
adamc@527 290 &&& (e) & \textrm{explicit precedence} \\
adamc@527 291 \\
adamc@527 292 \textrm{Local declarations} & ed &::=& \cd{val} \; x : \tau = e & \textrm{non-recursive value} \\
adamc@527 293 &&& \cd{val} \; \cd{rec} \; (x : \tau = e \; \cd{and})^+ & \textrm{mutually-recursive values} \\
adamc@527 294 \end{array}$$
adamc@527 295
adamc@528 296 \emph{Declarations} primarily bring new symbols into context.
adamc@528 297 $$\begin{array}{rrcll}
adamc@528 298 \textrm{Declarations} & d &::=& \mt{con} \; x :: \kappa = c & \textrm{constructor synonym} \\
adamc@528 299 &&& \mt{datatype} \; x \; x^* = dc\mid^+ & \textrm{algebraic datatype definition} \\
adamc@529 300 &&& \mt{datatype} \; x = \mt{datatype} \; M.x & \textrm{algebraic datatype import} \\
adamc@528 301 &&& \mt{val} \; x : \tau = e & \textrm{value} \\
adamc@528 302 &&& \mt{val} \; \cd{rec} \; (x : \tau = e \; \mt{and})^+ & \textrm{mutually-recursive values} \\
adamc@528 303 &&& \mt{structure} \; X : S = M & \textrm{module definition} \\
adamc@528 304 &&& \mt{signature} \; X = S & \textrm{signature definition} \\
adamc@528 305 &&& \mt{open} \; M & \textrm{module inclusion} \\
adamc@528 306 &&& \mt{constraint} \; c \sim c & \textrm{record disjointness constraint} \\
adamc@528 307 &&& \mt{open} \; \mt{constraints} \; M & \textrm{inclusion of just the constraints from a module} \\
adamc@528 308 &&& \mt{table} \; x : c & \textrm{SQL table} \\
adamc@528 309 &&& \mt{sequence} \; x & \textrm{SQL sequence} \\
adamc@535 310 &&& \mt{cookie} \; x : \tau & \textrm{HTTP cookie} \\
adamc@528 311 &&& \mt{class} \; x = c & \textrm{concrete type class} \\
adamc@528 312 \\
adamc@529 313 \textrm{Modules} & M &::=& \mt{struct} \; d^* \; \mt{end} & \textrm{constant} \\
adamc@529 314 &&& X & \textrm{variable} \\
adamc@529 315 &&& M.X & \textrm{projection} \\
adamc@529 316 &&& M(M) & \textrm{functor application} \\
adamc@529 317 &&& \mt{functor}(X : S) : S = M & \textrm{functor abstraction} \\
adamc@528 318 \end{array}$$
adamc@528 319
adamc@528 320 There are two kinds of Ur files. A file named $M\texttt{.ur}$ is an \emph{implementation file}, and it should contain a sequence of declarations $d^*$. A file named $M\texttt{.urs}$ is an \emph{interface file}; it must always have a matching $M\texttt{.ur}$ and should contain a sequence of signature items $s^*$. When both files are present, the overall effect is the same as a monolithic declaration $\mt{structure} \; M : \mt{sig} \; s^* \; \mt{end} = \mt{struct} \; d^* \; \mt{end}$. When no interface file is included, the overall effect is similar, with a signature for module $M$ being inferred rather than just checked against an interface.
adamc@527 321
adamc@529 322 \subsection{Shorthands}
adamc@529 323
adamc@529 324 There are a variety of derived syntactic forms that elaborate into the core syntax from the last subsection. We will present the additional forms roughly following the order in which we presented the constructs that they elaborate into.
adamc@529 325
adamc@529 326 In many contexts where record fields are expected, like in a projection $e.c$, a constant field may be written as simply $X$, rather than $\#X$.
adamc@529 327
adamc@529 328 A record type may be written $\{(c = c,)^*\}$, which elaborates to $\$[(c = c,)^*]$.
adamc@529 329
adamc@533 330 The notation $[c_1, \ldots, c_n]$ is shorthand for $[c_1 = (), \ldots, c_n = ()]$.
adamc@533 331
adamc@529 332 A tuple type $(\tau_1, \ldots, \tau_n)$ expands to a record type $\{1 = \tau_1, \ldots, n = \tau_n\}$, with natural numbers as field names. A tuple pattern $(p_1, \ldots, p_n)$ expands to a rigid record pattern $\{1 = p_1, \ldots, n = p_n\}$. Positive natural numbers may be used in most places where field names would be allowed.
adamc@529 333
adamc@529 334 In general, several adjacent $\lambda$ forms may be combined into one, and kind and type annotations may be omitted, in which case they are implicitly included as wildcards. More formally, for constructor-level abstractions, we can define a new non-terminal $b ::= x \mid (x :: \kappa) \mid [c \sim c]$ and allow composite abstractions of the form $\lambda b^+ \Rightarrow c$, elaborating into the obvious sequence of one core $\lambda$ per element of $b^+$.
adamc@529 335
adamc@529 336 For any signature item or declaration that defines some entity to be equal to $A$ with classification annotation $B$ (e.g., $\mt{val} \; x : B = A$), $B$ and the preceding colon (or similar punctuation) may be omitted, in which case it is filled in as a wildcard.
adamc@529 337
adamc@529 338 A signature item or declaration $\mt{type} \; x$ or $\mt{type} \; x = \tau$ is elaborated into $\mt{con} \; x :: \mt{Type}$ or $\mt{con} \; x :: \mt{Type} = \tau$, respectively.
adamc@529 339
adamc@529 340 A signature item or declaration $\mt{class} \; x = \lambda y :: \mt{Type} \Rightarrow c$ may be abbreviated $\mt{class} \; x \; y = c$.
adamc@529 341
adamc@529 342 Handling of implicit and explicit constructor arguments may be tweaked with some prefixes to variable references. An expression $@x$ is a version of $x$ where all implicit constructor arguments have been made explicit. An expression $@@x$ achieves the same effect, additionally halting automatic resolution of type class instances. The same syntax works for variables projected out of modules and for capitalized variables (datatype constructors).
adamc@529 343
adamc@529 344 At the expression level, an analogue is available of the composite $\lambda$ form for constructors. We define the language of binders as $b ::= x \mid (x : \tau) \mid (x \; ? \; \kappa) \mid [c \sim c]$. A lone variable $x$ as a binder stands for an expression variable of unspecified type.
adamc@529 345
adamc@529 346 A $\mt{val}$ or $\mt{val} \; \mt{rec}$ declaration may include expression binders before the equal sign, following the binder grammar from the last paragraph. Such declarations are elaborated into versions that add additional $\lambda$s to the fronts of the righthand sides, as appropriate. The keyword $\mt{fun}$ is a synonym for $\mt{val} \; \mt{rec}$.
adamc@529 347
adamc@529 348 A signature item $\mt{functor} \; X_1 \; (X_2 : S_1) : S_2$ is elaborated into $\mt{structure} \; X_1 : \mt{functor}(X_2 : S_1) : S_2$. A declaration $\mt{functor} \; X_1 \; (X_2 : S_1) : S_2 = M$ is elaborated into $\mt{structure} \; X_1 : \mt{functor}(X_2 : S_1) : S_2 = \mt{functor}(X_2 : S_1) : S_2 = M$.
adamc@529 349
adamc@529 350 A declaration $\mt{table} \; x : \{(c = c,)^*\}$ is elaborated into $\mt{table} \; x : [(c = c,)^*]$
adamc@529 351
adamc@529 352 The syntax $\mt{where} \; \mt{type}$ is an alternate form of $\mt{where} \; \mt{con}$.
adamc@529 353
adamc@529 354 The syntax $\mt{if} \; e \; \mt{then} \; e_1 \; \mt{else} \; e_2$ expands to $\mt{case} \; e \; \mt{of} \; \mt{Basis}.\mt{True} \Rightarrow e_1 \mid \mt{Basis}.\mt{False} \Rightarrow e_2$.
adamc@529 355
adamc@529 356 There are infix operator syntaxes for a number of functions defined in the $\mt{Basis}$ module. There is $=$ for $\mt{eq}$, $\neq$ for $\mt{neq}$, $-$ for $\mt{neg}$ (as a prefix operator) and $\mt{minus}$, $+$ for $\mt{plus}$, $\times$ for $\mt{times}$, $/$ for $\mt{div}$, $\%$ for $\mt{mod}$, $<$ for $\mt{lt}$, $\leq$ for $\mt{le}$, $>$ for $\mt{gt}$, and $\geq$ for $\mt{ge}$.
adamc@529 357
adamc@529 358 A signature item $\mt{table} \; x : c$ is shorthand for $\mt{val} \; x : \mt{Basis}.\mt{sql\_table} \; c$. $\mt{sequence} \; x$ is short for $\mt{val} \; x : \mt{Basis}.\mt{sql\_sequence}$, and $\mt{cookie} \; x : \tau$ is shorthand for $\mt{val} \; x : \mt{Basis}.\mt{http\_cookie} \; \tau$.
adamc@529 359
adamc@530 360
adamc@530 361 \section{Static Semantics}
adamc@530 362
adamc@530 363 In this section, we give a declarative presentation of Ur's typing rules and related judgments. Inference is the subject of the next section; here, we assume that an oracle has filled in all wildcards with concrete values.
adamc@530 364
adamc@530 365 Since there is significant mutual recursion among the judgments, we introduce them all before beginning to give rules. We use the same variety of contexts throughout this section, implicitly introducing new sorts of context entries as needed.
adamc@530 366 \begin{itemize}
adamc@530 367 \item $\Gamma \vdash c :: \kappa$ assigns a kind to a constructor in a context.
adamc@530 368 \item $\Gamma \vdash c \sim c$ proves the disjointness of two record constructors; that is, that they share no field names. We overload the judgment to apply to pairs of field names as well.
adamc@531 369 \item $\Gamma \vdash c \hookrightarrow C$ proves that record constructor $c$ decomposes into set $C$ of field names and record constructors.
adamc@530 370 \item $\Gamma \vdash c \equiv c$ proves the computational equivalence of two constructors. This is often called a \emph{definitional equality} in the world of type theory.
adamc@530 371 \item $\Gamma \vdash e : \tau$ is a standard typing judgment.
adamc@534 372 \item $\Gamma \vdash p \leadsto \Gamma; \tau$ combines typing of patterns with calculation of which new variables they bind.
adamc@537 373 \item $\Gamma \vdash d \leadsto \Gamma$ expresses how a declaration modifies a context. We overload this judgment to apply to sequences of declarations, as well as to signature items and sequences of signature items.
adamc@537 374 \item $\Gamma \vdash S \equiv S$ is the signature equivalence judgment.
adamc@536 375 \item $\Gamma \vdash S \leq S$ is the signature compatibility judgment. We write $\Gamma \vdash S$ as shorthand for $\Gamma \vdash S \leq S$.
adamc@530 376 \item $\Gamma \vdash M : S$ is the module signature checking judgment.
adamc@537 377 \item $\mt{proj}(M, \overline{s}, V)$ is a partial function for projecting a signature item from $\overline{s}$, given the module $M$ that we project from. $V$ may be $\mt{con} \; x$, $\mt{datatype} \; x$, $\mt{val} \; x$, $\mt{signature} \; X$, or $\mt{structure} \; X$. The parameter $M$ is needed because the projected signature item may refer to other items from $\overline{s}$.
adamc@539 378 \item $\mt{selfify}(M, \overline{s})$ adds information to signature items $\overline{s}$ to reflect the fact that we are concerned with the particular module $M$. This function is overloaded to work over individual signature items as well.
adamc@530 379 \end{itemize}
adamc@530 380
adamc@530 381 \subsection{Kinding}
adamc@530 382
adamc@530 383 $$\infer{\Gamma \vdash (c) :: \kappa :: \kappa}{
adamc@530 384 \Gamma \vdash c :: \kappa
adamc@530 385 }
adamc@530 386 \quad \infer{\Gamma \vdash x :: \kappa}{
adamc@530 387 x :: \kappa \in \Gamma
adamc@530 388 }
adamc@530 389 \quad \infer{\Gamma \vdash x :: \kappa}{
adamc@530 390 x :: \kappa = c \in \Gamma
adamc@530 391 }$$
adamc@530 392
adamc@530 393 $$\infer{\Gamma \vdash M.x :: \kappa}{
adamc@537 394 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 395 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = \kappa
adamc@530 396 }
adamc@530 397 \quad \infer{\Gamma \vdash M.x :: \kappa}{
adamc@537 398 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 399 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = (\kappa, c)
adamc@530 400 }$$
adamc@530 401
adamc@530 402 $$\infer{\Gamma \vdash \tau_1 \to \tau_2 :: \mt{Type}}{
adamc@530 403 \Gamma \vdash \tau_1 :: \mt{Type}
adamc@530 404 & \Gamma \vdash \tau_2 :: \mt{Type}
adamc@530 405 }
adamc@530 406 \quad \infer{\Gamma \vdash x \; ? \: \kappa \to \tau :: \mt{Type}}{
adamc@530 407 \Gamma, x :: \kappa \vdash \tau :: \mt{Type}
adamc@530 408 }
adamc@530 409 \quad \infer{\Gamma \vdash \$c :: \mt{Type}}{
adamc@530 410 \Gamma \vdash c :: \{\mt{Type}\}
adamc@530 411 }$$
adamc@530 412
adamc@530 413 $$\infer{\Gamma \vdash c_1 \; c_2 :: \kappa_2}{
adamc@530 414 \Gamma \vdash c_1 :: \kappa_1 \to \kappa_2
adamc@530 415 & \Gamma \vdash c_2 :: \kappa_1
adamc@530 416 }
adamc@530 417 \quad \infer{\Gamma \vdash \lambda x \; :: \; \kappa_1 \Rightarrow c :: \kappa_1 \to \kappa_2}{
adamc@530 418 \Gamma, x :: \kappa_1 \vdash c :: \kappa_2
adamc@530 419 }$$
adamc@530 420
adamc@530 421 $$\infer{\Gamma \vdash () :: \mt{Unit}}{}
adamc@530 422 \quad \infer{\Gamma \vdash \#X :: \mt{Name}}{}$$
adamc@530 423
adamc@530 424 $$\infer{\Gamma \vdash [\overline{c_i = c'_i}] :: \{\kappa\}}{
adamc@530 425 \forall i: \Gamma \vdash c_i : \mt{Name}
adamc@530 426 & \Gamma \vdash c'_i :: \kappa
adamc@530 427 & \forall i \neq j: \Gamma \vdash c_i \sim c_j
adamc@530 428 }
adamc@530 429 \quad \infer{\Gamma \vdash c_1 \rc c_2 :: \{\kappa\}}{
adamc@530 430 \Gamma \vdash c_1 :: \{\kappa\}
adamc@530 431 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@530 432 & \Gamma \vdash c_1 \sim c_2
adamc@530 433 }$$
adamc@530 434
adamc@530 435 $$\infer{\Gamma \vdash \mt{fold} :: ((\mt{Name} \to \kappa_1 \to \kappa_2 \to \kappa_2) \to \kappa_2 \to \{\kappa_1\} \to \kappa_2}{}$$
adamc@530 436
adamc@530 437 $$\infer{\Gamma \vdash (\overline c) :: (k_1 \times \ldots \times k_n)}{
adamc@530 438 \forall i: \Gamma \vdash c_i :: k_i
adamc@530 439 }
adamc@530 440 \quad \infer{\Gamma \vdash c.i :: k_i}{
adamc@530 441 \Gamma \vdash c :: (k_1 \times \ldots \times k_n)
adamc@530 442 }$$
adamc@530 443
adamc@530 444 $$\infer{\Gamma \vdash \lambda [c_1 \sim c_2] \Rightarrow c :: \kappa}{
adamc@530 445 \Gamma \vdash c_1 :: \{\kappa'\}
adamc@530 446 & \Gamma \vdash c_2 :: \{\kappa'\}
adamc@530 447 & \Gamma, c_1 \sim c_2 \vdash c :: \kappa
adamc@530 448 }$$
adamc@530 449
adamc@531 450 \subsection{Record Disjointness}
adamc@531 451
adamc@531 452 We will use a keyword $\mt{map}$ as a shorthand, such that, for $f$ of kind $\kappa \to \kappa'$, $\mt{map} \; f$ stands for $\mt{fold} \; (\lambda (x_1 :: \mt{Name}) (x_2 :: \kappa) (x_3 :: \{\kappa'\}) \Rightarrow [x_1 = f \; x_2] \rc x_3) \; []$.
adamc@531 453
adamc@531 454 $$\infer{\Gamma \vdash c_1 \sim c_2}{
adamc@531 455 \Gamma \vdash c_1 \hookrightarrow c'_1
adamc@531 456 & \Gamma \vdash c_2 \hookrightarrow c'_2
adamc@531 457 & \forall c''_1 \in c'_1, c''_2 \in c'_2: \Gamma \vdash c''_1 \sim c''_2
adamc@531 458 }
adamc@531 459 \quad \infer{\Gamma \vdash X \sim X'}{
adamc@531 460 X \neq X'
adamc@531 461 }$$
adamc@531 462
adamc@531 463 $$\infer{\Gamma \vdash c_1 \sim c_2}{
adamc@531 464 c'_1 \sim c'_2 \in \Gamma
adamc@531 465 & \Gamma \vdash c'_1 \hookrightarrow c''_1
adamc@531 466 & \Gamma \vdash c'_2 \hookrightarrow c''_2
adamc@531 467 & c_1 \in c''_1
adamc@531 468 & c_2 \in c''_2
adamc@531 469 }$$
adamc@531 470
adamc@531 471 $$\infer{\Gamma \vdash c \hookrightarrow \{c\}}{}
adamc@531 472 \quad \infer{\Gamma \vdash [\overline{c = c'}] \hookrightarrow \{\overline{c}\}}{}
adamc@531 473 \quad \infer{\Gamma \vdash c_1 \rc c_2 \hookrightarrow C_1 \cup C_2}{
adamc@531 474 \Gamma \vdash c_1 \hookrightarrow C_1
adamc@531 475 & \Gamma \vdash c_2 \hookrightarrow C_2
adamc@531 476 }
adamc@531 477 \quad \infer{\Gamma \vdash c \hookrightarrow C}{
adamc@531 478 \Gamma \vdash c \equiv c'
adamc@531 479 & \Gamma \vdash c' \hookrightarrow C
adamc@531 480 }
adamc@531 481 \quad \infer{\Gamma \vdash \mt{map} \; f \; c \hookrightarrow C}{
adamc@531 482 \Gamma \vdash c \hookrightarrow C
adamc@531 483 }$$
adamc@531 484
adamc@541 485 \subsection{\label{definitional}Definitional Equality}
adamc@532 486
adamc@532 487 We use $\mathcal C$ to stand for a one-hole context that, when filled, yields a constructor. The notation $\mathcal C[c]$ plugs $c$ into $\mathcal C$. We omit the standard definition of one-hole contexts. We write $[x \mapsto c_1]c_2$ for capture-avoiding substitution of $c_1$ for $x$ in $c_2$.
adamc@532 488
adamc@532 489 $$\infer{\Gamma \vdash c \equiv c}{}
adamc@532 490 \quad \infer{\Gamma \vdash c_1 \equiv c_2}{
adamc@532 491 \Gamma \vdash c_2 \equiv c_1
adamc@532 492 }
adamc@532 493 \quad \infer{\Gamma \vdash c_1 \equiv c_3}{
adamc@532 494 \Gamma \vdash c_1 \equiv c_2
adamc@532 495 & \Gamma \vdash c_2 \equiv c_3
adamc@532 496 }
adamc@532 497 \quad \infer{\Gamma \vdash \mathcal C[c_1] \equiv \mathcal C[c_2]}{
adamc@532 498 \Gamma \vdash c_1 \equiv c_2
adamc@532 499 }$$
adamc@532 500
adamc@532 501 $$\infer{\Gamma \vdash x \equiv c}{
adamc@532 502 x :: \kappa = c \in \Gamma
adamc@532 503 }
adamc@532 504 \quad \infer{\Gamma \vdash M.x \equiv c}{
adamc@537 505 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 506 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = (\kappa, c)
adamc@532 507 }
adamc@532 508 \quad \infer{\Gamma \vdash (\overline c).i \equiv c_i}{}$$
adamc@532 509
adamc@532 510 $$\infer{\Gamma \vdash (\lambda x :: \kappa \Rightarrow c) \; c' \equiv [x \mapsto c'] c}{}
adamc@532 511 \quad \infer{\Gamma \vdash c_1 \rc c_2 \equiv c_2 \rc c_1}{}
adamc@532 512 \quad \infer{\Gamma \vdash c_1 \rc (c_2 \rc c_3) \equiv (c_1 \rc c_2) \rc c_3}{}$$
adamc@532 513
adamc@532 514 $$\infer{\Gamma \vdash [] \rc c \equiv c}{}
adamc@532 515 \quad \infer{\Gamma \vdash [\overline{c_1 = c'_1}] \rc [\overline{c_2 = c'_2}] \equiv [\overline{c_1 = c'_1}, \overline{c_2 = c'_2}]}{}$$
adamc@532 516
adamc@532 517 $$\infer{\Gamma \vdash \lambda [c_1 \sim c_2] \Rightarrow c \equiv c}{
adamc@532 518 \Gamma \vdash c_1 \sim c_2
adamc@532 519 }
adamc@532 520 \quad \infer{\Gamma \vdash \mt{fold} \; f \; i \; [] \equiv i}{}
adamc@532 521 \quad \infer{\Gamma \vdash \mt{fold} \; f \; i \; ([c_1 = c_2] \rc c) \equiv f \; c_1 \; c_2 \; (\mt{fold} \; f \; i \; c)}{}$$
adamc@532 522
adamc@532 523 $$\infer{\Gamma \vdash \mt{map} \; (\lambda x \Rightarrow x) \; c \equiv c}{}
adamc@532 524 \quad \infer{\Gamma \vdash \mt{fold} \; f \; i \; (\mt{map} \; f' \; c)
adamc@532 525 \equiv \mt{fold} \; (\lambda (x_1 :: \mt{Name}) (x_2 :: \kappa) \Rightarrow f \; x_1 \; (f' \; x_2)) \; i \; c}{}$$
adamc@532 526
adamc@532 527 $$\infer{\Gamma \vdash \mt{map} \; f \; (c_1 \rc c_2) \equiv \mt{map} \; f \; c_1 \rc \mt{map} \; f \; c_2}{}$$
adamc@531 528
adamc@534 529 \subsection{Expression Typing}
adamc@533 530
adamc@533 531 We assume the existence of a function $T$ assigning types to literal constants. It maps integer constants to $\mt{Basis}.\mt{int}$, float constants to $\mt{Basis}.\mt{float}$, and string constants to $\mt{Basis}.\mt{string}$.
adamc@533 532
adamc@533 533 We also refer to a function $\mathcal I$, such that $\mathcal I(\tau)$ ``uses an oracle'' to instantiate all constructor function arguments at the beginning of $\tau$ that are marked implicit; i.e., replace $x_1 ::: \kappa_1 \to \ldots \to x_n ::: \kappa_n \to \tau$ with $[x_1 \mapsto c_1]\ldots[x_n \mapsto c_n]\tau$, where the $c_i$s are inferred and $\tau$ does not start like $x ::: \kappa \to \tau'$.
adamc@533 534
adamc@533 535 $$\infer{\Gamma \vdash e : \tau : \tau}{
adamc@533 536 \Gamma \vdash e : \tau
adamc@533 537 }
adamc@533 538 \quad \infer{\Gamma \vdash e : \tau}{
adamc@533 539 \Gamma \vdash e : \tau'
adamc@533 540 & \Gamma \vdash \tau' \equiv \tau
adamc@533 541 }
adamc@533 542 \quad \infer{\Gamma \vdash \ell : T(\ell)}{}$$
adamc@533 543
adamc@533 544 $$\infer{\Gamma \vdash x : \mathcal I(\tau)}{
adamc@533 545 x : \tau \in \Gamma
adamc@533 546 }
adamc@533 547 \quad \infer{\Gamma \vdash M.x : \mathcal I(\tau)}{
adamc@537 548 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 549 & \mt{proj}(M, \overline{s}, \mt{val} \; x) = \tau
adamc@533 550 }
adamc@533 551 \quad \infer{\Gamma \vdash X : \mathcal I(\tau)}{
adamc@533 552 X : \tau \in \Gamma
adamc@533 553 }
adamc@533 554 \quad \infer{\Gamma \vdash M.X : \mathcal I(\tau)}{
adamc@537 555 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 556 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \tau
adamc@533 557 }$$
adamc@533 558
adamc@533 559 $$\infer{\Gamma \vdash e_1 \; e_2 : \tau_2}{
adamc@533 560 \Gamma \vdash e_1 : \tau_1 \to \tau_2
adamc@533 561 & \Gamma \vdash e_2 : \tau_1
adamc@533 562 }
adamc@533 563 \quad \infer{\Gamma \vdash \lambda x : \tau_1 \Rightarrow e : \tau_1 \to \tau_2}{
adamc@533 564 \Gamma, x : \tau_1 \vdash e : \tau_2
adamc@533 565 }$$
adamc@533 566
adamc@533 567 $$\infer{\Gamma \vdash e [c] : [x \mapsto c]\tau}{
adamc@533 568 \Gamma \vdash e : x :: \kappa \to \tau
adamc@533 569 & \Gamma \vdash c :: \kappa
adamc@533 570 }
adamc@533 571 \quad \infer{\Gamma \vdash \lambda x \; ? \; \kappa \Rightarrow e : x \; ? \; \kappa \to \tau}{
adamc@533 572 \Gamma, x :: \kappa \vdash e : \tau
adamc@533 573 }$$
adamc@533 574
adamc@533 575 $$\infer{\Gamma \vdash \{\overline{c = e}\} : \{\overline{c : \tau}\}}{
adamc@533 576 \forall i: \Gamma \vdash c_i :: \mt{Name}
adamc@533 577 & \Gamma \vdash e_i : \tau_i
adamc@533 578 & \forall i \neq j: \Gamma \vdash c_i \sim c_j
adamc@533 579 }
adamc@533 580 \quad \infer{\Gamma \vdash e.c : \tau}{
adamc@533 581 \Gamma \vdash e : \$([c = \tau] \rc c')
adamc@533 582 }
adamc@533 583 \quad \infer{\Gamma \vdash e_1 \rc e_2 : \$(c_1 \rc c_2)}{
adamc@533 584 \Gamma \vdash e_1 : \$c_1
adamc@533 585 & \Gamma \vdash e_2 : \$c_2
adamc@533 586 }$$
adamc@533 587
adamc@533 588 $$\infer{\Gamma \vdash e \rcut c : \$c'}{
adamc@533 589 \Gamma \vdash e : \$([c = \tau] \rc c')
adamc@533 590 }
adamc@533 591 \quad \infer{\Gamma \vdash e \rcutM c : \$c'}{
adamc@533 592 \Gamma \vdash e : \$(c \rc c')
adamc@533 593 }$$
adamc@533 594
adamc@533 595 $$\infer{\Gamma \vdash \mt{fold} : \begin{array}{c}
adamc@533 596 x_1 :: (\{\kappa\} \to \tau)
adamc@533 597 \to (x_2 :: \mt{Name} \to x_3 :: \kappa \to x_4 :: \{\kappa\} \to \lambda [[x_2] \sim x_4]
adamc@533 598 \Rightarrow x_1 \; x_4 \to x_1 \; ([x_2 = x_3] \rc x_4)) \\
adamc@533 599 \to x_1 \; [] \to x_5 :: \{\kappa\} \to x_1 \; x_5
adamc@533 600 \end{array}}{}$$
adamc@533 601
adamc@533 602 $$\infer{\Gamma \vdash \mt{let} \; \overline{ed} \; \mt{in} \; e \; \mt{end} : \tau}{
adamc@533 603 \Gamma \vdash \overline{ed} \leadsto \Gamma'
adamc@533 604 & \Gamma' \vdash e : \tau
adamc@533 605 }
adamc@533 606 \quad \infer{\Gamma \vdash \mt{case} \; e \; \mt{of} \; \overline{p \Rightarrow e} : \tau}{
adamc@533 607 \forall i: \Gamma \vdash p_i \leadsto \Gamma_i, \tau'
adamc@533 608 & \Gamma_i \vdash e_i : \tau
adamc@533 609 }$$
adamc@533 610
adamc@533 611 $$\infer{\Gamma \vdash [c_1 \sim c_2] \Rightarrow e : [c_1 \sim c_2] \Rightarrow \tau}{
adamc@533 612 \Gamma \vdash c_1 :: \{\kappa\}
adamc@533 613 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@533 614 & \Gamma, c_1 \sim c_2 \vdash e : \tau
adamc@533 615 }$$
adamc@533 616
adamc@534 617 \subsection{Pattern Typing}
adamc@534 618
adamc@534 619 $$\infer{\Gamma \vdash \_ \leadsto \Gamma; \tau}{}
adamc@534 620 \quad \infer{\Gamma \vdash x \leadsto \Gamma, x : \tau; \tau}{}
adamc@534 621 \quad \infer{\Gamma \vdash \ell \leadsto \Gamma; T(\ell)}{}$$
adamc@534 622
adamc@534 623 $$\infer{\Gamma \vdash X \leadsto \Gamma; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@534 624 X : \overline{x ::: \mt{Type}} \to \tau \in \Gamma
adamc@534 625 & \textrm{$\tau$ not a function type}
adamc@534 626 }
adamc@534 627 \quad \infer{\Gamma \vdash X \; p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@534 628 X : \overline{x ::: \mt{Type}} \to \tau'' \to \tau \in \Gamma
adamc@534 629 & \Gamma \vdash p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau''
adamc@534 630 }$$
adamc@534 631
adamc@534 632 $$\infer{\Gamma \vdash M.X \leadsto \Gamma; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@537 633 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 634 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \overline{x ::: \mt{Type}} \to \tau
adamc@534 635 & \textrm{$\tau$ not a function type}
adamc@534 636 }$$
adamc@534 637
adamc@534 638 $$\infer{\Gamma \vdash M.X \; p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@537 639 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 640 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \overline{x ::: \mt{Type}} \to \tau'' \to \tau
adamc@534 641 & \Gamma \vdash p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau''
adamc@534 642 }$$
adamc@534 643
adamc@534 644 $$\infer{\Gamma \vdash \{\overline{x = p}\} \leadsto \Gamma_n; \{\overline{x = \tau}\}}{
adamc@534 645 \Gamma_0 = \Gamma
adamc@534 646 & \forall i: \Gamma_i \vdash p_i \leadsto \Gamma_{i+1}; \tau_i
adamc@534 647 }
adamc@534 648 \quad \infer{\Gamma \vdash \{\overline{x = p}, \ldots\} \leadsto \Gamma_n; \$([\overline{x = \tau}] \rc c)}{
adamc@534 649 \Gamma_0 = \Gamma
adamc@534 650 & \forall i: \Gamma_i \vdash p_i \leadsto \Gamma_{i+1}; \tau_i
adamc@534 651 }$$
adamc@534 652
adamc@535 653 \subsection{Declaration Typing}
adamc@535 654
adamc@535 655 We use an auxiliary judgment $\overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'$, expressing the enrichment of $\Gamma$ with the types of the datatype constructors $\overline{dc}$, when they are known to belong to datatype $x$ with type parameters $\overline{y}$.
adamc@535 656
adamc@541 657 This is the first judgment where we deal with type classes, for the $\mt{class}$ declaration form. We will omit their special handling in this formal specification. Section \ref{typeclasses} gives an informal description of how type classes influence type inference.
adamc@535 658
adamc@537 659 We presuppose the existence of a function $\mathcal O$, where $\mathcal(M, \overline{s})$ implements the $\mt{open}$ declaration by producing a context with the appropriate entry for each available component of module $M$ with signature items $\overline{s}$. Where possible, $\mathcal O$ uses ``transparent'' entries (e.g., an abstract type $M.x$ is mapped to $x :: \mt{Type} = M.x$), so that the relationship with $M$ is maintained. A related function $\mathcal O_c$ builds a context containing the disjointness constraints found in $S$.
adamc@537 660
adamc@537 661 We write $\kappa_1^n \to \kappa$ as a shorthand, where $\kappa_1^0 \to \kappa = \kappa$ and $\kappa_1^{n+1} \to \kappa_2 = \kappa_1 \to (\kappa_1^n \to \kappa_2)$. We write $\mt{len}(\overline{y})$ for the length of vector $\overline{y}$ of variables.
adamc@535 662
adamc@535 663 $$\infer{\Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@535 664 \quad \infer{\Gamma \vdash d, \overline{d} \leadsto \Gamma''}{
adamc@535 665 \Gamma \vdash d \leadsto \Gamma'
adamc@535 666 & \Gamma' \vdash \overline{d} \leadsto \Gamma''
adamc@535 667 }$$
adamc@535 668
adamc@535 669 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa = c}{
adamc@535 670 \Gamma \vdash c :: \kappa
adamc@535 671 }
adamc@535 672 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leadsto \Gamma'}{
adamc@535 673 \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} \vdash \overline{dc} \leadsto \Gamma'
adamc@535 674 }$$
adamc@535 675
adamc@535 676 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leadsto \Gamma'}{
adamc@537 677 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 678 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@535 679 & \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} = M.z \vdash \overline{dc} \leadsto \Gamma'
adamc@535 680 }$$
adamc@535 681
adamc@535 682 $$\infer{\Gamma \vdash \mt{val} \; x : \tau = e \leadsto \Gamma, x : \tau}{
adamc@535 683 \Gamma \vdash e : \tau
adamc@535 684 }$$
adamc@535 685
adamc@535 686 $$\infer{\Gamma \vdash \mt{val} \; \mt{rec} \; \overline{x : \tau = e} \leadsto \Gamma, \overline{x : \tau}}{
adamc@535 687 \forall i: \Gamma, \overline{x : \tau} \vdash e_i : \tau_i
adamc@535 688 & \textrm{$e_i$ starts with an expression $\lambda$, optionally preceded by constructor and disjointness $\lambda$s}
adamc@535 689 }$$
adamc@535 690
adamc@535 691 $$\infer{\Gamma \vdash \mt{structure} \; X : S = M \leadsto \Gamma, X : S}{
adamc@535 692 \Gamma \vdash M : S
adamc@539 693 & \textrm{ ($M$ not a $\mt{struct} \; \ldots \; \mt{end}$)}
adamc@535 694 }
adamc@539 695 \quad \infer{\Gamma \vdash \mt{structure} \; X : S = \mt{struct} \; \overline{d} \; \mt{end} \leadsto \Gamma, X : \mt{selfify}(X, \overline{s})}{
adamc@539 696 \Gamma \vdash \mt{struct} \; \overline{d} \; \mt{end} : \mt{sig} \; \overline{s} \; \mt{end}
adamc@539 697 }$$
adamc@539 698
adamc@539 699 $$\infer{\Gamma \vdash \mt{signature} \; X = S \leadsto \Gamma, X = S}{
adamc@535 700 \Gamma \vdash S
adamc@535 701 }$$
adamc@535 702
adamc@537 703 $$\infer{\Gamma \vdash \mt{open} \; M \leadsto \Gamma, \mathcal O(M, \overline{s})}{
adamc@537 704 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@535 705 }$$
adamc@535 706
adamc@535 707 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leadsto \Gamma}{
adamc@535 708 \Gamma \vdash c_1 :: \{\kappa\}
adamc@535 709 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@535 710 & \Gamma \vdash c_1 \sim c_2
adamc@535 711 }
adamc@537 712 \quad \infer{\Gamma \vdash \mt{open} \; \mt{constraints} \; M \leadsto \Gamma, \mathcal O_c(M, \overline{s})}{
adamc@537 713 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@535 714 }$$
adamc@535 715
adamc@535 716 $$\infer{\Gamma \vdash \mt{table} \; x : c \leadsto \Gamma, x : \mt{Basis}.\mt{sql\_table} \; c}{
adamc@535 717 \Gamma \vdash c :: \{\mt{Type}\}
adamc@535 718 }
adamc@535 719 \quad \infer{\Gamma \vdash \mt{sequence} \; x \leadsto \Gamma, x : \mt{Basis}.\mt{sql\_sequence}}{}$$
adamc@535 720
adamc@535 721 $$\infer{\Gamma \vdash \mt{cookie} \; x : \tau \leadsto \Gamma, x : \mt{Basis}.\mt{http\_cookie} \; \tau}{
adamc@535 722 \Gamma \vdash \tau :: \mt{Type}
adamc@535 723 }$$
adamc@535 724
adamc@535 725 $$\infer{\Gamma \vdash \mt{class} \; x = c \leadsto \Gamma, x :: \mt{Type} \to \mt{Type} = c}{
adamc@535 726 \Gamma \vdash c :: \mt{Type} \to \mt{Type}
adamc@535 727 }$$
adamc@535 728
adamc@535 729 $$\infer{\overline{y}; x; \Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@535 730 \quad \infer{\overline{y}; x; \Gamma \vdash X \mid \overline{dc} \leadsto \Gamma', X : \overline{y ::: \mt{Type}} \to x \; \overline{y}}{
adamc@535 731 \overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'
adamc@535 732 }
adamc@535 733 \quad \infer{\overline{y}; x; \Gamma \vdash X \; \mt{of} \; \tau \mid \overline{dc} \leadsto \Gamma', X : \overline{y ::: \mt{Type}} \to \tau \to x \; \overline{y}}{
adamc@535 734 \overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'
adamc@535 735 }$$
adamc@535 736
adamc@537 737 \subsection{Signature Item Typing}
adamc@537 738
adamc@537 739 We appeal to a signature item analogue of the $\mathcal O$ function from the last subsection.
adamc@537 740
adamc@537 741 $$\infer{\Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@537 742 \quad \infer{\Gamma \vdash s, \overline{s} \leadsto \Gamma''}{
adamc@537 743 \Gamma \vdash s \leadsto \Gamma'
adamc@537 744 & \Gamma' \vdash \overline{s} \leadsto \Gamma''
adamc@537 745 }$$
adamc@537 746
adamc@537 747 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa \leadsto \Gamma, x :: \kappa}{}
adamc@537 748 \quad \infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa = c}{
adamc@537 749 \Gamma \vdash c :: \kappa
adamc@537 750 }
adamc@537 751 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leadsto \Gamma'}{
adamc@537 752 \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} \vdash \overline{dc} \leadsto \Gamma'
adamc@537 753 }$$
adamc@537 754
adamc@537 755 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leadsto \Gamma'}{
adamc@537 756 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 757 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537 758 & \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} = M.z \vdash \overline{dc} \leadsto \Gamma'
adamc@537 759 }$$
adamc@537 760
adamc@537 761 $$\infer{\Gamma \vdash \mt{val} \; x : \tau \leadsto \Gamma, x : \tau}{
adamc@537 762 \Gamma \vdash \tau :: \mt{Type}
adamc@537 763 }$$
adamc@537 764
adamc@537 765 $$\infer{\Gamma \vdash \mt{structure} \; X : S \leadsto \Gamma, X : S}{
adamc@537 766 \Gamma \vdash S
adamc@537 767 }
adamc@537 768 \quad \infer{\Gamma \vdash \mt{signature} \; X = S \leadsto \Gamma, X = S}{
adamc@537 769 \Gamma \vdash S
adamc@537 770 }$$
adamc@537 771
adamc@537 772 $$\infer{\Gamma \vdash \mt{include} \; S \leadsto \Gamma, \mathcal O(\overline{s})}{
adamc@537 773 \Gamma \vdash S
adamc@537 774 & \Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 775 }$$
adamc@537 776
adamc@537 777 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leadsto \Gamma, c_1 \sim c_2}{
adamc@537 778 \Gamma \vdash c_1 :: \{\kappa\}
adamc@537 779 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@537 780 }$$
adamc@537 781
adamc@537 782 $$\infer{\Gamma \vdash \mt{class} \; x = c \leadsto \Gamma, x :: \mt{Type} \to \mt{Type} = c}{
adamc@537 783 \Gamma \vdash c :: \mt{Type} \to \mt{Type}
adamc@537 784 }
adamc@537 785 \quad \infer{\Gamma \vdash \mt{class} \; x \leadsto \Gamma, x :: \mt{Type} \to \mt{Type}}{}$$
adamc@537 786
adamc@536 787 \subsection{Signature Compatibility}
adamc@536 788
adamc@537 789 To simplify the judgments in this section, we assume that all signatures are alpha-varied as necessary to avoid including mmultiple bindings for the same identifier. This is in addition to the usual alpha-variation of locally-bound variables.
adamc@537 790
adamc@537 791 We rely on a judgment $\Gamma \vdash \overline{s} \leq s'$, which expresses the occurrence in signature items $\overline{s}$ of an item compatible with $s'$. We also use a judgment $\Gamma \vdash \overline{dc} \leq \overline{dc}$, which expresses compatibility of datatype definitions.
adamc@537 792
adamc@536 793 $$\infer{\Gamma \vdash S \equiv S}{}
adamc@536 794 \quad \infer{\Gamma \vdash S_1 \equiv S_2}{
adamc@536 795 \Gamma \vdash S_2 \equiv S_1
adamc@536 796 }
adamc@536 797 \quad \infer{\Gamma \vdash X \equiv S}{
adamc@536 798 X = S \in \Gamma
adamc@536 799 }
adamc@536 800 \quad \infer{\Gamma \vdash M.X \equiv S}{
adamc@537 801 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 802 & \mt{proj}(M, \overline{s}, \mt{signature} \; X) = S
adamc@536 803 }$$
adamc@536 804
adamc@536 805 $$\infer{\Gamma \vdash S \; \mt{where} \; \mt{con} \; x = c \equiv \mt{sig} \; \overline{s^1} \; \mt{con} \; x :: \kappa = c \; \overline{s_2} \; \mt{end}}{
adamc@536 806 \Gamma \vdash S \equiv \mt{sig} \; \overline{s^1} \; \mt{con} \; x :: \kappa \; \overline{s_2} \; \mt{end}
adamc@536 807 & \Gamma \vdash c :: \kappa
adamc@537 808 }
adamc@537 809 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s^1} \; \mt{include} \; S \; \overline{s^2} \; \mt{end} \equiv \mt{sig} \; \overline{s^1} \; \overline{s} \; \overline{s^2} \; \mt{end}}{
adamc@537 810 \Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}
adamc@536 811 }$$
adamc@536 812
adamc@536 813 $$\infer{\Gamma \vdash S_1 \leq S_2}{
adamc@536 814 \Gamma \vdash S_1 \equiv S_2
adamc@536 815 }
adamc@536 816 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; \mt{end}}{}
adamc@537 817 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; s' \; \overline{s'} \; \mt{end}}{
adamc@537 818 \Gamma \vdash \overline{s} \leq s'
adamc@537 819 & \Gamma \vdash s' \leadsto \Gamma'
adamc@537 820 & \Gamma' \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; \overline{s'} \; \mt{end}
adamc@537 821 }$$
adamc@537 822
adamc@537 823 $$\infer{\Gamma \vdash s \; \overline{s} \leq s'}{
adamc@537 824 \Gamma \vdash s \leq s'
adamc@537 825 }
adamc@537 826 \quad \infer{\Gamma \vdash s \; \overline{s} \leq s'}{
adamc@537 827 \Gamma \vdash s \leadsto \Gamma'
adamc@537 828 & \Gamma' \vdash \overline{s} \leq s'
adamc@536 829 }$$
adamc@536 830
adamc@536 831 $$\infer{\Gamma \vdash \mt{functor} (X : S_1) : S_2 \leq \mt{functor} (X : S'_1) : S'_2}{
adamc@536 832 \Gamma \vdash S'_1 \leq S_1
adamc@536 833 & \Gamma, X : S'_1 \vdash S_2 \leq S'_2
adamc@536 834 }$$
adamc@536 835
adamc@537 836 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa \leq \mt{con} \; x :: \kappa}{}
adamc@537 837 \quad \infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leq \mt{con} \; x :: \kappa}{}
adamc@537 838 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leq \mt{con} \; x :: \mt{Type}}{}$$
adamc@537 839
adamc@537 840 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{con} \; x :: \mt{Type}^{\mt{len}(y)} \to \mt{Type}}{
adamc@537 841 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 842 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537 843 }$$
adamc@537 844
adamc@537 845 $$\infer{\Gamma \vdash \mt{class} \; x \leq \mt{con} \; x :: \mt{Type} \to \mt{Type}}{}
adamc@537 846 \quad \infer{\Gamma \vdash \mt{class} \; x = c \leq \mt{con} \; x :: \mt{Type} \to \mt{Type}}{}$$
adamc@537 847
adamc@537 848 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa = c_1 \leq \mt{con} \; x :: \mt{\kappa} = c_2}{
adamc@537 849 \Gamma \vdash c_1 \equiv c_2
adamc@537 850 }
adamc@537 851 \quad \infer{\Gamma \vdash \mt{class} \; x = c_1 \leq \mt{con} \; x :: \mt{Type} \to \mt{Type} = c_2}{
adamc@537 852 \Gamma \vdash c_1 \equiv c_2
adamc@537 853 }$$
adamc@537 854
adamc@537 855 $$\infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leq \mt{datatype} \; x \; \overline{y} = \overline{dc'}}{
adamc@537 856 \Gamma, \overline{y :: \mt{Type}} \vdash \overline{dc} \leq \overline{dc'}
adamc@537 857 }$$
adamc@537 858
adamc@537 859 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{datatype} \; x \; \overline{y} = \overline{dc'}}{
adamc@537 860 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 861 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537 862 & \Gamma, \overline{y :: \mt{Type}} \vdash \overline{dc} \leq \overline{dc'}
adamc@537 863 }$$
adamc@537 864
adamc@537 865 $$\infer{\Gamma \vdash \cdot \leq \cdot}{}
adamc@537 866 \quad \infer{\Gamma \vdash X; \overline{dc} \leq X; \overline{dc'}}{
adamc@537 867 \Gamma \vdash \overline{dc} \leq \overline{dc'}
adamc@537 868 }
adamc@537 869 \quad \infer{\Gamma \vdash X \; \mt{of} \; \tau_1; \overline{dc} \leq X \; \mt{of} \; \tau_2; \overline{dc'}}{
adamc@537 870 \Gamma \vdash \tau_1 \equiv \tau_2
adamc@537 871 & \Gamma \vdash \overline{dc} \leq \overline{dc'}
adamc@537 872 }$$
adamc@537 873
adamc@537 874 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{datatype} \; x = \mt{datatype} \; M'.z'}{
adamc@537 875 \Gamma \vdash M.z \equiv M'.z'
adamc@537 876 }$$
adamc@537 877
adamc@537 878 $$\infer{\Gamma \vdash \mt{val} \; x : \tau_1 \leq \mt{val} \; x : \tau_2}{
adamc@537 879 \Gamma \vdash \tau_1 \equiv \tau_2
adamc@537 880 }
adamc@537 881 \quad \infer{\Gamma \vdash \mt{structure} \; X : S_1 \leq \mt{structure} \; X : S_2}{
adamc@537 882 \Gamma \vdash S_1 \leq S_2
adamc@537 883 }
adamc@537 884 \quad \infer{\Gamma \vdash \mt{signature} \; X = S_1 \leq \mt{signature} \; X = S_2}{
adamc@537 885 \Gamma \vdash S_1 \leq S_2
adamc@537 886 & \Gamma \vdash S_2 \leq S_1
adamc@537 887 }$$
adamc@537 888
adamc@537 889 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leq \mt{constraint} \; c'_1 \sim c'_2}{
adamc@537 890 \Gamma \vdash c_1 \equiv c'_1
adamc@537 891 & \Gamma \vdash c_2 \equiv c'_2
adamc@537 892 }$$
adamc@537 893
adamc@537 894 $$\infer{\Gamma \vdash \mt{class} \; x \leq \mt{class} \; x}{}
adamc@537 895 \quad \infer{\Gamma \vdash \mt{class} \; x = c \leq \mt{class} \; x}{}
adamc@537 896 \quad \infer{\Gamma \vdash \mt{class} \; x = c_1 \leq \mt{class} \; x = c_2}{
adamc@537 897 \Gamma \vdash c_1 \equiv c_2
adamc@537 898 }$$
adamc@537 899
adamc@538 900 \subsection{Module Typing}
adamc@538 901
adamc@538 902 We use a helper function $\mt{sigOf}$, which converts declarations and sequences of declarations into their principal signature items and sequences of signature items, respectively.
adamc@538 903
adamc@538 904 $$\infer{\Gamma \vdash M : S}{
adamc@538 905 \Gamma \vdash M : S'
adamc@538 906 & \Gamma \vdash S' \leq S
adamc@538 907 }
adamc@538 908 \quad \infer{\Gamma \vdash \mt{struct} \; \overline{d} \; \mt{end} : \mt{sig} \; \mt{sigOf}(\overline{d}) \; \mt{end}}{
adamc@538 909 \Gamma \vdash \overline{d} \leadsto \Gamma'
adamc@538 910 }
adamc@538 911 \quad \infer{\Gamma \vdash X : S}{
adamc@538 912 X : S \in \Gamma
adamc@538 913 }$$
adamc@538 914
adamc@538 915 $$\infer{\Gamma \vdash M.X : S}{
adamc@538 916 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@538 917 & \mt{proj}(M, \overline{s}, \mt{structure} \; X) = S
adamc@538 918 }$$
adamc@538 919
adamc@538 920 $$\infer{\Gamma \vdash M_1(M_2) : [X \mapsto M_2]S_2}{
adamc@538 921 \Gamma \vdash M_1 : \mt{functor}(X : S_1) : S_2
adamc@538 922 & \Gamma \vdash M_2 : S_1
adamc@538 923 }
adamc@538 924 \quad \infer{\Gamma \vdash \mt{functor} (X : S_1) : S_2 = M : \mt{functor} (X : S_1) : S_2}{
adamc@538 925 \Gamma \vdash S_1
adamc@538 926 & \Gamma, X : S_1 \vdash S_2
adamc@538 927 & \Gamma, X : S_1 \vdash M : S_2
adamc@538 928 }$$
adamc@538 929
adamc@538 930 \begin{eqnarray*}
adamc@538 931 \mt{sigOf}(\cdot) &=& \cdot \\
adamc@538 932 \mt{sigOf}(s \; \overline{s'}) &=& \mt{sigOf}(s) \; \mt{sigOf}(\overline{s'}) \\
adamc@538 933 \\
adamc@538 934 \mt{sigOf}(\mt{con} \; x :: \kappa = c) &=& \mt{con} \; x :: \kappa = c \\
adamc@538 935 \mt{sigOf}(\mt{datatype} \; x \; \overline{y} = \overline{dc}) &=& \mt{datatype} \; x \; \overline{y} = \overline{dc} \\
adamc@538 936 \mt{sigOf}(\mt{datatype} \; x = \mt{datatype} \; M.z) &=& \mt{datatype} \; x = \mt{datatype} \; M.z \\
adamc@538 937 \mt{sigOf}(\mt{val} \; x : \tau = e) &=& \mt{val} \; x : \tau \\
adamc@538 938 \mt{sigOf}(\mt{val} \; \mt{rec} \; \overline{x : \tau = e}) &=& \overline{\mt{val} \; x : \tau} \\
adamc@538 939 \mt{sigOf}(\mt{structure} \; X : S = M) &=& \mt{structure} \; X : S \\
adamc@538 940 \mt{sigOf}(\mt{signature} \; X = S) &=& \mt{signature} \; X = S \\
adamc@538 941 \mt{sigOf}(\mt{open} \; M) &=& \mt{include} \; S \textrm{ (where $\Gamma \vdash M : S$)} \\
adamc@538 942 \mt{sigOf}(\mt{constraint} \; c_1 \sim c_2) &=& \mt{constraint} \; c_1 \sim c_2 \\
adamc@538 943 \mt{sigOf}(\mt{open} \; \mt{constraints} \; M) &=& \cdot \\
adamc@538 944 \mt{sigOf}(\mt{table} \; x : c) &=& \mt{table} \; x : c \\
adamc@538 945 \mt{sigOf}(\mt{sequence} \; x) &=& \mt{sequence} \; x \\
adamc@538 946 \mt{sigOf}(\mt{cookie} \; x : \tau) &=& \mt{cookie} \; x : \tau \\
adamc@538 947 \mt{sigOf}(\mt{class} \; x = c) &=& \mt{class} \; x = c \\
adamc@538 948 \end{eqnarray*}
adamc@538 949
adamc@539 950 \begin{eqnarray*}
adamc@539 951 \mt{selfify}(M, \cdot) &=& \cdot \\
adamc@539 952 \mt{selfify}(M, s \; \overline{s'}) &=& \mt{selfify}(M, \sigma, s) \; \mt{selfify}(M, \overline{s'}) \\
adamc@539 953 \\
adamc@539 954 \mt{selfify}(M, \mt{con} \; x :: \kappa) &=& \mt{con} \; x :: \kappa = M.x \\
adamc@539 955 \mt{selfify}(M, \mt{con} \; x :: \kappa = c) &=& \mt{con} \; x :: \kappa = c \\
adamc@539 956 \mt{selfify}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc}) &=& \mt{datatype} \; x \; \overline{y} = \mt{datatype} \; M.x \\
adamc@539 957 \mt{selfify}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z) &=& \mt{datatype} \; x = \mt{datatype} \; M'.z \\
adamc@539 958 \mt{selfify}(M, \mt{val} \; x : \tau) &=& \mt{val} \; x : \tau \\
adamc@539 959 \mt{selfify}(M, \mt{structure} \; X : S) &=& \mt{structure} \; X : \mt{selfify}(M.X, \overline{s}) \textrm{ (where $\Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}$)} \\
adamc@539 960 \mt{selfify}(M, \mt{signature} \; X = S) &=& \mt{signature} \; X = S \\
adamc@539 961 \mt{selfify}(M, \mt{include} \; S) &=& \mt{include} \; S \\
adamc@539 962 \mt{selfify}(M, \mt{constraint} \; c_1 \sim c_2) &=& \mt{constraint} \; c_1 \sim c_2 \\
adamc@539 963 \mt{selfify}(M, \mt{class} \; x) &=& \mt{class} \; x = M.x \\
adamc@539 964 \mt{selfify}(M, \mt{class} \; x = c) &=& \mt{class} \; x = c \\
adamc@539 965 \end{eqnarray*}
adamc@539 966
adamc@540 967 \subsection{Module Projection}
adamc@540 968
adamc@540 969 \begin{eqnarray*}
adamc@540 970 \mt{proj}(M, \mt{con} \; x :: \kappa \; \overline{s}, \mt{con} \; x) &=& \kappa \\
adamc@540 971 \mt{proj}(M, \mt{con} \; x :: \kappa = c \; \overline{s}, \mt{con} \; x) &=& (\kappa, c) \\
adamc@540 972 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{con} \; x) &=& \mt{Type}^{\mt{len}(\overline{y})} \to \mt{Type} \\
adamc@540 973 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, \mt{con} \; x) &=& (\mt{Type}^{\mt{len}(\overline{y})} \to \mt{Type}, M'.z) \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@540 974 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})$)} \\
adamc@540 975 \mt{proj}(M, \mt{class} \; x \; \overline{s}, \mt{con} \; x) &=& \mt{Type} \to \mt{Type} \\
adamc@540 976 \mt{proj}(M, \mt{class} \; x = c \; \overline{s}, \mt{con} \; x) &=& (\mt{Type} \to \mt{Type}, c) \\
adamc@540 977 \\
adamc@540 978 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{datatype} \; x) &=& (\overline{y}, \overline{dc}) \\
adamc@540 979 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, \mt{con} \; x) &=& \mt{proj}(M', \overline{s'}, \mt{datatype} \; z) \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$)} \\
adamc@540 980 \\
adamc@540 981 \mt{proj}(M, \mt{val} \; x : \tau \; \overline{s}, \mt{val} \; x) &=& \tau \\
adamc@540 982 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to M.x \; \overline y \textrm{ (where $X \in \overline{dc}$)} \\
adamc@540 983 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to \tau \to M.x \; \overline y \textrm{ (where $X \; \mt{of} \; \tau \in \overline{dc}$)} \\
adamc@540 984 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to M.x \; \overline y \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@540 985 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z = (\overline{y}, \overline{dc})$ and $X \in \overline{dc}$)} \\
adamc@540 986 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to \tau \to M.x \; \overline y \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@540 987 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z = (\overline{y}, \overline{dc})$ and $X : \tau \in \overline{dc}$)} \\
adamc@540 988 \\
adamc@540 989 \mt{proj}(M, \mt{structure} \; X : S \; \overline{s}, \mt{structure} \; X) &=& S \\
adamc@540 990 \\
adamc@540 991 \mt{proj}(M, \mt{signature} \; X = S \; \overline{s}, \mt{signature} \; X) &=& S \\
adamc@540 992 \\
adamc@540 993 \mt{proj}(M, \mt{con} \; x :: \kappa \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 994 \mt{proj}(M, \mt{con} \; x :: \kappa = c \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 995 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 996 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 997 \mt{proj}(M, \mt{val} \; x : \tau \; \overline{s}, V) &=& \mt{proj}(M, \overline{s}, V) \\
adamc@540 998 \mt{proj}(M, \mt{structure} \; X : S \; \overline{s}, V) &=& [X \mapsto M.X]\mt{proj}(M, \overline{s}, V) \\
adamc@540 999 \mt{proj}(M, \mt{signature} \; X = S \; \overline{s}, V) &=& [X \mapsto M.X]\mt{proj}(M, \overline{s}, V) \\
adamc@540 1000 \mt{proj}(M, \mt{include} \; S \; \overline{s}, V) &=& \mt{proj}(M, \overline{s'} \; \overline{s}, V) \textrm{ (where $\Gamma \vdash S \equiv \mt{sig} \; \overline{s'} \; \mt{end}$)} \\
adamc@540 1001 \mt{proj}(M, \mt{constraint} \; c_1 \sim c_2 \; \overline{s}, V) &=& \mt{proj}(M, \overline{s}, V) \\
adamc@540 1002 \mt{proj}(M, \mt{class} \; x \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 1003 \mt{proj}(M, \mt{class} \; x = c \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 1004 \end{eqnarray*}
adamc@540 1005
adamc@541 1006
adamc@541 1007 \section{Type Inference}
adamc@541 1008
adamc@541 1009 The Ur/Web compiler uses \emph{heuristic type inference}, with no claims of completeness with respect to the declarative specification of the last section. The rules in use seem to work well in practice. This section summarizes those rules, to help Ur programmers predict what will work and what won't.
adamc@541 1010
adamc@541 1011 \subsection{Basic Unification}
adamc@541 1012
adamc@541 1013 Type-checkers for languages based on the Hindly-Milner type discipline, like ML and Haskell, take advantage of \emph{principal typing} properties, making complete type inference relatively straightforward. Inference algorithms are traditionally implemented using type unification variables, at various points asserting equalities between types, in the process discovering the values of type variables. The Ur/Web compiler uses the same basic strategy, but the complexity of the type system rules out easy completeness.
adamc@541 1014
adamc@541 1015 Type-checking can require evaluating recursive functional programs, thanks to the type-level $\mt{fold}$ operator. When a unification variable appears in such a type, the next step of computation can be undetermined. The value of that variable might be determined later, but this would be ``too late'' for the unification problems generated at the first occurrence. This is the essential source of incompletness.
adamc@541 1016
adamc@541 1017 Nonetheless, the unification engine tends to do reasonably well. Unlike in ML, polymorphism is never inferred in definitions; it must be indicated explicitly by writing out constructor-level parameters. By writing these and other annotations, the programmer can generally get the type inference engine to do most of the type reconstruction work.
adamc@541 1018
adamc@541 1019 \subsection{Unifying Record Types}
adamc@541 1020
adamc@541 1021 The type inference engine tries to take advantage of the algebraic rules governing type-level records, as shown in Section \ref{definitional}. When two constructors of record kind are unified, they are reduce to normal forms, with like terms crossed off from each normal form until, hopefully, nothing remains. This cannot be complete, with the inclusion of unification variables. The type-checker can help you understand what goes wrong when the process fails, as it outputs the unmatched remainders of the two normal forms.
adamc@541 1022
adamc@541 1023 \subsection{\label{typeclasses}Type Classes}
adamc@541 1024
adamc@541 1025 Ur includes a type class facility inspired by Haskell's. The current version is very rudimentary, only supporting instances for particular types built up from abstract types and datatypes and type-level application.
adamc@541 1026
adamc@541 1027 Type classes are integrated with the module system. A type class is just a constructor of kind $\mt{Type} \to \mt{Type}$. By marking such a constructor $c$ as a type class, the programmer instructs the type inference engine to, in each scope, record all values of types $c \; \tau$ as \emph{instances}. Any function argument whose type is of such a form is treated as implicit, to be determined by examining the current instance database.
adamc@541 1028
adamc@541 1029 The ``dictionary encoding'' often used in Haskell implementations is made explicit in Ur. Type class instances are just properly-typed values, and they can also be considered as ``proofs'' of membership in the class. In some cases, it is useful to pass these proofs around explicitly. An underscore written where a proof is expected will also be inferred, if possible, from the current instance database.
adamc@541 1030
adamc@541 1031 Just as for types, type classes may be exported from modules, and they may be exported as concrete or abstract. Concrete type classes have their ``real'' definitions exposed, so that client code may add new instances freely. Abstract type classes are useful as ``predicates'' that can be used to enforce invariants, as we will see in some definitions of SQL syntax in the Ur/Web standard library.
adamc@541 1032
adamc@541 1033 \subsection{Reverse-Engineering Record Types}
adamc@541 1034
adamc@541 1035 It's useful to write Ur functions and functors that take record constructors as inputs, but these constructors can grow quite long, even though their values are often implied by other arguments. The compiler uses a simple heuristic to infer the values of unification variables that are folded over, yielding known results. Often, as in the case of $\mt{map}$-like folds, the base and recursive cases of a fold produce constructors with different top-level structure. Thus, if the result of the fold is known, examining its top-level structure reveals whether the record being folded over is empty or not. If it's empty, we're done; if it's not empty, we replace a single unification variable with a new constructor formed from three new unification variables, as in $[\alpha = \beta] \rc \gamma$. This process can often be repeated to determine a unification variable fully.
adamc@541 1036
adamc@541 1037 \subsection{Implicit Arguments in Functor Applications}
adamc@541 1038
adamc@541 1039 Constructor, constraint, and type class witness members of structures may be omitted, when those structures are used in contexts where their assigned signatures imply how to fill in those missing members. This feature combines well with reverse-engineering to allow for uses of complicated meta-programming functors with little more code than would be necessary to invoke an untyped, ad-hoc code generator.
adamc@541 1040
adamc@541 1041
adamc@542 1042 \section{The Ur Standard Library}
adamc@542 1043
adamc@542 1044 The built-in parts of the Ur/Web standard library are described by the signature in \texttt{lib/basis.urs} in the distribution. A module $\mt{Basis}$ ascribing to that signature is available in the initial environment, and every program is implicitly prefixed by $\mt{open} \; \mt{Basis}$.
adamc@542 1045
adamc@542 1046 Additionally, other common functions that are definable within Ur are included in \texttt{lib/top.urs} and \texttt{lib/top.ur}. This $\mt{Top}$ module is also opened implicitly.
adamc@542 1047
adamc@542 1048 The idea behind Ur is to serve as the ideal host for embedded domain-specific languages. For now, however, the ``generic'' functionality is intermixed with Ur/Web-specific functionality, including in these two library modules. We hope that these generic library components have types that speak for themselves. The next section introduces the Ur/Web-specific elements. Here, we only give the type declarations from the beginning of $\mt{Basis}$.
adamc@542 1049 $$\begin{array}{l}
adamc@542 1050 \mt{type} \; \mt{int} \\
adamc@542 1051 \mt{type} \; \mt{float} \\
adamc@542 1052 \mt{type} \; \mt{string} \\
adamc@542 1053 \mt{type} \; \mt{time} \\
adamc@542 1054 \\
adamc@542 1055 \mt{type} \; \mt{unit} = \{\} \\
adamc@542 1056 \\
adamc@542 1057 \mt{datatype} \; \mt{bool} = \mt{False} \mid \mt{True} \\
adamc@542 1058 \\
adamc@542 1059 \mt{datatype} \; \mt{option} \; \mt{t} = \mt{None} \mid \mt{Some} \; \mt{of} \; \mt{t}
adamc@542 1060 \end{array}$$
adamc@542 1061
adamc@542 1062
adamc@542 1063 \section{The Ur/Web Standard Library}
adamc@542 1064
adamc@542 1065 \subsection{Transactions}
adamc@542 1066
adamc@542 1067 Ur is a pure language; we use Haskell's trick to support controlled side effects. The standard library defines a monad $\mt{transaction}$, meant to stand for actions that may be undone cleanly. By design, no other kinds of actions are supported.
adamc@542 1068 $$\begin{array}{l}
adamc@542 1069 \mt{con} \; \mt{transaction} :: \mt{Type} \to \mt{Type} \\
adamc@542 1070 \\
adamc@542 1071 \mt{val} \; \mt{return} : \mt{t} ::: \mt{Type} \to \mt{t} \to \mt{transaction} \; \mt{t} \\
adamc@542 1072 \mt{val} \; \mt{bind} : \mt{t_1} ::: \mt{Type} \to \mt{t_2} ::: \mt{Type} \to \mt{transaction} \; \mt{t_1} \to (\mt{t_1} \to \mt{transaction} \; \mt{t_2}) \to \mt{transaction} \; \mt{t_2}
adamc@542 1073 \end{array}$$
adamc@542 1074
adamc@542 1075 \subsection{HTTP}
adamc@542 1076
adamc@542 1077 There are transactions for reading an HTTP header by name and for getting and setting strongly-typed cookies. Cookies may only be created by the $\mt{cookie}$ declaration form, ensuring that they be named consistently based on module structure.
adamc@542 1078 $$\begin{array}{l}
adamc@542 1079 \mt{val} \; \mt{requestHeader} : \mt{string} \to \mt{transaction} \; (\mt{option} \; \mt{string}) \\
adamc@542 1080 \\
adamc@542 1081 \mt{con} \; \mt{http\_cookie} :: \mt{Type} \to \mt{Type} \\
adamc@542 1082 \mt{val} \; \mt{getCookie} : \mt{t} ::: \mt{Type} \to \mt{http\_cookie} \; \mt{t} \to \mt{transaction} \; (\mt{option} \; \mt{t}) \\
adamc@542 1083 \mt{val} \; \mt{setCookie} : \mt{t} ::: \mt{Type} \to \mt{http\_cookie} \; \mt{t} \to \mt{t} \to \mt{transaction} \; \mt{unit}
adamc@542 1084 \end{array}$$
adamc@542 1085
adamc@543 1086 \subsection{SQL}
adamc@543 1087
adamc@543 1088 The fundamental unit of interest in the embedding of SQL is tables, described by a type family and creatable only via the $\mt{table}$ declaration form.
adamc@543 1089 $$\begin{array}{l}
adamc@543 1090 \mt{con} \; \mt{sql\_table} :: \{\mt{Type}\} \to \mt{Type}
adamc@543 1091 \end{array}$$
adamc@543 1092
adamc@543 1093 \subsubsection{Queries}
adamc@543 1094
adamc@543 1095 A final query is constructed via the $\mt{sql\_query}$ function. Constructor arguments respectively specify the table fields we select (as records mapping tables to the subsets of their fields that we choose) and the (always named) extra expressions that we select.
adamc@543 1096 $$\begin{array}{l}
adamc@543 1097 \mt{con} \; \mt{sql\_query} :: \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@543 1098 \mt{val} \; \mt{sql\_query} : \mt{tables} ::: \{\{\mt{Type}\}\} \\
adamc@543 1099 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543 1100 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543 1101 \hspace{.1in} \to \{\mt{Rows} : \mt{sql\_query1} \; \mt{tables} \; \mt{selectedFields} \; \mt{selectedExps}, \\
adamc@543 1102 \hspace{.2in} \mt{OrderBy} : \mt{sql\_order\_by} \; \mt{tables} \; \mt{selectedExps}, \\
adamc@543 1103 \hspace{.2in} \mt{Limit} : \mt{sql\_limit}, \\
adamc@543 1104 \hspace{.2in} \mt{Offset} : \mt{sql\_offset}\} \\
adamc@543 1105 \hspace{.1in} \to \mt{sql\_query} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543 1106 \end{array}$$
adamc@543 1107
adamc@545 1108 Queries are used by folding over their results inside transactions.
adamc@545 1109 $$\begin{array}{l}
adamc@545 1110 \mt{val} \; \mt{query} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \lambda [\mt{tables} \sim \mt{exps}] \Rightarrow \mt{state} ::: \mt{Type} \to \mt{sql\_query} \; \mt{tables} \; \mt{exps} \\
adamc@545 1111 \hspace{.1in} \to (\$(\mt{exps} \rc \mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: \{\mt{Type}\}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \$\mt{fields}] \rc \mt{acc}) \; [] \; \mt{tables}) \\
adamc@545 1112 \hspace{.2in} \to \mt{state} \to \mt{transaction} \; \mt{state}) \\
adamc@545 1113 \hspace{.1in} \to \mt{state} \to \mt{transaction} \; \mt{state}
adamc@545 1114 \end{array}$$
adamc@545 1115
adamc@543 1116 Most of the complexity of the query encoding is in the type $\mt{sql\_query1}$, which includes simple queries and derived queries based on relational operators. Constructor arguments respectively specify the tables we select from, the subset of fields that we keep from each table for the result rows, and the extra expressions that we select.
adamc@543 1117 $$\begin{array}{l}
adamc@543 1118 \mt{con} \; \mt{sql\_query1} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@543 1119 \\
adamc@543 1120 \mt{type} \; \mt{sql\_relop} \\
adamc@543 1121 \mt{val} \; \mt{sql\_union} : \mt{sql\_relop} \\
adamc@543 1122 \mt{val} \; \mt{sql\_intersect} : \mt{sql\_relop} \\
adamc@543 1123 \mt{val} \; \mt{sql\_except} : \mt{sql\_relop} \\
adamc@543 1124 \mt{val} \; \mt{sql\_relop} : \mt{tables1} ::: \{\{\mt{Type}\}\} \\
adamc@543 1125 \hspace{.1in} \to \mt{tables2} ::: \{\{\mt{Type}\}\} \\
adamc@543 1126 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543 1127 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543 1128 \hspace{.1in} \to \mt{sql\_relop} \\
adamc@543 1129 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables1} \; \mt{selectedFields} \; \mt{selectedExps} \\
adamc@543 1130 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables2} \; \mt{selectedFields} \; \mt{selectedExps} \\
adamc@543 1131 \hspace{.1in} \to \mt{sql\_query1} \; \mt{selectedFields} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543 1132 \end{array}$$
adamc@543 1133
adamc@543 1134 $$\begin{array}{l}
adamc@543 1135 \mt{val} \; \mt{sql\_query1} : \mt{tables} ::: \{\{\mt{Type}\}\} \\
adamc@543 1136 \hspace{.1in} \to \mt{grouped} ::: \{\{\mt{Type}\}\} \\
adamc@543 1137 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543 1138 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543 1139 \hspace{.1in} \to \{\mt{From} : \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: \{\mt{Type}\}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_table} \; \mt{fields}] \rc \mt{acc}) \; [] \; \mt{tables}), \\
adamc@543 1140 \hspace{.2in} \mt{Where} : \mt{sql\_exp} \; \mt{tables} \; [] \; [] \; \mt{bool}, \\
adamc@543 1141 \hspace{.2in} \mt{GroupBy} : \mt{sql\_subset} \; \mt{tables} \; \mt{grouped}, \\
adamc@543 1142 \hspace{.2in} \mt{Having} : \mt{sql\_exp} \; \mt{grouped} \; \mt{tables} \; [] \; \mt{bool}, \\
adamc@543 1143 \hspace{.2in} \mt{SelectFields} : \mt{sql\_subset} \; \mt{grouped} \; \mt{selectedFields}, \\
adamc@543 1144 \hspace{.2in} \mt {SelectExps} : \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{t} :: \mt{Type}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_exp} \; \mt{grouped} \; \mt{tables} \; [] \; \mt{t}] \rc \mt{acc}) \; [] \; \mt{selectedExps}) \} \\
adamc@543 1145 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543 1146 \end{array}$$
adamc@543 1147
adamc@543 1148 To encode projection of subsets of fields in $\mt{SELECT}$ clauses, and to encode $\mt{GROUP} \; \mt{BY}$ clauses, we rely on a type family $\mt{sql\_subset}$, capturing what it means for one record of table fields to be a subset of another. The main constructor $\mt{sql\_subset}$ ``proves subset facts'' by requiring a split of a record into kept and dropped parts. The extra constructor $\mt{sql\_subset\_all}$ is a convenience for keeping all fields of a record.
adamc@543 1149 $$\begin{array}{l}
adamc@543 1150 \mt{con} \; \mt{sql\_subset} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \mt{Type} \\
adamc@543 1151 \mt{val} \; \mt{sql\_subset} : \mt{keep\_drop} :: \{(\{\mt{Type}\} \times \{\mt{Type}\})\} \\
adamc@543 1152 \hspace{.1in} \to \mt{sql\_subset} \\
adamc@543 1153 \hspace{.2in} (\mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: (\{\mt{Type}\} \times \{\mt{Type}\})) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \; [\mt{fields}.1 \sim \mt{fields}.2] \Rightarrow \\
adamc@543 1154 \hspace{.3in} [\mt{nm} = \mt{fields}.1 \rc \mt{fields}.2] \rc \mt{acc}) \; [] \; \mt{keep\_drop}) \\
adamc@543 1155 \hspace{.2in} (\mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: (\{\mt{Type}\} \times \{\mt{Type}\})) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{fields}.1] \rc \mt{acc}) \; [] \; \mt{keep\_drop}) \\
adamc@543 1156 \mt{val} \; \mt{sql\_subset\_all} : \mt{tables} :: \{\{\mt{Type}\}\} \to \mt{sql\_subset} \; \mt{tables} \; \mt{tables}
adamc@543 1157 \end{array}$$
adamc@543 1158
adamc@543 1159 SQL expressions are used in several places, including $\mt{SELECT}$, $\mt{WHERE}$, $\mt{HAVING}$, and $\mt{ORDER} \; \mt{BY}$ clauses. They reify a fragment of the standard SQL expression language, while making it possible to inject ``native'' Ur values in some places. The arguments to the $\mt{sql\_exp}$ type family respectively give the unrestricted-availablity table fields, the table fields that may only be used in arguments to aggregate functions, the available selected expressions, and the type of the expression.
adamc@543 1160 $$\begin{array}{l}
adamc@543 1161 \mt{con} \; \mt{sql\_exp} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \to \mt{Type}
adamc@543 1162 \end{array}$$
adamc@543 1163
adamc@543 1164 Any field in scope may be converted to an expression.
adamc@543 1165 $$\begin{array}{l}
adamc@543 1166 \mt{val} \; \mt{sql\_field} : \mt{otherTabs} ::: \{\{\mt{Type}\}\} \to \mt{otherFields} ::: \{\mt{Type}\} \\
adamc@543 1167 \hspace{.1in} \to \mt{fieldType} ::: \mt{Type} \to \mt{agg} ::: \{\{\mt{Type}\}\} \\
adamc@543 1168 \hspace{.1in} \to \mt{exps} ::: \{\mt{Type}\} \\
adamc@543 1169 \hspace{.1in} \to \mt{tab} :: \mt{Name} \to \mt{field} :: \mt{Name} \\
adamc@543 1170 \hspace{.1in} \to \mt{sql\_exp} \; ([\mt{tab} = [\mt{field} = \mt{fieldType}] \rc \mt{otherFields}] \rc \mt{otherTabs}) \; \mt{agg} \; \mt{exps} \; \mt{fieldType}
adamc@543 1171 \end{array}$$
adamc@543 1172
adamc@544 1173 There is an analogous function for referencing named expressions.
adamc@544 1174 $$\begin{array}{l}
adamc@544 1175 \mt{val} \; \mt{sql\_exp} : \mt{tabs} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{t} ::: \mt{Type} \to \mt{rest} ::: \{\mt{Type}\} \to \mt{nm} :: \mt{Name} \\
adamc@544 1176 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tabs} \; \mt{agg} \; ([\mt{nm} = \mt{t}] \rc \mt{rest}) \; \mt{t}
adamc@544 1177 \end{array}$$
adamc@544 1178
adamc@544 1179 Ur values of appropriate types may be injected into SQL expressions.
adamc@544 1180 $$\begin{array}{l}
adamc@544 1181 \mt{class} \; \mt{sql\_injectable} \\
adamc@544 1182 \mt{val} \; \mt{sql\_bool} : \mt{sql\_injectable} \; \mt{bool} \\
adamc@544 1183 \mt{val} \; \mt{sql\_int} : \mt{sql\_injectable} \; \mt{int} \\
adamc@544 1184 \mt{val} \; \mt{sql\_float} : \mt{sql\_injectable} \; \mt{float} \\
adamc@544 1185 \mt{val} \; \mt{sql\_string} : \mt{sql\_injectable} \; \mt{string} \\
adamc@544 1186 \mt{val} \; \mt{sql\_time} : \mt{sql\_injectable} \; \mt{time} \\
adamc@544 1187 \mt{val} \; \mt{sql\_option\_bool} : \mt{sql\_injectable} \; (\mt{option} \; \mt{bool}) \\
adamc@544 1188 \mt{val} \; \mt{sql\_option\_int} : \mt{sql\_injectable} \; (\mt{option} \; \mt{int}) \\
adamc@544 1189 \mt{val} \; \mt{sql\_option\_float} : \mt{sql\_injectable} \; (\mt{option} \; \mt{float}) \\
adamc@544 1190 \mt{val} \; \mt{sql\_option\_string} : \mt{sql\_injectable} \; (\mt{option} \; \mt{string}) \\
adamc@544 1191 \mt{val} \; \mt{sql\_option\_time} : \mt{sql\_injectable} \; (\mt{option} \; \mt{time}) \\
adamc@544 1192 \mt{val} \; \mt{sql\_inject} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \to \mt{sql\_injectable} \; \mt{t} \\
adamc@544 1193 \hspace{.1in} \to \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t}
adamc@544 1194 \end{array}$$
adamc@544 1195
adamc@544 1196 We have the SQL nullness test, which is necessary because of the strange SQL semantics of equality in the presence of null values.
adamc@544 1197 $$\begin{array}{l}
adamc@544 1198 \mt{val} \; \mt{sql\_is\_null} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1199 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; (\mt{option} \; \mt{t}) \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{bool}
adamc@544 1200 \end{array}$$
adamc@544 1201
adamc@544 1202 We have generic nullary, unary, and binary operators, as well as comparison operators.
adamc@544 1203 $$\begin{array}{l}
adamc@544 1204 \mt{con} \; \mt{sql\_nfunc} :: \mt{Type} \to \mt{Type} \\
adamc@544 1205 \mt{val} \; \mt{sql\_current\_timestamp} : \mt{sql\_nfunc} \; \mt{time} \\
adamc@544 1206 \mt{val} \; \mt{sql\_nfunc} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1207 \hspace{.1in} \to \mt{sql\_nfunc} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t} \\\end{array}$$
adamc@544 1208
adamc@544 1209 $$\begin{array}{l}
adamc@544 1210 \mt{con} \; \mt{sql\_unary} :: \mt{Type} \to \mt{Type} \to \mt{Type} \\
adamc@544 1211 \mt{val} \; \mt{sql\_not} : \mt{sql\_unary} \; \mt{bool} \; \mt{bool} \\
adamc@544 1212 \mt{val} \; \mt{sql\_unary} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{arg} ::: \mt{Type} \to \mt{res} ::: \mt{Type} \\
adamc@544 1213 \hspace{.1in} \to \mt{sql\_unary} \; \mt{arg} \; \mt{res} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{res} \\
adamc@544 1214 \end{array}$$
adamc@544 1215
adamc@544 1216 $$\begin{array}{l}
adamc@544 1217 \mt{con} \; \mt{sql\_binary} :: \mt{Type} \to \mt{Type} \to \mt{Type} \to \mt{Type} \\
adamc@544 1218 \mt{val} \; \mt{sql\_and} : \mt{sql\_binary} \; \mt{bool} \; \mt{bool} \; \mt{bool} \\
adamc@544 1219 \mt{val} \; \mt{sql\_or} : \mt{sql\_binary} \; \mt{bool} \; \mt{bool} \; \mt{bool} \\
adamc@544 1220 \mt{val} \; \mt{sql\_binary} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{arg_1} ::: \mt{Type} \to \mt{arg_2} ::: \mt{Type} \to \mt{res} ::: \mt{Type} \\
adamc@544 1221 \hspace{.1in} \to \mt{sql\_binary} \; \mt{arg_1} \; \mt{arg_2} \; \mt{res} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg_1} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg_2} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{res}
adamc@544 1222 \end{array}$$
adamc@544 1223
adamc@544 1224 $$\begin{array}{l}
adamc@544 1225 \mt{type} \; \mt{sql\_comparison} \\
adamc@544 1226 \mt{val} \; \mt{sql\_eq} : \mt{sql\_comparison} \\
adamc@544 1227 \mt{val} \; \mt{sql\_ne} : \mt{sql\_comparison} \\
adamc@544 1228 \mt{val} \; \mt{sql\_lt} : \mt{sql\_comparison} \\
adamc@544 1229 \mt{val} \; \mt{sql\_le} : \mt{sql\_comparison} \\
adamc@544 1230 \mt{val} \; \mt{sql\_gt} : \mt{sql\_comparison} \\
adamc@544 1231 \mt{val} \; \mt{sql\_ge} : \mt{sql\_comparison} \\
adamc@544 1232 \mt{val} \; \mt{sql\_comparison} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1233 \hspace{.1in} \to \mt{sql\_comparison} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{bool}
adamc@544 1234 \end{array}$$
adamc@544 1235
adamc@544 1236 Finally, we have aggregate functions. The $\mt{COUNT(\ast)}$ syntax is handled specially, since it takes no real argument. The other aggregate functions are placed into a general type family, using type classes to restrict usage to properly-typed arguments. The key aspect of the $\mt{sql\_aggregate}$ function's type is the shift of aggregate-function-only fields into unrestricted fields.
adamc@544 1237
adamc@544 1238 $$\begin{array}{l}
adamc@544 1239 \mt{val} \; \mt{sql\_count} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{int}
adamc@544 1240 \end{array}$$
adamc@544 1241
adamc@544 1242 $$\begin{array}{l}
adamc@544 1243 \mt{con} \; \mt{sql\_aggregate} :: \mt{Type} \to \mt{Type} \\
adamc@544 1244 \mt{val} \; \mt{sql\_aggregate} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1245 \hspace{.1in} \to \mt{sql\_aggregate} \; \mt{t} \to \mt{sql\_exp} \; \mt{agg} \; \mt{agg} \; \mt{exps} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t}
adamc@544 1246 \end{array}$$
adamc@544 1247
adamc@544 1248 $$\begin{array}{l}
adamc@544 1249 \mt{class} \; \mt{sql\_summable} \\
adamc@544 1250 \mt{val} \; \mt{sql\_summable\_int} : \mt{sql\_summable} \; \mt{int} \\
adamc@544 1251 \mt{val} \; \mt{sql\_summable\_float} : \mt{sql\_summable} \; \mt{float} \\
adamc@544 1252 \mt{val} \; \mt{sql\_avg} : \mt{t} ::: \mt{Type} \to \mt{sql\_summable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t} \\
adamc@544 1253 \mt{val} \; \mt{sql\_sum} : \mt{t} ::: \mt{Type} \to \mt{sql\_summable} \mt{t} \to \mt{sql\_aggregate} \; \mt{t}
adamc@544 1254 \end{array}$$
adamc@544 1255
adamc@544 1256 $$\begin{array}{l}
adamc@544 1257 \mt{class} \; \mt{sql\_maxable} \\
adamc@544 1258 \mt{val} \; \mt{sql\_maxable\_int} : \mt{sql\_maxable} \; \mt{int} \\
adamc@544 1259 \mt{val} \; \mt{sql\_maxable\_float} : \mt{sql\_maxable} \; \mt{float} \\
adamc@544 1260 \mt{val} \; \mt{sql\_maxable\_string} : \mt{sql\_maxable} \; \mt{string} \\
adamc@544 1261 \mt{val} \; \mt{sql\_maxable\_time} : \mt{sql\_maxable} \; \mt{time} \\
adamc@544 1262 \mt{val} \; \mt{sql\_max} : \mt{t} ::: \mt{Type} \to \mt{sql\_maxable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t} \\
adamc@544 1263 \mt{val} \; \mt{sql\_min} : \mt{t} ::: \mt{Type} \to \mt{sql\_maxable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t}
adamc@544 1264 \end{array}$$
adamc@544 1265
adamc@544 1266 We wrap up the definition of query syntax with the types used in representing $\mt{ORDER} \; \mt{BY}$, $\mt{LIMIT}$, and $\mt{OFFSET}$ clauses.
adamc@544 1267 $$\begin{array}{l}
adamc@544 1268 \mt{type} \; \mt{sql\_direction} \\
adamc@544 1269 \mt{val} \; \mt{sql\_asc} : \mt{sql\_direction} \\
adamc@544 1270 \mt{val} \; \mt{sql\_desc} : \mt{sql\_direction} \\
adamc@544 1271 \\
adamc@544 1272 \mt{con} \; \mt{sql\_order\_by} :: \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@544 1273 \mt{val} \; \mt{sql\_order\_by\_Nil} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} :: \{\mt{Type}\} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \\
adamc@544 1274 \mt{val} \; \mt{sql\_order\_by\_Cons} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1275 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tables} \; [] \; \mt{exps} \; \mt{t} \to \mt{sql\_direction} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \\
adamc@544 1276 \\
adamc@544 1277 \mt{type} \; \mt{sql\_limit} \\
adamc@544 1278 \mt{val} \; \mt{sql\_no\_limit} : \mt{sql\_limit} \\
adamc@544 1279 \mt{val} \; \mt{sql\_limit} : \mt{int} \to \mt{sql\_limit} \\
adamc@544 1280 \\
adamc@544 1281 \mt{type} \; \mt{sql\_offset} \\
adamc@544 1282 \mt{val} \; \mt{sql\_no\_offset} : \mt{sql\_offset} \\
adamc@544 1283 \mt{val} \; \mt{sql\_offset} : \mt{int} \to \mt{sql\_offset}
adamc@544 1284 \end{array}$$
adamc@544 1285
adamc@545 1286
adamc@545 1287 \subsubsection{DML}
adamc@545 1288
adamc@545 1289 The Ur/Web library also includes an embedding of a fragment of SQL's DML, the Data Manipulation Language, for modifying database tables. Any piece of DML may be executed in a transaction.
adamc@545 1290
adamc@545 1291 $$\begin{array}{l}
adamc@545 1292 \mt{type} \; \mt{dml} \\
adamc@545 1293 \mt{val} \; \mt{dml} : \mt{dml} \to \mt{transaction} \; \mt{unit}
adamc@545 1294 \end{array}$$
adamc@545 1295
adamc@545 1296 Properly-typed records may be used to form $\mt{INSERT}$ commands.
adamc@545 1297 $$\begin{array}{l}
adamc@545 1298 \mt{val} \; \mt{insert} : \mt{fields} ::: \{\mt{Type}\} \to \mt{sql\_table} \; \mt{fields} \\
adamc@545 1299 \hspace{.1in} \to \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{t} :: \mt{Type}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_exp} \; [] \; [] \; [] \; \mt{t}] \rc \mt{acc}) \; [] \; \mt{fields}) \to \mt{dml}
adamc@545 1300 \end{array}$$
adamc@545 1301
adamc@545 1302 An $\mt{UPDATE}$ command is formed from a choice of which table fields to leave alone and which to change, along with an expression to use to compute the new value of each changed field and a $\mt{WHERE}$ clause.
adamc@545 1303 $$\begin{array}{l}
adamc@545 1304 \mt{val} \; \mt{update} : \mt{unchanged} ::: \{\mt{Type}\} \to \mt{changed} :: \{\mt{Type}\} \to \lambda [\mt{changed} \sim \mt{unchanged}] \\
adamc@545 1305 \hspace{.1in} \Rightarrow \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{t} :: \mt{Type}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_exp} \; [\mt{T} = \mt{changed} \rc \mt{unchanged}] \; [] \; [] \; \mt{t}] \rc \mt{acc}) \; [] \; \mt{changed}) \\
adamc@545 1306 \hspace{.1in} \to \mt{sql\_table} \; (\mt{changed} \rc \mt{unchanged}) \to \mt{sql\_exp} \; [\mt{T} = \mt{changed} \rc \mt{unchanged}] \; [] \; [] \; \mt{bool} \to \mt{dml}
adamc@545 1307 \end{array}$$
adamc@545 1308
adamc@545 1309 A $\mt{DELETE}$ command is formed from a table and a $\mt{WHERE}$ clause.
adamc@545 1310 $$\begin{array}{l}
adamc@545 1311 \mt{val} \; \mt{delete} : \mt{fields} ::: \{\mt{Type}\} \to \mt{sql\_table} \; \mt{fields} \to \mt{sql\_exp} \; [\mt{T} = \mt{fields}] \; [] \; [] \; \mt{bool} \to \mt{dml}
adamc@545 1312 \end{array}$$
adamc@545 1313
adamc@546 1314 \subsubsection{Sequences}
adamc@546 1315
adamc@546 1316 SQL sequences are counters with concurrency control, often used to assign unique IDs. Ur/Web supports them via a simple interface. The only way to create a sequence is with the $\mt{sequence}$ declaration form.
adamc@546 1317
adamc@546 1318 $$\begin{array}{l}
adamc@546 1319 \mt{type} \; \mt{sql\_sequence} \\
adamc@546 1320 \mt{val} \; \mt{nextval} : \mt{sql\_sequence} \to \mt{transaction} \; \mt{int}
adamc@546 1321 \end{array}$$
adamc@546 1322
adamc@546 1323
adamc@547 1324 \subsection{XML}
adamc@547 1325
adamc@547 1326 Ur/Web's library contains an encoding of XML syntax and semantic constraints. We make no effort to follow the standards governing XML schemas. Rather, XML fragments are viewed more as values of ML datatypes, and we only track which tags are allowed inside which other tags.
adamc@547 1327
adamc@547 1328 The basic XML type family has arguments respectively indicating the \emph{context} of a fragment, the fields that the fragment expects to be bound on entry (and their types), and the fields that the fragment will bind (and their types). Contexts are a record-based ``poor man's subtyping'' encoding, with each possible set of valid tags corresponding to a different context record. The arguments dealing with field binding are only relevant to HTML forms.
adamc@547 1329 $$\begin{array}{l}
adamc@547 1330 \mt{con} \; \mt{xml} :: \{\mt{Unit}\} \to \{\mt{Type}\} \to \{\mt{Type}\} \to \mt{Type}
adamc@547 1331 \end{array}$$
adamc@547 1332
adamc@547 1333 We also have a type family of XML tags, indexed respectively by the record of optional attributes accepted by the tag, the context in which the tag may be placed, the context required of children of the tag, which form fields the tag uses, and which fields the tag defines.
adamc@547 1334 $$\begin{array}{l}
adamc@547 1335 \mt{con} \; \mt{tag} :: \{\mt{Type}\} \to \{\mt{Unit}\} \to \{\mt{Unit}\} \to \{\mt{Type}\} \to \{\mt{Type}\} \to \mt{Type}
adamc@547 1336 \end{array}$$
adamc@547 1337
adamc@547 1338 Literal text may be injected into XML as ``CDATA.''
adamc@547 1339 $$\begin{array}{l}
adamc@547 1340 \mt{val} \; \mt{cdata} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use} ::: \{\mt{Type}\} \to \mt{string} \to \mt{xml} \; \mt{ctx} \; \mt{use} \; []
adamc@547 1341 \end{array}$$
adamc@547 1342
adamc@547 1343 There is a function for producing an XML tree with a particular tag at its root.
adamc@547 1344 $$\begin{array}{l}
adamc@547 1345 \mt{val} \; \mt{tag} : \mt{attrsGiven} ::: \{\mt{Type}\} \to \mt{attrsAbsent} ::: \{\mt{Type}\} \to \mt{ctxOuter} ::: \{\mt{Unit}\} \to \mt{ctxInner} ::: \{\mt{Unit}\} \\
adamc@547 1346 \hspace{.1in} \to \mt{useOuter} ::: \{\mt{Type}\} \to \mt{useInner} ::: \{\mt{Type}\} \to \mt{bindOuter} ::: \{\mt{Type}\} \to \mt{bindInner} ::: \{\mt{Type}\} \\
adamc@547 1347 \hspace{.1in} \to \lambda [\mt{attrsGiven} \sim \mt{attrsAbsent}] \; [\mt{useOuter} \sim \mt{useInner}] \; [\mt{bindOuter} \sim \mt{bindInner}] \Rightarrow \$\mt{attrsGiven} \\
adamc@547 1348 \hspace{.1in} \to \mt{tag} \; (\mt{attrsGiven} \rc \mt{attrsAbsent}) \; \mt{ctxOuter} \; \mt{ctxInner} \; \mt{useOuter} \; \mt{bindOuter} \\
adamc@547 1349 \hspace{.1in} \to \mt{xml} \; \mt{ctxInner} \; \mt{useInner} \; \mt{bindInner} \to \mt{xml} \; \mt{ctxOuter} \; (\mt{useOuter} \rc \mt{useInner}) \; (\mt{bindOuter} \rc \mt{bindInner})
adamc@547 1350 \end{array}$$
adamc@547 1351
adamc@547 1352 Two XML fragments may be concatenated.
adamc@547 1353 $$\begin{array}{l}
adamc@547 1354 \mt{val} \; \mt{join} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use_1} ::: \{\mt{Type}\} \to \mt{bind_1} ::: \{\mt{Type}\} \to \mt{bind_2} ::: \{\mt{Type}\} \\
adamc@547 1355 \hspace{.1in} \to \lambda [\mt{use_1} \sim \mt{bind_1}] \; [\mt{bind_1} \sim \mt{bind_2}] \\
adamc@547 1356 \hspace{.1in} \Rightarrow \mt{xml} \; \mt{ctx} \; \mt{use_1} \; \mt{bind_1} \to \mt{xml} \; \mt{ctx} \; (\mt{use_1} \rc \mt{bind_1}) \; \mt{bind_2} \to \mt{xml} \; \mt{ctx} \; \mt{use_1} \; (\mt{bind_1} \rc \mt{bind_2})
adamc@547 1357 \end{array}$$
adamc@547 1358
adamc@547 1359 Finally, any XML fragment may be updated to ``claim'' to use more form fields than it does.
adamc@547 1360 $$\begin{array}{l}
adamc@547 1361 \mt{val} \; \mt{useMore} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use_1} ::: \{\mt{Type}\} \to \mt{use_2} ::: \{\mt{Type}\} \to \mt{bind} ::: \{\mt{Type}\} \to \lambda [\mt{use_1} \sim \mt{use_2}] \\
adamc@547 1362 \hspace{.1in} \Rightarrow \mt{xml} \; \mt{ctx} \; \mt{use_1} \; \mt{bind} \to \mt{xml} \; \mt{ctx} \; (\mt{use_1} \rc \mt{use_2}) \; \mt{bind}
adamc@547 1363 \end{array}$$
adamc@547 1364
adamc@547 1365 We will not list here the different HTML tags and related functions from the standard library. They should be easy enough to understand from the code in \texttt{basis.urs}. The set of tags in the library is not yet claimed to be complete for HTML standards.
adamc@547 1366
adamc@547 1367 One last useful function is for aborting any page generation, returning some XML as an error message. This function takes the place of some uses of a general exception mechanism.
adamc@547 1368 $$\begin{array}{l}
adamc@547 1369 \mt{val} \; \mt{error} : \mt{t} ::: \mt{Type} \to \mt{xml} \; [\mt{Body}] \; [] \; [] \to \mt{t}
adamc@547 1370 \end{array}$$
adamc@547 1371
adamc@549 1372
adamc@549 1373 \section{Ur/Web Syntax Extensions}
adamc@549 1374
adamc@549 1375 Ur/Web features some syntactic shorthands for building values using the functions from the last section. This section sketches the grammar of those extensions. We write spans of syntax inside brackets to indicate that they are optional.
adamc@549 1376
adamc@549 1377 \subsection{SQL}
adamc@549 1378
adamc@549 1379 \subsubsection{Queries}
adamc@549 1380
adamc@550 1381 Queries $Q$ are added to the rules for expressions $e$.
adamc@550 1382
adamc@549 1383 $$\begin{array}{rrcll}
adamc@550 1384 \textrm{Queries} & Q &::=& (q \; [\mt{ORDER} \; \mt{BY} \; (E \; [o],)^+] \; [\mt{LIMIT} \; N] \; [\mt{OFFSET} \; N]) \\
adamc@549 1385 \textrm{Pre-queries} & q &::=& \mt{SELECT} \; P \; \mt{FROM} \; T,^+ \; [\mt{WHERE} \; E] \; [\mt{GROUP} \; \mt{BY} \; p,^+] \; [\mt{HAVING} \; E] \\
adamc@549 1386 &&& \mid q \; R \; q \\
adamc@549 1387 \textrm{Relational operators} & R &::=& \mt{UNION} \mid \mt{INTERSECT} \mid \mt{EXCEPT}
adamc@549 1388 \end{array}$$
adamc@549 1389
adamc@549 1390 $$\begin{array}{rrcll}
adamc@549 1391 \textrm{Projections} & P &::=& \ast & \textrm{all columns} \\
adamc@549 1392 &&& p,^+ & \textrm{particular columns} \\
adamc@549 1393 \textrm{Pre-projections} & p &::=& t.f & \textrm{one column from a table} \\
adamc@549 1394 &&& t.\{\{c\}\} & \textrm{a record of colums from a table (of kind $\{\mt{Type}\}$)} \\
adamc@549 1395 \textrm{Table names} & t &::=& x & \textrm{constant table name (automatically capitalized)} \\
adamc@549 1396 &&& X & \textrm{constant table name} \\
adamc@549 1397 &&& \{\{c\}\} & \textrm{computed table name (of kind $\mt{Name}$)} \\
adamc@549 1398 \textrm{Column names} & f &::=& X & \textrm{constant column name} \\
adamc@549 1399 &&& \{c\} & \textrm{computed column name (of kind $\mt{Name}$)} \\
adamc@549 1400 \textrm{Tables} & T &::=& x & \textrm{table variable, named locally by its own capitalization} \\
adamc@549 1401 &&& x \; \mt{AS} \; t & \textrm{table variable, with local name} \\
adamc@549 1402 &&& \{\{e\}\} \; \mt{AS} \; t & \textrm{computed table expression, with local name} \\
adamc@549 1403 \textrm{SQL expressions} & E &::=& p & \textrm{column references} \\
adamc@549 1404 &&& X & \textrm{named expression references} \\
adamc@549 1405 &&& \{\{e\}\} & \textrm{injected native Ur expressions} \\
adamc@549 1406 &&& \{e\} & \textrm{computed expressions, probably using $\mt{sql\_exp}$ directly} \\
adamc@549 1407 &&& \mt{TRUE} \mid \mt{FALSE} & \textrm{boolean constants} \\
adamc@549 1408 &&& \ell & \textrm{primitive type literals} \\
adamc@549 1409 &&& \mt{NULL} & \textrm{null value (injection of $\mt{None}$)} \\
adamc@549 1410 &&& E \; \mt{IS} \; \mt{NULL} & \textrm{nullness test} \\
adamc@549 1411 &&& n & \textrm{nullary operators} \\
adamc@549 1412 &&& u \; E & \textrm{unary operators} \\
adamc@549 1413 &&& E \; b \; E & \textrm{binary operators} \\
adamc@549 1414 &&& \mt{COUNT}(\ast) & \textrm{count number of rows} \\
adamc@549 1415 &&& a(E) & \textrm{other aggregate function} \\
adamc@549 1416 &&& (E) & \textrm{explicit precedence} \\
adamc@549 1417 \textrm{Nullary operators} & n &::=& \mt{CURRENT\_TIMESTAMP} \\
adamc@549 1418 \textrm{Unary operators} & u &::=& \mt{NOT} \\
adamc@549 1419 \textrm{Binary operators} & b &::=& \mt{AND} \mid \mt{OR} \mid \neq \mid < \mid \leq \mid > \mid \geq \\
adamc@549 1420 \textrm{Aggregate functions} & a &::=& \mt{AVG} \mid \mt{SUM} \mid \mt{MIN} \mid \mt{MAX} \\
adamc@550 1421 \textrm{Directions} & o &::=& \mt{ASC} \mid \mt{DESC} \\
adamc@549 1422 \textrm{SQL integer} & N &::=& n \mid \{e\} \\
adamc@549 1423 \end{array}$$
adamc@549 1424
adamc@549 1425 Additionally, an SQL expression may be inserted into normal Ur code with the syntax $(\mt{SQL} \; E)$ or $(\mt{WHERE} \; E)$.
adamc@549 1426
adamc@550 1427 \subsubsection{DML}
adamc@550 1428
adamc@550 1429 DML commands $D$ are added to the rules for expressions $e$.
adamc@550 1430
adamc@550 1431 $$\begin{array}{rrcll}
adamc@550 1432 \textrm{Commands} & D &::=& (\mt{INSERT} \; \mt{INTO} \; T^E \; (f,^+) \; \mt{VALUES} \; (E,^+)) \\
adamc@550 1433 &&& (\mt{UPDATE} \; T^E \; \mt{SET} \; (f = E,)^+ \; \mt{WHERE} \; E) \\
adamc@550 1434 &&& (\mt{DELETE} \; \mt{FROM} \; T^E \; \mt{WHERE} \; E) \\
adamc@550 1435 \textrm{Table expressions} & T^E &::=& x \mid \{\{e\}\}
adamc@550 1436 \end{array}$$
adamc@550 1437
adamc@550 1438 Inside $\mt{UPDATE}$ and $\mt{DELETE}$ commands, lone variables $X$ are interpreted as references to columns of the implicit table $\mt{T}$, rather than to named expressions.
adamc@549 1439
adamc@551 1440 \subsection{XML}
adamc@551 1441
adamc@551 1442 XML fragments $L$ are added to the rules for expressions $e$.
adamc@551 1443
adamc@551 1444 $$\begin{array}{rrcll}
adamc@551 1445 \textrm{XML fragments} & L &::=& \texttt{<xml/>} \mid \texttt{<xml>}l^*\texttt{</xml>} \\
adamc@551 1446 \textrm{XML pieces} & l &::=& \textrm{text} & \textrm{cdata} \\
adamc@551 1447 &&& \texttt{<}g\texttt{/>} & \textrm{tag with no children} \\
adamc@551 1448 &&& \texttt{<}g\texttt{>}l^*\texttt{</}x\texttt{>} & \textrm{tag with children} \\
adamc@551 1449 \textrm{Tag} & g &::=& h \; (x = v)^* \\
adamc@551 1450 \textrm{Tag head} & h &::=& x & \textrm{tag name} \\
adamc@551 1451 &&& h\{c\} & \textrm{constructor parameter} \\
adamc@551 1452 \textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\
adamc@551 1453 &&& \{e\} & \textrm{computed value} \\
adamc@551 1454 \end{array}$$
adamc@551 1455
adamc@552 1456
adamc@553 1457 \section{The Structure of Web Applications}
adamc@553 1458
adamc@553 1459 A web application is built from a series of modules, with one module, the last one appearing in the \texttt{.urp} file, designated as the main module. The signature of the main module determines the URL entry points to the application. Such an entry point should have type $\mt{unit} \to \mt{transaction} \; \mt{page}$, where $\mt{page}$ is a type synonym for top-level HTML pages, defined in $\mt{Basis}$. If such a function is at the top level of main module $M$, it will be accessible at URI \texttt{/M/f}, and so on for more deeply-nested functions, as described in Section \ref{tag} below.
adamc@553 1460
adamc@553 1461 When the standalone web server receives a request for a known page, it calls the function for that page, ``running'' the resulting transaction to produce the page to return to the client. Pages link to other pages with the \texttt{link} attribute of the \texttt{a} HTML tag. A link has type $\mt{transaction} \; \mt{page}$, and the semantics of a link are that this transaction should be run to compute the result page, when the link is followed. Link targets are assigned URL names in the same way as top-level entry points.
adamc@553 1462
adamc@553 1463 HTML forms are handled in a similar way. The $\mt{action}$ attribute of a $\mt{submit}$ form tag takes a value of type $\$\mt{use} \to \mt{transaction} \; \mt{page}$, where $\mt{use}$ is a kind-$\{\mt{Type}\}$ record of the form fields used by this action handler. Action handlers are assigned URL patterns in the same way as above.
adamc@553 1464
adamc@553 1465 For both links and actions, direct arguments and local variables mentioned implicitly via closures are automatically included in serialized form in URLs, in the order in which they appeared in the source code.
adamc@553 1466
adamc@553 1467
adamc@552 1468 \section{Compiler Phases}
adamc@552 1469
adamc@552 1470 The Ur/Web compiler is unconventional in that it relies on a kind of \emph{heuristic compilation}. Not all valid programs will compile successfully. Informally, programs fail to compile when they are ``too higher order.'' Compiler phases do their best to eliminate different kinds of higher order-ness, but some programs just won't compile. This is a trade-off for producing very efficient executables. Compiled Ur/Web programs use native C representations and require no garbage collection.
adamc@552 1471
adamc@552 1472 In this section, we step through the main phases of compilation, noting what consequences each phase has for effective programming.
adamc@552 1473
adamc@552 1474 \subsection{Parse}
adamc@552 1475
adamc@552 1476 The compiler reads a \texttt{.urp} file, figures out which \texttt{.urs} and \texttt{.ur} files it references, and combines them all into what is conceptually a single sequence of declarations in the core language of Section \ref{core}.
adamc@552 1477
adamc@552 1478 \subsection{Elaborate}
adamc@552 1479
adamc@552 1480 This is where type inference takes place, translating programs into an explicit form with no more wildcards. This phase is the most likely source of compiler error messages.
adamc@552 1481
adamc@552 1482 \subsection{Unnest}
adamc@552 1483
adamc@552 1484 Named local function definitions are moved to the top level, to avoid the need to generate closures.
adamc@552 1485
adamc@552 1486 \subsection{Corify}
adamc@552 1487
adamc@552 1488 Module system features are compiled away, through inlining of functor definitions at application sites. Afterward, most abstraction boundaries are broken, facilitating optimization.
adamc@552 1489
adamc@552 1490 \subsection{Especialize}
adamc@552 1491
adamc@552 1492 Functions are specialized to particular argument patterns. This is an important trick for avoiding the need to maintain any closures at runtime.
adamc@552 1493
adamc@552 1494 \subsection{Untangle}
adamc@552 1495
adamc@552 1496 Remove unnecessary mutual recursion, splitting recursive groups into strongly-connected components.
adamc@552 1497
adamc@552 1498 \subsection{Shake}
adamc@552 1499
adamc@552 1500 Remove all definitions not needed to run the page handlers that are visible in the signature of the last module listed in the \texttt{.urp} file.
adamc@552 1501
adamc@553 1502 \subsection{\label{tag}Tag}
adamc@552 1503
adamc@552 1504 Assign a URL name to each link and form action. It is important that these links and actions are written as applications of named functions, because such names are used to generate URL patterns. A URL pattern has a name built from the full module path of the named function, followed by the function name, with all pieces separated by slashes. The path of a functor application is based on the name given to the result, rather than the path of the functor itself.
adamc@552 1505
adamc@552 1506 \subsection{Reduce}
adamc@552 1507
adamc@552 1508 Apply definitional equality rules to simplify the program as much as possible. This effectively includes inlining of every non-recursive definition.
adamc@552 1509
adamc@552 1510 \subsection{Unpoly}
adamc@552 1511
adamc@552 1512 This phase specializes polymorphic functions to the specific arguments passed to them in the program. If the program contains real polymorphic recursion, Unpoly will be insufficient to avoid later error messages about too much polymorphism.
adamc@552 1513
adamc@552 1514 \subsection{Specialize}
adamc@552 1515
adamc@552 1516 Replace uses of parametrized datatypes with versions specialized to specific parameters. As for Unpoly, this phase will not be effective enough in the presence of polymorphic recursion or other fancy uses of impredicative polymorphism.
adamc@552 1517
adamc@552 1518 \subsection{Shake}
adamc@552 1519
adamc@552 1520 Here the compiler repeats the earlier shake phase.
adamc@552 1521
adamc@552 1522 \subsection{Monoize}
adamc@552 1523
adamc@552 1524 Programs are translated to a new intermediate language without polymorphism or non-$\mt{Type}$ constructors. Error messages may pop up here if earlier phases failed to remove such features.
adamc@552 1525
adamc@552 1526 This is the stage at which concrete names are generated for cookies, tables, and sequences. They are named following the same convention as for links and actions, based on module path information saved from earlier stages. Table and sequence names separate path elements with underscores instead of slashes, and they are prefixed by \texttt{uw\_}.
adamc@552 1527 \subsection{MonoOpt}
adamc@552 1528
adamc@552 1529 Simple algebraic laws are applied to simplify the program, focusing especially on efficient imperative generation of HTML pages.
adamc@552 1530
adamc@552 1531 \subsection{MonoUntangle}
adamc@552 1532
adamc@552 1533 Unnecessary mutual recursion is broken up again.
adamc@552 1534
adamc@552 1535 \subsection{MonoReduce}
adamc@552 1536
adamc@552 1537 Equivalents of the definitional equality rules are applied to simplify programs, with inlining again playing a major role.
adamc@552 1538
adamc@552 1539 \subsection{MonoShake, MonoOpt}
adamc@552 1540
adamc@552 1541 Unneeded declarations are removed, and basic optimizations are repeated.
adamc@552 1542
adamc@552 1543 \subsection{Fuse}
adamc@552 1544
adamc@552 1545 The compiler tries to simplify calls to recursive functions whose results are immediately written as page output. The write action is pushed inside the function definitions to avoid allocation of intermediate results.
adamc@552 1546
adamc@552 1547 \subsection{MonoUntangle, MonoShake}
adamc@552 1548
adamc@552 1549 Fuse often creates more opportunities to remove spurious mutual recursion.
adamc@552 1550
adamc@552 1551 \subsection{Pathcheck}
adamc@552 1552
adamc@552 1553 The compiler checks that no link or action name has been used more than once.
adamc@552 1554
adamc@552 1555 \subsection{Cjrize}
adamc@552 1556
adamc@552 1557 The program is translated to what is more or less a subset of C. If any use of functions as data remains at this point, the compiler will complain.
adamc@552 1558
adamc@552 1559 \subsection{C Compilation and Linking}
adamc@552 1560
adamc@552 1561 The output of the last phase is pretty-printed as C source code and passed to GCC.
adamc@552 1562
adamc@552 1563
adamc@524 1564 \end{document}