annotate doc/manual.tex @ 575:9f02f1765149

Starting to implement source set
author Adam Chlipala <adamc@hcoop.net>
date Tue, 30 Dec 2008 09:43:41 -0500
parents af0df56ecc2c
children 33500a15b872
rev   line source
adamc@524 1 \documentclass{article}
adamc@554 2 \usepackage{fullpage,amsmath,amssymb,proof,url}
adamc@524 3
adamc@524 4 \newcommand{\cd}[1]{\texttt{#1}}
adamc@524 5 \newcommand{\mt}[1]{\mathsf{#1}}
adamc@524 6
adamc@524 7 \newcommand{\rc}{+ \hspace{-.075in} + \;}
adamc@527 8 \newcommand{\rcut}{\; \texttt{--} \;}
adamc@527 9 \newcommand{\rcutM}{\; \texttt{---} \;}
adamc@524 10
adamc@524 11 \begin{document}
adamc@524 12
adamc@524 13 \title{The Ur/Web Manual}
adamc@524 14 \author{Adam Chlipala}
adamc@524 15
adamc@524 16 \maketitle
adamc@524 17
adamc@540 18 \tableofcontents
adamc@540 19
adamc@554 20
adamc@554 21 \section{Introduction}
adamc@554 22
adamc@554 23 \emph{Ur} is a programming language designed to introduce richer type system features into functional programming in the tradition of ML and Haskell. Ur is functional, pure, statically-typed, and strict. Ur supports a powerful kind of \emph{metaprogramming} based on \emph{row types}.
adamc@554 24
adamc@554 25 \emph{Ur/Web} is Ur plus a special standard library and associated rules for parsing and optimization. Ur/Web supports construction of dynamic web applications backed by SQL databases. The signature of the standard library is such that well-typed Ur/Web programs ``don't go wrong'' in a very broad sense. Not only do they not crash during particular page generations, but they also may not:
adamc@554 26
adamc@554 27 \begin{itemize}
adamc@554 28 \item Suffer from any kinds of code-injection attacks
adamc@554 29 \item Return invalid HTML
adamc@554 30 \item Contain dead intra-application links
adamc@554 31 \item Have mismatches between HTML forms and the fields expected by their handlers
adamc@554 32 \item Attempt invalid SQL queries
adamc@554 33 \item Use improper marshaling or unmarshaling in communication with SQL databases
adamc@554 34 \end{itemize}
adamc@554 35
adamc@554 36 This type safety is just the foundation of the Ur/Web methodology. It is also possible to use metaprogramming to build significant application pieces by analysis of type structure. For instance, the demo includes an ML-style functor for building an admin interface for an arbitrary SQL table. The type system guarantees that the admin interface sub-application that comes out will always be free of the above-listed bugs, no matter which well-typed table description is given as input.
adamc@554 37
adamc@554 38 The Ur/Web compiler also produces very efficient object code that does not use garbage collection. These compiled programs will often be even more efficient than what most programmers would bother to write in C.
adamc@554 39
adamc@554 40 \medskip
adamc@554 41
adamc@554 42 The official web site for Ur is:
adamc@554 43 \begin{center}
adamc@554 44 \url{http://www.impredicative.com/ur/}
adamc@554 45 \end{center}
adamc@554 46
adamc@555 47
adamc@555 48 \section{Installation}
adamc@555 49
adamc@555 50 If you are lucky, then the following standard command sequence will suffice for installation, in a directory to which you have unpacked the latest distribution tarball.
adamc@555 51
adamc@555 52 \begin{verbatim}
adamc@555 53 ./configure
adamc@555 54 make
adamc@555 55 sudo make install
adamc@555 56 \end{verbatim}
adamc@555 57
adamc@555 58 Some other packages must be installed for the above to work. At a minimum, you need a standard UNIX shell, with standard UNIX tools like sed and GCC in your execution path; and MLton, the whole-program optimizing compiler for Standard ML. To build programs that access SQL databases, you also need libpq, the PostgreSQL client library. As of this writing, in the ``testing'' version of Debian Linux, this command will install the more uncommon of these dependencies:
adamc@555 59
adamc@555 60 \begin{verbatim}
adamc@555 61 apt-get install mlton libpq-dev
adamc@555 62 \end{verbatim}
adamc@555 63
adamc@555 64 It is also possible to access the modules of the Ur/Web compiler interactively, within Standard ML of New Jersey. To install the prerequisites in Debian testing:
adamc@555 65
adamc@555 66 \begin{verbatim}
adamc@555 67 apt-get install smlnj libsmlnj-smlnj ml-yacc ml-lpt
adamc@555 68 \end{verbatim}
adamc@555 69
adamc@555 70 To begin an interactive session with the Ur compiler modules, run \texttt{make smlnj}, and then, from within an \texttt{sml} session, run \texttt{CM.make "src/urweb.cm";}. The \texttt{Compiler} module is the main entry point.
adamc@555 71
adamc@555 72 To run an SQL-backed application, you will probably want to install the PostgreSQL server. Version 8.3 or higher is required.
adamc@555 73
adamc@555 74 \begin{verbatim}
adamc@555 75 apt-get install postgresql-8.3
adamc@555 76 \end{verbatim}
adamc@555 77
adamc@555 78 To use the Emacs mode, you must have a modern Emacs installed. We assume that you already know how to do this, if you're in the business of looking for an Emacs mode. The demo generation facility of the compiler will also call out to Emacs to syntax-highlight code, and that process depends on the \texttt{htmlize} module, which can be installed in Debian testing via:
adamc@555 79
adamc@555 80 \begin{verbatim}
adamc@555 81 apt-get install emacs-goodies-el
adamc@555 82 \end{verbatim}
adamc@555 83
adamc@555 84 Even with the right packages installed, configuration and building might fail to work. After you run \texttt{./configure}, you will see the values of some named environment variables printed. You may need to adjust these values to get proper installation for your system. To change a value, store your preferred alternative in the corresponding UNIX environment variable, before running \texttt{./configure}. For instance, here is how to change the list of extra arguments that the Ur/Web compiler will pass to GCC on every invocation.
adamc@555 85
adamc@555 86 \begin{verbatim}
adamc@555 87 GCCARGS=-fnested-functions ./configure
adamc@555 88 \end{verbatim}
adamc@555 89
adamc@555 90 Some OSX users have reported needing to use this particular GCCARGS value.
adamc@555 91
adamc@555 92 The Emacs mode can be set to autoload by adding the following to your \texttt{.emacs} file.
adamc@555 93
adamc@555 94 \begin{verbatim}
adamc@555 95 (add-to-list 'load-path "/usr/local/share/emacs/site-lisp/urweb-mode")
adamc@555 96 (load "urweb-mode-startup")
adamc@555 97 \end{verbatim}
adamc@555 98
adamc@555 99 Change the path in the first line if you chose a different Emacs installation path during configuration.
adamc@555 100
adamc@555 101
adamc@556 102 \section{Command-Line Compiler}
adamc@556 103
adamc@556 104 \subsection{Project Files}
adamc@556 105
adamc@556 106 The basic inputs to the \texttt{urweb} compiler are project files, which have the extension \texttt{.urp}. Here is a sample \texttt{.urp} file.
adamc@556 107
adamc@556 108 \begin{verbatim}
adamc@556 109 database dbname=test
adamc@556 110 sql crud1.sql
adamc@556 111
adamc@556 112 crud
adamc@556 113 crud1
adamc@556 114 \end{verbatim}
adamc@556 115
adamc@556 116 The \texttt{database} line gives the database information string to pass to libpq. In this case, the string only says to connect to a local database named \texttt{test}.
adamc@556 117
adamc@556 118 The \texttt{sql} line asks for an SQL source file to be generated, giving the commands to run to create the tables and sequences that this application expects to find. After building this \texttt{.urp} file, the following commands could be used to initialize the database, assuming that the current UNIX user exists as a Postgres user with database creation privileges:
adamc@556 119
adamc@556 120 \begin{verbatim}
adamc@556 121 createdb test
adamc@556 122 psql -f crud1.sql test
adamc@556 123 \end{verbatim}
adamc@556 124
adamc@556 125 A blank line always separates the named directives from a list of modules to include in the project; if there are no named directives, a blank line must begin the file.
adamc@556 126
adamc@556 127 For each entry \texttt{M} in the module list, the file \texttt{M.urs} is included in the project if it exists, and the file \texttt{M.ur} must exist and is always included.
adamc@556 128
adamc@556 129 A few other named directives are supported. \texttt{prefix PREFIX} sets the prefix included before every URI within the generated application; the default is \texttt{/}. \texttt{exe FILENAME} sets the filename to which to write the output executable; the default for file \texttt{P.urp} is \texttt{P.exe}. \texttt{debug} saves some intermediate C files, which is mostly useful to help in debugging the compiler itself. \texttt{profile} generates an executable that may be used with gprof.
adamc@556 130
adamc@557 131 \subsection{Building an Application}
adamc@557 132
adamc@557 133 To compile project \texttt{P.urp}, simply run
adamc@557 134 \begin{verbatim}
adamc@557 135 urweb P
adamc@557 136 \end{verbatim}
adamc@558 137 The output executable is a standalone web server. Run it with the command-line argument \texttt{-h} to see which options it takes. If the project file lists a database, the web server will attempt to connect to that database on startup.
adamc@557 138
adamc@557 139 To time how long the different compiler phases run, without generating an executable, run
adamc@557 140 \begin{verbatim}
adamc@557 141 urweb -timing P
adamc@557 142 \end{verbatim}
adamc@557 143
adamc@556 144
adamc@529 145 \section{Ur Syntax}
adamc@529 146
adamc@529 147 In this section, we describe the syntax of Ur, deferring to a later section discussion of most of the syntax specific to SQL and XML. The sole exceptions are the declaration forms for tables, sequences, and cookies.
adamc@524 148
adamc@524 149 \subsection{Lexical Conventions}
adamc@524 150
adamc@524 151 We give the Ur language definition in \LaTeX $\;$ math mode, since that is prettier than monospaced ASCII. The corresponding ASCII syntax can be read off directly. Here is the key for mapping math symbols to ASCII character sequences.
adamc@524 152
adamc@524 153 \begin{center}
adamc@524 154 \begin{tabular}{rl}
adamc@524 155 \textbf{\LaTeX} & \textbf{ASCII} \\
adamc@524 156 $\to$ & \cd{->} \\
adamc@524 157 $\times$ & \cd{*} \\
adamc@524 158 $\lambda$ & \cd{fn} \\
adamc@524 159 $\Rightarrow$ & \cd{=>} \\
adamc@529 160 $\neq$ & \cd{<>} \\
adamc@529 161 $\leq$ & \cd{<=} \\
adamc@529 162 $\geq$ & \cd{>=} \\
adamc@524 163 \\
adamc@524 164 $x$ & Normal textual identifier, not beginning with an uppercase letter \\
adamc@525 165 $X$ & Normal textual identifier, beginning with an uppercase letter \\
adamc@524 166 \end{tabular}
adamc@524 167 \end{center}
adamc@524 168
adamc@525 169 We often write syntax like $e^*$ to indicate zero or more copies of $e$, $e^+$ to indicate one or more copies, and $e,^*$ and $e,^+$ to indicate multiple copies separated by commas. Another separator may be used in place of a comma. The $e$ term may be surrounded by parentheses to indicate grouping; those parentheses should not be included in the actual ASCII.
adamc@524 170
adamc@526 171 We write $\ell$ for literals of the primitive types, for the most part following C conventions. There are $\mt{int}$, $\mt{float}$, and $\mt{string}$ literals.
adamc@526 172
adamc@527 173 This version of the manual doesn't include operator precedences; see \texttt{src/urweb.grm} for that.
adamc@527 174
adamc@552 175 \subsection{\label{core}Core Syntax}
adamc@524 176
adamc@524 177 \emph{Kinds} classify types and other compile-time-only entities. Each kind in the grammar is listed with a description of the sort of data it classifies.
adamc@524 178 $$\begin{array}{rrcll}
adamc@524 179 \textrm{Kinds} & \kappa &::=& \mt{Type} & \textrm{proper types} \\
adamc@525 180 &&& \mt{Unit} & \textrm{the trivial constructor} \\
adamc@525 181 &&& \mt{Name} & \textrm{field names} \\
adamc@525 182 &&& \kappa \to \kappa & \textrm{type-level functions} \\
adamc@525 183 &&& \{\kappa\} & \textrm{type-level records} \\
adamc@525 184 &&& (\kappa\times^+) & \textrm{type-level tuples} \\
adamc@529 185 &&& \_\_ & \textrm{wildcard} \\
adamc@525 186 &&& (\kappa) & \textrm{explicit precedence} \\
adamc@524 187 \end{array}$$
adamc@524 188
adamc@524 189 Ur supports several different notions of functions that take types as arguments. These arguments can be either implicit, causing them to be inferred at use sites; or explicit, forcing them to be specified manually at use sites. There is a common explicitness annotation convention applied at the definitions of and in the types of such functions.
adamc@524 190 $$\begin{array}{rrcll}
adamc@524 191 \textrm{Explicitness} & ? &::=& :: & \textrm{explicit} \\
adamc@558 192 &&& ::: & \textrm{implicit}
adamc@524 193 \end{array}$$
adamc@524 194
adamc@524 195 \emph{Constructors} are the main class of compile-time-only data. They include proper types and are classified by kinds.
adamc@524 196 $$\begin{array}{rrcll}
adamc@524 197 \textrm{Constructors} & c, \tau &::=& (c) :: \kappa & \textrm{kind annotation} \\
adamc@530 198 &&& \hat{x} & \textrm{constructor variable} \\
adamc@524 199 \\
adamc@525 200 &&& \tau \to \tau & \textrm{function type} \\
adamc@525 201 &&& x \; ? \; \kappa \to \tau & \textrm{polymorphic function type} \\
adamc@525 202 &&& \$ c & \textrm{record type} \\
adamc@524 203 \\
adamc@525 204 &&& c \; c & \textrm{type-level function application} \\
adamc@530 205 &&& \lambda x \; :: \; \kappa \Rightarrow c & \textrm{type-level function abstraction} \\
adamc@524 206 \\
adamc@525 207 &&& () & \textrm{type-level unit} \\
adamc@525 208 &&& \#X & \textrm{field name} \\
adamc@524 209 \\
adamc@525 210 &&& [(c = c)^*] & \textrm{known-length type-level record} \\
adamc@525 211 &&& c \rc c & \textrm{type-level record concatenation} \\
adamc@525 212 &&& \mt{fold} & \textrm{type-level record fold} \\
adamc@524 213 \\
adamc@558 214 &&& (c,^+) & \textrm{type-level tuple} \\
adamc@525 215 &&& c.n & \textrm{type-level tuple projection ($n \in \mathbb N^+$)} \\
adamc@524 216 \\
adamc@525 217 &&& \lambda [c \sim c] \Rightarrow c & \textrm{guarded constructor} \\
adamc@524 218 \\
adamc@529 219 &&& \_ :: \kappa & \textrm{wildcard} \\
adamc@525 220 &&& (c) & \textrm{explicit precedence} \\
adamc@530 221 \\
adamc@530 222 \textrm{Qualified uncapitalized variables} & \hat{x} &::=& x & \textrm{not from a module} \\
adamc@530 223 &&& M.x & \textrm{projection from a module} \\
adamc@525 224 \end{array}$$
adamc@525 225
adamc@525 226 Modules of the module system are described by \emph{signatures}.
adamc@525 227 $$\begin{array}{rrcll}
adamc@525 228 \textrm{Signatures} & S &::=& \mt{sig} \; s^* \; \mt{end} & \textrm{constant} \\
adamc@525 229 &&& X & \textrm{variable} \\
adamc@525 230 &&& \mt{functor}(X : S) : S & \textrm{functor} \\
adamc@529 231 &&& S \; \mt{where} \; \mt{con} \; x = c & \textrm{concretizing an abstract constructor} \\
adamc@525 232 &&& M.X & \textrm{projection from a module} \\
adamc@525 233 \\
adamc@525 234 \textrm{Signature items} & s &::=& \mt{con} \; x :: \kappa & \textrm{abstract constructor} \\
adamc@525 235 &&& \mt{con} \; x :: \kappa = c & \textrm{concrete constructor} \\
adamc@528 236 &&& \mt{datatype} \; x \; x^* = dc\mid^+ & \textrm{algebraic datatype definition} \\
adamc@529 237 &&& \mt{datatype} \; x = \mt{datatype} \; M.x & \textrm{algebraic datatype import} \\
adamc@525 238 &&& \mt{val} \; x : \tau & \textrm{value} \\
adamc@525 239 &&& \mt{structure} \; X : S & \textrm{sub-module} \\
adamc@525 240 &&& \mt{signature} \; X = S & \textrm{sub-signature} \\
adamc@525 241 &&& \mt{include} \; S & \textrm{signature inclusion} \\
adamc@525 242 &&& \mt{constraint} \; c \sim c & \textrm{record disjointness constraint} \\
adamc@525 243 &&& \mt{class} \; x & \textrm{abstract type class} \\
adamc@525 244 &&& \mt{class} \; x = c & \textrm{concrete type class} \\
adamc@525 245 \\
adamc@525 246 \textrm{Datatype constructors} & dc &::=& X & \textrm{nullary constructor} \\
adamc@525 247 &&& X \; \mt{of} \; \tau & \textrm{unary constructor} \\
adamc@524 248 \end{array}$$
adamc@524 249
adamc@526 250 \emph{Patterns} are used to describe structural conditions on expressions, such that expressions may be tested against patterns, generating assignments to pattern variables if successful.
adamc@526 251 $$\begin{array}{rrcll}
adamc@526 252 \textrm{Patterns} & p &::=& \_ & \textrm{wildcard} \\
adamc@526 253 &&& x & \textrm{variable} \\
adamc@526 254 &&& \ell & \textrm{constant} \\
adamc@526 255 &&& \hat{X} & \textrm{nullary constructor} \\
adamc@526 256 &&& \hat{X} \; p & \textrm{unary constructor} \\
adamc@526 257 &&& \{(x = p,)^*\} & \textrm{rigid record pattern} \\
adamc@526 258 &&& \{(x = p,)^+, \ldots\} & \textrm{flexible record pattern} \\
adamc@527 259 &&& (p) & \textrm{explicit precedence} \\
adamc@526 260 \\
adamc@529 261 \textrm{Qualified capitalized variables} & \hat{X} &::=& X & \textrm{not from a module} \\
adamc@526 262 &&& M.X & \textrm{projection from a module} \\
adamc@526 263 \end{array}$$
adamc@526 264
adamc@527 265 \emph{Expressions} are the main run-time entities, corresponding to both ``expressions'' and ``statements'' in mainstream imperative languages.
adamc@527 266 $$\begin{array}{rrcll}
adamc@527 267 \textrm{Expressions} & e &::=& e : \tau & \textrm{type annotation} \\
adamc@529 268 &&& \hat{x} & \textrm{variable} \\
adamc@529 269 &&& \hat{X} & \textrm{datatype constructor} \\
adamc@527 270 &&& \ell & \textrm{constant} \\
adamc@527 271 \\
adamc@527 272 &&& e \; e & \textrm{function application} \\
adamc@527 273 &&& \lambda x : \tau \Rightarrow e & \textrm{function abstraction} \\
adamc@527 274 &&& e [c] & \textrm{polymorphic function application} \\
adamc@527 275 &&& \lambda x \; ? \; \kappa \Rightarrow e & \textrm{polymorphic function abstraction} \\
adamc@527 276 \\
adamc@527 277 &&& \{(c = e,)^*\} & \textrm{known-length record} \\
adamc@527 278 &&& e.c & \textrm{record field projection} \\
adamc@527 279 &&& e \rc e & \textrm{record concatenation} \\
adamc@527 280 &&& e \rcut c & \textrm{removal of a single record field} \\
adamc@527 281 &&& e \rcutM c & \textrm{removal of multiple record fields} \\
adamc@527 282 &&& \mt{fold} & \textrm{fold over fields of a type-level record} \\
adamc@527 283 \\
adamc@527 284 &&& \mt{let} \; ed^* \; \mt{in} \; e \; \mt{end} & \textrm{local definitions} \\
adamc@527 285 \\
adamc@527 286 &&& \mt{case} \; e \; \mt{of} \; (p \Rightarrow e|)^+ & \textrm{pattern matching} \\
adamc@527 287 \\
adamc@527 288 &&& \lambda [c \sim c] \Rightarrow e & \textrm{guarded expression} \\
adamc@527 289 \\
adamc@527 290 &&& \_ & \textrm{wildcard} \\
adamc@527 291 &&& (e) & \textrm{explicit precedence} \\
adamc@527 292 \\
adamc@527 293 \textrm{Local declarations} & ed &::=& \cd{val} \; x : \tau = e & \textrm{non-recursive value} \\
adamc@527 294 &&& \cd{val} \; \cd{rec} \; (x : \tau = e \; \cd{and})^+ & \textrm{mutually-recursive values} \\
adamc@527 295 \end{array}$$
adamc@527 296
adamc@528 297 \emph{Declarations} primarily bring new symbols into context.
adamc@528 298 $$\begin{array}{rrcll}
adamc@528 299 \textrm{Declarations} & d &::=& \mt{con} \; x :: \kappa = c & \textrm{constructor synonym} \\
adamc@528 300 &&& \mt{datatype} \; x \; x^* = dc\mid^+ & \textrm{algebraic datatype definition} \\
adamc@529 301 &&& \mt{datatype} \; x = \mt{datatype} \; M.x & \textrm{algebraic datatype import} \\
adamc@528 302 &&& \mt{val} \; x : \tau = e & \textrm{value} \\
adamc@528 303 &&& \mt{val} \; \cd{rec} \; (x : \tau = e \; \mt{and})^+ & \textrm{mutually-recursive values} \\
adamc@528 304 &&& \mt{structure} \; X : S = M & \textrm{module definition} \\
adamc@528 305 &&& \mt{signature} \; X = S & \textrm{signature definition} \\
adamc@528 306 &&& \mt{open} \; M & \textrm{module inclusion} \\
adamc@528 307 &&& \mt{constraint} \; c \sim c & \textrm{record disjointness constraint} \\
adamc@528 308 &&& \mt{open} \; \mt{constraints} \; M & \textrm{inclusion of just the constraints from a module} \\
adamc@528 309 &&& \mt{table} \; x : c & \textrm{SQL table} \\
adamc@528 310 &&& \mt{sequence} \; x & \textrm{SQL sequence} \\
adamc@535 311 &&& \mt{cookie} \; x : \tau & \textrm{HTTP cookie} \\
adamc@528 312 &&& \mt{class} \; x = c & \textrm{concrete type class} \\
adamc@528 313 \\
adamc@529 314 \textrm{Modules} & M &::=& \mt{struct} \; d^* \; \mt{end} & \textrm{constant} \\
adamc@529 315 &&& X & \textrm{variable} \\
adamc@529 316 &&& M.X & \textrm{projection} \\
adamc@529 317 &&& M(M) & \textrm{functor application} \\
adamc@529 318 &&& \mt{functor}(X : S) : S = M & \textrm{functor abstraction} \\
adamc@528 319 \end{array}$$
adamc@528 320
adamc@528 321 There are two kinds of Ur files. A file named $M\texttt{.ur}$ is an \emph{implementation file}, and it should contain a sequence of declarations $d^*$. A file named $M\texttt{.urs}$ is an \emph{interface file}; it must always have a matching $M\texttt{.ur}$ and should contain a sequence of signature items $s^*$. When both files are present, the overall effect is the same as a monolithic declaration $\mt{structure} \; M : \mt{sig} \; s^* \; \mt{end} = \mt{struct} \; d^* \; \mt{end}$. When no interface file is included, the overall effect is similar, with a signature for module $M$ being inferred rather than just checked against an interface.
adamc@527 322
adamc@529 323 \subsection{Shorthands}
adamc@529 324
adamc@529 325 There are a variety of derived syntactic forms that elaborate into the core syntax from the last subsection. We will present the additional forms roughly following the order in which we presented the constructs that they elaborate into.
adamc@529 326
adamc@529 327 In many contexts where record fields are expected, like in a projection $e.c$, a constant field may be written as simply $X$, rather than $\#X$.
adamc@529 328
adamc@529 329 A record type may be written $\{(c = c,)^*\}$, which elaborates to $\$[(c = c,)^*]$.
adamc@529 330
adamc@533 331 The notation $[c_1, \ldots, c_n]$ is shorthand for $[c_1 = (), \ldots, c_n = ()]$.
adamc@533 332
adamc@529 333 A tuple type $(\tau_1, \ldots, \tau_n)$ expands to a record type $\{1 = \tau_1, \ldots, n = \tau_n\}$, with natural numbers as field names. A tuple pattern $(p_1, \ldots, p_n)$ expands to a rigid record pattern $\{1 = p_1, \ldots, n = p_n\}$. Positive natural numbers may be used in most places where field names would be allowed.
adamc@529 334
adamc@529 335 In general, several adjacent $\lambda$ forms may be combined into one, and kind and type annotations may be omitted, in which case they are implicitly included as wildcards. More formally, for constructor-level abstractions, we can define a new non-terminal $b ::= x \mid (x :: \kappa) \mid [c \sim c]$ and allow composite abstractions of the form $\lambda b^+ \Rightarrow c$, elaborating into the obvious sequence of one core $\lambda$ per element of $b^+$.
adamc@529 336
adamc@529 337 For any signature item or declaration that defines some entity to be equal to $A$ with classification annotation $B$ (e.g., $\mt{val} \; x : B = A$), $B$ and the preceding colon (or similar punctuation) may be omitted, in which case it is filled in as a wildcard.
adamc@529 338
adamc@529 339 A signature item or declaration $\mt{type} \; x$ or $\mt{type} \; x = \tau$ is elaborated into $\mt{con} \; x :: \mt{Type}$ or $\mt{con} \; x :: \mt{Type} = \tau$, respectively.
adamc@529 340
adamc@529 341 A signature item or declaration $\mt{class} \; x = \lambda y :: \mt{Type} \Rightarrow c$ may be abbreviated $\mt{class} \; x \; y = c$.
adamc@529 342
adamc@529 343 Handling of implicit and explicit constructor arguments may be tweaked with some prefixes to variable references. An expression $@x$ is a version of $x$ where all implicit constructor arguments have been made explicit. An expression $@@x$ achieves the same effect, additionally halting automatic resolution of type class instances. The same syntax works for variables projected out of modules and for capitalized variables (datatype constructors).
adamc@529 344
adamc@529 345 At the expression level, an analogue is available of the composite $\lambda$ form for constructors. We define the language of binders as $b ::= x \mid (x : \tau) \mid (x \; ? \; \kappa) \mid [c \sim c]$. A lone variable $x$ as a binder stands for an expression variable of unspecified type.
adamc@529 346
adamc@529 347 A $\mt{val}$ or $\mt{val} \; \mt{rec}$ declaration may include expression binders before the equal sign, following the binder grammar from the last paragraph. Such declarations are elaborated into versions that add additional $\lambda$s to the fronts of the righthand sides, as appropriate. The keyword $\mt{fun}$ is a synonym for $\mt{val} \; \mt{rec}$.
adamc@529 348
adamc@529 349 A signature item $\mt{functor} \; X_1 \; (X_2 : S_1) : S_2$ is elaborated into $\mt{structure} \; X_1 : \mt{functor}(X_2 : S_1) : S_2$. A declaration $\mt{functor} \; X_1 \; (X_2 : S_1) : S_2 = M$ is elaborated into $\mt{structure} \; X_1 : \mt{functor}(X_2 : S_1) : S_2 = \mt{functor}(X_2 : S_1) : S_2 = M$.
adamc@529 350
adamc@529 351 A declaration $\mt{table} \; x : \{(c = c,)^*\}$ is elaborated into $\mt{table} \; x : [(c = c,)^*]$
adamc@529 352
adamc@529 353 The syntax $\mt{where} \; \mt{type}$ is an alternate form of $\mt{where} \; \mt{con}$.
adamc@529 354
adamc@529 355 The syntax $\mt{if} \; e \; \mt{then} \; e_1 \; \mt{else} \; e_2$ expands to $\mt{case} \; e \; \mt{of} \; \mt{Basis}.\mt{True} \Rightarrow e_1 \mid \mt{Basis}.\mt{False} \Rightarrow e_2$.
adamc@529 356
adamc@529 357 There are infix operator syntaxes for a number of functions defined in the $\mt{Basis}$ module. There is $=$ for $\mt{eq}$, $\neq$ for $\mt{neq}$, $-$ for $\mt{neg}$ (as a prefix operator) and $\mt{minus}$, $+$ for $\mt{plus}$, $\times$ for $\mt{times}$, $/$ for $\mt{div}$, $\%$ for $\mt{mod}$, $<$ for $\mt{lt}$, $\leq$ for $\mt{le}$, $>$ for $\mt{gt}$, and $\geq$ for $\mt{ge}$.
adamc@529 358
adamc@529 359 A signature item $\mt{table} \; x : c$ is shorthand for $\mt{val} \; x : \mt{Basis}.\mt{sql\_table} \; c$. $\mt{sequence} \; x$ is short for $\mt{val} \; x : \mt{Basis}.\mt{sql\_sequence}$, and $\mt{cookie} \; x : \tau$ is shorthand for $\mt{val} \; x : \mt{Basis}.\mt{http\_cookie} \; \tau$.
adamc@529 360
adamc@530 361
adamc@530 362 \section{Static Semantics}
adamc@530 363
adamc@530 364 In this section, we give a declarative presentation of Ur's typing rules and related judgments. Inference is the subject of the next section; here, we assume that an oracle has filled in all wildcards with concrete values.
adamc@530 365
adamc@530 366 Since there is significant mutual recursion among the judgments, we introduce them all before beginning to give rules. We use the same variety of contexts throughout this section, implicitly introducing new sorts of context entries as needed.
adamc@530 367 \begin{itemize}
adamc@530 368 \item $\Gamma \vdash c :: \kappa$ assigns a kind to a constructor in a context.
adamc@530 369 \item $\Gamma \vdash c \sim c$ proves the disjointness of two record constructors; that is, that they share no field names. We overload the judgment to apply to pairs of field names as well.
adamc@531 370 \item $\Gamma \vdash c \hookrightarrow C$ proves that record constructor $c$ decomposes into set $C$ of field names and record constructors.
adamc@530 371 \item $\Gamma \vdash c \equiv c$ proves the computational equivalence of two constructors. This is often called a \emph{definitional equality} in the world of type theory.
adamc@530 372 \item $\Gamma \vdash e : \tau$ is a standard typing judgment.
adamc@534 373 \item $\Gamma \vdash p \leadsto \Gamma; \tau$ combines typing of patterns with calculation of which new variables they bind.
adamc@537 374 \item $\Gamma \vdash d \leadsto \Gamma$ expresses how a declaration modifies a context. We overload this judgment to apply to sequences of declarations, as well as to signature items and sequences of signature items.
adamc@537 375 \item $\Gamma \vdash S \equiv S$ is the signature equivalence judgment.
adamc@536 376 \item $\Gamma \vdash S \leq S$ is the signature compatibility judgment. We write $\Gamma \vdash S$ as shorthand for $\Gamma \vdash S \leq S$.
adamc@530 377 \item $\Gamma \vdash M : S$ is the module signature checking judgment.
adamc@537 378 \item $\mt{proj}(M, \overline{s}, V)$ is a partial function for projecting a signature item from $\overline{s}$, given the module $M$ that we project from. $V$ may be $\mt{con} \; x$, $\mt{datatype} \; x$, $\mt{val} \; x$, $\mt{signature} \; X$, or $\mt{structure} \; X$. The parameter $M$ is needed because the projected signature item may refer to other items from $\overline{s}$.
adamc@539 379 \item $\mt{selfify}(M, \overline{s})$ adds information to signature items $\overline{s}$ to reflect the fact that we are concerned with the particular module $M$. This function is overloaded to work over individual signature items as well.
adamc@530 380 \end{itemize}
adamc@530 381
adamc@530 382 \subsection{Kinding}
adamc@530 383
adamc@530 384 $$\infer{\Gamma \vdash (c) :: \kappa :: \kappa}{
adamc@530 385 \Gamma \vdash c :: \kappa
adamc@530 386 }
adamc@530 387 \quad \infer{\Gamma \vdash x :: \kappa}{
adamc@530 388 x :: \kappa \in \Gamma
adamc@530 389 }
adamc@530 390 \quad \infer{\Gamma \vdash x :: \kappa}{
adamc@530 391 x :: \kappa = c \in \Gamma
adamc@530 392 }$$
adamc@530 393
adamc@530 394 $$\infer{\Gamma \vdash M.x :: \kappa}{
adamc@537 395 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 396 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = \kappa
adamc@530 397 }
adamc@530 398 \quad \infer{\Gamma \vdash M.x :: \kappa}{
adamc@537 399 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 400 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = (\kappa, c)
adamc@530 401 }$$
adamc@530 402
adamc@530 403 $$\infer{\Gamma \vdash \tau_1 \to \tau_2 :: \mt{Type}}{
adamc@530 404 \Gamma \vdash \tau_1 :: \mt{Type}
adamc@530 405 & \Gamma \vdash \tau_2 :: \mt{Type}
adamc@530 406 }
adamc@530 407 \quad \infer{\Gamma \vdash x \; ? \: \kappa \to \tau :: \mt{Type}}{
adamc@530 408 \Gamma, x :: \kappa \vdash \tau :: \mt{Type}
adamc@530 409 }
adamc@530 410 \quad \infer{\Gamma \vdash \$c :: \mt{Type}}{
adamc@530 411 \Gamma \vdash c :: \{\mt{Type}\}
adamc@530 412 }$$
adamc@530 413
adamc@530 414 $$\infer{\Gamma \vdash c_1 \; c_2 :: \kappa_2}{
adamc@530 415 \Gamma \vdash c_1 :: \kappa_1 \to \kappa_2
adamc@530 416 & \Gamma \vdash c_2 :: \kappa_1
adamc@530 417 }
adamc@530 418 \quad \infer{\Gamma \vdash \lambda x \; :: \; \kappa_1 \Rightarrow c :: \kappa_1 \to \kappa_2}{
adamc@530 419 \Gamma, x :: \kappa_1 \vdash c :: \kappa_2
adamc@530 420 }$$
adamc@530 421
adamc@530 422 $$\infer{\Gamma \vdash () :: \mt{Unit}}{}
adamc@530 423 \quad \infer{\Gamma \vdash \#X :: \mt{Name}}{}$$
adamc@530 424
adamc@530 425 $$\infer{\Gamma \vdash [\overline{c_i = c'_i}] :: \{\kappa\}}{
adamc@530 426 \forall i: \Gamma \vdash c_i : \mt{Name}
adamc@530 427 & \Gamma \vdash c'_i :: \kappa
adamc@530 428 & \forall i \neq j: \Gamma \vdash c_i \sim c_j
adamc@530 429 }
adamc@530 430 \quad \infer{\Gamma \vdash c_1 \rc c_2 :: \{\kappa\}}{
adamc@530 431 \Gamma \vdash c_1 :: \{\kappa\}
adamc@530 432 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@530 433 & \Gamma \vdash c_1 \sim c_2
adamc@530 434 }$$
adamc@530 435
adamc@530 436 $$\infer{\Gamma \vdash \mt{fold} :: ((\mt{Name} \to \kappa_1 \to \kappa_2 \to \kappa_2) \to \kappa_2 \to \{\kappa_1\} \to \kappa_2}{}$$
adamc@530 437
adamc@530 438 $$\infer{\Gamma \vdash (\overline c) :: (k_1 \times \ldots \times k_n)}{
adamc@530 439 \forall i: \Gamma \vdash c_i :: k_i
adamc@530 440 }
adamc@530 441 \quad \infer{\Gamma \vdash c.i :: k_i}{
adamc@530 442 \Gamma \vdash c :: (k_1 \times \ldots \times k_n)
adamc@530 443 }$$
adamc@530 444
adamc@530 445 $$\infer{\Gamma \vdash \lambda [c_1 \sim c_2] \Rightarrow c :: \kappa}{
adamc@530 446 \Gamma \vdash c_1 :: \{\kappa'\}
adamc@530 447 & \Gamma \vdash c_2 :: \{\kappa'\}
adamc@530 448 & \Gamma, c_1 \sim c_2 \vdash c :: \kappa
adamc@530 449 }$$
adamc@530 450
adamc@531 451 \subsection{Record Disjointness}
adamc@531 452
adamc@531 453 We will use a keyword $\mt{map}$ as a shorthand, such that, for $f$ of kind $\kappa \to \kappa'$, $\mt{map} \; f$ stands for $\mt{fold} \; (\lambda (x_1 :: \mt{Name}) (x_2 :: \kappa) (x_3 :: \{\kappa'\}) \Rightarrow [x_1 = f \; x_2] \rc x_3) \; []$.
adamc@531 454
adamc@531 455 $$\infer{\Gamma \vdash c_1 \sim c_2}{
adamc@558 456 \Gamma \vdash c_1 \hookrightarrow C_1
adamc@558 457 & \Gamma \vdash c_2 \hookrightarrow C_2
adamc@558 458 & \forall c'_1 \in C_1, c'_2 \in C_2: \Gamma \vdash c'_1 \sim c'_2
adamc@531 459 }
adamc@531 460 \quad \infer{\Gamma \vdash X \sim X'}{
adamc@531 461 X \neq X'
adamc@531 462 }$$
adamc@531 463
adamc@531 464 $$\infer{\Gamma \vdash c_1 \sim c_2}{
adamc@531 465 c'_1 \sim c'_2 \in \Gamma
adamc@558 466 & \Gamma \vdash c'_1 \hookrightarrow C_1
adamc@558 467 & \Gamma \vdash c'_2 \hookrightarrow C_2
adamc@558 468 & c_1 \in C_1
adamc@558 469 & c_2 \in C_2
adamc@531 470 }$$
adamc@531 471
adamc@531 472 $$\infer{\Gamma \vdash c \hookrightarrow \{c\}}{}
adamc@531 473 \quad \infer{\Gamma \vdash [\overline{c = c'}] \hookrightarrow \{\overline{c}\}}{}
adamc@531 474 \quad \infer{\Gamma \vdash c_1 \rc c_2 \hookrightarrow C_1 \cup C_2}{
adamc@531 475 \Gamma \vdash c_1 \hookrightarrow C_1
adamc@531 476 & \Gamma \vdash c_2 \hookrightarrow C_2
adamc@531 477 }
adamc@531 478 \quad \infer{\Gamma \vdash c \hookrightarrow C}{
adamc@531 479 \Gamma \vdash c \equiv c'
adamc@531 480 & \Gamma \vdash c' \hookrightarrow C
adamc@531 481 }
adamc@531 482 \quad \infer{\Gamma \vdash \mt{map} \; f \; c \hookrightarrow C}{
adamc@531 483 \Gamma \vdash c \hookrightarrow C
adamc@531 484 }$$
adamc@531 485
adamc@541 486 \subsection{\label{definitional}Definitional Equality}
adamc@532 487
adamc@532 488 We use $\mathcal C$ to stand for a one-hole context that, when filled, yields a constructor. The notation $\mathcal C[c]$ plugs $c$ into $\mathcal C$. We omit the standard definition of one-hole contexts. We write $[x \mapsto c_1]c_2$ for capture-avoiding substitution of $c_1$ for $x$ in $c_2$.
adamc@532 489
adamc@532 490 $$\infer{\Gamma \vdash c \equiv c}{}
adamc@532 491 \quad \infer{\Gamma \vdash c_1 \equiv c_2}{
adamc@532 492 \Gamma \vdash c_2 \equiv c_1
adamc@532 493 }
adamc@532 494 \quad \infer{\Gamma \vdash c_1 \equiv c_3}{
adamc@532 495 \Gamma \vdash c_1 \equiv c_2
adamc@532 496 & \Gamma \vdash c_2 \equiv c_3
adamc@532 497 }
adamc@532 498 \quad \infer{\Gamma \vdash \mathcal C[c_1] \equiv \mathcal C[c_2]}{
adamc@532 499 \Gamma \vdash c_1 \equiv c_2
adamc@532 500 }$$
adamc@532 501
adamc@532 502 $$\infer{\Gamma \vdash x \equiv c}{
adamc@532 503 x :: \kappa = c \in \Gamma
adamc@532 504 }
adamc@532 505 \quad \infer{\Gamma \vdash M.x \equiv c}{
adamc@537 506 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 507 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = (\kappa, c)
adamc@532 508 }
adamc@532 509 \quad \infer{\Gamma \vdash (\overline c).i \equiv c_i}{}$$
adamc@532 510
adamc@532 511 $$\infer{\Gamma \vdash (\lambda x :: \kappa \Rightarrow c) \; c' \equiv [x \mapsto c'] c}{}
adamc@532 512 \quad \infer{\Gamma \vdash c_1 \rc c_2 \equiv c_2 \rc c_1}{}
adamc@532 513 \quad \infer{\Gamma \vdash c_1 \rc (c_2 \rc c_3) \equiv (c_1 \rc c_2) \rc c_3}{}$$
adamc@532 514
adamc@532 515 $$\infer{\Gamma \vdash [] \rc c \equiv c}{}
adamc@532 516 \quad \infer{\Gamma \vdash [\overline{c_1 = c'_1}] \rc [\overline{c_2 = c'_2}] \equiv [\overline{c_1 = c'_1}, \overline{c_2 = c'_2}]}{}$$
adamc@532 517
adamc@532 518 $$\infer{\Gamma \vdash \lambda [c_1 \sim c_2] \Rightarrow c \equiv c}{
adamc@532 519 \Gamma \vdash c_1 \sim c_2
adamc@532 520 }
adamc@532 521 \quad \infer{\Gamma \vdash \mt{fold} \; f \; i \; [] \equiv i}{}
adamc@532 522 \quad \infer{\Gamma \vdash \mt{fold} \; f \; i \; ([c_1 = c_2] \rc c) \equiv f \; c_1 \; c_2 \; (\mt{fold} \; f \; i \; c)}{}$$
adamc@532 523
adamc@532 524 $$\infer{\Gamma \vdash \mt{map} \; (\lambda x \Rightarrow x) \; c \equiv c}{}
adamc@532 525 \quad \infer{\Gamma \vdash \mt{fold} \; f \; i \; (\mt{map} \; f' \; c)
adamc@532 526 \equiv \mt{fold} \; (\lambda (x_1 :: \mt{Name}) (x_2 :: \kappa) \Rightarrow f \; x_1 \; (f' \; x_2)) \; i \; c}{}$$
adamc@532 527
adamc@532 528 $$\infer{\Gamma \vdash \mt{map} \; f \; (c_1 \rc c_2) \equiv \mt{map} \; f \; c_1 \rc \mt{map} \; f \; c_2}{}$$
adamc@531 529
adamc@534 530 \subsection{Expression Typing}
adamc@533 531
adamc@533 532 We assume the existence of a function $T$ assigning types to literal constants. It maps integer constants to $\mt{Basis}.\mt{int}$, float constants to $\mt{Basis}.\mt{float}$, and string constants to $\mt{Basis}.\mt{string}$.
adamc@533 533
adamc@533 534 We also refer to a function $\mathcal I$, such that $\mathcal I(\tau)$ ``uses an oracle'' to instantiate all constructor function arguments at the beginning of $\tau$ that are marked implicit; i.e., replace $x_1 ::: \kappa_1 \to \ldots \to x_n ::: \kappa_n \to \tau$ with $[x_1 \mapsto c_1]\ldots[x_n \mapsto c_n]\tau$, where the $c_i$s are inferred and $\tau$ does not start like $x ::: \kappa \to \tau'$.
adamc@533 535
adamc@533 536 $$\infer{\Gamma \vdash e : \tau : \tau}{
adamc@533 537 \Gamma \vdash e : \tau
adamc@533 538 }
adamc@533 539 \quad \infer{\Gamma \vdash e : \tau}{
adamc@533 540 \Gamma \vdash e : \tau'
adamc@533 541 & \Gamma \vdash \tau' \equiv \tau
adamc@533 542 }
adamc@533 543 \quad \infer{\Gamma \vdash \ell : T(\ell)}{}$$
adamc@533 544
adamc@533 545 $$\infer{\Gamma \vdash x : \mathcal I(\tau)}{
adamc@533 546 x : \tau \in \Gamma
adamc@533 547 }
adamc@533 548 \quad \infer{\Gamma \vdash M.x : \mathcal I(\tau)}{
adamc@537 549 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 550 & \mt{proj}(M, \overline{s}, \mt{val} \; x) = \tau
adamc@533 551 }
adamc@533 552 \quad \infer{\Gamma \vdash X : \mathcal I(\tau)}{
adamc@533 553 X : \tau \in \Gamma
adamc@533 554 }
adamc@533 555 \quad \infer{\Gamma \vdash M.X : \mathcal I(\tau)}{
adamc@537 556 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 557 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \tau
adamc@533 558 }$$
adamc@533 559
adamc@533 560 $$\infer{\Gamma \vdash e_1 \; e_2 : \tau_2}{
adamc@533 561 \Gamma \vdash e_1 : \tau_1 \to \tau_2
adamc@533 562 & \Gamma \vdash e_2 : \tau_1
adamc@533 563 }
adamc@533 564 \quad \infer{\Gamma \vdash \lambda x : \tau_1 \Rightarrow e : \tau_1 \to \tau_2}{
adamc@533 565 \Gamma, x : \tau_1 \vdash e : \tau_2
adamc@533 566 }$$
adamc@533 567
adamc@533 568 $$\infer{\Gamma \vdash e [c] : [x \mapsto c]\tau}{
adamc@533 569 \Gamma \vdash e : x :: \kappa \to \tau
adamc@533 570 & \Gamma \vdash c :: \kappa
adamc@533 571 }
adamc@533 572 \quad \infer{\Gamma \vdash \lambda x \; ? \; \kappa \Rightarrow e : x \; ? \; \kappa \to \tau}{
adamc@533 573 \Gamma, x :: \kappa \vdash e : \tau
adamc@533 574 }$$
adamc@533 575
adamc@533 576 $$\infer{\Gamma \vdash \{\overline{c = e}\} : \{\overline{c : \tau}\}}{
adamc@533 577 \forall i: \Gamma \vdash c_i :: \mt{Name}
adamc@533 578 & \Gamma \vdash e_i : \tau_i
adamc@533 579 & \forall i \neq j: \Gamma \vdash c_i \sim c_j
adamc@533 580 }
adamc@533 581 \quad \infer{\Gamma \vdash e.c : \tau}{
adamc@533 582 \Gamma \vdash e : \$([c = \tau] \rc c')
adamc@533 583 }
adamc@533 584 \quad \infer{\Gamma \vdash e_1 \rc e_2 : \$(c_1 \rc c_2)}{
adamc@533 585 \Gamma \vdash e_1 : \$c_1
adamc@533 586 & \Gamma \vdash e_2 : \$c_2
adamc@533 587 }$$
adamc@533 588
adamc@533 589 $$\infer{\Gamma \vdash e \rcut c : \$c'}{
adamc@533 590 \Gamma \vdash e : \$([c = \tau] \rc c')
adamc@533 591 }
adamc@533 592 \quad \infer{\Gamma \vdash e \rcutM c : \$c'}{
adamc@533 593 \Gamma \vdash e : \$(c \rc c')
adamc@533 594 }$$
adamc@533 595
adamc@533 596 $$\infer{\Gamma \vdash \mt{fold} : \begin{array}{c}
adamc@533 597 x_1 :: (\{\kappa\} \to \tau)
adamc@533 598 \to (x_2 :: \mt{Name} \to x_3 :: \kappa \to x_4 :: \{\kappa\} \to \lambda [[x_2] \sim x_4]
adamc@533 599 \Rightarrow x_1 \; x_4 \to x_1 \; ([x_2 = x_3] \rc x_4)) \\
adamc@533 600 \to x_1 \; [] \to x_5 :: \{\kappa\} \to x_1 \; x_5
adamc@533 601 \end{array}}{}$$
adamc@533 602
adamc@533 603 $$\infer{\Gamma \vdash \mt{let} \; \overline{ed} \; \mt{in} \; e \; \mt{end} : \tau}{
adamc@533 604 \Gamma \vdash \overline{ed} \leadsto \Gamma'
adamc@533 605 & \Gamma' \vdash e : \tau
adamc@533 606 }
adamc@533 607 \quad \infer{\Gamma \vdash \mt{case} \; e \; \mt{of} \; \overline{p \Rightarrow e} : \tau}{
adamc@533 608 \forall i: \Gamma \vdash p_i \leadsto \Gamma_i, \tau'
adamc@533 609 & \Gamma_i \vdash e_i : \tau
adamc@533 610 }$$
adamc@533 611
adamc@533 612 $$\infer{\Gamma \vdash [c_1 \sim c_2] \Rightarrow e : [c_1 \sim c_2] \Rightarrow \tau}{
adamc@533 613 \Gamma \vdash c_1 :: \{\kappa\}
adamc@533 614 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@533 615 & \Gamma, c_1 \sim c_2 \vdash e : \tau
adamc@533 616 }$$
adamc@533 617
adamc@534 618 \subsection{Pattern Typing}
adamc@534 619
adamc@534 620 $$\infer{\Gamma \vdash \_ \leadsto \Gamma; \tau}{}
adamc@534 621 \quad \infer{\Gamma \vdash x \leadsto \Gamma, x : \tau; \tau}{}
adamc@534 622 \quad \infer{\Gamma \vdash \ell \leadsto \Gamma; T(\ell)}{}$$
adamc@534 623
adamc@534 624 $$\infer{\Gamma \vdash X \leadsto \Gamma; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@534 625 X : \overline{x ::: \mt{Type}} \to \tau \in \Gamma
adamc@534 626 & \textrm{$\tau$ not a function type}
adamc@534 627 }
adamc@534 628 \quad \infer{\Gamma \vdash X \; p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@534 629 X : \overline{x ::: \mt{Type}} \to \tau'' \to \tau \in \Gamma
adamc@534 630 & \Gamma \vdash p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau''
adamc@534 631 }$$
adamc@534 632
adamc@534 633 $$\infer{\Gamma \vdash M.X \leadsto \Gamma; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@537 634 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 635 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \overline{x ::: \mt{Type}} \to \tau
adamc@534 636 & \textrm{$\tau$ not a function type}
adamc@534 637 }$$
adamc@534 638
adamc@534 639 $$\infer{\Gamma \vdash M.X \; p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@537 640 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 641 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \overline{x ::: \mt{Type}} \to \tau'' \to \tau
adamc@534 642 & \Gamma \vdash p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau''
adamc@534 643 }$$
adamc@534 644
adamc@534 645 $$\infer{\Gamma \vdash \{\overline{x = p}\} \leadsto \Gamma_n; \{\overline{x = \tau}\}}{
adamc@534 646 \Gamma_0 = \Gamma
adamc@534 647 & \forall i: \Gamma_i \vdash p_i \leadsto \Gamma_{i+1}; \tau_i
adamc@534 648 }
adamc@534 649 \quad \infer{\Gamma \vdash \{\overline{x = p}, \ldots\} \leadsto \Gamma_n; \$([\overline{x = \tau}] \rc c)}{
adamc@534 650 \Gamma_0 = \Gamma
adamc@534 651 & \forall i: \Gamma_i \vdash p_i \leadsto \Gamma_{i+1}; \tau_i
adamc@534 652 }$$
adamc@534 653
adamc@535 654 \subsection{Declaration Typing}
adamc@535 655
adamc@535 656 We use an auxiliary judgment $\overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'$, expressing the enrichment of $\Gamma$ with the types of the datatype constructors $\overline{dc}$, when they are known to belong to datatype $x$ with type parameters $\overline{y}$.
adamc@535 657
adamc@541 658 This is the first judgment where we deal with type classes, for the $\mt{class}$ declaration form. We will omit their special handling in this formal specification. Section \ref{typeclasses} gives an informal description of how type classes influence type inference.
adamc@535 659
adamc@558 660 We presuppose the existence of a function $\mathcal O$, where $\mathcal O(M, \overline{s})$ implements the $\mt{open}$ declaration by producing a context with the appropriate entry for each available component of module $M$ with signature items $\overline{s}$. Where possible, $\mathcal O$ uses ``transparent'' entries (e.g., an abstract type $M.x$ is mapped to $x :: \mt{Type} = M.x$), so that the relationship with $M$ is maintained. A related function $\mathcal O_c$ builds a context containing the disjointness constraints found in $\overline s$.
adamc@537 661 We write $\kappa_1^n \to \kappa$ as a shorthand, where $\kappa_1^0 \to \kappa = \kappa$ and $\kappa_1^{n+1} \to \kappa_2 = \kappa_1 \to (\kappa_1^n \to \kappa_2)$. We write $\mt{len}(\overline{y})$ for the length of vector $\overline{y}$ of variables.
adamc@535 662
adamc@535 663 $$\infer{\Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@535 664 \quad \infer{\Gamma \vdash d, \overline{d} \leadsto \Gamma''}{
adamc@535 665 \Gamma \vdash d \leadsto \Gamma'
adamc@535 666 & \Gamma' \vdash \overline{d} \leadsto \Gamma''
adamc@535 667 }$$
adamc@535 668
adamc@535 669 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa = c}{
adamc@535 670 \Gamma \vdash c :: \kappa
adamc@535 671 }
adamc@535 672 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leadsto \Gamma'}{
adamc@535 673 \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} \vdash \overline{dc} \leadsto \Gamma'
adamc@535 674 }$$
adamc@535 675
adamc@535 676 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leadsto \Gamma'}{
adamc@537 677 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 678 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@535 679 & \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} = M.z \vdash \overline{dc} \leadsto \Gamma'
adamc@535 680 }$$
adamc@535 681
adamc@535 682 $$\infer{\Gamma \vdash \mt{val} \; x : \tau = e \leadsto \Gamma, x : \tau}{
adamc@535 683 \Gamma \vdash e : \tau
adamc@535 684 }$$
adamc@535 685
adamc@535 686 $$\infer{\Gamma \vdash \mt{val} \; \mt{rec} \; \overline{x : \tau = e} \leadsto \Gamma, \overline{x : \tau}}{
adamc@535 687 \forall i: \Gamma, \overline{x : \tau} \vdash e_i : \tau_i
adamc@535 688 & \textrm{$e_i$ starts with an expression $\lambda$, optionally preceded by constructor and disjointness $\lambda$s}
adamc@535 689 }$$
adamc@535 690
adamc@535 691 $$\infer{\Gamma \vdash \mt{structure} \; X : S = M \leadsto \Gamma, X : S}{
adamc@535 692 \Gamma \vdash M : S
adamc@558 693 & \textrm{ $M$ not a constant or application}
adamc@535 694 }
adamc@558 695 \quad \infer{\Gamma \vdash \mt{structure} \; X : S = M \leadsto \Gamma, X : \mt{selfify}(X, \overline{s})}{
adamc@558 696 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@539 697 }$$
adamc@539 698
adamc@539 699 $$\infer{\Gamma \vdash \mt{signature} \; X = S \leadsto \Gamma, X = S}{
adamc@535 700 \Gamma \vdash S
adamc@535 701 }$$
adamc@535 702
adamc@537 703 $$\infer{\Gamma \vdash \mt{open} \; M \leadsto \Gamma, \mathcal O(M, \overline{s})}{
adamc@537 704 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@535 705 }$$
adamc@535 706
adamc@535 707 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leadsto \Gamma}{
adamc@535 708 \Gamma \vdash c_1 :: \{\kappa\}
adamc@535 709 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@535 710 & \Gamma \vdash c_1 \sim c_2
adamc@535 711 }
adamc@537 712 \quad \infer{\Gamma \vdash \mt{open} \; \mt{constraints} \; M \leadsto \Gamma, \mathcal O_c(M, \overline{s})}{
adamc@537 713 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@535 714 }$$
adamc@535 715
adamc@535 716 $$\infer{\Gamma \vdash \mt{table} \; x : c \leadsto \Gamma, x : \mt{Basis}.\mt{sql\_table} \; c}{
adamc@535 717 \Gamma \vdash c :: \{\mt{Type}\}
adamc@535 718 }
adamc@535 719 \quad \infer{\Gamma \vdash \mt{sequence} \; x \leadsto \Gamma, x : \mt{Basis}.\mt{sql\_sequence}}{}$$
adamc@535 720
adamc@535 721 $$\infer{\Gamma \vdash \mt{cookie} \; x : \tau \leadsto \Gamma, x : \mt{Basis}.\mt{http\_cookie} \; \tau}{
adamc@535 722 \Gamma \vdash \tau :: \mt{Type}
adamc@535 723 }$$
adamc@535 724
adamc@535 725 $$\infer{\Gamma \vdash \mt{class} \; x = c \leadsto \Gamma, x :: \mt{Type} \to \mt{Type} = c}{
adamc@535 726 \Gamma \vdash c :: \mt{Type} \to \mt{Type}
adamc@535 727 }$$
adamc@535 728
adamc@535 729 $$\infer{\overline{y}; x; \Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@535 730 \quad \infer{\overline{y}; x; \Gamma \vdash X \mid \overline{dc} \leadsto \Gamma', X : \overline{y ::: \mt{Type}} \to x \; \overline{y}}{
adamc@535 731 \overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'
adamc@535 732 }
adamc@535 733 \quad \infer{\overline{y}; x; \Gamma \vdash X \; \mt{of} \; \tau \mid \overline{dc} \leadsto \Gamma', X : \overline{y ::: \mt{Type}} \to \tau \to x \; \overline{y}}{
adamc@535 734 \overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'
adamc@535 735 }$$
adamc@535 736
adamc@537 737 \subsection{Signature Item Typing}
adamc@537 738
adamc@537 739 We appeal to a signature item analogue of the $\mathcal O$ function from the last subsection.
adamc@537 740
adamc@537 741 $$\infer{\Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@537 742 \quad \infer{\Gamma \vdash s, \overline{s} \leadsto \Gamma''}{
adamc@537 743 \Gamma \vdash s \leadsto \Gamma'
adamc@537 744 & \Gamma' \vdash \overline{s} \leadsto \Gamma''
adamc@537 745 }$$
adamc@537 746
adamc@537 747 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa \leadsto \Gamma, x :: \kappa}{}
adamc@537 748 \quad \infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa = c}{
adamc@537 749 \Gamma \vdash c :: \kappa
adamc@537 750 }
adamc@537 751 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leadsto \Gamma'}{
adamc@537 752 \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} \vdash \overline{dc} \leadsto \Gamma'
adamc@537 753 }$$
adamc@537 754
adamc@537 755 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leadsto \Gamma'}{
adamc@537 756 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 757 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537 758 & \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} = M.z \vdash \overline{dc} \leadsto \Gamma'
adamc@537 759 }$$
adamc@537 760
adamc@537 761 $$\infer{\Gamma \vdash \mt{val} \; x : \tau \leadsto \Gamma, x : \tau}{
adamc@537 762 \Gamma \vdash \tau :: \mt{Type}
adamc@537 763 }$$
adamc@537 764
adamc@537 765 $$\infer{\Gamma \vdash \mt{structure} \; X : S \leadsto \Gamma, X : S}{
adamc@537 766 \Gamma \vdash S
adamc@537 767 }
adamc@537 768 \quad \infer{\Gamma \vdash \mt{signature} \; X = S \leadsto \Gamma, X = S}{
adamc@537 769 \Gamma \vdash S
adamc@537 770 }$$
adamc@537 771
adamc@537 772 $$\infer{\Gamma \vdash \mt{include} \; S \leadsto \Gamma, \mathcal O(\overline{s})}{
adamc@537 773 \Gamma \vdash S
adamc@537 774 & \Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 775 }$$
adamc@537 776
adamc@537 777 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leadsto \Gamma, c_1 \sim c_2}{
adamc@537 778 \Gamma \vdash c_1 :: \{\kappa\}
adamc@537 779 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@537 780 }$$
adamc@537 781
adamc@537 782 $$\infer{\Gamma \vdash \mt{class} \; x = c \leadsto \Gamma, x :: \mt{Type} \to \mt{Type} = c}{
adamc@537 783 \Gamma \vdash c :: \mt{Type} \to \mt{Type}
adamc@537 784 }
adamc@537 785 \quad \infer{\Gamma \vdash \mt{class} \; x \leadsto \Gamma, x :: \mt{Type} \to \mt{Type}}{}$$
adamc@537 786
adamc@536 787 \subsection{Signature Compatibility}
adamc@536 788
adamc@558 789 To simplify the judgments in this section, we assume that all signatures are alpha-varied as necessary to avoid including multiple bindings for the same identifier. This is in addition to the usual alpha-variation of locally-bound variables.
adamc@537 790
adamc@537 791 We rely on a judgment $\Gamma \vdash \overline{s} \leq s'$, which expresses the occurrence in signature items $\overline{s}$ of an item compatible with $s'$. We also use a judgment $\Gamma \vdash \overline{dc} \leq \overline{dc}$, which expresses compatibility of datatype definitions.
adamc@537 792
adamc@536 793 $$\infer{\Gamma \vdash S \equiv S}{}
adamc@536 794 \quad \infer{\Gamma \vdash S_1 \equiv S_2}{
adamc@536 795 \Gamma \vdash S_2 \equiv S_1
adamc@536 796 }
adamc@536 797 \quad \infer{\Gamma \vdash X \equiv S}{
adamc@536 798 X = S \in \Gamma
adamc@536 799 }
adamc@536 800 \quad \infer{\Gamma \vdash M.X \equiv S}{
adamc@537 801 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 802 & \mt{proj}(M, \overline{s}, \mt{signature} \; X) = S
adamc@536 803 }$$
adamc@536 804
adamc@536 805 $$\infer{\Gamma \vdash S \; \mt{where} \; \mt{con} \; x = c \equiv \mt{sig} \; \overline{s^1} \; \mt{con} \; x :: \kappa = c \; \overline{s_2} \; \mt{end}}{
adamc@536 806 \Gamma \vdash S \equiv \mt{sig} \; \overline{s^1} \; \mt{con} \; x :: \kappa \; \overline{s_2} \; \mt{end}
adamc@536 807 & \Gamma \vdash c :: \kappa
adamc@537 808 }
adamc@537 809 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s^1} \; \mt{include} \; S \; \overline{s^2} \; \mt{end} \equiv \mt{sig} \; \overline{s^1} \; \overline{s} \; \overline{s^2} \; \mt{end}}{
adamc@537 810 \Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}
adamc@536 811 }$$
adamc@536 812
adamc@536 813 $$\infer{\Gamma \vdash S_1 \leq S_2}{
adamc@536 814 \Gamma \vdash S_1 \equiv S_2
adamc@536 815 }
adamc@536 816 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; \mt{end}}{}
adamc@537 817 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; s' \; \overline{s'} \; \mt{end}}{
adamc@537 818 \Gamma \vdash \overline{s} \leq s'
adamc@537 819 & \Gamma \vdash s' \leadsto \Gamma'
adamc@537 820 & \Gamma' \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; \overline{s'} \; \mt{end}
adamc@537 821 }$$
adamc@537 822
adamc@537 823 $$\infer{\Gamma \vdash s \; \overline{s} \leq s'}{
adamc@537 824 \Gamma \vdash s \leq s'
adamc@537 825 }
adamc@537 826 \quad \infer{\Gamma \vdash s \; \overline{s} \leq s'}{
adamc@537 827 \Gamma \vdash s \leadsto \Gamma'
adamc@537 828 & \Gamma' \vdash \overline{s} \leq s'
adamc@536 829 }$$
adamc@536 830
adamc@536 831 $$\infer{\Gamma \vdash \mt{functor} (X : S_1) : S_2 \leq \mt{functor} (X : S'_1) : S'_2}{
adamc@536 832 \Gamma \vdash S'_1 \leq S_1
adamc@536 833 & \Gamma, X : S'_1 \vdash S_2 \leq S'_2
adamc@536 834 }$$
adamc@536 835
adamc@537 836 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa \leq \mt{con} \; x :: \kappa}{}
adamc@537 837 \quad \infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leq \mt{con} \; x :: \kappa}{}
adamc@558 838 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leq \mt{con} \; x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type}}{}$$
adamc@537 839
adamc@537 840 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{con} \; x :: \mt{Type}^{\mt{len}(y)} \to \mt{Type}}{
adamc@537 841 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 842 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537 843 }$$
adamc@537 844
adamc@537 845 $$\infer{\Gamma \vdash \mt{class} \; x \leq \mt{con} \; x :: \mt{Type} \to \mt{Type}}{}
adamc@537 846 \quad \infer{\Gamma \vdash \mt{class} \; x = c \leq \mt{con} \; x :: \mt{Type} \to \mt{Type}}{}$$
adamc@537 847
adamc@537 848 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa = c_1 \leq \mt{con} \; x :: \mt{\kappa} = c_2}{
adamc@537 849 \Gamma \vdash c_1 \equiv c_2
adamc@537 850 }
adamc@537 851 \quad \infer{\Gamma \vdash \mt{class} \; x = c_1 \leq \mt{con} \; x :: \mt{Type} \to \mt{Type} = c_2}{
adamc@537 852 \Gamma \vdash c_1 \equiv c_2
adamc@537 853 }$$
adamc@537 854
adamc@537 855 $$\infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leq \mt{datatype} \; x \; \overline{y} = \overline{dc'}}{
adamc@537 856 \Gamma, \overline{y :: \mt{Type}} \vdash \overline{dc} \leq \overline{dc'}
adamc@537 857 }$$
adamc@537 858
adamc@537 859 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{datatype} \; x \; \overline{y} = \overline{dc'}}{
adamc@537 860 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537 861 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537 862 & \Gamma, \overline{y :: \mt{Type}} \vdash \overline{dc} \leq \overline{dc'}
adamc@537 863 }$$
adamc@537 864
adamc@537 865 $$\infer{\Gamma \vdash \cdot \leq \cdot}{}
adamc@537 866 \quad \infer{\Gamma \vdash X; \overline{dc} \leq X; \overline{dc'}}{
adamc@537 867 \Gamma \vdash \overline{dc} \leq \overline{dc'}
adamc@537 868 }
adamc@537 869 \quad \infer{\Gamma \vdash X \; \mt{of} \; \tau_1; \overline{dc} \leq X \; \mt{of} \; \tau_2; \overline{dc'}}{
adamc@537 870 \Gamma \vdash \tau_1 \equiv \tau_2
adamc@537 871 & \Gamma \vdash \overline{dc} \leq \overline{dc'}
adamc@537 872 }$$
adamc@537 873
adamc@537 874 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{datatype} \; x = \mt{datatype} \; M'.z'}{
adamc@537 875 \Gamma \vdash M.z \equiv M'.z'
adamc@537 876 }$$
adamc@537 877
adamc@537 878 $$\infer{\Gamma \vdash \mt{val} \; x : \tau_1 \leq \mt{val} \; x : \tau_2}{
adamc@537 879 \Gamma \vdash \tau_1 \equiv \tau_2
adamc@537 880 }
adamc@537 881 \quad \infer{\Gamma \vdash \mt{structure} \; X : S_1 \leq \mt{structure} \; X : S_2}{
adamc@537 882 \Gamma \vdash S_1 \leq S_2
adamc@537 883 }
adamc@537 884 \quad \infer{\Gamma \vdash \mt{signature} \; X = S_1 \leq \mt{signature} \; X = S_2}{
adamc@537 885 \Gamma \vdash S_1 \leq S_2
adamc@537 886 & \Gamma \vdash S_2 \leq S_1
adamc@537 887 }$$
adamc@537 888
adamc@537 889 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leq \mt{constraint} \; c'_1 \sim c'_2}{
adamc@537 890 \Gamma \vdash c_1 \equiv c'_1
adamc@537 891 & \Gamma \vdash c_2 \equiv c'_2
adamc@537 892 }$$
adamc@537 893
adamc@537 894 $$\infer{\Gamma \vdash \mt{class} \; x \leq \mt{class} \; x}{}
adamc@537 895 \quad \infer{\Gamma \vdash \mt{class} \; x = c \leq \mt{class} \; x}{}
adamc@537 896 \quad \infer{\Gamma \vdash \mt{class} \; x = c_1 \leq \mt{class} \; x = c_2}{
adamc@537 897 \Gamma \vdash c_1 \equiv c_2
adamc@537 898 }$$
adamc@537 899
adamc@538 900 \subsection{Module Typing}
adamc@538 901
adamc@538 902 We use a helper function $\mt{sigOf}$, which converts declarations and sequences of declarations into their principal signature items and sequences of signature items, respectively.
adamc@538 903
adamc@538 904 $$\infer{\Gamma \vdash M : S}{
adamc@538 905 \Gamma \vdash M : S'
adamc@538 906 & \Gamma \vdash S' \leq S
adamc@538 907 }
adamc@538 908 \quad \infer{\Gamma \vdash \mt{struct} \; \overline{d} \; \mt{end} : \mt{sig} \; \mt{sigOf}(\overline{d}) \; \mt{end}}{
adamc@538 909 \Gamma \vdash \overline{d} \leadsto \Gamma'
adamc@538 910 }
adamc@538 911 \quad \infer{\Gamma \vdash X : S}{
adamc@538 912 X : S \in \Gamma
adamc@538 913 }$$
adamc@538 914
adamc@538 915 $$\infer{\Gamma \vdash M.X : S}{
adamc@538 916 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@538 917 & \mt{proj}(M, \overline{s}, \mt{structure} \; X) = S
adamc@538 918 }$$
adamc@538 919
adamc@538 920 $$\infer{\Gamma \vdash M_1(M_2) : [X \mapsto M_2]S_2}{
adamc@538 921 \Gamma \vdash M_1 : \mt{functor}(X : S_1) : S_2
adamc@538 922 & \Gamma \vdash M_2 : S_1
adamc@538 923 }
adamc@538 924 \quad \infer{\Gamma \vdash \mt{functor} (X : S_1) : S_2 = M : \mt{functor} (X : S_1) : S_2}{
adamc@538 925 \Gamma \vdash S_1
adamc@538 926 & \Gamma, X : S_1 \vdash S_2
adamc@538 927 & \Gamma, X : S_1 \vdash M : S_2
adamc@538 928 }$$
adamc@538 929
adamc@538 930 \begin{eqnarray*}
adamc@538 931 \mt{sigOf}(\cdot) &=& \cdot \\
adamc@538 932 \mt{sigOf}(s \; \overline{s'}) &=& \mt{sigOf}(s) \; \mt{sigOf}(\overline{s'}) \\
adamc@538 933 \\
adamc@538 934 \mt{sigOf}(\mt{con} \; x :: \kappa = c) &=& \mt{con} \; x :: \kappa = c \\
adamc@538 935 \mt{sigOf}(\mt{datatype} \; x \; \overline{y} = \overline{dc}) &=& \mt{datatype} \; x \; \overline{y} = \overline{dc} \\
adamc@538 936 \mt{sigOf}(\mt{datatype} \; x = \mt{datatype} \; M.z) &=& \mt{datatype} \; x = \mt{datatype} \; M.z \\
adamc@538 937 \mt{sigOf}(\mt{val} \; x : \tau = e) &=& \mt{val} \; x : \tau \\
adamc@538 938 \mt{sigOf}(\mt{val} \; \mt{rec} \; \overline{x : \tau = e}) &=& \overline{\mt{val} \; x : \tau} \\
adamc@538 939 \mt{sigOf}(\mt{structure} \; X : S = M) &=& \mt{structure} \; X : S \\
adamc@538 940 \mt{sigOf}(\mt{signature} \; X = S) &=& \mt{signature} \; X = S \\
adamc@538 941 \mt{sigOf}(\mt{open} \; M) &=& \mt{include} \; S \textrm{ (where $\Gamma \vdash M : S$)} \\
adamc@538 942 \mt{sigOf}(\mt{constraint} \; c_1 \sim c_2) &=& \mt{constraint} \; c_1 \sim c_2 \\
adamc@538 943 \mt{sigOf}(\mt{open} \; \mt{constraints} \; M) &=& \cdot \\
adamc@538 944 \mt{sigOf}(\mt{table} \; x : c) &=& \mt{table} \; x : c \\
adamc@538 945 \mt{sigOf}(\mt{sequence} \; x) &=& \mt{sequence} \; x \\
adamc@538 946 \mt{sigOf}(\mt{cookie} \; x : \tau) &=& \mt{cookie} \; x : \tau \\
adamc@538 947 \mt{sigOf}(\mt{class} \; x = c) &=& \mt{class} \; x = c \\
adamc@538 948 \end{eqnarray*}
adamc@539 949 \begin{eqnarray*}
adamc@539 950 \mt{selfify}(M, \cdot) &=& \cdot \\
adamc@558 951 \mt{selfify}(M, s \; \overline{s'}) &=& \mt{selfify}(M, s) \; \mt{selfify}(M, \overline{s'}) \\
adamc@539 952 \\
adamc@539 953 \mt{selfify}(M, \mt{con} \; x :: \kappa) &=& \mt{con} \; x :: \kappa = M.x \\
adamc@539 954 \mt{selfify}(M, \mt{con} \; x :: \kappa = c) &=& \mt{con} \; x :: \kappa = c \\
adamc@539 955 \mt{selfify}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc}) &=& \mt{datatype} \; x \; \overline{y} = \mt{datatype} \; M.x \\
adamc@539 956 \mt{selfify}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z) &=& \mt{datatype} \; x = \mt{datatype} \; M'.z \\
adamc@539 957 \mt{selfify}(M, \mt{val} \; x : \tau) &=& \mt{val} \; x : \tau \\
adamc@539 958 \mt{selfify}(M, \mt{structure} \; X : S) &=& \mt{structure} \; X : \mt{selfify}(M.X, \overline{s}) \textrm{ (where $\Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}$)} \\
adamc@539 959 \mt{selfify}(M, \mt{signature} \; X = S) &=& \mt{signature} \; X = S \\
adamc@539 960 \mt{selfify}(M, \mt{include} \; S) &=& \mt{include} \; S \\
adamc@539 961 \mt{selfify}(M, \mt{constraint} \; c_1 \sim c_2) &=& \mt{constraint} \; c_1 \sim c_2 \\
adamc@539 962 \mt{selfify}(M, \mt{class} \; x) &=& \mt{class} \; x = M.x \\
adamc@539 963 \mt{selfify}(M, \mt{class} \; x = c) &=& \mt{class} \; x = c \\
adamc@539 964 \end{eqnarray*}
adamc@539 965
adamc@540 966 \subsection{Module Projection}
adamc@540 967
adamc@540 968 \begin{eqnarray*}
adamc@540 969 \mt{proj}(M, \mt{con} \; x :: \kappa \; \overline{s}, \mt{con} \; x) &=& \kappa \\
adamc@540 970 \mt{proj}(M, \mt{con} \; x :: \kappa = c \; \overline{s}, \mt{con} \; x) &=& (\kappa, c) \\
adamc@540 971 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{con} \; x) &=& \mt{Type}^{\mt{len}(\overline{y})} \to \mt{Type} \\
adamc@540 972 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, \mt{con} \; x) &=& (\mt{Type}^{\mt{len}(\overline{y})} \to \mt{Type}, M'.z) \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@540 973 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})$)} \\
adamc@540 974 \mt{proj}(M, \mt{class} \; x \; \overline{s}, \mt{con} \; x) &=& \mt{Type} \to \mt{Type} \\
adamc@540 975 \mt{proj}(M, \mt{class} \; x = c \; \overline{s}, \mt{con} \; x) &=& (\mt{Type} \to \mt{Type}, c) \\
adamc@540 976 \\
adamc@540 977 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{datatype} \; x) &=& (\overline{y}, \overline{dc}) \\
adamc@540 978 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, \mt{con} \; x) &=& \mt{proj}(M', \overline{s'}, \mt{datatype} \; z) \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$)} \\
adamc@540 979 \\
adamc@540 980 \mt{proj}(M, \mt{val} \; x : \tau \; \overline{s}, \mt{val} \; x) &=& \tau \\
adamc@540 981 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to M.x \; \overline y \textrm{ (where $X \in \overline{dc}$)} \\
adamc@540 982 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to \tau \to M.x \; \overline y \textrm{ (where $X \; \mt{of} \; \tau \in \overline{dc}$)} \\
adamc@540 983 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to M.x \; \overline y \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@540 984 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z = (\overline{y}, \overline{dc})$ and $X \in \overline{dc}$)} \\
adamc@540 985 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to \tau \to M.x \; \overline y \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@558 986 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z = (\overline{y}, \overline{dc})$ and $X \; \mt{of} \; \tau \in \overline{dc}$)} \\
adamc@540 987 \\
adamc@540 988 \mt{proj}(M, \mt{structure} \; X : S \; \overline{s}, \mt{structure} \; X) &=& S \\
adamc@540 989 \\
adamc@540 990 \mt{proj}(M, \mt{signature} \; X = S \; \overline{s}, \mt{signature} \; X) &=& S \\
adamc@540 991 \\
adamc@540 992 \mt{proj}(M, \mt{con} \; x :: \kappa \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 993 \mt{proj}(M, \mt{con} \; x :: \kappa = c \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 994 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 995 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 996 \mt{proj}(M, \mt{val} \; x : \tau \; \overline{s}, V) &=& \mt{proj}(M, \overline{s}, V) \\
adamc@540 997 \mt{proj}(M, \mt{structure} \; X : S \; \overline{s}, V) &=& [X \mapsto M.X]\mt{proj}(M, \overline{s}, V) \\
adamc@540 998 \mt{proj}(M, \mt{signature} \; X = S \; \overline{s}, V) &=& [X \mapsto M.X]\mt{proj}(M, \overline{s}, V) \\
adamc@540 999 \mt{proj}(M, \mt{include} \; S \; \overline{s}, V) &=& \mt{proj}(M, \overline{s'} \; \overline{s}, V) \textrm{ (where $\Gamma \vdash S \equiv \mt{sig} \; \overline{s'} \; \mt{end}$)} \\
adamc@540 1000 \mt{proj}(M, \mt{constraint} \; c_1 \sim c_2 \; \overline{s}, V) &=& \mt{proj}(M, \overline{s}, V) \\
adamc@540 1001 \mt{proj}(M, \mt{class} \; x \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 1002 \mt{proj}(M, \mt{class} \; x = c \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540 1003 \end{eqnarray*}
adamc@540 1004
adamc@541 1005
adamc@541 1006 \section{Type Inference}
adamc@541 1007
adamc@541 1008 The Ur/Web compiler uses \emph{heuristic type inference}, with no claims of completeness with respect to the declarative specification of the last section. The rules in use seem to work well in practice. This section summarizes those rules, to help Ur programmers predict what will work and what won't.
adamc@541 1009
adamc@541 1010 \subsection{Basic Unification}
adamc@541 1011
adamc@560 1012 Type-checkers for languages based on the Hindley-Milner type discipline, like ML and Haskell, take advantage of \emph{principal typing} properties, making complete type inference relatively straightforward. Inference algorithms are traditionally implemented using type unification variables, at various points asserting equalities between types, in the process discovering the values of type variables. The Ur/Web compiler uses the same basic strategy, but the complexity of the type system rules out easy completeness.
adamc@541 1013
adamc@560 1014 Type-checking can require evaluating recursive functional programs, thanks to the type-level $\mt{fold}$ operator. When a unification variable appears in such a type, the next step of computation can be undetermined. The value of that variable might be determined later, but this would be ``too late'' for the unification problems generated at the first occurrence. This is the essential source of incompleteness.
adamc@541 1015
adamc@541 1016 Nonetheless, the unification engine tends to do reasonably well. Unlike in ML, polymorphism is never inferred in definitions; it must be indicated explicitly by writing out constructor-level parameters. By writing these and other annotations, the programmer can generally get the type inference engine to do most of the type reconstruction work.
adamc@541 1017
adamc@541 1018 \subsection{Unifying Record Types}
adamc@541 1019
adamc@570 1020 The type inference engine tries to take advantage of the algebraic rules governing type-level records, as shown in Section \ref{definitional}. When two constructors of record kind are unified, they are reduced to normal forms, with like terms crossed off from each normal form until, hopefully, nothing remains. This cannot be complete, with the inclusion of unification variables. The type-checker can help you understand what goes wrong when the process fails, as it outputs the unmatched remainders of the two normal forms.
adamc@541 1021
adamc@541 1022 \subsection{\label{typeclasses}Type Classes}
adamc@541 1023
adamc@541 1024 Ur includes a type class facility inspired by Haskell's. The current version is very rudimentary, only supporting instances for particular types built up from abstract types and datatypes and type-level application.
adamc@541 1025
adamc@541 1026 Type classes are integrated with the module system. A type class is just a constructor of kind $\mt{Type} \to \mt{Type}$. By marking such a constructor $c$ as a type class, the programmer instructs the type inference engine to, in each scope, record all values of types $c \; \tau$ as \emph{instances}. Any function argument whose type is of such a form is treated as implicit, to be determined by examining the current instance database.
adamc@541 1027
adamc@541 1028 The ``dictionary encoding'' often used in Haskell implementations is made explicit in Ur. Type class instances are just properly-typed values, and they can also be considered as ``proofs'' of membership in the class. In some cases, it is useful to pass these proofs around explicitly. An underscore written where a proof is expected will also be inferred, if possible, from the current instance database.
adamc@541 1029
adamc@541 1030 Just as for types, type classes may be exported from modules, and they may be exported as concrete or abstract. Concrete type classes have their ``real'' definitions exposed, so that client code may add new instances freely. Abstract type classes are useful as ``predicates'' that can be used to enforce invariants, as we will see in some definitions of SQL syntax in the Ur/Web standard library.
adamc@541 1031
adamc@541 1032 \subsection{Reverse-Engineering Record Types}
adamc@541 1033
adamc@541 1034 It's useful to write Ur functions and functors that take record constructors as inputs, but these constructors can grow quite long, even though their values are often implied by other arguments. The compiler uses a simple heuristic to infer the values of unification variables that are folded over, yielding known results. Often, as in the case of $\mt{map}$-like folds, the base and recursive cases of a fold produce constructors with different top-level structure. Thus, if the result of the fold is known, examining its top-level structure reveals whether the record being folded over is empty or not. If it's empty, we're done; if it's not empty, we replace a single unification variable with a new constructor formed from three new unification variables, as in $[\alpha = \beta] \rc \gamma$. This process can often be repeated to determine a unification variable fully.
adamc@541 1035
adamc@541 1036 \subsection{Implicit Arguments in Functor Applications}
adamc@541 1037
adamc@541 1038 Constructor, constraint, and type class witness members of structures may be omitted, when those structures are used in contexts where their assigned signatures imply how to fill in those missing members. This feature combines well with reverse-engineering to allow for uses of complicated meta-programming functors with little more code than would be necessary to invoke an untyped, ad-hoc code generator.
adamc@541 1039
adamc@541 1040
adamc@542 1041 \section{The Ur Standard Library}
adamc@542 1042
adamc@542 1043 The built-in parts of the Ur/Web standard library are described by the signature in \texttt{lib/basis.urs} in the distribution. A module $\mt{Basis}$ ascribing to that signature is available in the initial environment, and every program is implicitly prefixed by $\mt{open} \; \mt{Basis}$.
adamc@542 1044
adamc@542 1045 Additionally, other common functions that are definable within Ur are included in \texttt{lib/top.urs} and \texttt{lib/top.ur}. This $\mt{Top}$ module is also opened implicitly.
adamc@542 1046
adamc@542 1047 The idea behind Ur is to serve as the ideal host for embedded domain-specific languages. For now, however, the ``generic'' functionality is intermixed with Ur/Web-specific functionality, including in these two library modules. We hope that these generic library components have types that speak for themselves. The next section introduces the Ur/Web-specific elements. Here, we only give the type declarations from the beginning of $\mt{Basis}$.
adamc@542 1048 $$\begin{array}{l}
adamc@542 1049 \mt{type} \; \mt{int} \\
adamc@542 1050 \mt{type} \; \mt{float} \\
adamc@542 1051 \mt{type} \; \mt{string} \\
adamc@542 1052 \mt{type} \; \mt{time} \\
adamc@542 1053 \\
adamc@542 1054 \mt{type} \; \mt{unit} = \{\} \\
adamc@542 1055 \\
adamc@542 1056 \mt{datatype} \; \mt{bool} = \mt{False} \mid \mt{True} \\
adamc@542 1057 \\
adamc@542 1058 \mt{datatype} \; \mt{option} \; \mt{t} = \mt{None} \mid \mt{Some} \; \mt{of} \; \mt{t}
adamc@542 1059 \end{array}$$
adamc@542 1060
adamc@542 1061
adamc@542 1062 \section{The Ur/Web Standard Library}
adamc@542 1063
adamc@542 1064 \subsection{Transactions}
adamc@542 1065
adamc@542 1066 Ur is a pure language; we use Haskell's trick to support controlled side effects. The standard library defines a monad $\mt{transaction}$, meant to stand for actions that may be undone cleanly. By design, no other kinds of actions are supported.
adamc@542 1067 $$\begin{array}{l}
adamc@542 1068 \mt{con} \; \mt{transaction} :: \mt{Type} \to \mt{Type} \\
adamc@542 1069 \\
adamc@542 1070 \mt{val} \; \mt{return} : \mt{t} ::: \mt{Type} \to \mt{t} \to \mt{transaction} \; \mt{t} \\
adamc@542 1071 \mt{val} \; \mt{bind} : \mt{t_1} ::: \mt{Type} \to \mt{t_2} ::: \mt{Type} \to \mt{transaction} \; \mt{t_1} \to (\mt{t_1} \to \mt{transaction} \; \mt{t_2}) \to \mt{transaction} \; \mt{t_2}
adamc@542 1072 \end{array}$$
adamc@542 1073
adamc@542 1074 \subsection{HTTP}
adamc@542 1075
adamc@542 1076 There are transactions for reading an HTTP header by name and for getting and setting strongly-typed cookies. Cookies may only be created by the $\mt{cookie}$ declaration form, ensuring that they be named consistently based on module structure.
adamc@542 1077 $$\begin{array}{l}
adamc@542 1078 \mt{val} \; \mt{requestHeader} : \mt{string} \to \mt{transaction} \; (\mt{option} \; \mt{string}) \\
adamc@542 1079 \\
adamc@542 1080 \mt{con} \; \mt{http\_cookie} :: \mt{Type} \to \mt{Type} \\
adamc@542 1081 \mt{val} \; \mt{getCookie} : \mt{t} ::: \mt{Type} \to \mt{http\_cookie} \; \mt{t} \to \mt{transaction} \; (\mt{option} \; \mt{t}) \\
adamc@542 1082 \mt{val} \; \mt{setCookie} : \mt{t} ::: \mt{Type} \to \mt{http\_cookie} \; \mt{t} \to \mt{t} \to \mt{transaction} \; \mt{unit}
adamc@542 1083 \end{array}$$
adamc@542 1084
adamc@543 1085 \subsection{SQL}
adamc@543 1086
adamc@543 1087 The fundamental unit of interest in the embedding of SQL is tables, described by a type family and creatable only via the $\mt{table}$ declaration form.
adamc@543 1088 $$\begin{array}{l}
adamc@543 1089 \mt{con} \; \mt{sql\_table} :: \{\mt{Type}\} \to \mt{Type}
adamc@543 1090 \end{array}$$
adamc@543 1091
adamc@543 1092 \subsubsection{Queries}
adamc@543 1093
adamc@543 1094 A final query is constructed via the $\mt{sql\_query}$ function. Constructor arguments respectively specify the table fields we select (as records mapping tables to the subsets of their fields that we choose) and the (always named) extra expressions that we select.
adamc@543 1095 $$\begin{array}{l}
adamc@543 1096 \mt{con} \; \mt{sql\_query} :: \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@543 1097 \mt{val} \; \mt{sql\_query} : \mt{tables} ::: \{\{\mt{Type}\}\} \\
adamc@543 1098 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543 1099 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543 1100 \hspace{.1in} \to \{\mt{Rows} : \mt{sql\_query1} \; \mt{tables} \; \mt{selectedFields} \; \mt{selectedExps}, \\
adamc@543 1101 \hspace{.2in} \mt{OrderBy} : \mt{sql\_order\_by} \; \mt{tables} \; \mt{selectedExps}, \\
adamc@543 1102 \hspace{.2in} \mt{Limit} : \mt{sql\_limit}, \\
adamc@543 1103 \hspace{.2in} \mt{Offset} : \mt{sql\_offset}\} \\
adamc@543 1104 \hspace{.1in} \to \mt{sql\_query} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543 1105 \end{array}$$
adamc@543 1106
adamc@545 1107 Queries are used by folding over their results inside transactions.
adamc@545 1108 $$\begin{array}{l}
adamc@545 1109 \mt{val} \; \mt{query} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \lambda [\mt{tables} \sim \mt{exps}] \Rightarrow \mt{state} ::: \mt{Type} \to \mt{sql\_query} \; \mt{tables} \; \mt{exps} \\
adamc@545 1110 \hspace{.1in} \to (\$(\mt{exps} \rc \mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: \{\mt{Type}\}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \$\mt{fields}] \rc \mt{acc}) \; [] \; \mt{tables}) \\
adamc@545 1111 \hspace{.2in} \to \mt{state} \to \mt{transaction} \; \mt{state}) \\
adamc@545 1112 \hspace{.1in} \to \mt{state} \to \mt{transaction} \; \mt{state}
adamc@545 1113 \end{array}$$
adamc@545 1114
adamc@543 1115 Most of the complexity of the query encoding is in the type $\mt{sql\_query1}$, which includes simple queries and derived queries based on relational operators. Constructor arguments respectively specify the tables we select from, the subset of fields that we keep from each table for the result rows, and the extra expressions that we select.
adamc@543 1116 $$\begin{array}{l}
adamc@543 1117 \mt{con} \; \mt{sql\_query1} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@543 1118 \\
adamc@543 1119 \mt{type} \; \mt{sql\_relop} \\
adamc@543 1120 \mt{val} \; \mt{sql\_union} : \mt{sql\_relop} \\
adamc@543 1121 \mt{val} \; \mt{sql\_intersect} : \mt{sql\_relop} \\
adamc@543 1122 \mt{val} \; \mt{sql\_except} : \mt{sql\_relop} \\
adamc@543 1123 \mt{val} \; \mt{sql\_relop} : \mt{tables1} ::: \{\{\mt{Type}\}\} \\
adamc@543 1124 \hspace{.1in} \to \mt{tables2} ::: \{\{\mt{Type}\}\} \\
adamc@543 1125 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543 1126 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543 1127 \hspace{.1in} \to \mt{sql\_relop} \\
adamc@543 1128 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables1} \; \mt{selectedFields} \; \mt{selectedExps} \\
adamc@543 1129 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables2} \; \mt{selectedFields} \; \mt{selectedExps} \\
adamc@543 1130 \hspace{.1in} \to \mt{sql\_query1} \; \mt{selectedFields} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543 1131 \end{array}$$
adamc@543 1132
adamc@543 1133 $$\begin{array}{l}
adamc@543 1134 \mt{val} \; \mt{sql\_query1} : \mt{tables} ::: \{\{\mt{Type}\}\} \\
adamc@543 1135 \hspace{.1in} \to \mt{grouped} ::: \{\{\mt{Type}\}\} \\
adamc@543 1136 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543 1137 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543 1138 \hspace{.1in} \to \{\mt{From} : \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: \{\mt{Type}\}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_table} \; \mt{fields}] \rc \mt{acc}) \; [] \; \mt{tables}), \\
adamc@543 1139 \hspace{.2in} \mt{Where} : \mt{sql\_exp} \; \mt{tables} \; [] \; [] \; \mt{bool}, \\
adamc@543 1140 \hspace{.2in} \mt{GroupBy} : \mt{sql\_subset} \; \mt{tables} \; \mt{grouped}, \\
adamc@543 1141 \hspace{.2in} \mt{Having} : \mt{sql\_exp} \; \mt{grouped} \; \mt{tables} \; [] \; \mt{bool}, \\
adamc@543 1142 \hspace{.2in} \mt{SelectFields} : \mt{sql\_subset} \; \mt{grouped} \; \mt{selectedFields}, \\
adamc@543 1143 \hspace{.2in} \mt {SelectExps} : \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{t} :: \mt{Type}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_exp} \; \mt{grouped} \; \mt{tables} \; [] \; \mt{t}] \rc \mt{acc}) \; [] \; \mt{selectedExps}) \} \\
adamc@543 1144 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543 1145 \end{array}$$
adamc@543 1146
adamc@543 1147 To encode projection of subsets of fields in $\mt{SELECT}$ clauses, and to encode $\mt{GROUP} \; \mt{BY}$ clauses, we rely on a type family $\mt{sql\_subset}$, capturing what it means for one record of table fields to be a subset of another. The main constructor $\mt{sql\_subset}$ ``proves subset facts'' by requiring a split of a record into kept and dropped parts. The extra constructor $\mt{sql\_subset\_all}$ is a convenience for keeping all fields of a record.
adamc@543 1148 $$\begin{array}{l}
adamc@543 1149 \mt{con} \; \mt{sql\_subset} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \mt{Type} \\
adamc@543 1150 \mt{val} \; \mt{sql\_subset} : \mt{keep\_drop} :: \{(\{\mt{Type}\} \times \{\mt{Type}\})\} \\
adamc@543 1151 \hspace{.1in} \to \mt{sql\_subset} \\
adamc@543 1152 \hspace{.2in} (\mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: (\{\mt{Type}\} \times \{\mt{Type}\})) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \; [\mt{fields}.1 \sim \mt{fields}.2] \Rightarrow \\
adamc@543 1153 \hspace{.3in} [\mt{nm} = \mt{fields}.1 \rc \mt{fields}.2] \rc \mt{acc}) \; [] \; \mt{keep\_drop}) \\
adamc@543 1154 \hspace{.2in} (\mt{fold} \; (\lambda \mt{nm} \; (\mt{fields} :: (\{\mt{Type}\} \times \{\mt{Type}\})) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{fields}.1] \rc \mt{acc}) \; [] \; \mt{keep\_drop}) \\
adamc@543 1155 \mt{val} \; \mt{sql\_subset\_all} : \mt{tables} :: \{\{\mt{Type}\}\} \to \mt{sql\_subset} \; \mt{tables} \; \mt{tables}
adamc@543 1156 \end{array}$$
adamc@543 1157
adamc@560 1158 SQL expressions are used in several places, including $\mt{SELECT}$, $\mt{WHERE}$, $\mt{HAVING}$, and $\mt{ORDER} \; \mt{BY}$ clauses. They reify a fragment of the standard SQL expression language, while making it possible to inject ``native'' Ur values in some places. The arguments to the $\mt{sql\_exp}$ type family respectively give the unrestricted-availability table fields, the table fields that may only be used in arguments to aggregate functions, the available selected expressions, and the type of the expression.
adamc@543 1159 $$\begin{array}{l}
adamc@543 1160 \mt{con} \; \mt{sql\_exp} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \to \mt{Type}
adamc@543 1161 \end{array}$$
adamc@543 1162
adamc@543 1163 Any field in scope may be converted to an expression.
adamc@543 1164 $$\begin{array}{l}
adamc@543 1165 \mt{val} \; \mt{sql\_field} : \mt{otherTabs} ::: \{\{\mt{Type}\}\} \to \mt{otherFields} ::: \{\mt{Type}\} \\
adamc@543 1166 \hspace{.1in} \to \mt{fieldType} ::: \mt{Type} \to \mt{agg} ::: \{\{\mt{Type}\}\} \\
adamc@543 1167 \hspace{.1in} \to \mt{exps} ::: \{\mt{Type}\} \\
adamc@543 1168 \hspace{.1in} \to \mt{tab} :: \mt{Name} \to \mt{field} :: \mt{Name} \\
adamc@543 1169 \hspace{.1in} \to \mt{sql\_exp} \; ([\mt{tab} = [\mt{field} = \mt{fieldType}] \rc \mt{otherFields}] \rc \mt{otherTabs}) \; \mt{agg} \; \mt{exps} \; \mt{fieldType}
adamc@543 1170 \end{array}$$
adamc@543 1171
adamc@544 1172 There is an analogous function for referencing named expressions.
adamc@544 1173 $$\begin{array}{l}
adamc@544 1174 \mt{val} \; \mt{sql\_exp} : \mt{tabs} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{t} ::: \mt{Type} \to \mt{rest} ::: \{\mt{Type}\} \to \mt{nm} :: \mt{Name} \\
adamc@544 1175 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tabs} \; \mt{agg} \; ([\mt{nm} = \mt{t}] \rc \mt{rest}) \; \mt{t}
adamc@544 1176 \end{array}$$
adamc@544 1177
adamc@544 1178 Ur values of appropriate types may be injected into SQL expressions.
adamc@544 1179 $$\begin{array}{l}
adamc@544 1180 \mt{class} \; \mt{sql\_injectable} \\
adamc@544 1181 \mt{val} \; \mt{sql\_bool} : \mt{sql\_injectable} \; \mt{bool} \\
adamc@544 1182 \mt{val} \; \mt{sql\_int} : \mt{sql\_injectable} \; \mt{int} \\
adamc@544 1183 \mt{val} \; \mt{sql\_float} : \mt{sql\_injectable} \; \mt{float} \\
adamc@544 1184 \mt{val} \; \mt{sql\_string} : \mt{sql\_injectable} \; \mt{string} \\
adamc@544 1185 \mt{val} \; \mt{sql\_time} : \mt{sql\_injectable} \; \mt{time} \\
adamc@544 1186 \mt{val} \; \mt{sql\_option\_bool} : \mt{sql\_injectable} \; (\mt{option} \; \mt{bool}) \\
adamc@544 1187 \mt{val} \; \mt{sql\_option\_int} : \mt{sql\_injectable} \; (\mt{option} \; \mt{int}) \\
adamc@544 1188 \mt{val} \; \mt{sql\_option\_float} : \mt{sql\_injectable} \; (\mt{option} \; \mt{float}) \\
adamc@544 1189 \mt{val} \; \mt{sql\_option\_string} : \mt{sql\_injectable} \; (\mt{option} \; \mt{string}) \\
adamc@544 1190 \mt{val} \; \mt{sql\_option\_time} : \mt{sql\_injectable} \; (\mt{option} \; \mt{time}) \\
adamc@544 1191 \mt{val} \; \mt{sql\_inject} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \to \mt{sql\_injectable} \; \mt{t} \\
adamc@544 1192 \hspace{.1in} \to \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t}
adamc@544 1193 \end{array}$$
adamc@544 1194
adamc@544 1195 We have the SQL nullness test, which is necessary because of the strange SQL semantics of equality in the presence of null values.
adamc@544 1196 $$\begin{array}{l}
adamc@544 1197 \mt{val} \; \mt{sql\_is\_null} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1198 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; (\mt{option} \; \mt{t}) \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{bool}
adamc@544 1199 \end{array}$$
adamc@544 1200
adamc@559 1201 We have generic nullary, unary, and binary operators.
adamc@544 1202 $$\begin{array}{l}
adamc@544 1203 \mt{con} \; \mt{sql\_nfunc} :: \mt{Type} \to \mt{Type} \\
adamc@544 1204 \mt{val} \; \mt{sql\_current\_timestamp} : \mt{sql\_nfunc} \; \mt{time} \\
adamc@544 1205 \mt{val} \; \mt{sql\_nfunc} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1206 \hspace{.1in} \to \mt{sql\_nfunc} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t} \\\end{array}$$
adamc@544 1207
adamc@544 1208 $$\begin{array}{l}
adamc@544 1209 \mt{con} \; \mt{sql\_unary} :: \mt{Type} \to \mt{Type} \to \mt{Type} \\
adamc@544 1210 \mt{val} \; \mt{sql\_not} : \mt{sql\_unary} \; \mt{bool} \; \mt{bool} \\
adamc@544 1211 \mt{val} \; \mt{sql\_unary} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{arg} ::: \mt{Type} \to \mt{res} ::: \mt{Type} \\
adamc@544 1212 \hspace{.1in} \to \mt{sql\_unary} \; \mt{arg} \; \mt{res} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{res} \\
adamc@544 1213 \end{array}$$
adamc@544 1214
adamc@544 1215 $$\begin{array}{l}
adamc@544 1216 \mt{con} \; \mt{sql\_binary} :: \mt{Type} \to \mt{Type} \to \mt{Type} \to \mt{Type} \\
adamc@544 1217 \mt{val} \; \mt{sql\_and} : \mt{sql\_binary} \; \mt{bool} \; \mt{bool} \; \mt{bool} \\
adamc@544 1218 \mt{val} \; \mt{sql\_or} : \mt{sql\_binary} \; \mt{bool} \; \mt{bool} \; \mt{bool} \\
adamc@544 1219 \mt{val} \; \mt{sql\_binary} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{arg_1} ::: \mt{Type} \to \mt{arg_2} ::: \mt{Type} \to \mt{res} ::: \mt{Type} \\
adamc@544 1220 \hspace{.1in} \to \mt{sql\_binary} \; \mt{arg_1} \; \mt{arg_2} \; \mt{res} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg_1} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg_2} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{res}
adamc@544 1221 \end{array}$$
adamc@544 1222
adamc@544 1223 $$\begin{array}{l}
adamc@559 1224 \mt{class} \; \mt{sql\_arith} \\
adamc@559 1225 \mt{val} \; \mt{sql\_int\_arith} : \mt{sql\_arith} \; \mt{int} \\
adamc@559 1226 \mt{val} \; \mt{sql\_float\_arith} : \mt{sql\_arith} \; \mt{float} \\
adamc@559 1227 \mt{val} \; \mt{sql\_neg} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_unary} \; \mt{t} \; \mt{t} \\
adamc@559 1228 \mt{val} \; \mt{sql\_plus} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559 1229 \mt{val} \; \mt{sql\_minus} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559 1230 \mt{val} \; \mt{sql\_times} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559 1231 \mt{val} \; \mt{sql\_div} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559 1232 \mt{val} \; \mt{sql\_mod} : \mt{sql\_binary} \; \mt{int} \; \mt{int} \; \mt{int}
adamc@559 1233 \end{array}$$
adamc@544 1234
adamc@544 1235 Finally, we have aggregate functions. The $\mt{COUNT(\ast)}$ syntax is handled specially, since it takes no real argument. The other aggregate functions are placed into a general type family, using type classes to restrict usage to properly-typed arguments. The key aspect of the $\mt{sql\_aggregate}$ function's type is the shift of aggregate-function-only fields into unrestricted fields.
adamc@544 1236
adamc@544 1237 $$\begin{array}{l}
adamc@544 1238 \mt{val} \; \mt{sql\_count} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{int}
adamc@544 1239 \end{array}$$
adamc@544 1240
adamc@544 1241 $$\begin{array}{l}
adamc@544 1242 \mt{con} \; \mt{sql\_aggregate} :: \mt{Type} \to \mt{Type} \\
adamc@544 1243 \mt{val} \; \mt{sql\_aggregate} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1244 \hspace{.1in} \to \mt{sql\_aggregate} \; \mt{t} \to \mt{sql\_exp} \; \mt{agg} \; \mt{agg} \; \mt{exps} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t}
adamc@544 1245 \end{array}$$
adamc@544 1246
adamc@544 1247 $$\begin{array}{l}
adamc@544 1248 \mt{class} \; \mt{sql\_summable} \\
adamc@544 1249 \mt{val} \; \mt{sql\_summable\_int} : \mt{sql\_summable} \; \mt{int} \\
adamc@544 1250 \mt{val} \; \mt{sql\_summable\_float} : \mt{sql\_summable} \; \mt{float} \\
adamc@544 1251 \mt{val} \; \mt{sql\_avg} : \mt{t} ::: \mt{Type} \to \mt{sql\_summable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t} \\
adamc@544 1252 \mt{val} \; \mt{sql\_sum} : \mt{t} ::: \mt{Type} \to \mt{sql\_summable} \mt{t} \to \mt{sql\_aggregate} \; \mt{t}
adamc@544 1253 \end{array}$$
adamc@544 1254
adamc@544 1255 $$\begin{array}{l}
adamc@544 1256 \mt{class} \; \mt{sql\_maxable} \\
adamc@544 1257 \mt{val} \; \mt{sql\_maxable\_int} : \mt{sql\_maxable} \; \mt{int} \\
adamc@544 1258 \mt{val} \; \mt{sql\_maxable\_float} : \mt{sql\_maxable} \; \mt{float} \\
adamc@544 1259 \mt{val} \; \mt{sql\_maxable\_string} : \mt{sql\_maxable} \; \mt{string} \\
adamc@544 1260 \mt{val} \; \mt{sql\_maxable\_time} : \mt{sql\_maxable} \; \mt{time} \\
adamc@544 1261 \mt{val} \; \mt{sql\_max} : \mt{t} ::: \mt{Type} \to \mt{sql\_maxable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t} \\
adamc@544 1262 \mt{val} \; \mt{sql\_min} : \mt{t} ::: \mt{Type} \to \mt{sql\_maxable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t}
adamc@544 1263 \end{array}$$
adamc@544 1264
adamc@544 1265 We wrap up the definition of query syntax with the types used in representing $\mt{ORDER} \; \mt{BY}$, $\mt{LIMIT}$, and $\mt{OFFSET}$ clauses.
adamc@544 1266 $$\begin{array}{l}
adamc@544 1267 \mt{type} \; \mt{sql\_direction} \\
adamc@544 1268 \mt{val} \; \mt{sql\_asc} : \mt{sql\_direction} \\
adamc@544 1269 \mt{val} \; \mt{sql\_desc} : \mt{sql\_direction} \\
adamc@544 1270 \\
adamc@544 1271 \mt{con} \; \mt{sql\_order\_by} :: \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@544 1272 \mt{val} \; \mt{sql\_order\_by\_Nil} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} :: \{\mt{Type}\} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \\
adamc@544 1273 \mt{val} \; \mt{sql\_order\_by\_Cons} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544 1274 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tables} \; [] \; \mt{exps} \; \mt{t} \to \mt{sql\_direction} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \\
adamc@544 1275 \\
adamc@544 1276 \mt{type} \; \mt{sql\_limit} \\
adamc@544 1277 \mt{val} \; \mt{sql\_no\_limit} : \mt{sql\_limit} \\
adamc@544 1278 \mt{val} \; \mt{sql\_limit} : \mt{int} \to \mt{sql\_limit} \\
adamc@544 1279 \\
adamc@544 1280 \mt{type} \; \mt{sql\_offset} \\
adamc@544 1281 \mt{val} \; \mt{sql\_no\_offset} : \mt{sql\_offset} \\
adamc@544 1282 \mt{val} \; \mt{sql\_offset} : \mt{int} \to \mt{sql\_offset}
adamc@544 1283 \end{array}$$
adamc@544 1284
adamc@545 1285
adamc@545 1286 \subsubsection{DML}
adamc@545 1287
adamc@545 1288 The Ur/Web library also includes an embedding of a fragment of SQL's DML, the Data Manipulation Language, for modifying database tables. Any piece of DML may be executed in a transaction.
adamc@545 1289
adamc@545 1290 $$\begin{array}{l}
adamc@545 1291 \mt{type} \; \mt{dml} \\
adamc@545 1292 \mt{val} \; \mt{dml} : \mt{dml} \to \mt{transaction} \; \mt{unit}
adamc@545 1293 \end{array}$$
adamc@545 1294
adamc@545 1295 Properly-typed records may be used to form $\mt{INSERT}$ commands.
adamc@545 1296 $$\begin{array}{l}
adamc@545 1297 \mt{val} \; \mt{insert} : \mt{fields} ::: \{\mt{Type}\} \to \mt{sql\_table} \; \mt{fields} \\
adamc@545 1298 \hspace{.1in} \to \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{t} :: \mt{Type}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_exp} \; [] \; [] \; [] \; \mt{t}] \rc \mt{acc}) \; [] \; \mt{fields}) \to \mt{dml}
adamc@545 1299 \end{array}$$
adamc@545 1300
adamc@545 1301 An $\mt{UPDATE}$ command is formed from a choice of which table fields to leave alone and which to change, along with an expression to use to compute the new value of each changed field and a $\mt{WHERE}$ clause.
adamc@545 1302 $$\begin{array}{l}
adamc@545 1303 \mt{val} \; \mt{update} : \mt{unchanged} ::: \{\mt{Type}\} \to \mt{changed} :: \{\mt{Type}\} \to \lambda [\mt{changed} \sim \mt{unchanged}] \\
adamc@545 1304 \hspace{.1in} \Rightarrow \$(\mt{fold} \; (\lambda \mt{nm} \; (\mt{t} :: \mt{Type}) \; \mt{acc} \; [[\mt{nm}] \sim \mt{acc}] \Rightarrow [\mt{nm} = \mt{sql\_exp} \; [\mt{T} = \mt{changed} \rc \mt{unchanged}] \; [] \; [] \; \mt{t}] \rc \mt{acc}) \; [] \; \mt{changed}) \\
adamc@545 1305 \hspace{.1in} \to \mt{sql\_table} \; (\mt{changed} \rc \mt{unchanged}) \to \mt{sql\_exp} \; [\mt{T} = \mt{changed} \rc \mt{unchanged}] \; [] \; [] \; \mt{bool} \to \mt{dml}
adamc@545 1306 \end{array}$$
adamc@545 1307
adamc@545 1308 A $\mt{DELETE}$ command is formed from a table and a $\mt{WHERE}$ clause.
adamc@545 1309 $$\begin{array}{l}
adamc@545 1310 \mt{val} \; \mt{delete} : \mt{fields} ::: \{\mt{Type}\} \to \mt{sql\_table} \; \mt{fields} \to \mt{sql\_exp} \; [\mt{T} = \mt{fields}] \; [] \; [] \; \mt{bool} \to \mt{dml}
adamc@545 1311 \end{array}$$
adamc@545 1312
adamc@546 1313 \subsubsection{Sequences}
adamc@546 1314
adamc@546 1315 SQL sequences are counters with concurrency control, often used to assign unique IDs. Ur/Web supports them via a simple interface. The only way to create a sequence is with the $\mt{sequence}$ declaration form.
adamc@546 1316
adamc@546 1317 $$\begin{array}{l}
adamc@546 1318 \mt{type} \; \mt{sql\_sequence} \\
adamc@546 1319 \mt{val} \; \mt{nextval} : \mt{sql\_sequence} \to \mt{transaction} \; \mt{int}
adamc@546 1320 \end{array}$$
adamc@546 1321
adamc@546 1322
adamc@547 1323 \subsection{XML}
adamc@547 1324
adamc@547 1325 Ur/Web's library contains an encoding of XML syntax and semantic constraints. We make no effort to follow the standards governing XML schemas. Rather, XML fragments are viewed more as values of ML datatypes, and we only track which tags are allowed inside which other tags.
adamc@547 1326
adamc@547 1327 The basic XML type family has arguments respectively indicating the \emph{context} of a fragment, the fields that the fragment expects to be bound on entry (and their types), and the fields that the fragment will bind (and their types). Contexts are a record-based ``poor man's subtyping'' encoding, with each possible set of valid tags corresponding to a different context record. The arguments dealing with field binding are only relevant to HTML forms.
adamc@547 1328 $$\begin{array}{l}
adamc@547 1329 \mt{con} \; \mt{xml} :: \{\mt{Unit}\} \to \{\mt{Type}\} \to \{\mt{Type}\} \to \mt{Type}
adamc@547 1330 \end{array}$$
adamc@547 1331
adamc@547 1332 We also have a type family of XML tags, indexed respectively by the record of optional attributes accepted by the tag, the context in which the tag may be placed, the context required of children of the tag, which form fields the tag uses, and which fields the tag defines.
adamc@547 1333 $$\begin{array}{l}
adamc@547 1334 \mt{con} \; \mt{tag} :: \{\mt{Type}\} \to \{\mt{Unit}\} \to \{\mt{Unit}\} \to \{\mt{Type}\} \to \{\mt{Type}\} \to \mt{Type}
adamc@547 1335 \end{array}$$
adamc@547 1336
adamc@547 1337 Literal text may be injected into XML as ``CDATA.''
adamc@547 1338 $$\begin{array}{l}
adamc@547 1339 \mt{val} \; \mt{cdata} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use} ::: \{\mt{Type}\} \to \mt{string} \to \mt{xml} \; \mt{ctx} \; \mt{use} \; []
adamc@547 1340 \end{array}$$
adamc@547 1341
adamc@547 1342 There is a function for producing an XML tree with a particular tag at its root.
adamc@547 1343 $$\begin{array}{l}
adamc@547 1344 \mt{val} \; \mt{tag} : \mt{attrsGiven} ::: \{\mt{Type}\} \to \mt{attrsAbsent} ::: \{\mt{Type}\} \to \mt{ctxOuter} ::: \{\mt{Unit}\} \to \mt{ctxInner} ::: \{\mt{Unit}\} \\
adamc@547 1345 \hspace{.1in} \to \mt{useOuter} ::: \{\mt{Type}\} \to \mt{useInner} ::: \{\mt{Type}\} \to \mt{bindOuter} ::: \{\mt{Type}\} \to \mt{bindInner} ::: \{\mt{Type}\} \\
adamc@547 1346 \hspace{.1in} \to \lambda [\mt{attrsGiven} \sim \mt{attrsAbsent}] \; [\mt{useOuter} \sim \mt{useInner}] \; [\mt{bindOuter} \sim \mt{bindInner}] \Rightarrow \$\mt{attrsGiven} \\
adamc@547 1347 \hspace{.1in} \to \mt{tag} \; (\mt{attrsGiven} \rc \mt{attrsAbsent}) \; \mt{ctxOuter} \; \mt{ctxInner} \; \mt{useOuter} \; \mt{bindOuter} \\
adamc@547 1348 \hspace{.1in} \to \mt{xml} \; \mt{ctxInner} \; \mt{useInner} \; \mt{bindInner} \to \mt{xml} \; \mt{ctxOuter} \; (\mt{useOuter} \rc \mt{useInner}) \; (\mt{bindOuter} \rc \mt{bindInner})
adamc@547 1349 \end{array}$$
adamc@547 1350
adamc@547 1351 Two XML fragments may be concatenated.
adamc@547 1352 $$\begin{array}{l}
adamc@547 1353 \mt{val} \; \mt{join} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use_1} ::: \{\mt{Type}\} \to \mt{bind_1} ::: \{\mt{Type}\} \to \mt{bind_2} ::: \{\mt{Type}\} \\
adamc@547 1354 \hspace{.1in} \to \lambda [\mt{use_1} \sim \mt{bind_1}] \; [\mt{bind_1} \sim \mt{bind_2}] \\
adamc@547 1355 \hspace{.1in} \Rightarrow \mt{xml} \; \mt{ctx} \; \mt{use_1} \; \mt{bind_1} \to \mt{xml} \; \mt{ctx} \; (\mt{use_1} \rc \mt{bind_1}) \; \mt{bind_2} \to \mt{xml} \; \mt{ctx} \; \mt{use_1} \; (\mt{bind_1} \rc \mt{bind_2})
adamc@547 1356 \end{array}$$
adamc@547 1357
adamc@547 1358 Finally, any XML fragment may be updated to ``claim'' to use more form fields than it does.
adamc@547 1359 $$\begin{array}{l}
adamc@547 1360 \mt{val} \; \mt{useMore} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use_1} ::: \{\mt{Type}\} \to \mt{use_2} ::: \{\mt{Type}\} \to \mt{bind} ::: \{\mt{Type}\} \to \lambda [\mt{use_1} \sim \mt{use_2}] \\
adamc@547 1361 \hspace{.1in} \Rightarrow \mt{xml} \; \mt{ctx} \; \mt{use_1} \; \mt{bind} \to \mt{xml} \; \mt{ctx} \; (\mt{use_1} \rc \mt{use_2}) \; \mt{bind}
adamc@547 1362 \end{array}$$
adamc@547 1363
adamc@547 1364 We will not list here the different HTML tags and related functions from the standard library. They should be easy enough to understand from the code in \texttt{basis.urs}. The set of tags in the library is not yet claimed to be complete for HTML standards.
adamc@547 1365
adamc@547 1366 One last useful function is for aborting any page generation, returning some XML as an error message. This function takes the place of some uses of a general exception mechanism.
adamc@547 1367 $$\begin{array}{l}
adamc@547 1368 \mt{val} \; \mt{error} : \mt{t} ::: \mt{Type} \to \mt{xml} \; [\mt{Body}] \; [] \; [] \to \mt{t}
adamc@547 1369 \end{array}$$
adamc@547 1370
adamc@549 1371
adamc@549 1372 \section{Ur/Web Syntax Extensions}
adamc@549 1373
adamc@549 1374 Ur/Web features some syntactic shorthands for building values using the functions from the last section. This section sketches the grammar of those extensions. We write spans of syntax inside brackets to indicate that they are optional.
adamc@549 1375
adamc@549 1376 \subsection{SQL}
adamc@549 1377
adamc@549 1378 \subsubsection{Queries}
adamc@549 1379
adamc@550 1380 Queries $Q$ are added to the rules for expressions $e$.
adamc@550 1381
adamc@549 1382 $$\begin{array}{rrcll}
adamc@550 1383 \textrm{Queries} & Q &::=& (q \; [\mt{ORDER} \; \mt{BY} \; (E \; [o],)^+] \; [\mt{LIMIT} \; N] \; [\mt{OFFSET} \; N]) \\
adamc@549 1384 \textrm{Pre-queries} & q &::=& \mt{SELECT} \; P \; \mt{FROM} \; T,^+ \; [\mt{WHERE} \; E] \; [\mt{GROUP} \; \mt{BY} \; p,^+] \; [\mt{HAVING} \; E] \\
adamc@549 1385 &&& \mid q \; R \; q \\
adamc@549 1386 \textrm{Relational operators} & R &::=& \mt{UNION} \mid \mt{INTERSECT} \mid \mt{EXCEPT}
adamc@549 1387 \end{array}$$
adamc@549 1388
adamc@549 1389 $$\begin{array}{rrcll}
adamc@549 1390 \textrm{Projections} & P &::=& \ast & \textrm{all columns} \\
adamc@549 1391 &&& p,^+ & \textrm{particular columns} \\
adamc@549 1392 \textrm{Pre-projections} & p &::=& t.f & \textrm{one column from a table} \\
adamc@558 1393 &&& t.\{\{c\}\} & \textrm{a record of columns from a table (of kind $\{\mt{Type}\}$)} \\
adamc@549 1394 \textrm{Table names} & t &::=& x & \textrm{constant table name (automatically capitalized)} \\
adamc@549 1395 &&& X & \textrm{constant table name} \\
adamc@549 1396 &&& \{\{c\}\} & \textrm{computed table name (of kind $\mt{Name}$)} \\
adamc@549 1397 \textrm{Column names} & f &::=& X & \textrm{constant column name} \\
adamc@549 1398 &&& \{c\} & \textrm{computed column name (of kind $\mt{Name}$)} \\
adamc@549 1399 \textrm{Tables} & T &::=& x & \textrm{table variable, named locally by its own capitalization} \\
adamc@549 1400 &&& x \; \mt{AS} \; t & \textrm{table variable, with local name} \\
adamc@549 1401 &&& \{\{e\}\} \; \mt{AS} \; t & \textrm{computed table expression, with local name} \\
adamc@549 1402 \textrm{SQL expressions} & E &::=& p & \textrm{column references} \\
adamc@549 1403 &&& X & \textrm{named expression references} \\
adamc@549 1404 &&& \{\{e\}\} & \textrm{injected native Ur expressions} \\
adamc@549 1405 &&& \{e\} & \textrm{computed expressions, probably using $\mt{sql\_exp}$ directly} \\
adamc@549 1406 &&& \mt{TRUE} \mid \mt{FALSE} & \textrm{boolean constants} \\
adamc@549 1407 &&& \ell & \textrm{primitive type literals} \\
adamc@549 1408 &&& \mt{NULL} & \textrm{null value (injection of $\mt{None}$)} \\
adamc@549 1409 &&& E \; \mt{IS} \; \mt{NULL} & \textrm{nullness test} \\
adamc@549 1410 &&& n & \textrm{nullary operators} \\
adamc@549 1411 &&& u \; E & \textrm{unary operators} \\
adamc@549 1412 &&& E \; b \; E & \textrm{binary operators} \\
adamc@549 1413 &&& \mt{COUNT}(\ast) & \textrm{count number of rows} \\
adamc@549 1414 &&& a(E) & \textrm{other aggregate function} \\
adamc@549 1415 &&& (E) & \textrm{explicit precedence} \\
adamc@549 1416 \textrm{Nullary operators} & n &::=& \mt{CURRENT\_TIMESTAMP} \\
adamc@549 1417 \textrm{Unary operators} & u &::=& \mt{NOT} \\
adamc@549 1418 \textrm{Binary operators} & b &::=& \mt{AND} \mid \mt{OR} \mid \neq \mid < \mid \leq \mid > \mid \geq \\
adamc@549 1419 \textrm{Aggregate functions} & a &::=& \mt{AVG} \mid \mt{SUM} \mid \mt{MIN} \mid \mt{MAX} \\
adamc@550 1420 \textrm{Directions} & o &::=& \mt{ASC} \mid \mt{DESC} \\
adamc@549 1421 \textrm{SQL integer} & N &::=& n \mid \{e\} \\
adamc@549 1422 \end{array}$$
adamc@549 1423
adamc@549 1424 Additionally, an SQL expression may be inserted into normal Ur code with the syntax $(\mt{SQL} \; E)$ or $(\mt{WHERE} \; E)$.
adamc@549 1425
adamc@550 1426 \subsubsection{DML}
adamc@550 1427
adamc@550 1428 DML commands $D$ are added to the rules for expressions $e$.
adamc@550 1429
adamc@550 1430 $$\begin{array}{rrcll}
adamc@550 1431 \textrm{Commands} & D &::=& (\mt{INSERT} \; \mt{INTO} \; T^E \; (f,^+) \; \mt{VALUES} \; (E,^+)) \\
adamc@550 1432 &&& (\mt{UPDATE} \; T^E \; \mt{SET} \; (f = E,)^+ \; \mt{WHERE} \; E) \\
adamc@550 1433 &&& (\mt{DELETE} \; \mt{FROM} \; T^E \; \mt{WHERE} \; E) \\
adamc@550 1434 \textrm{Table expressions} & T^E &::=& x \mid \{\{e\}\}
adamc@550 1435 \end{array}$$
adamc@550 1436
adamc@550 1437 Inside $\mt{UPDATE}$ and $\mt{DELETE}$ commands, lone variables $X$ are interpreted as references to columns of the implicit table $\mt{T}$, rather than to named expressions.
adamc@549 1438
adamc@551 1439 \subsection{XML}
adamc@551 1440
adamc@551 1441 XML fragments $L$ are added to the rules for expressions $e$.
adamc@551 1442
adamc@551 1443 $$\begin{array}{rrcll}
adamc@551 1444 \textrm{XML fragments} & L &::=& \texttt{<xml/>} \mid \texttt{<xml>}l^*\texttt{</xml>} \\
adamc@551 1445 \textrm{XML pieces} & l &::=& \textrm{text} & \textrm{cdata} \\
adamc@551 1446 &&& \texttt{<}g\texttt{/>} & \textrm{tag with no children} \\
adamc@551 1447 &&& \texttt{<}g\texttt{>}l^*\texttt{</}x\texttt{>} & \textrm{tag with children} \\
adamc@559 1448 &&& \{e\} & \textrm{computed XML fragment} \\
adamc@559 1449 &&& \{[e]\} & \textrm{injection of an Ur expression, via the $\mt{Top}.\mt{txt}$ function} \\
adamc@551 1450 \textrm{Tag} & g &::=& h \; (x = v)^* \\
adamc@551 1451 \textrm{Tag head} & h &::=& x & \textrm{tag name} \\
adamc@551 1452 &&& h\{c\} & \textrm{constructor parameter} \\
adamc@551 1453 \textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\
adamc@551 1454 &&& \{e\} & \textrm{computed value} \\
adamc@551 1455 \end{array}$$
adamc@551 1456
adamc@552 1457
adamc@553 1458 \section{The Structure of Web Applications}
adamc@553 1459
adamc@553 1460 A web application is built from a series of modules, with one module, the last one appearing in the \texttt{.urp} file, designated as the main module. The signature of the main module determines the URL entry points to the application. Such an entry point should have type $\mt{unit} \to \mt{transaction} \; \mt{page}$, where $\mt{page}$ is a type synonym for top-level HTML pages, defined in $\mt{Basis}$. If such a function is at the top level of main module $M$, it will be accessible at URI \texttt{/M/f}, and so on for more deeply-nested functions, as described in Section \ref{tag} below.
adamc@553 1461
adamc@553 1462 When the standalone web server receives a request for a known page, it calls the function for that page, ``running'' the resulting transaction to produce the page to return to the client. Pages link to other pages with the \texttt{link} attribute of the \texttt{a} HTML tag. A link has type $\mt{transaction} \; \mt{page}$, and the semantics of a link are that this transaction should be run to compute the result page, when the link is followed. Link targets are assigned URL names in the same way as top-level entry points.
adamc@553 1463
adamc@553 1464 HTML forms are handled in a similar way. The $\mt{action}$ attribute of a $\mt{submit}$ form tag takes a value of type $\$\mt{use} \to \mt{transaction} \; \mt{page}$, where $\mt{use}$ is a kind-$\{\mt{Type}\}$ record of the form fields used by this action handler. Action handlers are assigned URL patterns in the same way as above.
adamc@553 1465
adamc@558 1466 For both links and actions, direct arguments and local variables mentioned implicitly via closures are automatically included in serialized form in URLs, in the order in which they appear in the source code.
adamc@553 1467
adamc@553 1468
adamc@552 1469 \section{Compiler Phases}
adamc@552 1470
adamc@552 1471 The Ur/Web compiler is unconventional in that it relies on a kind of \emph{heuristic compilation}. Not all valid programs will compile successfully. Informally, programs fail to compile when they are ``too higher order.'' Compiler phases do their best to eliminate different kinds of higher order-ness, but some programs just won't compile. This is a trade-off for producing very efficient executables. Compiled Ur/Web programs use native C representations and require no garbage collection.
adamc@552 1472
adamc@552 1473 In this section, we step through the main phases of compilation, noting what consequences each phase has for effective programming.
adamc@552 1474
adamc@552 1475 \subsection{Parse}
adamc@552 1476
adamc@552 1477 The compiler reads a \texttt{.urp} file, figures out which \texttt{.urs} and \texttt{.ur} files it references, and combines them all into what is conceptually a single sequence of declarations in the core language of Section \ref{core}.
adamc@552 1478
adamc@552 1479 \subsection{Elaborate}
adamc@552 1480
adamc@552 1481 This is where type inference takes place, translating programs into an explicit form with no more wildcards. This phase is the most likely source of compiler error messages.
adamc@552 1482
adamc@552 1483 \subsection{Unnest}
adamc@552 1484
adamc@552 1485 Named local function definitions are moved to the top level, to avoid the need to generate closures.
adamc@552 1486
adamc@552 1487 \subsection{Corify}
adamc@552 1488
adamc@552 1489 Module system features are compiled away, through inlining of functor definitions at application sites. Afterward, most abstraction boundaries are broken, facilitating optimization.
adamc@552 1490
adamc@552 1491 \subsection{Especialize}
adamc@552 1492
adamc@552 1493 Functions are specialized to particular argument patterns. This is an important trick for avoiding the need to maintain any closures at runtime.
adamc@552 1494
adamc@552 1495 \subsection{Untangle}
adamc@552 1496
adamc@552 1497 Remove unnecessary mutual recursion, splitting recursive groups into strongly-connected components.
adamc@552 1498
adamc@552 1499 \subsection{Shake}
adamc@552 1500
adamc@552 1501 Remove all definitions not needed to run the page handlers that are visible in the signature of the last module listed in the \texttt{.urp} file.
adamc@552 1502
adamc@553 1503 \subsection{\label{tag}Tag}
adamc@552 1504
adamc@552 1505 Assign a URL name to each link and form action. It is important that these links and actions are written as applications of named functions, because such names are used to generate URL patterns. A URL pattern has a name built from the full module path of the named function, followed by the function name, with all pieces separated by slashes. The path of a functor application is based on the name given to the result, rather than the path of the functor itself.
adamc@552 1506
adamc@552 1507 \subsection{Reduce}
adamc@552 1508
adamc@552 1509 Apply definitional equality rules to simplify the program as much as possible. This effectively includes inlining of every non-recursive definition.
adamc@552 1510
adamc@552 1511 \subsection{Unpoly}
adamc@552 1512
adamc@552 1513 This phase specializes polymorphic functions to the specific arguments passed to them in the program. If the program contains real polymorphic recursion, Unpoly will be insufficient to avoid later error messages about too much polymorphism.
adamc@552 1514
adamc@552 1515 \subsection{Specialize}
adamc@552 1516
adamc@558 1517 Replace uses of parameterized datatypes with versions specialized to specific parameters. As for Unpoly, this phase will not be effective enough in the presence of polymorphic recursion or other fancy uses of impredicative polymorphism.
adamc@552 1518
adamc@552 1519 \subsection{Shake}
adamc@552 1520
adamc@558 1521 Here the compiler repeats the earlier Shake phase.
adamc@552 1522
adamc@552 1523 \subsection{Monoize}
adamc@552 1524
adamc@552 1525 Programs are translated to a new intermediate language without polymorphism or non-$\mt{Type}$ constructors. Error messages may pop up here if earlier phases failed to remove such features.
adamc@552 1526
adamc@552 1527 This is the stage at which concrete names are generated for cookies, tables, and sequences. They are named following the same convention as for links and actions, based on module path information saved from earlier stages. Table and sequence names separate path elements with underscores instead of slashes, and they are prefixed by \texttt{uw\_}.
adamc@552 1528 \subsection{MonoOpt}
adamc@552 1529
adamc@552 1530 Simple algebraic laws are applied to simplify the program, focusing especially on efficient imperative generation of HTML pages.
adamc@552 1531
adamc@552 1532 \subsection{MonoUntangle}
adamc@552 1533
adamc@552 1534 Unnecessary mutual recursion is broken up again.
adamc@552 1535
adamc@552 1536 \subsection{MonoReduce}
adamc@552 1537
adamc@552 1538 Equivalents of the definitional equality rules are applied to simplify programs, with inlining again playing a major role.
adamc@552 1539
adamc@552 1540 \subsection{MonoShake, MonoOpt}
adamc@552 1541
adamc@552 1542 Unneeded declarations are removed, and basic optimizations are repeated.
adamc@552 1543
adamc@552 1544 \subsection{Fuse}
adamc@552 1545
adamc@552 1546 The compiler tries to simplify calls to recursive functions whose results are immediately written as page output. The write action is pushed inside the function definitions to avoid allocation of intermediate results.
adamc@552 1547
adamc@552 1548 \subsection{MonoUntangle, MonoShake}
adamc@552 1549
adamc@552 1550 Fuse often creates more opportunities to remove spurious mutual recursion.
adamc@552 1551
adamc@552 1552 \subsection{Pathcheck}
adamc@552 1553
adamc@552 1554 The compiler checks that no link or action name has been used more than once.
adamc@552 1555
adamc@552 1556 \subsection{Cjrize}
adamc@552 1557
adamc@552 1558 The program is translated to what is more or less a subset of C. If any use of functions as data remains at this point, the compiler will complain.
adamc@552 1559
adamc@552 1560 \subsection{C Compilation and Linking}
adamc@552 1561
adamc@552 1562 The output of the last phase is pretty-printed as C source code and passed to GCC.
adamc@552 1563
adamc@552 1564
adamc@524 1565 \end{document}