Ur/Web: doc/manual.tex annotate

annotate doc/manual.tex @ 783:ec0a0dd0ca12

Revising manual through end of Section 3

author	Adam Chlipala <adamc@hcoop.net>
date	Tue, 05 May 2009 11:45:03 -0400
parents	f963356b53fd
children	990f80c1cec1

rev	line source
adamc@524	1 \documentclass{article}
adamc@554	2 \usepackage{fullpage,amsmath,amssymb,proof,url}
adamc@524	3
adamc@524	4 \newcommand{\cd}[1]{\texttt{#1}}
adamc@524	5 \newcommand{\mt}[1]{\mathsf{#1}}
adamc@524	6
adamc@524	7 \newcommand{\rc}{+ \hspace{-.075in} + \;}
adamc@527	8 \newcommand{\rcut}{\; \texttt{--} \;}
adamc@527	9 \newcommand{\rcutM}{\; \texttt{---} \;}
adamc@524	10
adamc@524	11 \begin{document}
adamc@524	12
adamc@524	13 \title{The Ur/Web Manual}
adamc@524	14 \author{Adam Chlipala}
adamc@524	15
adamc@524	16 \maketitle
adamc@524	17
adamc@540	18 \tableofcontents
adamc@540	19
adamc@554	20
adamc@554	21 \section{Introduction}
adamc@554	22
adamc@554	23 \emph{Ur} is a programming language designed to introduce richer type system features into functional programming in the tradition of ML and Haskell. Ur is functional, pure, statically-typed, and strict. Ur supports a powerful kind of \emph{metaprogramming} based on \emph{row types}.
adamc@554	24
adamc@554	25 \emph{Ur/Web} is Ur plus a special standard library and associated rules for parsing and optimization. Ur/Web supports construction of dynamic web applications backed by SQL databases. The signature of the standard library is such that well-typed Ur/Web programs ``don't go wrong'' in a very broad sense. Not only do they not crash during particular page generations, but they also may not:
adamc@554	26
adamc@554	27 \begin{itemize}
adamc@554	28 \item Suffer from any kinds of code-injection attacks
adamc@554	29 \item Return invalid HTML
adamc@554	30 \item Contain dead intra-application links
adamc@554	31 \item Have mismatches between HTML forms and the fields expected by their handlers
adamc@652	32 \item Include client-side code that makes incorrect assumptions about the ``AJAX''-style services that the remote web server provides
adamc@554	33 \item Attempt invalid SQL queries
adamc@652	34 \item Use improper marshaling or unmarshaling in communication with SQL databases or between browsers and web servers
adamc@554	35 \end{itemize}
adamc@554	36
adamc@554	37 This type safety is just the foundation of the Ur/Web methodology. It is also possible to use metaprogramming to build significant application pieces by analysis of type structure. For instance, the demo includes an ML-style functor for building an admin interface for an arbitrary SQL table. The type system guarantees that the admin interface sub-application that comes out will always be free of the above-listed bugs, no matter which well-typed table description is given as input.
adamc@554	38
adamc@652	39 The Ur/Web compiler also produces very efficient object code that does not use garbage collection. These compiled programs will often be even more efficient than what most programmers would bother to write in C. The compiler also generates JavaScript versions of client-side code, with no need to write those parts of applications in a different language.
adamc@554	40
adamc@554	41 \medskip
adamc@554	42
adamc@554	43 The official web site for Ur is:
adamc@554	44 \begin{center}
adamc@554	45 \url{http://www.impredicative.com/ur/}
adamc@554	46 \end{center}
adamc@554	47
adamc@555	48
adamc@555	49 \section{Installation}
adamc@555	50
adamc@555	51 If you are lucky, then the following standard command sequence will suffice for installation, in a directory to which you have unpacked the latest distribution tarball.
adamc@555	52
adamc@555	53 \begin{verbatim}
adamc@555	54 ./configure
adamc@555	55 make
adamc@555	56 sudo make install
adamc@555	57 \end{verbatim}
adamc@555	58
adamc@783	59 Some other packages must be installed for the above to work. At a minimum, you need a standard UNIX shell, with standard UNIX tools like sed and GCC in your execution path; MLton, the whole-program optimizing compiler for Standard ML; and the mhash C library. To build programs that access SQL databases, you also need libpq, the PostgreSQL client library. As of this writing, in the ``testing'' version of Debian Linux, this command will install the more uncommon of these dependencies:
adamc@555	60
adamc@555	61 \begin{verbatim}
adamc@783	62 apt-get install mlton libmhash-dev libpq-dev
adamc@555	63 \end{verbatim}
adamc@555	64
adamc@555	65 It is also possible to access the modules of the Ur/Web compiler interactively, within Standard ML of New Jersey. To install the prerequisites in Debian testing:
adamc@555	66
adamc@555	67 \begin{verbatim}
adamc@555	68 apt-get install smlnj libsmlnj-smlnj ml-yacc ml-lpt
adamc@555	69 \end{verbatim}
adamc@555	70
adamc@555	71 To begin an interactive session with the Ur compiler modules, run \texttt{make smlnj}, and then, from within an \texttt{sml} session, run \texttt{CM.make "src/urweb.cm";}. The \texttt{Compiler} module is the main entry point.
adamc@555	72
adamc@555	73 To run an SQL-backed application, you will probably want to install the PostgreSQL server. Version 8.3 or higher is required.
adamc@555	74
adamc@555	75 \begin{verbatim}
adamc@555	76 apt-get install postgresql-8.3
adamc@555	77 \end{verbatim}
adamc@555	78
adamc@555	79 To use the Emacs mode, you must have a modern Emacs installed. We assume that you already know how to do this, if you're in the business of looking for an Emacs mode. The demo generation facility of the compiler will also call out to Emacs to syntax-highlight code, and that process depends on the \texttt{htmlize} module, which can be installed in Debian testing via:
adamc@555	80
adamc@555	81 \begin{verbatim}
adamc@555	82 apt-get install emacs-goodies-el
adamc@555	83 \end{verbatim}
adamc@555	84
adamc@555	85 Even with the right packages installed, configuration and building might fail to work. After you run \texttt{./configure}, you will see the values of some named environment variables printed. You may need to adjust these values to get proper installation for your system. To change a value, store your preferred alternative in the corresponding UNIX environment variable, before running \texttt{./configure}. For instance, here is how to change the list of extra arguments that the Ur/Web compiler will pass to GCC on every invocation.
adamc@555	86
adamc@555	87 \begin{verbatim}
adamc@555	88 GCCARGS=-fnested-functions ./configure
adamc@555	89 \end{verbatim}
adamc@555	90
adamc@555	91 Some OSX users have reported needing to use this particular GCCARGS value.
adamc@555	92
adamc@555	93 The Emacs mode can be set to autoload by adding the following to your \texttt{.emacs} file.
adamc@555	94
adamc@555	95 \begin{verbatim}
adamc@555	96 (add-to-list 'load-path "/usr/local/share/emacs/site-lisp/urweb-mode")
adamc@555	97 (load "urweb-mode-startup")
adamc@555	98 \end{verbatim}
adamc@555	99
adamc@555	100 Change the path in the first line if you chose a different Emacs installation path during configuration.
adamc@555	101
adamc@555	102
adamc@556	103 \section{Command-Line Compiler}
adamc@556	104
adamc@556	105 \subsection{Project Files}
adamc@556	106
adamc@556	107 The basic inputs to the \texttt{urweb} compiler are project files, which have the extension \texttt{.urp}. Here is a sample \texttt{.urp} file.
adamc@556	108
adamc@556	109 \begin{verbatim}
adamc@556	110 database dbname=test
adamc@556	111 sql crud1.sql
adamc@556	112
adamc@556	113 crud
adamc@556	114 crud1
adamc@556	115 \end{verbatim}
adamc@556	116
adamc@556	117 The \texttt{database} line gives the database information string to pass to libpq. In this case, the string only says to connect to a local database named \texttt{test}.
adamc@556	118
adamc@556	119 The \texttt{sql} line asks for an SQL source file to be generated, giving the commands to run to create the tables and sequences that this application expects to find. After building this \texttt{.urp} file, the following commands could be used to initialize the database, assuming that the current UNIX user exists as a Postgres user with database creation privileges:
adamc@556	120
adamc@556	121 \begin{verbatim}
adamc@556	122 createdb test
adamc@556	123 psql -f crud1.sql test
adamc@556	124 \end{verbatim}
adamc@556	125
adamc@556	126 A blank line always separates the named directives from a list of modules to include in the project; if there are no named directives, a blank line must begin the file.
adamc@556	127
adamc@556	128 For each entry \texttt{M} in the module list, the file \texttt{M.urs} is included in the project if it exists, and the file \texttt{M.ur} must exist and is always included.
adamc@556	129
adamc@783	130 Here is the complete list of directive forms. ``FFI'' stands for ``foreign function interface,'' Ur's facility for interaction between Ur programs and C and JavaScript libraries.
adamc@783	131 \begin{itemize}
adamc@783	132 \item \texttt{[allow\|deny] [url\|mime] PATTERN} registers a rule governing which URLs or MIME types are allowed in this application. The first such rule to match a URL or MIME type determines the verdict. If \texttt{PATTERN} ends in \texttt{*}, it is interpreted as a prefix rule. Otherwise, a string must match it exactly.
adamc@783	133 \item \texttt{clientOnly Module.ident} registers an FFI function or transaction that may only be run in client browsers.
adamc@783	134 \item \texttt{clientToServer Module.ident} adds FFI type \texttt{Module.ident} to the list of types that are OK to marshal from clients to servers. Values like XML trees and SQL queries are hard to marshal without introducing expensive validity checks, so it's easier to ensure that the server never trusts clients to send such values. The file \texttt{include/urweb.h} shows examples of the C support functions that are required of any type that may be marshalled. These include \texttt{attrify}, \texttt{urlify}, and \texttt{unurlify} functions.
adamc@783	135 \item \texttt{database DBSTRING} sets the string to pass to libpq to open a database connection.
adamc@783	136 \item \texttt{debug} saves some intermediate C files, which is mostly useful to help in debugging the compiler itself.
adamc@783	137 \item \texttt{effectful Module.ident} registers an FFI function or transaction as having side effects. The optimizer avoids removing, moving, or duplicating calls to such functions. Every effectful FFI function must be registered, or the optimizer may make invalid transformations.
adamc@783	138 \item \texttt{exe FILENAME} sets the filename to which to write the output executable. The default for file \texttt{P.urp} is \texttt{P.exe}.
adamc@783	139 \item \texttt{ffi FILENAME} reads the file \texttt{FILENAME.urs} to determine the interface to a new FFI module. The name of the module is calculated from \texttt{FILENAME} in the same way as for normal source files. See the files \texttt{include/urweb.h} and \texttt{src/c/urweb.c} for examples of C headers and implementations for FFI modules. In general, every type or value \texttt{Module.ident} becomes \texttt{uw\_Module\_ident} in C.
adamc@783	140 \item \texttt{jsFunc Module.ident=name} gives the JavaScript name of an FFI value.
adamc@783	141 \item \texttt{library FILENAME} parses \texttt{FILENAME.urp} and merges its contents with the rest of the current file's contents.
adamc@783	142 \item \texttt{link FILENAME} adds \texttt{FILENAME} to the list of files to be passed to the GCC linker at the end of compilation. This is most useful for importing extra libraries needed by new FFI modules.
adamc@783	143 \item \texttt{prefix PREFIX} sets the prefix included before every URI within the generated application. The default is \texttt{/}.
adamc@783	144 \item \texttt{profile} generates an executable that may be used with gprof.
adamc@783	145 \item \texttt{rewrite KIND FROM TO} gives a rule for rewriting canonical module paths. For instance, the canonical path of a page may be \texttt{Mod1.Mod2.mypage}, while you would rather the page were accessed via a URL containing only \texttt{page}. The directive \texttt{rewrite url Mod1/Mod2/mypage page} would accomplish that. The possible values of \texttt{KIND} determine which kinds of objects are affected. The kind \texttt{all} matches any object, and \texttt{url} matches page URLs. The kinds \texttt{table}, \texttt{sequence}, and \texttt{view} match those sorts of SQL entities, and \texttt{relation} matches any of those three. \texttt{cookie} matches HTTP cookies, and \texttt{style} matches CSS class names. If \texttt{FROM} ends in \texttt{/*}, it is interpreted as a prefix matching rule, and rewriting occurs by replacing only the appropriate prefix of a path with \texttt{TO}. While the actual external names of relations and styles have parts separated by underscores instead of slashes, all rewrite rules must be written in terms of slashes.
adamc@783	146 \item \texttt{script URL} adds \texttt{URL} to the list of extra JavaScript files to be included at the beginning of any page that uses JavaScript. This is most useful for importing JavaScript versions of functions found in new FFI modules.
adamc@783	147 \item \texttt{serverOnly Module.ident} registers an FFI function or transaction that may only be run on the server.
adamc@783	148 \item \texttt{sql FILENAME} sets where to write an SQL file with the commands to create the expected database schema. The default is not to create such a file.
adamc@783	149 \item \texttt{timeout N} sets to \texttt{N} seconds the amount of time that the generated server will wait after the last contact from a client before determining that that client has exited the application. Clients that remain active will take the timeout setting into account in determining how often to ping the server, so it only makes sense to set a high timeout to cope with browser and network delays and failures. Higher timeouts can lead to more unnecessary client information taking up memory on the server. The timeout goes unused by any page that doesn't involve the \texttt{recv} function, since the server only needs to store per-client information for clients that receive asynchronous messages.
adamc@783	150 \end{itemize}
adamc@701	151
adamc@701	152
adamc@557	153 \subsection{Building an Application}
adamc@557	154
adamc@557	155 To compile project \texttt{P.urp}, simply run
adamc@557	156 \begin{verbatim}
adamc@557	157 urweb P
adamc@557	158 \end{verbatim}
adamc@558	159 The output executable is a standalone web server. Run it with the command-line argument \texttt{-h} to see which options it takes. If the project file lists a database, the web server will attempt to connect to that database on startup.
adamc@557	160
adamc@557	161 To time how long the different compiler phases run, without generating an executable, run
adamc@557	162 \begin{verbatim}
adamc@557	163 urweb -timing P
adamc@557	164 \end{verbatim}
adamc@557	165
adamc@556	166
adamc@529	167 \section{Ur Syntax}
adamc@529	168
adamc@529	169 In this section, we describe the syntax of Ur, deferring to a later section discussion of most of the syntax specific to SQL and XML. The sole exceptions are the declaration forms for tables, sequences, and cookies.
adamc@524	170
adamc@524	171 \subsection{Lexical Conventions}
adamc@524	172
adamc@524	173 We give the Ur language definition in \LaTeX $\;$ math mode, since that is prettier than monospaced ASCII. The corresponding ASCII syntax can be read off directly. Here is the key for mapping math symbols to ASCII character sequences.
adamc@524	174
adamc@524	175 \begin{center}
adamc@524	176 \begin{tabular}{rl}
adamc@524	177 \textbf{\LaTeX} & \textbf{ASCII} \\
adamc@524	178 $\to$ & \cd{->} \\
adamc@652	179 $\longrightarrow$ & \cd{-->} \\
adamc@524	180 $\times$ & \cd{*} \\
adamc@524	181 $\lambda$ & \cd{fn} \\
adamc@524	182 $\Rightarrow$ & \cd{=>} \\
adamc@652	183 $\Longrightarrow$ & \cd{==>} \\
adamc@529	184 $\neq$ & \cd{<>} \\
adamc@529	185 $\leq$ & \cd{<=} \\
adamc@529	186 $\geq$ & \cd{>=} \\
adamc@524	187 \\
adamc@524	188 $x$ & Normal textual identifier, not beginning with an uppercase letter \\
adamc@525	189 $X$ & Normal textual identifier, beginning with an uppercase letter \\
adamc@524	190 \end{tabular}
adamc@524	191 \end{center}
adamc@524	192
adamc@525	193 We often write syntax like $e^$ to indicate zero or more copies of $e$, $e^+$ to indicate one or more copies, and $e,^$ and $e,^+$ to indicate multiple copies separated by commas. Another separator may be used in place of a comma. The $e$ term may be surrounded by parentheses to indicate grouping; those parentheses should not be included in the actual ASCII.
adamc@524	194
adamc@526	195 We write $\ell$ for literals of the primitive types, for the most part following C conventions. There are $\mt{int}$, $\mt{float}$, and $\mt{string}$ literals.
adamc@526	196
adamc@527	197 This version of the manual doesn't include operator precedences; see \texttt{src/urweb.grm} for that.
adamc@527	198
adamc@552	199 \subsection{\label{core}Core Syntax}
adamc@524	200
adamc@524	201 \emph{Kinds} classify types and other compile-time-only entities. Each kind in the grammar is listed with a description of the sort of data it classifies.
adamc@524	202 $$\begin{array}{rrcll}
adamc@524	203 \textrm{Kinds} & \kappa &::=& \mt{Type} & \textrm{proper types} \\
adamc@525	204 &&& \mt{Unit} & \textrm{the trivial constructor} \\
adamc@525	205 &&& \mt{Name} & \textrm{field names} \\
adamc@525	206 &&& \kappa \to \kappa & \textrm{type-level functions} \\
adamc@525	207 &&& \{\kappa\} & \textrm{type-level records} \\
adamc@525	208 &&& (\kappa\times^+) & \textrm{type-level tuples} \\
adamc@652	209 &&& X & \textrm{variable} \\
adamc@652	210 &&& X \longrightarrow k & \textrm{kind-polymorphic type-level function} \\
adamc@529	211 &&& \_\_ & \textrm{wildcard} \\
adamc@525	212 &&& (\kappa) & \textrm{explicit precedence} \\
adamc@524	213 \end{array}$$
adamc@524	214
adamc@524	215 Ur supports several different notions of functions that take types as arguments. These arguments can be either implicit, causing them to be inferred at use sites; or explicit, forcing them to be specified manually at use sites. There is a common explicitness annotation convention applied at the definitions of and in the types of such functions.
adamc@524	216 $$\begin{array}{rrcll}
adamc@524	217 \textrm{Explicitness} & ? &::=& :: & \textrm{explicit} \\
adamc@558	218 &&& ::: & \textrm{implicit}
adamc@524	219 \end{array}$$
adamc@524	220
adamc@524	221 \emph{Constructors} are the main class of compile-time-only data. They include proper types and are classified by kinds.
adamc@524	222 $$\begin{array}{rrcll}
adamc@524	223 \textrm{Constructors} & c, \tau &::=& (c) :: \kappa & \textrm{kind annotation} \\
adamc@530	224 &&& \hat{x} & \textrm{constructor variable} \\
adamc@524	225 \\
adamc@525	226 &&& \tau \to \tau & \textrm{function type} \\
adamc@525	227 &&& x \; ? \; \kappa \to \tau & \textrm{polymorphic function type} \\
adamc@652	228 &&& X \longrightarrow \tau & \textrm{kind-polymorphic function type} \\
adamc@525	229 &&& \$ c & \textrm{record type} \\
adamc@524	230 \\
adamc@525	231 &&& c \; c & \textrm{type-level function application} \\
adamc@530	232 &&& \lambda x \; :: \; \kappa \Rightarrow c & \textrm{type-level function abstraction} \\
adamc@524	233 \\
adamc@652	234 &&& X \Longrightarrow c & \textrm{type-level kind-polymorphic function abstraction} \\
adamc@655	235 &&& c [\kappa] & \textrm{type-level kind-polymorphic function application} \\
adamc@652	236 \\
adamc@525	237 &&& () & \textrm{type-level unit} \\
adamc@525	238 &&& \#X & \textrm{field name} \\
adamc@524	239 \\
adamc@525	240 &&& [(c = c)^*] & \textrm{known-length type-level record} \\
adamc@525	241 &&& c \rc c & \textrm{type-level record concatenation} \\
adamc@652	242 &&& \mt{map} & \textrm{type-level record map} \\
adamc@524	243 \\
adamc@558	244 &&& (c,^+) & \textrm{type-level tuple} \\
adamc@525	245 &&& c.n & \textrm{type-level tuple projection ($n \in \mathbb N^+$)} \\
adamc@524	246 \\
adamc@652	247 &&& [c \sim c] \Rightarrow \tau & \textrm{guarded type} \\
adamc@524	248 \\
adamc@529	249 &&& \_ :: \kappa & \textrm{wildcard} \\
adamc@525	250 &&& (c) & \textrm{explicit precedence} \\
adamc@530	251 \\
adamc@530	252 \textrm{Qualified uncapitalized variables} & \hat{x} &::=& x & \textrm{not from a module} \\
adamc@530	253 &&& M.x & \textrm{projection from a module} \\
adamc@525	254 \end{array}$$
adamc@525	255
adamc@655	256 We include both abstraction and application for kind polymorphism, but applications are only inferred internally; they may not be written explicitly in source programs.
adamc@655	257
adamc@525	258 Modules of the module system are described by \emph{signatures}.
adamc@525	259 $$\begin{array}{rrcll}
adamc@525	260 \textrm{Signatures} & S &::=& \mt{sig} \; s^* \; \mt{end} & \textrm{constant} \\
adamc@525	261 &&& X & \textrm{variable} \\
adamc@525	262 &&& \mt{functor}(X : S) : S & \textrm{functor} \\
adamc@529	263 &&& S \; \mt{where} \; \mt{con} \; x = c & \textrm{concretizing an abstract constructor} \\
adamc@525	264 &&& M.X & \textrm{projection from a module} \\
adamc@525	265 \\
adamc@525	266 \textrm{Signature items} & s &::=& \mt{con} \; x :: \kappa & \textrm{abstract constructor} \\
adamc@525	267 &&& \mt{con} \; x :: \kappa = c & \textrm{concrete constructor} \\
adamc@528	268 &&& \mt{datatype} \; x \; x^* = dc\mid^+ & \textrm{algebraic datatype definition} \\
adamc@529	269 &&& \mt{datatype} \; x = \mt{datatype} \; M.x & \textrm{algebraic datatype import} \\
adamc@525	270 &&& \mt{val} \; x : \tau & \textrm{value} \\
adamc@525	271 &&& \mt{structure} \; X : S & \textrm{sub-module} \\
adamc@525	272 &&& \mt{signature} \; X = S & \textrm{sub-signature} \\
adamc@525	273 &&& \mt{include} \; S & \textrm{signature inclusion} \\
adamc@525	274 &&& \mt{constraint} \; c \sim c & \textrm{record disjointness constraint} \\
adamc@654	275 &&& \mt{class} \; x :: \kappa & \textrm{abstract constructor class} \\
adamc@654	276 &&& \mt{class} \; x :: \kappa = c & \textrm{concrete constructor class} \\
adamc@525	277 \\
adamc@525	278 \textrm{Datatype constructors} & dc &::=& X & \textrm{nullary constructor} \\
adamc@525	279 &&& X \; \mt{of} \; \tau & \textrm{unary constructor} \\
adamc@524	280 \end{array}$$
adamc@524	281
adamc@526	282 \emph{Patterns} are used to describe structural conditions on expressions, such that expressions may be tested against patterns, generating assignments to pattern variables if successful.
adamc@526	283 $$\begin{array}{rrcll}
adamc@526	284 \textrm{Patterns} & p &::=& \_ & \textrm{wildcard} \\
adamc@526	285 &&& x & \textrm{variable} \\
adamc@526	286 &&& \ell & \textrm{constant} \\
adamc@526	287 &&& \hat{X} & \textrm{nullary constructor} \\
adamc@526	288 &&& \hat{X} \; p & \textrm{unary constructor} \\
adamc@526	289 &&& \{(x = p,)^*\} & \textrm{rigid record pattern} \\
adamc@526	290 &&& \{(x = p,)^+, \ldots\} & \textrm{flexible record pattern} \\
adamc@527	291 &&& (p) & \textrm{explicit precedence} \\
adamc@526	292 \\
adamc@529	293 \textrm{Qualified capitalized variables} & \hat{X} &::=& X & \textrm{not from a module} \\
adamc@526	294 &&& M.X & \textrm{projection from a module} \\
adamc@526	295 \end{array}$$
adamc@526	296
adamc@527	297 \emph{Expressions} are the main run-time entities, corresponding to both ``expressions'' and ``statements'' in mainstream imperative languages.
adamc@527	298 $$\begin{array}{rrcll}
adamc@527	299 \textrm{Expressions} & e &::=& e : \tau & \textrm{type annotation} \\
adamc@529	300 &&& \hat{x} & \textrm{variable} \\
adamc@529	301 &&& \hat{X} & \textrm{datatype constructor} \\
adamc@527	302 &&& \ell & \textrm{constant} \\
adamc@527	303 \\
adamc@527	304 &&& e \; e & \textrm{function application} \\
adamc@527	305 &&& \lambda x : \tau \Rightarrow e & \textrm{function abstraction} \\
adamc@527	306 &&& e [c] & \textrm{polymorphic function application} \\
adamc@527	307 &&& \lambda x \; ? \; \kappa \Rightarrow e & \textrm{polymorphic function abstraction} \\
adamc@655	308 &&& e [\kappa] & \textrm{kind-polymorphic function application} \\
adamc@652	309 &&& X \Longrightarrow e & \textrm{kind-polymorphic function abstraction} \\
adamc@527	310 \\
adamc@527	311 &&& \{(c = e,)^*\} & \textrm{known-length record} \\
adamc@527	312 &&& e.c & \textrm{record field projection} \\
adamc@527	313 &&& e \rc e & \textrm{record concatenation} \\
adamc@527	314 &&& e \rcut c & \textrm{removal of a single record field} \\
adamc@527	315 &&& e \rcutM c & \textrm{removal of multiple record fields} \\
adamc@527	316 \\
adamc@527	317 &&& \mt{let} \; ed^* \; \mt{in} \; e \; \mt{end} & \textrm{local definitions} \\
adamc@527	318 \\
adamc@527	319 &&& \mt{case} \; e \; \mt{of} \; (p \Rightarrow e\|)^+ & \textrm{pattern matching} \\
adamc@527	320 \\
adamc@654	321 &&& \lambda [c \sim c] \Rightarrow e & \textrm{guarded expression abstraction} \\
adamc@654	322 &&& e \; ! & \textrm{guarded expression application} \\
adamc@527	323 \\
adamc@527	324 &&& \_ & \textrm{wildcard} \\
adamc@527	325 &&& (e) & \textrm{explicit precedence} \\
adamc@527	326 \\
adamc@527	327 \textrm{Local declarations} & ed &::=& \cd{val} \; x : \tau = e & \textrm{non-recursive value} \\
adamc@527	328 &&& \cd{val} \; \cd{rec} \; (x : \tau = e \; \cd{and})^+ & \textrm{mutually-recursive values} \\
adamc@527	329 \end{array}$$
adamc@527	330
adamc@655	331 As with constructors, we include both abstraction and application for kind polymorphism, but applications are only inferred internally.
adamc@655	332
adamc@528	333 \emph{Declarations} primarily bring new symbols into context.
adamc@528	334 $$\begin{array}{rrcll}
adamc@528	335 \textrm{Declarations} & d &::=& \mt{con} \; x :: \kappa = c & \textrm{constructor synonym} \\
adamc@528	336 &&& \mt{datatype} \; x \; x^* = dc\mid^+ & \textrm{algebraic datatype definition} \\
adamc@529	337 &&& \mt{datatype} \; x = \mt{datatype} \; M.x & \textrm{algebraic datatype import} \\
adamc@528	338 &&& \mt{val} \; x : \tau = e & \textrm{value} \\
adamc@528	339 &&& \mt{val} \; \cd{rec} \; (x : \tau = e \; \mt{and})^+ & \textrm{mutually-recursive values} \\
adamc@528	340 &&& \mt{structure} \; X : S = M & \textrm{module definition} \\
adamc@528	341 &&& \mt{signature} \; X = S & \textrm{signature definition} \\
adamc@528	342 &&& \mt{open} \; M & \textrm{module inclusion} \\
adamc@528	343 &&& \mt{constraint} \; c \sim c & \textrm{record disjointness constraint} \\
adamc@528	344 &&& \mt{open} \; \mt{constraints} \; M & \textrm{inclusion of just the constraints from a module} \\
adamc@528	345 &&& \mt{table} \; x : c & \textrm{SQL table} \\
adamc@528	346 &&& \mt{sequence} \; x & \textrm{SQL sequence} \\
adamc@535	347 &&& \mt{cookie} \; x : \tau & \textrm{HTTP cookie} \\
adamc@654	348 &&& \mt{class} \; x :: \kappa = c & \textrm{concrete constructor class} \\
adamc@528	349 \\
adamc@529	350 \textrm{Modules} & M &::=& \mt{struct} \; d^* \; \mt{end} & \textrm{constant} \\
adamc@529	351 &&& X & \textrm{variable} \\
adamc@529	352 &&& M.X & \textrm{projection} \\
adamc@529	353 &&& M(M) & \textrm{functor application} \\
adamc@529	354 &&& \mt{functor}(X : S) : S = M & \textrm{functor abstraction} \\
adamc@528	355 \end{array}$$
adamc@528	356
adamc@528	357 There are two kinds of Ur files. A file named $M\texttt{.ur}$ is an \emph{implementation file}, and it should contain a sequence of declarations $d^$. A file named $M\texttt{.urs}$ is an \emph{interface file}; it must always have a matching $M\texttt{.ur}$ and should contain a sequence of signature items $s^$. When both files are present, the overall effect is the same as a monolithic declaration $\mt{structure} \; M : \mt{sig} \; s^* \; \mt{end} = \mt{struct} \; d^* \; \mt{end}$. When no interface file is included, the overall effect is similar, with a signature for module $M$ being inferred rather than just checked against an interface.
adamc@527	358
adamc@529	359 \subsection{Shorthands}
adamc@529	360
adamc@529	361 There are a variety of derived syntactic forms that elaborate into the core syntax from the last subsection. We will present the additional forms roughly following the order in which we presented the constructs that they elaborate into.
adamc@529	362
adamc@529	363 In many contexts where record fields are expected, like in a projection $e.c$, a constant field may be written as simply $X$, rather than $\#X$.
adamc@529	364
adamc@529	365 A record type may be written $\{(c = c,)^\}$, which elaborates to $\$[(c = c,)^]$.
adamc@529	366
adamc@533	367 The notation $[c_1, \ldots, c_n]$ is shorthand for $[c_1 = (), \ldots, c_n = ()]$.
adamc@533	368
adamc@529	369 A tuple type $(\tau_1, \ldots, \tau_n)$ expands to a record type $\{1 = \tau_1, \ldots, n = \tau_n\}$, with natural numbers as field names. A tuple pattern $(p_1, \ldots, p_n)$ expands to a rigid record pattern $\{1 = p_1, \ldots, n = p_n\}$. Positive natural numbers may be used in most places where field names would be allowed.
adamc@529	370
adamc@654	371 In general, several adjacent $\lambda$ forms may be combined into one, and kind and type annotations may be omitted, in which case they are implicitly included as wildcards. More formally, for constructor-level abstractions, we can define a new non-terminal $b ::= x \mid (x :: \kappa) \mid X$ and allow composite abstractions of the form $\lambda b^+ \Rightarrow c$, elaborating into the obvious sequence of one core $\lambda$ per element of $b^+$.
adamc@529	372
adamc@529	373 For any signature item or declaration that defines some entity to be equal to $A$ with classification annotation $B$ (e.g., $\mt{val} \; x : B = A$), $B$ and the preceding colon (or similar punctuation) may be omitted, in which case it is filled in as a wildcard.
adamc@529	374
adamc@529	375 A signature item or declaration $\mt{type} \; x$ or $\mt{type} \; x = \tau$ is elaborated into $\mt{con} \; x :: \mt{Type}$ or $\mt{con} \; x :: \mt{Type} = \tau$, respectively.
adamc@529	376
adamc@654	377 A signature item or declaration $\mt{class} \; x = \lambda y \Rightarrow c$ may be abbreviated $\mt{class} \; x \; y = c$.
adamc@529	378
adamc@654	379 Handling of implicit and explicit constructor arguments may be tweaked with some prefixes to variable references. An expression $@x$ is a version of $x$ where all implicit constructor arguments have been made explicit. An expression $@@x$ achieves the same effect, additionally halting automatic resolution of type class instances and automatic proving of disjointness constraints. The default is that any prefix of a variable's type consisting only of implicit polymorphism, type class instances, and disjointness obligations is resolved automatically, with the variable treated as having the type that starts after the last implicit element, with suitable unification variables substituted. The same syntax works for variables projected out of modules and for capitalized variables (datatype constructors).
adamc@529	380
adamc@654	381 At the expression level, an analogue is available of the composite $\lambda$ form for constructors. We define the language of binders as $b ::= x \mid (x : \tau) \mid (x \; ? \; \kappa) \mid X \mid [c \sim c]$. A lone variable $x$ as a binder stands for an expression variable of unspecified type.
adamc@529	382
adamc@529	383 A $\mt{val}$ or $\mt{val} \; \mt{rec}$ declaration may include expression binders before the equal sign, following the binder grammar from the last paragraph. Such declarations are elaborated into versions that add additional $\lambda$s to the fronts of the righthand sides, as appropriate. The keyword $\mt{fun}$ is a synonym for $\mt{val} \; \mt{rec}$.
adamc@529	384
adamc@529	385 A signature item $\mt{functor} \; X_1 \; (X_2 : S_1) : S_2$ is elaborated into $\mt{structure} \; X_1 : \mt{functor}(X_2 : S_1) : S_2$. A declaration $\mt{functor} \; X_1 \; (X_2 : S_1) : S_2 = M$ is elaborated into $\mt{structure} \; X_1 : \mt{functor}(X_2 : S_1) : S_2 = \mt{functor}(X_2 : S_1) : S_2 = M$.
adamc@529	386
adamc@529	387 A declaration $\mt{table} \; x : \{(c = c,)^\}$ is elaborated into $\mt{table} \; x : [(c = c,)^]$
adamc@529	388
adamc@529	389 The syntax $\mt{where} \; \mt{type}$ is an alternate form of $\mt{where} \; \mt{con}$.
adamc@529	390
adamc@529	391 The syntax $\mt{if} \; e \; \mt{then} \; e_1 \; \mt{else} \; e_2$ expands to $\mt{case} \; e \; \mt{of} \; \mt{Basis}.\mt{True} \Rightarrow e_1 \mid \mt{Basis}.\mt{False} \Rightarrow e_2$.
adamc@529	392
adamc@529	393 There are infix operator syntaxes for a number of functions defined in the $\mt{Basis}$ module. There is $=$ for $\mt{eq}$, $\neq$ for $\mt{neq}$, $-$ for $\mt{neg}$ (as a prefix operator) and $\mt{minus}$, $+$ for $\mt{plus}$, $\times$ for $\mt{times}$, $/$ for $\mt{div}$, $\%$ for $\mt{mod}$, $<$ for $\mt{lt}$, $\leq$ for $\mt{le}$, $>$ for $\mt{gt}$, and $\geq$ for $\mt{ge}$.
adamc@529	394
adamc@529	395 A signature item $\mt{table} \; x : c$ is shorthand for $\mt{val} \; x : \mt{Basis}.\mt{sql\_table} \; c$. $\mt{sequence} \; x$ is short for $\mt{val} \; x : \mt{Basis}.\mt{sql\_sequence}$, and $\mt{cookie} \; x : \tau$ is shorthand for $\mt{val} \; x : \mt{Basis}.\mt{http\_cookie} \; \tau$.
adamc@529	396
adamc@530	397
adamc@530	398 \section{Static Semantics}
adamc@530	399
adamc@530	400 In this section, we give a declarative presentation of Ur's typing rules and related judgments. Inference is the subject of the next section; here, we assume that an oracle has filled in all wildcards with concrete values.
adamc@530	401
adamc@530	402 Since there is significant mutual recursion among the judgments, we introduce them all before beginning to give rules. We use the same variety of contexts throughout this section, implicitly introducing new sorts of context entries as needed.
adamc@530	403 \begin{itemize}
adamc@655	404 \item $\Gamma \vdash \kappa$ expresses kind well-formedness.
adamc@530	405 \item $\Gamma \vdash c :: \kappa$ assigns a kind to a constructor in a context.
adamc@530	406 \item $\Gamma \vdash c \sim c$ proves the disjointness of two record constructors; that is, that they share no field names. We overload the judgment to apply to pairs of field names as well.
adamc@531	407 \item $\Gamma \vdash c \hookrightarrow C$ proves that record constructor $c$ decomposes into set $C$ of field names and record constructors.
adamc@530	408 \item $\Gamma \vdash c \equiv c$ proves the computational equivalence of two constructors. This is often called a \emph{definitional equality} in the world of type theory.
adamc@530	409 \item $\Gamma \vdash e : \tau$ is a standard typing judgment.
adamc@534	410 \item $\Gamma \vdash p \leadsto \Gamma; \tau$ combines typing of patterns with calculation of which new variables they bind.
adamc@537	411 \item $\Gamma \vdash d \leadsto \Gamma$ expresses how a declaration modifies a context. We overload this judgment to apply to sequences of declarations, as well as to signature items and sequences of signature items.
adamc@537	412 \item $\Gamma \vdash S \equiv S$ is the signature equivalence judgment.
adamc@536	413 \item $\Gamma \vdash S \leq S$ is the signature compatibility judgment. We write $\Gamma \vdash S$ as shorthand for $\Gamma \vdash S \leq S$.
adamc@530	414 \item $\Gamma \vdash M : S$ is the module signature checking judgment.
adamc@537	415 \item $\mt{proj}(M, \overline{s}, V)$ is a partial function for projecting a signature item from $\overline{s}$, given the module $M$ that we project from. $V$ may be $\mt{con} \; x$, $\mt{datatype} \; x$, $\mt{val} \; x$, $\mt{signature} \; X$, or $\mt{structure} \; X$. The parameter $M$ is needed because the projected signature item may refer to other items from $\overline{s}$.
adamc@539	416 \item $\mt{selfify}(M, \overline{s})$ adds information to signature items $\overline{s}$ to reflect the fact that we are concerned with the particular module $M$. This function is overloaded to work over individual signature items as well.
adamc@530	417 \end{itemize}
adamc@530	418
adamc@655	419
adamc@655	420 \subsection{Kind Well-Formedness}
adamc@655	421
adamc@655	422 $$\infer{\Gamma \vdash \mt{Type}}{}
adamc@655	423 \quad \infer{\Gamma \vdash \mt{Unit}}{}
adamc@655	424 \quad \infer{\Gamma \vdash \mt{Name}}{}
adamc@655	425 \quad \infer{\Gamma \vdash \kappa_1 \to \kappa_2}{
adamc@655	426 \Gamma \vdash \kappa_1
adamc@655	427 & \Gamma \vdash \kappa_2
adamc@655	428 }
adamc@655	429 \quad \infer{\Gamma \vdash \{\kappa\}}{
adamc@655	430 \Gamma \vdash \kappa
adamc@655	431 }
adamc@655	432 \quad \infer{\Gamma \vdash (\kappa_1 \times \ldots \times \kappa_n)}{
adamc@655	433 \forall i: \Gamma \vdash \kappa_i
adamc@655	434 }$$
adamc@655	435
adamc@655	436 $$\infer{\Gamma \vdash X}{
adamc@655	437 X \in \Gamma
adamc@655	438 }
adamc@655	439 \quad \infer{\Gamma \vdash X \longrightarrow \kappa}{
adamc@655	440 \Gamma, X \vdash \kappa
adamc@655	441 }$$
adamc@655	442
adamc@530	443 \subsection{Kinding}
adamc@530	444
adamc@655	445 We write $[X \mapsto \kappa_1]\kappa_2$ for capture-avoiding substitution of $\kappa_1$ for $X$ in $\kappa_2$.
adamc@655	446
adamc@530	447 $$\infer{\Gamma \vdash (c) :: \kappa :: \kappa}{
adamc@530	448 \Gamma \vdash c :: \kappa
adamc@530	449 }
adamc@530	450 \quad \infer{\Gamma \vdash x :: \kappa}{
adamc@530	451 x :: \kappa \in \Gamma
adamc@530	452 }
adamc@530	453 \quad \infer{\Gamma \vdash x :: \kappa}{
adamc@530	454 x :: \kappa = c \in \Gamma
adamc@530	455 }$$
adamc@530	456
adamc@530	457 $$\infer{\Gamma \vdash M.x :: \kappa}{
adamc@537	458 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	459 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = \kappa
adamc@530	460 }
adamc@530	461 \quad \infer{\Gamma \vdash M.x :: \kappa}{
adamc@537	462 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	463 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = (\kappa, c)
adamc@530	464 }$$
adamc@530	465
adamc@530	466 $$\infer{\Gamma \vdash \tau_1 \to \tau_2 :: \mt{Type}}{
adamc@530	467 \Gamma \vdash \tau_1 :: \mt{Type}
adamc@530	468 & \Gamma \vdash \tau_2 :: \mt{Type}
adamc@530	469 }
adamc@530	470 \quad \infer{\Gamma \vdash x \; ? \: \kappa \to \tau :: \mt{Type}}{
adamc@530	471 \Gamma, x :: \kappa \vdash \tau :: \mt{Type}
adamc@530	472 }
adamc@655	473 \quad \infer{\Gamma \vdash X \longrightarrow \tau :: \mt{Type}}{
adamc@655	474 \Gamma, X \vdash \tau :: \mt{Type}
adamc@655	475 }
adamc@530	476 \quad \infer{\Gamma \vdash \$c :: \mt{Type}}{
adamc@530	477 \Gamma \vdash c :: \{\mt{Type}\}
adamc@530	478 }$$
adamc@530	479
adamc@530	480 $$\infer{\Gamma \vdash c_1 \; c_2 :: \kappa_2}{
adamc@530	481 \Gamma \vdash c_1 :: \kappa_1 \to \kappa_2
adamc@530	482 & \Gamma \vdash c_2 :: \kappa_1
adamc@530	483 }
adamc@530	484 \quad \infer{\Gamma \vdash \lambda x \; :: \; \kappa_1 \Rightarrow c :: \kappa_1 \to \kappa_2}{
adamc@530	485 \Gamma, x :: \kappa_1 \vdash c :: \kappa_2
adamc@530	486 }$$
adamc@530	487
adamc@655	488 $$\infer{\Gamma \vdash c[\kappa'] :: [X \mapsto \kappa']\kappa}{
adamc@655	489 \Gamma \vdash c :: X \to \kappa
adamc@655	490 & \Gamma \vdash \kappa'
adamc@655	491 }
adamc@655	492 \quad \infer{\Gamma \vdash X \Longrightarrow c :: X \to \kappa}{
adamc@655	493 \Gamma, X \vdash c :: \kappa
adamc@655	494 }$$
adamc@655	495
adamc@530	496 $$\infer{\Gamma \vdash () :: \mt{Unit}}{}
adamc@530	497 \quad \infer{\Gamma \vdash \#X :: \mt{Name}}{}$$
adamc@530	498
adamc@530	499 $$\infer{\Gamma \vdash [\overline{c_i = c'_i}] :: \{\kappa\}}{
adamc@530	500 \forall i: \Gamma \vdash c_i : \mt{Name}
adamc@530	501 & \Gamma \vdash c'_i :: \kappa
adamc@530	502 & \forall i \neq j: \Gamma \vdash c_i \sim c_j
adamc@530	503 }
adamc@530	504 \quad \infer{\Gamma \vdash c_1 \rc c_2 :: \{\kappa\}}{
adamc@530	505 \Gamma \vdash c_1 :: \{\kappa\}
adamc@530	506 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@530	507 & \Gamma \vdash c_1 \sim c_2
adamc@530	508 }$$
adamc@530	509
adamc@655	510 $$\infer{\Gamma \vdash \mt{map} :: (\kappa_1 \to \kappa_2) \to \{\kappa_1\} \to \{\kappa_2\}}{}$$
adamc@530	511
adamc@573	512 $$\infer{\Gamma \vdash (\overline c) :: (\kappa_1 \times \ldots \times \kappa_n)}{
adamc@573	513 \forall i: \Gamma \vdash c_i :: \kappa_i
adamc@530	514 }
adamc@573	515 \quad \infer{\Gamma \vdash c.i :: \kappa_i}{
adamc@573	516 \Gamma \vdash c :: (\kappa_1 \times \ldots \times \kappa_n)
adamc@530	517 }$$
adamc@530	518
adamc@655	519 $$\infer{\Gamma \vdash \lambda [c_1 \sim c_2] \Rightarrow \tau :: \mt{Type}}{
adamc@655	520 \Gamma \vdash c_1 :: \{\kappa\}
adamc@530	521 & \Gamma \vdash c_2 :: \{\kappa'\}
adamc@655	522 & \Gamma, c_1 \sim c_2 \vdash \tau :: \mt{Type}
adamc@530	523 }$$
adamc@530	524
adamc@531	525 \subsection{Record Disjointness}
adamc@531	526
adamc@531	527 $$\infer{\Gamma \vdash c_1 \sim c_2}{
adamc@558	528 \Gamma \vdash c_1 \hookrightarrow C_1
adamc@558	529 & \Gamma \vdash c_2 \hookrightarrow C_2
adamc@558	530 & \forall c'_1 \in C_1, c'_2 \in C_2: \Gamma \vdash c'_1 \sim c'_2
adamc@531	531 }
adamc@531	532 \quad \infer{\Gamma \vdash X \sim X'}{
adamc@531	533 X \neq X'
adamc@531	534 }$$
adamc@531	535
adamc@531	536 $$\infer{\Gamma \vdash c_1 \sim c_2}{
adamc@531	537 c'_1 \sim c'_2 \in \Gamma
adamc@558	538 & \Gamma \vdash c'_1 \hookrightarrow C_1
adamc@558	539 & \Gamma \vdash c'_2 \hookrightarrow C_2
adamc@558	540 & c_1 \in C_1
adamc@558	541 & c_2 \in C_2
adamc@531	542 }$$
adamc@531	543
adamc@531	544 $$\infer{\Gamma \vdash c \hookrightarrow \{c\}}{}
adamc@531	545 \quad \infer{\Gamma \vdash [\overline{c = c'}] \hookrightarrow \{\overline{c}\}}{}
adamc@531	546 \quad \infer{\Gamma \vdash c_1 \rc c_2 \hookrightarrow C_1 \cup C_2}{
adamc@531	547 \Gamma \vdash c_1 \hookrightarrow C_1
adamc@531	548 & \Gamma \vdash c_2 \hookrightarrow C_2
adamc@531	549 }
adamc@531	550 \quad \infer{\Gamma \vdash c \hookrightarrow C}{
adamc@531	551 \Gamma \vdash c \equiv c'
adamc@531	552 & \Gamma \vdash c' \hookrightarrow C
adamc@531	553 }
adamc@531	554 \quad \infer{\Gamma \vdash \mt{map} \; f \; c \hookrightarrow C}{
adamc@531	555 \Gamma \vdash c \hookrightarrow C
adamc@531	556 }$$
adamc@531	557
adamc@541	558 \subsection{\label{definitional}Definitional Equality}
adamc@532	559
adamc@655	560 We use $\mathcal C$ to stand for a one-hole context that, when filled, yields a constructor. The notation $\mathcal C[c]$ plugs $c$ into $\mathcal C$. We omit the standard definition of one-hole contexts. We write $[x \mapsto c_1]c_2$ for capture-avoiding substitution of $c_1$ for $x$ in $c_2$, with analogous notation for substituting a kind in a constructor.
adamc@532	561
adamc@532	562 $$\infer{\Gamma \vdash c \equiv c}{}
adamc@532	563 \quad \infer{\Gamma \vdash c_1 \equiv c_2}{
adamc@532	564 \Gamma \vdash c_2 \equiv c_1
adamc@532	565 }
adamc@532	566 \quad \infer{\Gamma \vdash c_1 \equiv c_3}{
adamc@532	567 \Gamma \vdash c_1 \equiv c_2
adamc@532	568 & \Gamma \vdash c_2 \equiv c_3
adamc@532	569 }
adamc@532	570 \quad \infer{\Gamma \vdash \mathcal C[c_1] \equiv \mathcal C[c_2]}{
adamc@532	571 \Gamma \vdash c_1 \equiv c_2
adamc@532	572 }$$
adamc@532	573
adamc@532	574 $$\infer{\Gamma \vdash x \equiv c}{
adamc@532	575 x :: \kappa = c \in \Gamma
adamc@532	576 }
adamc@532	577 \quad \infer{\Gamma \vdash M.x \equiv c}{
adamc@537	578 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	579 & \mt{proj}(M, \overline{s}, \mt{con} \; x) = (\kappa, c)
adamc@532	580 }
adamc@532	581 \quad \infer{\Gamma \vdash (\overline c).i \equiv c_i}{}$$
adamc@532	582
adamc@532	583 $$\infer{\Gamma \vdash (\lambda x :: \kappa \Rightarrow c) \; c' \equiv [x \mapsto c'] c}{}
adamc@655	584 \quad \infer{\Gamma \vdash (X \Longrightarrow c) [\kappa] \equiv [X \mapsto \kappa] c}{}$$
adamc@655	585
adamc@655	586 $$\infer{\Gamma \vdash c_1 \rc c_2 \equiv c_2 \rc c_1}{}
adamc@532	587 \quad \infer{\Gamma \vdash c_1 \rc (c_2 \rc c_3) \equiv (c_1 \rc c_2) \rc c_3}{}$$
adamc@532	588
adamc@532	589 $$\infer{\Gamma \vdash [] \rc c \equiv c}{}
adamc@532	590 \quad \infer{\Gamma \vdash [\overline{c_1 = c'_1}] \rc [\overline{c_2 = c'_2}] \equiv [\overline{c_1 = c'_1}, \overline{c_2 = c'_2}]}{}$$
adamc@532	591
adamc@655	592 $$\infer{\Gamma \vdash \mt{map} \; f \; [] \equiv []}{}
adamc@655	593 \quad \infer{\Gamma \vdash \mt{map} \; f \; ([c_1 = c_2] \rc c) \equiv [c_1 = f \; c_2] \rc \mt{map} \; f \; c}{}$$
adamc@532	594
adamc@532	595 $$\infer{\Gamma \vdash \mt{map} \; (\lambda x \Rightarrow x) \; c \equiv c}{}
adamc@655	596 \quad \infer{\Gamma \vdash \mt{map} \; f \; (\mt{map} \; f' \; c)
adamc@655	597 \equiv \mt{map} \; (\lambda x \Rightarrow f \; (f' \; x)) \; c}{}$$
adamc@532	598
adamc@532	599 $$\infer{\Gamma \vdash \mt{map} \; f \; (c_1 \rc c_2) \equiv \mt{map} \; f \; c_1 \rc \mt{map} \; f \; c_2}{}$$
adamc@531	600
adamc@534	601 \subsection{Expression Typing}
adamc@533	602
adamc@533	603 We assume the existence of a function $T$ assigning types to literal constants. It maps integer constants to $\mt{Basis}.\mt{int}$, float constants to $\mt{Basis}.\mt{float}$, and string constants to $\mt{Basis}.\mt{string}$.
adamc@533	604
adamc@533	605 We also refer to a function $\mathcal I$, such that $\mathcal I(\tau)$ ``uses an oracle'' to instantiate all constructor function arguments at the beginning of $\tau$ that are marked implicit; i.e., replace $x_1 ::: \kappa_1 \to \ldots \to x_n ::: \kappa_n \to \tau$ with $[x_1 \mapsto c_1]\ldots[x_n \mapsto c_n]\tau$, where the $c_i$s are inferred and $\tau$ does not start like $x ::: \kappa \to \tau'$.
adamc@533	606
adamc@533	607 $$\infer{\Gamma \vdash e : \tau : \tau}{
adamc@533	608 \Gamma \vdash e : \tau
adamc@533	609 }
adamc@533	610 \quad \infer{\Gamma \vdash e : \tau}{
adamc@533	611 \Gamma \vdash e : \tau'
adamc@533	612 & \Gamma \vdash \tau' \equiv \tau
adamc@533	613 }
adamc@533	614 \quad \infer{\Gamma \vdash \ell : T(\ell)}{}$$
adamc@533	615
adamc@533	616 $$\infer{\Gamma \vdash x : \mathcal I(\tau)}{
adamc@533	617 x : \tau \in \Gamma
adamc@533	618 }
adamc@533	619 \quad \infer{\Gamma \vdash M.x : \mathcal I(\tau)}{
adamc@537	620 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	621 & \mt{proj}(M, \overline{s}, \mt{val} \; x) = \tau
adamc@533	622 }
adamc@533	623 \quad \infer{\Gamma \vdash X : \mathcal I(\tau)}{
adamc@533	624 X : \tau \in \Gamma
adamc@533	625 }
adamc@533	626 \quad \infer{\Gamma \vdash M.X : \mathcal I(\tau)}{
adamc@537	627 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	628 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \tau
adamc@533	629 }$$
adamc@533	630
adamc@533	631 $$\infer{\Gamma \vdash e_1 \; e_2 : \tau_2}{
adamc@533	632 \Gamma \vdash e_1 : \tau_1 \to \tau_2
adamc@533	633 & \Gamma \vdash e_2 : \tau_1
adamc@533	634 }
adamc@533	635 \quad \infer{\Gamma \vdash \lambda x : \tau_1 \Rightarrow e : \tau_1 \to \tau_2}{
adamc@533	636 \Gamma, x : \tau_1 \vdash e : \tau_2
adamc@533	637 }$$
adamc@533	638
adamc@533	639 $$\infer{\Gamma \vdash e [c] : [x \mapsto c]\tau}{
adamc@533	640 \Gamma \vdash e : x :: \kappa \to \tau
adamc@533	641 & \Gamma \vdash c :: \kappa
adamc@533	642 }
adamc@533	643 \quad \infer{\Gamma \vdash \lambda x \; ? \; \kappa \Rightarrow e : x \; ? \; \kappa \to \tau}{
adamc@533	644 \Gamma, x :: \kappa \vdash e : \tau
adamc@533	645 }$$
adamc@533	646
adamc@655	647 $$\infer{\Gamma \vdash e [\kappa] : [X \mapsto \kappa]\tau}{
adamc@655	648 \Gamma \vdash e : X \longrightarrow \tau
adamc@655	649 & \Gamma \vdash \kappa
adamc@655	650 }
adamc@655	651 \quad \infer{\Gamma \vdash X \Longrightarrow e : X \longrightarrow \tau}{
adamc@655	652 \Gamma, X \vdash e : \tau
adamc@655	653 }$$
adamc@655	654
adamc@533	655 $$\infer{\Gamma \vdash \{\overline{c = e}\} : \{\overline{c : \tau}\}}{
adamc@533	656 \forall i: \Gamma \vdash c_i :: \mt{Name}
adamc@533	657 & \Gamma \vdash e_i : \tau_i
adamc@533	658 & \forall i \neq j: \Gamma \vdash c_i \sim c_j
adamc@533	659 }
adamc@533	660 \quad \infer{\Gamma \vdash e.c : \tau}{
adamc@533	661 \Gamma \vdash e : \$([c = \tau] \rc c')
adamc@533	662 }
adamc@533	663 \quad \infer{\Gamma \vdash e_1 \rc e_2 : \$(c_1 \rc c_2)}{
adamc@533	664 \Gamma \vdash e_1 : \$c_1
adamc@533	665 & \Gamma \vdash e_2 : \$c_2
adamc@573	666 & \Gamma \vdash c_1 \sim c_2
adamc@533	667 }$$
adamc@533	668
adamc@533	669 $$\infer{\Gamma \vdash e \rcut c : \$c'}{
adamc@533	670 \Gamma \vdash e : \$([c = \tau] \rc c')
adamc@533	671 }
adamc@533	672 \quad \infer{\Gamma \vdash e \rcutM c : \$c'}{
adamc@533	673 \Gamma \vdash e : \$(c \rc c')
adamc@533	674 }$$
adamc@533	675
adamc@533	676 $$\infer{\Gamma \vdash \mt{let} \; \overline{ed} \; \mt{in} \; e \; \mt{end} : \tau}{
adamc@533	677 \Gamma \vdash \overline{ed} \leadsto \Gamma'
adamc@533	678 & \Gamma' \vdash e : \tau
adamc@533	679 }
adamc@533	680 \quad \infer{\Gamma \vdash \mt{case} \; e \; \mt{of} \; \overline{p \Rightarrow e} : \tau}{
adamc@533	681 \forall i: \Gamma \vdash p_i \leadsto \Gamma_i, \tau'
adamc@533	682 & \Gamma_i \vdash e_i : \tau
adamc@533	683 }$$
adamc@533	684
adamc@573	685 $$\infer{\Gamma \vdash \lambda [c_1 \sim c_2] \Rightarrow e : \lambda [c_1 \sim c_2] \Rightarrow \tau}{
adamc@533	686 \Gamma \vdash c_1 :: \{\kappa\}
adamc@655	687 & \Gamma \vdash c_2 :: \{\kappa'\}
adamc@533	688 & \Gamma, c_1 \sim c_2 \vdash e : \tau
adamc@662	689 }
adamc@662	690 \quad \infer{\Gamma \vdash e \; ! : \tau}{
adamc@662	691 \Gamma \vdash e : [c_1 \sim c_2] \Rightarrow \tau
adamc@662	692 & \Gamma \vdash c_1 \sim c_2
adamc@533	693 }$$
adamc@533	694
adamc@534	695 \subsection{Pattern Typing}
adamc@534	696
adamc@534	697 $$\infer{\Gamma \vdash \_ \leadsto \Gamma; \tau}{}
adamc@534	698 \quad \infer{\Gamma \vdash x \leadsto \Gamma, x : \tau; \tau}{}
adamc@534	699 \quad \infer{\Gamma \vdash \ell \leadsto \Gamma; T(\ell)}{}$$
adamc@534	700
adamc@534	701 $$\infer{\Gamma \vdash X \leadsto \Gamma; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@534	702 X : \overline{x ::: \mt{Type}} \to \tau \in \Gamma
adamc@534	703 & \textrm{$\tau$ not a function type}
adamc@534	704 }
adamc@534	705 \quad \infer{\Gamma \vdash X \; p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@534	706 X : \overline{x ::: \mt{Type}} \to \tau'' \to \tau \in \Gamma
adamc@534	707 & \Gamma \vdash p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau''
adamc@534	708 }$$
adamc@534	709
adamc@534	710 $$\infer{\Gamma \vdash M.X \leadsto \Gamma; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@537	711 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	712 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \overline{x ::: \mt{Type}} \to \tau
adamc@534	713 & \textrm{$\tau$ not a function type}
adamc@534	714 }$$
adamc@534	715
adamc@534	716 $$\infer{\Gamma \vdash M.X \; p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau}{
adamc@537	717 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	718 & \mt{proj}(M, \overline{s}, \mt{val} \; X) = \overline{x ::: \mt{Type}} \to \tau'' \to \tau
adamc@534	719 & \Gamma \vdash p \leadsto \Gamma'; \overline{[x_i \mapsto \tau'_i]}\tau''
adamc@534	720 }$$
adamc@534	721
adamc@534	722 $$\infer{\Gamma \vdash \{\overline{x = p}\} \leadsto \Gamma_n; \{\overline{x = \tau}\}}{
adamc@534	723 \Gamma_0 = \Gamma
adamc@534	724 & \forall i: \Gamma_i \vdash p_i \leadsto \Gamma_{i+1}; \tau_i
adamc@534	725 }
adamc@534	726 \quad \infer{\Gamma \vdash \{\overline{x = p}, \ldots\} \leadsto \Gamma_n; \$([\overline{x = \tau}] \rc c)}{
adamc@534	727 \Gamma_0 = \Gamma
adamc@534	728 & \forall i: \Gamma_i \vdash p_i \leadsto \Gamma_{i+1}; \tau_i
adamc@534	729 }$$
adamc@534	730
adamc@535	731 \subsection{Declaration Typing}
adamc@535	732
adamc@535	733 We use an auxiliary judgment $\overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'$, expressing the enrichment of $\Gamma$ with the types of the datatype constructors $\overline{dc}$, when they are known to belong to datatype $x$ with type parameters $\overline{y}$.
adamc@535	734
adamc@655	735 This is the first judgment where we deal with constructor classes, for the $\mt{class}$ declaration form. We will omit their special handling in this formal specification. Section \ref{typeclasses} gives an informal description of how constructor classes influence type inference.
adamc@535	736
adamc@558	737 We presuppose the existence of a function $\mathcal O$, where $\mathcal O(M, \overline{s})$ implements the $\mt{open}$ declaration by producing a context with the appropriate entry for each available component of module $M$ with signature items $\overline{s}$. Where possible, $\mathcal O$ uses ``transparent'' entries (e.g., an abstract type $M.x$ is mapped to $x :: \mt{Type} = M.x$), so that the relationship with $M$ is maintained. A related function $\mathcal O_c$ builds a context containing the disjointness constraints found in $\overline s$.
adamc@537	738 We write $\kappa_1^n \to \kappa$ as a shorthand, where $\kappa_1^0 \to \kappa = \kappa$ and $\kappa_1^{n+1} \to \kappa_2 = \kappa_1 \to (\kappa_1^n \to \kappa_2)$. We write $\mt{len}(\overline{y})$ for the length of vector $\overline{y}$ of variables.
adamc@535	739
adamc@535	740 $$\infer{\Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@535	741 \quad \infer{\Gamma \vdash d, \overline{d} \leadsto \Gamma''}{
adamc@535	742 \Gamma \vdash d \leadsto \Gamma'
adamc@535	743 & \Gamma' \vdash \overline{d} \leadsto \Gamma''
adamc@535	744 }$$
adamc@535	745
adamc@535	746 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa = c}{
adamc@535	747 \Gamma \vdash c :: \kappa
adamc@535	748 }
adamc@535	749 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leadsto \Gamma'}{
adamc@535	750 \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} \vdash \overline{dc} \leadsto \Gamma'
adamc@535	751 }$$
adamc@535	752
adamc@535	753 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leadsto \Gamma'}{
adamc@537	754 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	755 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@535	756 & \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} = M.z \vdash \overline{dc} \leadsto \Gamma'
adamc@535	757 }$$
adamc@535	758
adamc@535	759 $$\infer{\Gamma \vdash \mt{val} \; x : \tau = e \leadsto \Gamma, x : \tau}{
adamc@535	760 \Gamma \vdash e : \tau
adamc@535	761 }$$
adamc@535	762
adamc@535	763 $$\infer{\Gamma \vdash \mt{val} \; \mt{rec} \; \overline{x : \tau = e} \leadsto \Gamma, \overline{x : \tau}}{
adamc@535	764 \forall i: \Gamma, \overline{x : \tau} \vdash e_i : \tau_i
adamc@535	765 & \textrm{$e_i$ starts with an expression $\lambda$, optionally preceded by constructor and disjointness $\lambda$s}
adamc@535	766 }$$
adamc@535	767
adamc@535	768 $$\infer{\Gamma \vdash \mt{structure} \; X : S = M \leadsto \Gamma, X : S}{
adamc@535	769 \Gamma \vdash M : S
adamc@558	770 & \textrm{ $M$ not a constant or application}
adamc@535	771 }
adamc@558	772 \quad \infer{\Gamma \vdash \mt{structure} \; X : S = M \leadsto \Gamma, X : \mt{selfify}(X, \overline{s})}{
adamc@558	773 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@539	774 }$$
adamc@539	775
adamc@539	776 $$\infer{\Gamma \vdash \mt{signature} \; X = S \leadsto \Gamma, X = S}{
adamc@535	777 \Gamma \vdash S
adamc@535	778 }$$
adamc@535	779
adamc@537	780 $$\infer{\Gamma \vdash \mt{open} \; M \leadsto \Gamma, \mathcal O(M, \overline{s})}{
adamc@537	781 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@535	782 }$$
adamc@535	783
adamc@535	784 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leadsto \Gamma}{
adamc@535	785 \Gamma \vdash c_1 :: \{\kappa\}
adamc@535	786 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@535	787 & \Gamma \vdash c_1 \sim c_2
adamc@535	788 }
adamc@537	789 \quad \infer{\Gamma \vdash \mt{open} \; \mt{constraints} \; M \leadsto \Gamma, \mathcal O_c(M, \overline{s})}{
adamc@537	790 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@535	791 }$$
adamc@535	792
adamc@535	793 $$\infer{\Gamma \vdash \mt{table} \; x : c \leadsto \Gamma, x : \mt{Basis}.\mt{sql\_table} \; c}{
adamc@535	794 \Gamma \vdash c :: \{\mt{Type}\}
adamc@535	795 }
adamc@535	796 \quad \infer{\Gamma \vdash \mt{sequence} \; x \leadsto \Gamma, x : \mt{Basis}.\mt{sql\_sequence}}{}$$
adamc@535	797
adamc@535	798 $$\infer{\Gamma \vdash \mt{cookie} \; x : \tau \leadsto \Gamma, x : \mt{Basis}.\mt{http\_cookie} \; \tau}{
adamc@535	799 \Gamma \vdash \tau :: \mt{Type}
adamc@535	800 }$$
adamc@535	801
adamc@655	802 $$\infer{\Gamma \vdash \mt{class} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa \to \mt{Type} = c}{
adamc@655	803 \Gamma \vdash c :: \kappa \to \mt{Type}
adamc@535	804 }$$
adamc@535	805
adamc@535	806 $$\infer{\overline{y}; x; \Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@535	807 \quad \infer{\overline{y}; x; \Gamma \vdash X \mid \overline{dc} \leadsto \Gamma', X : \overline{y ::: \mt{Type}} \to x \; \overline{y}}{
adamc@535	808 \overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'
adamc@535	809 }
adamc@535	810 \quad \infer{\overline{y}; x; \Gamma \vdash X \; \mt{of} \; \tau \mid \overline{dc} \leadsto \Gamma', X : \overline{y ::: \mt{Type}} \to \tau \to x \; \overline{y}}{
adamc@535	811 \overline{y}; x; \Gamma \vdash \overline{dc} \leadsto \Gamma'
adamc@535	812 }$$
adamc@535	813
adamc@537	814 \subsection{Signature Item Typing}
adamc@537	815
adamc@537	816 We appeal to a signature item analogue of the $\mathcal O$ function from the last subsection.
adamc@537	817
adamc@537	818 $$\infer{\Gamma \vdash \cdot \leadsto \Gamma}{}
adamc@537	819 \quad \infer{\Gamma \vdash s, \overline{s} \leadsto \Gamma''}{
adamc@537	820 \Gamma \vdash s \leadsto \Gamma'
adamc@537	821 & \Gamma' \vdash \overline{s} \leadsto \Gamma''
adamc@537	822 }$$
adamc@537	823
adamc@537	824 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa \leadsto \Gamma, x :: \kappa}{}
adamc@537	825 \quad \infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa = c}{
adamc@537	826 \Gamma \vdash c :: \kappa
adamc@537	827 }
adamc@537	828 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leadsto \Gamma'}{
adamc@537	829 \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} \vdash \overline{dc} \leadsto \Gamma'
adamc@537	830 }$$
adamc@537	831
adamc@537	832 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leadsto \Gamma'}{
adamc@537	833 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	834 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537	835 & \overline{y}; x; \Gamma, x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type} = M.z \vdash \overline{dc} \leadsto \Gamma'
adamc@537	836 }$$
adamc@537	837
adamc@537	838 $$\infer{\Gamma \vdash \mt{val} \; x : \tau \leadsto \Gamma, x : \tau}{
adamc@537	839 \Gamma \vdash \tau :: \mt{Type}
adamc@537	840 }$$
adamc@537	841
adamc@537	842 $$\infer{\Gamma \vdash \mt{structure} \; X : S \leadsto \Gamma, X : S}{
adamc@537	843 \Gamma \vdash S
adamc@537	844 }
adamc@537	845 \quad \infer{\Gamma \vdash \mt{signature} \; X = S \leadsto \Gamma, X = S}{
adamc@537	846 \Gamma \vdash S
adamc@537	847 }$$
adamc@537	848
adamc@537	849 $$\infer{\Gamma \vdash \mt{include} \; S \leadsto \Gamma, \mathcal O(\overline{s})}{
adamc@537	850 \Gamma \vdash S
adamc@537	851 & \Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	852 }$$
adamc@537	853
adamc@537	854 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leadsto \Gamma, c_1 \sim c_2}{
adamc@537	855 \Gamma \vdash c_1 :: \{\kappa\}
adamc@537	856 & \Gamma \vdash c_2 :: \{\kappa\}
adamc@537	857 }$$
adamc@537	858
adamc@655	859 $$\infer{\Gamma \vdash \mt{class} \; x :: \kappa = c \leadsto \Gamma, x :: \kappa \to \mt{Type} = c}{
adamc@655	860 \Gamma \vdash c :: \kappa \to \mt{Type}
adamc@537	861 }
adamc@655	862 \quad \infer{\Gamma \vdash \mt{class} \; x :: \kappa \leadsto \Gamma, x :: \kappa \to \mt{Type}}{}$$
adamc@537	863
adamc@536	864 \subsection{Signature Compatibility}
adamc@536	865
adamc@558	866 To simplify the judgments in this section, we assume that all signatures are alpha-varied as necessary to avoid including multiple bindings for the same identifier. This is in addition to the usual alpha-variation of locally-bound variables.
adamc@537	867
adamc@537	868 We rely on a judgment $\Gamma \vdash \overline{s} \leq s'$, which expresses the occurrence in signature items $\overline{s}$ of an item compatible with $s'$. We also use a judgment $\Gamma \vdash \overline{dc} \leq \overline{dc}$, which expresses compatibility of datatype definitions.
adamc@537	869
adamc@536	870 $$\infer{\Gamma \vdash S \equiv S}{}
adamc@536	871 \quad \infer{\Gamma \vdash S_1 \equiv S_2}{
adamc@536	872 \Gamma \vdash S_2 \equiv S_1
adamc@536	873 }
adamc@536	874 \quad \infer{\Gamma \vdash X \equiv S}{
adamc@536	875 X = S \in \Gamma
adamc@536	876 }
adamc@536	877 \quad \infer{\Gamma \vdash M.X \equiv S}{
adamc@537	878 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	879 & \mt{proj}(M, \overline{s}, \mt{signature} \; X) = S
adamc@536	880 }$$
adamc@536	881
adamc@536	882 $$\infer{\Gamma \vdash S \; \mt{where} \; \mt{con} \; x = c \equiv \mt{sig} \; \overline{s^1} \; \mt{con} \; x :: \kappa = c \; \overline{s_2} \; \mt{end}}{
adamc@536	883 \Gamma \vdash S \equiv \mt{sig} \; \overline{s^1} \; \mt{con} \; x :: \kappa \; \overline{s_2} \; \mt{end}
adamc@536	884 & \Gamma \vdash c :: \kappa
adamc@537	885 }
adamc@537	886 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s^1} \; \mt{include} \; S \; \overline{s^2} \; \mt{end} \equiv \mt{sig} \; \overline{s^1} \; \overline{s} \; \overline{s^2} \; \mt{end}}{
adamc@537	887 \Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}
adamc@536	888 }$$
adamc@536	889
adamc@536	890 $$\infer{\Gamma \vdash S_1 \leq S_2}{
adamc@536	891 \Gamma \vdash S_1 \equiv S_2
adamc@536	892 }
adamc@536	893 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; \mt{end}}{}
adamc@537	894 \quad \infer{\Gamma \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; s' \; \overline{s'} \; \mt{end}}{
adamc@537	895 \Gamma \vdash \overline{s} \leq s'
adamc@537	896 & \Gamma \vdash s' \leadsto \Gamma'
adamc@537	897 & \Gamma' \vdash \mt{sig} \; \overline{s} \; \mt{end} \leq \mt{sig} \; \overline{s'} \; \mt{end}
adamc@537	898 }$$
adamc@537	899
adamc@537	900 $$\infer{\Gamma \vdash s \; \overline{s} \leq s'}{
adamc@537	901 \Gamma \vdash s \leq s'
adamc@537	902 }
adamc@537	903 \quad \infer{\Gamma \vdash s \; \overline{s} \leq s'}{
adamc@537	904 \Gamma \vdash s \leadsto \Gamma'
adamc@537	905 & \Gamma' \vdash \overline{s} \leq s'
adamc@536	906 }$$
adamc@536	907
adamc@536	908 $$\infer{\Gamma \vdash \mt{functor} (X : S_1) : S_2 \leq \mt{functor} (X : S'_1) : S'_2}{
adamc@536	909 \Gamma \vdash S'_1 \leq S_1
adamc@536	910 & \Gamma, X : S'_1 \vdash S_2 \leq S'_2
adamc@536	911 }$$
adamc@536	912
adamc@537	913 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa \leq \mt{con} \; x :: \kappa}{}
adamc@537	914 \quad \infer{\Gamma \vdash \mt{con} \; x :: \kappa = c \leq \mt{con} \; x :: \kappa}{}
adamc@558	915 \quad \infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leq \mt{con} \; x :: \mt{Type}^{\mt{len}(\overline y)} \to \mt{Type}}{}$$
adamc@537	916
adamc@537	917 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{con} \; x :: \mt{Type}^{\mt{len}(y)} \to \mt{Type}}{
adamc@537	918 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	919 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537	920 }$$
adamc@537	921
adamc@655	922 $$\infer{\Gamma \vdash \mt{class} \; x :: \kappa \leq \mt{con} \; x :: \kappa \to \mt{Type}}{}
adamc@655	923 \quad \infer{\Gamma \vdash \mt{class} \; x :: \kappa = c \leq \mt{con} \; x :: \kappa \to \mt{Type}}{}$$
adamc@537	924
adamc@537	925 $$\infer{\Gamma \vdash \mt{con} \; x :: \kappa = c_1 \leq \mt{con} \; x :: \mt{\kappa} = c_2}{
adamc@537	926 \Gamma \vdash c_1 \equiv c_2
adamc@537	927 }
adamc@655	928 \quad \infer{\Gamma \vdash \mt{class} \; x :: \kappa = c_1 \leq \mt{con} \; x :: \kappa \to \mt{Type} = c_2}{
adamc@537	929 \Gamma \vdash c_1 \equiv c_2
adamc@537	930 }$$
adamc@537	931
adamc@537	932 $$\infer{\Gamma \vdash \mt{datatype} \; x \; \overline{y} = \overline{dc} \leq \mt{datatype} \; x \; \overline{y} = \overline{dc'}}{
adamc@537	933 \Gamma, \overline{y :: \mt{Type}} \vdash \overline{dc} \leq \overline{dc'}
adamc@537	934 }$$
adamc@537	935
adamc@537	936 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{datatype} \; x \; \overline{y} = \overline{dc'}}{
adamc@537	937 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@537	938 & \mt{proj}(M, \overline{s}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})
adamc@537	939 & \Gamma, \overline{y :: \mt{Type}} \vdash \overline{dc} \leq \overline{dc'}
adamc@537	940 }$$
adamc@537	941
adamc@537	942 $$\infer{\Gamma \vdash \cdot \leq \cdot}{}
adamc@537	943 \quad \infer{\Gamma \vdash X; \overline{dc} \leq X; \overline{dc'}}{
adamc@537	944 \Gamma \vdash \overline{dc} \leq \overline{dc'}
adamc@537	945 }
adamc@537	946 \quad \infer{\Gamma \vdash X \; \mt{of} \; \tau_1; \overline{dc} \leq X \; \mt{of} \; \tau_2; \overline{dc'}}{
adamc@537	947 \Gamma \vdash \tau_1 \equiv \tau_2
adamc@537	948 & \Gamma \vdash \overline{dc} \leq \overline{dc'}
adamc@537	949 }$$
adamc@537	950
adamc@537	951 $$\infer{\Gamma \vdash \mt{datatype} \; x = \mt{datatype} \; M.z \leq \mt{datatype} \; x = \mt{datatype} \; M'.z'}{
adamc@537	952 \Gamma \vdash M.z \equiv M'.z'
adamc@537	953 }$$
adamc@537	954
adamc@537	955 $$\infer{\Gamma \vdash \mt{val} \; x : \tau_1 \leq \mt{val} \; x : \tau_2}{
adamc@537	956 \Gamma \vdash \tau_1 \equiv \tau_2
adamc@537	957 }
adamc@537	958 \quad \infer{\Gamma \vdash \mt{structure} \; X : S_1 \leq \mt{structure} \; X : S_2}{
adamc@537	959 \Gamma \vdash S_1 \leq S_2
adamc@537	960 }
adamc@537	961 \quad \infer{\Gamma \vdash \mt{signature} \; X = S_1 \leq \mt{signature} \; X = S_2}{
adamc@537	962 \Gamma \vdash S_1 \leq S_2
adamc@537	963 & \Gamma \vdash S_2 \leq S_1
adamc@537	964 }$$
adamc@537	965
adamc@537	966 $$\infer{\Gamma \vdash \mt{constraint} \; c_1 \sim c_2 \leq \mt{constraint} \; c'_1 \sim c'_2}{
adamc@537	967 \Gamma \vdash c_1 \equiv c'_1
adamc@537	968 & \Gamma \vdash c_2 \equiv c'_2
adamc@537	969 }$$
adamc@537	970
adamc@655	971 $$\infer{\Gamma \vdash \mt{class} \; x :: \kappa \leq \mt{class} \; x :: \kappa}{}
adamc@655	972 \quad \infer{\Gamma \vdash \mt{class} \; x :: \kappa = c \leq \mt{class} \; x :: \kappa}{}
adamc@655	973 \quad \infer{\Gamma \vdash \mt{class} \; x :: \kappa = c_1 \leq \mt{class} \; x :: \kappa = c_2}{
adamc@537	974 \Gamma \vdash c_1 \equiv c_2
adamc@537	975 }$$
adamc@537	976
adamc@538	977 \subsection{Module Typing}
adamc@538	978
adamc@538	979 We use a helper function $\mt{sigOf}$, which converts declarations and sequences of declarations into their principal signature items and sequences of signature items, respectively.
adamc@538	980
adamc@538	981 $$\infer{\Gamma \vdash M : S}{
adamc@538	982 \Gamma \vdash M : S'
adamc@538	983 & \Gamma \vdash S' \leq S
adamc@538	984 }
adamc@538	985 \quad \infer{\Gamma \vdash \mt{struct} \; \overline{d} \; \mt{end} : \mt{sig} \; \mt{sigOf}(\overline{d}) \; \mt{end}}{
adamc@538	986 \Gamma \vdash \overline{d} \leadsto \Gamma'
adamc@538	987 }
adamc@538	988 \quad \infer{\Gamma \vdash X : S}{
adamc@538	989 X : S \in \Gamma
adamc@538	990 }$$
adamc@538	991
adamc@538	992 $$\infer{\Gamma \vdash M.X : S}{
adamc@538	993 \Gamma \vdash M : \mt{sig} \; \overline{s} \; \mt{end}
adamc@538	994 & \mt{proj}(M, \overline{s}, \mt{structure} \; X) = S
adamc@538	995 }$$
adamc@538	996
adamc@538	997 $$\infer{\Gamma \vdash M_1(M_2) : [X \mapsto M_2]S_2}{
adamc@538	998 \Gamma \vdash M_1 : \mt{functor}(X : S_1) : S_2
adamc@538	999 & \Gamma \vdash M_2 : S_1
adamc@538	1000 }
adamc@538	1001 \quad \infer{\Gamma \vdash \mt{functor} (X : S_1) : S_2 = M : \mt{functor} (X : S_1) : S_2}{
adamc@538	1002 \Gamma \vdash S_1
adamc@538	1003 & \Gamma, X : S_1 \vdash S_2
adamc@538	1004 & \Gamma, X : S_1 \vdash M : S_2
adamc@538	1005 }$$
adamc@538	1006
adamc@538	1007 \begin{eqnarray*}
adamc@538	1008 \mt{sigOf}(\cdot) &=& \cdot \\
adamc@538	1009 \mt{sigOf}(s \; \overline{s'}) &=& \mt{sigOf}(s) \; \mt{sigOf}(\overline{s'}) \\
adamc@538	1010 \\
adamc@538	1011 \mt{sigOf}(\mt{con} \; x :: \kappa = c) &=& \mt{con} \; x :: \kappa = c \\
adamc@538	1012 \mt{sigOf}(\mt{datatype} \; x \; \overline{y} = \overline{dc}) &=& \mt{datatype} \; x \; \overline{y} = \overline{dc} \\
adamc@538	1013 \mt{sigOf}(\mt{datatype} \; x = \mt{datatype} \; M.z) &=& \mt{datatype} \; x = \mt{datatype} \; M.z \\
adamc@538	1014 \mt{sigOf}(\mt{val} \; x : \tau = e) &=& \mt{val} \; x : \tau \\
adamc@538	1015 \mt{sigOf}(\mt{val} \; \mt{rec} \; \overline{x : \tau = e}) &=& \overline{\mt{val} \; x : \tau} \\
adamc@538	1016 \mt{sigOf}(\mt{structure} \; X : S = M) &=& \mt{structure} \; X : S \\
adamc@538	1017 \mt{sigOf}(\mt{signature} \; X = S) &=& \mt{signature} \; X = S \\
adamc@538	1018 \mt{sigOf}(\mt{open} \; M) &=& \mt{include} \; S \textrm{ (where $\Gamma \vdash M : S$)} \\
adamc@538	1019 \mt{sigOf}(\mt{constraint} \; c_1 \sim c_2) &=& \mt{constraint} \; c_1 \sim c_2 \\
adamc@538	1020 \mt{sigOf}(\mt{open} \; \mt{constraints} \; M) &=& \cdot \\
adamc@538	1021 \mt{sigOf}(\mt{table} \; x : c) &=& \mt{table} \; x : c \\
adamc@538	1022 \mt{sigOf}(\mt{sequence} \; x) &=& \mt{sequence} \; x \\
adamc@538	1023 \mt{sigOf}(\mt{cookie} \; x : \tau) &=& \mt{cookie} \; x : \tau \\
adamc@655	1024 \mt{sigOf}(\mt{class} \; x :: \kappa = c) &=& \mt{class} \; x :: \kappa = c \\
adamc@538	1025 \end{eqnarray*}
adamc@539	1026 \begin{eqnarray*}
adamc@539	1027 \mt{selfify}(M, \cdot) &=& \cdot \\
adamc@558	1028 \mt{selfify}(M, s \; \overline{s'}) &=& \mt{selfify}(M, s) \; \mt{selfify}(M, \overline{s'}) \\
adamc@539	1029 \\
adamc@539	1030 \mt{selfify}(M, \mt{con} \; x :: \kappa) &=& \mt{con} \; x :: \kappa = M.x \\
adamc@539	1031 \mt{selfify}(M, \mt{con} \; x :: \kappa = c) &=& \mt{con} \; x :: \kappa = c \\
adamc@539	1032 \mt{selfify}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc}) &=& \mt{datatype} \; x \; \overline{y} = \mt{datatype} \; M.x \\
adamc@539	1033 \mt{selfify}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z) &=& \mt{datatype} \; x = \mt{datatype} \; M'.z \\
adamc@539	1034 \mt{selfify}(M, \mt{val} \; x : \tau) &=& \mt{val} \; x : \tau \\
adamc@539	1035 \mt{selfify}(M, \mt{structure} \; X : S) &=& \mt{structure} \; X : \mt{selfify}(M.X, \overline{s}) \textrm{ (where $\Gamma \vdash S \equiv \mt{sig} \; \overline{s} \; \mt{end}$)} \\
adamc@539	1036 \mt{selfify}(M, \mt{signature} \; X = S) &=& \mt{signature} \; X = S \\
adamc@539	1037 \mt{selfify}(M, \mt{include} \; S) &=& \mt{include} \; S \\
adamc@539	1038 \mt{selfify}(M, \mt{constraint} \; c_1 \sim c_2) &=& \mt{constraint} \; c_1 \sim c_2 \\
adamc@655	1039 \mt{selfify}(M, \mt{class} \; x :: \kappa) &=& \mt{class} \; x :: \kappa = M.x \\
adamc@655	1040 \mt{selfify}(M, \mt{class} \; x :: \kappa = c) &=& \mt{class} \; x :: \kappa = c \\
adamc@539	1041 \end{eqnarray*}
adamc@539	1042
adamc@540	1043 \subsection{Module Projection}
adamc@540	1044
adamc@540	1045 \begin{eqnarray*}
adamc@540	1046 \mt{proj}(M, \mt{con} \; x :: \kappa \; \overline{s}, \mt{con} \; x) &=& \kappa \\
adamc@540	1047 \mt{proj}(M, \mt{con} \; x :: \kappa = c \; \overline{s}, \mt{con} \; x) &=& (\kappa, c) \\
adamc@540	1048 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{con} \; x) &=& \mt{Type}^{\mt{len}(\overline{y})} \to \mt{Type} \\
adamc@540	1049 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, \mt{con} \; x) &=& (\mt{Type}^{\mt{len}(\overline{y})} \to \mt{Type}, M'.z) \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@540	1050 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z) = (\overline{y}, \overline{dc})$)} \\
adamc@655	1051 \mt{proj}(M, \mt{class} \; x :: \kappa \; \overline{s}, \mt{con} \; x) &=& \kappa \to \mt{Type} \\
adamc@655	1052 \mt{proj}(M, \mt{class} \; x :: \kappa = c \; \overline{s}, \mt{con} \; x) &=& (\kappa \to \mt{Type}, c) \\
adamc@540	1053 \\
adamc@540	1054 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{datatype} \; x) &=& (\overline{y}, \overline{dc}) \\
adamc@540	1055 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, \mt{con} \; x) &=& \mt{proj}(M', \overline{s'}, \mt{datatype} \; z) \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$)} \\
adamc@540	1056 \\
adamc@540	1057 \mt{proj}(M, \mt{val} \; x : \tau \; \overline{s}, \mt{val} \; x) &=& \tau \\
adamc@540	1058 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to M.x \; \overline y \textrm{ (where $X \in \overline{dc}$)} \\
adamc@540	1059 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to \tau \to M.x \; \overline y \textrm{ (where $X \; \mt{of} \; \tau \in \overline{dc}$)} \\
adamc@540	1060 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to M.x \; \overline y \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@540	1061 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z = (\overline{y}, \overline{dc})$ and $X \in \overline{dc}$)} \\
adamc@540	1062 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z, \mt{val} \; X) &=& \overline{y ::: \mt{Type}} \to \tau \to M.x \; \overline y \textrm{ (where $\Gamma \vdash M' : \mt{sig} \; \overline{s'} \; \mt{end}$} \\
adamc@558	1063 && \textrm{and $\mt{proj}(M', \overline{s'}, \mt{datatype} \; z = (\overline{y}, \overline{dc})$ and $X \; \mt{of} \; \tau \in \overline{dc}$)} \\
adamc@540	1064 \\
adamc@540	1065 \mt{proj}(M, \mt{structure} \; X : S \; \overline{s}, \mt{structure} \; X) &=& S \\
adamc@540	1066 \\
adamc@540	1067 \mt{proj}(M, \mt{signature} \; X = S \; \overline{s}, \mt{signature} \; X) &=& S \\
adamc@540	1068 \\
adamc@540	1069 \mt{proj}(M, \mt{con} \; x :: \kappa \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540	1070 \mt{proj}(M, \mt{con} \; x :: \kappa = c \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540	1071 \mt{proj}(M, \mt{datatype} \; x \; \overline{y} = \overline{dc} \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540	1072 \mt{proj}(M, \mt{datatype} \; x = \mt{datatype} \; M'.z \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540	1073 \mt{proj}(M, \mt{val} \; x : \tau \; \overline{s}, V) &=& \mt{proj}(M, \overline{s}, V) \\
adamc@540	1074 \mt{proj}(M, \mt{structure} \; X : S \; \overline{s}, V) &=& [X \mapsto M.X]\mt{proj}(M, \overline{s}, V) \\
adamc@540	1075 \mt{proj}(M, \mt{signature} \; X = S \; \overline{s}, V) &=& [X \mapsto M.X]\mt{proj}(M, \overline{s}, V) \\
adamc@540	1076 \mt{proj}(M, \mt{include} \; S \; \overline{s}, V) &=& \mt{proj}(M, \overline{s'} \; \overline{s}, V) \textrm{ (where $\Gamma \vdash S \equiv \mt{sig} \; \overline{s'} \; \mt{end}$)} \\
adamc@540	1077 \mt{proj}(M, \mt{constraint} \; c_1 \sim c_2 \; \overline{s}, V) &=& \mt{proj}(M, \overline{s}, V) \\
adamc@655	1078 \mt{proj}(M, \mt{class} \; x :: \kappa \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@655	1079 \mt{proj}(M, \mt{class} \; x :: \kappa = c \; \overline{s}, V) &=& [x \mapsto M.x]\mt{proj}(M, \overline{s}, V) \\
adamc@540	1080 \end{eqnarray*}
adamc@540	1081
adamc@541	1082
adamc@541	1083 \section{Type Inference}
adamc@541	1084
adamc@541	1085 The Ur/Web compiler uses \emph{heuristic type inference}, with no claims of completeness with respect to the declarative specification of the last section. The rules in use seem to work well in practice. This section summarizes those rules, to help Ur programmers predict what will work and what won't.
adamc@541	1086
adamc@541	1087 \subsection{Basic Unification}
adamc@541	1088
adamc@560	1089 Type-checkers for languages based on the Hindley-Milner type discipline, like ML and Haskell, take advantage of \emph{principal typing} properties, making complete type inference relatively straightforward. Inference algorithms are traditionally implemented using type unification variables, at various points asserting equalities between types, in the process discovering the values of type variables. The Ur/Web compiler uses the same basic strategy, but the complexity of the type system rules out easy completeness.
adamc@541	1090
adamc@656	1091 Type-checking can require evaluating recursive functional programs, thanks to the type-level $\mt{map}$ operator. When a unification variable appears in such a type, the next step of computation can be undetermined. The value of that variable might be determined later, but this would be ``too late'' for the unification problems generated at the first occurrence. This is the essential source of incompleteness.
adamc@541	1092
adamc@541	1093 Nonetheless, the unification engine tends to do reasonably well. Unlike in ML, polymorphism is never inferred in definitions; it must be indicated explicitly by writing out constructor-level parameters. By writing these and other annotations, the programmer can generally get the type inference engine to do most of the type reconstruction work.
adamc@541	1094
adamc@541	1095 \subsection{Unifying Record Types}
adamc@541	1096
adamc@570	1097 The type inference engine tries to take advantage of the algebraic rules governing type-level records, as shown in Section \ref{definitional}. When two constructors of record kind are unified, they are reduced to normal forms, with like terms crossed off from each normal form until, hopefully, nothing remains. This cannot be complete, with the inclusion of unification variables. The type-checker can help you understand what goes wrong when the process fails, as it outputs the unmatched remainders of the two normal forms.
adamc@541	1098
adamc@656	1099 \subsection{\label{typeclasses}Constructor Classes}
adamc@541	1100
adamc@656	1101 Ur includes a constructor class facility inspired by Haskell's. The current version is very rudimentary, only supporting instances for particular constructors built up from abstract constructors and datatypes and type-level application.
adamc@541	1102
adamc@656	1103 Constructor classes are integrated with the module system. A constructor class of kind $\kappa$ is just a constructor of kind $\kappa \to \mt{Type}$. By marking such a constructor $c$ as a constructor class, the programmer instructs the type inference engine to, in each scope, record all values of types $c \; c'$ as \emph{instances}. Any function argument whose type is of such a form is treated as implicit, to be determined by examining the current instance database.
adamc@541	1104
adamc@656	1105 The ``dictionary encoding'' often used in Haskell implementations is made explicit in Ur. Constructor class instances are just properly-typed values, and they can also be considered as ``proofs'' of membership in the class. In some cases, it is useful to pass these proofs around explicitly. An underscore written where a proof is expected will also be inferred, if possible, from the current instance database.
adamc@541	1106
adamc@656	1107 Just as for constructors, constructors classes may be exported from modules, and they may be exported as concrete or abstract. Concrete constructor classes have their ``real'' definitions exposed, so that client code may add new instances freely. Abstract constructor classes are useful as ``predicates'' that can be used to enforce invariants, as we will see in some definitions of SQL syntax in the Ur/Web standard library.
adamc@541	1108
adamc@541	1109 \subsection{Reverse-Engineering Record Types}
adamc@541	1110
adamc@656	1111 It's useful to write Ur functions and functors that take record constructors as inputs, but these constructors can grow quite long, even though their values are often implied by other arguments. The compiler uses a simple heuristic to infer the values of unification variables that are mapped over, yielding known results. If the result is empty, we're done; if it's not empty, we replace a single unification variable with a new constructor formed from three new unification variables, as in $[\alpha = \beta] \rc \gamma$. This process can often be repeated to determine a unification variable fully.
adamc@541	1112
adamc@541	1113 \subsection{Implicit Arguments in Functor Applications}
adamc@541	1114
adamc@656	1115 Constructor, constraint, and constructor class witness members of structures may be omitted, when those structures are used in contexts where their assigned signatures imply how to fill in those missing members. This feature combines well with reverse-engineering to allow for uses of complicated meta-programming functors with little more code than would be necessary to invoke an untyped, ad-hoc code generator.
adamc@541	1116
adamc@541	1117
adamc@542	1118 \section{The Ur Standard Library}
adamc@542	1119
adamc@542	1120 The built-in parts of the Ur/Web standard library are described by the signature in \texttt{lib/basis.urs} in the distribution. A module $\mt{Basis}$ ascribing to that signature is available in the initial environment, and every program is implicitly prefixed by $\mt{open} \; \mt{Basis}$.
adamc@542	1121
adamc@542	1122 Additionally, other common functions that are definable within Ur are included in \texttt{lib/top.urs} and \texttt{lib/top.ur}. This $\mt{Top}$ module is also opened implicitly.
adamc@542	1123
adamc@542	1124 The idea behind Ur is to serve as the ideal host for embedded domain-specific languages. For now, however, the ``generic'' functionality is intermixed with Ur/Web-specific functionality, including in these two library modules. We hope that these generic library components have types that speak for themselves. The next section introduces the Ur/Web-specific elements. Here, we only give the type declarations from the beginning of $\mt{Basis}$.
adamc@542	1125 $$\begin{array}{l}
adamc@542	1126 \mt{type} \; \mt{int} \\
adamc@542	1127 \mt{type} \; \mt{float} \\
adamc@542	1128 \mt{type} \; \mt{string} \\
adamc@542	1129 \mt{type} \; \mt{time} \\
adamc@542	1130 \\
adamc@542	1131 \mt{type} \; \mt{unit} = \{\} \\
adamc@542	1132 \\
adamc@542	1133 \mt{datatype} \; \mt{bool} = \mt{False} \mid \mt{True} \\
adamc@542	1134 \\
adamc@542	1135 \mt{datatype} \; \mt{option} \; \mt{t} = \mt{None} \mid \mt{Some} \; \mt{of} \; \mt{t}
adamc@542	1136 \end{array}$$
adamc@542	1137
adamc@657	1138 Another important generic Ur element comes at the beginning of \texttt{top.urs}.
adamc@657	1139
adamc@657	1140 $$\begin{array}{l}
adamc@657	1141 \mt{con} \; \mt{folder} :: \mt{K} \longrightarrow \{\mt{K}\} \to \mt{Type} \\
adamc@657	1142 \\
adamc@657	1143 \mt{val} \; \mt{fold} : \mt{K} \longrightarrow \mt{tf} :: (\{\mt{K}\} \to \mt{Type}) \\
adamc@657	1144 \hspace{.1in} \to (\mt{nm} :: \mt{Name} \to \mt{v} :: \mt{K} \to \mt{r} :: \{\mt{K}\} \to [[\mt{nm}] \sim \mt{r}] \Rightarrow \\
adamc@657	1145 \hspace{.2in} \mt{tf} \; \mt{r} \to \mt{tf} \; ([\mt{nm} = \mt{v}] \rc \mt{r})) \\
adamc@657	1146 \hspace{.1in} \to \mt{tf} \; [] \\
adamc@657	1147 \hspace{.1in} \to \mt{r} :: \{\mt{K}\} \to \mt{folder} \; \mt{r} \to \mt{tf} \; \mt{r}
adamc@657	1148 \end{array}$$
adamc@657	1149
adamc@657	1150 For a type-level record $\mt{r}$, a $\mt{folder} \; \mt{r}$ encodes a permutation of $\mt{r}$'s elements. The $\mt{fold}$ function can be called on a $\mt{folder}$ to iterate over the elements of $\mt{r}$ in that order. $\mt{fold}$ is parameterized on a type-level function to be used to calculate the type of each intermediate result of folding. After processing a subset $\mt{r'}$ of $\mt{r}$'s entries, the type of the accumulator should be $\mt{tf} \; \mt{r'}$. The next two expression arguments to $\mt{fold}$ are the usual step function and initial accumulator, familiar from fold functions over lists. The final two arguments are the record to fold over and a $\mt{folder}$ for it.
adamc@657	1151
adamc@664	1152 The Ur compiler treats $\mt{folder}$ like a constructor class, using built-in rules to infer $\mt{folder}$s for records with known structure. The order in which field names are mentioned in source code is used as a hint about the permutation that the programmer would like.
adamc@657	1153
adamc@542	1154
adamc@542	1155 \section{The Ur/Web Standard Library}
adamc@542	1156
adamc@658	1157 \subsection{Monads}
adamc@658	1158
adamc@658	1159 The Ur Basis defines the monad constructor class from Haskell.
adamc@658	1160
adamc@658	1161 $$\begin{array}{l}
adamc@658	1162 \mt{class} \; \mt{monad} :: \mt{Type} \to \mt{Type} \\
adamc@658	1163 \mt{val} \; \mt{return} : \mt{m} ::: (\mt{Type} \to \mt{Type}) \to \mt{t} ::: \mt{Type} \\
adamc@658	1164 \hspace{.1in} \to \mt{monad} \; \mt{m} \\
adamc@658	1165 \hspace{.1in} \to \mt{t} \to \mt{m} \; \mt{t} \\
adamc@658	1166 \mt{val} \; \mt{bind} : \mt{m} ::: (\mt{Type} \to \mt{Type}) \to \mt{t1} ::: \mt{Type} \to \mt{t2} ::: \mt{Type} \\
adamc@658	1167 \hspace{.1in} \to \mt{monad} \; \mt{m} \\
adamc@658	1168 \hspace{.1in} \to \mt{m} \; \mt{t1} \to (\mt{t1} \to \mt{m} \; \mt{t2}) \\
adamc@658	1169 \hspace{.1in} \to \mt{m} \; \mt{t2}
adamc@658	1170 \end{array}$$
adamc@658	1171
adamc@542	1172 \subsection{Transactions}
adamc@542	1173
adamc@542	1174 Ur is a pure language; we use Haskell's trick to support controlled side effects. The standard library defines a monad $\mt{transaction}$, meant to stand for actions that may be undone cleanly. By design, no other kinds of actions are supported.
adamc@542	1175 $$\begin{array}{l}
adamc@542	1176 \mt{con} \; \mt{transaction} :: \mt{Type} \to \mt{Type} \\
adamc@658	1177 \mt{val} \; \mt{transaction\_monad} : \mt{monad} \; \mt{transaction}
adamc@542	1178 \end{array}$$
adamc@542	1179
adamc@542	1180 \subsection{HTTP}
adamc@542	1181
adamc@542	1182 There are transactions for reading an HTTP header by name and for getting and setting strongly-typed cookies. Cookies may only be created by the $\mt{cookie}$ declaration form, ensuring that they be named consistently based on module structure.
adamc@542	1183 $$\begin{array}{l}
adamc@542	1184 \mt{val} \; \mt{requestHeader} : \mt{string} \to \mt{transaction} \; (\mt{option} \; \mt{string}) \\
adamc@542	1185 \\
adamc@542	1186 \mt{con} \; \mt{http\_cookie} :: \mt{Type} \to \mt{Type} \\
adamc@542	1187 \mt{val} \; \mt{getCookie} : \mt{t} ::: \mt{Type} \to \mt{http\_cookie} \; \mt{t} \to \mt{transaction} \; (\mt{option} \; \mt{t}) \\
adamc@542	1188 \mt{val} \; \mt{setCookie} : \mt{t} ::: \mt{Type} \to \mt{http\_cookie} \; \mt{t} \to \mt{t} \to \mt{transaction} \; \mt{unit}
adamc@542	1189 \end{array}$$
adamc@542	1190
adamc@543	1191 \subsection{SQL}
adamc@543	1192
adamc@543	1193 The fundamental unit of interest in the embedding of SQL is tables, described by a type family and creatable only via the $\mt{table}$ declaration form.
adamc@543	1194 $$\begin{array}{l}
adamc@543	1195 \mt{con} \; \mt{sql\_table} :: \{\mt{Type}\} \to \mt{Type}
adamc@543	1196 \end{array}$$
adamc@543	1197
adamc@543	1198 \subsubsection{Queries}
adamc@543	1199
adamc@543	1200 A final query is constructed via the $\mt{sql\_query}$ function. Constructor arguments respectively specify the table fields we select (as records mapping tables to the subsets of their fields that we choose) and the (always named) extra expressions that we select.
adamc@543	1201 $$\begin{array}{l}
adamc@543	1202 \mt{con} \; \mt{sql\_query} :: \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@543	1203 \mt{val} \; \mt{sql\_query} : \mt{tables} ::: \{\{\mt{Type}\}\} \\
adamc@543	1204 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543	1205 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543	1206 \hspace{.1in} \to \{\mt{Rows} : \mt{sql\_query1} \; \mt{tables} \; \mt{selectedFields} \; \mt{selectedExps}, \\
adamc@543	1207 \hspace{.2in} \mt{OrderBy} : \mt{sql\_order\_by} \; \mt{tables} \; \mt{selectedExps}, \\
adamc@543	1208 \hspace{.2in} \mt{Limit} : \mt{sql\_limit}, \\
adamc@543	1209 \hspace{.2in} \mt{Offset} : \mt{sql\_offset}\} \\
adamc@543	1210 \hspace{.1in} \to \mt{sql\_query} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543	1211 \end{array}$$
adamc@543	1212
adamc@545	1213 Queries are used by folding over their results inside transactions.
adamc@545	1214 $$\begin{array}{l}
adamc@545	1215 \mt{val} \; \mt{query} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \lambda [\mt{tables} \sim \mt{exps}] \Rightarrow \mt{state} ::: \mt{Type} \to \mt{sql\_query} \; \mt{tables} \; \mt{exps} \\
adamc@658	1216 \hspace{.1in} \to (\$(\mt{exps} \rc \mt{map} \; (\lambda \mt{fields} :: \{\mt{Type}\} \Rightarrow \$\mt{fields}) \; \mt{tables}) \\
adamc@545	1217 \hspace{.2in} \to \mt{state} \to \mt{transaction} \; \mt{state}) \\
adamc@545	1218 \hspace{.1in} \to \mt{state} \to \mt{transaction} \; \mt{state}
adamc@545	1219 \end{array}$$
adamc@545	1220
adamc@543	1221 Most of the complexity of the query encoding is in the type $\mt{sql\_query1}$, which includes simple queries and derived queries based on relational operators. Constructor arguments respectively specify the tables we select from, the subset of fields that we keep from each table for the result rows, and the extra expressions that we select.
adamc@543	1222 $$\begin{array}{l}
adamc@543	1223 \mt{con} \; \mt{sql\_query1} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@543	1224 \\
adamc@543	1225 \mt{type} \; \mt{sql\_relop} \\
adamc@543	1226 \mt{val} \; \mt{sql\_union} : \mt{sql\_relop} \\
adamc@543	1227 \mt{val} \; \mt{sql\_intersect} : \mt{sql\_relop} \\
adamc@543	1228 \mt{val} \; \mt{sql\_except} : \mt{sql\_relop} \\
adamc@543	1229 \mt{val} \; \mt{sql\_relop} : \mt{tables1} ::: \{\{\mt{Type}\}\} \\
adamc@543	1230 \hspace{.1in} \to \mt{tables2} ::: \{\{\mt{Type}\}\} \\
adamc@543	1231 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543	1232 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@543	1233 \hspace{.1in} \to \mt{sql\_relop} \\
adamc@543	1234 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables1} \; \mt{selectedFields} \; \mt{selectedExps} \\
adamc@543	1235 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables2} \; \mt{selectedFields} \; \mt{selectedExps} \\
adamc@543	1236 \hspace{.1in} \to \mt{sql\_query1} \; \mt{selectedFields} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543	1237 \end{array}$$
adamc@543	1238
adamc@543	1239 $$\begin{array}{l}
adamc@543	1240 \mt{val} \; \mt{sql\_query1} : \mt{tables} ::: \{\{\mt{Type}\}\} \\
adamc@543	1241 \hspace{.1in} \to \mt{grouped} ::: \{\{\mt{Type}\}\} \\
adamc@543	1242 \hspace{.1in} \to \mt{selectedFields} ::: \{\{\mt{Type}\}\} \\
adamc@543	1243 \hspace{.1in} \to \mt{selectedExps} ::: \{\mt{Type}\} \\
adamc@658	1244 \hspace{.1in} \to \{\mt{From} : \$(\mt{map} \; \mt{sql\_table} \; \mt{tables}), \\
adamc@543	1245 \hspace{.2in} \mt{Where} : \mt{sql\_exp} \; \mt{tables} \; [] \; [] \; \mt{bool}, \\
adamc@543	1246 \hspace{.2in} \mt{GroupBy} : \mt{sql\_subset} \; \mt{tables} \; \mt{grouped}, \\
adamc@543	1247 \hspace{.2in} \mt{Having} : \mt{sql\_exp} \; \mt{grouped} \; \mt{tables} \; [] \; \mt{bool}, \\
adamc@543	1248 \hspace{.2in} \mt{SelectFields} : \mt{sql\_subset} \; \mt{grouped} \; \mt{selectedFields}, \\
adamc@658	1249 \hspace{.2in} \mt {SelectExps} : \$(\mt{map} \; (\mt{sql\_exp} \; \mt{grouped} \; \mt{tables} \; []) \; \mt{selectedExps}) \} \\
adamc@543	1250 \hspace{.1in} \to \mt{sql\_query1} \; \mt{tables} \; \mt{selectedFields} \; \mt{selectedExps}
adamc@543	1251 \end{array}$$
adamc@543	1252
adamc@543	1253 To encode projection of subsets of fields in $\mt{SELECT}$ clauses, and to encode $\mt{GROUP} \; \mt{BY}$ clauses, we rely on a type family $\mt{sql\_subset}$, capturing what it means for one record of table fields to be a subset of another. The main constructor $\mt{sql\_subset}$ ``proves subset facts'' by requiring a split of a record into kept and dropped parts. The extra constructor $\mt{sql\_subset\_all}$ is a convenience for keeping all fields of a record.
adamc@543	1254 $$\begin{array}{l}
adamc@543	1255 \mt{con} \; \mt{sql\_subset} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \mt{Type} \\
adamc@543	1256 \mt{val} \; \mt{sql\_subset} : \mt{keep\_drop} :: \{(\{\mt{Type}\} \times \{\mt{Type}\})\} \\
adamc@543	1257 \hspace{.1in} \to \mt{sql\_subset} \\
adamc@658	1258 \hspace{.2in} (\mt{map} \; (\lambda \mt{fields} :: (\{\mt{Type}\} \times \{\mt{Type}\}) \Rightarrow \mt{fields}.1 \rc \mt{fields}.2)\; \mt{keep\_drop}) \\
adamc@658	1259 \hspace{.2in} (\mt{map} \; (\lambda \mt{fields} :: (\{\mt{Type}\} \times \{\mt{Type}\}) \Rightarrow \mt{fields}.1) \; \mt{keep\_drop}) \\
adamc@543	1260 \mt{val} \; \mt{sql\_subset\_all} : \mt{tables} :: \{\{\mt{Type}\}\} \to \mt{sql\_subset} \; \mt{tables} \; \mt{tables}
adamc@543	1261 \end{array}$$
adamc@543	1262
adamc@560	1263 SQL expressions are used in several places, including $\mt{SELECT}$, $\mt{WHERE}$, $\mt{HAVING}$, and $\mt{ORDER} \; \mt{BY}$ clauses. They reify a fragment of the standard SQL expression language, while making it possible to inject ``native'' Ur values in some places. The arguments to the $\mt{sql\_exp}$ type family respectively give the unrestricted-availability table fields, the table fields that may only be used in arguments to aggregate functions, the available selected expressions, and the type of the expression.
adamc@543	1264 $$\begin{array}{l}
adamc@543	1265 \mt{con} \; \mt{sql\_exp} :: \{\{\mt{Type}\}\} \to \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \to \mt{Type}
adamc@543	1266 \end{array}$$
adamc@543	1267
adamc@543	1268 Any field in scope may be converted to an expression.
adamc@543	1269 $$\begin{array}{l}
adamc@543	1270 \mt{val} \; \mt{sql\_field} : \mt{otherTabs} ::: \{\{\mt{Type}\}\} \to \mt{otherFields} ::: \{\mt{Type}\} \\
adamc@543	1271 \hspace{.1in} \to \mt{fieldType} ::: \mt{Type} \to \mt{agg} ::: \{\{\mt{Type}\}\} \\
adamc@543	1272 \hspace{.1in} \to \mt{exps} ::: \{\mt{Type}\} \\
adamc@543	1273 \hspace{.1in} \to \mt{tab} :: \mt{Name} \to \mt{field} :: \mt{Name} \\
adamc@543	1274 \hspace{.1in} \to \mt{sql\_exp} \; ([\mt{tab} = [\mt{field} = \mt{fieldType}] \rc \mt{otherFields}] \rc \mt{otherTabs}) \; \mt{agg} \; \mt{exps} \; \mt{fieldType}
adamc@543	1275 \end{array}$$
adamc@543	1276
adamc@544	1277 There is an analogous function for referencing named expressions.
adamc@544	1278 $$\begin{array}{l}
adamc@544	1279 \mt{val} \; \mt{sql\_exp} : \mt{tabs} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{t} ::: \mt{Type} \to \mt{rest} ::: \{\mt{Type}\} \to \mt{nm} :: \mt{Name} \\
adamc@544	1280 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tabs} \; \mt{agg} \; ([\mt{nm} = \mt{t}] \rc \mt{rest}) \; \mt{t}
adamc@544	1281 \end{array}$$
adamc@544	1282
adamc@544	1283 Ur values of appropriate types may be injected into SQL expressions.
adamc@544	1284 $$\begin{array}{l}
adamc@544	1285 \mt{class} \; \mt{sql\_injectable} \\
adamc@544	1286 \mt{val} \; \mt{sql\_bool} : \mt{sql\_injectable} \; \mt{bool} \\
adamc@544	1287 \mt{val} \; \mt{sql\_int} : \mt{sql\_injectable} \; \mt{int} \\
adamc@544	1288 \mt{val} \; \mt{sql\_float} : \mt{sql\_injectable} \; \mt{float} \\
adamc@544	1289 \mt{val} \; \mt{sql\_string} : \mt{sql\_injectable} \; \mt{string} \\
adamc@544	1290 \mt{val} \; \mt{sql\_time} : \mt{sql\_injectable} \; \mt{time} \\
adamc@544	1291 \mt{val} \; \mt{sql\_option\_bool} : \mt{sql\_injectable} \; (\mt{option} \; \mt{bool}) \\
adamc@544	1292 \mt{val} \; \mt{sql\_option\_int} : \mt{sql\_injectable} \; (\mt{option} \; \mt{int}) \\
adamc@544	1293 \mt{val} \; \mt{sql\_option\_float} : \mt{sql\_injectable} \; (\mt{option} \; \mt{float}) \\
adamc@544	1294 \mt{val} \; \mt{sql\_option\_string} : \mt{sql\_injectable} \; (\mt{option} \; \mt{string}) \\
adamc@544	1295 \mt{val} \; \mt{sql\_option\_time} : \mt{sql\_injectable} \; (\mt{option} \; \mt{time}) \\
adamc@544	1296 \mt{val} \; \mt{sql\_inject} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \to \mt{sql\_injectable} \; \mt{t} \\
adamc@544	1297 \hspace{.1in} \to \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t}
adamc@544	1298 \end{array}$$
adamc@544	1299
adamc@544	1300 We have the SQL nullness test, which is necessary because of the strange SQL semantics of equality in the presence of null values.
adamc@544	1301 $$\begin{array}{l}
adamc@544	1302 \mt{val} \; \mt{sql\_is\_null} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544	1303 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; (\mt{option} \; \mt{t}) \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{bool}
adamc@544	1304 \end{array}$$
adamc@544	1305
adamc@559	1306 We have generic nullary, unary, and binary operators.
adamc@544	1307 $$\begin{array}{l}
adamc@544	1308 \mt{con} \; \mt{sql\_nfunc} :: \mt{Type} \to \mt{Type} \\
adamc@544	1309 \mt{val} \; \mt{sql\_current\_timestamp} : \mt{sql\_nfunc} \; \mt{time} \\
adamc@544	1310 \mt{val} \; \mt{sql\_nfunc} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544	1311 \hspace{.1in} \to \mt{sql\_nfunc} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t} \\\end{array}$$
adamc@544	1312
adamc@544	1313 $$\begin{array}{l}
adamc@544	1314 \mt{con} \; \mt{sql\_unary} :: \mt{Type} \to \mt{Type} \to \mt{Type} \\
adamc@544	1315 \mt{val} \; \mt{sql\_not} : \mt{sql\_unary} \; \mt{bool} \; \mt{bool} \\
adamc@544	1316 \mt{val} \; \mt{sql\_unary} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{arg} ::: \mt{Type} \to \mt{res} ::: \mt{Type} \\
adamc@544	1317 \hspace{.1in} \to \mt{sql\_unary} \; \mt{arg} \; \mt{res} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{res} \\
adamc@544	1318 \end{array}$$
adamc@544	1319
adamc@544	1320 $$\begin{array}{l}
adamc@544	1321 \mt{con} \; \mt{sql\_binary} :: \mt{Type} \to \mt{Type} \to \mt{Type} \to \mt{Type} \\
adamc@544	1322 \mt{val} \; \mt{sql\_and} : \mt{sql\_binary} \; \mt{bool} \; \mt{bool} \; \mt{bool} \\
adamc@544	1323 \mt{val} \; \mt{sql\_or} : \mt{sql\_binary} \; \mt{bool} \; \mt{bool} \; \mt{bool} \\
adamc@544	1324 \mt{val} \; \mt{sql\_binary} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{arg_1} ::: \mt{Type} \to \mt{arg_2} ::: \mt{Type} \to \mt{res} ::: \mt{Type} \\
adamc@544	1325 \hspace{.1in} \to \mt{sql\_binary} \; \mt{arg_1} \; \mt{arg_2} \; \mt{res} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg_1} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{arg_2} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{res}
adamc@544	1326 \end{array}$$
adamc@544	1327
adamc@544	1328 $$\begin{array}{l}
adamc@559	1329 \mt{class} \; \mt{sql\_arith} \\
adamc@559	1330 \mt{val} \; \mt{sql\_int\_arith} : \mt{sql\_arith} \; \mt{int} \\
adamc@559	1331 \mt{val} \; \mt{sql\_float\_arith} : \mt{sql\_arith} \; \mt{float} \\
adamc@559	1332 \mt{val} \; \mt{sql\_neg} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_unary} \; \mt{t} \; \mt{t} \\
adamc@559	1333 \mt{val} \; \mt{sql\_plus} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559	1334 \mt{val} \; \mt{sql\_minus} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559	1335 \mt{val} \; \mt{sql\_times} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559	1336 \mt{val} \; \mt{sql\_div} : \mt{t} ::: \mt{Type} \to \mt{sql\_arith} \; \mt{t} \to \mt{sql\_binary} \; \mt{t} \; \mt{t} \; \mt{t} \\
adamc@559	1337 \mt{val} \; \mt{sql\_mod} : \mt{sql\_binary} \; \mt{int} \; \mt{int} \; \mt{int}
adamc@559	1338 \end{array}$$
adamc@544	1339
adamc@656	1340 Finally, we have aggregate functions. The $\mt{COUNT(\ast)}$ syntax is handled specially, since it takes no real argument. The other aggregate functions are placed into a general type family, using constructor classes to restrict usage to properly-typed arguments. The key aspect of the $\mt{sql\_aggregate}$ function's type is the shift of aggregate-function-only fields into unrestricted fields.
adamc@544	1341
adamc@544	1342 $$\begin{array}{l}
adamc@544	1343 \mt{val} \; \mt{sql\_count} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{int}
adamc@544	1344 \end{array}$$
adamc@544	1345
adamc@544	1346 $$\begin{array}{l}
adamc@544	1347 \mt{con} \; \mt{sql\_aggregate} :: \mt{Type} \to \mt{Type} \\
adamc@544	1348 \mt{val} \; \mt{sql\_aggregate} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{agg} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544	1349 \hspace{.1in} \to \mt{sql\_aggregate} \; \mt{t} \to \mt{sql\_exp} \; \mt{agg} \; \mt{agg} \; \mt{exps} \; \mt{t} \to \mt{sql\_exp} \; \mt{tables} \; \mt{agg} \; \mt{exps} \; \mt{t}
adamc@544	1350 \end{array}$$
adamc@544	1351
adamc@544	1352 $$\begin{array}{l}
adamc@544	1353 \mt{class} \; \mt{sql\_summable} \\
adamc@544	1354 \mt{val} \; \mt{sql\_summable\_int} : \mt{sql\_summable} \; \mt{int} \\
adamc@544	1355 \mt{val} \; \mt{sql\_summable\_float} : \mt{sql\_summable} \; \mt{float} \\
adamc@544	1356 \mt{val} \; \mt{sql\_avg} : \mt{t} ::: \mt{Type} \to \mt{sql\_summable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t} \\
adamc@544	1357 \mt{val} \; \mt{sql\_sum} : \mt{t} ::: \mt{Type} \to \mt{sql\_summable} \mt{t} \to \mt{sql\_aggregate} \; \mt{t}
adamc@544	1358 \end{array}$$
adamc@544	1359
adamc@544	1360 $$\begin{array}{l}
adamc@544	1361 \mt{class} \; \mt{sql\_maxable} \\
adamc@544	1362 \mt{val} \; \mt{sql\_maxable\_int} : \mt{sql\_maxable} \; \mt{int} \\
adamc@544	1363 \mt{val} \; \mt{sql\_maxable\_float} : \mt{sql\_maxable} \; \mt{float} \\
adamc@544	1364 \mt{val} \; \mt{sql\_maxable\_string} : \mt{sql\_maxable} \; \mt{string} \\
adamc@544	1365 \mt{val} \; \mt{sql\_maxable\_time} : \mt{sql\_maxable} \; \mt{time} \\
adamc@544	1366 \mt{val} \; \mt{sql\_max} : \mt{t} ::: \mt{Type} \to \mt{sql\_maxable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t} \\
adamc@544	1367 \mt{val} \; \mt{sql\_min} : \mt{t} ::: \mt{Type} \to \mt{sql\_maxable} \; \mt{t} \to \mt{sql\_aggregate} \; \mt{t}
adamc@544	1368 \end{array}$$
adamc@544	1369
adamc@544	1370 We wrap up the definition of query syntax with the types used in representing $\mt{ORDER} \; \mt{BY}$, $\mt{LIMIT}$, and $\mt{OFFSET}$ clauses.
adamc@544	1371 $$\begin{array}{l}
adamc@544	1372 \mt{type} \; \mt{sql\_direction} \\
adamc@544	1373 \mt{val} \; \mt{sql\_asc} : \mt{sql\_direction} \\
adamc@544	1374 \mt{val} \; \mt{sql\_desc} : \mt{sql\_direction} \\
adamc@544	1375 \\
adamc@544	1376 \mt{con} \; \mt{sql\_order\_by} :: \{\{\mt{Type}\}\} \to \{\mt{Type}\} \to \mt{Type} \\
adamc@544	1377 \mt{val} \; \mt{sql\_order\_by\_Nil} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} :: \{\mt{Type}\} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \\
adamc@544	1378 \mt{val} \; \mt{sql\_order\_by\_Cons} : \mt{tables} ::: \{\{\mt{Type}\}\} \to \mt{exps} ::: \{\mt{Type}\} \to \mt{t} ::: \mt{Type} \\
adamc@544	1379 \hspace{.1in} \to \mt{sql\_exp} \; \mt{tables} \; [] \; \mt{exps} \; \mt{t} \to \mt{sql\_direction} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \to \mt{sql\_order\_by} \; \mt{tables} \; \mt{exps} \\
adamc@544	1380 \\
adamc@544	1381 \mt{type} \; \mt{sql\_limit} \\
adamc@544	1382 \mt{val} \; \mt{sql\_no\_limit} : \mt{sql\_limit} \\
adamc@544	1383 \mt{val} \; \mt{sql\_limit} : \mt{int} \to \mt{sql\_limit} \\
adamc@544	1384 \\
adamc@544	1385 \mt{type} \; \mt{sql\_offset} \\
adamc@544	1386 \mt{val} \; \mt{sql\_no\_offset} : \mt{sql\_offset} \\
adamc@544	1387 \mt{val} \; \mt{sql\_offset} : \mt{int} \to \mt{sql\_offset}
adamc@544	1388 \end{array}$$
adamc@544	1389
adamc@545	1390
adamc@545	1391 \subsubsection{DML}
adamc@545	1392
adamc@545	1393 The Ur/Web library also includes an embedding of a fragment of SQL's DML, the Data Manipulation Language, for modifying database tables. Any piece of DML may be executed in a transaction.
adamc@545	1394
adamc@545	1395 $$\begin{array}{l}
adamc@545	1396 \mt{type} \; \mt{dml} \\
adamc@545	1397 \mt{val} \; \mt{dml} : \mt{dml} \to \mt{transaction} \; \mt{unit}
adamc@545	1398 \end{array}$$
adamc@545	1399
adamc@545	1400 Properly-typed records may be used to form $\mt{INSERT}$ commands.
adamc@545	1401 $$\begin{array}{l}
adamc@545	1402 \mt{val} \; \mt{insert} : \mt{fields} ::: \{\mt{Type}\} \to \mt{sql\_table} \; \mt{fields} \\
adamc@658	1403 \hspace{.1in} \to \$(\mt{map} \; (\mt{sql\_exp} \; [] \; [] \; []) \; \mt{fields}) \to \mt{dml}
adamc@545	1404 \end{array}$$
adamc@545	1405
adamc@545	1406 An $\mt{UPDATE}$ command is formed from a choice of which table fields to leave alone and which to change, along with an expression to use to compute the new value of each changed field and a $\mt{WHERE}$ clause.
adamc@545	1407 $$\begin{array}{l}
adamc@545	1408 \mt{val} \; \mt{update} : \mt{unchanged} ::: \{\mt{Type}\} \to \mt{changed} :: \{\mt{Type}\} \to \lambda [\mt{changed} \sim \mt{unchanged}] \\
adamc@658	1409 \hspace{.1in} \Rightarrow \$(\mt{map} \; (\mt{sql\_exp} \; [\mt{T} = \mt{changed} \rc \mt{unchanged}] \; [] \; []) \; \mt{changed}) \\
adamc@545	1410 \hspace{.1in} \to \mt{sql\_table} \; (\mt{changed} \rc \mt{unchanged}) \to \mt{sql\_exp} \; [\mt{T} = \mt{changed} \rc \mt{unchanged}] \; [] \; [] \; \mt{bool} \to \mt{dml}
adamc@545	1411 \end{array}$$
adamc@545	1412
adamc@545	1413 A $\mt{DELETE}$ command is formed from a table and a $\mt{WHERE}$ clause.
adamc@545	1414 $$\begin{array}{l}
adamc@545	1415 \mt{val} \; \mt{delete} : \mt{fields} ::: \{\mt{Type}\} \to \mt{sql\_table} \; \mt{fields} \to \mt{sql\_exp} \; [\mt{T} = \mt{fields}] \; [] \; [] \; \mt{bool} \to \mt{dml}
adamc@545	1416 \end{array}$$
adamc@545	1417
adamc@546	1418 \subsubsection{Sequences}
adamc@546	1419
adamc@546	1420 SQL sequences are counters with concurrency control, often used to assign unique IDs. Ur/Web supports them via a simple interface. The only way to create a sequence is with the $\mt{sequence}$ declaration form.
adamc@546	1421
adamc@546	1422 $$\begin{array}{l}
adamc@546	1423 \mt{type} \; \mt{sql\_sequence} \\
adamc@546	1424 \mt{val} \; \mt{nextval} : \mt{sql\_sequence} \to \mt{transaction} \; \mt{int}
adamc@546	1425 \end{array}$$
adamc@546	1426
adamc@546	1427
adamc@547	1428 \subsection{XML}
adamc@547	1429
adamc@547	1430 Ur/Web's library contains an encoding of XML syntax and semantic constraints. We make no effort to follow the standards governing XML schemas. Rather, XML fragments are viewed more as values of ML datatypes, and we only track which tags are allowed inside which other tags.
adamc@547	1431
adamc@547	1432 The basic XML type family has arguments respectively indicating the \emph{context} of a fragment, the fields that the fragment expects to be bound on entry (and their types), and the fields that the fragment will bind (and their types). Contexts are a record-based ``poor man's subtyping'' encoding, with each possible set of valid tags corresponding to a different context record. The arguments dealing with field binding are only relevant to HTML forms.
adamc@547	1433 $$\begin{array}{l}
adamc@547	1434 \mt{con} \; \mt{xml} :: \{\mt{Unit}\} \to \{\mt{Type}\} \to \{\mt{Type}\} \to \mt{Type}
adamc@547	1435 \end{array}$$
adamc@547	1436
adamc@547	1437 We also have a type family of XML tags, indexed respectively by the record of optional attributes accepted by the tag, the context in which the tag may be placed, the context required of children of the tag, which form fields the tag uses, and which fields the tag defines.
adamc@547	1438 $$\begin{array}{l}
adamc@547	1439 \mt{con} \; \mt{tag} :: \{\mt{Type}\} \to \{\mt{Unit}\} \to \{\mt{Unit}\} \to \{\mt{Type}\} \to \{\mt{Type}\} \to \mt{Type}
adamc@547	1440 \end{array}$$
adamc@547	1441
adamc@547	1442 Literal text may be injected into XML as ``CDATA.''
adamc@547	1443 $$\begin{array}{l}
adamc@547	1444 \mt{val} \; \mt{cdata} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use} ::: \{\mt{Type}\} \to \mt{string} \to \mt{xml} \; \mt{ctx} \; \mt{use} \; []
adamc@547	1445 \end{array}$$
adamc@547	1446
adamc@547	1447 There is a function for producing an XML tree with a particular tag at its root.
adamc@547	1448 $$\begin{array}{l}
adamc@547	1449 \mt{val} \; \mt{tag} : \mt{attrsGiven} ::: \{\mt{Type}\} \to \mt{attrsAbsent} ::: \{\mt{Type}\} \to \mt{ctxOuter} ::: \{\mt{Unit}\} \to \mt{ctxInner} ::: \{\mt{Unit}\} \\
adamc@547	1450 \hspace{.1in} \to \mt{useOuter} ::: \{\mt{Type}\} \to \mt{useInner} ::: \{\mt{Type}\} \to \mt{bindOuter} ::: \{\mt{Type}\} \to \mt{bindInner} ::: \{\mt{Type}\} \\
adamc@547	1451 \hspace{.1in} \to \lambda [\mt{attrsGiven} \sim \mt{attrsAbsent}] \; [\mt{useOuter} \sim \mt{useInner}] \; [\mt{bindOuter} \sim \mt{bindInner}] \Rightarrow \$\mt{attrsGiven} \\
adamc@547	1452 \hspace{.1in} \to \mt{tag} \; (\mt{attrsGiven} \rc \mt{attrsAbsent}) \; \mt{ctxOuter} \; \mt{ctxInner} \; \mt{useOuter} \; \mt{bindOuter} \\
adamc@547	1453 \hspace{.1in} \to \mt{xml} \; \mt{ctxInner} \; \mt{useInner} \; \mt{bindInner} \to \mt{xml} \; \mt{ctxOuter} \; (\mt{useOuter} \rc \mt{useInner}) \; (\mt{bindOuter} \rc \mt{bindInner})
adamc@547	1454 \end{array}$$
adamc@547	1455
adamc@547	1456 Two XML fragments may be concatenated.
adamc@547	1457 $$\begin{array}{l}
adamc@547	1458 \mt{val} \; \mt{join} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use_1} ::: \{\mt{Type}\} \to \mt{bind_1} ::: \{\mt{Type}\} \to \mt{bind_2} ::: \{\mt{Type}\} \\
adamc@547	1459 \hspace{.1in} \to \lambda [\mt{use_1} \sim \mt{bind_1}] \; [\mt{bind_1} \sim \mt{bind_2}] \\
adamc@547	1460 \hspace{.1in} \Rightarrow \mt{xml} \; \mt{ctx} \; \mt{use_1} \; \mt{bind_1} \to \mt{xml} \; \mt{ctx} \; (\mt{use_1} \rc \mt{bind_1}) \; \mt{bind_2} \to \mt{xml} \; \mt{ctx} \; \mt{use_1} \; (\mt{bind_1} \rc \mt{bind_2})
adamc@547	1461 \end{array}$$
adamc@547	1462
adamc@547	1463 Finally, any XML fragment may be updated to ``claim'' to use more form fields than it does.
adamc@547	1464 $$\begin{array}{l}
adamc@547	1465 \mt{val} \; \mt{useMore} : \mt{ctx} ::: \{\mt{Unit}\} \to \mt{use_1} ::: \{\mt{Type}\} \to \mt{use_2} ::: \{\mt{Type}\} \to \mt{bind} ::: \{\mt{Type}\} \to \lambda [\mt{use_1} \sim \mt{use_2}] \\
adamc@547	1466 \hspace{.1in} \Rightarrow \mt{xml} \; \mt{ctx} \; \mt{use_1} \; \mt{bind} \to \mt{xml} \; \mt{ctx} \; (\mt{use_1} \rc \mt{use_2}) \; \mt{bind}
adamc@547	1467 \end{array}$$
adamc@547	1468
adamc@547	1469 We will not list here the different HTML tags and related functions from the standard library. They should be easy enough to understand from the code in \texttt{basis.urs}. The set of tags in the library is not yet claimed to be complete for HTML standards.
adamc@547	1470
adamc@547	1471 One last useful function is for aborting any page generation, returning some XML as an error message. This function takes the place of some uses of a general exception mechanism.
adamc@547	1472 $$\begin{array}{l}
adamc@547	1473 \mt{val} \; \mt{error} : \mt{t} ::: \mt{Type} \to \mt{xml} \; [\mt{Body}] \; [] \; [] \to \mt{t}
adamc@547	1474 \end{array}$$
adamc@547	1475
adamc@549	1476
adamc@701	1477 \subsection{Client-Side Programming}
adamc@659	1478
adamc@701	1479 Ur/Web supports running code on web browsers, via automatic compilation to JavaScript.
adamc@701	1480
adamc@701	1481 \subsubsection{The Basics}
adamc@701	1482
adamc@701	1483 Clients can open alert dialog boxes, in the usual annoying JavaScript way.
adamc@701	1484 $$\begin{array}{l}
adamc@701	1485 \mt{val} \; \mt{alert} : \mt{string} \to \mt{transaction} \; \mt{unit}
adamc@701	1486 \end{array}$$
adamc@701	1487
adamc@701	1488 Any transaction may be run in a new thread with the $\mt{spawn}$ function.
adamc@701	1489 $$\begin{array}{l}
adamc@701	1490 \mt{val} \; \mt{spawn} : \mt{transaction} \; \mt{unit} \to \mt{transaction} \; \mt{unit}
adamc@701	1491 \end{array}$$
adamc@701	1492
adamc@701	1493 The current thread can be paused for at least a specified number of milliseconds.
adamc@701	1494 $$\begin{array}{l}
adamc@701	1495 \mt{val} \; \mt{sleep} : \mt{int} \to \mt{transaction} \; \mt{unit}
adamc@701	1496 \end{array}$$
adamc@701	1497
adamc@701	1498 \subsubsection{Functional-Reactive Page Generation}
adamc@701	1499
adamc@701	1500 Most approaches to ``AJAX''-style coding involve imperative manipulation of the DOM tree representing an HTML document's structure. Ur/Web follows the \emph{functional-reactive} approach instead. Programs may allocate mutable \emph{sources} of arbitrary types, and an HTML page is effectively a pure function over the latest values of the sources. The page is not mutated directly, but rather it changes automatically as the sources are mutated.
adamc@659	1501
adamc@659	1502 $$\begin{array}{l}
adamc@659	1503 \mt{con} \; \mt{source} :: \mt{Type} \to \mt{Type} \\
adamc@659	1504 \mt{val} \; \mt{source} : \mt{t} ::: \mt{Type} \to \mt{t} \to \mt{transaction} \; (\mt{source} \; \mt{t}) \\
adamc@659	1505 \mt{val} \; \mt{set} : \mt{t} ::: \mt{Type} \to \mt{source} \; \mt{t} \to \mt{t} \to \mt{transaction} \; \mt{unit} \\
adamc@659	1506 \mt{val} \; \mt{get} : \mt{t} ::: \mt{Type} \to \mt{source} \; \mt{t} \to \mt{transaction} \; \mt{t}
adamc@659	1507 \end{array}$$
adamc@659	1508
adamc@659	1509 Pure functions over sources are represented in a monad of \emph{signals}.
adamc@659	1510
adamc@659	1511 $$\begin{array}{l}
adamc@659	1512 \mt{con} \; \mt{signal} :: \mt{Type} \to \mt{Type} \\
adamc@659	1513 \mt{val} \; \mt{signal\_monad} : \mt{monad} \; \mt{signal} \\
adamc@659	1514 \mt{val} \; \mt{signal} : \mt{t} ::: \mt{Type} \to \mt{source} \; \mt{t} \to \mt{signal} \; \mt{t}
adamc@659	1515 \end{array}$$
adamc@659	1516
adamc@659	1517 A reactive portion of an HTML page is injected with a $\mt{dyn}$ tag, which has a signal-valued attribute $\mt{Signal}$.
adamc@659	1518
adamc@659	1519 $$\begin{array}{l}
adamc@701	1520 \mt{val} \; \mt{dyn} : \mt{use} ::: \{\mt{Type}\} \to \mt{bind} ::: \{\mt{Type}\} \to \mt{unit} \\
adamc@701	1521 \hspace{.1in} \to \mt{tag} \; [\mt{Signal} = \mt{signal} \; (\mt{xml} \; \mt{body} \; \mt{use} \; \mt{bind})] \; \mt{body} \; [] \; \mt{use} \; \mt{bind}
adamc@659	1522 \end{array}$$
adamc@659	1523
adamc@701	1524 Transactions can be run on the client by including them in attributes like the $\mt{Onclick}$ attribute of $\mt{button}$, and GUI widgets like $\mt{ctextbox}$ have $\mt{Source}$ attributes that can be used to connect them to sources, so that their values can be read by code running because of, e.g., an $\mt{Onclick}$ event.
adamc@701	1525
adamc@701	1526 \subsubsection{Asynchronous Message-Passing}
adamc@701	1527
adamc@701	1528 To support asynchronous, ``server push'' delivery of messages to clients, any client that might need to receive an asynchronous message is assigned a unique ID. These IDs may be retrieved both on the client and on the server, during execution of code related to a client.
adamc@701	1529
adamc@701	1530 $$\begin{array}{l}
adamc@701	1531 \mt{type} \; \mt{client} \\
adamc@701	1532 \mt{val} \; \mt{self} : \mt{transaction} \; \mt{client}
adamc@701	1533 \end{array}$$
adamc@701	1534
adamc@701	1535 \emph{Channels} are the means of message-passing. Each channel is created in the context of a client and belongs to that client; no other client may receive the channel's messages. Each channel type includes the type of values that may be sent over the channel. Sending and receiving are asynchronous, in the sense that a client need not be ready to receive a message right away. Rather, sent messages may queue up, waiting to be processed.
adamc@701	1536
adamc@701	1537 $$\begin{array}{l}
adamc@701	1538 \mt{con} \; \mt{channel} :: \mt{Type} \to \mt{Type} \\
adamc@701	1539 \mt{val} \; \mt{channel} : \mt{t} ::: \mt{Type} \to \mt{transaction} \; (\mt{channel} \; \mt{t}) \\
adamc@701	1540 \mt{val} \; \mt{send} : \mt{t} ::: \mt{Type} \to \mt{channel} \; \mt{t} \to \mt{t} \to \mt{transaction} \; \mt{unit} \\
adamc@701	1541 \mt{val} \; \mt{recv} : \mt{t} ::: \mt{Type} \to \mt{channel} \; \mt{t} \to \mt{transaction} \; \mt{t}
adamc@701	1542 \end{array}$$
adamc@701	1543
adamc@701	1544 The $\mt{channel}$ and $\mt{send}$ operations may only be executed on the server, and $\mt{recv}$ may only be executed on a client. Neither clients nor channels may be passed as arguments from clients to server-side functions, so persistent channels can only be maintained by storing them in the database and looking them up using the current client ID or some application-specific value as a key.
adamc@701	1545
adamc@701	1546 Clients and channels live only as long as the web browser page views that they are associated with. When a user surfs away, his client and its channels will be garbage-collected, after that user is not heard from for the timeout period. Garbage collection deletes any database row that contains a client or channel directly. Any reference to one of these types inside an $\mt{option}$ is set to $\mt{None}$ instead. Both kinds of handling have the flavor of weak pointers, and that is a useful way to think about clients and channels in the database.
adamc@701	1547
adamc@659	1548
adamc@549	1549 \section{Ur/Web Syntax Extensions}
adamc@549	1550
adamc@549	1551 Ur/Web features some syntactic shorthands for building values using the functions from the last section. This section sketches the grammar of those extensions. We write spans of syntax inside brackets to indicate that they are optional.
adamc@549	1552
adamc@549	1553 \subsection{SQL}
adamc@549	1554
adamc@549	1555 \subsubsection{Queries}
adamc@549	1556
adamc@550	1557 Queries $Q$ are added to the rules for expressions $e$.
adamc@550	1558
adamc@549	1559 $$\begin{array}{rrcll}
adamc@550	1560 \textrm{Queries} & Q &::=& (q \; [\mt{ORDER} \; \mt{BY} \; (E \; [o],)^+] \; [\mt{LIMIT} \; N] \; [\mt{OFFSET} \; N]) \\
adamc@549	1561 \textrm{Pre-queries} & q &::=& \mt{SELECT} \; P \; \mt{FROM} \; T,^+ \; [\mt{WHERE} \; E] \; [\mt{GROUP} \; \mt{BY} \; p,^+] \; [\mt{HAVING} \; E] \\
adamc@549	1562 &&& \mid q \; R \; q \\
adamc@549	1563 \textrm{Relational operators} & R &::=& \mt{UNION} \mid \mt{INTERSECT} \mid \mt{EXCEPT}
adamc@549	1564 \end{array}$$
adamc@549	1565
adamc@549	1566 $$\begin{array}{rrcll}
adamc@549	1567 \textrm{Projections} & P &::=& \ast & \textrm{all columns} \\
adamc@549	1568 &&& p,^+ & \textrm{particular columns} \\
adamc@549	1569 \textrm{Pre-projections} & p &::=& t.f & \textrm{one column from a table} \\
adamc@558	1570 &&& t.\{\{c\}\} & \textrm{a record of columns from a table (of kind $\{\mt{Type}\}$)} \\
adamc@549	1571 \textrm{Table names} & t &::=& x & \textrm{constant table name (automatically capitalized)} \\
adamc@549	1572 &&& X & \textrm{constant table name} \\
adamc@549	1573 &&& \{\{c\}\} & \textrm{computed table name (of kind $\mt{Name}$)} \\
adamc@549	1574 \textrm{Column names} & f &::=& X & \textrm{constant column name} \\
adamc@549	1575 &&& \{c\} & \textrm{computed column name (of kind $\mt{Name}$)} \\
adamc@549	1576 \textrm{Tables} & T &::=& x & \textrm{table variable, named locally by its own capitalization} \\
adamc@549	1577 &&& x \; \mt{AS} \; t & \textrm{table variable, with local name} \\
adamc@549	1578 &&& \{\{e\}\} \; \mt{AS} \; t & \textrm{computed table expression, with local name} \\
adamc@549	1579 \textrm{SQL expressions} & E &::=& p & \textrm{column references} \\
adamc@549	1580 &&& X & \textrm{named expression references} \\
adamc@549	1581 &&& \{\{e\}\} & \textrm{injected native Ur expressions} \\
adamc@549	1582 &&& \{e\} & \textrm{computed expressions, probably using $\mt{sql\_exp}$ directly} \\
adamc@549	1583 &&& \mt{TRUE} \mid \mt{FALSE} & \textrm{boolean constants} \\
adamc@549	1584 &&& \ell & \textrm{primitive type literals} \\
adamc@549	1585 &&& \mt{NULL} & \textrm{null value (injection of $\mt{None}$)} \\
adamc@549	1586 &&& E \; \mt{IS} \; \mt{NULL} & \textrm{nullness test} \\
adamc@549	1587 &&& n & \textrm{nullary operators} \\
adamc@549	1588 &&& u \; E & \textrm{unary operators} \\
adamc@549	1589 &&& E \; b \; E & \textrm{binary operators} \\
adamc@549	1590 &&& \mt{COUNT}(\ast) & \textrm{count number of rows} \\
adamc@549	1591 &&& a(E) & \textrm{other aggregate function} \\
adamc@549	1592 &&& (E) & \textrm{explicit precedence} \\
adamc@549	1593 \textrm{Nullary operators} & n &::=& \mt{CURRENT\_TIMESTAMP} \\
adamc@549	1594 \textrm{Unary operators} & u &::=& \mt{NOT} \\
adamc@549	1595 \textrm{Binary operators} & b &::=& \mt{AND} \mid \mt{OR} \mid \neq \mid < \mid \leq \mid > \mid \geq \\
adamc@549	1596 \textrm{Aggregate functions} & a &::=& \mt{AVG} \mid \mt{SUM} \mid \mt{MIN} \mid \mt{MAX} \\
adamc@550	1597 \textrm{Directions} & o &::=& \mt{ASC} \mid \mt{DESC} \\
adamc@549	1598 \textrm{SQL integer} & N &::=& n \mid \{e\} \\
adamc@549	1599 \end{array}$$
adamc@549	1600
adamc@549	1601 Additionally, an SQL expression may be inserted into normal Ur code with the syntax $(\mt{SQL} \; E)$ or $(\mt{WHERE} \; E)$.
adamc@549	1602
adamc@550	1603 \subsubsection{DML}
adamc@550	1604
adamc@550	1605 DML commands $D$ are added to the rules for expressions $e$.
adamc@550	1606
adamc@550	1607 $$\begin{array}{rrcll}
adamc@550	1608 \textrm{Commands} & D &::=& (\mt{INSERT} \; \mt{INTO} \; T^E \; (f,^+) \; \mt{VALUES} \; (E,^+)) \\
adamc@550	1609 &&& (\mt{UPDATE} \; T^E \; \mt{SET} \; (f = E,)^+ \; \mt{WHERE} \; E) \\
adamc@550	1610 &&& (\mt{DELETE} \; \mt{FROM} \; T^E \; \mt{WHERE} \; E) \\
adamc@550	1611 \textrm{Table expressions} & T^E &::=& x \mid \{\{e\}\}
adamc@550	1612 \end{array}$$
adamc@550	1613
adamc@550	1614 Inside $\mt{UPDATE}$ and $\mt{DELETE}$ commands, lone variables $X$ are interpreted as references to columns of the implicit table $\mt{T}$, rather than to named expressions.
adamc@549	1615
adamc@551	1616 \subsection{XML}
adamc@551	1617
adamc@551	1618 XML fragments $L$ are added to the rules for expressions $e$.
adamc@551	1619
adamc@551	1620 $$\begin{array}{rrcll}
adamc@551	1621 \textrm{XML fragments} & L &::=& \texttt{<xml/>} \mid \texttt{<xml>}l^*\texttt{</xml>} \\
adamc@551	1622 \textrm{XML pieces} & l &::=& \textrm{text} & \textrm{cdata} \\
adamc@551	1623 &&& \texttt{<}g\texttt{/>} & \textrm{tag with no children} \\
adamc@551	1624 &&& \texttt{<}g\texttt{>}l^*\texttt{</}x\texttt{>} & \textrm{tag with children} \\
adamc@559	1625 &&& \{e\} & \textrm{computed XML fragment} \\
adamc@559	1626 &&& \{[e]\} & \textrm{injection of an Ur expression, via the $\mt{Top}.\mt{txt}$ function} \\
adamc@551	1627 \textrm{Tag} & g &::=& h \; (x = v)^* \\
adamc@551	1628 \textrm{Tag head} & h &::=& x & \textrm{tag name} \\
adamc@551	1629 &&& h\{c\} & \textrm{constructor parameter} \\
adamc@551	1630 \textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\
adamc@551	1631 &&& \{e\} & \textrm{computed value} \\
adamc@551	1632 \end{array}$$
adamc@551	1633
adamc@552	1634
adamc@553	1635 \section{The Structure of Web Applications}
adamc@553	1636
adamc@553	1637 A web application is built from a series of modules, with one module, the last one appearing in the \texttt{.urp} file, designated as the main module. The signature of the main module determines the URL entry points to the application. Such an entry point should have type $\mt{unit} \to \mt{transaction} \; \mt{page}$, where $\mt{page}$ is a type synonym for top-level HTML pages, defined in $\mt{Basis}$. If such a function is at the top level of main module $M$, it will be accessible at URI \texttt{/M/f}, and so on for more deeply-nested functions, as described in Section \ref{tag} below.
adamc@553	1638
adamc@553	1639 When the standalone web server receives a request for a known page, it calls the function for that page, ``running'' the resulting transaction to produce the page to return to the client. Pages link to other pages with the \texttt{link} attribute of the \texttt{a} HTML tag. A link has type $\mt{transaction} \; \mt{page}$, and the semantics of a link are that this transaction should be run to compute the result page, when the link is followed. Link targets are assigned URL names in the same way as top-level entry points.
adamc@553	1640
adamc@553	1641 HTML forms are handled in a similar way. The $\mt{action}$ attribute of a $\mt{submit}$ form tag takes a value of type $\$\mt{use} \to \mt{transaction} \; \mt{page}$, where $\mt{use}$ is a kind-$\{\mt{Type}\}$ record of the form fields used by this action handler. Action handlers are assigned URL patterns in the same way as above.
adamc@553	1642
adamc@558	1643 For both links and actions, direct arguments and local variables mentioned implicitly via closures are automatically included in serialized form in URLs, in the order in which they appear in the source code.
adamc@553	1644
adamc@660	1645 Ur/Web programs generally mix server- and client-side code in a fairly transparent way. The one important restriction is that mixed client-server code must encapsulate all server-side pieces within named functions. This is because execution of such pieces will be implemented by explicit calls to the remote web server, and it is useful to get the programmer's help in designing the interface to be used. For example, this makes it easier to allow a client running an old version of an application to continue interacting with a server that has been upgraded to a new version, if the programmer took care to keep the interfaces of all of the old remote calls the same. The functions implementing these services are assigned names in the same way as normal web entry points, by using module structure.
adamc@660	1646
adamc@553	1647
adamc@552	1648 \section{Compiler Phases}
adamc@552	1649
adamc@552	1650 The Ur/Web compiler is unconventional in that it relies on a kind of \emph{heuristic compilation}. Not all valid programs will compile successfully. Informally, programs fail to compile when they are ``too higher order.'' Compiler phases do their best to eliminate different kinds of higher order-ness, but some programs just won't compile. This is a trade-off for producing very efficient executables. Compiled Ur/Web programs use native C representations and require no garbage collection.
adamc@552	1651
adamc@552	1652 In this section, we step through the main phases of compilation, noting what consequences each phase has for effective programming.
adamc@552	1653
adamc@552	1654 \subsection{Parse}
adamc@552	1655
adamc@552	1656 The compiler reads a \texttt{.urp} file, figures out which \texttt{.urs} and \texttt{.ur} files it references, and combines them all into what is conceptually a single sequence of declarations in the core language of Section \ref{core}.
adamc@552	1657
adamc@552	1658 \subsection{Elaborate}
adamc@552	1659
adamc@552	1660 This is where type inference takes place, translating programs into an explicit form with no more wildcards. This phase is the most likely source of compiler error messages.
adamc@552	1661
adamc@552	1662 \subsection{Unnest}
adamc@552	1663
adamc@552	1664 Named local function definitions are moved to the top level, to avoid the need to generate closures.
adamc@552	1665
adamc@552	1666 \subsection{Corify}
adamc@552	1667
adamc@552	1668 Module system features are compiled away, through inlining of functor definitions at application sites. Afterward, most abstraction boundaries are broken, facilitating optimization.
adamc@552	1669
adamc@552	1670 \subsection{Especialize}
adamc@552	1671
adamc@552	1672 Functions are specialized to particular argument patterns. This is an important trick for avoiding the need to maintain any closures at runtime.
adamc@552	1673
adamc@552	1674 \subsection{Untangle}
adamc@552	1675
adamc@552	1676 Remove unnecessary mutual recursion, splitting recursive groups into strongly-connected components.
adamc@552	1677
adamc@552	1678 \subsection{Shake}
adamc@552	1679
adamc@552	1680 Remove all definitions not needed to run the page handlers that are visible in the signature of the last module listed in the \texttt{.urp} file.
adamc@552	1681
adamc@661	1682 \subsection{Rpcify}
adamc@661	1683
adamc@661	1684 Pieces of code are determined to be client-side, server-side, neither, or both, by figuring out which standard library functions might be needed to execute them. Calls to server-side functions (e.g., $\mt{query}$) within mixed client-server code are identified and replaced with explicit remote calls. Some mixed functions may be converted to continuation-passing style to facilitate this transformation.
adamc@661	1685
adamc@661	1686 \subsection{Untangle, Shake}
adamc@661	1687
adamc@661	1688 Repeat these simplifications.
adamc@661	1689
adamc@553	1690 \subsection{\label{tag}Tag}
adamc@552	1691
adamc@552	1692 Assign a URL name to each link and form action. It is important that these links and actions are written as applications of named functions, because such names are used to generate URL patterns. A URL pattern has a name built from the full module path of the named function, followed by the function name, with all pieces separated by slashes. The path of a functor application is based on the name given to the result, rather than the path of the functor itself.
adamc@552	1693
adamc@552	1694 \subsection{Reduce}
adamc@552	1695
adamc@552	1696 Apply definitional equality rules to simplify the program as much as possible. This effectively includes inlining of every non-recursive definition.
adamc@552	1697
adamc@552	1698 \subsection{Unpoly}
adamc@552	1699
adamc@552	1700 This phase specializes polymorphic functions to the specific arguments passed to them in the program. If the program contains real polymorphic recursion, Unpoly will be insufficient to avoid later error messages about too much polymorphism.
adamc@552	1701
adamc@552	1702 \subsection{Specialize}
adamc@552	1703
adamc@558	1704 Replace uses of parameterized datatypes with versions specialized to specific parameters. As for Unpoly, this phase will not be effective enough in the presence of polymorphic recursion or other fancy uses of impredicative polymorphism.
adamc@552	1705
adamc@552	1706 \subsection{Shake}
adamc@552	1707
adamc@558	1708 Here the compiler repeats the earlier Shake phase.
adamc@552	1709
adamc@552	1710 \subsection{Monoize}
adamc@552	1711
adamc@552	1712 Programs are translated to a new intermediate language without polymorphism or non-$\mt{Type}$ constructors. Error messages may pop up here if earlier phases failed to remove such features.
adamc@552	1713
adamc@552	1714 This is the stage at which concrete names are generated for cookies, tables, and sequences. They are named following the same convention as for links and actions, based on module path information saved from earlier stages. Table and sequence names separate path elements with underscores instead of slashes, and they are prefixed by \texttt{uw\_}.
adamc@664	1715
adamc@552	1716 \subsection{MonoOpt}
adamc@552	1717
adamc@552	1718 Simple algebraic laws are applied to simplify the program, focusing especially on efficient imperative generation of HTML pages.
adamc@552	1719
adamc@552	1720 \subsection{MonoUntangle}
adamc@552	1721
adamc@552	1722 Unnecessary mutual recursion is broken up again.
adamc@552	1723
adamc@552	1724 \subsection{MonoReduce}
adamc@552	1725
adamc@552	1726 Equivalents of the definitional equality rules are applied to simplify programs, with inlining again playing a major role.
adamc@552	1727
adamc@552	1728 \subsection{MonoShake, MonoOpt}
adamc@552	1729
adamc@552	1730 Unneeded declarations are removed, and basic optimizations are repeated.
adamc@552	1731
adamc@552	1732 \subsection{Fuse}
adamc@552	1733
adamc@552	1734 The compiler tries to simplify calls to recursive functions whose results are immediately written as page output. The write action is pushed inside the function definitions to avoid allocation of intermediate results.
adamc@552	1735
adamc@552	1736 \subsection{MonoUntangle, MonoShake}
adamc@552	1737
adamc@552	1738 Fuse often creates more opportunities to remove spurious mutual recursion.
adamc@552	1739
adamc@552	1740 \subsection{Pathcheck}
adamc@552	1741
adamc@552	1742 The compiler checks that no link or action name has been used more than once.
adamc@552	1743
adamc@552	1744 \subsection{Cjrize}
adamc@552	1745
adamc@552	1746 The program is translated to what is more or less a subset of C. If any use of functions as data remains at this point, the compiler will complain.
adamc@552	1747
adamc@552	1748 \subsection{C Compilation and Linking}
adamc@552	1749
adamc@552	1750 The output of the last phase is pretty-printed as C source code and passed to GCC.
adamc@552	1751
adamc@552	1752
adamc@524	1753 \end{document}

Mercurial > urweb

annotate doc/manual.tex @ 783:ec0a0dd0ca12