adamc@524: \documentclass{article} adamc@554: \usepackage{fullpage,amsmath,amssymb,proof,url} rmbruijn@1568: \usepackage[T1]{fontenc} vshabanoff@1765: \usepackage{ae,aecompl} adamc@524: \newcommand{\cd}[1]{\texttt{#1}} adamc@524: \newcommand{\mt}[1]{\mathsf{#1}} adamc@524: adamc@524: \newcommand{\rc}{+ \hspace{-.075in} + \;} adam@2107: \newcommand{\rcut}{\; \texttt{-{}-} \;} adam@2107: \newcommand{\rcutM}{\; \texttt{-{}-{}-} \;} adamc@524: julian@2134: \usepackage{hyperref} julian@2134: adamc@524: \begin{document} adamc@524: adamc@524: \title{The Ur/Web Manual} adamc@524: \author{Adam Chlipala} adamc@524: adamc@524: \maketitle adamc@524: adamc@540: \tableofcontents adamc@540: adamc@554: adamc@554: \section{Introduction} adamc@554: adam@1797: \emph{Ur} is a programming language designed to introduce richer type system features into functional programming in the tradition of ML and Haskell. Ur is functional, pure, statically typed, and strict. Ur supports a powerful kind of \emph{metaprogramming} based on \emph{type-level computation with type-level records}. adamc@554: adamc@554: \emph{Ur/Web} is Ur plus a special standard library and associated rules for parsing and optimization. Ur/Web supports construction of dynamic web applications backed by SQL databases. The signature of the standard library is such that well-typed Ur/Web programs ``don't go wrong'' in a very broad sense. Not only do they not crash during particular page generations, but they also may not: adamc@554: adamc@554: \begin{itemize} adamc@554: \item Suffer from any kinds of code-injection attacks adamc@554: \item Return invalid HTML adamc@554: \item Contain dead intra-application links adamc@554: \item Have mismatches between HTML forms and the fields expected by their handlers adamc@652: \item Include client-side code that makes incorrect assumptions about the ``AJAX''-style services that the remote web server provides adamc@554: \item Attempt invalid SQL queries adamc@652: \item Use improper marshaling or unmarshaling in communication with SQL databases or between browsers and web servers adamc@554: \end{itemize} adamc@554: adamc@554: This type safety is just the foundation of the Ur/Web methodology. It is also possible to use metaprogramming to build significant application pieces by analysis of type structure. For instance, the demo includes an ML-style functor for building an admin interface for an arbitrary SQL table. The type system guarantees that the admin interface sub-application that comes out will always be free of the above-listed bugs, no matter which well-typed table description is given as input. adamc@554: adamc@652: The Ur/Web compiler also produces very efficient object code that does not use garbage collection. These compiled programs will often be even more efficient than what most programmers would bother to write in C. The compiler also generates JavaScript versions of client-side code, with no need to write those parts of applications in a different language. adamc@554: adamc@554: \medskip adamc@554: adamc@554: The official web site for Ur is: adamc@554: \begin{center} adamc@554: \url{http://www.impredicative.com/ur/} adamc@554: \end{center} adamc@554: adamc@555: adamc@555: \section{Installation} adamc@555: adamc@555: If you are lucky, then the following standard command sequence will suffice for installation, in a directory to which you have unpacked the latest distribution tarball. adamc@555: adamc@555: \begin{verbatim} adamc@555: ./configure adamc@555: make adamc@555: sudo make install adamc@555: \end{verbatim} adamc@555: adam@1523: Some other packages must be installed for the above to work. At a minimum, you need a standard UNIX shell, with standard UNIX tools like sed and GCC (or an alternate C compiler) in your execution path; MLton, the whole-program optimizing compiler for Standard ML; and the development files for the OpenSSL C library. As of this writing, in the ``testing'' version of Debian Linux, this command will install the more uncommon of these dependencies: adamc@896: \begin{verbatim} adam@1368: apt-get install mlton libssl-dev adamc@896: \end{verbatim} adamc@555: adam@2016: Note that, like the Ur/Web compiler, MLton is a whole-program optimizing compiler, so it frequently requires much more memory than old-fashioned compilers do. Expect building Ur/Web with MLton to require not much less than a gigabyte of RAM. If a \texttt{mlton} invocation ends suspiciously, the most likely explanation is that it has exhausted available memory. adam@2016: adamc@896: To build programs that access SQL databases, you also need one of these client libraries for supported backends. adamc@555: \begin{verbatim} adam@1960: apt-get install libpq-dev libmysqlclient-dev libsqlite3-dev adamc@555: \end{verbatim} adamc@555: adamc@555: It is also possible to access the modules of the Ur/Web compiler interactively, within Standard ML of New Jersey. To install the prerequisites in Debian testing: adamc@555: \begin{verbatim} adamc@555: apt-get install smlnj libsmlnj-smlnj ml-yacc ml-lpt adamc@555: \end{verbatim} adamc@555: adam@2016: To begin an interactive session with the Ur compiler modules, run \texttt{make smlnj}, and then, from within an \texttt{sml} session, run \texttt{CM.make "src/urweb.cm";}. The \texttt{Compiler} module is the main entry point, and you can find its signature in \texttt{src/compiler.sig}. adamc@555: adamc@896: To run an SQL-backed application with a backend besides SQLite, you will probably want to install one of these servers. adamc@555: adamc@555: \begin{verbatim} adam@1960: apt-get install postgresql mysql-server adamc@555: \end{verbatim} adamc@555: adamc@555: To use the Emacs mode, you must have a modern Emacs installed. We assume that you already know how to do this, if you're in the business of looking for an Emacs mode. The demo generation facility of the compiler will also call out to Emacs to syntax-highlight code, and that process depends on the \texttt{htmlize} module, which can be installed in Debian testing via: adamc@555: adamc@555: \begin{verbatim} adamc@555: apt-get install emacs-goodies-el adamc@555: \end{verbatim} adamc@555: adam@1441: If you don't want to install the Emacs mode, run \texttt{./configure} with the argument \texttt{--without-emacs}. adam@1441: adam@1523: Even with the right packages installed, configuration and building might fail to work. After you run \texttt{./configure}, you will see the values of some named environment variables printed. You may need to adjust these values to get proper installation for your system. To change a value, store your preferred alternative in the corresponding UNIX environment variable, before running \texttt{./configure}. For instance, here is how to change the list of extra arguments that the Ur/Web compiler will pass to the C compiler and linker on every invocation. Some older GCC versions need this setting to mask a bug in function inlining. adamc@555: adamc@555: \begin{verbatim} adam@1523: CCARGS=-fno-inline ./configure adamc@555: \end{verbatim} adamc@555: adam@1523: Since the author is still getting a handle on the GNU Autotools that provide the build system, you may need to do some further work to get started, especially in environments with significant differences from Linux (where most testing is done). The variables \texttt{PGHEADER}, \texttt{MSHEADER}, and \texttt{SQHEADER} may be used to set the proper C header files to include for the development libraries of PostgreSQL, MySQL, and SQLite, respectively. To get libpq to link, one OS X user reported setting \texttt{CCARGS="-I/opt/local/include -L/opt/local/lib/postgresql84"}, after creating a symbolic link with \texttt{ln -s /opt/local/include/postgresql84 /opt/local/include/postgresql}. adamc@555: adamc@555: The Emacs mode can be set to autoload by adding the following to your \texttt{.emacs} file. adamc@555: adamc@555: \begin{verbatim} adamc@555: (add-to-list 'load-path "/usr/local/share/emacs/site-lisp/urweb-mode") adamc@555: (load "urweb-mode-startup") adamc@555: \end{verbatim} adamc@555: adamc@555: Change the path in the first line if you chose a different Emacs installation path during configuration. adamc@555: adamc@555: adamc@556: \section{Command-Line Compiler} adamc@556: adam@1604: \subsection{\label{cl}Project Files} adamc@556: adamc@556: The basic inputs to the \texttt{urweb} compiler are project files, which have the extension \texttt{.urp}. Here is a sample \texttt{.urp} file. adamc@556: adamc@556: \begin{verbatim} adamc@556: database dbname=test adamc@556: sql crud1.sql adamc@556: adamc@556: crud adamc@556: crud1 adamc@556: \end{verbatim} adamc@556: adamc@556: The \texttt{database} line gives the database information string to pass to libpq. In this case, the string only says to connect to a local database named \texttt{test}. adamc@556: adamc@556: The \texttt{sql} line asks for an SQL source file to be generated, giving the commands to run to create the tables and sequences that this application expects to find. After building this \texttt{.urp} file, the following commands could be used to initialize the database, assuming that the current UNIX user exists as a Postgres user with database creation privileges: adamc@556: adamc@556: \begin{verbatim} adamc@556: createdb test adamc@556: psql -f crud1.sql test adamc@556: \end{verbatim} adamc@556: adam@1331: A blank line separates the named directives from a list of modules to include in the project. Any line may contain a shell-script-style comment, where any suffix of a line starting at a hash character \texttt{\#} is ignored. adamc@556: adamc@556: For each entry \texttt{M} in the module list, the file \texttt{M.urs} is included in the project if it exists, and the file \texttt{M.ur} must exist and is always included. adamc@556: adamc@783: Here is the complete list of directive forms. ``FFI'' stands for ``foreign function interface,'' Ur's facility for interaction between Ur programs and C and JavaScript libraries. adamc@783: \begin{itemize} adam@1799: \item \texttt{[allow|deny] [url|mime|requestHeader|responseHeader|env] PATTERN} registers a rule governing which URLs, MIME types, HTTP request headers, HTTP response headers, or environment variable names are allowed to appear explicitly in this application. The first such rule to match a name determines the verdict. If \texttt{PATTERN} ends in \texttt{*}, it is interpreted as a prefix rule. Otherwise, a string must match it exactly. adam@1400: \item \texttt{alwaysInline PATH} requests that every call to the referenced function be inlined. Section \ref{structure} explains how functions are assigned path strings. adam@1462: \item \texttt{benignEffectful Module.ident} registers an FFI function or transaction as having side effects. The optimizer avoids removing, moving, or duplicating calls to such functions. Every effectful FFI function must be registered, or the optimizer may make invalid transformations. This version of the \texttt{effectful} directive registers that this function only has side effects that remain local to a single page generation. adamc@783: \item \texttt{clientOnly Module.ident} registers an FFI function or transaction that may only be run in client browsers. adam@1881: \item \texttt{clientToServer Module.ident} adds FFI type \texttt{Module.ident} to the list of types that are OK to marshal from clients to servers. Values like XML trees and SQL queries are hard to marshal without introducing expensive validity checks, so it's easier to ensure that the server never trusts clients to send such values. The file \texttt{include/urweb/urweb\_cpp.h} shows examples of the C support functions that are required of any type that may be marshalled. These include \texttt{attrify}, \texttt{urlify}, and \texttt{unurlify} functions. adam@1816: \item \texttt{coreInline TREESIZE} sets how many nodes the AST of a function definition may have before the optimizer stops trying hard to inline calls to that function. (This is one of two options for one of two intermediate languages within the compiler.) adamc@783: \item \texttt{database DBSTRING} sets the string to pass to libpq to open a database connection. adamc@783: \item \texttt{debug} saves some intermediate C files, which is mostly useful to help in debugging the compiler itself. adam@1878: \item \texttt{effectful Module.ident} registers an FFI function or transaction as having side effects. The optimizer avoids removing, moving, or duplicating calls to such functions. This is the default behavior for \texttt{transaction}-based types. adam@2046: \item \texttt{exe FILENAME} sets the filename to which to write the output executable. The default for file \texttt{P.urp} is \texttt{P.exe}. adam@2046: \item \texttt{file URI FILENAME} asks for the application executable to respond to requests for \texttt{URI} by serving a snapshot of the contents of \texttt{FILENAME} as of compile time. That is, the file contents are baked into the executable. System file \texttt{/etc/mime.types} is consulted (again, at compile time) to figure out the right MIME type to suggest in the HTTP response. adam@1881: \item \texttt{ffi FILENAME} reads the file \texttt{FILENAME.urs} to determine the interface to a new FFI module. The name of the module is calculated from \texttt{FILENAME} in the same way as for normal source files. See the files \texttt{include/urweb/urweb\_cpp.h} and \texttt{src/c/urweb.c} for examples of C headers and implementations for FFI modules. In general, every type or value \texttt{Module.ident} becomes \texttt{uw\_Module\_ident} in C. adam@1956: \item \texttt{html5} activates work-in-progress support for generating HTML5 instead of XHTML. For now, this option only affects the first few tokens on any page, which are always the same. adamc@1099: \item \texttt{include FILENAME} adds \texttt{FILENAME} to the list of files to be \texttt{\#include}d in C sources. This is most useful for interfacing with new FFI modules. adam@2198: \item \texttt{jsFile FILENAME} asks to serve the contents of a file as JavaScript. All such content is concatenated into a single file, included via a \texttt{}. adam@1926: adam@1556: \subsubsection{Node IDs} adam@1556: adam@1556: There is an abstract type of node IDs that may be assigned to \cd{id} attributes of most HTML tags. adam@1556: adam@1556: $$\begin{array}{l} adam@1556: \mt{type} \; \mt{id} \\ adam@1556: \mt{val} \; \mt{fresh} : \mt{transaction} \; \mt{id} adam@1556: \end{array}$$ adam@1556: adam@1785: The \cd{fresh} function is allowed on both server and client, but there is no other way to create IDs, which includes lack of a way to force an ID to match a particular string. The main semantic importance of IDs within Ur/Web is in uses of the HTML \cd{