changeset 897:2faf558b2d05

FFI manual section
author Adam Chlipala <adamc@hcoop.net>
date Sat, 18 Jul 2009 15:08:21 -0400
parents 0ae8894d5c97
children d1d0b18afd3d
files CHANGELOG doc/manual.tex src/cjr_print.sml
diffstat 3 files changed, 90 insertions(+), 5 deletions(-)
--- a/CHANGELOG	Sat Jul 18 13:46:22 2009 -0400
+++ b/CHANGELOG	Sat Jul 18 15:08:21 2009 -0400
@@ -1,3 +1,12 @@
+========
+20090718
+========
+
+- New application protocols: CGI and FastCGI
+- New database backends: MySQL and SQLite
+- More JavaScript events added to tags in standard library
+- New manual section on using the foreign function interface (FFI)
+
 ========
 20090623
 ========
--- a/doc/manual.tex	Sat Jul 18 13:46:22 2009 -0400
+++ b/doc/manual.tex	Sat Jul 18 15:08:21 2009 -0400
@@ -1940,6 +1940,82 @@
 Ur/Web includes a kind of automatic protection against cross site request forgery attacks.  Whenever any page execution can have side effects and can also read at least one cookie value, all cookie values must be signed cryptographically, to ensure that the user has come to the current page by submitting a form on a real page generated by the proper server.  Signing and signature checking are inserted automatically by the compiler.  This prevents attacks like phishing schemes where users are directed to counterfeit pages with forms that submit to your application, where a user's cookies might be submitted without his knowledge, causing some undesired side effect.
 
 
+\section{The Foreign Function Interface}
+
+It is possible to call your own C and JavaScript code from Ur/Web applications, via the foreign function interface (FFI).  The starting point for a new binding is a \texttt{.urs} signature file that presents your external library as a single Ur/Web module (with no nested modules).  Compilation conventions map the types and values that you use into C and/or JavaScript types and values.
+
+It is most convenient to encapsulate an FFI binding with a new \texttt{.urp} file, which applications can include with the \texttt{library} directive in their own \texttt{.urp} files.  A number of directives are likely to show up in the library's project file.
+
+\begin{itemize}
+\item \texttt{clientOnly Module.ident} registers a value as being allowed only in client-side code.
+\item \texttt{clientToServer Module.ident} declares a type as OK to marshal between clients and servers.  By default, abstract FFI types are not allowed to be marshalled, since your library might be maintaining invariants that the simple serialization code doesn't check.
+\item \texttt{effectful Module.ident} registers a function that can have side effects.  It is important to remember to use this directive for each such function, or else the optimizer might change program semantics.
+\item \texttt{ffi FILE.urs} names the file giving your library's signature.  You can include multiple such files in a single \texttt{.urp} file, and each file \texttt{mod.urs} defines an FFI module \texttt{Mod}.
+\item \texttt{header FILE} requests inclusion of a C header file.
+\item \texttt{jsFunc Module.ident=name} gives a mapping from an Ur name for a value to a JavaScript name.
+\item \texttt{link FILE} requests that \texttt{FILE} be linked into applications.  It should be a C object or library archive file, and you are responsible for generating it with your own build process.
+\item \texttt{script URL} requests inclusion of a JavaScript source file within application HTML.
+\item \texttt{serverOnly Module.ident} registers a value as being allowed only in server-side code.
+\end{itemize}
+
+\subsection{Writing C FFI Code}
+
+A server-side FFI type or value \texttt{Module.ident} must have a corresponding type or value definition \texttt{uw\_Module\_ident} in C code.  With the current Ur/Web version, it's not generally possible to work with Ur records or complex datatypes in C code, but most other kinds of types are fair game.
+
+\begin{itemize}
+  \item Primitive types defined in \texttt{Basis} are themselves using the standard FFI interface, so you may refer to them like \texttt{uw\_Basis\_t}.  See \texttt{include/types.h} for their definitions.
+  \item Enumeration datatypes, which have only constructors that take no arguments, should be defined using C \texttt{enum}s.  The type is named as for any other type identifier, and each constructor \texttt{c} gets an enumeration constant named \texttt{uw\_Module\_c}.
+  \item A datatype \texttt{dt} (such as \texttt{Basis.option}) that has one non-value-carrying constructor \texttt{NC} and one value-carrying constructor \texttt{C} gets special treatment.  Where \texttt{T} is the type of \texttt{C}'s argument, and where we represent \texttt{T} as \texttt{t} in C, we represent \texttt{NC} with \texttt{NULL}.  The representation of \texttt{C} depends on whether we're sure that we don't need to use \texttt{NULL} to represent \texttt{t} values; this condition holds only for strings and complex datatypes.  For such types, \texttt{C v} is represented with the C encoding of \texttt{v}, such that the translation of \texttt{dt} is \texttt{t}.  For other types, \texttt{C v} is represented with a pointer to the C encoding of \texttt{v}, such that the translation of \texttt{dt} is \texttt{t*}.
+\end{itemize}
+
+The C FFI version of a Ur function with type \texttt{T1 -> ... -> TN -> R} or \texttt{T1 -> ... -> TN -> transaction R} has a C prototype like \texttt{R uw\_Module\_ident(uw\_context, T1, ..., TN)}.  Only functions with types of the second form may have side effects.  \texttt{uw\_context} is the type of state that persists across handling a client request.  Many functions that operate on contexts are prototyped in \texttt{include/urweb.h}.  Most should only be used internally by the compiler.  A few are useful in general FFI implementation:
+\begin{itemize}
+  \item \begin{verbatim}
+void uw_error(uw_context, failure_kind, const char *fmt, ...);
+  \end{verbatim}
+  Abort the current request processing, giving a \texttt{printf}-style format string and arguments for generating an error message.  The \texttt{failure\_kind} argument can be \texttt{FATAL}, to abort the whole execution; \texttt{BOUNDED\_RETRY}, to try processing the request again from the beginning, but failing if this happens too many times; or \texttt{UNLIMITED\_RETRY}, to repeat processing, with no cap on how many times this can recur.
+
+  \item \begin{verbatim}
+void uw_push_cleanup(uw_context, void (*func)(void *), void *arg);
+void uw_pop_cleanup(uw_context);
+  \end{verbatim}
+  Manipulate a stack of actions that should be taken if any kind of error condition arises.  Calling the ``pop'' function both removes an action from the stack and executes it.
+
+  \item \begin{verbatim}
+void *uw_malloc(uw_context, size_t);
+  \end{verbatim}
+  A version of \texttt{malloc()} that allocates memory inside a context's heap, which is managed with region allocation.  Thus, there is no \texttt{uw\_free()}, but you need to be careful not to keep ad-hoc C pointers to this area of memory.
+
+  For performance and correctness reasons, it is usually preferable to use \texttt{uw\_malloc()} instead of \texttt{malloc()}.  The former manipulates a local heap that can be kept allocated across page requests, while the latter uses global data structures that may face contention during concurrent execution.
+
+  \item \begin{verbatim}
+typedef void (*uw_callback)(void *);
+void uw_register_transactional(uw_context, void *data, uw_callback commit,
+                               uw_callback rollback, uw_callback free);
+  \end{verbatim}
+  All side effects in Ur/Web programs need to be compatible with transactions, such that any set of actions can be undone at any time.  Thus, you should not perform actions with non-local side effects directly; instead, register handlers to be called when the current transaction is committed or rolled back.  The arguments here give an arbitrary piece of data to be passed to callbacks, a function to call on commit, a function to call on rollback, and a function to call afterward in either case to clean up any allocated resources.  A rollback handler may be called after the associated commit handler has already been called, if some later part of the commit process fails.
+
+  To accommodate some stubbornly non-transactional real-world actions like sending an e-mail message, Ur/Web allows the \texttt{rollback} parameter to be \texttt{NULL}.  When a transaction commits, all \texttt{commit} actions that have non-\texttt{NULL} rollback actions are tried before any \texttt{commit} actions that have \texttt{NULL} rollback actions.  Thus, if a single execution uses only one non-transactional action, and if that action never fails partway through its execution while still causing an observable side effect, then Ur/Web can maintain the transactional abstraction.
+\end{itemize}
+
+
+\subsection{Writing JavaScript FFI Code}
+
+JavaScript is dynamically typed, so Ur/Web type definitions imply no JavaScript code.  The JavaScript identifier for each FFI function is set with the \texttt{jsFunc} directive.  Each identifier can be defined in any JavaScript file that you ask to include with the \texttt{script} directive.
+
+In contrast to C FFI code, JavaScript FFI functions take no extra context argument.  Their argument lists are as you would expect from their Ur types.  Only functions whose ranges take the form \texttt{transaction T} should have side effects; the JavaScript ``return type'' of such a function is \texttt{T}.  Here are the conventions for representing Ur values in JavaScript.
+
+\begin{itemize}
+\item Integers, floats, strings, characters, and booleans are represented in the usual JavaScript way.
+\item Ur functions are represented with JavaScript functions, currying and all.  Only named FFI functions are represented with multiple JavaScript arguments.
+\item An Ur record is represented with a JavaScript record, where Ur field name \texttt{N} translates to JavaScript field name \texttt{\_N}.  An exception to this rule is that the empty record is encoded as \texttt{null}.
+\item \texttt{option}-like types receive special handling similar to their handling in C.  The ``\texttt{None}'' constructor is \texttt{null}, and a use of the ``\texttt{Some}'' constructor on a value \texttt{v} is either \texttt{v}, if the underlying type doesn't need to use \texttt{null}; or \texttt{\{v:v\}} otherwise.
+\item Any other datatypes represent a non-value-carrying constructor \texttt{C} as \texttt{"\_C"} and an application of a constructor \texttt{C} to value \texttt{v} as \texttt{\{n:"\_C", v:v\}}.  This rule only applies to datatypes defined in FFI module signatures; the compiler is free to optimize the representations of other, non-\texttt{option}-like datatypes in arbitrary ways.
+\end{itemize}
+
+It is possible to write JavaScript FFI code that interacts with the functional-reactive structure of a document, but this version of the manual doesn't cover the details.
+
+
 \section{Compiler Phases}
 
 The Ur/Web compiler is unconventional in that it relies on a kind of \emph{heuristic compilation}.  Not all valid programs will compile successfully.  Informally, programs fail to compile when they are ``too higher order.''  Compiler phases do their best to eliminate different kinds of higher order-ness, but some programs just won't compile.  This is a trade-off for producing very efficient executables.  Compiled Ur/Web programs use native C representations and require no garbage collection.
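
Note on the C conventions documented in the manual hunk above: the naming and prototype rules can be illustrated with a minimal sketch. It assumes a hypothetical FFI module `Lib` whose `lib.urs` declares `val shout : string -> string` and `val add : int -> int -> int`; neither is part of this changeset. The `uw_context` and `uw_Basis_*` typedefs and the body of `uw_malloc` are stand-ins so the sketch compiles by itself. A real binding would take them from `include/urweb.h` and `include/types.h` instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins (assumptions) for declarations that a real binding would
   get from include/urweb.h and include/types.h. */
typedef struct uw_context_stub *uw_context;
typedef char *uw_Basis_string;
typedef long long uw_Basis_int;

/* Stub only: the real uw_malloc allocates in the context's region heap,
   so there is nothing to free.  This stand-in simply calls malloc. */
static void *uw_malloc(uw_context ctx, size_t n) {
  (void)ctx;
  return malloc(n);
}

/* Hypothetical Lib.shout : string -> string.
   C name follows the uw_Module_ident rule; the context comes first. */
uw_Basis_string uw_Lib_shout(uw_context ctx, uw_Basis_string s) {
  assert(s != NULL);
  size_t len = strlen(s);
  char *out = uw_malloc(ctx, len + 2); /* room for '!' and the terminator */
  memcpy(out, s, len);
  out[len] = '!';
  out[len + 1] = '\0';
  return out;
}

/* Hypothetical Lib.add : int -> int -> int.
   A curried Ur function becomes one C function with N value arguments. */
uw_Basis_int uw_Lib_add(uw_context ctx, uw_Basis_int a, uw_Basis_int b) {
  (void)ctx;
  return a + b;
}
```

Both functions are pure, so under the rules in the new section neither would need an `effectful` directive in the library's `.urp` file.
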
--- a/src/cjr_print.sml	Sat Jul 18 13:46:22 2009 -0400
+++ b/src/cjr_print.sml	Sat Jul 18 15:08:21 2009 -0400
@@ -85,11 +85,11 @@
         (case ListUtil.search #3 (!xncs) of
              NONE => raise Fail "CjrPrint: TDatatype marked Option has no constructor with an argument"
            | SOME t =>
-             case #1 t of
-                 TDatatype _ => p_typ' par env t
-               | TFfi ("Basis", "string") => p_typ' par env t
-               | _ => box [p_typ' par env t,
-                           string "*"])
+             if isUnboxable t then
+                 p_typ' par env t
+             else
+                 box [p_typ' par env t,
+                      string "*"])
       | TDatatype (Default, n, _) =>
         (box [string "struct",
               space,
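
Note on the `cjr_print.sml` hunk above: the `isUnboxable` test appears to select between the two `option`-like representations that the new manual section describes (`t` versus `t*`). A minimal sketch of how C FFI code might consume each form follows. The module `Lib`, the functions `getOr` and `strOr`, and the typedefs are assumptions made for illustration, not part of this changeset.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins (assumptions) for declarations from include/types.h and
   include/urweb.h. */
typedef struct uw_context_stub *uw_context;
typedef long long uw_Basis_int;
typedef char *uw_Basis_string;

/* Hypothetical Lib.getOr : option int -> int -> int.
   An int can't spare NULL, so option int is translated to a pointer:
   None is NULL, and Some v is a pointer to v. */
uw_Basis_int uw_Lib_getOr(uw_context ctx, uw_Basis_int *opt, uw_Basis_int dflt) {
  (void)ctx;
  return opt != NULL ? *opt : dflt;
}

/* Hypothetical Lib.strOr : option string -> string -> string.
   Strings never need NULL for themselves, so option string stays a plain
   uw_Basis_string: None is NULL, and Some s is just s. */
uw_Basis_string uw_Lib_strOr(uw_context ctx, uw_Basis_string opt, uw_Basis_string dflt) {
  (void)ctx;
  return opt != NULL ? opt : dflt;
}

/* Usage demonstration covering both representations. */
void option_sketch_selfcheck(void) {
  uw_Basis_int seven = 7;
  assert(uw_Lib_getOr(NULL, &seven, 0) == 7);  /* Some 7 */
  assert(uw_Lib_getOr(NULL, NULL, 3) == 3);    /* None */
  assert(uw_Lib_strOr(NULL, "x", "fallback")[0] == 'x');  /* Some "x" */
  assert(uw_Lib_strOr(NULL, NULL, "fallback")[0] == 'f'); /* None */
}
```
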