comparison doc/manual.tex @ 552:215d719836dc

Compiler phases
author Adam Chlipala <adamc@hcoop.net>
date Sun, 07 Dec 2008 14:50:03 -0500
parents 8fb99fec35f6
children effd620d90da
comparison
equal deleted inserted replaced
551:8fb99fec35f6 552:215d719836dc
45 45
46 We write $\ell$ for literals of the primitive types, for the most part following C conventions. There are $\mt{int}$, $\mt{float}$, and $\mt{string}$ literals. 46 We write $\ell$ for literals of the primitive types, for the most part following C conventions. There are $\mt{int}$, $\mt{float}$, and $\mt{string}$ literals.
47 47
48 This version of the manual doesn't include operator precedences; see \texttt{src/urweb.grm} for that. 48 This version of the manual doesn't include operator precedences; see \texttt{src/urweb.grm} for that.
49 49
50 \subsection{Core Syntax} 50 \subsection{\label{core}Core Syntax}
51 51
52 \emph{Kinds} classify types and other compile-time-only entities. Each kind in the grammar is listed with a description of the sort of data it classifies. 52 \emph{Kinds} classify types and other compile-time-only entities. Each kind in the grammar is listed with a description of the sort of data it classifies.
53 $$\begin{array}{rrcll} 53 $$\begin{array}{rrcll}
54 \textrm{Kinds} & \kappa &::=& \mt{Type} & \textrm{proper types} \\ 54 \textrm{Kinds} & \kappa &::=& \mt{Type} & \textrm{proper types} \\
55 &&& \mt{Unit} & \textrm{the trivial constructor} \\ 55 &&& \mt{Unit} & \textrm{the trivial constructor} \\
1327 &&& h\{c\} & \textrm{constructor parameter} \\ 1327 &&& h\{c\} & \textrm{constructor parameter} \\
1328 \textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\ 1328 \textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\
1329 &&& \{e\} & \textrm{computed value} \\ 1329 &&& \{e\} & \textrm{computed value} \\
1330 \end{array}$$ 1330 \end{array}$$
1331 1331
1332
1333 \section{Compiler Phases}
1334
1335 The Ur/Web compiler is unconventional in that it relies on a kind of \emph{heuristic compilation}. Not all valid programs will compile successfully. Informally, programs fail to compile when they are ``too higher order.'' Compiler phases do their best to eliminate different kinds of higher order-ness, but some programs just won't compile. This is a trade-off for producing very efficient executables. Compiled Ur/Web programs use native C representations and require no garbage collection.
1336
1337 In this section, we step through the main phases of compilation, noting what consequences each phase has for effective programming.
1338
1339 \subsection{Parse}
1340
1341 The compiler reads a \texttt{.urp} file, figures out which \texttt{.urs} and \texttt{.ur} files it references, and combines them all into what is conceptually a single sequence of declarations in the core language of Section \ref{core}.
1342
1343 \subsection{Elaborate}
1344
1345 This is where type inference takes place, translating programs into an explicit form with no more wildcards. This phase is the most likely source of compiler error messages.
1346
1347 \subsection{Unnest}
1348
1349 Named local function definitions are moved to the top level, to avoid the need to generate closures.
1350
1351 \subsection{Corify}
1352
1353 Module system features are compiled away, through inlining of functor definitions at application sites. Afterward, most abstraction boundaries are broken, facilitating optimization.
1354
1355 \subsection{Especialize}
1356
1357 Functions are specialized to particular argument patterns. This is an important trick for avoiding the need to maintain any closures at runtime.
1358
1359 \subsection{Untangle}
1360
1361 Remove unnecessary mutual recursion, splitting recursive groups into strongly-connected components.
1362
1363 \subsection{Shake}
1364
1365 Remove all definitions not needed to run the page handlers that are visible in the signature of the last module listed in the \texttt{.urp} file.
1366
1367 \subsection{Tag}
1368
1369 Assign a URL name to each link and form action. It is important that these links and actions are written as applications of named functions, because such names are used to generate URL patterns. A URL pattern has a name built from the full module path of the named function, followed by the function name, with all pieces separated by slashes. The path of a functor application is based on the name given to the result, rather than the path of the functor itself.
1370
1371 \subsection{Reduce}
1372
1373 Apply definitional equality rules to simplify the program as much as possible. This effectively includes inlining of every non-recursive definition.
1374
1375 \subsection{Unpoly}
1376
1377 This phase specializes polymorphic functions to the specific arguments passed to them in the program. If the program contains real polymorphic recursion, Unpoly will be insufficient to avoid later error messages about too much polymorphism.
1378
1379 \subsection{Specialize}
1380
1381 Replace uses of parametrized datatypes with versions specialized to specific parameters. As for Unpoly, this phase will not be effective enough in the presence of polymorphic recursion or other fancy uses of impredicative polymorphism.
1382
1383 \subsection{Shake}
1384
1385 Here the compiler repeats the earlier shake phase.
1386
1387 \subsection{Monoize}
1388
1389 Programs are translated to a new intermediate language without polymorphism or non-$\mt{Type}$ constructors. Error messages may pop up here if earlier phases failed to remove such features.
1390
1391 This is the stage at which concrete names are generated for cookies, tables, and sequences. They are named following the same convention as for links and actions, based on module path information saved from earlier stages. Table and sequence names separate path elements with underscores instead of slashes, and they are prefixed by \texttt{uw\_}.
1392 \subsection{MonoOpt}
1393
1394 Simple algebraic laws are applied to simplify the program, focusing especially on efficient imperative generation of HTML pages.
1395
1396 \subsection{MonoUntangle}
1397
1398 Unnecessary mutual recursion is broken up again.
1399
1400 \subsection{MonoReduce}
1401
1402 Equivalents of the definitional equality rules are applied to simplify programs, with inlining again playing a major role.
1403
1404 \subsection{MonoShake, MonoOpt}
1405
1406 Unneeded declarations are removed, and basic optimizations are repeated.
1407
1408 \subsection{Fuse}
1409
1410 The compiler tries to simplify calls to recursive functions whose results are immediately written as page output. The write action is pushed inside the function definitions to avoid allocation of intermediate results.
1411
1412 \subsection{MonoUntangle, MonoShake}
1413
1414 Fuse often creates more opportunities to remove spurious mutual recursion.
1415
1416 \subsection{Pathcheck}
1417
1418 The compiler checks that no link or action name has been used more than once.
1419
1420 \subsection{Cjrize}
1421
1422 The program is translated to what is more or less a subset of C. If any use of functions as data remains at this point, the compiler will complain.
1423
1424 \subsection{C Compilation and Linking}
1425
1426 The output of the last phase is pretty-printed as C source code and passed to GCC.
1427
1428
1332 \end{document} 1429 \end{document}