Mercurial > urweb
comparison doc/manual.tex @ 552:215d719836dc
Compiler phases
author | Adam Chlipala <adamc@hcoop.net> |
---|---|
date | Sun, 07 Dec 2008 14:50:03 -0500 |
parents | 8fb99fec35f6 |
children | effd620d90da |
comparison
equal
deleted
inserted
replaced
551:8fb99fec35f6 | 552:215d719836dc |
---|---|
45 | 45 |
46 We write $\ell$ for literals of the primitive types, for the most part following C conventions. There are $\mt{int}$, $\mt{float}$, and $\mt{string}$ literals. | 46 We write $\ell$ for literals of the primitive types, for the most part following C conventions. There are $\mt{int}$, $\mt{float}$, and $\mt{string}$ literals. |
47 | 47 |
48 This version of the manual doesn't include operator precedences; see \texttt{src/urweb.grm} for that. | 48 This version of the manual doesn't include operator precedences; see \texttt{src/urweb.grm} for that. |
49 | 49 |
50 \subsection{Core Syntax} | 50 \subsection{\label{core}Core Syntax} |
51 | 51 |
52 \emph{Kinds} classify types and other compile-time-only entities. Each kind in the grammar is listed with a description of the sort of data it classifies. | 52 \emph{Kinds} classify types and other compile-time-only entities. Each kind in the grammar is listed with a description of the sort of data it classifies. |
53 $$\begin{array}{rrcll} | 53 $$\begin{array}{rrcll} |
54 \textrm{Kinds} & \kappa &::=& \mt{Type} & \textrm{proper types} \\ | 54 \textrm{Kinds} & \kappa &::=& \mt{Type} & \textrm{proper types} \\ |
55 &&& \mt{Unit} & \textrm{the trivial constructor} \\ | 55 &&& \mt{Unit} & \textrm{the trivial constructor} \\ |
1327 &&& h\{c\} & \textrm{constructor parameter} \\ | 1327 &&& h\{c\} & \textrm{constructor parameter} \\ |
1328 \textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\ | 1328 \textrm{Attribute value} & v &::=& \ell & \textrm{literal value} \\ |
1329 &&& \{e\} & \textrm{computed value} \\ | 1329 &&& \{e\} & \textrm{computed value} \\ |
1330 \end{array}$$ | 1330 \end{array}$$ |
1331 | 1331 |
1332 | |
1333 \section{Compiler Phases} | |
1334 | |
1335 The Ur/Web compiler is unconventional in that it relies on a kind of \emph{heuristic compilation}. Not all valid programs will compile successfully. Informally, programs fail to compile when they are ``too higher order.'' Compiler phases do their best to eliminate different kinds of higher order-ness, but some programs just won't compile. This is a trade-off for producing very efficient executables. Compiled Ur/Web programs use native C representations and require no garbage collection. | |
1336 | |
1337 In this section, we step through the main phases of compilation, noting what consequences each phase has for effective programming. | |
1338 | |
1339 \subsection{Parse} | |
1340 | |
1341 The compiler reads a \texttt{.urp} file, figures out which \texttt{.urs} and \texttt{.ur} files it references, and combines them all into what is conceptually a single sequence of declarations in the core language of Section \ref{core}. | |
1342 | |
1343 \subsection{Elaborate} | |
1344 | |
1345 This is where type inference takes place, translating programs into an explicit form with no more wildcards. This phase is the most likely source of compiler error messages. | |
1346 | |
1347 \subsection{Unnest} | |
1348 | |
1349 Named local function definitions are moved to the top level, to avoid the need to generate closures. | |
1350 | |
1351 \subsection{Corify} | |
1352 | |
1353 Module system features are compiled away, through inlining of functor definitions at application sites. Afterward, most abstraction boundaries are broken, facilitating optimization. | |
1354 | |
1355 \subsection{Especialize} | |
1356 | |
1357 Functions are specialized to particular argument patterns. This is an important trick for avoiding the need to maintain any closures at runtime. | |
1358 | |
1359 \subsection{Untangle} | |
1360 | |
1361 Remove unnecessary mutual recursion, splitting recursive groups into strongly-connected components. | |
1362 | |
1363 \subsection{Shake} | |
1364 | |
1365 Remove all definitions not needed to run the page handlers that are visible in the signature of the last module listed in the \texttt{.urp} file. | |
1366 | |
1367 \subsection{Tag} | |
1368 | |
1369 Assign a URL name to each link and form action. It is important that these links and actions are written as applications of named functions, because such names are used to generate URL patterns. A URL pattern has a name built from the full module path of the named function, followed by the function name, with all pieces separated by slashes. The path of a functor application is based on the name given to the result, rather than the path of the functor itself. | |
1370 | |
1371 \subsection{Reduce} | |
1372 | |
1373 Apply definitional equality rules to simplify the program as much as possible. This effectively includes inlining of every non-recursive definition. | |
1374 | |
1375 \subsection{Unpoly} | |
1376 | |
1377 This phase specializes polymorphic functions to the specific arguments passed to them in the program. If the program contains real polymorphic recursion, Unpoly will be insufficient to avoid later error messages about too much polymorphism. | |
1378 | |
1379 \subsection{Specialize} | |
1380 | |
1381 Replace uses of parametrized datatypes with versions specialized to specific parameters. As for Unpoly, this phase will not be effective enough in the presence of polymorphic recursion or other fancy uses of impredicative polymorphism. | |
1382 | |
1383 \subsection{Shake} | |
1384 | |
1385 Here the compiler repeats the earlier shake phase. | |
1386 | |
1387 \subsection{Monoize} | |
1388 | |
1389 Programs are translated to a new intermediate language without polymorphism or non-$\mt{Type}$ constructors. Error messages may pop up here if earlier phases failed to remove such features. | |
1390 | |
1391 This is the stage at which concrete names are generated for cookies, tables, and sequences. They are named following the same convention as for links and actions, based on module path information saved from earlier stages. Table and sequence names separate path elements with underscores instead of slashes, and they are prefixed by \texttt{uw\_}. | |
1392 \subsection{MonoOpt} | |
1393 | |
1394 Simple algebraic laws are applied to simplify the program, focusing especially on efficient imperative generation of HTML pages. | |
1395 | |
1396 \subsection{MonoUntangle} | |
1397 | |
1398 Unnecessary mutual recursion is broken up again. | |
1399 | |
1400 \subsection{MonoReduce} | |
1401 | |
1402 Equivalents of the definitional equality rules are applied to simplify programs, with inlining again playing a major role. | |
1403 | |
1404 \subsection{MonoShake, MonoOpt} | |
1405 | |
1406 Unneeded declarations are removed, and basic optimizations are repeated. | |
1407 | |
1408 \subsection{Fuse} | |
1409 | |
1410 The compiler tries to simplify calls to recursive functions whose results are immediately written as page output. The write action is pushed inside the function definitions to avoid allocation of intermediate results. | |
1411 | |
1412 \subsection{MonoUntangle, MonoShake} | |
1413 | |
1414 Fuse often creates more opportunities to remove spurious mutual recursion. | |
1415 | |
1416 \subsection{Pathcheck} | |
1417 | |
1418 The compiler checks that no link or action name has been used more than once. | |
1419 | |
1420 \subsection{Cjrize} | |
1421 | |
1422 The program is translated to what is more or less a subset of C. If any use of functions as data remains at this point, the compiler will complain. | |
1423 | |
1424 \subsection{C Compilation and Linking} | |
1425 | |
1426 The output of the last phase is pretty-printed as C source code and passed to GCC. | |
1427 | |
1428 | |
1332 \end{document} | 1429 \end{document} |