Added references ; tuned README.

Pascal J. Bourguignon [2013-10-20 15:15]
Added references ; tuned README.
diff --git a/README b/README
index a69157c..09c1d94 100644
--- a/README
+++ b/README
@@ -1,6 +1,12 @@
+-*- mode:rst; coding:utf-8 -*-
 LEP - Lisp Empowered Program

+- _``
+- _``
 A Lisp Machine System

@@ -25,7 +31,7 @@ A Lisp System

 When the GNU project started, it was expected to be written (at least
 partially) in lisp.  Notably: "Both C and Lisp will be available as
-system programming languages." [Message-ID: <771@mit-eddie.UUCP>].
+system programming languages." ``[Message-ID: <771@mit-eddie.UUCP>]``.
 Nowadays, the only significant part of the GNU system that's using
 lisp is GNU emacs.

@@ -52,16 +58,20 @@ Let's just do it
 This directory will be filled by lisp system code.

 1- Boot a CL implementation as /bin/init on a unix kernel.  cf.
+   _``

    When Movitz will be advanced enough, we would move to Movitz.
+   See also Hurd or L4.
+   _``

 2- implement services, shells, editors, tools, etc, 100% in lisp.

 3- temporarily, some tools may be implemented in C, eg. X11 server,
-   gcc or clang to compile the linux kernel.  But since the system
+   gcc or clang to compile the linux kernel.
+.. comment:
+   But since the system
    wouldn't contain the usual C libraries set (some exception as
    needed by lisp implementations or lisp programs thru FFI, etc).

@@ -85,14 +95,14 @@ Building blocks:
   may be built upon the code of quicklisp and asdf (quicklisp
   distributions, tar.gz, asdf dependencies, etc).

-  cf.
+  cf. _``

 - user interface:

     - editor (Hemlock, or climacs).

-    - X11 or frame buffer?
+    - X11 or frame buffer? --> CLXS?

     - CLIM?

@@ -112,7 +122,7 @@ Building blocks:
     - mail server

     - dns server
+      _``

     - ddns client/server
       A program to update a dynamic dns with changing IP addresses.
diff --git a/references/ b/references/
new file mode 100644
index 0000000..5815ff3
Binary files /dev/null and b/references/ differ
diff --git a/references/ b/references/
new file mode 100644
index 0000000..af29757
--- /dev/null
+++ b/references/
@@ -0,0 +1,909 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<!--Converted with LaTeX2HTML 2002-2-1 (1.71)
+original version by:  Nikos Drakos, CBLU, University of Leeds
+* revised and updated by:  Marcus Hennecke, Ross Moore, Herb Swan
+* with significant contributions from:
+  Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
+<TITLE>Reverse-Engineering Drivers for Safety and Portability</TITLE>
+<META NAME="description" CONTENT="Reverse-Engineering Drivers for Safety and Portability">
+<META NAME="keywords" CONTENT="hotdep08">
+<META NAME="resource-type" CONTENT="document">
+<META NAME="distribution" CONTENT="global">
+<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
+<META NAME="Generator" CONTENT="LaTeX2HTML v2002-2-1">
+<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
+<BODY >
+<H1 ALIGN="CENTER">Reverse-Engineering Drivers for Safety and Portability</H1><DIV CLASS="author_info">
+<P ALIGN="CENTER"><STRONG><A HREF="">Vitaly Chipounov</A> and
+<A HREF="">George Candea</A></STRONG></P>
+<P ALIGN="CENTER"><I>&#201;cole Polytechnique F&#233;d&#233;rale de Lausanne (EPFL), Switzerland</I></P>
+<P ALIGN="CENTER">{<A HREF="">vitaly.chipounov</A>, <A HREF="">george.candea</A>}</P>
+Device drivers today lack two important properties: guaranteed
+  safety and cross-platform portability.  We present an approach to
+  incrementally achieving these properties in drivers, without
+  requiring any changes in the drivers or operating system kernels.
+  We describe RevEng, a tool for automatically reverse-engineering a
+  binary driver and synthesizing a new, safe and portable driver that
+  mimics the original one.  The operating system kernel runs the
+  trusted synthetic driver instead of the original, thus avoiding
+  giving untrusted driver code kernel privileges.  Initial results are
+  promising: we reverse-engineered the basic functionality of network
+  drivers in Linux and Windows based solely on their binaries, and we
+  synthesized safe drivers for Linux.  We hope RevEng will eventually
+  persuade hardware vendors to provide verifiable formal
+  specifications instead of binary drivers; such specifications can be
+  used to automatically synthesize safe drivers for every desired
+  platform.
+<H1><A NAME="SECTION00010000000000000000">1. Introduction</A></H1>
+As far as kernel-mode code goes, device drivers are quite buggy.  On
+Linux, for instance, device drivers have 3x to 7x higher
+bug density than the rest of the kernel code&nbsp;[<A
+ HREF="">8</A>].
+Most drivers originate either with hardware manufacturers--sole
+holders of the secrets of a device's internals--or part-time
+open-source contributors.  Writing reliable software and keeping
+abreast of the latest changes in OS interfaces is not the core
+competence of a hardware manufacturer, so writing drivers is typically
+outsourced to third-party companies, which are by now largely
+commoditized and often do not have a quality reputation to uphold.
+Not surprisingly, drivers caused 85% of crashes on Windows&nbsp;[<A
+ HREF="">25</A>] and over one million crashes on Windows&nbsp;[<A
+ HREF="">23</A>].
+It is ironic then that we are comfortable running such code inside our
+kernels, especially if we are at all paranoid about viruses, spyware,
+and other malware.  Buggy drivers not only crash systems, but also
+compromise security.  Last year, for example, a zero-day vulnerability
+was disclosed within a third-party driver shipped with all versions of
+Windows XP: secdrv.sys, developed by Macrovision as part of&nbsp;[<A
+ HREF="">31</A>].  This vulnerability allows non-privileged
+users to elevate their privileges to Local System, leading to complete
+system compromise.
+Driver safety has garnered a lot of merited attention over the years:
+microkernels run drivers in user space&nbsp;[<A
+ HREF="">14</A>],
+virtual machine-based approaches isolate drivers from the OS&nbsp;[<A
+ HREF="">20</A>,<A
+ HREF="">12</A>,<A
+ HREF="">26</A>,<A
+ HREF="">19</A>],
+microdrivers reduce the amount of driver code run in the kernel&nbsp;[<A
+ HREF="">13</A>], and Nooks can mitigate the consequences of
+driver failure by isolating user-mode applications from the&nbsp;[<A
+ HREF="">30</A>].  Many of these approaches are not
+end-all solutions, but mainly intermediate steps--some require driver
+source code modifications, others introduce significant performance
+overheads, etc.  More radical approaches aim for drivers that are safe
+by construction, using domain-specific languages&nbsp;[<A
+ HREF="">29</A>,<A
+ HREF="">21</A>,<A
+ HREF="">28</A>]; these too
+require changes in how drivers are written and do not yet offer a
+solution for existing drivers.
+Besides being unsafe, drivers are also non-portable, because of the
+close driver/kernel coupling; this hurts both consumers and vendors.
+Consumers are constrained to one or two OSes if they want good device
+support, and they are often forced to upgrade to new versions if they
+want to benefit from new peripherals.  Vendors suffer as well, because
+the cost of porting and supporting drivers on multiple OSes is often
+prohibitive, so they release drivers only for one or two major
+platforms, thus restricting the market reach of their products.
+Portability, like safety, has also received due attention.  Attempts
+like the Uniform Driver Interface&nbsp;[<A
+ HREF="">27</A>] had limited success,
+mainly because they required close cooperation between hardware
+vendors.  Others, like NDISwrapper&nbsp;[<A
+ HREF="">24</A>], were targeted
+only at specific subsystems.
+In this paper we present a new approach to both the safety and
+portability challenges: RevEng automatically extracts from binary
+device drivers the protocol for interacting with hardware and then
+encodes it into a safe driver that can be run in an unmodified kernel.
+Until hardware vendors themselves start providing open specifications,
+reverse-engineering can provide a solution.
+<H1><A NAME="SECTION00020000000000000000">
+2. Reverse-Engineering Device Drivers</A></H1>
+Reverse-engineering consists of distilling from the binary device
+driver its essence: the embedded protocol it uses to interact with
+hardware. This protocol encodes what the driver must do to perform
+tasks like sending or receiving packets, setting screen resolutions,
+etc.  RevEng proceeds in two phases: First, it records traces of
+hardware I/O interactions, memory accesses, and executed instructions.
+Second, it combines the traces with a static analysis of the driver's
+binary to obtain the protocol state machine.  This knowledge is then
+re-encoded into a safe synthetic driver targeted at the same or
+different OS.  For each class of devices, RevEng relies on a driver
+template that contains the platform-specific boilerplate for
+that class; the extracted state machine is then used to ``specialize''
+the boilerplate with the device-specific elements.  Templates can be
+generated with tools like WinDriver&nbsp;[<A
+ HREF="">16</A>].
+Figure&nbsp;<A HREF="index.html#fig:process">1</A> illustrates RevEng's functionality:
+<DIV ALIGN="CENTER"><A NAME="fig:process"></A><A NAME="46"></A>
+<IMG
+ SRC="reveng.png"
+ ALT="\includegraphics{gfx/architecture.eps}">
+<BR>Figure 1: Reverse-engineering drivers with RevEng.
+</DIV>
+<H2><A NAME="SECTION00021000000000000000"></A>
+<A NAME="sec:extraction"></A><BR>
+2.1 Extracting the State Machine</H2>
+Device drivers can be viewed as state machines that encapsulate the
+protocol for communicating with hardware; RevEng's goal is to extract
+this state machine.  The states of the automaton are snapshots of
+some of the driver's heap and stack variables.  The transition
+conditions can be predicates on hardware registers or direct kernel
+invocations of the driver entry points, i.e., the driver functions
+visible to the OS.  Finally, the transition actions result in the
+driver generating output values that get written to the hardware
+registers and the kernel.
+The driver's internal organization and specifics of the data
+structures are irrelevant to the protocol state machine.  Network
+driver A may batch incoming packets before delivering them to the
+OS, while driver B may deliver each one upon arrival; yet, both
+drivers implement the same driver/hardware protocol, and the hardware
+cannot distinguish between A and B.  Of course, user-perceived
+performance may differ substantially.
+To trace the binary device driver's states and transitions, we use QEMU&nbsp;[<A
+ HREF="">3</A>], an open-source virtual machine monitor.  The
+driver runs inside a virtual machine, and RevEng snoops all
+interactions between the driver and the virtual hardware, traces the
+program counters of instructions executed by the driver, the register
+values involved in function calls, and all memory accesses.
+RevEng then mines the obtained traces for correlations between inputs
+provided to the driver and its actions.  Consider the following very
+simple example: a particular register on the network card always has
+value <TT>0x5b</TT> when a packet is sent, regardless of packet size or
+destination; this value switches to <TT>0x6b</TT> any time a packet is
+received.  RevEng concludes that sending a packet most likely requires
+depositing <TT>0x5b</TT> in that particular register and receiving a
+packet requires value <TT>0x6b</TT>.
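The constant-value inference described above can be illustrated with a minimal C sketch. The record layout (`struct io_write`), the operation labels, and the function name `constant_for_op` are hypothetical, introduced only for this illustration; the paper does not describe RevEng's actual trace format or mining algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record: one register write observed while the
 * driver was performing a known high-level operation. */
enum op { OP_SEND, OP_RECEIVE };

struct io_write {
    enum op  op;     /* operation in progress when the write happened */
    uint16_t port;   /* I/O port (register) that was written */
    uint8_t  value;  /* value that was written */
};

/* Returns true and stores the value in *out if every write to `port`
 * during `op` used the same value.  Returns false if the values differ
 * or if no matching write was recorded. */
bool constant_for_op(const struct io_write *trace, size_t n,
                     enum op op, uint16_t port, uint8_t *out)
{
    bool seen = false;
    uint8_t first = 0;

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (trace[i].op != op || trace[i].port != port)
            continue;
        if (!seen) {
            first = trace[i].value;
            seen = true;
        } else if (trace[i].value != first) {
            return false;   /* value varies: not a constant write */
        }
    }
    if (seen)
        *out = first;
    return seen;
}
```

Under these assumptions, a trace in which port 0x300 always receives 0x5b while sending and 0x6b while receiving yields one constant per operation, whereas a register that carries a computed value such as a packet length is rejected and left to the slicing step.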
+We supplement trace analysis with static analysis of the driver's
+binary.  Besides constant writes, drivers may also compute values for
+the registers, such as a packet length.  To reverse-engineer this
+computation, we find the program slices&nbsp;[<A
+ HREF="">33</A>] for
+those instructions that perform hardware register writes; the slice of
+such an instruction consists of all the instructions that affect its
+operands.  The program counter traces are used to narrow down the
+execution path followed by the driver through the slice.  RevEng then
+extracts the logic that computes the value written to the registers.
+To identify the state (i.e., driver variables) used in the
+computation, RevEng uses the corresponding memory trace.
+ RevEng also tracks calls to specific kernel APIs, in order to infer
+when drivers run asynchronous code via timers, threads, or interrupts.
+All such operations must be registered with the kernel via specific
+APIs, to give the kernel the address of the corresponding handler.  By
+recording these calls in the trace, RevEng has sufficient information
+to identify the asynchronous properties of the original driver and
+reconstruct them in the synthesized driver.  Some asynchronous
+operations might not have an obvious cause-effect relationship; for
+example, a driver might decide to switch from asynchronous I/O to
+polling long after the trigger event occurred.  However, by
+identifying the state that was updated by the trigger event and later
+used in deciding the switch to polling, RevEng is able to correlate
+the trigger events with transitions of the driver's state machine.
+<H2><A NAME="SECTION00022000000000000000">
+2.2 Synthesizing New Drivers</A></H2>
+To obtain the synthetic executable driver, the slices obtained in the
+previous step (&#167;<A HREF="index.html#sec:extraction">2.1</A>) are converted by RevEng into C
+code, similar to how a decompiler would do&nbsp;[<A
+ HREF="">9</A>].  Memory accesses captured in the trace
+are replaced with symbolic names in the generated C code.  Stack
+accesses are replaced by local variables.  Heap accesses are matched
+with the traced memory blocks provided by the kernel or allocated by
+the driver.  Instruction and memory traces help resolve pointer
+aliasing questions and allow memory-mapped I/O to be distinguished
+from normal memory accesses&nbsp;[<A
+ HREF="">10</A>].
+The result is a set of C code blocks that represent the
+reverse-engineered state machine.  Executing these RevEng-generated
+code blocks would result in the same traces as those recorded while
+snooping on the original driver.  The code blocks implement all
+device-specific actions, thus providing the coupling between the
+driver and the device.  For instance, these code sequences indicate
+how to send/receive network packets or how to reset the NIC for the
+network device under study.
+The boilerplate that forms a driver template consists of the
+high-level logic of a driver corresponding to that particular class of
+devices along with the glue code that couples the driver to the
+kernel.  For instance, a network driver must be able to initialize the
+network card, send packets, and receive packets.  For a given
+operating system, these operations are invoked via specific kernel
+functions and follow a specific sequence.  All this code would be
+contained in the network driver template.  Currently, all templates
+are written in C.
+Synthesizing a new driver consists of ``pasting'' the
+reverse-engineered C code blocks into the driver template, to
+specialize the boilerplate into a functional driver specific to the
+device in question.  Currently, this specialization is still done
+manually, but we hope RevEng will eventually do it automatically.
+The reverse-engineering process occurs incrementally.  A given trace
+represents one particular execution path through the original driver
+code, and many basic blocks may not have been exercised.  These result
+in missing blocks of the state machine; RevEng annotates such
+blocks with special markers (preprocessor macros) indicating that they
+correspond to existing functionality that has not been
+reverse-engineered yet.  For example:
+   if (reg2 &lt; 10) {
+      pktlen = reg2 + 64 + hdrlen ;
+      disable() ;
+      outportb( PADR, pktlen ) ;
+      enable() ;
+   } else
+As additional executions cover previously-unexplored basic blocks,
+the functionality is progressively discovered, and the macros
+are replaced with reverse-engineered code blocks.
+It is also possible to steer the original driver down unexercised
+paths.  For instance, we can compute path constraints using symbolic
+execution&nbsp;[<A
+ HREF="">18</A>] and solve them to obtain input values
+that will take the driver down the desired path&nbsp;[<A
+ HREF="">4</A>,<A
+ HREF="">17</A>].  RevEng does not
+support such steering yet.  The most difficult paths to exercise are
+error recovery paths, and we intend to use (virtual) hardware fault
+injection to reach them.  We want to employ both types of steering as
+part of a feedback loop to dynamically reverse-engineer unexercised
+<H1><A NAME="SECTION00030000000000000000">
+3. Properties of Synthesized Drivers</A></H1>
+Five properties are of interest to RevEng: equivalence, completeness,
+safety, liveness, and portability.
+<B>Equivalence</B>: To the hardware, I/O operations performed by the
+synthesized driver should be indistinguishable from those performed by
+the original driver.  In our current prototype, this generally holds,
+except we cannot yet reverse-engineer all error recovery paths.  So,
+by generating certain errors, the hardware could tell the
+two drivers apart.  Note that equivalence is not the same as
+completeness, i.e., the property that the synthesized driver can do
+everything the original one did.
+<B>Completeness</B>: It is not always feasible to completely
+reverse-engineer a driver.  Fortunately, partial reverse-engineering
+can be quite useful (e.g., having all 2D acceleration in a graphics
+driver but perhaps not the 3D one).  Nevertheless, a future version of
+RevEng will be able to run the synthesized driver in parallel with the
+original one, the latter suitably sandboxed in a virtual machine.
+Requests that cannot be handled by the synthesized driver are relayed
+to the sandbox; tracing the execution can be used to augment the
+synthesized driver progressively, until it becomes complete.  The
+state of the two drivers has to be kept synchronized; since state
+variables have the same layout in both drivers, state can be
+explicitly copied to the target stack and heap, and execution
+transferred to the not-yet-reverse-engineered blocks.
+This raises the question: when is a synthetic driver ready to
+replace the original?  From a subjective user's perspective, it is
+when all the desired functionality has been reverse-engineered.
+Objectively, completeness can be measured as coverage of the
+original's basic blocks, functions, or code paths.  We must also take
+into account loop and array boundaries, where tracing one iteration or
+one access may not be enough.  For the drivers we reverse-engineered,
+a naive workload was sufficient to obtain a useful network driver
+(&#167;<A HREF="index.html#sec:prelim_results">4</A> has more details).
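As a rough illustration of the objective measure mentioned above, the following C sketch computes basic-block coverage from a program counter trace. It assumes that the start addresses of the original driver's basic blocks are already known from static analysis and that a block counts as covered when its start address appears in the trace; the function names are hypothetical and not part of RevEng.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many basic blocks (identified by start address) were
 * entered at least once according to the program counter trace. */
size_t covered_blocks(const uint32_t *block_starts, size_t nblocks,
                      const uint32_t *pc_trace, size_t npcs)
{
    size_t covered = 0;

    assert(block_starts != NULL || nblocks == 0);
    assert(pc_trace != NULL || npcs == 0);
    for (size_t b = 0; b < nblocks; b++) {
        for (size_t i = 0; i < npcs; i++) {
            if (pc_trace[i] == block_starts[b]) {
                covered++;
                break;      /* count each block only once */
            }
        }
    }
    return covered;
}

/* Coverage in whole percent, rounded down; 0 if there are no blocks. */
unsigned coverage_percent(size_t covered, size_t nblocks)
{
    return nblocks ? (unsigned)(covered * 100 / nblocks) : 0;
}
```

The quadratic scan keeps the sketch short; a real tool would more likely sort the block addresses or use a hash set. Note that this measure says nothing about loop or array boundaries, which the text identifies as a separate concern.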
+<B>Safety</B>: The synthetic driver arises from merging a driver
+template with a reverse-engineered state machine.  We expect the
+driver template, whether generated manually or by tools (e.g.,
+WinDriver&nbsp;[<A
+ HREF="">16</A>]), to be checked for correctness using
+formal methods (e.g., with SLAM&nbsp;[<A
+ HREF="">2</A>]).  This is a worthwhile
+investment, because templates can be reused across drivers of the same
+class.
+The state machine is assumed safe by construction.  RevEng uses
+recorded traces, and any trace that has led to a safety violation
+(e.g., one that resulted in a crash because of a bad pointer) is not used
+in the reverse-engineering process.  As long as all ``bad traces'' can
+be excluded as non-safe, the resulting state machine will be safe.
+RevEng must be trusted to generate correct code, much the same way a
+compiler is trusted.  Devices with programmable firmware might be
+sensitive to missing error recovery paths or certain timing
+characteristics.  In general, however, hardware and its drivers are
+indulgent with respect to timing&nbsp;[<A
+ HREF="">34</A>].
+<B>Liveness:</B> Infinite loops and deadlocks in drivers would cause
+the kernel to hang; RevEng ensures that reverse-engineered loops can
+never become infinite and deadlocks are not encountered.  In our
+instruction traces, loops appear as a sequence of duplicated loop
+bodies.  We found that the Linux driver base and the Windows Driver&nbsp;[<A
+ HREF="">22</A>] have only five types of loops: ones with constant
+number of iterations (typically used to initialize registers), polling
+loops, delay loops, data transfer loops, and structure traversal
+loops.  RevEng reverse-engineers the first three automatically,
+although the constant-iterations loops are still kept unrolled.  Data
+transfer loops and structure traversal loops are currently deferred to
+manual inspection, but we are working on automatically generating
+these too.
+<B>Portability:</B> Driver templates are easy to generate for common
+classes of devices, because these devices tend to operate in the same
+manner, e.g., all graphic card drivers set up a framebuffer and
+perform similar operations on it.  Templates are OS-specific.  In some
+cases, it may be worthwhile generating more specialized templates for
+a particular line of devices from the same manufacturer, enabling
+quicker support of new models.  We have not yet worked with highly
+specialized custom devices, which might invalidate some of these
+When a new version of a driver is released, a future version of RevEng
+will perform a binary diff to identify the added code paths.  A suitable
+workload will then be generated to exercise the modified code paths,
+similarly to automatic patch-based exploit generation&nbsp;[<A
+ HREF="">6</A>].
+<H1><A NAME="SECTION00040000000000000000"></A>
+<A NAME="sec:prelim_results"></A>
+4. Preliminary Results</H1>
+We reverse-engineered the Linux NE2000 8390 network device driver and
+generated a synthetic driver that can reliably initialize the network
+interface, set the MAC address, and send/receive packets.  Performance
+overhead is negligible both in terms of throughput and latency.  For
+state machine extraction we used a 500 KB trace obtained with a
+5-second workload consisting of sending and receiving packets of
+different sizes.  Specializing the template was done manually and took
+~4 hours.  The stripped binary of the synthetic driver
+is 12&nbsp;KB compared to 18&nbsp;KB for the original driver; the size
+difference is mostly due to the reduced functionality.
+We also used RevEng to port the Windows NE2000 8029AS driver to Linux,
+using the same workload as above.  Manual specialization of the
+template took considerably longer (~3 days) and was
+error-prone, primarily because of the programmer-unfriendly code
+generated by RevEng and due to the API differences between Linux and
+Windows kernels.  All the integration errors were in the
+hardware-specific portion of the driver and did not affect the safety
+of the driver from the point of view of the OS.  We are currently
+working on generating friendlier code and automating the process.
+Even low coverage turned out to result in a useful driver.  With the
+5-second workload we obtained a basic block coverage of 48% and 56%
+for the Linux and Windows drivers, respectively.  In Windows, many
+low-level functions achieved full coverage.  However, more complex
+drivers will likely require higher coverage, if we are to obtain
+useful synthetic drivers.
+<H1><A NAME="SECTION00050000000000000000">
+5. Discussion</A></H1>
+We believe that safe synthetic drivers provide a better way to run
+privileged code that interacts with hardware: they reduce downtime and
+security vulnerabilities, and can help kernels promise higher data
+integrity.  Another advantage is portability: imagine ``instantly
+porting'' drivers from one platform to another, to the benefit of
+consumers, who can use all hardware devices with their favorite OS,
+and the benefit of vendors, who no longer have to invest in providing
+drivers for multiple platforms.  The time and effort savings can be
+used to build better hardware.  All these benefits can be had without
+changes to any OS kernel.  However, there are still a few open
+questions, which we address next.
+<H2><A NAME="SECTION00051000000000000000">
+5.1 When Is A Driver RevEng-able?</A></H2>
+To reverse-engineer a driver, the semantics of its interface with the
+external world must be sufficiently well understood to connect cause
+(e.g., the invocation of an entry point) and effect (e.g., a sequence
+of hardware I/O).
+Operations such as <TT>ioctl</TT> can blur this connection.  For
+instance, user-mode applications are often used to configure graphics
+cards; these applications issue <TT>ioctl</TT>s to the device.  A click
+in the configuration GUI may therefore cause a sequence of
+<TT>ioctl</TT>s in a way that is entirely user-dependent.  We intend to
+augment RevEng with data flow analysis that will help track the input
+from the configuration change to the hardware registers, such that we
+synthesize a driver that preserves the <TT>ioctl</TT>-based interface.
+This would enable the reuse of the original proprietary user-mode
+For some devices (e.g., those that are not part of a
+commonly used class of hardware), producing a template may seem to require more
+effort than simply writing a driver.  Nevertheless, mandatory
+boilerplate is quite large (e.g., on Windows, power management support
+must be included for all plug-and-play devices, regardless of whether they need it
+or not), so separating driver code into template vs. device-specific
+code is anyway a good idea.
+<H2><A NAME="SECTION00052000000000000000">
+5.2 Extracting Hardware Specifications</A></H2>
+In addition to the hardware protocol, RevEng also obtains a
+specification of the hardware, as encoded in the original device
+driver.  This hardware specification, once translated into a formal
+language, can be used to verify the assumptions of the original driver
+implementation against the hardware's actual specification.
+To recover the semantics of registers, RevEng records multiple traces
+while perturbing input parameters (e.g., which mouse button is
+pressed, the size of data packets, screen resolution).  The I/O
+differences between traces are then correlated to the changing
+parameter, in a way similar to opcode&nbsp;[<A
+ HREF="">15</A>].  Our current technique
+requires refinement, though.  While effective at recovering register
+semantics of simple devices, such as a PS/2 mouse, aligning traces to
+compare registers becomes challenging for complex drivers: the same
+register might have different meanings depending on context (as in the
+case of register banks), and the traces can be polluted by unrelated
+I/O.  We are working both on techniques for better filtering of traces
+and statistical approaches to trace correlation.
+<H2><A NAME="SECTION00053000000000000000">
+5.3 Legal Aspects</A></H2>
+Some parts of RevEng resemble decompilation, because we translate
+original binary code into C.  This may have legal implications, if the
+binary is protected by intellectual property rights, patents, or has
+an otherwise restrictive license.  It could also prevent the use of
+RevEng as a way to generate synthetic drivers for subsequent
+redistribution.  Employing RevEng for private use, however, should not
+be problematic; once RevEng is fully automated, private use could be
+the preferred usage scenario.
+Projects connected to reverse-engineering of proprietary software,
+like Wine&nbsp;[<A
+ HREF="">1</A>] or ReactOS&nbsp;[<A
+ HREF="">11</A>], have had legal
+problems in the past.  We believe this type of challenge can be
+mitigated if the extracted code, which we now use to specialize driver
+templates, is treated as a mere specification of the driver, not as
+raw code to inject into the template.  This would ensure no original
+code leaks from the original driver to the synthetic one--an approach
+that conforms to the clean-room principle&nbsp;[<A
+ HREF="">5</A>].
+<H2><A NAME="SECTION00054000000000000000">
+5.4 Challenges of Reverse-Engineering</A></H2>
+ RevEng's current reliance on QEMU introduces certain
+limitations.  First, it is not possible to recover drivers unless an
+emulation of the corresponding device exists.  Second, QEMU does not
+fully emulate error conditions, like packet transmission errors or
+seek errors on disk drives; to reproduce such conditions, we would
+have to interface QEMU with real hardware.  Third, QEMU's PCI
+emulation approach&nbsp;[<A
+ HREF="">32</A>] is limited to port I/O and
+interrupts; however, modern PCI hardware also needs support for DMA
+and PCI-Express.  DMA support can be added using IOMMUs, while
+PCI-Express support would require the emulation of a virtual chipset
+Our current tracing infrastructure introduces a two-fold slowdown in
+the driver that is being traced, because of the disk accesses for
+writing the trace; there is no impact on code running outside of the
+driver, since it is not instrumented, and the synthesized driver's
+performance is not affected either (&#167;<A HREF="index.html#sec:prelim_results">4</A>).  However,
+the tracing slowdown could affect RevEng's ability to reverse-engineer
+time-sensitive drivers.  Acquisition devices, like sound
+cards, have certain real-time requirements; if tracing is too slow, the
+driver may end up executing mostly recovery code, due to timeouts.  In such
+cases, we can use hardware tracing solutions&nbsp;[<A
+ HREF="">7</A>]
+instead of virtual machines.
+Many of these challenges arise from the fact that we start by
+reverse-engineering binary device drivers.  However, we view
+reverse-engineering as a stopgap measure, and we hope that a
+successful RevEng will persuade hardware vendors to provide
+specifications in a standardized formal hardware abstraction language
+(HAL) instead of binary drivers.  Some of them already provide
+informal specifications in datasheets.  Others may need to reconsider
+how they develop their hardware interfaces, to prevent competitors
+from inferring sensitive information about their chips.  Once HAL
+specifications are available, RevEng can generate safe, verified
+drivers for any platform of interest.
+<H1><A NAME="SECTION00060000000000000000">
+6. Conclusion</A></H1>
+We proposed a new approach to solving the problem of safety and
+portability of device drivers, without requiring access to source code
+or any modifications to hardware, drivers, or existing operating systems.
+RevEng takes the original binary driver, executes it in a virtual machine, and
+traces its interaction with hardware.  The traces are then used to
+extract the hardware interaction protocol embodied in the driver.
+While RevEng can provide safety and portability today, our hope is
+that eventually hardware vendors will migrate to a model in which they
+release formal specifications of the hardware interaction protocols,
+instead of closed binary drivers.  Once they do, the world will be a
+better place: operating systems will crash less, new devices will be
+supported on all platforms, and the OS playing field will become more
+level. Hardware obsolescence will slow down and there will be fewer forced
+upgrades and unjustified costs.
+<H1><A NAME="SECTION00070000000000000000">
+7. Acknowledgments</A></H1>
+We would like to thank Willy Zwaenepoel, Mike Swift, Aravind Menon,
+the members of DSLab, as well as the anonymous reviewers for their
+feedback on this work.
+<H2><A NAME="SECTION00080000000000000000">References</A>
+</H2><DL COMPACT><DD><P></P><DT><A NAME="wine">1</A>
+B.&nbsp;Amstadt, E.&nbsp;Youngdale, and A.&nbsp;Julliard.
+<BR>Wine is not an emulator.
+<BR><A HREF=""></A>.
+<P></P><DT><A NAME="slam">2</A>
+T.&nbsp;Ball, E.&nbsp;Bounimova, B.&nbsp;Cook, V.&nbsp;Levin, J.&nbsp;Lichtenberg, C.&nbsp;McGarvey,
+  B.&nbsp;Ondrusek, S.&nbsp;K. Rajamani, and A.&nbsp;Ustuner.
+<BR>Thorough static analysis of device drivers.
+<BR>In <EM>Proc. ACM EUROSYS Conference</EM>, 2006.
+<P></P><DT><A NAME="bellard:qemu">3</A>
+<BR>QEMU, a fast and portable dynamic translator.
+<BR>In <EM>Proc. USENIX Annual Technical Conference</EM>, 2005.
+<P></P><DT><A NAME="boyapati:korat">4</A>
+C.&nbsp;Boyapati, S.&nbsp;Khurshid, and D.&nbsp;Marinov.
+<BR>Korat: automated testing based on Java predicates.
+<BR>In <EM>Proc. ACM SIGSOFT International Symposium on Software
+  Testing and Analysis</EM>, 2002.
+<P></P><DT><A NAME="chinesewalls">5</A>
+D.&nbsp;D.&nbsp;F. Brewer and D.&nbsp;M.&nbsp;J. Nash.
+<BR>The Chinese wall security policy.
+<BR>In <EM>Proc. IEEE Symposium on Security and Privacy</EM>, 1989.
+<P></P><DT><A NAME="apeg">6</A>
+D.&nbsp;Brumley, P.&nbsp;Poosankam, D.&nbsp;Song, and J.&nbsp;Zheng.
+<BR>Automatic patch-based exploit generation is possible: Techniques and
+  implications.
+<BR>In <EM>Proc. IEEE Symposium on Security and Privacy</EM>, 2008.
+<P></P><DT><A NAME="chen:logmonitor">7</A>
+S.&nbsp;Chen, B.&nbsp;Falsafi, P.&nbsp;B. Gibbons, M.&nbsp;Kozuch, T.&nbsp;C. Mowry, R.&nbsp;Teodorescu,
+  A.&nbsp;Ailamaki, L.&nbsp;Fix, G.&nbsp;R. Ganger, B.&nbsp;Lin, and S.&nbsp;W. Schlosser.
+<BR>Log-based architectures for general-purpose monitoring of deployed
+  code.
+<BR>In <EM>Proc. 1st Workshop on architectural and system
+  support for improving software dependability</EM>, 2006.
+<P></P><DT><A NAME="chou:empirical">8</A>
+A.&nbsp;Chou, J.-F. Yang, B.&nbsp;Chelf, S.&nbsp;Hallem, and D.&nbsp;Engler.
+<BR>An empirical study of operating systems errors.
+<BR>In <EM>Proc. 18th ACM Symposium on Operating Systems
+  Principles</EM>, 2001.
+<P></P><DT><A NAME="cifuentes:reverse94">9</A>
+<BR><EM>Reverse Compilation Techniques</EM>.
+<BR>PhD thesis, Queensland University of Technology, School of Computing
+  Science, 1994.
+<P></P><DT><A NAME="compsec-decomp">10</A>
+C.&nbsp;Cifuentes, T.&nbsp;Waddington, and M.&nbsp;V. Emmerik.
+<BR>Computer security analysis through decompilation and high-level
+  debugging.
+<BR>In <EM>Proc. 8th Working Conference on Reverse
+  Engineering</EM>, 2001.
+<P></P><DT><A NAME="reactos-legal">11</A>
+<BR>Reset, reboot, restart, legal issues and the long road to 0.3.
+<BR><A HREF=""></A>, 2006.
+<P></P><DT><A NAME="xen-safedriver">12</A>
+K.&nbsp;Fraser, S.&nbsp;Hand, R.&nbsp;Neugebauer, I.&nbsp;Pratt, A.&nbsp;Warfield, and M.&nbsp;Williams.
+<BR>Safe hardware access with the Xen virtual machine monitor.
+<BR>In <EM>Proc. 1st Workshop on Operating System and Architectural
+  Support for the On Demand IT InfraStructure</EM>, 2004.
+<P></P><DT><A NAME="microdrivers">13</A>
+V.&nbsp;Ganapathy, M.&nbsp;J. Renzelmann, A.&nbsp;Balakrishnan, M.&nbsp;M. Swift, and S.&nbsp;Jha.
+<BR>The design and implementation of microdrivers.
+<BR>In <EM>Proc. 13th Intl. Conf. on Architectural Support for
+  Programming Languages and Operating Systems</EM>, 2008.
+<P></P><DT><A NAME="hartig:microkernelperf">14</A>
+H.&nbsp;H&#228;rtig, M.&nbsp;Hohmuth, J.&nbsp;Liedtke, J.&nbsp;Wolter, and S.&nbsp;Sch&#246;nberg.
+<BR>The performance of &mu;-kernel-based systems.
+<BR>In <EM>Proc. 16th ACM Symposium on Operating Systems
+  Principles</EM>, 1997.
+<P></P><DT><A NAME="reverse-opcodes">15</A>
+W.&nbsp;C. Hsieh, D.&nbsp;R. Engler, and G.&nbsp;Back.
+<BR>Reverse-engineering instruction encodings.
+<BR>In <EM>Proc. USENIX Annual Technical Conference</EM>, 2001.
+<P></P><DT><A NAME="windriver">16</A>
+<BR>WinDriver device driver development toolkit, version 9.0.
+<BR><A HREF=""></A>, 2007.
+<P></P><DT><A NAME="kicillov:coverage">17</A>
+N.&nbsp;Kicillof, W.&nbsp;Grieskamp, N.&nbsp;Tillmann, and V.&nbsp;Braberman.
+<BR>Achieving both model and code coverage with automated gray-box
+  testing.
+<BR>In <EM>Proc. 3rd International Workshop on Advances in Model-based
+  Testing</EM>, 2007.
+<P></P><DT><A NAME="king:symbolic">18</A>
+J.&nbsp;C. King.
+<BR>Symbolic execution and program testing.
+<BR><EM>Commun. ACM</EM>, 19(7):385-394, 1976.
+<P></P><DT><A NAME="previrt">19</A>
+J.&nbsp;LeVasseur, V.&nbsp;Uhlig, M.&nbsp;Chapman, P.&nbsp;Chubb, B.&nbsp;Leslie, and G.&nbsp;Heiser.
+<BR>Pre-virtualization: Slashing the cost of virtualization.
+<BR>Technical report, Universit&#228;t Karlsruhe (TH), 2005.
+<P></P><DT><A NAME="levasseur:devdrvreuse">20</A>
+J.&nbsp;LeVasseur, V.&nbsp;Uhlig, J.&nbsp;Stoess, and S.&nbsp;G&#246;tz.
+<BR>Unmodified device driver reuse and improved system dependability via
+  virtual machines.
+<BR>In <EM>Proc. 6th Symposium on Operating Systems Design
+  and Implementation</EM>, 2004.
+<P></P><DT><A NAME="merillon:dsl_devil">21</A>
+F.&nbsp;M&#233;rillon, L.&nbsp;R&#233;veill&#232;re, C.&nbsp;Consel, R.&nbsp;Marlet, and G.&nbsp;Muller.
+<BR>Devil: An IDL for hardware programming.
+<BR>In <EM>Proc. 4th Symposium on Operating Systems Design
+  and Implementation</EM>, 2000.
+<P></P><DT><A NAME="wdk">22</A>
+<BR>Windows Driver Kit.
+<BR><A HREF=""></A>.
+<P></P><DT><A NAME="vista:crashes">23</A>
+<BR>Microsoft internal memo, provided as public evidence in court case
+  #c07-475mjp.
+<BR><A HREF=""></A>, 2008.
+<P></P><DT><A NAME="ndiswrapper">24</A>
+<BR><A HREF=""></A>, 2008.
+<P></P><DT><A NAME="win-driver-quality">25</A>
+V.&nbsp;Orgovan and M.&nbsp;Tricker.
+<BR>An introduction to driver quality.
+<BR>Microsoft Windows Hardware Engineering Conf., May 2003.
+<BR>Presentation DDT301 (as cited in&nbsp;[<A
+ HREF="">30</A>]).
+<P></P><DT><A NAME="poess:depriv">26</A>
+<BR>Binary device driver reuse.
+<BR>Master's thesis, Universit&#228;t Karlsruhe (TH), 2007.
+<P></P><DT><A NAME="udi">27</A>
+Project UDI.
+<BR>Uniform Driver Interface.
+<BR><A HREF=""></A>, 2008.
+<P></P><DT><A NAME="spear:drivers">28</A>
+M.&nbsp;F. Spear, T.&nbsp;Roeder, O.&nbsp;Hodson, G.&nbsp;C. Hunt, and S.&nbsp;Levi.
+<BR>Solving the starting problem: device drivers as self-describing
+  artifacts.
+<BR>In <EM>Proc. ACM EUROSYS Conference</EM>, 2006.
+<P></P><DT><A NAME="hail">29</A>
+J.&nbsp;Sun, W.&nbsp;Yuan, M.&nbsp;Kallahalla, and N.&nbsp;Islam.
+<BR>HAIL: a language for easy and correct device access.
+<BR>In <EM>Proc. 5th Intl. Conf. on Embedded Software</EM>, 2005.
+<P></P><DT><A NAME="swift:recovery">30</A>
+M.&nbsp;M. Swift, M.&nbsp;Annamalai, B.&nbsp;N. Bershad, and H.&nbsp;M. Levy.
+<BR>Recovering device drivers.
+<BR><EM>ACM Trans. Comput. Syst.</EM>, 24(4):333-360, 2006.
+<P></P><DT><A NAME="zeroday">31</A>
+<BR><A HREF=""></A>.
+<P></P><DT><A NAME="qemu-pci-proxy">32</A>
+<BR>QEMU Host PCI Proxy V0.3 patch.
+<BR><A HREF=""></A>.
+<P></P><DT><A NAME="weiser:slicing">33</A>
+<BR>Program slicing.
+<BR>In <EM>Proc. 5th International Conference on Software
+  Engineering</EM>, 1981.
+<P></P><DT><A NAME="williams:ddsafety">34</A>
+D.&nbsp;Williams, P.&nbsp;Reynolds, K.&nbsp;Walsh, E.&nbsp;G. Sirer, and F.&nbsp;B. Schneider.
+<BR>Device driver safety through a reference validation mechanism.
+<BR>In <EM>Proc. 8th USENIX Symposium on Operating Systems Design and
+  Implementation</EM>, 2008.
diff --git a/references/ b/references/
new file mode 100644
index 0000000..baf77cd
Binary files /dev/null and b/references/ differ