Lisp Operating System
===============================================================================

A Lisp Operating System (LispOS for short) is not just another
operating system that happens to be written in Lisp (although that
would be a good thing in itself). A LispOS is also an operating
system that uses the Lisp interactive environment as an inspiration
for the interface between the user and the system, and between
applications and the system.

Below, we give some ideas on what a LispOS might contain, how it
would be different from existing operating systems, and how such a
system might be created.

The problem with existing systems

The concept of a process

Most popular existing operating systems are derived from Unix which
was written in the 1970s. The computers for which Unix was intended
has a very small address space; too small for most usable end-user
applications. To solve this problem, the creators of Unix used the
concept of a process. A large application was written so that it
consisted of several smaller programs, each of which ran in its own
address space. These smaller programs would communicate by having
one application write text to its output stream for another
application to read. This method of communication was called a pipe
and a sequence of small applications was called a pipeline. As a
typical example of a chain of applications, consider the pipeline
for producing a typeset document (one of the main applications for
which Unix was designed). This chain had a program for creating
tables (called tbl), a program for generating pictures (called
pic), a program for generating equations (called eqn), and of
course the typesetting program itself (called troff).

Using pipes to communicate between different components of an
application has several disadvantages:

  * To communicate complex data structures (such as trees or
    graphs), they must be converted to a stream of bytes by the
    creating component, and it must be analyzed and parsed into an
    equivalent data structure by the using component. Not only is
    this unparsing/parsing inefficient in terms of computing
    resources, but it is also problematic from a
    software-engineering point of view, because the external format
    must be specified and maintained as a separate aspect of each
    component.
  * An artificial order between the different components is
    imposed, so that components can not work as libraries that
    other components can use in any order. Sometimes (as in the
    example of the troff chain) the end result of a computation
    depends in subtle ways on the order between the components of
    the chain. Introducing a new component may require other
    components to be modified.

It is an interesting observation that in most text books on
operating systems, the concept of a process is presented as playing
a central role in operating-system design, whereas it ought to be
presented as an unfortunate necessity due to the limited address
space of existing minicomputers in the 1970s. It is also presented
as the method for obtaining some kind of security, preventing one
application from intentionally or accidentally modifying the data
of some other application. In reality, there are several ways of
obtaining such security, and separate address spaces should be
considered to be a method with too many disadvantages.

Nowadays, computers have addresses that are 64 bit wide, making it
possible to address almost 20 exabytes of data. To get an idea of
the order of magnitude of such a number, consider a fairly large
disc that can hold a terabyte of data. Then 20 million such discs
can be directly addressed by the processor. We can thus consider
the problem of too small an address space to be solved.

Hierarchical file systems

Existing operating system come with a hierarchical file system.
There are two significant problems, namely hierarchical and file.

The hierarchy is also a concept that dates back to the 1970s, and
it was considered a vast improvement on flat file systems. However,
as this article clearly explains, most things are not naturally
hierarchical. A hierarchical organization imposes an artificial
order between names. Whether a document is called Lisp/Programs/
2013/stuff, Programs/Lisp/2013/stuff, 2013/Programs/Lisp/stuff,
etc, is usually not important.

The problem with a file is that it is only a sequence of bytes with
no structure. This lack of structure fits the Unix pipe model very
well, because intermediate steps between individual software
components can be saved to a file without changing the result. But
it also means that in order for complex data structures to be
stored in the file system, they have to be transformed into a
sequence of bytes. And whenever such a structure needs to be
modified by some application, it must again be parsed and
transformed into an in-memory structure.

Distinction between primary and secondary memory

Current system (at least for desktop computers) make a very clear
distinction between primary and secondary memory. Not only are the
two not the same, but they also have totally different semantics:

  * Primary memory is volatile. When power is turned off, whatever
    was in primary memory is lost.
  * Secondary memory is permanent. Stored data will not disappear
    when power is turned off.

This distinction coupled with the semantics of the two memories
creates a permanent conundrum for the user of most applications, in
that if current application data is not saved, then it will be lost
in case of power loss, and if it is saved, then previously saved
data is forever lost.

Techniques were developed as early in the 1960s for presenting
primary and secondary memory as a single abstraction to the user.
For example, the Multics system had a single hierarchy of
fixed-size byte arrays (called segments) that served as permanent
storage, but that could also be treated as any in-memory array by
applications. As operating systems derived from Unix became
widespread, these techniques were largely forgotten.

Objectives for a Lisp operating system

The three main objectives of a Lisp operating system correspond to
solutions to the two main problems with exiting systems as
indicated in the previous section.

Single address space

Instead of each application having its own address space, we
propose that all applications share a single large address space.
This way, applications can share data simply by passing pointers
around, because a pointer is globally valid, unlike pointers in
current operating systems.

Clearly, if there is a single address space shared by all
applications, there needs to be a different mechanism to ensure
protection between them so that one application can not
intentionally or accidentally destroy the data of another
application. Most high-level programming languages (in particular
Lisp, but also Java, and many more) propose a solution to this
problem by simply not allowing users to execute arbitrary machine
code. Instead, they allow only code that has been produced from the
high-level notation of the language and which excludes arbitrary
pointer arithmetic so that the application can only address its own
data. This technique is sometimes called "trusted compiler".

It might sometimes be desirable to write an application in a
low-level language like C or even assembler, or it might be
necessary to run applications that have been written for other
systems. Such applications could co-exist with the normal ones, but
they would have to work in their own address space as with current
operating systems, and with the same difficulties of communicating
with other applications.

Object store based on tags

Instead of a hierarchical file system, we propose an object store
which can contain any objects. If a file (i.e. a sequence of bytes)
is desired, it would be stored as an array of bytes.

Instead of organizing the objects into a hierarchy, objects in the
store can optionally be associated with an arbitrary number of tags
. These tags are key/value pairs, such as for example the date of
creation of the archive entry, the creator (a user) of the archive
entry, and the access permissions for the entry. Notice that tags
are not properties of the objects themselves, but only of the
archive entry that allows an object to be accessed. Some tags might
be derived from the contents of the object being stored such as the
sender or the date of an email message. It should be possible to
accomplish most searches of the store without accessing the objects
themselves, but only the tags. Occasionally, contents must be
accessed such as when a raw search of the contents of a text is
wanted.

It is sometimes desirable to group related objects together as with
directories of current operating systems. Should a user want such a
group, it would simply be another object (say instances of the
class directory) in the store. Users who can not adapt to a
non-hierarchical organization can even store such directories as
one of the objects inside another directory.

Here are some examples of possible keyword/value pairs, how they
might be used, and what kinds of values are permitted:

+-----------------------------------------------------------------+
|Keyword             |Possible values                             |
|--------------------+--------------------------------------------|
|                    |The nature of the object such as movie,     |
|                    |music, article, book, user manual,          |
|category            |dictionary, course, lecture, recipe, program|
|                    |, bank statement, email. These would be     |
|                    |chosen from an editable set that is defined |
|                    |per user.                                   |
|--------------------+--------------------------------------------|
|                    |A string that is displayed with the object, |
|name                |such as "A Dramatic Turn of Events", "Three |
|                    |seasons", "Alternative energy".             |
|--------------------+--------------------------------------------|
|author              |An object identifying a person, an          |
|                    |organization, a company, etc.               |
|--------------------+--------------------------------------------|
|genre. Can be used  |progressive metal, science, algorithms,     |
|for movies, music   |garbage collection, game, programming       |
|albums, programs,   |language implementation, operating system.  |
|articles, etc.      |These would be chosen from an editable set  |
|                    |that is defined per user.                   |
|--------------------+--------------------------------------------|
|                    |This tag can be used to identify the file   |
|                    |type of documents such as PDF, ogg/vorbis,  |
|                    |MPEG4 PNG, in which case the tag can be     |
|                    |assigned automatically, but also to identify|
|format              |the source format of files in a directory   |
|                    |containing things like articles or user     |
|                    |manuals, for example LaTeX, Texinfo, HTML.  |
|                    |These would be chosen from an editable set  |
|                    |that is defined per user.                   |
|--------------------+--------------------------------------------|
|date of creation    |A date interval.                            |
|--------------------+--------------------------------------------|
|composer. Used for  |An object representing a person. On a       |
|music.              |compilation album there can be more than one|
|                    |tag of this kind.                           |
|--------------------+--------------------------------------------|
|                    |An object representing a natural language   |
|                    |such as English, Vietnamese, or a           |
|                    |programming languages such as Lisp, Python. |
|                    |These would be chosen from an editable set  |
|language.           |that is defined per user. If appropriate, a |
|                    |document can have several of these tags, for|
|                    |instance if some program uses multiple      |
|                    |programming languages, or if a document is  |
|                    |written using several languages, such as a  |
|                    |dictionary.                                 |
|--------------------+--------------------------------------------|
|duration. Can be    |                                            |
|used for things like|An object representing a duration.          |
|movies or music     |                                            |
|albums.             |                                            |
|--------------------+--------------------------------------------|
|source control. Can |                                            |
|be used for any     |GIT, SVN, CVS, darks, etc. These would be   |
|documents that use  |chosen from an editable set that is defined |
|source control such |per user.                                   |
|a programs, etc.    |                                            |
+-----------------------------------------------------------------+

When (a pointer to) an object is returned to a user as a result of
a search of the object store, it is actually similar to what is
called a "capability" in the operating-system literature. Such a
capability is essentially only a pointer with a few bits indicating
what access rights the user has to the objects. Each creator may
interpret the contents of those bits as he or she likes, but
typically they would be used to restrict access, so that for
instance executing a reader method is allowed, but executing a
writer method is not.

Single memory abstraction

Instead of two different memory abstractions (primary and
secondary), the Lisp operating system would contain a single
abstraction which looks like any interactive Lisp system, except
that data is permanent.

Since data is permanent, application writers are encouraged to
provide a sophisticated undo facility.

The physical main (semiconductor) memory of the computer simply
acts as a cache for the disk(s), so that the address of an object
uniquely determines where on the disk it is stored. The cache is
managed as an ordinary virtual memory with existing algorithms.

Other features

Crash proof (maybe)

There is extensive work on crash-proof systems, be it operating
systems or data base systems. In our opinion, this work is
confusing in that the objective is not clearly stated.

Sometimes the objective is stated as the desire that no data be
lost when power is lost. But the solution to that problem already
exists in every laptop computer; it simply provides a battery that
allow the system to continue to work, or to be shut down in a
controlled way.

Other times, the objective is stated as a protection against
defective software, so that data is stored at regular intervals
(checkpointing) perhaps combined with a transaction log so that the
state of the system immediately before a crash can always be
recovered. But it is very hard to protect oneself against defective
software. There can be defects in the checkpointing code or in the
code for logging transactions, and there can be defects in the
underlying file system. We believe that it is a better use of
developer time to find and eliminate defects than to aim for a
recovery as a result of existing defects.

Multiple simultaneous environments

To allow for a user to add methods to standard generic functions
(such as print-object) without interfering with other users, we
suggest that each user gets a different global environment. The
environment maps names to objects such as functions, classes,
types, packages, and more. Immutable objects (such as the
common-lisp package) can exist in several different environments
simultaneously, but objects (such as the generic function
print-object would be different in different environments.

Multiple environments would also provide more safety for users in
that if a user inadvertently removes some system feature, then it
can be recovered from a default environment, and in the worst case
a fresh default environment could be installed for a user who
inadvertently destroyed large parts of his or her environment.

Finally, multiple environments would simplify experimentation with
new features without running the risk of destroying the entire
system. Different versions of a single package could exist in
different environments.

How to accomplish it

The most important aspect of a Lisp operating system is not that
all the code be written in Lisp, but rather to present a Lisp-like
interface between users and the system and between applications and
the system. It is therefore legitimate to take advantage of some
existing system (probably Linux or some BSD version) in order to
provide services such as device drivers, network communication,
thread scheduling, etc.

Create a Lisp system to be used as basis

The first step is to create a Common Lisp system that can be used
as a basis for the Lisp operating system. It should already allow
for multiple environments, and it should be available on 64-bit
platforms. Preferably, this system should use as little C code as
possible and interact directly with the system calls of the
underlying kernel.

Create a single-user system as a Unix process

The next step is to transform the Common Lisp system into an
operating system in the sense of the API for users and
applications. This system would contain the object store, but
perhaps not access control functionality.

When this step is accomplished, it is possible to write or adapt
applications such as text editors, inspectors, debuggers, GUI
interface libraries, etc. for the system.

Create device drivers

The final step is to replace the temporary Unix kernel with native
device drivers for the new system and to turn the system into a
full multi-user operating system.

--
robert.strandh@gmail.com
ViewGit