Most computer users today are nontechnical people whose expectations of a computer are shaped by their experience with TV sets and stereos: you buy it, plug it in, and it works perfectly for the next 10 years. Unfortunately, they are often disappointed, because computers are not very reliable when measured against the standards of other consumer electronics devices.
A large part of the problem is the operating system, which often consists of millions of lines of kernel code, any one of which can potentially bring the system down. The worst offenders are the device drivers, which have been shown to have bug rates 3 to 7 times higher than the rest of the system. As long as the operating system remains a single huge monolithic program, full of foreign code and running entirely in kernel mode, the situation will only get worse. There have been ad hoc attempts to patch legacy systems, but what is needed is a different approach.
To provide much higher reliability, we have created a new multiserver operating system with only 15,000 lines of kernel code; the rest of the operating system is split into small components, each running as a separate user-mode process. For example, each device driver runs as a separate process and is rigidly controlled by the kernel, which grants it the absolute minimum of privileges so that bugs in it cannot damage other system components. A reincarnation server periodically tests each user-mode component and automatically replaces failed or failing components on the fly, without bringing the system down and, in some cases, without affecting user processes.
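As a rough illustration of the reincarnation-server idea, the following Python sketch models one periodic health-check sweep over a set of components. It is a toy simulation under stated assumptions, not MINIX 3 code: the `Component` and `ReincarnationServer` classes and the `ping`, `restart`, and `check_once` methods are hypothetical names, and in a real system each component would be a separate process queried by messages with a timeout rather than an in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A hypothetical user-mode system component (e.g. a driver process)."""
    name: str
    healthy: bool = True
    generation: int = 0  # how many times this component has been replaced

    def ping(self) -> bool:
        # Hypothetical stand-in for a status request; a real system would
        # send a message and treat a missing or late reply as a failure.
        return self.healthy

    def restart(self) -> None:
        # Hypothetical stand-in for killing the old process and starting
        # a fresh copy of the same component.
        self.healthy = True
        self.generation += 1


@dataclass
class ReincarnationServer:
    components: list[Component]
    restarts: list[str] = field(default_factory=list)  # log of replaced components

    def check_once(self) -> None:
        """One periodic sweep: test every component, replace any that fail."""
        for comp in self.components:
            if not comp.ping():
                comp.restart()
                self.restarts.append(comp.name)


disk = Component("disk_driver")
net = Component("net_driver")
rs = ReincarnationServer([disk, net])

net.healthy = False  # simulate a crash in the network driver
rs.check_once()
print(rs.restarts)                     # ['net_driver']
print(net.generation, disk.generation)  # 1 0
```

In a running system, `check_once` would be invoked on a timer. Only the component that fails its check is replaced, while the healthy disk driver is left untouched, which mirrors the on-the-fly replacement described above.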
The talk will discuss the architecture of this system, called MINIX 3, which can be downloaded for free from www.minix3.org.