Red Echo

September 27, 2015

Deep Playa 2015

Well, that was a fun weekend, out in the trees near Sedro-Woolley. This was apparently the fourth year of the Deep Playa campout and it looked to be around 300 people this time. There were interesting art projects, fun activities, decent music, and overall a happy burnery festival vibe despite the cold damp weather.

AJ and I camped out in our big truck, as is becoming usual, and while it really needs a heater, at least it’s insulated and we had a generator powering the electric blanket. We also hung a big tarp off the side of the truck and made a shaded area where we could set up the propane camp fire – and lo there was much gathering and enjoying on Saturday night, as everyone was pretty much clustered up around one fire or another.

I brought the small version of my sound system and set it up by our camp, renegade-style. Two 15″ subs and two Mackie 450 tops – it was more sound than we needed, honestly, and I had a great time rocking the neighborhood with it. I played an electroswing set on Friday afternoon, and three psytrance sets at various other times when the mood struck me. I also got to play glitch-hop on the big main stage sound system Saturday night – it was a little challenging, perhaps due to the cold, but it went well anyway and I’m glad I did it.

Tomorrow it’ll be time to unpack; tonight I’m making an early night of it.

September 24, 2015

I did a little research and the pieces of this plan are becoming clear. Virtio appears to be a totally reasonable platform abstraction API, and KVM will do the job as a hypervisor. I’ll set up an x86_64-elf gcc crosscompiler and use newlib as the C library. Each executable will have its own disk image, and exec will function by spawning a new VM and booting it with the target executable.

The missing piece, so far as I can tell, is a proxy representation of the hypervisor’s management interface which can be provided to a guest OS, so that our VMs can virtualize themselves – and pass on a proxy for their own hypervisor proxy, so that subprocesses can virtualize themselves in turn, recursively. This would enable the construction of a guest-OS shell managing an array of processes which are themselves independent guest-OS machines. Current thought: define the ‘virsh’ terminal interface as a serial protocol, then write a linux-side launcher process that creates a pipe-based virtual serial device and hands it off when starting up the first guest process.
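To make the "virsh as a serial protocol" idea a little more concrete, here is a rough sketch of what parsing one request line on the guest-facing serial link might look like. The wire format is purely my assumption: one text line per request, with verbs borrowed from the virsh subcommands list, start, and destroy. The names vm_op, vm_request, and parse_request are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical request verbs, modeled on virsh subcommand names. */
enum vm_op { OP_INVALID, OP_LIST, OP_START, OP_DESTROY };

typedef struct vm_request {
	enum vm_op op;
	char domain[64];	/* target VM name, empty for "list" */
} vm_request;

/* Parse one line such as "start web" or "list" (assumed format). */
vm_request parse_request(const char *line)
{
	vm_request r = { OP_INVALID, "" };
	char verb[16] = "";
	int n = sscanf(line, "%15s %63s", verb, r.domain);

	if (n == 1 && strcmp(verb, "list") == 0)
		r.op = OP_LIST;
	else if (n == 2 && strcmp(verb, "start") == 0)
		r.op = OP_START;
	else if (n == 2 && strcmp(verb, "destroy") == 0)
		r.op = OP_DESTROY;
	return r;
}
```

A launcher would read lines like these from its end of the pipe and translate them into real hypervisor operations; a guest that wants to hand a restricted proxy to its own children could filter the same stream.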

With the launcher and the multitasking shell in place, a toolchain targeting this baremetal environment, and an array of virtio device drivers in the form of static libs you can link in, the platform would be ready to go.

September 23, 2015

To simplify a bit further: I want to throw away the traditional “operating system” entirely, use the hypervisor as a process manager, use virtual device IO for IPC, and implement programs as unikernels.

I think this could all be done inside Linux, using KVM or Lguest, constructing the secure new world inside the creaky, complex old one.

September 22, 2015

Perhaps the reason I can’t sell myself on a specific minimal microkernel interface is that the system I want to build is not a microkernel at all. What I really want is no interface, no API, but an exokernel system where every program is written as though it were the only occupant of a single machine.

The interior space of a POSIX machine is so complex I’ve given up on the prospect of securing it, but hypervisors seem to have accomplished the job of secure isolation well enough to make the whole “cloud computing” business work. What if processes in this hypothetical environment were merely paravirtualized machines? Each executable would be a single-purpose “operating system” for a virtual machine.

A hypervisor takes the place of the traditional kernel, VirtIO devices stand in for the usual device-manipulation syscalls, and the shell becomes a HID multiplexer. Since each process sees itself as a separate machine, there is no longer any requirement for a shared mutable filesystem; instead of communicating by manipulating shared resources, processes must share resources by communicating.

From this perspective it is no longer important to know whether the system is running on bare metal or within some other host OS. Each process merely interacts with some array of devices to accomplish some defined task. An instance of this system built for a bare-metal environment would have to include drivers for actual devices so that they can be represented as virtio elements, but from the perspective of a program, inside its paravirtual machine, it simply doesn’t matter how many layers of emulation are stacked up above.

This offers a lovely progressive path toward implementation of the various components necessary for a useful operating system, since they can be implemented one by one as QEMU guests. In effect, it’s a redefinition of the API: instead of looking at the traditional POSIX style syscall interfaces as the OS API, we simply define the notional standard PC implied by virtio as the system interface, and anything capable of running on such machine becomes a valid element of the overall system.

In effect, this means that KVM becomes the kernel, and my project would be a shell program which can multiplex a set of interface devices among an array of VMs containing the actual programs I want to use.

September 18, 2015

Now THAT’S a 3D printer

I’ve been reluctant to get on the 3D-printing hype train since I have trouble thinking of anything I would actually want to make with one – who needs more cheap plastic crap cluttering up their lives? But this is a 3D printing technology that seems like it might actually be useful – Hershey has announced a chocolate printer:

“We are now using 3-D technology to bring Hershey goodness to consumers in unanticipated and exciting ways,” said Will Papa, Chief Research and Development Officer, The Hershey Company. “3-D printing gives consumers nearly endless possibilities for personalizing their chocolate, and our exhibit will be their first chance to see 3-D chocolate candy printing in action.”

September 15, 2015

“Interim OS” project for ARM

Simple OS project for the Raspberry Pi with information about getting a kernel to boot.

September 4, 2015

Things I’d still like to improve in this hypothetical kernel interface:

– access() and measure() are blatantly inefficient and really kind of terrible; you should just get that information for free when the message comes in, and if you want to inquire about object state, the call should let you ask about a whole batch of objects at once, to reduce the impact of syscall overhead.

– the mailbox design is sort of excessively clever, not likely to survive contact with the real world. I should just make different structs for incoming and outgoing messages.

– the idea of using a single syscall for all interactions with the outside world feels really nice, but I’m not sure I’ve gotten it right yet.

– I have a strong hunch that it will be important to resize queues some time.

– It feels wrong that there’s no way to cancel a message read and send some kind of fail signal back to the sender. Perhaps the solution would be to process send errors asynchronously, as messages received? But then you would need a bidirectional pipe, which I’ve been doing my best to avoid so far.

– extend() is the wrong name but I haven’t thought of the right one yet.

– every process can currently allocate memory willy-nilly, which feels like a contradiction with the overall exokernel style. Perhaps you should have to request a block of address space from a specific allocator… This would make an address space hierarchy easier, and would make it possible to provide feedback about memory pressure. Right now it’s impossible to impose policy.

– the previous draft, which I didn’t publish, had a notion I liked called a “bundle” – you could pack an array of objects up as a single object, send it around as an indivisible unit, and unpack it again later. It occurred to me that queues are not entirely dissimilar: what if you could create a pipe, push a bunch of stuff into it, then send the whole pipe with all of its contents to some other object? On receipt it would be a pipe with both send and receive permission.

– I still think there ought to be a way to share writable memory through some kind of transactional key-value mechanism.

– It makes me really happy that there is no file system.

I have no idea whether I’ll actually implement any of this, but I have three specific implementation concepts in mind providing constraints as I work on the design.

The first is naturally the idea of building out a full scale desktop/laptop computer operating system, suitable for all my daily computing activities – doesn’t every systems developer fantasize about throwing it all away and starting over? The capability / exokernel strategy has some significant security benefits, and the lack of a global filesystem, or any way to implement global mutations at all, means that every layer of the system can insulate itself against the layers underneath. It also provides a mechanism allowing the user’s shell to lie, cheat, and manipulate programs to make them do what the user wants, whether they like it or not, which makes me happy when I swear at stupid javascript crap.

Of course this will never happen. An embedded RTOS for microcontroller projects is small enough that I could feasibly implement it on my own, however, and I’ve actually done so in the past – in a limited, ad-hoc way – when I worked at Synapse.

This is the second project I think about as I consider the kernel architecture: a small, efficient kernel suitable for embedded realtime applications. There are several actions which can take advantage of an MMU’s virtual addressing features if present, but Trindle will get by just fine without it – while benefiting greatly from the kind of simple memory protection features found on high-end microcontrollers.

The third and simplest project would implement the Trindle kernel as a user-space library for Unix systems, which could help an application manage its parallel data processing needs by spawning a fleet of worker threads and managing their interactions. In this environment, we have no direct control over the MMU, but we can still get basically what we need through judicious use of mmap/mprotect/munmap.
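A minimal sketch of how the user-space variant might lean on those three calls, assuming a POSIX system that supports anonymous mappings via MAP_ANONYMOUS (Linux and the BSDs do). The helper names segment_create, segment_freeze, and segment_destroy are hypothetical, not part of the Trindle draft.

```c
#define _DEFAULT_SOURCE	/* expose MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical: map a zero-filled, read/write region of `bytes` bytes. */
void *segment_create(size_t bytes)
{
	void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical: drop write permission, as when a segment becomes shared. */
int segment_freeze(void *p, size_t bytes)
{
	return mprotect(p, bytes, PROT_READ);
}

/* Hypothetical: unmap the region once the last holder lets go of it. */
int segment_destroy(void *p, size_t bytes)
{
	return munmap(p, bytes);
}
```

Since all worker threads share one address space, mprotect changes apply to every thread at once, so this can only approximate per-process access rights; it catches stray writes rather than enforcing isolation.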

I don’t really know yet how useful this would be as an actual tool, but it seems like it would be easy enough to try it out and see what happened.

Another Trindle draft

I had trouble sleeping last night so I spent a couple of hours writing up another draft of the Trindle kernel system call interface. I’ve managed to knock the complexity down a bit further without losing any functionality. Still has some issues to noodle over, but they’re growing increasingly minor and I think it’s at the point now where I could build it and it might actually work.

Every kernel-managed entity visible in user space is an object. Every object has a globally unique address. This value is only useful within a process which has access permission for that object.

typedef void *object_t;

What is the current process allowed to do with the object at this address? The result will be a bitmask of the relevant access rights from the enum.

enum {
	ACCESS_READ = 1,		// can read from this segment
	ACCESS_WRITE = 2,		// can write to this segment
	ACCESS_EXECUTE = 4, 	// can execute code inside this segment
	ACCESS_SEND = 8,		// can transfer messages into this pipe
	ACCESS_RECEIVE = 16,	// can receive messages from this pipe
};
int access(object_t);
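Since the result is a bitmask, checking for a combination of rights is a single mask-and-compare. A small sketch; has_rights is a hypothetical helper of mine, and the mask passed in stands for whatever access() would return.

```c
#include <assert.h>

enum {
	ACCESS_READ = 1,
	ACCESS_WRITE = 2,
	ACCESS_EXECUTE = 4,
	ACCESS_SEND = 8,
	ACCESS_RECEIVE = 16
};

/* Hypothetical helper: true when every bit in `wanted` is set in `mask`. */
int has_rights(int mask, int wanted)
{
	return (mask & wanted) == wanted;
}
```

For example, has_rights(access(obj), ACCESS_READ | ACCESS_WRITE) would tell you whether a segment is currently safe to modify.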

How large is this object? For a memory segment, this is its size in bytes; for a pipe, this is a lower bound on the number of objects in its queue.

size_t measure(object_t);

A segment is a contiguous block of memory with a common access right. The object address is a pointer to the first byte in the block. Create a new segment by concatenating some arbitrary number of source buffers together. The kernel may zerofill the buffer up to a more convenient size. A source buffer with an address of NULL represents zerofill, not an actual copy. A new segment will have ACCESS_READ|ACCESS_WRITE.

typedef object_t segment_t;
typedef struct buffer_t {
	size_t bytes;
	uint8_t *address;
} buffer_t;
segment_t allocate(size_t, const buffer_t[]);
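Here is a rough user-space stand-in for the concatenation behavior, assuming the first argument to allocate is the number of source buffers. It uses the heap instead of a real kernel segment, does no rounding up to a convenient size, and does not guard against size overflow; allocate_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct buffer_t {
	size_t bytes;
	uint8_t *address;
} buffer_t;

/* Concatenate `count` buffers into one zero-initialized heap block.
   A NULL address means "zerofill this many bytes" rather than a copy.
   Returns NULL on failure or when the total size is zero. */
void *allocate_sketch(size_t count, const buffer_t bufs[])
{
	size_t total = 0;
	for (size_t i = 0; i < count; i++)
		total += bufs[i].bytes;
	if (total == 0)
		return NULL;

	uint8_t *seg = calloc(total, 1);
	if (!seg)
		return NULL;

	size_t off = 0;
	for (size_t i = 0; i < count; i++) {
		if (bufs[i].address)
			memcpy(seg + off, bufs[i].address, bufs[i].bytes);
		off += bufs[i].bytes;	/* NULL buffers just skip ahead */
	}
	return seg;
}
```

Passing a single buffer with a NULL address gives you a plain zeroed block, which covers the ordinary malloc-style case.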

Processes send and receive messages through fixed-length queues called pipes. Any number of processes may send messages to a single pipe, but only one process may read from it at a time. A pipe is an abstract object, not a memory segment. A new pipe will have ACCESS_SEND|ACCESS_RECEIVE.

typedef object_t pipe_t;
pipe_t pipe(size_t queue_items);
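To illustrate the fixed-length queue behavior, here is a single-threaded ring buffer model of a pipe. It is only a sketch: a real implementation would need locking or atomics for multiple senders, and the pipe_model names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *object_t;

/* Hypothetical user-space model of a pipe: a fixed-capacity FIFO. */
typedef struct pipe_model {
	size_t capacity, head, count;
	object_t *items;
} pipe_model;

pipe_model *pipe_model_new(size_t queue_items)
{
	pipe_model *p = malloc(sizeof *p);
	if (!p)
		return NULL;
	p->items = calloc(queue_items ? queue_items : 1, sizeof *p->items);
	if (!p->items) {
		free(p);
		return NULL;
	}
	p->capacity = queue_items;
	p->head = p->count = 0;
	return p;
}

/* Returns 0 on success, -1 when the queue is full (the send fails). */
int pipe_model_send(pipe_model *p, object_t obj)
{
	if (p->count == p->capacity)
		return -1;
	p->items[(p->head + p->count) % p->capacity] = obj;
	p->count++;
	return 0;
}

/* Returns the oldest queued object, or NULL when the queue is empty. */
object_t pipe_model_receive(pipe_model *p)
{
	if (p->count == 0)
		return NULL;
	object_t obj = p->items[p->head];
	p->head = (p->head + 1) % p->capacity;
	p->count--;
	return obj;
}

void pipe_model_free(pipe_model *p)
{
	free(p->items);
	free(p);
}
```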

A process communicates with the rest of the world by sending and receiving messages. A message describes a state change involving an object and/or a communication pipe.

typedef struct message_t {
	pipe_t address;
	object_t content;
} message_t;

For efficiency, messages are exchanged in batches, sending and receiving as many at a time as possible. A batch of messages is called a mailbox.

typedef struct mailbox_t {
	size_t count;
	message_t *address;
} mailbox_t;

An outgoing message can accomplish three different jobs, depending on which fields you populate with non-NULL values.

  • both populated: share the content object by sending it through the pipe
  • address only, content NULL: receive messages from the specified pipe
  • content only, address NULL: release access to the specified object
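The three cases above boil down to a check on which fields are NULL. A small sketch of that decision; the out_kind labels and classify_outgoing are hypothetical names, and treating a message with both fields NULL as invalid is my assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef void *object_t;
typedef object_t pipe_t;

typedef struct message_t {
	pipe_t address;
	object_t content;
} message_t;

/* Hypothetical labels for the three outgoing jobs. */
enum out_kind { OUT_INVALID, OUT_SHARE, OUT_RECEIVE, OUT_RELEASE };

enum out_kind classify_outgoing(const message_t *m)
{
	if (m->address && m->content)
		return OUT_SHARE;	/* send content through the pipe */
	if (m->address)
		return OUT_RECEIVE;	/* ask for messages from this pipe */
	if (m->content)
		return OUT_RELEASE;	/* give up access to this object */
	return OUT_INVALID;		/* both NULL: assumed meaningless */
}
```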

Prepare a list of outgoing messages: the outbox. Fill out an array of message_t, then provide the address of the array base and the item count. Allocate a second array of message_t for incoming messages: the inbox. Provide the address of this array and the maximum number of messages the array can hold. Then call sync to let the system transfer as many messages as it can manage.

void sync(mailbox_t *out, mailbox_t *in);

On return, the outbox will have been sorted, grouping all of the failed messages at the beginning of the buffer, updating out->count with the number of messages which could not be sent (hopefully zero).
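The outbox bookkeeping amounts to a partition of the message array. Here is a sketch of how that grouping might work, assuming some per-message failure flag is available internally; the failed[] array and group_failures are hypothetical. Failed messages keep their relative order, while the order of the delivered ones is not preserved.

```c
#include <assert.h>
#include <stddef.h>

typedef void *object_t;
typedef object_t pipe_t;
typedef struct message_t { pipe_t address; object_t content; } message_t;
typedef struct mailbox_t { size_t count; message_t *address; } mailbox_t;

/* failed[i] is nonzero when message i could not be sent. Moves the failed
   messages to the front and stores the number of failures in out->count. */
void group_failures(mailbox_t *out, const int failed[])
{
	size_t front = 0;
	for (size_t i = 0; i < out->count; i++) {
		if (!failed[i])
			continue;
		/* Slots [front, i) hold delivered messages, so swapping
		   never disturbs a message we have yet to examine. */
		message_t tmp = out->address[front];
		out->address[front] = out->address[i];
		out->address[i] = tmp;
		front++;
	}
	out->count = front;
}
```

After the call the caller can simply loop over the first out->count entries and decide whether to retry each one.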

When a send fails, it is either because the recipient pipe has closed or because its queue was temporarily full. You can determine which it was by checking to see whether you still have ACCESS_SEND for the pipe specified in the failed message’s address.
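In code, that check is a one-liner on the access bitmask. A tiny sketch; the send_failure labels and diagnose_send_failure are hypothetical, and `rights` stands for whatever access() would report for the failed message's pipe.

```c
#include <assert.h>

enum { ACCESS_SEND = 8 };	/* value taken from the access enum */

enum send_failure { SEND_QUEUE_FULL, SEND_PIPE_CLOSED };

/* Still holding ACCESS_SEND means the pipe is alive and was merely full. */
enum send_failure diagnose_send_failure(int rights)
{
	return (rights & ACCESS_SEND) ? SEND_QUEUE_FULL : SEND_PIPE_CLOSED;
}
```

A full queue is worth retrying on the next sync; a closed pipe is not.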

On return, the inbox may also have been populated with incoming messages, and in->count will have been changed to reflect the number of messages that were received. The content of the remaining array items is undefined.

An incoming message can communicate several different changes of state depending on which fields are populated with non-NULL values.

  • Both address and content: we received a message from an input pipe.
  • content only: we now have exclusive ownership of this object.
  • address only: the receiver has released this pipe and it is now closed.
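A receive loop would dispatch on the same NULL-or-not pattern. A sketch of that dispatch; the in_kind labels and classify_incoming are hypothetical names, and the "both NULL" case is my assumption for an unused slot.

```c
#include <assert.h>
#include <stddef.h>

typedef void *object_t;
typedef object_t pipe_t;

typedef struct message_t {
	pipe_t address;
	object_t content;
} message_t;

/* Hypothetical labels for the three incoming state changes. */
enum in_kind { IN_EMPTY, IN_MESSAGE, IN_OWNERSHIP, IN_PIPE_CLOSED };

enum in_kind classify_incoming(const message_t *m)
{
	if (m->address && m->content)
		return IN_MESSAGE;	/* content arrived on an input pipe */
	if (m->content)
		return IN_OWNERSHIP;	/* we now own this object exclusively */
	if (m->address)
		return IN_PIPE_CLOSED;	/* receiver released this pipe */
	return IN_EMPTY;		/* assumed: not a real message */
}
```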

What does it mean to have exclusive access to an object, and why would you want to release it?

A segment can only be safely modified when there is exactly one process with access to its contents. If one process shares a segment object with another, the sender will lose ACCESS_WRITE and the receiver will gain only ACCESS_READ.

Should the sender later release its access to the segment, however, such that there remained exactly one process with access, the one remaining process would then gain ACCESS_WRITE for that segment, whether or not it had anything to do with the segment’s original creation.

A process can therefore transfer read/write access to a segment in one sync by sending the segment through a pipe and then by releasing its own access. When the last process releases the resource, so nobody has access to it any longer, the kernel will delete it.
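The segment rules can be modeled with nothing more than a count of holders. This is a simplified sketch under my own assumptions: every share goes to a process that did not already have access, and executable segments are ignored. The segment_model names are hypothetical.

```c
#include <assert.h>

enum { ACCESS_READ = 1, ACCESS_WRITE = 2 };

typedef struct segment_model {
	unsigned holders;	/* number of processes with access */
	int deleted;		/* set once the last holder releases */
} segment_model;

/* Rights seen by any current holder: writable only when held alone. */
int segment_rights(const segment_model *s)
{
	if (s->deleted || s->holders == 0)
		return 0;
	return s->holders == 1 ? (ACCESS_READ | ACCESS_WRITE) : ACCESS_READ;
}

/* Sending the segment through a pipe adds one more holder. */
void segment_share(segment_model *s)
{
	s->holders++;
}

/* Releasing drops one holder; the kernel deletes the segment at zero. */
void segment_release(segment_model *s)
{
	if (s->holders > 0 && --s->holders == 0)
		s->deleted = 1;
}
```

Share followed by release in the same sync leaves the holder count at one, which is exactly the read/write handoff described above.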

Pipes work differently: any number of processes can have ACCESS_SEND, but only the creating process can ever have ACCESS_RECEIVE. When the creating process releases its access to the pipe, the pipe goes dead and all the other processes will instantly lose ACCESS_SEND.

Every process has ACCESS_EXECUTE to the segment which contains its machine code. ACCESS_EXECUTE and ACCESS_WRITE are mutually exclusive, so code segments are read-only whether they are owned by one process or many.

Each new process starts up with an input queue providing access to whatever resources the launching process has chosen to share with it.

typedef void (*entrypoint_t)(pipe_t input);

The entrypoint function cannot return since it lives at the base of the thread stack. When it’s done with its work it should call exit, passing in whichever object represents its output. If something goes horribly wrong, it can bail out on the error path instead.

void exit(object_t);
void abort(object_t);

If you have loaded or generated some code and you want to execute it, you can acquire execute permission for a segment. Once executable, a segment cannot be made writable again; you must release and recreate it if you want to change it.

void extend(segment_t);
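In terms of the access bitmask, and assuming extend is the call that grants execute permission, the transition might look like the sketch below: the write bit is dropped and the execute bit is set, so the two never coexist. Both helper names are hypothetical.

```c
#include <assert.h>

enum { ACCESS_READ = 1, ACCESS_WRITE = 2, ACCESS_EXECUTE = 4 };

/* Rights a segment would carry after becoming executable. */
int rights_after_extend(int rights)
{
	return (rights & ~ACCESS_WRITE) | ACCESS_EXECUTE;
}

/* Invariant: ACCESS_WRITE and ACCESS_EXECUTE are mutually exclusive. */
int rights_are_valid(int rights)
{
	int both = ACCESS_WRITE | ACCESS_EXECUTE;
	return (rights & both) != both;
}
```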

An existing process may launch a new one, specifying an entrypoint and a pair of completion pipes that will be notified when the process terminates. The entrypoint MUST be located inside a segment with ACCESS_READ. The launch function returns the ACCESS_SEND end of a pipe representing the new process’ main input queue. The out pipe will receive the process’ final output object when it exits; if it aborts, the err pipe will get the report instead.

pipe_t launch(entrypoint_t, size_t queue_count, pipe_t out, pipe_t err);