This started out as a way to associate I/O commands with the applications that issued them; it ended up being a long journey into computing history. Because of its length, I'll present the design first and then explain it, in the hope of anticipating any questions.
In a conventional computer, the system architecture puts the CPU and the RAM together, and any other device attached to the system is either controlled directly by the CPU or performs DMA and then interrupts the CPU. Frequently the interrupt mechanism is limited to device-to-CPU communication, and it conveys nothing about the context of the completed I/O operation.
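To make the problem concrete, here is a rough sketch of one common way a conventional system ties a completion back to its requester. It is only an illustration: the `Device` and `Kernel` classes, the request ids and the `woken` list are all made up for this post and do not model any particular OS. The point is that only the IRQ number crosses to the CPU, so the kernel has to keep its own table to recover the context.

```python
class Device:
    """Hypothetical device that finishes requests and raises a bare IRQ."""
    def __init__(self, irq):
        self.irq = irq
        self.completed = []  # ids of requests the device has finished

    def complete(self, request_id, kernel):
        self.completed.append(request_id)
        # Only the IRQ number reaches the CPU; no context travels with it.
        kernel.raise_irq(self.irq)


class Kernel:
    """Hypothetical kernel that must remember who issued each request."""
    def __init__(self):
        self.devices = {}  # irq -> device
        self.pending = {}  # (irq, request_id) -> owning application
        self.woken = []    # applications notified, in order

    def submit(self, device, request_id, owner):
        self.devices[device.irq] = device
        self.pending[(device.irq, request_id)] = owner

    def raise_irq(self, irq):
        # The handler has to ask the device what finished, then look up
        # the owner in its own bookkeeping.
        device = self.devices[irq]
        while device.completed:
            request_id = device.completed.pop(0)
            owner = self.pending.pop((irq, request_id))
            self.woken.append(owner)


kernel = Kernel()
disk = Device(irq=14)
kernel.submit(disk, request_id=1, owner="shell")
disk.complete(1, kernel)
```

Every layer between the IRQ and the application exists only to rebuild information the device could have sent along in the first place.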
The SViSSA (Synchronizing Virtual Storage System Architecture) is a systematic approach to system architecture. The SViSSA is based on two principles:
A PC developed with SViSSA could have a chipset divided into two main units: the CPU and the IO unit. Each has its own MMU, but the MMUs, per SViSSA, have compatible address formats, so addresses can be passed from one to the other. The IO unit would contain the various controllers, including the mouse, keyboard, permanent storage and NIC controllers, and would also be the attachment point for heavier peripherals such as the graphics card.
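A minimal sketch of what "compatible address formats" buys, assuming a simple single-level page table and a 4 KiB page size. Neither of those is specified by the SViSSA; they are picked here only to keep the example short, and the `AddressSpace` and `MMU` names are hypothetical. Both MMUs walk the same table, so a virtual address handed from the CPU side to the IO side resolves to the same physical location.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size, not part of the design


@dataclass
class AddressSpace:
    """Page table in a format both MMUs understand: page number -> frame number."""
    asid: int
    pages: dict = field(default_factory=dict)


class MMU:
    def __init__(self, name):
        self.name = name
        self.spaces = {}  # asid -> AddressSpace

    def attach(self, space):
        self.spaces[space.asid] = space

    def translate(self, asid, vaddr):
        space = self.spaces[asid]
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in space.pages:
            raise LookupError(f"{self.name}: page {vpn:#x} not mapped")
        return space.pages[vpn] * PAGE_SIZE + offset


# One address space (say, the shell's), visible to both units.
shell_space = AddressSpace(asid=7, pages={0x10: 0x2A})
cpu_mmu = MMU("cpu")
io_mmu = MMU("io")
cpu_mmu.attach(shell_space)
io_mmu.attach(shell_space)

vaddr = 0x10 * PAGE_SIZE + 0x123
cpu_paddr = cpu_mmu.translate(7, vaddr)
io_paddr = io_mmu.translate(7, vaddr)
```

Here `cpu_paddr` and `io_paddr` come out equal, which is all a device needs in order to accept a pointer straight from an application.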
Let us assume a command-line shell running atop a TTY on this computer. The TTY controls input and output ring buffers in memory, and the shell accesses them via shared memory. We do not need to assume that the shell is a process, or that the TTY is in kernel mode; such assumptions concern the software interfaces, which we are not focusing on here.
When the user hits a key for a printing character, the keyboard controller (on the mainboard) sends the resulting character code to the terminal emulator's buffer via DMA, and also interrupts the GPU, passing it the address of the ring buffer so that it can render the character. When the user hits Enter, the keyboard controller interrupts the CPU. What the interrupt entry points to can vary, but one entirely valid possibility in this scenario is that it points to the address of an event handler, in the shell's address space, that processes the buffered input. In this case, the CPU can either jump directly into that shell event handler and start processing the data immediately, or jump into an intermediate handler that schedules the shell's handler to run later (effectively becoming an IO thread).
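The flow above can be played through in a toy simulation. This is a sketch under assumptions, not a specification: every class, the two addresses, and the `deferred` flag are invented for illustration, and a Python dictionary stands in for memory. It shows the two interrupt targets (the GPU gets the buffer address, the CPU gets a handler address) and the two ways the CPU can react.

```python
from collections import deque

BUFFER_ADDR = 0x4000   # hypothetical address of the TTY input ring buffer
HANDLER_ADDR = 0x8000  # hypothetical address of the shell's event handler


class RingBuffer:
    """Stands in for the TTY's input ring buffer in shared memory."""
    def __init__(self, capacity=64):
        self.data = deque(maxlen=capacity)


class GPU:
    def __init__(self, memory):
        self.memory = memory
        self.rendered = []

    def interrupt(self, buffer_address):
        # The interrupt payload is the buffer address; draw its newest byte.
        self.rendered.append(self.memory[buffer_address].data[-1])


class CPU:
    def __init__(self, handlers, deferred=False):
        self.handlers = handlers  # handler address -> callable
        self.deferred = deferred
        self.run_queue = deque()

    def interrupt(self, handler_address):
        handler = self.handlers[handler_address]
        if self.deferred:
            self.run_queue.append(handler)  # intermediate handler: run later
        else:
            handler()                       # jump straight into the shell

    def run_pending(self):
        while self.run_queue:
            self.run_queue.popleft()()


class KeyboardController:
    def __init__(self, memory, buffer_address, gpu, cpu, handler_address):
        self.memory = memory
        self.buffer_address = buffer_address
        self.gpu = gpu
        self.cpu = cpu
        self.handler_address = handler_address

    def key(self, char):
        if char == "\n":
            self.cpu.interrupt(self.handler_address)
        else:
            self.memory[self.buffer_address].data.append(char)  # the "DMA"
            self.gpu.interrupt(self.buffer_address)


memory = {BUFFER_ADDR: RingBuffer()}
lines = []


def shell_on_line():
    """Shell event handler: consume whatever is in the input buffer."""
    buf = memory[BUFFER_ADDR].data
    lines.append("".join(buf))
    buf.clear()


gpu = GPU(memory)
cpu = CPU({HANDLER_ADDR: shell_on_line})
kbd = KeyboardController(memory, BUFFER_ADDR, gpu, cpu, HANDLER_ADDR)
for ch in "ls\n":
    kbd.key(ch)
```

Constructing the CPU with `deferred=True` gives the second variant: Enter only queues the shell's handler, and nothing is consumed until `run_pending()` is called.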
The advantage of this is that many layers of device handling code can be bypassed. Indeed, in such a PC, the kernel would be limited to merely configuring the controller's interrupt port, and the TTY would have the actual responsibility of "hooking up" the keyboard controller to the GPU and the applications requesting its input.
At the time I was struggling to understand how real-world computers did it. I knew that interrupts only signaled that a device had completed an I/O operation, but I didn't know how the I/O operation was linked to the application that had issued it. I then turned to the related problem of transmitting virtual addresses to devices, and eventually hit on the idea of sending back the application's context as part of the interrupt. I did eventually grasp how real systems do it, but I felt that this idea had to be shared, especially since it doesn't seem to be mentioned anywhere. Even IBM mainframes don't do this AFAIK.
Actually, we used to: Micro Channel Architecture (on the PS/2) specified a way for interrupts to be intercepted by any device on the bus, allowing something similar to assigning interrupt ports to any device in the SViSSA. Interrupts still don't carry information in MCA, however.
That said, the short explanation is that hardware manufacturers don't bear the costs of using a bad hardware interface, so they have no incentive to provide a better interface for a better system. Combined with tremendous pressure to lower initial costs, this results in the abhorrent interfaces that firmware has to deal with, and it is part of the reason hardware abstraction layers were developed.
Partly; HSA has unified virtual memory for the CPU and GPU, but the interrupt mechanism doesn't use the virtual memory system. So applications can pass virtual addresses to devices, but interrupts aren't passed straight to applications.
That is partly why this is in the 'On the drawing board' section; even though the idea itself is done, there is little that I can do with it for now. That said, one possibility is to create an emulator for testing purposes. QEMU would be interesting, but I still haven't managed to understand its architecture well enough to add other systems.