80386 Protection
nand2mario.github.io | 111 points by nand2mario 3 days ago
I've wondered for a long time if we could have made do without protected mode (or hardware protection in general) if user code was verified/compiled at load, e.g. the way the JVM or .NET do it... Could the shift in transistor budget have been used to offset any performance losses?
Microsoft Research had an experimental OS project at one point that did just that, with everything running in ring 0 in the same address space:
https://en.wikipedia.org/wiki/Singularity_(operating_system)
Managed code, the properties of their C#-derived programming language, and static analysis and verification were used rather than hardware exception handling.
Fil-C vs CHERI vs SeL4 vs YOLO
I think hardware protection is usually the easier sell, but not when it's slower or more expensive than the alternative.
"Operating System Principles" (1973) by Per Brinch Hansen. A full microkernel OS (remake of RC-4000 from 1967) written in a concurrent dialect of Pascal, that also manages to make do without hardware protection support.
I think TempleOS also worked like this, though it's certainly better known for its "other" features.
edit: I missed it was linked on the above page
In TempleOS, everything runs in ring 0, but that's not the same as doing protection in software (which would require disallowing any native code not produced by some trusted translator). It simply means there's no protection at all.
I looked into that, and concluded the spoiler is Spectre.
Basically, you have to have out-of-order/speculative execution if you ultimately want the best performance on general/integer workloads. And once you have that, timing information is going to leak from one process into another, and that timing information can be used to infer the contents of memory. As far as I can see, there is no way to block this in software. No substitute for the CPU knowing "that page should not be accessible to this process, activate timing leak mitigation".
OTOH, out-of-order/speculative execution only amounts to information disclosure. And general-purpose OSes (without mandatory access control or multilevel security, which are of mere academic interest) were never designed to protect against that.
A far greater problem is that until very recently, practical memory safety required the use of inefficient GC. Even a largely memory-safe language like Rust actually requires runtime memory protection unless stack depth requirements can be fully determined at compile time (which they generally can't, especially if separately-provided program modules are involved).
I think the interesting thing about having protection in software is you can do things differently, and possibly better. Computers of yesteryear had protection at the individual object level (eg https://en.wikipedia.org/wiki/Burroughs_Large_Systems). This was too expensive to do in 1970s hardware and so performance sucked. Maybe it could be done in software better with more modern optimizing compilers and perhaps a few bits of hardware acceleration here and there? There's definitely an interesting research project to be done.
Sadly, even software-filled TLBs look to be a thing of the past. Apparently a hardware page-table walker is just that much faster? I’m not sure.
Why is that surprising? The trap into kernel mode alone would already take more cycles than dedicated hardware needs for the full page table walk.
Since we're talking about defining our own processor, that means we need to define one with cheaper traps.
Expanding on what I wrote above about "bits of hardware acceleration", maybe adding a few primitives to the instruction set that make page table walking easier would help.
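For concreteness, a software TLB-miss handler basically has to do the same loop a hardware walker does. Here's a toy sketch of a 386-style two-level walk over a simulated word-addressable physical memory; the function name, the `phys` array model, and the error convention are all illustrative, not any real kernel's code:

```c
#include <assert.h>
#include <stdint.h>

#define PRESENT 0x1u

/* One 386-style two-level page table walk. "phys" models physical memory
   as an array of 32-bit words; cr3 and the entries hold 4 KiB-aligned
   physical addresses. Returns the translated physical address, or -1 when
   an entry is not present (where hardware would raise a page fault). */
static int64_t walk386(const uint32_t *phys, uint32_t cr3, uint32_t vaddr)
{
    /* top 10 bits index the page directory */
    uint32_t pde = phys[(cr3 >> 2) + (vaddr >> 22)];
    if (!(pde & PRESENT))
        return -1;

    /* next 10 bits index the page table the PDE points at */
    uint32_t pt  = pde & 0xFFFFF000u;
    uint32_t pte = phys[(pt >> 2) + ((vaddr >> 12) & 0x3FFu)];
    if (!(pte & PRESENT))
        return -1;

    /* frame base from the PTE, low 12 bits straight from the vaddr */
    return (int64_t)((pte & 0xFFFFF000u) | (vaddr & 0xFFFu));
}
```

A software-filled TLB has to do all of this in a trap handler, which is why the trap cost alone dominates; "primitives that make page table walking easier" would essentially be hardware help for the two indexed loads above.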
And with a trusted compiler architecture you don't need to keep the ISA stable between iterations, since it's assumed that all code gets compiled at the last minute for the current ISA.
Lots of fun things to experiment with.
ah, PDE/PTE A/D writes... what a source of variety over the decades!
some chips set them step by step, as shown in the article
others only set them at the very end, together
and then there are chips which follow the read-modify-write op with another read, to check if the RMW succeeded... which promptly causes them to hang hard when the page tables live in read-only memory i.e. ROM... fun fun fun!
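The ROM hang described above falls out naturally once you model the verify-read. A toy sketch (the struct, names, and the error return are mine; real silicon retries the verify forever rather than returning):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P 0x01u   /* Present bit */
#define PTE_A 0x20u   /* Accessed bit, as on the 386 */
#define PTE_D 0x40u   /* Dirty bit */

/* Toy PTE storage: "rom" models page tables placed in read-only memory,
   where stores are silently dropped. Illustrative only. */
typedef struct { uint32_t pte; int rom; } pte_slot;

static void pte_write(pte_slot *s, uint32_t v) { if (!s->rom) s->pte = v; }

/* Read-modify-write of the A (and, for stores, D) bit, followed by the
   extra verify-read that some chips do. Returns 0 on success, -1 when the
   verify fails -- the point where such a chip would loop and hang hard. */
static int set_ad_bits(pte_slot *s, int is_store)
{
    uint32_t want = s->pte | PTE_A | (is_store ? PTE_D : 0);
    pte_write(s, want);      /* the RMW store           */
    if (s->pte != want)      /* the verify read         */
        return -1;           /* ROM-backed PTE: "hang"  */
    return 0;
}
```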
as for segmentation fun... think about CS always being writable in real mode... even though the access rights have an R bit but no W bit for it...
That's because CS in real/V86 mode is actually a writable data segment. Most protection checks work exactly the same in any mode, but the "is this a code segment?" check is only done when CS is loaded in protected mode, and not on any subsequent code fetch.
Using a non-standard mechanism of loading CS (LOADALL or RSM), it's possible to have a writable CS in protected mode too, at least on these older processors.
There's actually a slight difference in the access rights byte that gets loaded into the hidden part of a segment register (aka "descriptor cache") between real and protected mode. I first noticed this on the 80286, and it looks to be the same on the 386:
- In protected mode, the byte always matches that from the GDT/LDT entry: bit 4 (code/data segment vs. system) must be set (the segment load instruction won't allow otherwise), and bit 0 (accessed) is set automatically (and written back to memory).
- In real and V86 mode, both of these bits are clear. So in V86 mode the value is 0xE2 instead of the "correct" 0xF3 for a ring 3 data segment, and similarly in real mode it's 0x82 (ring 0).
The hardware seems to simply ignore these bits, but they still exist in the register, unlike other "useless" bits. For example, LDT only has bit 7 (present), and GDT/IDT/TSS have no access rights byte at all - they're always assumed to be present, and the access rights byte reads as 0xFF. At least on the 286 that was the case, I've read that on the Pentium you can even mark GDT as not-present, and then get a triple fault on any access to it.
Keeping these bits, and having them different between modes might have been an intentional choice, making it possible to determine (by ICE monitor software) in what mode a segment got loaded. Maybe even the two other possible combinations (where bit4 != bit0) have some use to mark a "special" segment type that is never set by hardware?
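The bit positions in the observations above can be checked mechanically. A small sketch of the access-rights byte layout as described in the comment (the helper names are mine, chosen for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Descriptor access-rights byte, per the layout discussed above:
   P = bit 7, DPL = bits 6-5, S = bit 4 (code/data vs. system),
   A = bit 0 (accessed). Helper names are illustrative. */
static int ar_present(uint8_t ar)  { return (ar >> 7) & 1; }
static int ar_dpl(uint8_t ar)      { return (ar >> 5) & 3; }
static int ar_s(uint8_t ar)        { return (ar >> 4) & 1; }
static int ar_accessed(uint8_t ar) { return ar & 1; }
```

Decoding the three values mentioned: 0xF3 (protected-mode ring 3 data) has S and A set, 0xE2 (the V86-mode variant) differs from it in exactly bits 4 and 0, and 0x82 (real mode) is the same shape at DPL 0.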
Interesting to see how hardware designers of yesteryear did things, and why CPUs are so complicated and have so many bugs.
The article states that Win 3.0 used 32-bit flat addressing mode, but when Win 95 launched, MS said Win 3.0 didn't (in 386 mode).
Pretty sure Enhanced Mode, which only came later in Windows 3.11 for Workgroups, is the one that supported the flat addressing mode.
Enhanced mode was already in 3.0 (and I think allowed for flat addressing)
However, Win32s, a subset of the 32-bit Windows API from NT, was introduced in 3.11.
3.11 also introduced 32-bit disk access and 32-bit drivers.
Microsoft did 32-bit in steps -- it was confusing already back then.
I remember I started my internship in June 1995. We were doing stuff with this brand new thing called the World Wide Web.
They gave us a Win 3.1 computer and Spyglass Mosaic, which required the Win32s subsystem.
http://www.win3x.org/win3board/viewtopic.php?t=4971&view=min
The full-time guys all had a Sun on their desk next to their PC. We also had to run an IBM 3270 terminal emulator and an X server to connect to the Suns. It was all so unstable. I remember a bunch of "Win32s error" popups.
The other intern and I found a room full of decommissioned 486 machines, installed Linux and didn't tell anyone for a month. Everything worked great and then we started an assembly line of installing Linux on those old machines for all the older coworkers to take home.
> 3.11 also introduced 32-bit disk access and 32-bit drivers.
IIRC a lot of it wasn't turned on by default due to hardware/driver compatibility concerns, and there were articles all over the place about how to turn it on for extra performance. Essentially they used optimising tech-heads the world over as a giant beta-test group for parts of Win95's IO subsystem.
It used segmented 32-bit mode. Flat mode doesn't support virtual addressing, which was accomplished with the descriptor tables (and the ES register), if I recall correctly. lol, it's been 33 years since I wrote Windows drivers. Had to use MASM to compile the 16-bit segments to thunk to the kernel.
Made me think of the old DESQview
> These features made possible Windows 3.0, OS/2, and early Linux.
And also--before Linux--SCO Xenix and then SCO Unix. It was finally possible to run a real Unix on a desktop or home PC. A real game changer. I paid big $$$ (for me at the time) to get SCO Xenix for my 386 so I could have my own Unix system.
Xenix 2.1 could run on the IBM PC XT with an Intel 8088 in late 1983, IIRC, and even before that on the Altos 586, which had its MMU as an external chip.
For that matter, the "second" version of UNIX ran on a PDP-11/20 with no memory protection or MMU, and there were a few versions after intended to run on similar hardware (LSX, MINI-UNIX).
The PDP-11's MMU option was closer to the 8088's segmentation model I think, but I've never coded either, so dunno really. It does seem like it was possible to port "PDP-11 UNIX" to a lot more platforms than would get "VMUNIX".