The B Programming Language
This document describes the implementation of B on the PDP-7 and PDP-11. The compilers seem to be lost, but there is enough code left to reconstruct how (almost) everything worked.
The implementation of the threaded code (the B interpreter) can be found in bilib for the PDP-11, which I have disassembled and somewhat commented, and bi.s for the PDP-7, which survived in source form.
The standard library is in libb for the PDP-11, which I have partially disassembled and commented. It is mostly wrappers around syscalls. Interestingly, printn and printf actually contain compiled B code, so check them out to see what the threaded code looks like. bl.s contains the (much smaller) equivalent for the PDP-7.
B is implemented in threaded code on both the PDP-7 and the PDP-11, but the two implementations differ somewhat. This is because a word on the PDP-7 is 18 bits wide while an address is only 13 bits wide. On the PDP-11 a word and an address are both 16 bits.
For the PDP-7 this means a B instruction word can have much the same format as a native PDP-7 instruction: a 4-bit opcode and a 13-bit address (PDP-7 instructions have an additional indirection bit, which B leaves unused). The opcode bits are used to index a dispatch table. The address portion of a word then contains an address or any address-sized number.
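The field layout described above can be sketched in a few lines of Python. This is only an illustration of the bit packing, not code taken from bi.s; the names `decode`, `encode` and the constants are made up for the sketch, and the position of the unused indirection bit (directly above the address) is assumed from the native PDP-7 instruction format.

```python
# Assumed layout of one 18-bit word: 4-bit opcode, 1 indirection bit
# (unused by B), 13-bit address.
ADDR_BITS = 13
ADDR_MASK = (1 << ADDR_BITS) - 1   # 0o17777
OPCODE_SHIFT = ADDR_BITS + 1       # the opcode sits above the indirection bit


def decode(word):
    """Split an 18-bit instruction word into (opcode, address)."""
    assert 0 <= word < (1 << 18)
    return word >> OPCODE_SHIFT, word & ADDR_MASK


def encode(opcode, address):
    """Pack a 4-bit opcode and a 13-bit address into one word."""
    assert 0 <= opcode < 16 and 0 <= address <= ADDR_MASK
    return (opcode << OPCODE_SHIFT) | address
```

With this layout the word 0o140003 decodes to opcode 3 with address field 3, and the opcode value is exactly what would index the dispatch table.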
On the PDP-11 each B opcode is stored as the address of the subroutine that handles it. If an address is needed, it is stored in the following word.
The threaded program counter is word 17 on the PDP-7 (an auto-increment location) and the register r3 on the PDP-11. As an implementation detail, r2 points to the MQ register of the KE-11 extended arithmetic unit.
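To make the PDP-11 scheme concrete, here is a minimal Python sketch of such a dispatch loop. Python function objects stand in for subroutine addresses and `state["pc"]` plays the role of r3. The handlers `push_const`, `add` and `halt` are hypothetical names invented for the sketch; they are not the routines found in bilib.

```python
def run(program):
    """Run threaded code in which every opcode cell is the handler itself."""
    state = {"pc": 0, "stack": [], "running": True}

    def fetch():
        # Read the cell at pc and advance, like an auto-increment fetch.
        cell = program[state["pc"]]
        state["pc"] += 1
        return cell

    state["fetch"] = fetch
    while state["running"]:
        handler = fetch()   # the opcode cell holds the handler's "address"
        handler(state)      # the handler may fetch an operand from the next cell
    return state["stack"]


def push_const(state):      # hypothetical handler: operand in the following cell
    state["stack"].append(state["fetch"]())


def add(state):             # hypothetical handler: takes no operand
    rhs = state["stack"].pop()
    lhs = state["stack"].pop()
    state["stack"].append(lhs + rhs)


def halt(state):            # hypothetical: only there to stop the sketch
    state["running"] = False


program = [push_const, 2, push_const, 3, add, halt]
result = run(program)
```

There is no decoding step at all: the interpreter only fetches a cell and jumps to it, which is why operands have to occupy cells of their own.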
This only concerns the PDP-11.
The B runtime expects word addresses
but the linker can only generate byte addresses for global variables.
To work around this the runtime has to patch the addresses
of all global variables before calling into B code.
The way this is done is -- according to Steve Johnson --
a marvellous hack by Dennis Ritchie.
The first instruction in each object file jumps to the end of the file
and there calls a subroutine
whose job it is to patch all the addresses of that object file.
After the return it falls off the end of the file and into the next one,
which does its patching the same way.
The B program has to be linked such that there is a runtime file
to begin the chain (to fall into the first B object file)
and one to end the chain (to catch the last B object file).
To see how this looks, check
which are compiled and disassembled B files.
They do not make use of the chain mechanism because they are not
linked to the proper place in the chain,
but the code is there nonetheless.
The code for
chain is missing unfortunately but its purpose is clear.
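The chaining idea can be modelled roughly as follows. Since the original chain code is lost, everything here is an assumption made for illustration: the helper names, the dictionary used as memory, and in particular the patch arithmetic (halving a byte address to get a word address).

```python
WORD_SIZE = 2  # bytes per PDP-11 word


def make_object(name, memory, slots):
    """Model one object file; slots are the cells holding global byte addresses."""
    def patch(log):
        for slot in slots:
            memory[slot] //= WORD_SIZE   # byte address -> word address (assumed)
        log.append(name)
    return patch


def run_chain(objects):
    """Run every patch routine in link order, then stop."""
    log = ["chain start"]      # runtime file that begins the chain
    for patch in objects:      # "falling off the end" into the next file
        patch(log)
    log.append("chain end")    # runtime file that catches the last object
    return log


memory = {10: 0o1000, 11: 0o1004, 20: 0o2000}
objects = [
    make_object("one.o", memory, [10, 11]),
    make_object("two.o", memory, [20]),
]
log = run_chain(objects)
```

The point of the real hack is that no table of object files is needed: link order alone determines the sequence, which the `for` loop only imitates.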
All values are stored on a stack, which grows upwards. Two pointers, sp and dp, are used to address it. sp is the stack pointer and points to the next free word. dp is the display pointer for the current function's stack frame. Both are regular words on the PDP-7 and r5 and r4 resp. on the PDP-11.
The PDP-7 and PDP-11 differ quite a bit in how values are actually stored on the stack.
On the PDP-7 all values are stored as lvalues (i.e. their addresses) on the stack. For values that don't logically have an lvalue (i.e. constants and temporary results) the lvalue word points to the next word on the stack where the actual rvalue is stored. All others (i.e. lvalues of variables) also have this second word but don't point to or use it.
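A toy Python model of this two-words-per-value layout may help. The class and method names are invented for the sketch, and the memory size and stack base are arbitrary; only the layout itself follows the description above.

```python
class Pdp7Stack:
    """Toy model: every stack entry is an lvalue word plus a spare word."""

    def __init__(self, size=64, base=16):
        self.mem = [0] * size   # one flat, word-addressed memory
        self.sp = base          # points to the next free word; grows upwards

    def push_constant(self, value):
        # No natural lvalue: point the lvalue word at the following word
        # and store the rvalue there.
        self.mem[self.sp] = self.sp + 1
        self.mem[self.sp + 1] = value
        self.sp += 2

    def push_variable(self, address):
        # A variable already has an address; the second word is reserved
        # but neither pointed to nor used.
        self.mem[self.sp] = address
        self.sp += 2

    def top_lvalue(self):
        return self.mem[self.sp - 2]

    def top_rvalue(self):
        # Uniform for both kinds of entry: follow the lvalue word.
        return self.mem[self.top_lvalue()]


stack = Pdp7Stack()
stack.mem[5] = 42          # pretend a variable lives at address 5
stack.push_variable(5)
variable_view = (stack.top_lvalue(), stack.top_rvalue())
stack.push_constant(7)
constant_view = (stack.top_lvalue(), stack.top_rvalue())
```

Because every entry carries an lvalue, an operator such as unary & can simply return the lvalue word at run time, at the cost of doubling the stack usage.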
On the PDP-11 every value is stored in only one word, and it is the compiler's job to push lvalues and rvalues as needed. To do that, some more threaded instructions are introduced. This also means that the unary & operator can no longer be implemented at runtime.
In the tables below
the name is that of the PDP-7 function that implements it.
An 'x' means it is implemented,
a '-' means it should be implemented but isn't,
a blank means it's not part of the language.
These are the main instructions of B. They have either an address-sized argument or specify a secondary function code.
The former is the case for all instructions but b, n and u. On the PDP-7 the address is stored in the instruction word. On the PDP-11 the address is stored in the next word.
For the b, n, and u instructions, the address field contains a secondary code on the PDP-7 (see table below). On the PDP-11 all b, n, u instructions are distinct functions.
name    code  PDP-7   description
autop   a     040000  push automatic variable
binop   b     100000  binary operator (see table below)
consop  c     140000  push address sized constant
ifop    f     200000  jump if stack value is zero
etcop   n     240000  misc. function (see table below)
setop   s     300000  set stack pointer
traop   t     340000  jump
unaop   u     400000  unary operator (see table below)
extop   x     440000  push external variable
aryop   y     500000  define automatic vector
        z             switch statement
As an example, consider the expression 'a + 3' where 'a' is an automatic variable at stack location 4. On the PDP-7, this can be compiled to:
    a 4
    c 3
    b 14
On the PDP-11 it looks like this:
    a; 4
    c; 3
    b14
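As a worked check of the PDP-7 form, the three instructions can be assembled into octal words by adding the operand to the primary code from the table. The helper `assemble` is hypothetical, and reading the operands as octal is an assumption (the secondary codes in the tables are clearly octal, since 10 follows 7). The PDP-11 form is shown next to it with symbolic names in place of real subroutine addresses.

```python
# Primary opcodes as listed in the instruction table (octal).
PRIMARY = {
    "a": 0o040000, "b": 0o100000, "c": 0o140000, "f": 0o200000,
    "n": 0o240000, "s": 0o300000, "t": 0o340000, "u": 0o400000,
    "x": 0o440000, "y": 0o500000,
}


def assemble(lines):
    """Turn 'code operand' lines into 18-bit PDP-7 instruction words."""
    words = []
    for line in lines:
        code, operand = line.split()
        words.append(PRIMARY[code] + int(operand, 8))  # operand assumed octal
    return words


words = assemble(["a 4", "c 3", "b 14"])
listing = [format(word, "06o") for word in words]

# PDP-11 form of the same expression: handler names stand in for addresses,
# and each operand occupies a cell of its own.
pdp11_cells = ["a", 4, "c", 3, "b14"]
```

The PDP-7 version needs three words (040004, 140003, 100014) while the PDP-11 version needs five cells, which is the price of storing full subroutine addresses as opcodes.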
PDP-11 specific instructions
As explained above, there are different functions to push lvalues and rvalues on the PDP-11. Also there are variants to pop the last value and ignore it.
auto  ext  const  action
a     x    c      push rvalue
ia    ix   ic     pop; push rvalue
va    vx          push lvalue
iva   ivx         pop; push lvalue
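The following Python sketch shows how a compiler might combine these variants for the statement a = a + 3, with a at stack location 4. It is a simplified model, not a transcription of bilib: the evaluation stack is kept separate from the memory holding the automatics, the frame base is arbitrary, and the operand order of the assignment (lvalue below rvalue, result is the rvalue) is an assumption.

```python
class Pdp11Model:
    """Simplified model of the 'auto' column plus two binary operators."""

    def __init__(self):
        self.mem = [0] * 32   # word-addressed memory holding the automatics
        self.dp = 8           # frame base, chosen arbitrarily for the sketch
        self.stack = []       # evaluation stack, kept separate for clarity

    def a(self, offset):      # push rvalue
        self.stack.append(self.mem[self.dp + offset])

    def va(self, offset):     # push lvalue
        self.stack.append(self.dp + offset)

    def ia(self, offset):     # pop; push rvalue
        self.stack.pop()
        self.a(offset)

    def c(self, value):       # push constant
        self.stack.append(value)

    def badd(self):
        rhs = self.stack.pop()
        lhs = self.stack.pop()
        self.stack.append(lhs + rhs)

    def basg(self):
        # Assumed operand order: lvalue below rvalue; the rvalue is the result.
        value = self.stack.pop()
        address = self.stack.pop()
        self.mem[address] = value
        self.stack.append(value)


model = Pdp11Model()
model.mem[model.dp + 4] = 10
# a = a + 3;
model.va(4); model.a(4); model.c(3); model.badd(); model.basg()
after_assignment = list(model.stack)
# The next statement starts by discarding the old value while pushing a new one.
model.ia(4)
```

The 'i' variants save a separate pop between statements: the leftover value of one expression statement is dropped by the first push of the next.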
The function prefix is 'b'.
op    name  code  PDP-7  PDP-11
=     basg  1     x      x
|     bor   2     x      x
&     band  3     x      x
==    beq   4     x      x
!=    bne   5     x      x
<=    ble   6     x      x
<     blt   7     x      x
>=    bge   10    x      x
>     bgt   11    x      x
>>    brsh  12    -      x
<<    blsh  13    -      x
+     badd  14    x      x
-     bmin  15    x      x
%     bmod  16    x      x
*     bmul  17    x      x
/     bdiv  20    x      x
=|          102          x
=&          103          x
===         104          -
=!=         105          -
=<=         106          -
=<          107          -
=>=         110          -
=>          111          -
=>>         112          x
=<<         113          x
=+          114          x
=-          115          x
=%          116          x
=*          117          x
=/          120          x
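One regularity can be read directly off the codes: every assignment operator has the code of its plain binary operator plus octal 100. The short Python check below only restates the table; whether the interpreter or the compiler actually exploits this pattern is not something the surviving code lets me claim, and the names `BINARY`, `ASSIGN_FLAG` and `assign_code` are made up for the sketch.

```python
# Binary operator codes from the table (octal).
BINARY = {
    "|": 0o2, "&": 0o3, "==": 0o4, "!=": 0o5, "<=": 0o6, "<": 0o7,
    ">=": 0o10, ">": 0o11, ">>": 0o12, "<<": 0o13,
    "+": 0o14, "-": 0o15, "%": 0o16, "*": 0o17, "/": 0o20,
}

ASSIGN_FLAG = 0o100  # offset observed between 'op' and '=op' in the table


def assign_code(op):
    """Return the code of the compound assignment '=op'."""
    return BINARY[op] | ASSIGN_FLAG


compound = {"=" + op: format(assign_code(op), "o") for op in BINARY}
```

For example, + has code 14 and =+ has code 114; / has code 20 and =/ has code 120.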
The function prefix is 'u'
op    name  code  PDP-7  PDP-11
&     uadr  1     x
-     umin  2     x      x
*     uind  3     x      x
!     unot  4     x      x
++x         5            x
--x         6            x
x++         7            x
x--         10           x
The function prefix is 'n'
name    code  PDP-7  PDP-11  description
mcall   1     x      x       call function without arguments
mark    2     x      x       mark stack location for pushing arguments
call    3     x      x       return to mark, call function with arguments
vector  4     x      x       address vector
litrl   5     x              put next instruction word on stack
goto    6     x      x       go to address
retrn   7     x      x       return from function with value
escp    10    x              escape to machine code two words after instruction
        11           x       return from function without value