The B Programming Language

This document describes the implementation of B on the PDP-7 and PDP-11. The compilers seem to be lost, but there is enough code left to reconstruct how (almost) everything worked.

The implementation of the threaded code (the B interpreter) can be found in bilib for the PDP-11, which I have disassembled and somewhat commented, and bi.s for the PDP-7, which survived in source form.

The standard library is in libb for the PDP-11, which I have partially disassembled and commented. It is mostly wrappers around syscalls. Interestingly printn and printf actually contain compiled B code, so check it out to see how the threaded code looks like. bl.s contains the (much smaller) equivalent for the PDP-7.

Threaded code

B is implemented in threaded code on the PDP-7 and PDP-11. The way it is implemented differs a bit however. This is because a word on the PDP-7 is 18 bits but an address is only 13 bits wide. On the PDP-11 a word and address are both 16 bits.

For the PDP-7 this means a B instruction word can have rather the same format as a native PDP-7 instruction: 4 bits opcode and 13 bits address (PDP-7 instructions have an additional indirection bit, this is unused by B). The opcode bits are used to index a dispatch table, The address portion of a word then contains an address or any address sized number.

On the PDP-11 all B opcodes are stored as the address of the subroutine that handles it. If an address is needed, it is stored in the following word.

The threaded program counter is word 17 on the PDP-7 (an auto-increment location) and the register r3 on the PDP-11. As an implementation detail, r2 points to the MQ register of the KE-11 extended arithmetic unit.

Word addresses

This only concerns the PDP-11. The B runtime expects word addresses but the linker can only generate byte addresses for global variables. To work around this the runtime has to patch the addresses of all global addresses at runtime before calling into B code (main). The way this is done is -- according to Steve Johnson -- a marvellous hack by Dennis Ritchie. The first instruction in each object file jumps to the end of the file and there calls a subroutine chain, whose job it is to patch all the addresses of that object file. After the return it falls off the end of the file and into the next one, which does its patching the same way. The B program has to be linked such that there is a runtime file to begin the chain (to fall into the first B object file) and one to end the chain (to catch the last B object file). To see how this looks, check printf.s and printn.s, which are compiled and disassembled B files. They do not make use of the chain mechanism because they are not linked to the proper place in the chain, but the code is there nonetheless. The code for chain is missing unfortunately but its purpose is clear.

Stack

All values are stored on a stack, which grows upwards. Two pointers, sp and dp, are used to address it. sp is the stack pointer and points to the next free word. dp is the display pointer for the current function's stack frame. Both are regular words on the PDP-7 and r5 and r4 resp. on the PDP-11.

In the decision how values are actually stored on the stack, the PDP-7 and PDP-11 differ quite a bit.

On the PDP-7 all values are stored as lvalues (i.e. their addresses) on the stack. For values that don't logically have an lvalue (i.e. constants and temporary results) the lvalue word points to the next word on the stack where the actual rvalue is stored. All others (i.e. lvalues of variables) also have this second word but don't point to or use it.

On the PDP-11 every value is stored with only one word and it is the compilers job to push lvalue and rvalues as needed. To do that, some more threaded instructions are introduced. This also means that the unary & operator cannot be implemented at runtime anymore.

Main functions

In the tables below the name is that of the PDP-7 function that implements it.
An 'x' means it is implemented,
a '-' means it should be implemented but isn't,
a blank means it's not part of the language.

These are the main instruction of B. They have either an address-sized argument or specify a secondary function code.

The former is the case for all instruction but b, n and u. On the PDP-7 the address is stored in the instruction word. On the PDP-11 the address is stored in the next word.

For the b, n, and u instructions, the address field contains a secondary code on the PDP-7 (see table below). On the PDP-11 all b, n, u instructions are distinct functions.

name    code    PDP-7   description
autop   a       040000  push automatic variable
binop   b       100000  binary operator (see table below)
consop  c       140000  push address sized constant
ifop    f       200000  jump if stack value is zero
etcop   n       240000  misc. function (see table below)
setop   s       300000  set stack pointer
traop   t       340000  jump
unaop   u       400000  unary operator (see table below)
extop   x       440000  push external variable
aryop   y       500000  define automatic vector
        z               switch statement

As an example, consider the expression 'a + 3' where 'a' is an automatic variable at stack location 4. On the PDP-7, this can be compiled to:

a 4
c 3
b 14

On the PDP-11 it looks like this:

a; 4
c; 3
b14

PDP-11 specific instructions

As explained above, there are different functions to push lvalues and rvalues on the PDP-11. Also there are variants to pop the last value and ignore it.

auto     ext   const      action
  a       x       c           push rvalue
 ia      ix      ic      pop; push rvalue
 va      vx                   push lvalue
iva     ivx              pop; push lvalue

Binary Operators

The function prefix is 'b'.

op      name    code    PDP-7   PDP-11
=       basg    1       x       x
|       bor     2       x       x
&       band    3       x       x
==      beq     4       x       x
!=      bne     5       x       x
<=      ble     6       x       x
<       blt     7       x       x
>=      bge     10      x       x
>       bgt     11      x       x
>>      brsh    12      -       x
<<      blsh    13      -       x
+       badd    14      x       x
-       bmin    15      x       x
%       bmod    16      x       x
*       bmul    17      x       x
/       bdiv    20      x       x
=|              102             x
=&              103             x
===             104             -
=!=             105             -
=<=             106             -
=<              107             -
=>=             110             -
=>              111             -
=>>             112             x
=<<             113             x
=+              114             x
=-              115             x
=%              116             x
=*              117             x
=/              120             x

Unary operators

The function prefix is 'u'

op      name    code    PDP-7   PDP-11
&       uadr    1       x        
-       umin    2       x       x
*       uind    3       x       x
!       unot    4       x       x
++x             5               x
--x             6               x
x++             7               x
x--             10              x

Miscellaneous functions

The function prefix is 'n'

name    code    PDP-7   PDP-11  description
mcall   1       x       x       call function without arguments
mark    2       x       x       mark stack location for pushing arguments
call    3       x       x       return to mark, call function with arguments
vector  4       x       x       address vector
litrl   5       x               put next instruction word on stack
goto    6       x       x       go to address
retrn   7       x       x       return from function with value
escp    10      x               escape to machine code two words after instruction
        11              x       return from function without value

squoze.net