The Time[…] keyword is quite different in character then the other keywords insofar as its implementation is intrinsically hardware dependent. Writing the code that realizes Time[…] can be completely different from platform to platform. The following is a description of a way to count executable machine instructions on the PowerPC. The references are all from The PowerPC™ Architecture.

Implementing Cardinal[Time][…] on the PowerPC

A Cardinal TimeValue measures time in instruction cycles.

Cardinal[Time][expr]'s realization is completely dependent on processor architecture. Modern processors have a high degree of parallel processing, that nonetheless adhere to the Von Neuman Machine model. Cardinal[Time][expr] will be implemented differently, processor to processor…

On the PowerPC the Cardinal[Time][expr] instruction can be realized by modifying the execution code to keep track of all instructions executed. Suppose exprStream is an array of machine instructions implementing the execution of Cardinal[Time][expr]. Each Branch instruction encountered in exprStream is "fortified" with code that keeps an account of the machine instructions without interfering with the evaluation of expr. Let's call the result of this code "fortification", timableExpr. It is a machine instruction sequence with each branch replaced with address accounting routines.

The program strategy is based on the following two facts:

There are no computational instructions that modify storage. To use a storage operand in a computation and then modify the same or another storage location, the content of storage must be loaded into a register, modified, and then stored back to the target location.
The sequence of instruction execution can be changed by the Branch instructions. This is not true of instructions executed by the FixedPointProcessor or the FloatingPointProcessor. A consequence of this architecture is that an account of the execution sequence can be had by tallying instruction addresses at the beginning, at each branch, and at the end of the execution code.

Suppose AddressTally is an array of instruction addresses. Its first element is the address of the first instruction in timableExpr. As noted above, each Branch instruction is fortified with instruction address accounting routines. Branch instructions are further divided into those that link to the address in the LinkRegister, and those that link to the Branch's coded EA.

If the Branch instruction has LK = 1

If the Branch instruction has LK = 1, then at that instruction, the machine instructions that effect the following process are spliced into the original code. This is called the instruction splice.

lastLink = LinkRegister
LinkRegister = LinkedSoftTraceAddress
lastAddress = (* Address immediately before this instruction splice. *)
Execute original branch instruction.
LinkRegister = lastLink

[end code splice]

A consequence of the above instruction splice, is that if the executed Branch Instruction branches, it goes to SoftTrace which does the following:

(* LinkedSoftTrace …starts at LinkedSoftTraceAddress. *)

Stuff lastAddress and lastLink into AddressTally.

(* lastAddress is the last of an unbroken sequence, and lastLink is the address of a new sequence.

In principle, anything could be done at this stage rather then merely accounting addresses.

Unconditionally branch to lastLink.

If the Branch instruction has LK = 0

If the Branch instruction has LK = 0 in the exprStream, then it is edited in timableExpr so that LK = 1. Let's call the edited branch, newBranch. Here is the process that gets spliced in where the original Branch instruction was:

lastLink = LinkRegister
lastAddress = (* Address immediately before this instruction splice. *)
LinkRegister = NoLinkSoftTraceAddress
newBranch
subtract splice code length from AddressTally. (This could be done by putting the beginning and end of the splice in the address sequence…)

(* NoLinkSoftTrace …starts at NoLinkSoftTraceAddress. *)

BranchAddress = EAFromBranch[lastAddress, original Branch instruction]

(* EAFromBranch[lastAddress, original Branch instruction] extracts branch target EA from the original Branch instruction. This calculation must account for the various means of generating the EA, and the code offsets produced by the spliced code segments. See effective address from Branch Instructions.*)

Stuff lastAddress and BranchAddress into AddressTally.

(* lastAddress is the last of an unbroken sequence, and BranchAddress begin the new branch sequence.

In principle, anything could be done at this stage rather then merely accounting addresses.

Unconditionally branch to lastLink.

Writing an instruction parser that will generate timableExprs should be relatively straightforward from the above description. The final implementation of Cardinal[Time][expr] puts everything together to do something very simple. While this may not seem like an important procedure, the processes developed here have wide ranging code processing potential. All instruction Timed expressions will have to use something like the procedures above to keep account of the instructions executed. Many applications such as source listing or tracing and debugging applications come to mind as well…

Appendix

The Appendix gives more relevant details specific to the PowerPC.

These document scraps and the glossary were compiled for the purpose of developing the foregoing Cardinal[Time][…] program strategy.

Effective Address Calculation

The 64- or 32-bit address computed by the processor when executing a Storage Access or Branch instruction or when fetching the next sequential instruction is called the "effective address", and specifies a byte in storage.

For a Storage Access instruction, if the sum of the effective address and the operand length exceeds the maximum effective address, the storage operand is considered to wrap around from the maximum effective address to effective address 0, as described below [not].

Effective addresses are computed as follows. In the descriptions below, it should be understood that "the contents of a GPR" refers to the entire 64-bit contents, independent of mode, but that in 32-bit mode, only bits 32:63 of the 64-bit result of the computation are used to address storage.

With X-form instructions, in computing the effective address of a data element, the contents of the GPR designated by RB are added to the contents of the GPR designated by RA or to zero if RA=0.

With D-form instructions, the 16 bit D field is sign-extended to form a 64-bit address component. In computing the effective address of a data element, this address component is added to the contents of the GPR designated by RA or to zero if RA = 0.

With DS-form instructions, the 14-bit DS field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address, this address component is added to the contents of the GPR designated by RA or to zero if RA = 0.

With I-form Branch instructions, the 24-bit LI filed is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. If AA=0, this address component is added to the address of the branch instruction to form the effective address of the next instruction. If AA =1, this address component is the effective address of the next instruction.

Effective Address Calculation for Branch Instructions:
With B-form Branch instructions, the 14-bit BD field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. If AA = 0, this address component is added to the address of the branch instruction to form the effective address of the next instruction. If AA = 1, this address component is the effective address of the next instruction.

With XL-form Branch instructions, bits 0:61 of the Link Register or the Count Register are concatenated on the right with 0b00 to form the effective address of the next instruction.

With sequential instruction fetching, the value 4 is added to the address of the current instruction to form the effective address of the next instruction.

Notes & Glossary

MachineSize adj.

A MachineSize Cardinal is the IntegerProcessor's standard Integer-Register size.

Modern Central Processor Units (CPUs) divide processing into various register-defined processors. For example, the PowerPC divides machine function into BranchProcessor, FixedPointProcessor, and a FloatingPointProcessor. The "G-5" series adds yet another category: a so-called VectorProcessor. The following is from "The PowerPC Architecture", p. 11 under "1.6 Processor Overview".

The processor implements the instruction set, the storage model, and other facilities defined in Book I, Instructions that the processor can execute fall into three classes:

• branch instructions,

• fixed-point instructions, and

• floating-point instructions.

Branch instructions are described in Section 2.4, "Branch Processor Instructions," on page 35.

Fixed-Point Processor Instructions on page 49. Floating–point instructions are described in Section 4.6, "Floating-Point Processor Instructions," on page 167.

…

There are no computational instructions that modify storage. To use a storage operand in a computation and then modify the same or another storage location, the content of storage must be loaded into a register, modified, and then stored back to the target location. Figure 1 (page 12) is a logical representation of instruction processing. Figure 2 on page 13 shows the registers of the PowerPC User Instruction Set Architecture.

The "Logical Processing Model", (Figure 1 on page 12) shows the BranchProcessor as something that gets instructions or data from Storage, executes an instruction, that loads the FixedPointProcessor or the FloatingPointProcessor with instructions. The BranchProcessor's ConditionRegister can be effected by instructions in the FixedPointProcessor and the FloatingPointProcessor.

Both the FixedPointProcessor and the FloatingPointProcessor can put or get data from Storage. From this model, it is clear that the BranchProcessor interprets code…

BranchProcessor n.

The sequence of instruction execution can be changed by the Branch instructions…

The Branch instructions compute the effective address (EA) of the target (next instruction address) in one of the following ways…

Adding a displacement to the address of the branch instruction (Branch or Branch Conditional) with AA = 0).

Specifying an absolute address (Branch or Branch Conditional with AA = 1).

Using the address contained in the Link Register (Branch Conditional to Link Register).

Using the address contained in the Count Register (Branch Conditional to Count Register).

FixedPointProcessor n.

1. General Purpose Registers (GPRs)

All manipulation of information is done in registers internal to the FixedPointProcessor. The principal storage internal to the FixedPointProcessor is a set of 32 general purpose registers (GPRs).

Each GPR is a 64-bit (32 –bit) register.

GPR 00

GPR 01

…

GPR 31

2. FixedPointExceptionRegister

…

BI (11:15) Field used to specify a bit in the CR to be used as the condition of a Branch Conditional instruction.

BO (6:10) Field used to specify options for the Branch Conditional instructions. The encoding is described in section 2.4, "Branch Processor Instructions" on p. 35.

EA When the processor branches or proceeds to the next instruction, it is said to go to the effective address. Effective address computations, for both data and instruction accesses, use 64(32)-bit unsigned binary arithmetic regardless of mode. See Effective Address Calculation.

LK (31) Link bit.

0 Do not set the Link Register.

1 Set the Link Register. If the instruction is a Branch instruction, the address of the instruction following the Branch instruction is placed into the Link Register.

Link Register

LR The Link Register (LR) is a 64-bit (32-bit) register. It can be used to provide the branch target address for the Branch Conditional to Link Register instruction, and it holds the return address after Branch and Link instructions.

RA (11:15) Field used to specify a GPR to be used as a source or as a target.

RB(16:20) Field used to specify a GPR to be used as a source.

Grok32`

by John Van Wie Bergamini.