The AMD K10 is AMD's next generation of processors.
The tech tabloid The Inquirer had previously reported it as a cancelled project [1],
but AMD officials have since declared the K10 processor series to be the immediate
successor to the AMD K8 series of processors (Athlon 64, Opteron and Sempron 64,
sharing technologies with the Socket S1 Turion 64). Specific models have not yet been officially announced.
Characteristics of the microarchitecture
Form factors
- Socket AM2+ for Athlon 64 X2, Phenom X2 and Phenom X4 processors and single-socket Opterons;
Socket F+ for Phenom FX processors (targeted at the AMD Quad FX platform) and multi-socket Opterons.
Both support HyperTransport 3.0 and use DDR2 DIMMs [40].
- Backward-compatible with existing Socket AM2 and Socket F motherboards.
Instruction set additions and extensions
- New bit-manipulation instructions: Leading Zero Count (LZCNT) and
Population Count (POPCNT)
- New SSE instructions named SSE4a: combined mask-shift instructions (EXTRQ/INSERTQ) and
scalar streaming store instructions (MOVNTSD/MOVNTSS); despite the name, these are
AMD-specific and are not part of Intel's SSE4
- Support for unaligned SSE load-operation instructions (which formerly required 16-byte alignment) [41]
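As a rough illustration of what the two bit-manipulation instructions listed above compute, the following Python sketch emulates their results in software. The function names and the `width` parameter are illustrative only and are not part of any AMD interface; the hardware instructions operate on 16-, 32- or 64-bit operands.

```python
def popcnt(value: int, width: int = 64) -> int:
    """Emulate POPCNT: count the bits set to 1 in the low `width` bits."""
    mask = (1 << width) - 1          # keep only the operand-sized part
    return bin(value & mask).count("1")


def lzcnt(value: int, width: int = 64) -> int:
    """Emulate LZCNT: count the zero bits above the most significant set bit.

    For an all-zero operand the result is the operand width itself.
    """
    mask = (1 << width) - 1
    # bit_length() is the position of the highest set bit (0 for zero input)
    return width - (value & mask).bit_length()
```

For example, `popcnt(0b1011)` evaluates to 3, `lzcnt(1, 32)` to 31, and `lzcnt(0, 32)` to 32.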
Execution pipeline enhancements
- 128-bit wide SSE units
- Wider L1 data cache interface allowing for two 128-bit loads per cycle
(as opposed to two 64-bit loads per cycle with K8)
- Lower integer divide latency
- 512-entry indirect branch predictor and a larger return stack (size doubled from K8) and
branch target buffer
- Side-Band Stack Optimizer, dedicated to incrementing/decrementing the
stack pointer register
- Fastpathed CALL and RET-Imm instructions (formerly microcoded) as well as MOVs from
SIMD registers to general purpose registers
Integration of new technologies onto CPU die:
- Four processor cores (Quad-core)
- Split power planes for the CPU cores and the memory controller/northbridge for more
effective power management. First dubbed Dynamic Independent Core Engagement
(D.I.C.E.) by AMD and now known as Enhanced PowerNow!, this allows the cores and
the northbridge (integrated memory controller) to scale power consumption up or down
independently [42].
Improvements in the memory subsystem:
- Improvements in access latency:
- Support for re-ordering loads ahead of other loads and stores
- More aggressive instruction prefetching: 32-byte instruction prefetch as
opposed to 16 bytes in K8
- DRAM prefetcher for buffering reads
- Buffered burst writeback to RAM in order to reduce contention
- Changes in memory hierarchy:
- Prefetch directly into L1 cache as opposed to L2 cache with K8 family
- 32-way set associative L3 victim cache sized at least 2 MiB, shared between processing
cores on a single die (each with 512 KiB of independent exclusive L2 cache), with
a sharing-aware replacement policy.
- Extensible L3 cache design, with 6 MiB planned for 45 nm process node,
with the chips codenamed Shanghai.
- Changes in address space management:
- Two 64-bit independent memory controllers, each with its own physical address space;
this provides an opportunity to better utilize the available bandwidth in case of
random memory accesses occurring in heavily multi-threaded environments.
This approach contrasts with the previous "interleaved" design, where
the two 64-bit data channels were bound to a single common address space.
- Larger Translation Lookaside Buffers (TLBs); support for 1 GiB page entries and
a new 128-entry 2 MiB page TLB
- 48-bit memory addressing to allow for 256 TiB memory subsystems
- Memory mirroring, data poisoning support and Enhanced RAS
- Nested page tables for AMD-V virtualization technology, claimed to
decrease world switch time by 25%.
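The sizes quoted in the memory-subsystem list above follow from simple power-of-two arithmetic, which the following Python sketch works through. All function names are illustrative. The 64-byte cache line used for the L3 set count is an assumption that the text above does not state.

```python
KIB, MIB, GIB, TIB = (1 << shift for shift in (10, 20, 30, 40))


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with an address of the given width."""
    return 1 << address_bits


def page_offset_bits(page_size: int) -> int:
    """Low address bits that select a byte within a page."""
    # Page sizes are powers of two, so exactly one bit is set.
    assert page_size > 0 and page_size & (page_size - 1) == 0
    return page_size.bit_length() - 1


def tlb_reach(entries: int, page_size: int) -> int:
    """Memory covered when every TLB entry maps a distinct page."""
    return entries * page_size


def cache_sets(cache_size: int, ways: int, line_size: int = 64) -> int:
    """Number of sets in a set-associative cache.

    line_size=64 is an assumed value, not taken from the text.
    """
    return cache_size // (ways * line_size)
```

Here `addressable_bytes(48) // TIB` evaluates to 256, matching the 256 TiB figure; `tlb_reach(128, 2 * MIB) // MIB` gives 256, so the 128-entry 2 MiB page TLB covers 256 MiB; `page_offset_bits(GIB)` is 30; and under the assumed line size, `cache_sets(2 * MIB, 32)` gives 1024 sets for a 2 MiB, 32-way L3.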
Improvements in system interconnect:
- HyperTransport retry support
- Support for HyperTransport 3.0, with HyperTransport link unganging, which creates
8 point-to-point links per socket.
Platform-level enhancements with additional functionality:
- Five P-states allowing for automatic clock rate modulation
- Increased clock gating
- Official support for coprocessors via HTX slots and vacant CPU sockets through
HyperTransport (the Torrenza initiative).