256 application registers:
64 predicate registers (PR0 - PR63): contain predicate test (compare) results, for conditional execution of instructions,
first 16 registers static: available to all programs,
rest rotating: can be renamed to accelerate loops.
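The renaming of the rotating predicate registers can be sketched as follows. This is a minimal model, assuming a rotating register base (`rrb` here, an illustrative name) that is added modulo 48 to the register number and decremented by loop branches; the function names are hypothetical and not taken from any Intel document.

```python
STATIC = 16     # PR0 - PR15 are static
ROTATING = 48   # PR16 - PR63 rotate


def physical_pr(virtual: int, rrb: int) -> int:
    """Map a predicate register number to a physical register (sketch)."""
    if not 0 <= virtual < STATIC + ROTATING:
        raise ValueError("predicate register out of range")
    if virtual < STATIC:
        return virtual  # static registers are never renamed
    return STATIC + (virtual - STATIC + rrb) % ROTATING


def rotate(rrb: int) -> int:
    """Decrement the rotating base, as a loop branch would (assumed model)."""
    return (rrb - 1) % ROTATING


rrb = 0
written = physical_pr(16, rrb)  # iteration n writes PR16
rrb = rotate(rrb)
read = physical_pr(17, rrb)     # iteration n+1 sees the same register as PR17
print(written, read)
```

Under this model a value written to PR16 in one loop iteration is visible as PR17 in the next, which is what lets software-pipelined loops run without unrolling.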
8 branch registers (BR0 - BR7).
128 application registers (AR0 - AR127): special-purpose data and control registers.
4 Privilege Levels (PL): 0-3.
Current Privilege Level (CPL) in PSR.cpl (Processor Status Register, PSR).
Bi-endian memory access: controlled by UM.be bit (User Mask, UM).
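What bi-endian data access means can be illustrated with a short sketch: the same four bytes in memory yield different values depending on the byte order in effect. The `big_endian` parameter stands in for the UM.be bit; the helper name `load32` is hypothetical.

```python
import struct


def load32(raw: bytes, big_endian: bool) -> int:
    """Interpret four bytes as an unsigned 32 bit value in either byte order."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack(fmt, raw)[0]


memory = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(load32(memory, big_endian=False)))  # little-endian view
print(hex(load32(memory, big_endian=True)))   # big-endian view
```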
Memory mapped I/O.
Processor virtualization: enabled by PSR.vm bit, managed by PAL.
Virtual Machine Monitor (VMM): managing and virtualizing system resources, creating a virtual environment (Virtual Processor Descriptor, VPD).
IA-32 compatibility mode: IA-32 System Environment, i.e. Pentium III.
16 bit Real Mode, 16 bit VM86, 16/32 bit Protected Mode, memory segmentation.
Multimedia instruction sets: MMX, SSE.
Switch between Itanium and IA-32 instruction sets using JMPE, br.ia, and rfi.
All interruptions handled by Itanium instruction set code.
Current execution mode in PSR.is.
PA-RISC supported through Aries emulator.
Operating system: supported through Extensible Firmware Interface (EFI).
System Abstraction Layer (SAL): firmware providing platform initialization, configuration, and test, operating system boot, and run-time functionality (e.g. BIOS (Basic Input Output System), Machine Checks, and Platform Management Interruptions (PMI, successor to IA-32 System Management Mode (SMM))).
Processor Abstraction Layer (PAL): firmware providing processor specific Machine Checks, initialization, PMI, power management, configuration, and error recovery.
Developer's Interface Guide for IA-64 Servers (DIG64): design guidelines for building blocks and interfaces of IA-64 systems, providing an interoperable and stable baseline hardware interface for software developers.
On-die L1 cache (Harvard architecture):
On-die, unified L2 cache:
6-way set-associative, 64 byte line size,
6 cycles minimum latency for integers, 9 cycles minimum latency for FPs,
max. 2 requests per clock (banks).
Cache coherency through MESI protocol.
L3 cache: 2 or 4 Mbyte, off-die but in the same package, connected through Front Side Bus (FSB),
4-way set-associative, 64 byte line size,
21 cycles minimum latency for integers, 24 cycles minimum latency for FPs,
bandwidth 16 bytes per core cycle (64 bit DDR, 128 bit bus to core),
12 Gbyte/s max. throughput.
Cache coherency through MESI protocol.
Translation Look-aside Buffer (TLB) and Virtual Hash Page Table (VHPT):
ITLB: 64 entry, fully associative,
DTLB0: 32 entry, fully associative,
10 cycles penalty at miss,
DTLB1: 96 entry, fully associative,
page size 4 kbyte - 256 Mbyte supported.
Advanced Load Address Table (ALAT): between L1 data cache (L1D) and DTLB, keeps track of speculative data loads,
32 entry, two-way set-associative.
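How the ALAT tracks speculative (advanced) loads can be sketched with a toy model. It is deliberately simplified: a plain dictionary replaces the two-way set-associative organisation, eviction is oldest-first, and the class and method names are hypothetical.

```python
class ToyALAT:
    """Toy model of an ALAT; capacity and replacement are simplified."""

    def __init__(self, capacity: int = 32):
        self.capacity = capacity
        self.entries = {}  # target register -> (address, size in bytes)

    def advanced_load(self, reg: str, addr: int, size: int = 8) -> None:
        """Record a load that was moved above possibly conflicting stores."""
        self.entries.pop(reg, None)
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts keep insertion order).
            self.entries.pop(next(iter(self.entries)))
        self.entries[reg] = (addr, size)

    def store(self, addr: int, size: int = 8) -> None:
        """A store removes every entry whose address range it overlaps."""
        self.entries = {
            reg: (a, s)
            for reg, (a, s) in self.entries.items()
            if a + s <= addr or addr + size <= a
        }

    def check(self, reg: str) -> bool:
        """True if the advanced load is still valid; False means reload."""
        return reg in self.entries


alat = ToyALAT()
alat.advanced_load("r8", 0x1000)
alat.store(0x2000)          # unrelated store: entry survives
print(alat.check("r8"))
alat.store(0x1004, size=4)  # overlapping store: entry is removed
print(alat.check("r8"))
```

A failed check tells the program that the early-loaded value may be stale and must be loaded again.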
Double pipeline: 10 stage in-order, 6 instructions wide.
Split issue dispersal: three instructions (16 bytes) per bundle.
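The bundle layout can be made concrete with a small decoder. A 16 byte bundle holds a 5 bit template in the lowest bits followed by three 41 bit instruction slots (5 + 3 x 41 = 128 bits). The function name `split_bundle` is hypothetical, and the sketch only separates the fields; it does not decode instructions.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41


def split_bundle(raw: bytes):
    """Split a 128 bit bundle into its template and three instruction slots."""
    if len(raw) != 16:
        raise ValueError("a bundle is exactly 16 bytes")
    value = int.from_bytes(raw, "little")  # bundles are stored little-endian
    template = value & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = tuple(
        (value >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(3)
    )
    return template, slots


# Build an artificial bundle: template 0x10, slots holding 1, 2 and 3.
example = (0x10 | (1 << 5) | (2 << 46) | (3 << 87)).to_bytes(16, "little")
print(split_bundle(example))
```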
Scoreboarding, non-blocking caches (for compile-time non-determinism).
17 execution units:
9 issue ports:
Dynamic prefetch, branch prediction, speculative execution.
512 entry, two-level.
Branch Target Address Cache (BTAC): 64 entry.
Interval Time Counter (ITC): register for timing ticks.
In 32 bit compatibility mode: Time Stamp Counter (TSC).
Streamlined Advanced PIC (SAPIC): based on IA-32 APIC (Advanced Programmable Interrupt Controller),
for Aborts, Interrupts, Faults, and Traps:
Virtual address space: 64 bit, no segmentation.
Multiple Address Space (MAS): each process has its own unique Virtual Region (flat linear address space).
8 Virtual Regions of 61 bits each (Virtual Region Number, VRN; Region Identifier, RID), 2^24 Virtual Address Spaces of 2^61 bytes.
4 kbyte - 256 Mbyte pages (Virtual Page Number, VPN).
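The address layout above can be sketched as a decomposition of a 64 bit virtual address: the top three bits select one of the 8 Virtual Regions, the remaining 61 bits address within the region, and the page size determines where the Virtual Page Number ends. The function name is hypothetical, and the sketch accepts any page size between 4 kbyte (12 bits) and 256 Mbyte (28 bits) without checking whether the hardware supports that particular size.

```python
def split_virtual_address(va: int, page_bits: int = 12):
    """Return (VRN, VPN, page offset) for a 64 bit virtual address."""
    if not 0 <= va < 1 << 64:
        raise ValueError("virtual address must fit in 64 bits")
    if not 12 <= page_bits <= 28:
        raise ValueError("page size must be between 4 kbyte and 256 Mbyte")
    vrn = va >> 61                       # bits 63..61: Virtual Region Number
    in_region = va & ((1 << 61) - 1)     # bits 60..0: offset within the region
    vpn = in_region >> page_bits         # Virtual Page Number
    offset = va & ((1 << page_bits) - 1)
    return vrn, vpn, offset


va = (5 << 61) | (0x1234 << 12) | 0xABC
print(split_virtual_address(va))         # 4 kbyte pages
```

In the real translation path the VRN indexes a region register that supplies the RID; that lookup is left out here.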
Physical address space: 63 bit.
Up to 50 bits supported in page tables.
Write Coalescing (WC): streams of non-cacheable writes can be combined into a single bus write transaction.
WC Buffer (WCB): two-entry, 64 byte.
Enhanced Machine Check Architecture (EMCA): parity and ECC (Error-Correcting Code) on all major address and data busses.
44 bit address bus.
133 MHz DDR bus (Merced bus): 64 bit data.
Source Synchronous Signaling (SSS).
2.1 Gbyte/s max. throughput.
Assisted Gunning Transceiver Logic signaling (AGTL+),
based on GTL+ bus of Intel Pentium III and Pentium III Xeon processors.
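The 2.1 Gbyte/s bus throughput quoted above follows directly from the listed clock, the double data rate, and the 64 bit data width. A minimal worked calculation, with a hypothetical helper name:

```python
def peak_throughput_bytes(clock_hz: float, transfers_per_clock: int,
                          width_bits: int) -> float:
    """Peak bus throughput in bytes per second."""
    return clock_hz * transfers_per_clock * width_bits / 8


# 133 MHz clock, two transfers per clock (DDR), 64 bit data path.
bus = peak_throughput_bytes(133e6, 2, 64)
print(f"{bus / 1e9:.1f} Gbyte/s")
```

This is a theoretical peak; sustained throughput depends on the transaction mix and is not covered by the sketch.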
1.5 V ± 1.5 %.
Power pod connector.
Processor performance monitoring and profiling:
SMP (Symmetric Multi-Processing): glueless up to four processors (max. 16 in IA-32 compatibility mode).
Shared memory, cache coherency through MESI protocol.
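The MESI protocol named above can be sketched as a state machine for a single cache line. This is the textbook form of the protocol, not a description of the processor's actual bus transactions; the event names and the `shared` flag are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state: State, event: str, shared: bool = False) -> State:
    """Next state of one cache line.

    event is "read" or "write" for the local processor, "snoop_read" or
    "snoop_write" for accesses seen from another processor. shared tells
    a read miss whether another cache already holds the line.
    """
    if event == "write":
        return State.MODIFIED  # other copies are invalidated
    if event == "read":
        if state is State.INVALID:
            return State.SHARED if shared else State.EXCLUSIVE
        return state
    if event == "snoop_read":
        # A modified line is written back before it becomes shared.
        return State.INVALID if state is State.INVALID else State.SHARED
    if event == "snoop_write":
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


line = State.INVALID
for event in ("read", "write", "snoop_read", "snoop_write"):
    line = next_state(line, event)
    print(event, "->", line.value)
```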
Multiplier (Phase Lock Loop, PLL):
set through pins during reset:
|multiplier\pin||LINT LINT IGNNE# A20M#|
Power and performance management:
Thermal management: via on-die thermal diode:
System management: System Management Bus (SMBus):
CPUID: 8 byte registers:
CPUID return values:
|0x10||L1D: 16 kbyte, 4-way set-associative, 32 byte line size|
|0x15||L1I: 16 kbyte, 4-way set-associative, 32 byte line size|
|0x1A||L2: 96 kbyte, 6-way set-associative, 64 byte line size|
|0x88||L3: 2 Mbyte, 4-way set-associative, 64 byte line size|
|0x89||L3: 4 Mbyte, 4-way set-associative, 64 byte line size|
|0x8A||L3: 8 Mbyte, 4-way set-associative, 64 byte line size|
|0x90||ITLB: 64 entry, fully associative, 4 kbyte - 256 Mbyte pages|
|0x96||DTLB0: 32 entry, fully associative, 4 kbyte - 256 Mbyte pages|
|0x9B||DTLB1: 96 entry, fully associative, 4 kbyte - 256 Mbyte pages|
Set EAX register to 2, then returned in EAX, EBX, ECX, EDX registers (MSB - LSB):
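Decoding the returned registers can be sketched as below, using a subset of the descriptor values from the table. The sketch follows the usual IA-32 convention for this CPUID function: the low byte of EAX is a call count rather than a descriptor, a register with bit 31 set carries no descriptors, and 0x00 is a null descriptor. The register values in the example are made up for illustration and are not actual processor output; the function name is hypothetical.

```python
DESCRIPTORS = {
    0x10: "L1D: 16 kbyte, 4-way set-associative, 32 byte line size",
    0x15: "L1I: 16 kbyte, 4-way set-associative, 32 byte line size",
    0x1A: "L2: 96 kbyte, 6-way set-associative, 64 byte line size",
    0x88: "L3: 2 Mbyte, 4-way set-associative, 64 byte line size",
    0x90: "ITLB: 64 entry, fully associative, 4 kbyte - 256 Mbyte pages",
    0x96: "DTLB0: 32 entry, fully associative, 4 kbyte - 256 Mbyte pages",
    0x9B: "DTLB1: 96 entry, fully associative, 4 kbyte - 256 Mbyte pages",
}


def decode_descriptors(eax: int, ebx: int, ecx: int, edx: int):
    """Translate the four 32 bit registers into descriptor strings."""
    found = []
    for index, reg in enumerate((eax, ebx, ecx, edx)):
        if reg & 0x80000000:
            continue  # bit 31 set: no valid descriptors in this register
        data = reg.to_bytes(4, "big")  # most significant byte first
        if index == 0:
            data = data[:3]  # low byte of EAX is the call count
        for byte in data:
            if byte:  # 0x00 is a null descriptor
                found.append(DESCRIPTORS.get(byte, f"unknown 0x{byte:02X}"))
    return found


# Illustrative register contents only.
for line in decode_descriptors(0x00101501, 0x001A8890, 0x0000969B, 0x80000000):
    print(line)
```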
Used by HP as PA-RISC replacement, and in High Performance Computing (HPC).
Only a few thousand delivered.
Succeeded by Itanium 2 in 2002.