SiFive’s P550 microarchitecture is an important development for RISC-V, showing how far the open instruction set has come in higher performance designs. SiFive plays a role in the RISC-V world analogous to Arm’s in the wider CPU design space: both design CPU blocks that implementers integrate into complete chips, which lowers the cost of entry into high-performance designs.

P550 Overview and Performance

The P550 is a 3-wide out-of-order core with a 13-stage pipeline, designed to achieve “30% higher performance in less than half the area of a comparable Arm Cortex A75.” The core is implemented in the Eswin EIC7700X SoC, which has a quad-core P550 cluster running at 1.4 GHz with a shared 4 MB L3 cache. The SoC is manufactured on TSMC’s 12nm FFC process, and the Premier P550 Dev Board pairs it with 16 GB of LPDDR5-6400 memory.

Architecture and Branch Prediction

Out-of-order execution is central to the P550’s design. It lets the core extract instruction-level parallelism and keep doing useful work while waiting on cache and memory latency. The P550 is a more mature core than SiFive’s first out-of-order design, the U87, and is comparable to Arm’s Cortex A75 in being a 3-wide out-of-order architecture.

Branch prediction is vital for both performance and power efficiency, since every mispredicted branch wastes work. SiFive equipped the P550 with a 9.1 KiB branch history table, which offers reasonable pattern recognition. When only a few branches are in play, it tracks longer patterns than the Cortex A75, but its capacity still trails high-performance cores. The P550 also has a 32-entry BTB. It handles taken branches efficiently while the branch footprint fits within those 32 entries, but there are no additional BTB levels to fall back on beyond that.
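To make the idea of history-based pattern recognition concrete, here is a minimal sketch of a gshare-style predictor. It is a generic textbook scheme, not the P550's actual predictor, whose table organization is not described here. The class name, table sizes, and test pattern are all illustrative assumptions.

```python
class GsharePredictor:
    """Toy gshare-style predictor (hypothetical, not SiFive's design).

    Global branch history is XORed with the branch address to index
    a table of 2-bit saturating counters.
    """

    def __init__(self, history_bits=12, table_bits=12):
        self.history_mask = (1 << history_bits) - 1
        self.table_mask = (1 << table_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << table_bits)  # start weakly not-taken

    def _index(self, pc):
        return ((pc >> 2) ^ self.history) & self.table_mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the newest outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def accuracy(predictor, pc, pattern, repeats):
    """Replay a repeating taken/not-taken pattern for one branch."""
    correct = total = 0
    for _ in range(repeats):
        for taken in pattern:
            correct += predictor.predict(pc) == taken
            predictor.update(pc, taken)
            total += 1
    return correct / total


# An arbitrary repeating pattern with period 8 (illustrative only).
PATTERN = [True, True, True, False, True, False, False, True]

long_history = accuracy(GsharePredictor(history_bits=12), 0x1000, PATTERN, 200)
short_history = accuracy(GsharePredictor(history_bits=2), 0x1000, PATTERN, 200)
print(f"12-bit history: {long_history:.3f}, 2-bit history: {short_history:.3f}")
```

With enough history bits, every position in the pattern maps to its own counter and the predictor learns it after a short warm-up. With too few, different positions share a counter and keep disturbing each other, which is the same kind of capacity limit the text describes for longer patterns and larger branch counts.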

Frontend, Decode, and Execution

The P550’s frontend is fed by a parity-protected 32 KB instruction cache. It can sustain up to 3 IPC as long as the code fits within the L1 instruction cache. Once code spills out of L1, frontend bandwidth drops, though the core still manages reasonable IPC when fetching from the L2 and L3 caches.
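As a rough feel for what "fits within the L1 instruction cache" means, the arithmetic below works out how many instructions 32 KB can hold and what 3 IPC amounts to at the 1.4 GHz clock quoted in the overview. The 4-byte figure is the standard RISC-V encoding size. The 2-byte case assumes a binary made up entirely of 16-bit compressed encodings, which is an upper bound rather than something real code would reach. The function names are illustrative.

```python
L1I_BYTES = 32 * 1024   # from the text: 32 KB L1 instruction cache
PEAK_IPC = 3            # from the text: up to 3 IPC when code fits in L1i
CLOCK_HZ = 1.4e9        # from the text: 1.4 GHz in the SoC described above


def instructions_that_fit(cache_bytes, instr_bytes):
    """Number of fixed-size instructions a cache of this size can hold."""
    return cache_bytes // instr_bytes


def peak_instr_per_second(ipc, clock_hz):
    """Upper bound on per-core instruction throughput."""
    return ipc * clock_hz


full_width = instructions_that_fit(L1I_BYTES, 4)  # 32-bit encodings
compressed = instructions_that_fit(L1I_BYTES, 2)  # all 16-bit encodings (assumed upper bound)
peak = peak_instr_per_second(PEAK_IPC, CLOCK_HZ)

print(f"32-bit instructions that fit: {full_width}")
print(f"16-bit instructions that fit: {compressed}")
print(f"Peak per-core rate: {peak / 1e9:.1f} billion instructions/s")
```

These are ceilings only. Sustained IPC on real code will be lower because of branch mispredictions, data cache misses, and dependencies between instructions.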

The P550’s out-of-order execution capacity is slightly higher than the Cortex A75’s, but its execution buffers are less versatile because it lacks out-of-order retirement features. The P550 handles scalar integer and floating-point operations with a balance of power efficiency and performance, though it lacks the vector execution support found in the A75.

Memory Subsystem

The P550’s memory subsystem prioritizes latency over bandwidth. Its cache hierarchy is similar to that of many mainstream designs, with core-private L1 and L2 caches. The L1 data cache services loads and stores with minimal latency but struggles with unaligned accesses. The L2 cache balances size and speed, and is effective at catching L1 misses.
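For readers unsure what counts as an unaligned access, the sketch below classifies an access by its address and size. It assumes 64-byte cache lines, which is a common choice but is not stated in this text, and it says nothing about how large the P550's penalty is in each case. The function name and labels are illustrative.

```python
def classify_access(address, size, line_bytes=64):
    """Classify a load or store by alignment.

    line_bytes=64 is an assumption, not a documented P550 parameter.
    """
    if address % size == 0:
        return "aligned"
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    if first_line != last_line:
        return "misaligned, crosses cache line"
    return "misaligned, within one cache line"


for addr, size in [(0x1000, 8), (0x1001, 4), (0x103E, 4)]:
    print(f"{size}-byte access at {addr:#x}: {classify_access(addr, size)}")
```

Code that wants to stay on the fast path can use a check like this to find accesses worth realigning, for example fields in packed structures or pointers cast from byte buffers.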

Interconnect and L3 Cache

The interconnect system is modular, catering to varied market needs with scalable core clusters and shared L3 cache banks. The EIC7700X implementation includes a 4 MB L3 cache across four banks, with decent bandwidth and latency relative to its design goals.
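To illustrate how a banked cache can spread traffic, the sketch below maps addresses to one of four banks by interleaving consecutive cache lines. Both the 64-byte line size and the simple modulo mapping are assumptions for illustration. The actual address-to-bank function in the EIC7700X is not described here and may use a different hash.

```python
from collections import Counter

L3_BYTES = 4 * 1024 * 1024  # from the text: 4 MB L3
NUM_BANKS = 4               # from the text: four banks
LINE_BYTES = 64             # assumption: 64-byte cache lines


def bank_capacity(total_bytes, num_banks):
    """Capacity of each bank if the cache is split evenly."""
    return total_bytes // num_banks


def bank_for_address(address, line_bytes=LINE_BYTES, num_banks=NUM_BANKS):
    """Hypothetical mapping: consecutive lines go to consecutive banks."""
    return (address // line_bytes) % num_banks


per_bank = bank_capacity(L3_BYTES, NUM_BANKS)
print(f"Per-bank capacity: {per_bank // 1024} KiB")

# A sequential walk over 128 lines touches every bank equally often.
spread = Counter(bank_for_address(i * LINE_BYTES) for i in range(128))
print(dict(spread))
```

Under this kind of mapping, a streaming access pattern keeps all four banks busy at once, which is the usual motivation for banking a shared cache.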

Future Prospects for SiFive and RISC-V

SiFive’s P550 is a significant step forward in bringing RISC-V to higher performance applications, aiming to deliver solid performance within tight power and area budgets. Challenges remain, particularly in clock speeds, unaligned access handling, and the absence of vector support, but the P550 lays a foundation for future development.

As SiFive continues to develop more sophisticated cores, such as the announced P870, there is excitement about RISC-V achieving parity with established architectures. The progress demonstrated by the P550 suggests a promising future for RISC-V as a competitive player in the CPU market.
