Phase 3 — Branch predictor (--mode predictor)¶
Goal: lift the four direction predictors from project2 into a
polymorphic BranchPredictor interface, drive them from a ChampSim
trace via --mode predictor, and pin the resulting accuracy numbers
against project2's own reference output.
This phase is the smallest in lines of code (~600 LOC of algorithm plus ~400 LOC of plumbing) but the most algorithmically dense. Each of the four predictors is a textbook microarchitecture algorithm; the work is in re-expressing them clean enough that they can be both re-read as an explanatory artifact and reused unchanged by Phase 4's OoO core.
What was ported¶
| project2 piece | Where it lives now | Notes |
|---|---|---|
Counter saturating counter |
include/comparch/predictor/saturating_counter.hpp | Header-only, modern-C++. Replaces project2's C-bindings counter; same Smith-counter semantics |
| Always-Taken predictor | src/predictor/always_taken.cpp | Trivial baseline |
| Yeh-Patt two-level adaptive | src/predictor/yeh_patt.cpp | History table + pattern table; FIFO-based pipelining stripped out |
| Perceptron | src/predictor/perceptron.cpp | Per-PC weight vectors, global history register, online training with theta = floor(1.93 G + 14) |
| Hybrid (tournament) | src/predictor/hybrid.cpp | Composes Yeh-Patt + Perceptron, picks per-PC via a 4-bit tournament selector table |
BranchPredictor interface |
include/comparch/predictor/predictor.hpp | Replaces project2's function-pointer table with a virtual class |
| Stats / accuracy | src/predictor/predictor_mode.cpp | Single Stats struct, MPKI / accuracy / mispredict counts |
| Project2 trace converter | tools/proj2_to_champsim/ | One-shot: reads the project2 11-field text format, emits ChampSim binary; used to build the regression fixture |
What was dropped: project2's RUN_BRANCHSIM / RUN_BRANCHSIM_PIPELINED
modes (the standalone branch-only and 4-stage-pipelined drivers), the
per-predictor FIFOs (YP_FIFO, PCT_fifo, TNMT_fifo) used to delay
training across pipeline stages. --mode predictor predicts and
trains synchronously per branch; Phase 4 will re-introduce delay where
the OoO pipeline actually needs it.
Algorithm intuition (skim if you already know these)¶
Always-Taken¶
Predict every branch will be taken. Useful as a baseline floor — non-trivial predictors should beat ~55-65% on real workloads.
Yeh-Patt two-level adaptive¶
Two tables:
- History Table (HT) —
2^Hentries. HT[i] holds a P-bit shift register recording recent taken/not-taken outcomes for every PC that hashes to slot i. Indexed by the low H bits of(PC >> 2). - Pattern Table (PT) —
2^Pentries, each a 2-bit saturating counter. Indexed by the P-bit history pulled from HT.
So the predictor effectively asks: "the last time this PC had history pattern X, was it taken?" — and saturating counters smooth out noise. Captures any pattern with period ≤ 2^P. Reference: Yeh & Patt, MICRO-24, 1991.
Perceptron¶
For each PC slot (2^N of them), keep a vector of G+1 signed
integer weights. Inputs to the perceptron are the bias (constant +1)
plus the global history register (G bits, encoded as +1 / -1 per
position). Output = dot product. Sign of output is the prediction.
Train online: when wrong, or when the output magnitude is below a
confidence threshold theta = floor(1.93 G + 14), nudge each weight
toward agreement with the actual outcome and clamp to [-theta,
theta]. Captures linearly separable patterns over G history bits
— including a lot of long-range correlations Yeh-Patt's bounded
history can't see. Reference:
Jiménez & Lin, HPCA-7, 2001.
Hybrid (tournament)¶
Run Yeh-Patt and Perceptron in parallel on every branch. Use a
per-PC tournament selector — 4-bit saturating counters in a
2^TI-entry table — to pick which one's prediction to trust.
Train the selector only when the two sub-predictors disagree:
- Yeh-Patt right, Perceptron wrong → counter decrements (more YP)
- Perceptron right, Yeh-Patt wrong → counter increments (more PCT)
- Agree → no movement (no signal)
Both sub-predictors keep training on every branch regardless of which was selected. Reference: McFarling, DEC WRL TN-36, 1993.
The interface¶
include/comparch/predictor/predictor.hpp
namespace comparch::predictor {
struct Branch {
std::uint64_t ip = 0;
bool taken = false; // ground truth — for update only
std::uint64_t inst_num = 0; // dynamic instruction count (debug)
};
class BranchPredictor {
public:
virtual ~BranchPredictor() = default;
virtual bool predict(const Branch& b) = 0;
virtual void update(const Branch& b, bool prediction) = 0;
virtual std::string_view name() const = 0;
};
std::unique_ptr<BranchPredictor> make(const PredictorConfig&);
} // namespace
The lifecycle is predict, then update, then move to next branch.
Phase 4 will need to keep predicted state alive until a branch
resolves at retire — easy to add, but the v1 interface doesn't force
it on --mode predictor callers.
Configuration¶
"predictor": {
"type": "yeh_patt",
"history_bits": 10, // Yeh-Patt H
"pattern_bits": 5, // Yeh-Patt P
"perceptron_history_bits": 9, // Perceptron G
"perceptron_index_bits": 7, // Perceptron N
"hybrid_init": 2, // Tournament initial state, 0..3
"tournament_index_bits": 12, // Hybrid selector table size = 1 << this
"tournament_counter_bits": 4 // Hybrid selector counter width
}
The block sits at the top level of the config (alongside l1 / l2
/ memory) — that's what --mode predictor reads. The same struct
also lives nested inside core.predictor, where Phase 4's per-core
OoO pipeline will read it; for now Phase 4 hasn't shipped so the
nested copy is a placeholder.
Defaults match project2's defaults: H=10 P=5 G=9 N=7 T=2.
Verification: dual regression strategy¶
Phase 3 pins predictor accuracy two ways. Both are required to pass before the phase ships.
1. Synthetic patterns with analytically-derivable accuracy¶
tests/predictor/test_predictors.cpp
Each predictor is fed a deterministic synthetic outcome stream (all taken, alternating, period-N loop, etc.) where the expected accuracy can be derived by hand from how the algorithm works. For example:
- Yeh-Patt on all-taken with
H=10 P=5: exactly 6 mispredictions total, because the history register fills one bit at a time and each transitional history pattern indexes a fresh PT slot. Test assertscorrect == 1000 - 6exactly. - Perceptron on alternating: weight on
h_1becomes strongly negative, accuracy converges to >97%.
These tests catch coarse algorithm bugs (off-by-one, wrong table indexing, wrong update rule) without needing any external reference.
2. Cross-validation against project2's proj2sim binary¶
tests/predictor/test_proj2_regression.cpp
The strongest signal is matching project2's own output bit-for-bit. The recipe:
- Generate a deterministic 8000-record project2-format text trace with tests/predictor/fixtures/proj2/make_trace.py. It mixes four branch PCs: one always-taken, one always-not-taken, one alternating, one period-5 (TTTTN).
- Build project2 natively (no Docker — its Makefile builds clean):
make -C ../project2_v2.1.0_all proj2sim. - Run project2 against the text trace with each of the four predictors and the default parameters; capture the output.
- Convert the same text trace into ChampSim binary using
tools/proj2_to_champsim. Save under tests/predictor/fixtures/proj2/branchsim.champsimtrace. - Save the captured numbers in tests/predictor/fixtures/proj2/branchsim.expected.txt.
- The Catch2 regression test loads the binary trace, runs each of
the four predictors, and asserts the (correct, mispredicted) pair
matches
expected.txtexactly.
The numbers we're pinned against:
| Predictor | Correct | Mispredicted | Accuracy |
|---|---|---|---|
| always_taken | 2300 | 1700 | 57.50% |
| yeh_patt | 3784 | 216 | 94.60% |
| perceptron | 3587 | 413 | 89.67% |
| hybrid | 3782 | 218 | 94.55% |
(out of 4000 branches in the fixture trace.)
These match project2's proj2sim -x exactly, with no tolerance —
all four algorithms are deterministic and no float math drifts.
The fixture's make_trace.py and README.md are checked in too, so
if the trace ever needs regenerating, the recipe is self-contained.
Running it manually¶
# Build
cmake --build --preset default --target sim
# Run with the default (yeh_patt) predictor on the regression fixture
./build/default/sim --mode predictor \
--config configs/baseline.json \
--trace tests/predictor/fixtures/proj2/branchsim.champsimtrace
Output:
[INFO] sim 0.0.1 (mode=predictor)
[INFO] predictor mode: yeh_patt (history=10 pattern=5)
[INFO] walked 8000 records (4000 branches)
==== branch predictor stats ====
predictor yeh_patt
records 8000
branches 4000 (50.00 %)
correct 3784 (94.60 %)
mispredicted 216 (5.40 %)
MPKI 27.000
Switch predictor by editing the JSON or by writing a one-line override file:
jq '.predictor.type = "perceptron"' configs/baseline.json > /tmp/p.json
./build/default/sim --mode predictor --config /tmp/p.json \
--trace tests/predictor/fixtures/proj2/branchsim.champsimtrace
Code commentary policy (this phase)¶
Defaulting to "no comments" felt wrong here — the predictor algorithms are textbook pieces a future me (or anyone landing on the GitHub repo) will want to re-read for understanding, not just to fix bugs. So this phase is the one big exception:
- Every predictor source file has a top-of-file paragraph describing the algorithm in prose, plus the canonical paper citation.
- Inside
predict()andupdate(), every logical step has a one-line comment explaining the math (// fold low-order PC bits with global history to index the perceptron table, etc.). saturating_counter.hppdocuments the wrap/clamp semantics on the class itself.- Hardcoded constants in algorithm code carry the name they have in
the original paper or in project2 (e.g.
// theta = floor(1.93*G + 14)from Jiménez 2001).
The bar was: would a reader who knows C++ but not branch prediction learn something from this comment?
Other files (config plumbing, CMake, mode dispatch, tests) follow the project's normal sparse-comment style.
What's next¶
The four predictor classes are designed so Phase 4 can drop them into
the OoO core's fetch stage essentially unchanged. The one thing that
will need to grow is the BranchPredictor interface: speculative
predictions need to be checkpointed at predict-time and either
committed or rolled back at branch resolve. The plan is to add
something like:
virtual Checkpoint predict_speculative(const Branch&) = 0;
virtual void resolve(Checkpoint, bool actual) = 0;
without breaking the v1 predict() / update() calls — Phase 4 will
get the heavier API; --mode predictor keeps using the simpler one.
Until then, --mode predictor is the second canonical regression
target alongside --mode cache. Phase 4 changes that touch shared
plumbing get cross-checked against both modes.