Problem Statements
Given a trace T of load and store instructions
find an algorithm that changes the memory addresses in the instructions such that
the resulting trace T′ is semantically equivalent to T but
performs equal or better than T for a given cache model C.
Given a trace T of load and store instructions
find metrics that characterize the trace performance for a given cache model C and
implement an execution engine for computing their quantities.
Hardware Model
Performance
Memory Access Type |
Costs in Cycles |
---|---|
cache load | 1 |
cache store | 1 |
memory load | 5 |
memory store | 5 |
Temporal Performance
(cycles per access [CPA])
T ... trace and C ... cache
$$ \text{CPA}(T,C) = \frac{\text{Sum of cycles}(T, C)}{\text{Sum of accesses}(T)} $$Memory Access Trace
// sum three numbers, sum = x + y + z
void main() {
int sum, x, y, z;
sum = 0;
x = 1;
y = 2;
sum = sum + x;
z = 3;
sum = sum + y;
sum = sum + z;
}
// memory access trace (length 13)
store &sum // sum = 0
store &x // x = 1
store &y // y = 2
load &sum // sum = sum + x
load &x
store &sum
store &z // z = 3
load &sum // sum = sum + y
load &y
store &sum
load &sum // sum = sum + z
load &z
store &sum
Liveness
Trace Transformation
Original Trace
$$ CPA(T_{original}, C_{LRU}) =$$
$$\frac{1*(5 + 4) + 5*(4 + 5)}{13} = 4.15$$
Single Assignment Trace (SA)
$$ CPA(T_{sa}, C_{LRU}) =$$
$$\frac{1*(2 + 7) + 5*(4 + 7)}{13} = 4.92$$
Compact Trace
$$ CPA(T_{compact}, C_{LRU}) =$$
$$\frac{1*(2 + 6) + 5*(4 + 4)}{13} = 3.69$$
cache loads cache stores memory loads memory stores
Experiments
cache size=32KB, cache line size=64byte