

### Model predictive control and self-learning of thermal models for multi-core platforms\*

Luca Benini Luca.benini@unibo.it

\*Work supported by Intel Labs Braunschweig

ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA

This material is reserved for the personnel of the University of Bologna and may not be used, under the law, by other persons or for non-institutional purposes.






# The Thermal Crisis

• Never-ending shrinking: smaller, faster...



• Thermal issues: hot spots, thermal gradients...





### 3D-SoCs are even worse





## A System-level View

• Heat density trend 2005-2010 (systems)



#### Cooling and hot spot avoidance is an open issue!




# Multi-scale Problem

- Increasing power density
- Thermal issues at multiple levels
  - Chip / component level
  - Server/board level
  - Rack level
  - Room level



#### **Today's focus: Chip level**




### **Thermal Management**

Technology scaling, high performance requirements

> High power densities

Software

Spatial and temporal workload variation

Limited

#### **Dynamic Approach:**

On-line tuning of system performance and temperature through closed-loop control

Leakage current

Hot spots, thermal gradients and cycles

Reliability loss, aging

## Management Loop: Holistic view





#### Outline

- Introduction
- Energy Controller
- Thermal Controller architecture
- Learning (self-calibration)
- Scalability
- Simulation Infrastructure
- Results
- Conclusion



### **DRM - General Architecture**






### **Energy Controller**





### **Energy Controller**






### **Thermal Controller**





### **MPC Robustness**








#### **Thermal Model & Power Model**








#### **Model Structure**






#### **LS System Identification**






#### **Experimental setup**





#### **Workload & Temperature**



Pseudorandom workload pattern



#### **Black-box Identification**

#### Identification based on pure LS fitting

#### MEASURED vs. SIMULATED TEMPERATURE




### Partially unobservable model






### **Multi-step Identification**

Power model P=g(w,f) initially unknown



**Step 1**: set $f$ = const and drive $w$ with a $[0|1]^{N}$ sequence, so that power follows $[P_1|P_0]^{N}$, with $P_1$, $P_0$ pre-measured in steady state; measure $T$ and obtain $A_0$ by LS

**Step 2**: $A$ is known; set $f$ and $w$, measure $T$, invert $A$ and obtain $P$

**Step 3**: $P$ is known; generate a richer sequence of $w$, $f$ and re-calibrate $A$ by LS

Iterate until convergence
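The three steps can be sketched on synthetic data. This is a minimal illustration, not the identification code used in this work: it assumes a discrete-time linear model $T[k+1] = A\,T[k] + B\,P[k]$ with temperatures relative to ambient, and the matrices `A_true`, `B_true`, the power levels and the function names are all hypothetical.

```python
import numpy as np

def simulate(A, B, P, T0):
    """Roll out T[k+1] = A T[k] + B P[k] for a power trace P of shape (K, m)."""
    T = [np.asarray(T0, dtype=float)]
    for p in P:
        T.append(A @ T[-1] + B @ p)
    return np.array(T)                      # shape (K+1, n)

def fit_thermal_model(T, P):
    """Least-squares estimate of (A, B) from temperature and power traces."""
    n = T.shape[1]
    X = np.hstack([T[:-1], P])              # regressors [T[k], P[k]]
    theta, *_ = np.linalg.lstsq(X, T[1:], rcond=None)
    return theta[:n].T, theta[n:].T         # A is (n, n), B is (n, m)

def recover_power(T, A, B):
    """Invert the identified model: P[k] = pinv(B) (T[k+1] - A T[k])."""
    residual = T[1:] - T[:-1] @ A.T
    return residual @ np.linalg.pinv(B).T

# Synthetic two-core "plant" (hypothetical values, noise-free).
A_true = np.array([[0.90, 0.05], [0.05, 0.90]])
B_true = np.array([[0.50, 0.10], [0.10, 0.50]])
rng = np.random.default_rng(0)

# Step 1: binary workload mapped to pre-measured power levels P1 / P0.
P1, P0 = 20.0, 5.0
w = rng.integers(0, 2, size=(400, 2))
P_step1 = np.where(w == 1, P1, P0)
T_step1 = simulate(A_true, B_true, P_step1, np.zeros(2))
A0, B0 = fit_thermal_model(T_step1, P_step1)

# Step 2: power is now treated as unknown and recovered from temperature.
P_rich = rng.uniform(2.0, 25.0, size=(400, 2))
T_rich = simulate(A_true, B_true, P_rich, np.zeros(2))
P_est = recover_power(T_rich, A0, B0)

# Step 3: re-calibrate the model on the richer trace.
A1, B1 = fit_thermal_model(T_rich, P_est)
```

On real measurements the traces are noisy, so each pass only refines the estimates, which is why the procedure is iterated.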



#### Validation

**Problem 3: Model is not physical** 

#### Identification algorithm must be aware of physical properties to avoid over-fitting




#### **Constrained Identification**




#### **Quasi-steady-state accuracy**



Possible causes:

- Package thermal inertia?
- Environment inertia (Air)?
- P<sub>LEAK</sub> temperature dependency?

Identification with pseudorandom trace:

• Too many samples, huge LS computation



### Addressing model stiffness

- Modelling the third time constant as heat sink temperature variation
- One-pole model identification
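A one-pole identification can be sketched as a two-parameter least-squares fit. The example below is only an illustration under stated assumptions: a discrete first-order model $y[k+1] = a\,y[k] + b\,u[k]$ for the slow (heat sink) temperature $y$ driven by power $u$, with a hypothetical sampling period, time constant and gain.

```python
import numpy as np

def fit_one_pole(y, u, dt):
    """Fit y[k+1] = a*y[k] + b*u[k]; return (a, b, time constant, DC gain)."""
    X = np.column_stack([y[:-1], u[:-1]])
    (a, b), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    tau = -dt / np.log(a)          # continuous-time constant, valid for 0 < a < 1
    gain = b / (1.0 - a)           # steady-state rise per unit of input
    return a, b, tau, gain

# Synthetic slow mode (hypothetical numbers): tau = 20 s, 0.5 K/W, 0.1 s sampling.
dt, tau_true, gain_true = 0.1, 20.0, 0.5
a_true = np.exp(-dt / tau_true)
b_true = gain_true * (1.0 - a_true)
u = np.where(np.arange(2000) < 1000, 10.0, 2.0)   # power step down halfway
y = np.zeros(u.size)
for k in range(u.size - 1):
    y[k + 1] = a_true * y[k] + b_true * u[k]

a, b, tau, gain = fit_one_pole(y, u, dt)
```

Because the slow pole is fitted on its own, the sampling period can be long, which avoids the huge LS problem caused by identifying all time constants from one densely sampled pseudorandom trace.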








### **MPC Scalability**





### **Addressing Scalability**





### **Distributed Control**







**Distributed Controller** 


# **Distributed Thermal Controller**



# **Explicit Distributed Controller**



# O.S. Implementation – Linux SMP

- Controller routines
  - Scheduler Routine Extension
    - Is distributed
    - Executes on the core it relies on
  - Timing: Scheduler tick (1-10 ms)
- CPI estimation
  - Performance counters:
    - Clock cycles expired
    - Instructions retired
- Energy Controller
  - Look-up-table:
    - f<sub>EC</sub> = LuT [ CPI ]
- Thermal Controller
  - Core Temperature Sensors
  - Matrix Multiplication & Look-up-table:
    - f<sub>TC</sub> = LuT [ M\*[T<sub>CORE</sub>, T<sub>NEIGHBOURS</sub>] ]
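The per-core routine listed above can be sketched in a few lines. This is a hedged illustration only: the frequency levels, CPI and temperature thresholds and the row vector `M` are hypothetical, and taking the minimum of the two frequencies is an assumption about how the thermal controller trims the energy controller's request.

```python
import bisect
import numpy as np

FREQS_GHZ = [1.6, 2.0, 2.4, 2.8]          # hypothetical available frequencies
CPI_BREAKS = [1.0, 2.0, 4.0]              # hypothetical CPI thresholds
TEMP_BREAKS = [70.0, 80.0, 90.0]          # hypothetical thresholds, deg C
M = np.array([0.70, 0.15, 0.15])          # hypothetical weights: core, 2 neighbours

def estimate_cpi(clocks_expired, instructions_retired):
    """CPI from the two performance counters read at each scheduler tick."""
    return clocks_expired / max(instructions_retired, 1)

def lookup(breaks, value):
    """Table look-up: a higher value selects a lower frequency."""
    idx = bisect.bisect_right(breaks, value)
    return FREQS_GHZ[len(FREQS_GHZ) - 1 - idx]

def energy_controller(cpi):
    """f_EC = LuT[CPI]: memory-bound phases (high CPI) run slower."""
    return lookup(CPI_BREAKS, cpi)

def thermal_controller(t_core, t_neighbours):
    """f_TC = LuT[M * [T_core, T_neighbours]]."""
    score = float(M @ np.array([t_core, *t_neighbours]))
    return lookup(TEMP_BREAKS, score)

def scheduler_tick(clocks, instructions, t_core, t_neighbours):
    """Frequency applied to this core for the next tick (assumed: the minimum)."""
    f_ec = energy_controller(estimate_cpi(clocks, instructions))
    f_tc = thermal_controller(t_core, t_neighbours)
    return min(f_ec, f_tc)
```

Each core needs only its own counters and the temperatures of its neighbours, so the routine can run locally inside the scheduler tick.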




#### **Model Learning Scalability**








# **Simulation Strategy**

#### Trace-driven simulators [1]:

- Not suitable for full-system simulation (how to simulate the O.S.?)
- Lose information on cross-dependencies
  - $\rightarrow$ resulting in degraded simulation accuracy

#### Closed-loop simulators:

- Cycle-accurate simulators [2]:
  - High modeling accuracy
  - Support well-established power and temperature co-simulation based on analytical models and system micro-architectural knowledge
  - Low simulation speed
  - Not suitable for full-system simulation
- Functional and instruction set simulators:
  - Allow full-system simulation
  - Less internal precision
  - Less detailed data $\rightarrow$ no micro-architectural model
  - Introduce the challenge of having accurate power and temperature physical models



[1] P. Chaparro et al. Understanding the thermal implications of multi-core architectures. 2007

[2] L. Benini et al. MPARM: Exploring the multi-processor SoC design space with SystemC. 2005



#### **Virtual Platform**



#### Simics by Virtutech:

- Full-system functional simulator
- Models the entire system: peripherals, BIOS, network interfaces, cores, memories
- Allows booting a full OS, such as Linux SMP
- Supports different target CPUs (ARM, SPARC, x86)
- x86 model:
  - in-order
  - all instructions are retired in 1 cycle
  - does not account for memory latency

#### Memory timing model

- RUBY GEMS (University of Wisconsin) [1]
  - Public cycle-accurate memory timing model
  - Different target memory architectures
  - Fully integrated with Virtutech Simics
  - Written in C++
  - We use it as a skeleton to apply our add-ons (as C++ objects)

[1] M. M. K. Martin et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. 2005



#### Performance knobs (DVFS) module:

- Allows the operating frequency to change at run-time
- We add support for it, together with counters for clock cycles and stall cycles expired
  - DRAM is assumed to have a constant clock frequency
  - L1 latency scales with the Simics processor clock frequency





### **Virtual Platform**

#### Power model module:

- At run-time estimate the power consumption of the target architecture
- Core model: $P_T = [P_D(f, CPI) + P_S(T, V_{DD})] \cdot (1 - idleness) + idleness \cdot P_{IDLE}$
- $P_D$: experimentally calibrated analytical power model
- Cache and memory access power cost estimated with CACTI [1]
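As a quick illustration of the core model, the sketch below blends active and idle power by the idle fraction, reading the weighting term as $(1 - idleness)$. The function name and the numbers in the usage note are hypothetical; the dynamic and static terms are passed in as already-computed values.

```python
def core_power(p_dynamic, p_static, p_idle, idleness):
    """Total core power: active power weighted by (1 - idleness) plus idle power."""
    if not 0.0 <= idleness <= 1.0:
        raise ValueError("idleness must be a fraction between 0 and 1")
    return (p_dynamic + p_static) * (1.0 - idleness) + idleness * p_idle
```

For example, a core with 10 W dynamic and 2 W static power that idles half of the time at 1 W is charged `core_power(10.0, 2.0, 1.0, 0.5)`, the average of the active and idle levels.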



[1] Thoziyoor Shyamkumar et al. A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies. 2008



#### **Power Model**



# Modeling Real Platform – Power



 $P_D = k_A \cdot V_{DD}^2 \cdot f_{CK} + k_B + (k_C + k_D \cdot f_{CK}) \cdot CPI^{k_E}$ 

• We relate the static power with the operating point by using an analytical model
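The dynamic power expression above translates directly into code. This is a sketch only: the coefficient container and the values used in the test are hypothetical placeholders, since the calibrated $k_A \dots k_E$ are not given here.

```python
from typing import NamedTuple

class DynamicPowerCoeffs(NamedTuple):
    """Fitted coefficients k_A..k_E (values must come from calibration)."""
    k_a: float
    k_b: float
    k_c: float
    k_d: float
    k_e: float

def dynamic_power(f_ck, cpi, v_dd, k):
    """P_D = k_A * V_DD^2 * f_CK + k_B + (k_C + k_D * f_CK) * CPI^k_E."""
    return k.k_a * v_dd**2 * f_ck + k.k_b + (k.k_c + k.k_d * f_ck) * cpi**k.k_e
```

The first term scales with frequency and the square of the supply voltage, while the last term captures how the workload, through its CPI, modulates the power drawn at a given frequency.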



### **Virtual Platform**

#### Temperature model module:

- we integrate our virtual platform with a thermal simulator [1]
- Input: power dissipated by the main functional units composing the target platform
- Output: temperature distribution across the simulated multicore die area



[1] Paci G. et al. Exploring "temperature-aware" design in low-power MPSoCs



### **Thermal Model**



# Modeling Real Platform – Thermal

- Thermal Model Calibration:
  - Derived from Intel® Core™ 2 Duo layout
  - We calibrate the model parameters to reproduce the real HW transient
  - High accuracy (error < 1%) and same transient behavior







## Virtual Platform Performance

- Target:
  - 4 core Pentium® 4
  - 2GB RAM
  - 32 KB private L1 cache
  - 4 MB shared L2 cache
  - Linux OS

- Host:
  - Intel® Core™ 2 Duo
  - 2.4 GHz
  - 2GB RAM






# Mathworks Matlab/Simulink

- Numerical computing environment developed to design, implement and test numerical algorithms
- Mathworks Simulink for simulation of dynamic systems: simplifies and speeds up the development cycle of control systems
- Can be called as a computational engine by writing C and Fortran programs that use Mathworks Matlab's engine library
- Controller design in two steps:
  - developing the control algorithm that optimizes the system performance
  - implementing it in the system

We allow a Mathworks Matlab/Simulink description of the controller to directly drive at run-time the performance knobs of the emulated system



## **Virtual Platform**

Mathworks Matlab interface:

- New module named Controller in RUBY
- Initialization: starts the Mathworks Matlab engine as a concurrent process
- Wakes up every N cycles

CONTROL-STRATEGIES DEVELOPMENT CYCLE

- 1. Controller design in Mathworks Matlab/Simulink framework
  - system represented by a simplified model
  - obtained by physical considerations and identification techniques
- 2. Set of simulation tests and design adjustments done in Simulink
- 3. Tuned controller evaluation with an accurate model of the plant done in the virtual platform
- 4. Performance analysis, by simulating the overall system

Virtutech Simics






#### Results





- Now working on the embedded implementation
  - Server multicore platform and Intel® SCC
- Explore thermal-aware scheduler solutions
  - co-operating with the presented solution
- Develop a distributed, multi-scale solution for data centers

# Thermal-aware task scheduling


