Paper Notes: CPU/GPU Task Scheduling with SVM
Last updated on July 27, 2023 pm
Info
Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms
Authors: Yuan Wen, Zheng Wang, Michael F.P. O’Boyle
Published in: HiPC 2014 Dec.
DOI: 10.1109/HiPC.2014.7116910
Keywords: GPU, OpenCL, task scheduling, machine learning
Background
Problem
Find the best mapping from each OpenCL application to a hardware resource, either the CPU or the GPU.
The requirements are:
- Increase the system throughput (STP)
- Reduce the average normalized turn-around time (ANTT)
STP: the number of tasks completed per unit time.
ANTT: the average time between a task's submission and its completion, normalized to the task's execution time when it runs alone.
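To make the two metrics concrete, here is a minimal sketch. It assumes the common formalization by Eyerman and Eeckhout, in which each task's time in the shared (multi-task) run is compared against its time when run alone; the function names and the timings are hypothetical and are not taken from the paper.

```python
def stp(alone, shared):
    """System throughput: sum of each task's normalized progress (higher is better)."""
    return sum(a / s for a, s in zip(alone, shared))


def antt(alone, shared):
    """Average normalized turnaround time: mean slowdown per task (lower is better)."""
    return sum(s / a for a, s in zip(alone, shared)) / len(alone)


# Hypothetical timings in seconds for three tasks.
alone = [2.0, 4.0, 1.0]   # each task running by itself
shared = [4.0, 4.0, 2.0]  # the same tasks in a multi-task schedule

print(stp(alone, shared))             # 2.0
print(round(antt(alone, shared), 3))  # 1.667
```

Under this formalization, an STP of 2.0 means the shared schedule makes as much progress as two tasks running alone, and an ANTT of about 1.67 means each task takes on average 1.67 times longer than it would in isolation.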
Alternative
Other scheduling policies:
- All on CPU
- All on GPU
- First come first serve (FCFS): baseline
- Input size guided: GPU gets the task with larger input size
- Work item guided: GPU gets the task with larger number of work items
Key Idea
Predict the speedup of a task. Speedup is defined as the ratio of the execution time of a task on CPU to that on GPU.
Organize all OpenCL tasks into a queue ordered by predicted speedup and runtime data transfer size. Tasks at the high-speedup end are scheduled to the GPU, while tasks at the low-speedup end are scheduled to the CPU.
Whenever the CPU or GPU is idle, pop a task from the corresponding end of the queue, until the queue is empty.
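The two-ended queue can be sketched as a small simulation. This is an illustration under stated assumptions rather than the paper's implementation: tasks are ordered by a numeric predicted speedup only (the data transfer size is left out), ties between idle devices go to the GPU, and the task names and timings are made up.

```python
from collections import deque


def schedule(tasks):
    """Simulate the two-ended queue.

    tasks: list of (name, predicted_speedup, cpu_time, gpu_time).
    Returns a list of (name, device, start, end).
    """
    # Low predicted speedup on the left, high predicted speedup on the right.
    queue = deque(sorted(tasks, key=lambda t: t[1]))
    free_at = {"CPU": 0.0, "GPU": 0.0}  # time at which each device becomes idle
    plan = []
    while queue:
        # The device that becomes idle first takes the next task (assumption:
        # a tie goes to the GPU).
        device = "GPU" if free_at["GPU"] <= free_at["CPU"] else "CPU"
        if device == "GPU":
            name, _, _, run_time = queue.pop()      # high-speedup end
        else:
            name, _, run_time, _ = queue.popleft()  # low-speedup end
        start = free_at[device]
        free_at[device] = start + run_time
        plan.append((name, device, start, free_at[device]))
    return plan


# Hypothetical tasks: (name, predicted speedup, CPU seconds, GPU seconds).
tasks = [
    ("matmul", 8.0, 8.0, 1.0),
    ("reduce", 2.0, 4.0, 2.0),
    ("scan", 0.5, 1.0, 2.0),
    ("sort", 1.2, 3.0, 2.5),
]
for entry in schedule(tasks):
    print(entry)
```

With these numbers the GPU takes `matmul` and `reduce` from the high-speedup end, while the CPU takes `scan` and `sort` from the low-speedup end.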
Implementation
The predictor takes in:
- Static code features
- Runtime features
The output is a speedup category for the task. Two categories are used: the high-speedup side and the low-speedup side.
The classifier is a Support Vector Machine (SVM), trained and evaluated with leave-one-out cross-validation.
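Leave-one-out cross-validation holds out one sample at a time, trains on the rest, and tests on the held-out sample. The sketch below shows only that evaluation loop. To stay dependency-free it uses a nearest-centroid classifier as a stand-in, which is not an SVM; the feature vectors, labels, and function names are all hypothetical.

```python
def centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): store the mean feature vector per label."""
    sums, counts = {}, {}
    for xs, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(xs))
        for i, value in enumerate(xs):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def centroid_predict(model, xs):
    """Return the label whose centroid is closest to xs (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(xs, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def leave_one_out_accuracy(X, y, fit, predict):
    """Hold out each sample once, train on the rest, and test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


# Hypothetical features per benchmark: [compute intensity, data transfer in MB].
X = [[0.9, 1.0], [0.8, 2.0], [0.85, 1.5], [0.2, 8.0], [0.1, 9.0], [0.15, 7.5]]
y = ["high", "high", "high", "low", "low", "low"]

print(leave_one_out_accuracy(X, y, centroid_fit, centroid_predict))  # 1.0
```

Any classifier that exposes a `fit`/`predict` pair in this shape, including an SVM from a machine learning library, could be passed to `leave_one_out_accuracy` in place of the stand-in.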
Feature importance figure: the larger the square, the more important the feature.
Experiment
The experiments were run on a real machine.
35 OpenCL benchmarks are used.
Limitation
- Compared with the best STP queue (found by exhaustive search), there is still a 50% gap, because speedup is only one proxy for execution time. (It is impossible to find the best ANTT queue.)
- Prediction accuracy is not high enough. On STP, the SVM model is 87% accurate and the decision tree model is 72% accurate.
- Coarse-grained classification. Only two categories are used. Increasing the number of categories may improve performance, but not always.
Y. Wen, Z. Wang and M. F.P. O’Boyle, “Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms,” 2014 21st International Conference on High Performance Computing (HiPC), Goa, India, 2014, pp. 1-10, doi: 10.1109/HiPC.2014.7116910.