Paper Notes: CPU/GPU Task Scheduling with SVM

Last updated on July 27, 2023

Info

Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms

Authors: Yuan Wen, Zheng Wang, Michael F.P. O’Boyle

Published in: HiPC, December 2014

DOI: 10.1109/HiPC.2014.7116910

Keywords: GPU, OpenCL, task scheduling, machine learning

Background

Problem

Find the best mapping from each OpenCL application to a hardware resource: the CPU or the GPU.

The requirements are:

  • Increase the system throughput (STP)
  • Reduce the average normalized turn-around time (ANTT)

STP: the number of tasks completed per unit time.

ANTT: the average time between a task's submission and its completion, normalized to the task's execution time when it runs alone.
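As a rough illustration, the two metrics can be computed as below. This is a sketch that assumes the widely used multi-program definitions, where each task's time in the shared workload is compared with its time when run alone. The function names and sample timings are made up for illustration and are not taken from the paper.

```python
def stp(alone_times, shared_times):
    """System throughput: sum of per-task normalized progress (higher is better)."""
    return sum(a / s for a, s in zip(alone_times, shared_times))


def antt(alone_times, shared_times):
    """Average normalized turnaround time (lower is better)."""
    ratios = [s / a for a, s in zip(alone_times, shared_times)]
    return sum(ratios) / len(ratios)


# Hypothetical timings in seconds: each task run alone vs. in the shared workload.
alone = [2.0, 4.0]
shared = [4.0, 5.0]

print("STP:", stp(alone, shared))
print("ANTT:", antt(alone, shared))
```

Here the first task takes twice as long in the shared workload and the second takes 1.25 times as long, so both metrics show the cost of sharing the machine.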

Alternatives

Other scheduling policies:

  • All on CPU
  • All on GPU
  • First come first serve (FCFS): baseline
  • Input size guided: the GPU gets the tasks with larger input sizes
  • Work item guided: the GPU gets the tasks with more work items

Key Idea

Predict the speedup of a task. Speedup is defined as the ratio of the execution time of a task on CPU to that on GPU.

Organize all OpenCL tasks into a queue ordered by predicted speedup and runtime data transfer size. Tasks at the high-speedup end are scheduled to the GPU, and tasks at the low-speedup end are scheduled to the CPU.

Whenever the CPU or GPU becomes idle, it pops a task from its own end of the queue, until the queue is empty.
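The two-ended queue can be sketched as a small simulation. This is a minimal sketch under simplifying assumptions, not the paper's implementation: it orders tasks by a numeric predicted speedup only (the data transfer size is ignored), it assumes the true CPU and GPU times are known so that idle times can be simulated, and the task names and timings are hypothetical.

```python
from collections import deque


def schedule(tasks):
    """Simulate two-ended dispatch.

    tasks: list of (name, cpu_time, gpu_time, predicted_speedup).
    Returns a list of (name, device, start, end) in dispatch order.
    """
    # Highest predicted speedup at the left (GPU end), lowest at the right (CPU end).
    queue = deque(sorted(tasks, key=lambda t: t[3], reverse=True))
    free_at = {"GPU": 0.0, "CPU": 0.0}  # time at which each device becomes idle
    plan = []
    while queue:
        # The device that becomes idle first takes the next task (GPU wins ties).
        if free_at["GPU"] <= free_at["CPU"]:
            device = "GPU"
            name, cpu_time, gpu_time, _ = queue.popleft()
            duration = gpu_time
        else:
            device = "CPU"
            name, cpu_time, gpu_time, _ = queue.pop()
            duration = cpu_time
        start = free_at[device]
        free_at[device] = start + duration
        plan.append((name, device, start, free_at[device]))
    return plan


# Hypothetical tasks: (name, CPU time, GPU time, predicted speedup = CPU time / GPU time).
tasks = [
    ("matmul", 8.0, 1.0, 8.0),
    ("reduce", 3.0, 2.0, 1.5),
    ("scan", 2.0, 4.0, 0.5),
    ("sort", 6.0, 2.0, 3.0),
]
for entry in schedule(tasks):
    print(entry)
```

In this example the GPU ends up with the two tasks that benefit most from it ("matmul" and "sort"), while the CPU takes "scan" and "reduce" from the other end.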

Implementation

The predictor takes in:

  1. Static code features
  2. Runtime features

It outputs a speedup category for the task. Two categories are used: high speedup and low speedup.

The classifier is a Support Vector Machine (SVM), trained and evaluated with leave-one-out cross-validation.
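Leave-one-out cross-validation holds out one sample at a time, trains on the rest, and tests on the held-out sample. The sketch below shows only that procedure. To stay dependency-free it uses a nearest-centroid classifier as a stand-in, which is not an SVM, and the feature values and labels are hypothetical rather than the paper's data.

```python
def nearest_centroid_fit(samples, labels):
    """Stand-in classifier (NOT an SVM): store the mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, sample):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(sample, centroids[label]))
    return min(centroids, key=dist)


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and test on the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = nearest_centroid_fit(train_x, train_y)
        correct += nearest_centroid_predict(model, samples[i]) == labels[i]
    return correct / len(samples)


# Hypothetical feature vectors (two made-up features per benchmark) and speedup classes.
features = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
classes = ["high", "high", "high", "low", "low", "low"]

accuracy = leave_one_out_accuracy(features, classes)
print("leave-one-out accuracy:", accuracy)
```

Swapping in a real SVM would only change the fit and predict functions; the hold-out loop stays the same.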

Used features

Importance of the features

In the feature-importance diagram, a larger square means a more important feature.

Experiment

The experiments ran on a real machine.

35 OpenCL benchmarks are used.

Limitation

  1. Gap to the optimum. Compared with the best STP queue (found by exhaustive search), there is still a 50% gap, because speedup is only a proxy for execution time. (It is impossible to find the best ANTT queue.)
  2. Prediction accuracy is not high enough. On STP, the SVM model is 87% accurate and the decision tree model is 72% accurate.
  3. Coarse-grained classification. Only two categories are used. Increasing the number of categories may improve performance, but not always.

Y. Wen, Z. Wang and M. F.P. O’Boyle, “Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms,” 2014 21st International Conference on High Performance Computing (HiPC), Goa, India, 2014, pp. 1-10, doi: 10.1109/HiPC.2014.7116910.


https://lingkang.dev/2023/07/28/read-paper-0-wen/
Author: Lingkang
Posted on: July 27, 2023