Standard

The Pipeline Performance Model: A Generic Executable Performance Model for GPUs. / Cornelis, Jan G.; Lemeire, Jan.

Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019. IEEE, 2019. p. 260-265 (Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019).

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Harvard

Cornelis, JG & Lemeire, J 2019, The Pipeline Performance Model: A Generic Executable Performance Model for GPUs. in Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019. Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019, IEEE, pp. 260-265, PDP 2019, Pavia, Italy, 13/02/19. https://doi.org/10.1109/EMPDP.2019.8671606

APA

Cornelis, J. G., & Lemeire, J. (2019). The Pipeline Performance Model: A Generic Executable Performance Model for GPUs. In Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019 (pp. 260-265). (Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019). IEEE. https://doi.org/10.1109/EMPDP.2019.8671606

Vancouver

Cornelis JG, Lemeire J. The Pipeline Performance Model: A Generic Executable Performance Model for GPUs. In Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019. IEEE. 2019. p. 260-265. (Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019). https://doi.org/10.1109/EMPDP.2019.8671606

Author

Cornelis, Jan G. ; Lemeire, Jan. / The Pipeline Performance Model: A Generic Executable Performance Model for GPUs. Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019. IEEE, 2019. pp. 260-265 (Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019).

BibTeX

@inproceedings{d7be13dbef784ed58699428a588b8867,
title = "The Pipeline Performance Model: A Generic Executable Performance Model for GPUs",
abstract = "This paper presents the pipeline performance model, a generic GPU performance model, which helps understand the performance of GPU code by using a code representation that is very close to the source code. The code is represented by a graph in which the nodes correspond to the source code instructions and the edges to data dependences between them. Furthermore, each node is enhanced with two latencies that characterize the instruction's time behavior on the GPU. This graph, together with a simple characterization of the GPU and the execution configuration, is used by a simulator to mimic the execution of the code. We validate the model on the micro-benchmarks used to determine the latencies and on a matrix multiplication kernel, both on an NVIDIA Fermi and an NVIDIA Pascal GPU. Initial results show that the simulated times follow the measured times, with acceptable errors, for a wide occupancy range. We argue that to achieve better accuracies it is necessary to further refine the model to take into account the complexity of memory access and warp scheduling, especially for more recent GPUs.",
keywords = "GPU, latencies, model, performance, pipeline",
author = "Cornelis, {Jan G.} and Jan Lemeire",
year = "2019",
month = mar,
day = "19",
doi = "10.1109/EMPDP.2019.8671606",
language = "English",
isbn = "9781728116440",
series = "Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019",
publisher = "IEEE",
pages = "260--265",
booktitle = "Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019",
}

RIS

TY - GEN

T1 - The Pipeline Performance Model: A Generic Executable Performance Model for GPUs

AU - Cornelis, Jan G.

AU - Lemeire, Jan

PY - 2019/3/19

Y1 - 2019/3/19

N2 - This paper presents the pipeline performance model, a generic GPU performance model, which helps understand the performance of GPU code by using a code representation that is very close to the source code. The code is represented by a graph in which the nodes correspond to the source code instructions and the edges to data dependences between them. Furthermore, each node is enhanced with two latencies that characterize the instruction's time behavior on the GPU. This graph, together with a simple characterization of the GPU and the execution configuration, is used by a simulator to mimic the execution of the code. We validate the model on the micro-benchmarks used to determine the latencies and on a matrix multiplication kernel, both on an NVIDIA Fermi and an NVIDIA Pascal GPU. Initial results show that the simulated times follow the measured times, with acceptable errors, for a wide occupancy range. We argue that to achieve better accuracies it is necessary to further refine the model to take into account the complexity of memory access and warp scheduling, especially for more recent GPUs.

AB - This paper presents the pipeline performance model, a generic GPU performance model, which helps understand the performance of GPU code by using a code representation that is very close to the source code. The code is represented by a graph in which the nodes correspond to the source code instructions and the edges to data dependences between them. Furthermore, each node is enhanced with two latencies that characterize the instruction's time behavior on the GPU. This graph, together with a simple characterization of the GPU and the execution configuration, is used by a simulator to mimic the execution of the code. We validate the model on the micro-benchmarks used to determine the latencies and on a matrix multiplication kernel, both on an NVIDIA Fermi and an NVIDIA Pascal GPU. Initial results show that the simulated times follow the measured times, with acceptable errors, for a wide occupancy range. We argue that to achieve better accuracies it is necessary to further refine the model to take into account the complexity of memory access and warp scheduling, especially for more recent GPUs.

KW - GPU

KW - latencies

KW - model

KW - performance

KW - pipeline

UR - http://www.scopus.com/inward/record.url?scp=85063869749&partnerID=8YFLogxK

U2 - 10.1109/EMPDP.2019.8671606

DO - 10.1109/EMPDP.2019.8671606

M3 - Conference paper

SN - 9781728116440

T3 - Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019

SP - 260

EP - 265

BT - Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019

PB - IEEE

ER -

ID: 44278988
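The abstract describes representing a kernel as a graph whose nodes are source-level instructions, each carrying latencies, and whose edges are data dependences, then simulating execution over that graph. The sketch below illustrates that idea in miniature for a single warp: it is a hypothetical illustration, not the paper's simulator, and all instruction names and latency values are invented for the example.

```python
# Minimal sketch of a dependence-graph latency simulation in the spirit of
# the abstract: each instruction node carries an (issue, completion) latency
# pair, edges are data dependences, and an instruction's finish time is the
# latest finish among its producers plus its own latencies.
# All names and numbers below are hypothetical, not taken from the paper.

def simulate(nodes, edges):
    """nodes: {name: (issue_latency, completion_latency)}, listed in a
    topological order of the dependence graph.
    edges: list of (producer, consumer) data dependences.
    Returns the simulated completion time of the whole graph."""
    finish = {}
    for name, (issue_lat, done_lat) in nodes.items():
        # An instruction can start only once all its inputs are available.
        ready = max((finish[p] for p, c in edges if c == name), default=0)
        finish[name] = ready + issue_lat + done_lat
    return max(finish.values())


# Toy kernel fragment: two loads feeding a multiply, then a store.
nodes = {
    "load_a": (1, 400),   # long completion latency: global memory load
    "load_b": (1, 400),
    "mul":    (1, 18),    # short completion latency: arithmetic pipeline
    "store":  (1, 0),
}
edges = [("load_a", "mul"), ("load_b", "mul"), ("mul", "store")]

print(simulate(nodes, edges))  # → 421
```

A full model along the abstract's lines would additionally account for the execution configuration and multiple warps sharing pipelines, which is where the paper's occupancy-dependent behavior comes from; this sketch only shows the critical-path core of such a simulator.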