BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160906Z
LOCATION:C145
DTSTART;TZID=America/Chicago:20181114T150000
DTEND;TZID=America/Chicago:20181114T170000
UID:submissions.supercomputing.org_SC18_sess468_spost111@linklings.com
SUMMARY:Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precis
 ion
DESCRIPTION:ACM Student Research Competition, Poster\nStudent Program, Tec
 h Program Reg Pass, ACM Student Research Competition\n\nAccelerating 2D FF
 T: Exploit GPU Tensor Cores through Mixed-Precision\n\nCheng, Sorna\n\nThe
  two-dimensional Fourier Transform is a widely-used computational kernel i
 n many HPC applications. The popular NVIDIA cuFFT library provides a simpl
 e interface to compute 2D FFT on GPUs, but it has yet to utilize recent ha
 rdware advances in half-precision floating-point arithmetic. In this poste
 r, we propose a mixed-precision method to accelerate 2D FFT by exploi
 ting the FP16 matrix-multiply-and-accumulate units on the newest GPU archi
 tecture, known as tensor cores. We achieve a balance between speed and acc
 uracy by dynamically splitting the single-precision input data into two ha
 lf-precision operands and performing FFT separately. We present a CUDA-bas
 ed implementation that achieves three more digits of accuracy than half-pr
 ecision cuFFT. We also demonstrate the stability and scalability of our ap
 proach a
 nd conclude that it attains high accuracy with tolerable splitting overhea
 d.
URL:https://sc18.supercomputing.org/presentation/?id=spost111&sess=sess468
END:VEVENT
END:VCALENDAR

