BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160904Z
LOCATION:C2/3/4 Ballroom
DTSTART;TZID=America/Chicago:20181113T083000
DTEND;TZID=America/Chicago:20181113T170000
UID:submissions.supercomputing.org_SC18_sess325_spost111@linklings.com
SUMMARY:Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precis
 ion
DESCRIPTION:ACM Student Research Competition\, Poster\nTech Program Reg
  Pass\, Exhibits Reg Pass\n\nAccelerating 2D FFT: Exploit GPU Tensor
  Cores through Mixed-Precision\n\nCheng\, Sorna\n\nThe two-dimensional
  Fourier Transform is a widely used computational kernel in many HPC
  applications. The popular NVIDIA cuFFT library provides a simple
  interface to compute 2D FFT on GPUs\, but it has yet to utilize recent
  hardware advances in half-precision floating-point arithmetic. In this
  poster\, we propose a mixed-precision method to accelerate 2D FFT by
  exploiting tensor cores\, the FP16 matrix-multiply-and-accumulate units
  on the newest GPU architecture. We achieve a balance between speed and
  accuracy by dynamically splitting the single-precision input data into
  two half-precision operands and performing the FFT on each separately.
  We present a CUDA-based implementation that achieves three more digits
  of accuracy than half-precision cuFFT. We also demonstrate the
  stability and scalability of our approach and conclude that it attains
  high accuracy with tolerable splitting overhead.
URL:https://sc18.supercomputing.org/presentation/?id=spost111&sess=sess325
END:VEVENT
END:VCALENDAR

