BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:C146
DTSTART;TZID=America/Chicago:20181114T133000
DTEND;TZID=America/Chicago:20181114T140000
UID:submissions.supercomputing.org_SC18_sess216_pap392@linklings.com
SUMMARY:Lessons Learned from Memory Errors Observed Over the Lifetime of C
 ielo
DESCRIPTION:Paper\nPerformance, Resiliency, Tools, Tech Program Reg Pass\n
 \nLessons Learned from Memory Errors Observed Over the Lifetime of Cielo\n
 \nLevy, Ferreira, DeBardeleben, Siddiqua, Sridharan...\n\nMaintaining the 
 performance of high-performance computing (HPC) applications as failures i
 ncrease is a major challenge for next-generation extreme-scale systems. Re
 cent research demonstrates that hardware failures are expected to become m
 ore common due to increased component counts, reduced device-feature sizes
 , and tightly constrained power budgets. Few existing studies, however, ha
 ve examined failures in the context of the entire lifetime of a single pla
 tform. In this paper, we analyze failure data collected over the entire li
 fetime of Cielo, a leadership-class HPC system. Our analysis reveals sever
 al key findings, including: (i) Cielo’s memory (DRAM and SRAM) exhibited n
 o discernible aging effects; (ii) correctable memory faults are not predic
 tive of future uncorrectable memory faults; (iii) developing more comprehe
 nsive logging facilities will improve failure analysis on future machines;
  and (iv) continued advances will be required to ensure current failure mi
 tigation techniques remain a viable option for future platforms.
URL:https://sc18.supercomputing.org/presentation/?id=pap392&sess=sess216
END:VEVENT
END:VCALENDAR

