OBET: On-the-fly byte-level error tracking for correcting and detecting faults in unreliable DRAM systems

Duy Thanh Nguyen, Nhut Minh Ho, Weng Fai Wong, Ik Joon Chang

Research output: Contribution to journal › Article › peer-review

Abstract

With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. On-die error correction codes (ECC) have therefore been introduced in DDR5 to address reliability issues. However, this solution still incurs high overhead when a large DRAM capacity is used to deliver high performance. To address this problem, we present a DRAM chip architecture that tracks DRAM cell errors at byte granularity. The proposed architecture classifies DRAM faults as temporary or permanent, requires no additional pins, and needs only minor DRAM chip modifications. As a result, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). We can thus significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages, while reducing the system failure probability by a factor of up to 5000 to 7000 relative to conventional operation.
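The idea of scrubbing only the pages that contain reported faults can be illustrated with a small sketch. This is not the architecture from the article. It is a minimal model that assumes a hypothetical OS-side tracker (`FaultTracker`), a 4 KiB page size, and a `report_fault` call that receives faulty byte addresses. All of these names and values are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed OS page size in bytes (illustrative only)


class FaultTracker:
    """Hypothetical OS-side record of pages containing reported faulty bytes."""

    def __init__(self):
        self.faulty_pages = set()

    def report_fault(self, byte_addr):
        # Map the faulty byte address to the page that contains it.
        self.faulty_pages.add(byte_addr // PAGE_SIZE)

    def pages_to_scrub(self):
        # Only pages with at least one reported fault are scrubbed.
        return sorted(self.faulty_pages)


def scrub_fraction(tracker, total_pages):
    """Fraction of all pages visited by a selective scrub pass."""
    return len(tracker.faulty_pages) / total_pages


tracker = FaultTracker()
for addr in (0, 100, 8192):  # two faults in page 0, one in page 2
    tracker.report_fault(addr)

print(tracker.pages_to_scrub())
print(scrub_fraction(tracker, total_pages=1024))
```

In this toy model, a selective pass visits two pages out of 1024, whereas a conventional pass would visit all of them. The figures carry no relation to the results reported in the article.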

Original language: English
Article number: 8271
Journal: Sensors
Volume: 21
Issue number: 24
Publication status: Published - 1 Dec 2021

Keywords

  • Availability
  • DDR5
  • DRAM chips
  • Debugging
  • Error correction codes
  • Failure analysis
  • Fault diagnosis
  • Memory architecture
  • Memory management
  • On-die ECC
  • Semiconductor device reliability
  • Semiconductor device testing
