EDAC (Error Detection and Correction)

CentOS: sudo yum install edac-utils

Ubuntu: sudo apt-get install edac-utils

Does not work in an Ubuntu live-boot environment (e.g. booted from USB).

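edac-util options:

-v, --verbose - verbose output

-r, --report=TYPE - select which report to generate

--report=simple - per-controller totals of corrected and uncorrected errors

--report=ue - Uncorrected Errors only

--report=ce - Corrected Errors only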


To list the installed memory modules and their slot locators (run as root):

dmidecode -t memory


Memory Controller (mc) model: EDAC abstracts each memory controller as an 'mc' device. Each 'mc' device controls a set of DIMM memory modules, which are laid out in a table of Chip-Select Rows (csrowX) and Channels (chX).

There can be multiple csrows and multiple channels. Memory controllers allow for several csrows, with 8 csrows being a typical value.

Channel: each channel represents a DIMM module. Dual channels allow for 128-bit data transfers to the CPU from memory. Some systems support more than two channels.

Csrow (Chip-Select Row): shows how the memory modules are assembled (single rank, dual rank, or more). The actual number of csrows depends on the electrical "loading" of a given motherboard, memory controller, and DIMM characteristics.

For single-rank DIMM modules, a pair of DIMMs merges into one csrow. Typically you will see only csrow0 populated, while csrow1 is empty.
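
The mc / csrow / channel table described above can be inspected directly in sysfs. The following is a minimal Python sketch, not part of edac-utils. It assumes the legacy EDAC sysfs layout (mcX/csrowY/chZ_ce_count under /sys/devices/system/edac/mc), which may be absent on kernels built without legacy sysfs support. The function name edac_layout is made up for illustration.

```python
from pathlib import Path


def edac_layout(root="/sys/devices/system/edac/mc"):
    """Return {mc: {csrow: [channel, ...]}} from the legacy EDAC sysfs tree.

    Assumption: each populated channel of a csrow exposes a
    chN_ce_count file inside the csrowY directory.
    """
    layout = {}
    base = Path(root)
    if not base.is_dir():
        # No EDAC driver loaded, or a different sysfs layout.
        return layout
    for mc in sorted(base.glob("mc[0-9]*")):
        rows = {}
        for csrow in sorted(mc.glob("csrow[0-9]*")):
            suffix = "_ce_count"
            rows[csrow.name] = sorted(
                p.name[: -len(suffix)] for p in csrow.glob("ch[0-9]*" + suffix)
            )
        layout[mc.name] = rows
    return layout


# Print one line per csrow, e.g. "mc0 csrow0: ch0 ch1".
for mc_name, rows in edac_layout().items():
    for csrow_name, channels in rows.items():
        print(mc_name, csrow_name + ":", " ".join(channels) or "(no channels)")
```

On a machine without the legacy csrow directories the function simply returns an empty dictionary and nothing is printed.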


Example dmesg output for a corrected error (CE) on MC3, logged twice for the same address:

[39436.383929] mce: [Hardware Error]: Machine check events logged
[39436.384016] EDAC skx MC3: HANDLING MCE MEMORY ERROR
[39436.384023] EDAC skx MC3: CPU 10: Machine Check Event: 0 Bank 18: 8c000040000800c2
[39436.384032] EDAC skx MC3: TSC 0 
[39436.384035] EDAC skx MC3: ADDR 1378fe5380 
[39436.384038] EDAC skx MC3: MISC 900200020000086 
[39436.384042] EDAC skx MC3: PROCESSOR 0:50654 TIME 1531834011 SOCKET 1 APIC 20
[39436.384057] EDAC MC3: 1 CE memory scrubbing error on unknown memory (channel:2 slot:0 page:0x1378fe5 offset:0x380 grain:32 syndrome:0x0 - err_code:0008:00c2 socket:1 imc:1 rank:0 bg:2 ba:2 row:9431 col:238)
[105153.410099] mce: [Hardware Error]: Machine check events logged
[105153.410172] EDAC skx MC3: HANDLING MCE MEMORY ERROR
[105153.410178] EDAC skx MC3: CPU 10: Machine Check Event: 0 Bank 18: 8c000040000800c2
[105153.410183] EDAC skx MC3: TSC 0 
[105153.410186] EDAC skx MC3: ADDR 1378fe5380 
[105153.410189] EDAC skx MC3: MISC 900200020000086 
[105153.410193] EDAC skx MC3: PROCESSOR 0:50654 TIME 1531899883 SOCKET 1 APIC 20
[105153.410208] EDAC MC3: 1 CE memory scrubbing error on unknown memory (channel:2 slot:0 page:0x1378fe5 offset:0x380 grain:32 syndrome:0x0 - err_code:0008:00c2 socket:1 imc:1 rank:0 bg:2 ba:2 row:9431 col:238)
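
The summary line at the end of each event carries the useful location fields (channel, slot, page, offset, rank and so on). Below is a hedged Python sketch that pulls those key:value pairs out of such a line and rebuilds the physical address from page and offset. The regular expression is written against the log lines shown above only, so other drivers or kernel versions may format the line differently. The 4 KiB page size (shift by 12) is an assumption, and parse_edac_event is a made-up helper name.

```python
import re

# Matches lines such as:
#   EDAC MC3: 1 CE memory scrubbing error on unknown memory (channel:2 ...)
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) "
    r"(?P<msg>.*?) \((?P<details>.*)\)"
)


def parse_edac_event(line):
    """Return a dict describing one EDAC summary line, or None if no match."""
    m = EDAC_LINE.search(line)
    if m is None:
        return None
    # The details are space-separated key:value pairs.
    fields = dict(re.findall(r"(\w+):(\S+)", m.group("details")))
    event = {
        "mc": int(m.group("mc")),
        "count": int(m.group("count")),
        "kind": m.group("kind"),
        "message": m.group("msg"),
        "fields": fields,
    }
    if "page" in fields and "offset" in fields:
        # Assumption: 4 KiB pages, so address = page * 4096 + offset.
        event["address"] = (int(fields["page"], 0) << 12) | int(fields["offset"], 0)
    return event


sample = (
    "[39436.384057] EDAC MC3: 1 CE memory scrubbing error on unknown memory "
    "(channel:2 slot:0 page:0x1378fe5 offset:0x380 grain:32 syndrome:0x0 - "
    "err_code:0008:00c2 socket:1 imc:1 rank:0 bg:2 ba:2 row:9431 col:238)"
)
event = parse_edac_event(sample)
print(event["kind"], "on mc", event["mc"], "at", hex(event["address"]))
```

For the sample line the rebuilt address is 0x1378fe5380, which matches the ADDR value printed by the skx driver a few lines earlier in the same event.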

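Odd case where the default report with -v and the simple report disagree: the verbose output shows no errors, while the simple report counts 1778948 corrected errors on mc3.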

root:~$ edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc2: 0 Uncorrected Errors with no DIMM info
mc2: 0 Corrected Errors with no DIMM info
mc3: 0 Uncorrected Errors with no DIMM info
mc3: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.

root:~$ edac-util --report=simple
mc0: Correctable errors: 0
mc0: Uncorrectable errors: 0
mc1: Correctable errors: 0
mc1: Uncorrectable errors: 0
mc2: Correctable errors: 0
mc2: Uncorrectable errors: 0
mc3: Correctable errors: 1778948
mc3: Uncorrectable errors: 0
Total CE: 1778948
Total UE: 0
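
Because the simple report is line-oriented, it is easy to post-process, for example to flag the controllers with non-zero counts. The sketch below parses text in the format shown above. It uses the pasted output as its sample rather than calling edac-util, and parse_simple_report is a hypothetical helper name, not something shipped with edac-utils.

```python
import re

LINE = re.compile(r"(mc\d+): (Correctable|Uncorrectable) errors: (\d+)")


def parse_simple_report(text):
    """Return {mc: {"ce": int, "ue": int}} from `edac-util --report=simple` text."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skips the "Total CE/UE" lines and anything unexpected
        mc, kind, value = m.groups()
        key = "ce" if kind == "Correctable" else "ue"
        counts.setdefault(mc, {"ce": 0, "ue": 0})[key] = int(value)
    return counts


SAMPLE = """\
mc0: Correctable errors: 0
mc0: Uncorrectable errors: 0
mc1: Correctable errors: 0
mc1: Uncorrectable errors: 0
mc2: Correctable errors: 0
mc2: Uncorrectable errors: 0
mc3: Correctable errors: 1778948
mc3: Uncorrectable errors: 0
Total CE: 1778948
Total UE: 0
"""

counts = parse_simple_report(SAMPLE)
noisy = [mc for mc, c in counts.items() if c["ce"] or c["ue"]]
print("controllers with errors:", noisy)
```

To use it on a live system, the output of edac-util --report=simple can be captured (for example with subprocess.run) and passed to the same function.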

To check the counters yourself, the EDAC files are located under the following path:

/sys/devices/system/edac/
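
As a rough illustration of reading those files by hand, the Python sketch below collects the per-controller counters. It assumes the controllers appear as mc/mcX directories under that path, each with ce_count, ue_count, ce_noinfo_count and ue_noinfo_count files. Files that are missing are skipped, and read_mc_counters is a made-up name.

```python
from pathlib import Path

COUNTER_FILES = ("ce_count", "ue_count", "ce_noinfo_count", "ue_noinfo_count")


def read_mc_counters(root="/sys/devices/system/edac/mc"):
    """Return {mc: {counter_name: int}} for every mcX directory under root."""
    counters = {}
    base = Path(root)
    if not base.is_dir():
        return counters  # EDAC not loaded or path differs on this system
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in COUNTER_FILES:
            path = mc / name
            if path.is_file():
                entry[name] = int(path.read_text().strip())
        counters[mc.name] = entry
    return counters


for mc_name, entry in read_mc_counters().items():
    print(mc_name, entry)
```

Comparing these raw values with the edac-util reports can help when the default and simple reports disagree, as in the example above.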