EDAC (Error Detection and Correction)
CentOS: sudo yum install edac-utils
Ubuntu: sudo apt-get install edac-utils
Does not work in an Ubuntu live-boot environment (e.g. booted from USB)
edac-util
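-r, --report=TYPE - select the report type
-v, --verbose - verbose output
--report=simple - per-controller totals of corrected and uncorrected errors
--report=ue - Uncorrected Errors only
--report=ce - Corrected Errors only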
dmidecode -t memory
Memory Controller (mc) model: the memory controller as abstracted by EDAC. Each 'mc' device controls a set of DIMM memory modules. These modules are laid out in a table of Chip-Select Rows (csrowX) and Channels (chX).
There can be multiple csrows and multiple channels. Memory controllers allow for several csrows, with 8 csrows being a typical value.
Channel: each channel represents a DIMM module. Dual channels allow 128-bit data transfers between memory and the CPU. Some systems support more than two channels.
Csrow (Chip-Select Row): shows how the memory modules are assembled (single rank, dual rank, or more). The actual number of csrows depends on the electrical "loading" of a given motherboard, memory controller, and DIMM characteristics.
With single-rank DIMMs, a pair of DIMMs merges into one csrow. Typically you will see only csrow0 populated, while csrow1 is empty.
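The csrow and channel table described above can be inspected programmatically. The following is a minimal Python sketch, not part of edac-utils. It assumes the kernel exposes the legacy csrow layout, with directories named csrowX inside each mcX directory and per-channel files named chY_ce_count; on kernels without that legacy layout the table will simply come back empty. The function name csrow_channel_table is hypothetical.

```python
import re
from pathlib import Path


def csrow_channel_table(mc_dir):
    """Return {csrow_index: {channel_index: ce_count}} for one mcX directory.

    Assumes the legacy sysfs layout: mcX/csrowN/chM_ce_count.
    """
    table = {}
    for csrow in sorted(Path(mc_dir).glob("csrow*")):
        row_match = re.fullmatch(r"csrow(\d+)", csrow.name)
        if not row_match or not csrow.is_dir():
            continue
        channels = {}
        for counter in csrow.glob("ch*_ce_count"):
            ch_match = re.fullmatch(r"ch(\d+)_ce_count", counter.name)
            if ch_match:
                # Each file holds a single decimal number.
                channels[int(ch_match.group(1))] = int(counter.read_text().strip())
        table[int(row_match.group(1))] = channels
    return table


if __name__ == "__main__":
    # Assumed location of the memory controller directories.
    mc_root = Path("/sys/devices/system/edac/mc")
    if mc_root.is_dir():
        for mc in sorted(mc_root.glob("mc[0-9]*")):
            print(mc.name, csrow_channel_table(mc))
```

A csrow that is missing from the result, or present with no channels, corresponds to an empty row in the table.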
[39436.383929] mce: [Hardware Error]: Machine check events logged
[39436.384016] EDAC skx MC3: HANDLING MCE MEMORY ERROR
[39436.384023] EDAC skx MC3: CPU 10: Machine Check Event: 0 Bank 18: 8c000040000800c2
[39436.384032] EDAC skx MC3: TSC 0
[39436.384035] EDAC skx MC3: ADDR 1378fe5380
[39436.384038] EDAC skx MC3: MISC 900200020000086
[39436.384042] EDAC skx MC3: PROCESSOR 0:50654 TIME 1531834011 SOCKET 1 APIC 20
[39436.384057] EDAC MC3: 1 CE memory scrubbing error on unknown memory (channel:2 slot:0 page:0x1378fe5 offset:0x380 grain:32 syndrome:0x0 - err_code:0008:00c2 socket:1 imc:1 rank:0 bg:2 ba:2 row:9431 col:238)
[105153.410099] mce: [Hardware Error]: Machine check events logged
[105153.410172] EDAC skx MC3: HANDLING MCE MEMORY ERROR
[105153.410178] EDAC skx MC3: CPU 10: Machine Check Event: 0 Bank 18: 8c000040000800c2
[105153.410183] EDAC skx MC3: TSC 0
[105153.410186] EDAC skx MC3: ADDR 1378fe5380
[105153.410189] EDAC skx MC3: MISC 900200020000086
[105153.410193] EDAC skx MC3: PROCESSOR 0:50654 TIME 1531899883 SOCKET 1 APIC 20
[105153.410208] EDAC MC3: 1 CE memory scrubbing error on unknown memory (channel:2 slot:0 page:0x1378fe5 offset:0x380 grain:32 syndrome:0x0 - err_code:0008:00c2 socket:1 imc:1 rank:0 bg:2 ba:2 row:9431 col:238)
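The "CE memory scrubbing error" line in the log above packs its details into key:value pairs inside the parentheses. A minimal Python sketch for pulling those fields apart is shown here; the function names are hypothetical, and the address reconstruction assumes 4 KiB pages (a shift of 12 bits), which is consistent with this log because page 0x1378fe5 and offset 0x380 combine into the reported ADDR 1378fe5380.

```python
import re

# The CE line copied from the log above (timestamp removed).
LINE = (
    "EDAC MC3: 1 CE memory scrubbing error on unknown memory "
    "(channel:2 slot:0 page:0x1378fe5 offset:0x380 grain:32 syndrome:0x0 - "
    "err_code:0008:00c2 socket:1 imc:1 rank:0 bg:2 ba:2 row:9431 col:238)"
)


def parse_edac_fields(line):
    """Return the key:value pairs found inside the parentheses as a dict of strings."""
    inside = line[line.index("(") + 1 : line.rindex(")")]
    return dict(re.findall(r"(\w+):(\S+)", inside))


def physical_address(fields, page_shift=12):
    """Combine page and offset into one address, assuming 4 KiB pages by default."""
    page = int(fields["page"], 16)
    offset = int(fields["offset"], 16)
    return (page << page_shift) | offset


if __name__ == "__main__":
    fields = parse_edac_fields(LINE)
    print(fields["channel"], fields["slot"], fields["rank"])
    print(hex(physical_address(fields)))  # 0x1378fe5380, same as the ADDR line
```

Both events in the log report the same address, channel, and slot, so the repeated errors point at the same memory location.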
Odd example: the default report with -v shows no errors, while --report=simple reports 1778948 correctable errors on mc3:
root:~$ edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc2: 0 Uncorrected Errors with no DIMM info
mc2: 0 Corrected Errors with no DIMM info
mc3: 0 Uncorrected Errors with no DIMM info
mc3: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.
root:~$ edac-util --report=simple
mc0: Correctable errors: 0
mc0: Uncorrectable errors: 0
mc1: Correctable errors: 0
mc1: Uncorrectable errors: 0
mc2: Correctable errors: 0
mc2: Uncorrectable errors: 0
mc3: Correctable errors: 1778948
mc3: Uncorrectable errors: 0
Total CE: 1778948
Total UE: 0
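Because the verbose default report can say "No errors to report" while the simple report shows a large count, it may be worth checking the simple report from a script. The following Python sketch parses text in the format shown above; the function names are hypothetical, and the line format is assumed from this one sample, so other edac-utils versions may print it differently.

```python
import re

# Output of `edac-util --report=simple`, copied from the example above.
SAMPLE = """\
mc0: Correctable errors: 0
mc0: Uncorrectable errors: 0
mc1: Correctable errors: 0
mc1: Uncorrectable errors: 0
mc2: Correctable errors: 0
mc2: Uncorrectable errors: 0
mc3: Correctable errors: 1778948
mc3: Uncorrectable errors: 0
Total CE: 1778948
Total UE: 0
"""

PATTERN = re.compile(
    r"^(mc\d+): (Correctable|Uncorrectable) errors:[ \t]*(\d+)[ \t]*$", re.MULTILINE
)


def parse_simple_report(text):
    """Return {"mcN": {"ce": int, "ue": int}} from simple-report text."""
    counts = {}
    for mc, kind, number in PATTERN.findall(text):
        key = "ce" if kind == "Correctable" else "ue"
        counts.setdefault(mc, {})[key] = int(number)
    return counts


def controllers_with_errors(counts):
    """List the controllers whose corrected or uncorrected count is non-zero."""
    return sorted(mc for mc, c in counts.items() if c.get("ce", 0) or c.get("ue", 0))


if __name__ == "__main__":
    report = parse_simple_report(SAMPLE)
    print(controllers_with_errors(report))  # ['mc3']
```

On a live system the text could come from running edac-util --report=simple through subprocess instead of the SAMPLE string.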
To check the counters yourself, the EDAC files are under the following path:
/sys/devices/system/edac/
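The raw counters under that path can be read directly. This is a minimal Python sketch under some assumptions: that the memory controllers appear as mc/mcX below the path above, and that each one has the counter files ce_count, ue_count, ce_noinfo_count, and ue_noinfo_count. Files that are missing are skipped rather than treated as errors. The function name read_mc_counters is hypothetical.

```python
from pathlib import Path

# Counter file names assumed to exist in each mcX directory.
COUNTERS = ("ce_count", "ue_count", "ce_noinfo_count", "ue_noinfo_count")


def read_mc_counters(edac_root="/sys/devices/system/edac"):
    """Return {"mcN": {counter_name: value}} for every controller found."""
    result = {}
    mc_root = Path(edac_root) / "mc"
    if not mc_root.is_dir():
        # EDAC driver not loaded, or a different layout: nothing to report.
        return result
    for mc in sorted(mc_root.glob("mc[0-9]*")):
        values = {}
        for name in COUNTERS:
            counter = mc / name
            if counter.is_file():
                values[name] = int(counter.read_text().strip())
        result[mc.name] = values
    return result


if __name__ == "__main__":
    for mc, values in read_mc_counters().items():
        print(mc, values)
```

Comparing ce_count with ce_noinfo_count for the same controller shows how many corrected errors were recorded in total versus how many were recorded with no DIMM information.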