Hardware failure details decision tree
Note: this is not formatted as a typical decision tree because it was built with basic (expand) macros in the pre-2020 Confluence editor.
Expand |
---|
title | System does not power on when pressing the power button |
---|
|
System does not power on when pressing the power button Expand |
---|
title | POWER SUPPLY - Are any LED lights displayed on the power supplies while the system is powered off? |
---|
| POWER SUPPLY - Are any LED lights displayed on the power supplies while the system is powered off? Expand |
---|
| Yes Expand |
---|
title | What color are they? |
---|
| What color are they? Expand |
---|
| Green Expand |
---|
title | Is it blinking or solid? |
---|
| Is it blinking or solid? Expand |
---|
title | Blinking - (system powered off) |
---|
| Blinking - (system powered off) - Power supply is working and is on standby
Blinking - (system powered on; TYAN systems) - Power supply is working and is on standby for redundancy
Expand |
---|
title | Does system power on yet after checking the power supplies? |
---|
| Does system power on yet after checking the power supplies? Expand |
---|
| (END) Yes - Move on to next troubleshooting tree if necessary
|
Expand |
---|
| No Expand |
---|
title | Does system power on after re-seating all Memory DIMM's? |
---|
| Does system power on after re-seating all Memory DIMM's? Expand |
---|
| (END) Yes - check topology in BIOS to make sure all installed memory is identified
|
Expand |
---|
| No Expand |
---|
title | CPU/Memory/Motherboard - Does it power on when system is brought down to 1x CPU and 1x Memory DIMM (on primary/first CPU/memory slot)? |
---|
| CPU/Memory/Motherboard - Does it power on when system is brought down to 1x CPU and 1x Memory DIMM (on primary/first CPU/memory slot)? - If they cannot perform this troubleshooting, they will need to ship this system back to Exxact for further troubleshooting; issue System RMA
- Marked in red because from this point the hardware diagnostics are more involved/invasive, and customers may damage internal components if they are not handled properly
- See other option to see if they can quickly check if the power button/ribbon cable is the root cause
Expand |
---|
title | Yes - swap CPU to see if issue persists; does system power on after swapping CPU? |
---|
| Yes - swap CPU to see if issue persists; does system power on after swapping CPU? Expand |
---|
title | (END) - Yes - Defective motherboard/slot; re-install memory and check topology in BIOS for CPU1 to make sure all installed memory is identified |
---|
| (END) - Yes - Defective motherboard/slot; re-install memory and check topology in BIOS for CPU1 to make sure all installed memory is identified - Could be a bad CPU pin/slot on the motherboard's secondary CPU slot
- Ask if they are okay with performing RMA on the chassis+motherboard (honestly if they got this far, I'm sure they can swap the barebone)
|
Expand |
---|
title | No - swap memory DIMM's; does system power on after swapping through all memory DIMM's that were uninstalled when CPU2 was removed? |
---|
| No - swap memory DIMM's; does system power on after swapping through all memory DIMM's that were uninstalled when CPU2 was removed? - Still could be bad memory, try another memory DIMM to see if issue persists
Expand |
---|
title | (END) Yes - Defective memory DIMM; issue component RMA for Memory |
---|
| (END) Yes - Defective memory DIMM; issue component RMA for Memory - try to have them re-create issue by re-installing suspected DIMM to see if system fails to power on/POST
- repopulate CPU2 and memory in pairs to ensure the rest of the memory DIMM's are allowing system to POST
- check topology in BIOS to make sure all installed memory is identified
|
Expand |
---|
title | (END) No - Defective CPU; issue component RMA for CPU |
---|
| (END) No - Defective CPU; issue component RMA for CPU - Most likely confirmed to be bad CPU since:
- CPU1 slot works
- Installing either of the CPU's into secondary CPU slot does not allow system to power on
- Have them swap the memory DIMM's that were previously installed for CPU2's row into CPU1's to see if all memory is working properly
- System should still be able to power on with 1x CPU and DIMM's but they may lose half the PCI-e slots on certain systems (typically older ones using 2011-v3/v4 CPU's)
|
|
|
Expand |
---|
| (END) No - Defective motherboard; issue System RMA for confirmation of issue and repairs - Could be bad primary CPU1 slot or bad motherboard entirely; issue System RMA for confirmation of issue and repairs
|
Expand |
---|
title | POWER BUTTON - Does pushing the power button fail to power on the system? |
---|
| POWER BUTTON - Does pushing the power button fail to power on the system when all of the board and PSU LED lights are on? - Marked in red because from this point the hardware diagnostics are more involved/invasive, and customers may damage internal components if they are not handled properly
Expand |
---|
title | Have you tried removing the ribbon cable to manually jump the power pins? |
---|
| Does system power on after removing the ribbon cable to manually jump the power pins? Expand |
---|
title | (END) Yes - Defective Power Button |
---|
| (END) Yes - Defective Power Button - Have them re-seat the ribbon cable and try again; we can try to RMA the power button assembly if:
- They agree to perform the labor
- The barebone makes it easily accessible that we can provide a short guide (usually we don't replace this, we would just have MFR send us the assembly)
- Suggest if they are okay in swapping barebone, or make a judgement call whether we should issue System RMA (do you trust them to perform the labor?)
- Make sure CPU/Memory all identified properly in BIOS
|
|
|
|
|
|
|
|
|
Expand |
---|
| (END) - Solid - it shouldn't be green and solid while system is powered off; power drain the system and see if issue persists
|
|
|
Expand |
---|
title | Amber - (system powered off, and all PSU's are amber) |
---|
| (END) - Amber - (system powered off, and all PSU's are amber) - older systems use Amber for standby while system is powered off; try power button and see if they turn to solid green LED
|
Expand |
---|
title | One is Green, the other(s) is off / Amber / yellow / different |
---|
| (END) - One is Green, the other(s) is off / Amber / yellow / different (Defective PSU) - most likely one of the PSU's is bad; try re-seating it and swapping locations with another PSU module. If it follows PSU, then PSU needs Component RMA. If it follows the slot/insert, then barebone needs Component RMA (or System RMA if we need to swap components for customer and re-validate hardware)
|
|
|
Expand |
---|
title | No - (all PSU's off/no lights) |
---|
| (END) - No - (all PSU's off/no lights) - try different power cables, check outlets, re-seat the PSU module; if no lights/activity, PDB (or barebone) needs Component RMA (or System RMA if we need to swap components for customer and re-validate hardware)
|
|
|
...
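The power supply branch of the tree above can be sketched as a small lookup, which may help when turning this page into a checklist or ticket form. This is a minimal sketch only: the `Led` enum, `psu_action`, and `swap_test` are hypothetical names made up for illustration, and the returned strings paraphrase the (END) notes from the tree rather than any official procedure.

```python
from enum import Enum


class Led(Enum):
    """LED states named in the tree (hypothetical enum, for illustration)."""
    OFF = "off"
    GREEN_BLINKING = "green blinking"
    GREEN_SOLID = "green solid"
    AMBER = "amber"


def psu_action(leds, system_powered_on=False):
    """Map the LED states of all PSU modules to the tree's suggested next step."""
    states = set(leds)
    if states == {Led.OFF}:
        return ("Try different power cables/outlets and re-seat the PSU; "
                "if still no lights, PDB (or barebone) needs RMA")
    if len(states) > 1:
        # One PSU differs from the others: most likely one defective PSU.
        return "Likely one defective PSU: re-seat it and swap slots with another module"
    if states == {Led.AMBER}:
        return "Older systems use amber for standby: press power button, expect solid green"
    if states == {Led.GREEN_SOLID} and not system_powered_on:
        return "Unexpected while powered off: power drain the system and re-check"
    if states == {Led.GREEN_BLINKING}:
        return "PSU is working and on standby: continue with the power-on checks"
    return "No action from this branch"


def swap_test(fault_follows_psu):
    """Interpret the result of swapping PSU modules between slots."""
    if fault_follows_psu:
        return "Component RMA for PSU"
    return "Component RMA for barebone (or System RMA)"


if __name__ == "__main__":
    print(psu_action([Led.GREEN_SOLID, Led.AMBER]))
    print(swap_test(fault_follows_psu=True))
```

The swap test is kept separate from the LED lookup because it depends on an observation made after the technician or customer has physically moved the module.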
Expand |
---|
title | System powers on when pressing the power button, and displays, but does not boot to OS |
---|
|
System powers on when pressing the power button, and displays, but does not boot to OS
Expand |
---|
title | Is system display stuck at splash screen? |
---|
| Is system display stuck at splash screen? Expand |
---|
title | Yes - does the system have any codes unrelated to display being pushed to offboard channel? (see notes below) |
---|
| Yes - does the system have any codes unrelated to display being pushed to offboard channel? (see notes below) - Check for any POST code displayed in addition to the board manufacturer logo
- Commonly observed codes
- Supermicro - 91 - display pushed to offboard channel
- Tyan - E3 - display pushed to offboard channel
- ASUS (on workstation motherboards or back of 2U chassis) - OS loaded properly
- B7/B9 (or other B codes) typically indicate a motherboard component (mostly memory) preventing the system from completing POST
- Refer to MFR manual for other uncommon codes
- The most important POST code is the one the system gets stuck at
Expand |
---|
title | Yes - please continue to see if system fails to POST due to CPU/Memory/Motherboard |
---|
| Yes - please continue to see if system fails to POST due to CPU/Memory/Motherboard Expand |
---|
title | Does system POST after re-seating all Memory DIMM's? |
---|
| Does system POST after re-seating all Memory DIMM's? Expand |
---|
| (END) Yes - check topology in BIOS to make sure all installed memory is identified
|
Expand |
---|
| No Expand |
---|
title | CPU/Memory/Motherboard - Does it POST when system is brought down to 1x CPU and 1x Memory DIMM (on primary/first CPU/memory slot)? |
---|
| CPU/Memory/Motherboard - Does it POST when system is brought down to 1x CPU and 1x Memory DIMM (on primary/first CPU/memory slot)? - If they cannot perform this troubleshooting, they will need to ship this system back to Exxact for further troubleshooting; issue System RMA
- Marked in red because from this point the hardware diagnostics are more involved/invasive, and customers may damage internal components if they are not handled properly
- See other option to see if they can quickly check if the power button/ribbon cable is the root cause
Expand |
---|
title | Yes - swap CPU to see if issue persists; does system power on after swapping CPU? |
---|
| Yes - swap CPU to see if issue persists; does system power on after swapping CPU? Expand |
---|
title | (END) - Yes - Defective motherboard/slot; re-install memory and check topology in BIOS for CPU1 to make sure all installed memory is identified |
---|
| (END) - Yes - Defective motherboard/slot; re-install memory and check topology in BIOS for CPU1 to make sure all installed memory is identified - Could be a bad CPU pin/slot on the motherboard's secondary CPU slot
- Ask if they are okay with performing RMA on the chassis+motherboard (honestly if they got this far, I'm sure they can swap the barebone)
|
Expand |
---|
title | No - swap memory DIMM's; does system power on after swapping through all memory DIMM's that were uninstalled when CPU2 was removed? |
---|
| No - swap memory DIMM's; does system power on after swapping through all memory DIMM's that were uninstalled when CPU2 was removed? - Still could be bad memory, try another memory DIMM to see if issue persists
Expand |
---|
title | (END) Yes - Defective memory DIMM; issue component RMA for Memory |
---|
| (END) Yes - Defective memory DIMM; issue component RMA for Memory - try to have them re-create issue by re-installing suspected DIMM to see if system fails to power on/POST
- repopulate CPU2 and memory in pairs to ensure the rest of the memory DIMM's are allowing system to POST
- check topology in BIOS to make sure all installed memory is identified
|
Expand |
---|
title | (END) No - Defective CPU; issue component RMA for CPU |
---|
| (END) No - Defective CPU; issue component RMA for CPU - Most likely confirmed to be bad CPU since:
- CPU1 slot works
- Installing either of the CPU's into secondary CPU slot does not allow system to power on
- Have them swap the memory DIMM's that were previously installed for CPU2's row into CPU1's to see if all memory is working properly
- System should still be able to power on with 1x CPU and DIMM's but they may lose half the PCI-e slots on certain systems (typically older ones using 2011-v3/v4 CPU's)
|
|
|
Expand |
---|
title | (END) No - Defective motherboard; issue System RMA for confirmation of issue and repairs |
---|
| (END) No - Defective motherboard; issue System RMA for confirmation of issue and repairs - Could be bad primary CPU1 slot or bad motherboard entirely; issue System RMA for confirmation of issue and repairs
|
|
|
|
|
Expand |
---|
title | (END) No - they are using ports meant for onboard display; make sure any display cables installed to the motherboard (onboard channel) are unplugged and reboot system |
---|
| (END) No - they are using ports meant for onboard display; make sure any display cables installed to the motherboard (onboard channel) are unplugged and reboot system |
|
Expand |
---|
title | No - can BIOS be accessed using the 'del' key while system powers on? |
---|
| No - can BIOS be accessed using the 'del' key while system powers on? Expand |
---|
title | Yes - have you checked boot priority to ensure all installed drives are identified, and that the boot priority is set correctly to use the disks containing the OS? |
---|
| Yes - does system boot to OS after verifying boot priority is set correctly to first scan the disks containing the OS? - make sure to check all installed drives are identified in the 'boot' tab in BIOS
Expand |
---|
title | No - does OS boot after physically re-seating the drives? |
---|
| No - does OS boot after physically re-seating the drives?
Expand |
---|
title | No - (if system was set to use offboard originally) can you get OS if you change primary display setting to 'onboard'? |
---|
| No - (if system was set to use offboard originally) can you get OS if you change primary display setting to 'onboard'? Expand |
---|
title | (END) Yes - boots to OS, see next tree related to OS issues |
---|
| (END) Yes - boots to OS, see next tree related to OS issues |
Expand |
---|
title | No - Does system boot up using another (separate) boot drive installed? |
---|
| No - Does system boot up using another (separate) boot drive installed? Expand |
---|
title | (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) |
---|
| (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) - Multiple boot drives set up as RAID1 - it is unlikely that both drives were corrupted at the same time unless the OS/kernel was altered
- escalate to management to have them review scenario or to quote options for drive/OS/SW
- Single boot drive - issue Component RMA
- we need to pre-load the OS/SW at a loss
|
Expand |
---|
title | (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
---|
| (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
|
|
|
|
Expand |
---|
title | No - are you receiving any activity lights on the keyboard while system powers on? |
---|
| No - are you seeing any activity lights on the keyboard while system powers on? - press the 'caps lock' or 'scroll lock' key to see if the keyboard LEDs (if applicable) react
Expand |
---|
title | Yes - can system reboot by using 'ctrl+alt+del'? |
---|
| Yes - can system reboot by using 'ctrl+alt+del'? Expand |
---|
title | Yes - (if stuck at a blank screen after splash screens load) can the OS/kernel selection screen be accessed by using 'up+down arrow keys' after splash screen passes during boot? |
---|
| Yes - (if stuck at a blank screen after splash screens load) can the OS/kernel selection screen be accessed by using 'up+down arrow keys' after splash screen passes during boot? - Proceed to OS issues tree if able to get to OS/kernel selection screen
Expand |
---|
title | (END) Yes - Proceed to OS issues tree if able to get to OS/kernel selection screen |
---|
| (END) Yes - Proceed to OS issues tree if able to get to OS/kernel selection screen |
Expand |
---|
title | No - Does system boot up using another (separate) boot drive installed? |
---|
| No - Does system boot up using another (separate) boot drive installed? Expand |
---|
title | (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) |
---|
| (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) - Multiple boot drives set up as RAID1 - it is unlikely that both drives were corrupted at the same time unless the OS/kernel was altered
- escalate to management to have them review scenario or to quote options for drive/OS/SW
- Single boot drive - issue Component RMA
- we need to pre-load the OS/SW at a loss
|
Expand |
---|
title | (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
---|
| (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
|
Expand |
---|
title | (END) No - Proceed to OS issues tree if able to get to OS/kernel selection screen |
---|
| (END) No - Proceed to OS issues tree if able to get to OS/kernel selection screen |
|
|
Expand |
---|
title | No - can you get to BIOS using another keyboard and/or USB port after restarting the system? |
---|
| No - can you get to BIOS using another keyboard and/or USB port after restarting the system? Expand |
---|
title | Yes - go back to "No - can BIOS be accessed using the 'del' key while system powers on?" |
---|
| Yes - go back to "No - can BIOS be accessed using the 'del' key while system powers on?" |
Expand |
---|
title | No - (see notes) reboot system and try to get BIOS or OS/kernel selection screen; can you access either? |
---|
| No - (see notes) reboot system and try to get BIOS or OS/kernel selection screen; can you access either? - If the OS boots to a certain point, and display driver or core packages are corrupted, you may lose all keyboard/mouse activity; rebooting and trying to boot from a different point/kernel may help proceed with troubleshooting
- (If cannot get to BIOS) Proceed to OS issues tree if able to get to OS/kernel selection screen
Expand |
---|
title | Yes - if BIOS, go back up to "No - can BIOS be accessed using the 'del' key while system powers on?" |
---|
| Yes - if BIOS, go back up to "No - can BIOS be accessed using the 'del' key while system powers on?" |
Expand |
---|
title | No - can you get to BIOS by removing all drives and then trying the 'del' key again? |
---|
| No - can you get to BIOS by removing all drives and then trying the 'del' key again? - Make sure system is powered off before removing/installing drives
Expand |
---|
title | Yes - are onboard/offboard settings correct? |
---|
| Yes - are onboard/offboard settings correct? - This can impact display for the OS, and possibly a sanity check to ensure they are using the correct display configuration to access OS
Expand |
---|
title | Yes - Does inserting the drive back in cause the same issue? |
---|
| Yes - Does inserting the drive back in cause the same issue? - This can impact display for the OS, and possibly a sanity check to ensure they are using the correct display configuration to access OS
Expand |
---|
title | Yes - Does system boot up using another (separate) boot drive installed? |
---|
| Yes - Does system boot up using another (separate) boot drive installed? Expand |
---|
title | (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) |
---|
| (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) - Multiple boot drives set up as RAID1 - it is unlikely that both drives were corrupted at the same time unless the OS/kernel was altered
- escalate to management to have them review scenario or to quote options for drive/OS/SW
- Single boot drive - issue Component RMA
- we need to pre-load the OS/SW at a loss
|
Expand |
---|
title | (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
---|
| (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
|
Expand |
---|
title | (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
---|
| (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
|
Expand |
---|
title | No - (if system was set to use offboard originally) can you get OS if you change primary display setting to 'onboard'? |
---|
| No - (if system was set to use offboard originally) can you get OS if you change primary display setting to 'onboard'? Expand |
---|
title | (END) Yes - boots to OS, see next tree related to OS issues |
---|
| (END) Yes - boots to OS, see next tree related to OS issues |
Expand |
---|
title | No - Does system boot up using another (separate) boot drive installed? |
---|
| No - Does system boot up using another (separate) boot drive installed? Expand |
---|
title | (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) |
---|
| (END) Yes - Corrupted boot drive/OS; possible Component RMA, or escalation (see notes) - Multiple boot drives set up as RAID1 - it is unlikely that both drives were corrupted at the same time unless the OS/kernel was altered
- escalate to management to have them review scenario or to quote options for drive/OS/SW
- Single boot drive - issue Component RMA
- we need to pre-load the OS/SW at a loss
|
Expand |
---|
title | (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
---|
| (END) No - (unlikely) Motherboard/SATA port issue with board; issue System RMA |
|
|
|
|
|
|
|
|
|
|
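The POST code notes in the tree above can be kept as a small lookup so that a reported code is interpreted the same way each time. This is a rough sketch under stated assumptions: `KNOWN_POST_CODES` and `interpret_post_code` are hypothetical names, only the codes actually listed on this page are included (the ASUS code is not given above, so it is left out), and treating B codes the same across all vendors is an assumption based on the note not naming a vendor.

```python
# Codes listed in the notes above; anything else should go to the MFR manual.
KNOWN_POST_CODES = {
    ("supermicro", "91"): "display pushed to offboard channel",
    ("tyan", "E3"): "display pushed to offboard channel",
}


def interpret_post_code(vendor, code):
    """Return the page's interpretation of the POST code the system is stuck at."""
    code = code.strip().upper()
    known = KNOWN_POST_CODES.get((vendor.strip().lower(), code))
    if known:
        return known
    if code.startswith("B"):
        # B7/B9 and other B codes: assumed here to apply regardless of vendor.
        return "typically a motherboard component (mostly memory) blocking POST"
    return "uncommon code: refer to the manufacturer manual"


if __name__ == "__main__":
    for vendor, code in [("Supermicro", "91"), ("Tyan", "e3"), ("ASUS", "B7")]:
        print(vendor, code, "->", interpret_post_code(vendor, code))
```

Only the code the system stays stuck at should be passed in, since codes that flash by during a normal POST are not meaningful on their own.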
Hardware failure details
Servers
Processor | | - Does not appear for one slot (if dual-processor)
- Different memory DIMM failing to appear each reboot
| - Bad seating
- Bent CPU pins on MB
|
Memory | | - Does not appear, or one reads differently/incorrect from the others
| |
Graphics Card | - lspci
- nvidia-smi
- device manager
| | - Bad seating
- Damaged GPU pins
|
Power Supply | - Check LED status lights for redundant PSU modules
- Re-seat redundant PSU
| - Depends on the barebone, but typically Amber or no LED indicates a failed PSU
| |
Power Distribution Board | | - Power supply LED indicator remains Amber or No-LED in a single slot
- BMC event log shows multiple PSU's as failing
| |
Motherboard | - Physical inspection of board
| - No LED when Power Supplies are plugged in
- Blown capacitors
| |
Chassis | - Does not power on when pressing power button
| - Does not power on when... pressing power button
| - Power Supply
- PDB
- Bent pins on MB on primary CPU slot
|
Drives | - Check boot options/priority
| - Does not appear on boot options/priority
- OS/RAID Controller identifies bad SMART status or failed drive
| - Bad seating
- Bad SATA backplane
- Improper SATA cabling/seating
|
DevBox
Processor | | - Does not appear for one slot (if dual-processor)
- Different memory DIMM failing to appear each reboot
| - Bad seating
- Bent CPU pins on MB
|
Memory | | - Does not appear, or one reads differently/incorrect from the others
| |
Graphics Card | - lspci
- nvidia-smi
- device manager
| |
|
Power Supply | - PSU LED (if applicable)
- No power LED on motherboard
| - Does not allow system to power on when chassis or motherboard power-on options are used
|
|
Motherboard | | - No power LED displaying on board when power supply is known-working and plugged in
|
|
Chassis | - Press power button on chassis
- Pressing/jumping power on motherboard
| - No power when chassis power button is pressed
|
|
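For the Graphics Card rows above, the `lspci` and `nvidia-smi` checks can be combined into one comparison: how many NVIDIA devices the PCI bus sees versus how many the driver lists. The following is a minimal sketch for Linux systems only, assuming `lspci` and `nvidia-smi -L` are available on the machine; the function names and the `expected` parameter are made up for illustration, and the text matching on `lspci` output is a simple heuristic rather than a complete parser.

```python
import shutil
import subprocess


def count_nvidia_pci(lspci_text):
    """Count lspci lines that look like NVIDIA display devices (heuristic)."""
    count = 0
    for line in lspci_text.splitlines():
        lower = line.lower()
        if "nvidia" in lower and ("vga" in lower or "3d controller" in lower):
            count += 1
    return count


def count_driver_gpus(nvidia_smi_list_text):
    """Count 'GPU N: ...' lines as printed by `nvidia-smi -L`."""
    return sum(1 for line in nvidia_smi_list_text.splitlines()
               if line.startswith("GPU "))


def run(cmd):
    """Return stdout of cmd, or '' if the tool is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        return ""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def gpu_report(expected):
    """Compare the expected GPU count with what the bus and the driver report."""
    on_bus = count_nvidia_pci(run(["lspci"]))
    in_driver = count_driver_gpus(run(["nvidia-smi", "-L"]))
    return {
        "expected": expected,
        "on_bus": on_bus,
        "in_driver": in_driver,
        "ok": on_bus == expected and in_driver == expected,
    }


if __name__ == "__main__":
    # 'expected' is whatever the build sheet says was installed (assumption).
    print(gpu_report(expected=4))
```

A card missing from the bus count points toward seating or hardware, in line with the causes listed in the table, while a card present on the bus but missing from the driver list is more likely a driver or OS issue.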