[INTERNAL USE]

HOW TO INSTALL TOOL

Tool File Name:
Hopper (H100 GPU): 629-24287-XXXX-FLD-38780.tgz
Ampere (A100 GPU): 629-22687-XX86-FLD-38225.tgz
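
As a rough illustration of unpacking one of these archives, the sketch below defines a small helper. The function name extract_diag is hypothetical, and the default destination /var/diags is an assumption taken from the directory listing on this page. It is not the vendor's documented install procedure.

```shell
# Hypothetical helper (not part of the vendor tool): unpack a fieldiag
# archive into a destination directory.
# Assumption: /var/diags is the default destination, as suggested by the
# directory listing on this page.
extract_diag() {
    archive="$1"
    dest="${2:-/var/diags}"

    # Refuse to continue if the archive path does not point at a file.
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    # Create the destination if needed, then extract the gzip-compressed tarball.
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest"
}
```

For example, `extract_diag 629-22687-XX86-FLD-38225.tgz` would extract the A100 archive under /var/diags. This assumes the archive unpacks into a directory named after itself, which matches the path in the listing but is not confirmed here.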

...

root@rdlab:/var/diags/629-22687-XX86-FLD-38225# ll
total 243812
drwxr-xr-x 2 root root      4096 Feb 12 22:27 ./
drwxr-xr-x 3 root root      4096 Feb 12 22:27 ../
-rwxr-xr-x 1 exx  exx      31232 Sep 13 22:39 fieldiag.sh*
-rwxr-xr-x 1 exx  exx  238455527 Sep 13 22:39 hgxfieldiag.r3.100*
-r-xr-xr-x 1 exx  exx   11115992 Sep 13 22:39 nvflash*
-rwxr-xr-x 1 exx  exx       2719 Sep 13 22:39 README.txt*
-rwxr-xr-x 1 exx  exx       1650 Sep 13 22:39 relnotes.txt*
-rwxr-xr-x 1 exx  exx       1046 Sep 13 22:39 sku_hgx-a100-4-gpu_40g_aircooled.json*
-rwxr-xr-x 1 exx  exx       1046 Sep 13 22:39 sku_hgx-a100-4-gpu_40g_liquidcooled.json*
-rwxr-xr-x 1 exx  exx       1676 Sep 13 22:39 sku_hgx-a100-4-gpu_64g.json*
-rwxr-xr-x 1 exx  exx       1876 Sep 13 22:39 sku_hgx-a100-4-gpu_80g_aircooled.json*
-rwxr-xr-x 1 exx  exx       1798 Sep 13 22:39 sku_hgx-a100-4-gpu_80g_liquidcooled.json*
-rwxr-xr-x 1 exx  exx       1048 Sep 13 22:39 sku_hgx-a100-4-gpu_96g.json*
-rwxr-xr-x 1 exx  exx       8618 Sep 13 22:39 testargs_hgx-a100-4-gpu.json*

PROBLEM SITUATION

Supermicro provided this file to diagnose HGX H100 GPU issues. It is related to ZD-6179 / SMC CRM Case: SM2310022368.

...

Attachment: RMA28206-fabricmanager.log
Attachment: RMA28206-NVSwitch Detection..txt

FIELDIAG TOOL USAGE

Review the README.txt for details on usage and options.

...