...
Download and unpack the files into the root directory

    wget https://s3-us-west-1.amazonaws.com/exxact-support/Test+Folder/Stand_Alone_Validation_v3.1.tar.gz
    tar -xvzf Stand_Alone_Validation_v3.1.tar.gz
Change directory to the unpacked folder

    cd Stand_Alone_Validation
Set the number of GPUs and test cycles by editing the 'run_test.x' file

    nano run_test.x

    #How many GPUs in node
    gpu_count=4

    #How many tests to run of each type
    #Large test requires 5GB memory
    #Xlarge test requires 11GB memory
    small_test_count=20
    large_test_count=10
    xlarge_test_count=5
Note: Test duration varies with the GPUs being used. If a smaller GPU is installed only for display, remove it and run the test from a text-only terminal or over SSH.
Save the changes by pressing 'ctrl+x', answering 'y' to the prompt, and pressing Enter to confirm the file name. I typically set 5/5/2 tests (small/large/xlarge); the default cycle counts are meant for overnight or other long-duration testing.
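If you would rather not open an editor, the same change can probably be scripted. The sketch below is not part of the validation package: `set_test_counts` is a hypothetical helper, and it assumes each setting sits at the start of its own line in `name=value` form, as in the listing above. It keeps a backup copy as 'run_test.x.bak'.

```shell
# Hypothetical helper (not shipped with the test suite): rewrite the four
# count settings in place. Assumes unindented name=value lines.
set_test_counts() {
    local file="$1" gpus="$2" small="$3" large="$4" xlarge="$5"
    # -i.bak edits in place and leaves the original as "$file.bak"
    sed -i.bak \
        -e "s/^gpu_count=.*/gpu_count=${gpus}/" \
        -e "s/^small_test_count=.*/small_test_count=${small}/" \
        -e "s/^large_test_count=.*/large_test_count=${large}/" \
        -e "s/^xlarge_test_count=.*/xlarge_test_count=${xlarge}/" \
        "$file"
}
```

For the 5/5/2 setting on a 4-GPU node this would be called as `set_test_counts run_test.x 4 5 5 2`. Check the file with `grep _count= run_test.x` afterwards, since lines that do not match the assumed format are left unchanged.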
Run the test in the background:

    nohup ./run_test.x &
Monitor GPU temperatures by opening another terminal and running 'nvidia-smi -l'. Once the 'standalone-test.bin' process no longer appears in the 'nvidia-smi' output, check the logs to confirm that the number of cycles you set has completed.
    exx@ubuntu:~/Stand_Alone_Validation$ nvidia-smi -l
    Tue Jan 15 17:35:14 2019
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1080    On   | 00000000:05:00.0  On |                  N/A |
    | 78%   86C    P2   149W / 180W |   4767MiB /  8118MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 1080    On   | 00000000:06:00.0 Off |                  N/A |
    | 77%   86C    P2   155W / 180W |   4569MiB /  8119MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  GeForce GTX 1080    On   | 00000000:09:00.0 Off |                  N/A |
    | 72%   86C    P2   124W / 180W |   4569MiB /  8119MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  GeForce GTX 1080    On   | 00000000:0A:00.0 Off |                  N/A |
    | 59%   83C    P2   134W / 180W |   4569MiB /  8119MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1910      G   /usr/lib/xorg/Xorg                           157MiB |
    |    0      2889      G   compiz                                        40MiB |
    |    0      5848      C   ../standalone-test.bin                      4557MiB |
    |    1      5849      C   ../standalone-test.bin                      4557MiB |
    |    2      5850      C   ../standalone-test.bin                      4557MiB |
    |    3      5851      C   ../standalone-test.bin                      4557MiB |
    +-----------------------------------------------------------------------------+
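Rather than watching the output by hand, the check for 'standalone-test.bin' can be scripted. This is a rough sketch, not something shipped with the test suite: `count_test_procs` and `wait_for_tests` are hypothetical names, and the loop simply counts lines of plain `nvidia-smi` output that mention the binary. Note that a missing or failing `nvidia-smi` also yields a count of zero, so it looks the same as "finished".

```shell
# Hypothetical helpers (names are illustrative only).

# Count lines on stdin that mention the test binary.
# grep -c prints 0 and returns non-zero on no match, hence "|| true".
count_test_procs() {
    grep -c 'standalone-test\.bin' || true
}

# Poll nvidia-smi until no test process is listed.
# $1 = seconds between polls (defaults to 60).
wait_for_tests() {
    local interval="${1:-60}"
    while [ "$(nvidia-smi | count_test_procs)" -gt 0 ]; do
        sleep "$interval"
    done
    echo "No standalone-test.bin process listed; check the logs."
}
```

Running `wait_for_tests 300` in a second terminal would poll every five minutes and print a message once the process list is clear.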
I have not yet measured how long an individual small, large, or xlarge cycle takes. My estimate is that a 5/5/2 run completes in 6-8 hours.
Checking results
View the output logs in the 'Stand_Alone_Validation' directory and make sure the results match for every cycle. In this example, I ran only 5 small tests on 4 GPUs. The large and xlarge tests write their own files for each GPU_x.
Example:
    exx@ubuntu:~/Stand_Alone_Validation$ ls
    clean.x    GPU_1.log  GPU_3.log  lib      nohup.out     output_files_large  run_test.x           standalone-test_v3.bin
    GPU_0.log  GPU_2.log  input      LICENSE  output_files  README              standalone-test.bin  standalone-test_v3_p2p.bin
    exx@ubuntu:~/Stand_Alone_Validation$ cat *.log
    0.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    0.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    0.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    0.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    0.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    1.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    1.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    1.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    1.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    1.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    2.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    2.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    2.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    2.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    2.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    3.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    3.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    3.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    3.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
    3.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
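Comparing the energies by eye gets tedious with more cycles, so the check can be sketched as a small script. This is an illustration only: `check_gpu_log` is a hypothetical helper, and it assumes each log holds one result per line in the 'GPU.cycle: Etot = ... EKtot = ... EPtot = ...' form shown above. It only checks that every line within one log carries identical values after the prefix.

```shell
# Hypothetical helper: succeed only if every result line in the given log
# is identical once the leading "GPU.cycle:" label is removed.
check_gpu_log() {
    awk '
        { sub(/^[^:]*:[ \t]*/, "") }            # strip the "0.3:" style prefix
        NF == 0 { next }                        # skip blank lines
        !seen { first = $0; seen = 1; next }    # remember the first result
        $0 != first { mismatch = 1 }            # any difference is a failure
        END { if (!seen || mismatch) exit 1 }   # an empty log also fails
    ' "$1"
}
```

A loop such as `for f in GPU_*.log; do check_gpu_log "$f" && echo "$f: consistent" || echo "$f: MISMATCH"; done` would then report each GPU in turn.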
...