...
Download and unpack the files into the root directory
    wget https://s3-us-west-1.amazonaws.com/exxact-support/Test+Folder/Stand_Alone_Validation_v3.1.tar.gz
    tar -xvzf Stand_Alone_Validation_v3.1.tar.gz
Change directory to unpacked folder
    cd Stand_Alone_Validation
Set the number of GPUs and test cycles desired by editing the 'run_test.x' file
    nano run_test.x

    #How many GPUs in node
    gpu_count=4
    #How many tests to run of each type
    #Large test requires 5GB memory
    #Xlarge test requires 11GB memory
    small_test_count=20
    large_test_count=10
    xlarge_test_count=5
Note: Test duration varies depending on the GPUs being used. If a smaller GPU is installed specifically for display, remove that GPU and run the test from a text-only terminal or over SSH.
Save changes by pressing 'ctrl+x', answering 'y' to the prompt, and pressing Enter to confirm the file name. I typically like to set 5/5/2 tests; the default cycle counts are meant for overnight/long-duration testing.
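If you prefer not to open an editor, the same change can be scripted. The following is a minimal sketch, not part of the test package: `set_test_counts` is a hypothetical helper name, and it assumes each count is assigned at the start of its own line in 'run_test.x' (for example `small_test_count=20`). A backup copy is left next to the file with a `.bak` suffix.

```shell
# Hypothetical helper: set the small/large/xlarge test counts in place.
# Assumes assignments start at the beginning of a line, e.g. "small_test_count=20".
set_test_counts() {
    file=$1 small=$2 large=$3 xlarge=$4
    # -i.bak edits in place and keeps the original as "$file.bak"
    sed -i.bak \
        -e "s/^small_test_count=.*/small_test_count=$small/" \
        -e "s/^large_test_count=.*/large_test_count=$large/" \
        -e "s/^xlarge_test_count=.*/xlarge_test_count=$xlarge/" \
        "$file"
}
```

For example, `set_test_counts run_test.x 5 5 2` would apply the 5/5/2 setting. Other lines, such as `gpu_count`, are left untouched.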
Run test in the background by using
    nohup ./run_test.x &
Monitor GPU temps by opening another terminal and using
    nvidia-smi -l
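For unattended runs it can help to print only the GPUs that are running hot instead of watching the full display. The sketch below is an illustration under assumptions: `hot_gpus` is a hypothetical helper name, and the 85 degree limit in the usage line is an arbitrary example value, not a recommended threshold. It reads the two-column CSV that `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits` prints.

```shell
# Hypothetical helper: list GPUs at or above a temperature limit (degrees C).
# Reads "index, temperature" CSV rows on stdin, e.g. "0, 45".
hot_gpus() {
    limit=$1
    awk -F', *' -v limit="$limit" \
        '$2 + 0 >= limit + 0 { printf "GPU %s: %s C\n", $1, $2 }'
}
```

A possible one-off check is `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | hot_gpus 85`, which prints nothing when every GPU is below the limit.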
Checking results
View the output logs in the 'Stand_Alone_Validation' directory and make sure the results match for every cycle.
Example:
exx@ubuntu:~/Stand_Alone_Validation$ ls
clean.x GPU_1.log GPU_3.log lib nohup.out output_files_large run_test.x standalone-test_v3.bin
GPU_0.log GPU_2.log input LICENSE output_files README standalone-test.bin standalone-test_v3_p2p.bin
exx@ubuntu:~/Stand_Alone_Validation$ cat *.log
0.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
0.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
0.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
0.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
0.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
1.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
1.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
1.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
1.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
1.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
2.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
2.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
2.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
2.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
2.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
3.0: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
3.1: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
3.2: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
3.3: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
3.4: Etot = -58216.8663 EKtot = 14421.1768 EPtot = -72638.0430
As you can see above, each line starts with GPU.cycle (for example, 0.0 is GPU 0, cycle 0), followed by the Etot, EKtot and EPtot values. Here, 4 GPUs have each passed 5 cycles of the small test with matching results.
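With many cycles, comparing the values by eye gets tedious. The following sketch automates the comparison. It is an assumption-laden illustration rather than a tool shipped with the test: `check_logs` is a hypothetical name, it expects lines in the `GPU.cycle: Etot = ... EKtot = ... EPtot = ...` shape of the example output, and it assumes the input contains results from a single test size, since different test sizes would be expected to produce different energies.

```shell
# Hypothetical helper: report, per GPU, whether every cycle logged the same energies.
# Reads the named log files, or stdin when no files are given.
# Exits non-zero if any GPU has a cycle that differs from its first cycle.
check_logs() {
    awk '
        /^[0-9]+\.[0-9]+:/ {
            gpu = $1
            sub(/\..*/, "", gpu)             # keep the GPU index before the dot
            result = $0
            sub(/^[^:]*: */, "", result)     # drop the "GPU.cycle:" prefix
            if (!(gpu in first)) {           # first cycle seen for this GPU
                first[gpu] = result
                order[++n] = gpu
            } else if (result != first[gpu]) {
                bad[gpu]++
            }
            cycles[gpu]++
        }
        END {
            status = 0
            for (i = 1; i <= n; i++) {
                gpu = order[i]
                if (gpu in bad) {
                    printf "GPU %s: MISMATCH in %d of %d cycles\n", gpu, bad[gpu], cycles[gpu]
                    status = 1
                } else {
                    printf "GPU %s: OK (%d cycles match)\n", gpu, cycles[gpu]
                }
            }
            exit status
        }
    ' "$@"
}
```

Run from the 'Stand_Alone_Validation' directory, `check_logs GPU_*.log` would print one line per GPU. Each GPU is compared only against its own first cycle, so this does not check that different GPUs agree with each other.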