I tried your tool (DP18) on a V100:
[2020-06-22.11:35:35] [Info] DP: 0 TP: 0 853.74 Mpt/s (64 iter/s)
[2020-06-22.11:35:37] [Info] Verifying 40336 results
[2020-06-22.11:35:45] [Info] DP: 0 TP: 0 992.50 Mpt/s (75 iter/s)
[2020-06-22.11:35:48] [Info] Verifying 40362 results
[2020-06-22.11:35:55] [Info] DP: 0 TP: 0 991.18 Mpt/s (75 iter/s)
I ran Kangaroo with a server too, configured the same way. However, it is not clear how many kangaroos your program runs in parallel or which grid setting it uses.
GPU: GPU #0 Tesla V100-PCIE-16GB (80x64 cores) Grid(160x128) (207.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^21.32 kangaroos [11.2s]
[2000.07 MK/s][GPU 2000.07 MK/s][Count 2^37.48][01:52][Server OK]
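For reference, the 2^21.32 figure in the Kangaroo output can be reproduced from the grid line. This is only a sketch: it assumes each GPU thread walks a fixed group of 128 kangaroos (`GROUP_SIZE` is my assumption, not something the log prints), so the herd size is grid X times grid Y times the group size.

```python
import math

GRID_X, GRID_Y = 160, 128   # from the "Grid(160x128)" line in the log
GROUP_SIZE = 128            # assumption: kangaroos handled per GPU thread

def herd_size(grid_x, grid_y, group_size=GROUP_SIZE):
    """Total kangaroos walked in parallel for a given grid."""
    return grid_x * grid_y * group_size

n = herd_size(GRID_X, GRID_Y)
print(f"{n:,} kangaroos = 2^{math.log2(n):.2f}")
# prints: 2,621,440 kangaroos = 2^21.32
```

Under that assumption the result matches the "2^21.32 kangaroos" line exactly.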
It reports exactly how many kangaroos are running in parallel (the "Generating ... starting points" line), 58,395,776 in this example:
eclambda --name testjob85 --gpu-mem-usage 0.9 --device 2
______ ______ __ ___ __ ___ ____ ____ ___
/ ____// ____/ / / / | / |/ // __ ) / __ \ / |
/ __/ / / / / / /| | / /|_/ // __ |/ / / // /| |
/ /___ / /___ / /___ / ___ | / / / // /_/ // /_/ // ___ |
/_____ / \____/ /_____//_/ |_|/_/ /_//_____//_____//_/ |_|
EC LAMBDA CLIENT
VERSION 1.1.1 ALPHA
[2020-06-22.16:26:33] [Info] Connecting to 127.0.0.1
[2020-06-22.16:26:34] [Info] Target public key:
[2020-06-22.16:26:34] [Info] X:F1367CC260779F7EA6C7E4B7258A4D31A4C41D6282C5200571CE10E748A4AADE
[2020-06-22.16:26:34] [Info] Y:0743F0CA057C7F39A9D9A20D4A93555B19F712920EEEF2F267466A2F3D08662E
[2020-06-22.16:26:34] [Info] Distinguisher: 24 bits
[2020-06-22.16:26:34] [Info] Sending results to server every 10 minutes
[2020-06-22.16:26:34] [Info] Initializing GeForce RTX 2080 SUPER
[2020-06-22.16:26:34] [Info] Compiling OpenCL kernels...
[2020-06-22.16:26:34] [Info] Initializing...
[2020-06-22.16:27:09] [Info] Generating 58,395,776 starting points (7184.1MB)
[2020-06-22.16:27:37] [Info] 10.0%
[2020-06-22.16:27:42] [Info] 20.0%
[2020-06-22.16:27:48] [Info] 30.0%
[2020-06-22.16:27:50] [Info] 40.0%
[2020-06-22.16:27:50] [Info] 50.0%
[2020-06-22.16:27:50] [Info] 60.0%
[2020-06-22.16:27:51] [Info] 70.0%
[2020-06-22.16:27:51] [Info] 80.0%
[2020-06-22.16:27:52] [Info] 90.0%
[2020-06-22.16:27:52] [Info] 100.0%
[2020-06-22.16:27:54] [Info] Refilling GPU cache (319.3MB)
[2020-06-22.16:27:54] [Info] 10.0%
[2020-06-22.16:27:54] [Info] 20.0%
[2020-06-22.16:27:55] [Info] 30.0%
[2020-06-22.16:27:55] [Info] 40.0%
[2020-06-22.16:27:55] [Info] 50.0%
[2020-06-22.16:27:55] [Info] 60.0%
[2020-06-22.16:27:55] [Info] 70.0%
[2020-06-22.16:27:55] [Info] 80.0%
[2020-06-22.16:27:55] [Info] 90.0%
[2020-06-22.16:27:55] [Info] 100.0%
[2020-06-22.16:27:55] [Info] Tuning started
[2020-06-22.16:27:55] [Info] Results collection thread started
[2020-06-22.16:28:05] [Info] DP: 0 TP: 0 587.62 Mpt/s (10 iter/s)
[2020-06-22.16:28:15] [Info] DP: 0 TP: 0 1212.69 Mpt/s (20 iter/s)
[2020-06-22.16:28:25] [Info] DP: 0 TP: 0 1170.13 Mpt/s (20 iter/s)
[2020-06-22.16:28:28] [Info] Tuning complete
[2020-06-22.16:28:35] [Info] DP: 0 TP: 0 1187.71 Mpt/s (20 iter/s)
[2020-06-22.16:28:40] [Info] Verifying 2785 results
[2020-06-22.16:28:45] [Info] DP: 0 TP: 0 1325.58 Mpt/s (22 iter/s)
[2020-06-22.16:28:55] [Info] DP: 0 TP: 0 1322.54 Mpt/s (22 iter/s)
[2020-06-22.16:29:05] [Info] DP: 0 TP: 0 1315.67 Mpt/s (22 iter/s)
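If the "Generating ... starting points" line has scrolled away, the status lines alone give a rough estimate of the walker count. This is a sketch, assuming one iteration advances every point by exactly one step (the log does not confirm this), so points per iteration is roughly Mpt/s divided by iter/s. The function name is made up for illustration.

```python
def points_per_iteration(mpts_per_s, iters_per_s):
    """Estimate how many points are stepped in one iteration.

    Assumes every point advances once per iteration; iter/s in the
    log is a whole number, so expect an error of a few percent.
    """
    return mpts_per_s * 1e6 / iters_per_s

reported = 58_395_776                        # "Generating ... starting points"
rtx = points_per_iteration(1322.54, 22)      # RTX 2080 SUPER status line
v100 = points_per_iteration(992.50, 75)      # V100 status line from the first run

print(f"RTX 2080 SUPER: ~{rtx / 1e6:.1f}M (log reports {reported / 1e6:.1f}M)")
print(f"V100:           ~{v100 / 1e6:.1f}M")
```

The RTX estimate lands within a few percent of the reported 58,395,776, which suggests the V100 run above was walking somewhere around 13 million points.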
The grid size is tuned automatically, so I am not sure it is even useful to display it.
Increasing --gpu-mem-usage improves performance. The default is low so that the tool does not time out or crash for people testing it on a GPU that also drives a display.
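To get a feel for how memory relates to the number of points, the two figures on the "Generating" line can be divided. A rough sketch with two assumptions of mine: "MB" in the log means 2^20 bytes, and memory scales linearly with the number of starting points. `points_for_budget` is a hypothetical helper, not part of the tool.

```python
MIB = 2**20  # assumption: the log's "MB" means 2^20 bytes

points = 58_395_776   # from "Generating 58,395,776 starting points"
alloc_mb = 7184.1     # from "(7184.1MB)" on the same line

bytes_per_point = alloc_mb * MIB / points

def points_for_budget(budget_mb, per_point=bytes_per_point):
    """Points that fit in budget_mb, assuming linear scaling."""
    return int(budget_mb * MIB / per_point)

print(f"{bytes_per_point:.1f} bytes per starting point")
print(f"{points_for_budget(alloc_mb / 2):,} points at half the memory")
```

That works out to roughly 129 bytes per starting point, so halving the memory budget would roughly halve the number of points walked in parallel.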