Hey pbfarmer! Thank you for the elaborate tests and detailed descriptions, much appreciated!
When running at lower clocks around 1407, the 15+15 configuration has been clearly superior to 16+14 in our tests as well, so your results are well aligned with our own testing.
Replying to the noted concerns/feedback below:
Some sort of 'resource release' process at shutdown would be useful. It seemed if the miner was started too soon after it was stopped, the entire machine froze up. Also, in general, the crash behavior of this miner is much less forgiving than others - most crashes meant a full reboot.
Absolutely agree. We're sloppy at shutdown, and it will be addressed shortly. We need to add proper signal handlers that work on both Linux and Windows and catch the Ctrl-C/SIGHUP signals, then do a proper release of all OpenCL resources.
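To make the idea concrete, here is a minimal sketch of the pattern in Python rather than the miner's own code. The names `FakeResource`, `install_shutdown_handlers` and `run_until_stopped` are made up for illustration, and the fake resources stand in for OpenCL queues, buffers and contexts. The handlers only set a flag; the actual release happens in the main loop, in reverse order of creation.

```python
import signal
import threading


class FakeResource:
    """Hypothetical stand-in for an OpenCL object (context, queue, buffer)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def release(self):
        self.log.append(self.name)


def install_shutdown_handlers(stop_event):
    """Route Ctrl-C and similar signals to a single stop flag."""
    installed = []
    for name in ("SIGINT", "SIGTERM", "SIGHUP", "SIGBREAK"):
        # SIGHUP only exists on POSIX, SIGBREAK only on Windows.
        sig = getattr(signal, name, None)
        if sig is not None:
            signal.signal(sig, lambda signum, frame: stop_event.set())
            installed.append(name)
    return installed


def run_until_stopped(stop_event, resources, work=lambda: None, poll_seconds=0.1):
    """Do work until the stop flag is set, then release in reverse order."""
    try:
        while not stop_event.is_set():
            work()
            stop_event.wait(poll_seconds)
    finally:
        for res in reversed(resources):
            res.release()
```

In a real program you would call `install_shutdown_handlers(stop)` once from the main thread before entering `run_until_stopped(stop, resources, work=...)`. Keeping the handler down to setting a flag avoids doing driver calls from inside a signal handler.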
This may just be the cost of mining cnv2, but power transients are huge. On other miners, i saw regular 30-40W spikes from the median (w/ similar drops,) but for TR, i'm seeing 70W+ spikes, causing your mean and median to significantly diverge. Specifically, while the observed median (2 GPUs, excluding idle) was around 285-290W, regular spikes up to 360W+ resulted in a mean draw around 310W. Any way to get these down? I could see these causing stability issues or tripped circuit protections for some people.
This is a very good point, and we've noted it ourselves, both when designing the kernels and by direct at-the-wall measurements. The issue stems from this miner requiring less power in the long-running main part of CN compared to others, but it also goes full throttle in other parts, requiring more power. Hence, the min-to-max power swings are amplified at both ends. Any CN miner with the same profile as this one that executes the algo in the most straightforward way would exhibit this problem.
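As a rough back-of-the-envelope check on the figures quoted above, here is a small Python sketch. It assumes a deliberately crude two-level model (every sample is either at the baseline or at the spike level), and the trace is synthetic, not a measurement. `spike_fraction` is a made-up helper name.

```python
from statistics import mean, median


def spike_fraction(baseline_w, spike_w, mean_w):
    """Share of samples at spike_w needed to lift the mean from baseline_w to mean_w."""
    return (mean_w - baseline_w) / (spike_w - baseline_w)


# Synthetic trace (assumption): 7 samples at a 287 W baseline, 3 at a 360 W spike.
samples_w = [287] * 7 + [360] * 3

print("median:", median(samples_w))
print("mean:", mean(samples_w))
print("spike share for a 310 W mean:", round(spike_fraction(287, 360, 310), 3))
```

Under that simplified model, the median stays at the baseline while the mean lands near 309 W, and a 310 W mean would need the draw to sit at the spike level for roughly 30% of the samples. Real traces also contain drops below the median, so treat this only as an illustration of why the two numbers diverge.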
We're also seeing a notable difference in stability on the Vega 64s vs 56s. I believe this is part of the problem. Having fewer CUs means there will be additional pressure on 8-16 of the 56 CUs compared to the 64s as these spikes occur, though it depends a little on your CN config.
There are a few ways of addressing this, and if we want to achieve max stability I believe we need to solve it. I have a very good design for solving it, but it's a big redesign and rewrite. We will get there at some point. Meanwhile, we're debating simpler forms of reducing the effect of the spikes, like cutting the worst case scenario in half. We'll go to work shortly on this.
Any possibility of incorporating a simple HTTP/REST report mechanism in addition to the cgminer rpc api (like stak, cast, srb, jce.) It could just dump the current rpc api summary json, and it would be much more useful for quick setup/tuning, esp if you're only incorporating summary reports and not miner controls.
Yep, we're aware that the cgminer/sgminer api isn't really the CN standard, and we have some plans. We've worked hard to keep the miner itself free from any open source dependencies; we've written every single line of code from scratch. For example, that's why we're missing on-CPU verification for CN in these first versions: we refuse to steal any code from xmrig, xmr-stak or even the XMR wallet. We'd like to keep it clean from attributions.
Given the above, we won't pull in e.g. lighttpd as a dependency in the miner. My plan is rather to implement a separate project that we open source, a little http adapter in C++/node.js/python/whatever that converts the cgminer/sgminer api to an xmr-stak-like HTTP/REST api. It will also have the nice feature of working with any sgminer-derived miner, not just our miner. I don't get the point of these massive monolith miner implementations. It would be so much nicer if we had separated these different concerns a long time ago in the miner dev world, so everyone could focus on what really matters: the kernels and the mining process. You can also separate some of the watchdog aspects and place them outside of the miner. It's never really a good idea to run a watchdog thread inside the same process it's supposed to monitor.
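To give an idea of how small such an adapter could be, here is a rough Python sketch using only the standard library. It is not the planned project, just an illustration. It assumes the miner exposes the usual cgminer-style API: a TCP socket (4028 is the cgminer default) that takes a JSON command like {"command": "summary"} and answers with JSON terminated by a NUL byte. The `/summary` path, the listen port 8080 and the names `query_miner`, `make_handler` and `make_server` are all made up for the example.

```python
import json
import socket
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def query_miner(command, host="127.0.0.1", port=4028, timeout=5.0):
    """Send one JSON command to a cgminer-style API socket and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
            if data.endswith(b"\x00"):  # replies end with a NUL byte
                break
    raw = b"".join(chunks).rstrip(b"\x00")
    return json.loads(raw.decode("utf-8", errors="replace"))


def make_handler(miner_host, miner_port):
    """Build a request handler bound to one miner API endpoint."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/summary":  # hypothetical route, read-only
                self.send_error(404)
                return
            try:
                reply = query_miner("summary", miner_host, miner_port)
                status, body = 200, json.dumps(reply).encode("utf-8")
            except (OSError, ValueError) as exc:
                # Miner unreachable or reply not valid JSON.
                status, body = 502, json.dumps({"error": str(exc)}).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return Handler


def make_server(listen_port=8080, miner_host="127.0.0.1", miner_port=4028):
    """Create the HTTP front end; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("127.0.0.1", listen_port), make_handler(miner_host, miner_port))
```

Running `make_server().serve_forever()` next to the miner and fetching `http://127.0.0.1:8080/summary` would then dump the current summary as JSON. Since the adapter is a separate process that only reads from the API socket, it should work against any miner speaking the same protocol and cannot take the miner down with it.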
So, it's also on the TODO list, we'll see when we get there. Time is always the limiting factor. If anyone in the community wants to get involved, give me a ping.