I've noticed that when Windows blocks CastXMR from accessing a GPU because of aggressive overclocking, the software doesn't handle it very gracefully. The affected GPU stops reporting its hashrate every five seconds, and after a few minutes the software hangs entirely and multiple ^C are needed to terminate the run, assuming that the system doesn't freeze entirely and need a warm reset.
A useful feature would be for CastXMR to recognize that if it is accessing a list of GPUs, each of them should report results periodically, and if one of them goes missing for, say, twice the running average time between results, then it should be considered dead, and a clean restart of CastXMR should be initiated.
Or better yet, CastXMR could perform a disable/enable of the problem card and then reinitialize it individually, so that none of the other cards would have to be bothered.
With a Mining Expert 19-slot motherboard running a pile of aggressively overclocked Vegas, you're more likely to wedge one of the cards than usual, and being able to recover gracefully from that situation would substantially improve the average hashrate.
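To make the request above concrete, here is a minimal sketch of the "missing for twice the running average time between results" check. It is not CastXMR code: the class name `GpuWatchdog`, its methods, and the use of an exponential moving average as the "running average" are all assumptions made for illustration, and what happens once a card is flagged (restart the miner, or disable/enable just that card) is left to the caller.

```python
import time
from typing import Dict, List, Optional


class GpuWatchdog:
    """Hypothetical helper: flags a GPU as dead when it has been silent
    for more than `factor` times its running average report interval."""

    def __init__(self, gpu_ids: List[str], factor: float = 2.0,
                 smoothing: float = 0.2) -> None:
        self.factor = factor          # "twice the average" from the post
        self.smoothing = smoothing    # EMA weight (an assumption)
        self.last_seen: Dict[str, Optional[float]] = {g: None for g in gpu_ids}
        self.avg_interval: Dict[str, Optional[float]] = {g: None for g in gpu_ids}

    def report(self, gpu_id: str, now: Optional[float] = None) -> None:
        """Call whenever a GPU prints a result or hashrate line."""
        now = time.monotonic() if now is None else now
        last = self.last_seen[gpu_id]
        if last is not None:
            interval = now - last
            avg = self.avg_interval[gpu_id]
            if avg is None:
                self.avg_interval[gpu_id] = interval
            else:
                s = self.smoothing
                self.avg_interval[gpu_id] = (1 - s) * avg + s * interval
        self.last_seen[gpu_id] = now

    def dead_gpus(self, now: Optional[float] = None) -> List[str]:
        """Return GPUs that have been silent for too long."""
        now = time.monotonic() if now is None else now
        dead = []
        for gpu_id, last in self.last_seen.items():
            avg = self.avg_interval[gpu_id]
            if last is None or avg is None:
                continue  # not enough history to judge this card yet
            if now - last > self.factor * avg:
                dead.append(gpu_id)
        return dead
```

A caller would invoke `report()` for each per-GPU line and poll `dead_gpus()` every few seconds. A card that never reports at all is not flagged by this sketch, so a real implementation would also want a startup timeout.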
It's not really a problem with CAST-XMR now, is it? That is you choosing to overdrive the card and then demanding that the tool fix it for you. Maybe you should spend more time accepting a stable hash rate per card and live with it. It's really a driver issue anyway, and when overclocking you are taking the risk that you will put the system into an unrecoverable state.
I personally don't need a feature in CAST that disables the cards and I don't think that would be helpful for people who have configured their systems correctly. I reboot once a week for Windows updates; other than that I don't really look at the rigs.
If you need these band-aid scripts to monitor your hash rate because it drops or your cards hang, you have other problems that are not associated with the miner. I would rather the dev spend time improving the miner's efficiency than adding some lame monitoring function that is completely unnecessary.
Oh and by the way I use Wattman. YIKES!
Really? Unbelievable!
I'm using cast_xmr_vega with 6x VEGA56. Every day, and very often, I need to restart the miner or the PC. The Vegas are not modded.
All I do before running the miner is run devcon.exe to disable/enable the GPUs from the command line, and then OverdriveNTool with the overclock settings.
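For reference, the devcon step described above could be scripted roughly like this. It is only a sketch: the hardware ID pattern is an assumption for Vega cards and should be checked against the output of `devcon find` on the actual rig, the helper names and the pause length are made up for illustration, and devcon has to be run from an elevated prompt.

```python
import subprocess
import time
from typing import List

# Assumption: PCI vendor/device pattern for the cards to cycle.
# Verify it with `devcon find` before using it on a real rig.
VEGA_HWID = r"PCI\VEN_1002&DEV_687F*"


def devcon_cycle_commands(hwid: str, devcon: str = "devcon.exe") -> List[List[str]]:
    """Build the disable/enable command pair for one hardware ID pattern."""
    return [
        [devcon, "disable", hwid],
        [devcon, "enable", hwid],
    ]


def cycle_gpus(hwid: str = VEGA_HWID, devcon: str = "devcon.exe",
               pause: float = 5.0) -> None:
    """Disable and re-enable the matching devices, pausing after each step.

    The pause length is a guess; passing the arguments as a list avoids
    shell quoting problems with the '&' in the hardware ID.
    """
    for cmd in devcon_cycle_commands(hwid, devcon):
        subprocess.run(cmd, check=True)
        time.sleep(pause)
```

Calling `cycle_gpus()` before applying the overclock profile and starting the miner would reproduce the sequence from the post. The overclock step itself is left out here on purpose.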
Very often the output is paused... until S is pressed, and afterwards all the messages are displayed with OUTDATED results.
I never had a ban from the pool server before, but in the last 2 days it has happened 3 times! Sometimes it freezes after the 'Difficulty changed' message.
https://ibb.co/n9Ra4w
When I press Q, 'quitting..' is displayed and nothing happens.
I would suggest not using useless tools like OverdriveNTool. I know everyone raves about how awesome it is, but frankly I have never seen the need for it, even though everyone says Wattman is the worst piece of software in the world.
I would make sure you started with a completely clean install of Windows. I would make sure you set your virtual memory correctly. I would make sure you configured the Windows power settings correctly. Then I would make sure to cleanly install the blockchain drivers. I would not start OC'ing anything until you can stay up for 24 hours at the same hash rate with no issues. Also make sure to add an exclusion for the miner in Windows Defender. AKA EVERYTHING THAT HAS ALREADY BEEN POSTED IN THIS WHOLE THREAD!
Lastly, since you did not state what you were mining or what pool you were using, you could do that favor as well. Perhaps the pool, or your internet connection, or your drivers, or your ethernet, or whatever is not working correctly, or you have a riser problem? Really, man, can you troubleshoot or problem-solve? There are endless things that can contribute to your problems, and if you are having problems you need to stop mining and start testing!
There are so many things that are connected via hardware and controlled by software and you need to understand all that!
I have the same Vega rig as you do and guess what... they all run at about 1975h/s average. I never restart the miner except if I decide to change a pool or mine something else, and the last time I looked at that Vega 56 rig was last Sunday. I can see the output on the site I am mining to and the hash rate looks good! I am still getting paid!
So you can package up your rig and send it to me! I will take this problem child off your hands :>)
First, thank you for your comprehensive answer! Currently I am mining XMR and ETH on supportxtm.com, but I have the same issues. I think the internet connection is not the problem. This thread is very long, but OK, I read it all.
These are my current logs:
[Job: #41 | Difficulty: 439427 | Running: 16.0 sec | Avg Job Time: 47.4 sec]
[Hash Rate Avg: 11788.4 H/s]
1977.6 H/s GPU0
1960.8 H/s GPU1
1965.5 H/s GPU2
1970.5 H/s GPU3
1966.0 H/s GPU4
1947.1 H/s GPU5
[Shares Found: 66 | Avg Search Time: 29.5 sec]
63 ( 95%) Accepted
0 ( 0%) Rejected by pool
0 ( 0%) Invalid result computation failed
0 ( 0%) Could not be submit because of network error
3 ( 5%) Outdated because of job change
[13:55:53] GPU5 | 42°C | Fan 4447 RPM | 1952.8 H/s
[13:55:53] GPU3 | 46°C | Fan 4552 RPM | 1972.1 H/s
[13:55:55] GPU1 | 44°C | Fan 4441 RPM | 1968.5 H/s
[13:55:56] GPU2 | 45°C | Fan 4538 RPM | 1968.9 H/s
[13:55:57] GPU0 | 45°C | Fan 4495 RPM | 1979.7 H/s
[13:55:58] GPU4 | 43°C | Fan 4494 RPM | 1968.1 H/s
[13:55:58] GPU5 | 42°C | Fan 4451 RPM | 1951.7 H/s
[13:55:59] GPU3 | 45°C | Fan 4560 RPM | 1972.5 H/s
My second rig:
[Pool: 'etn.fairhash.org:7777' | Connected: 2018-01-05 19:06:58]
17:50:03 (100%) Online
0:00:00 ( 0%) Offline
[Job: #1217 | Difficulty: 250406 | Running: 19.0 sec | Avg Job Time: 52.8 sec]
[Hash Rate Avg: 5889.9 H/s]
1965.7 H/s GPU0
1955.6 H/s GPU1
1968.5 H/s GPU2
[Shares Found: 1632 | Avg Search Time: 40.8 sec]
1575 ( 97%) Accepted
0 ( 0%) Rejected by pool
0 ( 0%) Invalid result computation failed
0 ( 0%) Could not be submit because of network error
57 ( 3%) Outdated because of job change
[12:57:02] GPU2 | 56°C | Fan 4541 RPM | 1970.7 H/s
[12:57:04] GPU0 | 53°C | Fan 4482 RPM | 1963.8 H/s
[12:57:12] GPU1 | 57°C | Fan 4487 RPM | 1954.9 H/s
[12:57:12] GPU2 | 56°C | Fan 4535 RPM | 1970.7 H/s
[12:57:12] GPU0 | 53°C | Fan 4477 RPM | 1962.8 H/s
[12:57:12] GPU1 | 57°C | Fan 4486 RPM | 1954.6 H/s
[12:57:13] GPU2 | 56°C | Fan 4533 RPM | 1970.7 H/s
[12:57:15] GPU0 | 53°C | Fan 4473 RPM | 1962.4 H/s
[12:57:16] GPU1 | 57°C | Fan 4479 RPM | 1955.3 H/s
[12:57:18] GPU2 | 55°C | Fan 4527 RPM | 1967.4 H/s
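Since the complaint is about output that pauses, one way to check logs like the ones above is to parse the per-GPU status lines and measure how many seconds pass between consecutive reports from the same card. The sketch below assumes the line layout shown in these logs; the function names and the regular expression are illustrative and not part of CastXMR.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional

# Matches lines such as:
# [12:57:02] GPU2 | 56°C | Fan 4541 RPM | 1970.7 H/s
# The degree sign is matched loosely in case the log encoding mangles it.
STATUS_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\]\s+(GPU\d+)\s+\|\s+(\d+)\S*C\s+\|"
    r"\s+Fan\s+(\d+)\s+RPM\s+\|\s+([\d.]+)\s+H/s"
)


def parse_status(line: str) -> Optional[dict]:
    """Return the fields of one status line, or None for other lines."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    ts, gpu, temp, fan, rate = m.groups()
    return {
        "time": datetime.strptime(ts, "%H:%M:%S"),
        "gpu": gpu,
        "temp_c": int(temp),
        "fan_rpm": int(fan),
        "hashrate": float(rate),
    }


def report_gaps(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Seconds between consecutive reports, per GPU.

    Timestamps carry no date, so a log that crosses midnight would
    produce a negative gap; this sketch does not handle that case.
    """
    last = {}
    gaps = defaultdict(list)
    for line in lines:
        rec = parse_status(line)
        if rec is None:
            continue
        gpu = rec["gpu"]
        if gpu in last:
            gaps[gpu].append((rec["time"] - last[gpu]).total_seconds())
        last[gpu] = rec["time"]
    return dict(gaps)
```

Run over the second rig's lines, this shows GPU2 going from 12:57:02 to 12:57:12 between reports, a gap of 10 seconds. Whether a gap like that is a stall or just console buffering cannot be decided from the log alone.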