[ANN] TeamRedMiner v0.10.10 - Ironfish/Kaspa/ZIL/Kawpow/Etchash and More - page 71.

migo77

newbie

Activity: 23

Merit: 1

Quote from: arlekin on June 12, 2019, 06:07:20 PM

I have the same problem on 4 rigs with 4 different Vega 56 cards (reference with Samsung memory, and Asus/PowerColor with Hynix, with and without modified timings). The card runs for an hour or two, and then the hashrate drops to zero or to 1400 h/s with a DEAD message. I have observed this on several rigs, but for some reason the problem is always with the first PCIe slot (PCIe 03:00.0). If you do not put a card into the first slot (the PCIe x16 slot), the problem disappears. Now I just do not use the x16 slot :(

Hi, interesting... for me it is GPU 4, and it is located in the PCIe x16 slot too.

LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
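
The `LnkSta` line above has the shape of `lspci -vv` link-status output (an assumption; the post does not name the tool). To compare the negotiated link of the crashing card against the cards in the other slots, a minimal Python sketch could pull speed and width out of such lines. The name `parse_lnksta` is made up for illustration.

```python
import re

# Matches "LnkSta: Speed 8GT/s, Width x16"; some lspci versions add
# a parenthesised note such as "(ok)" after the speed, which is skipped.
LNKSTA_RE = re.compile(
    r"LnkSta:\s*Speed (?P<speed>[\d.]+)GT/s(?: \([^)]*\))?, Width x(?P<width>\d+)"
)

def parse_lnksta(line):
    """Return (speed_in_GT_per_s, lane_count) or None if the line does not match."""
    m = LNKSTA_RE.search(line)
    if m is None:
        return None
    return float(m["speed"]), int(m["width"])

# The line quoted in the post above.
line = "LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-"
print(parse_lnksta(line))
```

Running the same function over the `LnkSta` line of every GPU would show which cards sit on a full x16 link and which do not.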

arlekin

jr. member

Activity: 55

Merit: 12

I have the same problem on 4 rigs with 4 different Vega 56 cards (reference with Samsung memory, and Asus/PowerColor with Hynix, with and without modified timings). The card runs for an hour or two, and then the hashrate drops to zero or to 1400 h/s with a DEAD message. I have observed this on several rigs, but for some reason the problem is always with the first PCIe slot (PCIe 03:00.0). If you do not put a card into the first slot (the PCIe x16 slot), the problem disappears. Now I just do not use the x16 slot :(

Rakly3

newbie

Activity: 45

Merit: 0

Quote from: Rakly3 on June 12, 2019, 12:05:09 PM

Before the swap
[2019-06-12 11:43:48] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.887kh/s a:90 r:3 hw:0
[2019-06-12 11:44:18] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.884kh/s a:90 r:3 hw:0
[2019-06-12 11:44:48] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.882kh/s a:90 r:3 hw:0
[2019-06-12 11:45:18] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.880kh/s a:90 r:3 hw:0
[2019-06-12 11:45:48] GPU 5 [54C, fan 58%] cnr: 2.359kh/s, avg 1.980kh/s, pool 1.877kh/s a:90 r:3 hw:0
[2019-06-12 11:45:51] GPU 5: detected DEAD (03:00.0), no restart script configured, will continue mining.

After the swap:
[2019-06-12 18:58:31] GPU 2 [56C, fan 66%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.603kh/s a:17 r:0 hw:0
[2019-06-12 18:59:01] GPU 2 [56C, fan 62%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.555kh/s a:17 r:0 hw:0
[2019-06-12 18:59:31] GPU 2 [57C, fan 66%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.509kh/s a:17 r:0 hw:0
[2019-06-12 19:00:01] GPU 2 [49C, fan 62%] cnr: 2.125kh/s, avg 1.967kh/s, pool 3.463kh/s a:17 r:0 hw:0
[2019-06-12 19:00:03] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.

[2019-06-12 19:24:23] GPU 2 [58C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.667kh/s a:5 r:0 hw:0
[2019-06-12 19:24:53] GPU 2 [58C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.601kh/s a:5 r:0 hw:0
[2019-06-12 19:25:23] GPU 2 [57C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.539kh/s a:5 r:0 hw:0
[2019-06-12 19:25:53] GPU 2 [56C, fan 66%] cnr: 1.987kh/s, avg 1.979kh/s, pool 1.482kh/s a:5 r:0 hw:0
[2019-06-12 19:26:23] GPU 2 [55C, fan 60%] cnr: 2.293kh/s, avg 1.973kh/s, pool 1.429kh/s a:5 r:0 hw:0
[2019-06-12 19:26:34] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.

So this time something a little different happened, but still conforms to my mem clockspeed theory.

[2019-06-12 21:08:46] GPU 2 [55C, fan 62%] cnr: 1.988kh/s, avg 1.985kh/s, pool 2.857kh/s a:22 r:1 hw:0
[2019-06-12 21:09:16] GPU 2 [57C, fan 60%] cnr: 1.987kh/s, avg 1.986kh/s, pool 2.838kh/s a:22 r:1 hw:0
[2019-06-12 21:09:46] GPU 2 [56C, fan 62%] cnr: 2.150kh/s, avg 1.985kh/s, pool 2.820kh/s a:22 r:1 hw:0
[2019-06-12 21:10:16] GPU 2 [54C, fan 54%] cnr: 1.550kh/s, avg 1.982kh/s, pool 2.802kh/s a:22 r:1 hw:0
[2019-06-12 21:10:46] GPU 2 [54C, fan 60%] cnr: 1.549kh/s, avg 1.979kh/s, pool 2.784kh/s a:22 r:1 hw:0
...
[2019-06-12 21:14:16] GPU 2 [52C, fan 53%] cnr: 1.549kh/s, avg 1.961kh/s, pool 2.666kh/s a:22 r:1 hw:0
[2019-06-12 21:14:46] GPU 2 [47C, fan 43%] cnr: 1.551kh/s, avg 1.950kh/s, pool 2.650kh/s a:22 r:1 hw:0
[2019-06-12 21:14:51] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.

Instead of crashing immediately, it seems the clockspeed got reset to the BIOS default first.

Now I'm gonna swap the 2 cards back to their original place.

Rakly3

newbie

Activity: 45

Merit: 0

Quote from: kerney666 on June 12, 2019, 12:27:19 PM

Quote from: Rakly3 on June 12, 2019, 12:05:09 PM

GPU 2: detected DEAD (13:00.0)

completely different card & bus now :/

I'm still using the stock BIOS btw. Different clockspeeds or voltages don't seem to impact it.
Timing level doesn't seem to matter either (driver setting, no mod), but I haven't tested for that specifically yet.

Hi! Even though I’m not replying to every message I’m always reading everything. I’m currently testing a few things and have gone through our full commit history from 0.4.4 and forward.

We have a few small bug fixes on the way out, but it’s impossible to tell if any of those are the underlying issue here. I’ll reach out to a few of you in pm to see if you can run some test builds. Would love to nail this, if possible.

oh hi!
I often edit my posts with more info, like I just did now. (So to not spam the thread, I talk a lot.)
I added some hashing speed data from the logs.
I don't have the logs from before, I deleted them while testing my start_template.bat
But I did see the same thing.
While copying some of the log data I also noticed there seems to be a temp drop. (Below 55C, ramp-up?)
Afterburner is controlling the fans.
But I find 3 examples a bit few to go on atm. Although, in migo's log data I see 55C or less too on his crashing card, but no ramp-up, probably because he already is at 2.2k+?

I'm gonna do another run, and after that swap the cards back to their original order.
I have no idea how long it will take though. It ran for hours overnight just fine.

kerney666

member

Activity: 658

Merit: 86

Quote from: Rakly3 on June 12, 2019, 12:05:09 PM

GPU 2: detected DEAD (13:00.0)

completely different card & bus now :/

I'm still using the stock BIOS btw. Different clockspeeds or voltages don't seem to impact it.
Timing level doesn't seem to matter either (driver setting, no mod), but I haven't tested for that specifically yet.

Hi! Even though I’m not replying to every message I’m always reading everything. I’m currently testing a few things and have gone through our full commit history from 0.4.4 and forward.

We have a few small bug fixes on the way out, but it’s impossible to tell if any of those are the underlying issue here. I’ll reach out to a few of you in pm to see if you can run some test builds. Would love to nail this, if possible.

Rakly3

newbie

Activity: 45

Merit: 0

GPU 2: detected DEAD (13:00.0)

completely different card & bus now :/

I'm still using the stock BIOS btw. Different clockspeeds or voltages don't seem to impact it.
Timing level doesn't seem to matter either (driver setting, no mod), but I haven't tested for that specifically yet.

2nd run since switching the GPUs, and although it's not the same card or bus as before, it is again the one on bus 13
(bus 3 prior to swapping 2 cards.)

The card that is about to fail also always seems to spike in hashrate (by about 200-300 h/s) right before crashing.

Before the swap
[2019-06-12 11:43:48] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.887kh/s a:90 r:3 hw:0
[2019-06-12 11:44:18] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.884kh/s a:90 r:3 hw:0
[2019-06-12 11:44:48] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.882kh/s a:90 r:3 hw:0
[2019-06-12 11:45:18] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.880kh/s a:90 r:3 hw:0
[2019-06-12 11:45:48] GPU 5 [54C, fan 58%] cnr: 2.359kh/s, avg 1.980kh/s, pool 1.877kh/s a:90 r:3 hw:0
[2019-06-12 11:45:51] GPU 5: detected DEAD (03:00.0), no restart script configured, will continue mining.

After the swap:
[2019-06-12 18:58:31] GPU 2 [56C, fan 66%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.603kh/s a:17 r:0 hw:0
[2019-06-12 18:59:01] GPU 2 [56C, fan 62%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.555kh/s a:17 r:0 hw:0
[2019-06-12 18:59:31] GPU 2 [57C, fan 66%] cnr: 1.987kh/s, avg 1.984kh/s, pool 3.509kh/s a:17 r:0 hw:0
[2019-06-12 19:00:01] GPU 2 [49C, fan 62%] cnr: 2.125kh/s, avg 1.967kh/s, pool 3.463kh/s a:17 r:0 hw:0
[2019-06-12 19:00:03] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.

[2019-06-12 19:24:23] GPU 2 [58C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.667kh/s a:5 r:0 hw:0
[2019-06-12 19:24:53] GPU 2 [58C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.601kh/s a:5 r:0 hw:0
[2019-06-12 19:25:23] GPU 2 [57C, fan 66%] cnr: 1.987kh/s, avg 1.980kh/s, pool 1.539kh/s a:5 r:0 hw:0
[2019-06-12 19:25:53] GPU 2 [56C, fan 66%] cnr: 1.987kh/s, avg 1.979kh/s, pool 1.482kh/s a:5 r:0 hw:0
[2019-06-12 19:26:23] GPU 2 [55C, fan 60%] cnr: 2.293kh/s, avg 1.973kh/s, pool 1.429kh/s a:5 r:0 hw:0
[2019-06-12 19:26:34] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.

Spikes like these first of all make me think the mem clockspeed boosted somehow. Is there a way to track that? (with logging, I'm not gonna watch a graph all day)
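
One way to watch for these spikes without staring at a graph is to scan the miner's log file. The following is a minimal Python sketch, assuming the log keeps the per-GPU status line format shown above; the function name `find_spikes` and the 5% threshold are illustrative choices, not anything TeamRedMiner provides. It only sees hashrate, so it cannot confirm whether the memory clock actually changed.

```python
import re

# Matches the per-GPU status lines shown in the logs above, e.g.
# [2019-06-12 11:45:48] GPU 5 [54C, fan 58%] cnr: 2.359kh/s, avg 1.980kh/s, ...
STATUS_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] GPU (?P<gpu>\d+) \[(?P<temp>\d+)C, fan (?P<fan>\d+)%\] "
    r"\w+: (?P<cur>[\d.]+)kh/s, avg (?P<avg>[\d.]+)kh/s"
)

def find_spikes(lines, threshold=0.05):
    """Yield (timestamp, gpu, current_khs, average_khs) for each status line
    whose current hashrate exceeds the reported average by more than
    `threshold` (a fraction, so 0.05 means 5%)."""
    for line in lines:
        m = STATUS_RE.search(line)
        if not m:
            continue  # DEAD messages, totals, pool messages, etc.
        cur, avg = float(m["cur"]), float(m["avg"])
        if avg > 0 and (cur - avg) / avg > threshold:
            yield m["ts"], int(m["gpu"]), cur, avg

# Three lines taken from the "before the swap" log above.
sample = [
    "[2019-06-12 11:45:18] GPU 5 [55C, fan 62%] cnr: 1.986kh/s, avg 1.981kh/s, pool 1.880kh/s a:90 r:3 hw:0",
    "[2019-06-12 11:45:48] GPU 5 [54C, fan 58%] cnr: 2.359kh/s, avg 1.980kh/s, pool 1.877kh/s a:90 r:3 hw:0",
    "[2019-06-12 11:45:51] GPU 5: detected DEAD (03:00.0), no restart script configured, will continue mining.",
]
for ts, gpu, cur, avg in find_spikes(sample):
    print(f"{ts} GPU {gpu}: {cur:.3f} kh/s vs avg {avg:.3f} kh/s")
```

To use it on a real log, pass an open file object instead of `sample`. All three spikes quoted in this post are between 8% and 19% above the average, so a 5% threshold flags each of them.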

Rakly3

newbie

Activity: 45

Merit: 0

Quote from: argominer on June 12, 2019, 03:15:04 AM

I think I have the same issue (0.5.1). 6 x Vega 56 with timings run for maybe an hour and then crash.

Watchdog GPU 0: stuck in enqueue, reporting.
GPU 0: detected DEAD (03:00.0), will execute restart script watchdog.bat

[2019-06-12 11:45:51] GPU 5: detected DEAD (03:00.0), no restart script configured, will continue mining.

Bus 3? What about the others with same problem?
H110 d3a mobo, bus 3 is the x16 slot on this mobo. igfx is turned on as primary display adapter.
it's always the same card/bus. I'll try switching it around with another vega and see what happens.
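
To check whether the DEAD events really cluster on one PCIe address across several runs, the log files can be tallied. This is a small Python sketch that assumes the `detected DEAD (bus)` message format quoted above; `dead_events_by_bus` is a made-up helper name.

```python
import re
from collections import Counter

# Matches e.g. "GPU 5: detected DEAD (03:00.0), no restart script configured, ..."
DEAD_RE = re.compile(r"GPU (?P<gpu>\d+): detected DEAD \((?P<bus>[0-9a-fA-F:.]+)\)")

def dead_events_by_bus(lines):
    """Count DEAD events per PCIe bus address found in the given log lines."""
    counts = Counter()
    for line in lines:
        m = DEAD_RE.search(line)
        if m:
            counts[m["bus"]] += 1
    return counts

# DEAD lines quoted in this thread.
sample = [
    "[2019-06-12 11:45:51] GPU 5: detected DEAD (03:00.0), no restart script configured, will continue mining.",
    "[2019-06-12 19:00:03] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.",
    "[2019-06-12 19:26:34] GPU 2: detected DEAD (13:00.0), no restart script configured, will continue mining.",
]
for bus, n in dead_events_by_bus(sample).most_common():
    print(bus, n)
```

Counting by bus address rather than by GPU index matters here because the index changes when cards are swapped or when the bus reorder option is used.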

BTW, where can I find some info about setting up a watchdog script (bat)? Like how to send runtime commands to teamredminer, or how to get the id of the open/running miner?

I got bored last night and wrote this, if anyone wants a setup template

Code:

@echo off
set ALGO=
set POOL=
set PORT=
set WALLET=
set PASSWORD=x
set DEVICES=
set INTENSITY=

:: !! optional: create logfile(s)? set LOG=YES
set LOG=YES

:: !! optional: reorder GPU's according to bus number. set REORDER=YES
:: !! NOTE! Intensity and Devices will correspond to reorder.
set REORDER=

:: !! optional: works only for cryptonote algos and if pool allows! Otherwise leave blank. (doesn't work with Nicehash)
set RIGNAME=
set DIFFICULTY=

:: !! optional: name of mining pool. (for logfile purposes only. Can be left blank.)
set POOLNAME=

:: !! optional: pause for error message? set PAUSE=YES (Prevents the command window from closing if you have a problem launching the miner.)
set PAUSE=



:: --------------------Change below settings at own risk!--------------------------

set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
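:: NOTE: the offsets below assume the DATE variable looks like "Wed 06/12/2019" (US English regional settings); other locales need different offsets.
set CUR_YYYY=%date:~10,4%
set CUR_MM=%date:~4,2%
set CUR_DD=%date:~7,2%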
if defined PORT set PORT=:%PORT%
if defined DEVICES set DEVICES=-d %DEVICES%
if defined RIGNAME set RIGNAME=--rig_id %RIGNAME%
if defined DIFFICULTY set DIFFICULTY=.%DIFFICULTY%
if not exist LOGS\%POOLNAME% mkdir LOGS\%POOLNAME%
if "%LOG%"=="YES" set LOG=--log_file=LOGS\%POOLNAME%\LOG_%POOLNAME%_%CUR_YYYY%.%CUR_MM%.%CUR_DD%_%ALGO%.txt
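if "%REORDER%"=="YES" set REORDER=--bus_reorder
:: Only pass --cn_config when INTENSITY is set, so an empty value is never sent to the miner.
if defined INTENSITY set INTENSITY=--cn_config=%INTENSITY%


@echo on
teamredminer.exe -a %ALGO% -o %POOL%%PORT%%DIFFICULTY% -u %WALLET% -p %PASSWORD% %REORDER% %DEVICES% %INTENSITY% %LOG% %RIGNAME%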

@if "%PAUSE%"=="YES" (
	pause
) else (
	exit
)

I actually just wanted some structure in my logfiles for troubleshooting these dead GPU issues but ended up with this :D

argominer

newbie

Activity: 2

Merit: 0

I think I have the same issue (0.5.1). 6 x Vega 56 with timings run for maybe an hour and then crash.

Watchdog GPU 0: stuck in enqueue, reporting.
GPU 0: detected DEAD (03:00.0), will execute restart script watchdog.bat

migo77

newbie

Activity: 23

Merit: 1

Quote from: kerney666 on June 11, 2019, 09:43:37 AM

Quote from: migo77 on June 11, 2019, 08:52:14 AM

2 kerney666

Hi, my rig can't run longer than 1-2 hours on v0.4.5 and newer. One GPU hangs (this is from 0.5.1 after autoconfig):

[2019-06-10 21:22:55] GPU 0 [57C, fan 87%] cnr: 2.486kh/s, avg 2.477kh/s, pool 2.709kh/s a:46 r:0 hw:0
[2019-06-10 21:22:55] GPU 1 [58C, fan 85%] cnr: 2.490kh/s, avg 2.482kh/s, pool 1.261kh/s a:22 r:0 hw:0
[2019-06-10 21:22:55] GPU 2 [56C, fan 86%] cnr: 2.486kh/s, avg 2.478kh/s, pool 2.969kh/s a:51 r:0 hw:1
[2019-06-10 21:22:55] GPU 3 [63C, fan 87%] cnr: 2.480kh/s, avg 2.468kh/s, pool 2.377kh/s a:42 r:0 hw:0
[2019-06-10 21:22:55] GPU 4 [45C, fan 90%] cnr: 2.483kh/s, avg 2.471kh/s, pool 2.737kh/s a:47 r:0 hw:3
[2019-06-10 21:22:55] Total cnr: 12.42kh/s, avg 12.38kh/s, pool 12.05kh/s a:208 r:0 hw:4
[2019-06-10 21:23:05] GPU 4: detected DEAD (11:00.0), will execute restart script watchdog.sh

but v0.4.4 with exactly same config can run for weeks:

[2019-06-10 19:40:27] Stats Uptime: 13 days, 12:58:15
[2019-06-10 19:40:27] GPU 0 [59C, fan 87%] cnr: 2.471kh/s, avg 2.470kh/s, pool 2.340kh/s a:7780 r:0 hw:17
[2019-06-10 19:40:27] GPU 1 [60C, fan 85%] cnr: 2.474kh/s, avg 2.473kh/s, pool 2.423kh/s a:8057 r:0 hw:39
[2019-06-10 19:40:27] GPU 2 [57C, fan 86%] cnr: 2.471kh/s, avg 2.471kh/s, pool 2.337kh/s a:7768 r:0 hw:106
[2019-06-10 19:40:27] GPU 3 [63C, fan 87%] cnr: 2.464kh/s, avg 2.464kh/s, pool 2.381kh/s a:7917 r:0 hw:73
[2019-06-10 19:40:27] GPU 4 [55C, fan 88%] cnr: 2.470kh/s, avg 2.468kh/s, pool 2.259kh/s a:7519 r:0 hw:321
[2019-06-10 19:40:27] Total cnr: 12.35kh/s, avg 12.35kh/s, pool 11.74kh/s a:39041 r:0 hw:556
[2019-06-10 19:40:39] Pool pool.supportxmr.com received new job. (job_id: +kOSIEF95a5dlkxX6slHR0EW+l34)

I know I'm pushing hard on the limit, but what changed in TR miner 0.4.5 that causes this instability? With 0.4.4 and older, this rig was rock stable. OS is Linux & amd18.3 drivers.

Thank you for answer,

Migo

Hi!

Man, it's such a hard question to answer. The changes between 0.4.4 and 0.4.5 are really tiny, and nothing that "should" affect anything in terms of stability. For cn/r, absolutely nothing of interest was touched in the kernels, and nothing specific in the host-side code either. For every release, we get a few people telling us how stable things are with the new version, then a few others who (like you) unfortunately have a harder time keeping things running smoothly.

Since you're running linux, do you see anything interesting in your "dmesg" output from the kernel when a crash occurs?

-- K

Hi, this time it ran a bit longer:

[2019-06-12 04:25:27] GPU 4 [55C, fan 88%] cnr: 2.483kh/s, avg 2.485kh/s, pool 2.078kh/s a:225 r:0 hw:18
[2019-06-12 04:25:27] Total cnr: 12.42kh/s, avg 12.43kh/s, pool 11.40kh/s a:1240 r:0 hw:25
[2019-06-12 04:25:57] Stats Uptime: 0 days, 10:07:37
[2019-06-12 04:25:57] GPU 0 [60C, fan 88%] cnr: 2.484kh/s, avg 2.486kh/s, pool 2.371kh/s a:259 r:0 hw:0
[2019-06-12 04:25:57] GPU 1 [61C, fan 85%] cnr: 2.489kh/s, avg 2.490kh/s, pool 2.168kh/s a:236 r:0 hw:3
[2019-06-12 04:25:57] GPU 2 [57C, fan 86%] cnr: 2.485kh/s, avg 2.487kh/s, pool 2.320kh/s a:256 r:0 hw:2
[2019-06-12 04:25:57] GPU 3 [63C, fan 87%] cnr: 2.478kh/s, avg 2.480kh/s, pool 2.458kh/s a:264 r:0 hw:2
[2019-06-12 04:25:57] GPU 4 [46C, fan 89%] cnr: 2.484kh/s, avg 2.484kh/s, pool 2.076kh/s a:225 r:0 hw:18
[2019-06-12 04:25:57] Total cnr: 12.42kh/s, avg 12.43kh/s, pool 11.39kh/s a:1240 r:0 hw:25
[2019-06-12 04:26:09] GPU 4: detected DEAD (11:00.0), will execute restart script watchdog.sh

relevant dmesg output sent to PM, don't want to pollute thread...

Thanx,

Milan

XxXBigDickXxX

newbie

Activity: 25

Merit: 2

Hi Kerney666! Will we have a miner for RandomX?

migo77

newbie

Activity: 23

Merit: 1

Quote from: kerney666 on June 11, 2019, 09:43:37 AM

Quote from: migo77 on June 11, 2019, 08:52:14 AM

2 kerney666

Hi, my rig can't run longer than 1-2 hours on v0.4.5 and newer. One GPU hangs (this is from 0.5.1 after autoconfig):

[2019-06-10 21:22:55] GPU 0 [57C, fan 87%] cnr: 2.486kh/s, avg 2.477kh/s, pool 2.709kh/s a:46 r:0 hw:0
[2019-06-10 21:22:55] GPU 1 [58C, fan 85%] cnr: 2.490kh/s, avg 2.482kh/s, pool 1.261kh/s a:22 r:0 hw:0
[2019-06-10 21:22:55] GPU 2 [56C, fan 86%] cnr: 2.486kh/s, avg 2.478kh/s, pool 2.969kh/s a:51 r:0 hw:1
[2019-06-10 21:22:55] GPU 3 [63C, fan 87%] cnr: 2.480kh/s, avg 2.468kh/s, pool 2.377kh/s a:42 r:0 hw:0
[2019-06-10 21:22:55] GPU 4 [45C, fan 90%] cnr: 2.483kh/s, avg 2.471kh/s, pool 2.737kh/s a:47 r:0 hw:3
[2019-06-10 21:22:55] Total cnr: 12.42kh/s, avg 12.38kh/s, pool 12.05kh/s a:208 r:0 hw:4
[2019-06-10 21:23:05] GPU 4: detected DEAD (11:00.0), will execute restart script watchdog.sh

but v0.4.4 with exactly same config can run for weeks:

[2019-06-10 19:40:27] Stats Uptime: 13 days, 12:58:15
[2019-06-10 19:40:27] GPU 0 [59C, fan 87%] cnr: 2.471kh/s, avg 2.470kh/s, pool 2.340kh/s a:7780 r:0 hw:17
[2019-06-10 19:40:27] GPU 1 [60C, fan 85%] cnr: 2.474kh/s, avg 2.473kh/s, pool 2.423kh/s a:8057 r:0 hw:39
[2019-06-10 19:40:27] GPU 2 [57C, fan 86%] cnr: 2.471kh/s, avg 2.471kh/s, pool 2.337kh/s a:7768 r:0 hw:106
[2019-06-10 19:40:27] GPU 3 [63C, fan 87%] cnr: 2.464kh/s, avg 2.464kh/s, pool 2.381kh/s a:7917 r:0 hw:73
[2019-06-10 19:40:27] GPU 4 [55C, fan 88%] cnr: 2.470kh/s, avg 2.468kh/s, pool 2.259kh/s a:7519 r:0 hw:321
[2019-06-10 19:40:27] Total cnr: 12.35kh/s, avg 12.35kh/s, pool 11.74kh/s a:39041 r:0 hw:556
[2019-06-10 19:40:39] Pool pool.supportxmr.com received new job. (job_id: +kOSIEF95a5dlkxX6slHR0EW+l34)

I know I'm pushing hard on the limit, but what changed in TR miner 0.4.5 that causes this instability? With 0.4.4 and older, this rig was rock stable. OS is Linux & amd18.3 drivers.

Thank you for answer,

Migo

Hi!

Man, it's such a hard question to answer. The changes between 0.4.4 and 0.4.5 are really tiny, and nothing that "should" affect anything in terms of stability. For cn/r, absolutely nothing of interest was touched in the kernels, and nothing specific in the host-side code either. For every release, we get a few people telling us how stable things are with the new version, then a few others who (like you) unfortunately have a harder time keeping things running smoothly.

Since you're running linux, do you see anything interesting in your "dmesg" output from the kernel when a crash occurs?

-- K

Hi, thank you for your answer! I'm sorry, I haven't looked into dmesg. I'll stop 0.4.4 and run 0.5.1 again to look at dmesg. Can I provide any more info after a crash?

0.4.4 has run nicely since the last 0.5.1 experiment yesterday:

[2019-06-11 18:16:28] Stats Uptime: 0 days, 20:38:06
[2019-06-11 18:16:28] GPU 0 [59C, fan 87%] cnr: 2.470kh/s, avg 2.470kh/s, pool 2.443kh/s a:512 r:0 hw:2
[2019-06-11 18:16:28] GPU 1 [60C, fan 84%] cnr: 2.470kh/s, avg 2.475kh/s, pool 2.537kh/s a:532 r:0 hw:0
[2019-06-11 18:16:28] GPU 2 [56C, fan 85%] cnr: 2.468kh/s, avg 2.471kh/s, pool 2.303kh/s a:482 r:0 hw:8
[2019-06-11 18:16:28] GPU 3 [63C, fan 87%] cnr: 2.461kh/s, avg 2.464kh/s, pool 2.401kh/s a:503 r:0 hw:7
[2019-06-11 18:16:28] GPU 4 [55C, fan 88%] cnr: 2.464kh/s, avg 2.468kh/s, pool 2.314kh/s a:485 r:0 hw:26
[2019-06-11 18:16:28] Total cnr: 12.33kh/s, avg 12.35kh/s, pool 12.00kh/s a:2514 r:0 hw:43
[2019-06-11 18:16:30] Pool pool.supportxmr.com received new job. (job_id: BI3HJirVchNMe6LpNGRZuX5bez1a)

Now I'm on 0.5.1 for debugging:

Team Red Miner version 0.5.1
[2019-06-11 18:18:19] Auto-detected AMD OpenCL platform 0
[2019-06-11 18:18:20] Initializing GPU 0.
[2019-06-11 18:18:21] Initializing GPU 1.
[2019-06-11 18:18:22] Initializing GPU 2.
[2019-06-11 18:18:23] Initializing GPU 3.
[2019-06-11 18:18:24] Initializing GPU 4.
[2019-06-11 18:18:25] Watchdog thread starting.
[2019-06-11 18:18:25] Runtime Command Keys: h - help, s - stats, e - enable gpu, d - disable gpu, t - tuning mode, q - quit
[2019-06-11 18:18:25] API initialized on 127.0.0.1:4028
[2019-06-11 18:18:25] Successfully initialized GPU 0: Vega with 64 CU (PCIe 03:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 1: Vega with 64 CU (PCIe 08:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 2: Vega with 64 CU (PCIe 0b:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 3: Vega with 64 CU (PCIe 0e:00.0) (CN 16*14:CAA)
[2019-06-11 18:18:25] Successfully initialized GPU 4: Vega with 64 CU (PCIe 11:00.0) (CN 16*14:CAA)

Thank you,

Migo

lupaarSen

newbie

Activity: 17

Merit: 0

Quote from: lupaarSen on June 11, 2019, 11:01:37 AM

Quote from: Rakly3 on June 11, 2019, 10:32:33 AM

Quote from: SamAlackass on June 11, 2019, 06:24:23 AM

Thank you for the link! I can't believe it never came up before.

My pleasure

Quote from: lupaarSen on June 11, 2019, 09:45:55 AM

I can't get more than 1950 H/s: Pulse 56, stock BIOS, Samsung memory, Lucky timings from config file (56-Hynix) @ 1407/905, Mem: 950, 13*13:AAA
That really sucks... ideas?

Before 0.5.0 = 2100H/S

Man... you all use so much power on your vegas.
I have 1950h/s - 1990h/s with both the core and mem at 880mV (stock timings still; I haven't had them that long yet. Still learning Vega.)
The core power draw is avg 117W, with dips to 95W and spikes to 130W (Eth & CNr).
For the mem I can't get an accurate reading, not even with HWiNFO. It is stuck at 1.2V without budging once, but changing the mV does have an impact on stability and hashrate, so I'm sure it's not really 1.2V.

I ran several autotunes and used those configs
ilovetrm

As a sidenote. the list devices batfile recognizes all my 4x0 and 5x0 as 580's.
Dunno if that's normal?
And with autotune, my 4gb cards are faster than my 8gb cards.
It's a mishmash of brands and memory brands tho.(no vega's in these rigs just fyi) (the MSI armor 470 8Gb Micron is just abysmal. Barely breaking 700h/s)

https://ibb.co/vc2qXPj

Not changing the fact I get between 100-200H/s more...

Hmm, I'm thinking that I didn't use timings before, and the Pulse 56 is a Nano PCB; do you think timings can screw all of this up? I'm using the new Minerstat OS exp. (18.04-19.10) and am flashing the stable version now... if that's no good I'll go back to *+* with stock timings and report here... Another question: I set my mem voltage to 875-900-950 for a mem clock of 950, any advice?

Rakly3

newbie

Activity: 45

Merit: 0

Quote from: Rakly3 on June 10, 2019, 12:48:23 PM

Quote from: Duck Hellen on June 10, 2019, 12:38:08 PM

I managed to solve the "one card crashing" issue on Windows 10 today. It looks like the compute mode switch is broken: switching it by script and switching it manually in the AMD "control center" give me the same result. So to get the cards to work I need to be in "graphics mode"... But in the registry editor:

Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}

you can see your cards (0001, 0002, etc.), and there is a "blockchain support" setting, or something like that, with a value of 0. I think that is the new name of "compute mode". After switching it to 1, everything still works fine.

I'm on the 19.6.1 driver.

I just uninstalled all the AMD software and kept the drivers + a tool that can enable/disable compute mode. Card behaves now.
IMO it's Wattman causing many of the system instabilities. Even if you never 'turned it on', it is still being used.

CRAP!
It happened again!
Guess uninstalling the AMD software didn't fix diddly squat.
Drivers: 19.4.1
TRM versions 0.4.5 and up. Actually, I came here for the 0.5.x version yesterday to try and fix the problem.
Meanwhile I changed my other rigs to 0.5.1 too for the autotune, but I only have this problem on my Vega rig.

Quote from: lupaarSen on June 11, 2019, 11:01:37 AM

https://ibb.co/vc2qXPj

Not changing the fact I get between 100-200H/s more...

Sweet! Don't worry, I'll get there ;)

lupaarSen

newbie

Activity: 17

Merit: 0

Quote from: Rakly3 on June 11, 2019, 10:32:33 AM

Quote from: SamAlackass on June 11, 2019, 06:24:23 AM

Thank you for the link! I can't believe it never came up before.

My pleasure

Quote from: lupaarSen on June 11, 2019, 09:45:55 AM

I can't get more than 1950 H/s: Pulse 56, stock BIOS, Samsung memory, Lucky timings from config file (56-Hynix) @ 1407/905, Mem: 950, 13*13:AAA
That really sucks... ideas?

Before 0.5.0 = 2100H/S

Man... you all use so much power on your vegas.
I have 1950h/s - 1990h/s with both the core and mem at 880mV (stock timings still; I haven't had them that long yet. Still learning Vega.)
The core power draw is avg 117W, with dips to 95W and spikes to 130W (Eth & CNr).
For the mem I can't get an accurate reading, not even with HWiNFO. It is stuck at 1.2V without budging once, but changing the mV does have an impact on stability and hashrate, so I'm sure it's not really 1.2V.

I ran several autotunes and used those configs
ilovetrm

As a sidenote. the list devices batfile recognizes all my 4x0 and 5x0 as 580's.
Dunno if that's normal?
And with autotune, my 4gb cards are faster than my 8gb cards.
It's a mishmash of brands and memory brands tho.(no vega's in these rigs just fyi) (the MSI armor 470 8Gb Micron is just abysmal. Barely breaking 700h/s)

https://ibb.co/vc2qXPj

Not changing the fact I get between 100-200H/s more...

kerney666

member

Activity: 658

Merit: 86

Quote from: lupaarSen on June 11, 2019, 09:45:55 AM

I can't get more than 1950 H/s: Pulse 56, stock BIOS, Samsung memory, Lucky timings from config file (56-Hynix) @ 1407/905, Mem: 950, 13*13:AAA
That really sucks... ideas?

Before 0.5.0 = 2100H/S

Wow, I think you're the first one to report a clearly degraded hashrate going from 0.4.x to 0.5.0! So, am I understanding you correctly in that you had 2100 h/s with the same clocks and timings with TRM 0.4.x, but running 0.5.0 or 0.5.1 only gives you 1950 h/s?

Do you know the CN config you used for 2100 h/s in previous versions? Also, are you 100% certain the mem timings did stick? Your hashrates are very close to what my Gigabyte V56 Hynix gets at stock timings vs modded timings.

Rakly3

newbie

Activity: 45

Merit: 0

Quote from: SamAlackass on June 11, 2019, 06:24:23 AM

Thank you for the link! I can't believe it never came up before.

My pleasure

Quote from: lupaarSen on June 11, 2019, 09:45:55 AM

I can't get more than 1950 H/s: Pulse 56, stock BIOS, Samsung memory, Lucky timings from config file (56-Hynix) @ 1407/905, Mem: 950, 13*13:AAA
That really sucks... ideas?

Before 0.5.0 = 2100H/S

Man... you all use so much power on your vegas.
I have 1950h/s - 1990h/s with both the core and mem at 880mV (stock timings still; I haven't had them that long yet. Still learning Vega.)
The core power draw is avg 117W, with dips to 95W and spikes to 130W (Eth & CNr).
For the mem I can't get an accurate reading, not even with HWiNFO. It is stuck at 1.2V without budging once, but changing the mV does have an impact on stability and hashrate, so I'm sure it's not really 1.2V.

I ran several autotunes and used those configs
ilovetrm

As a sidenote. the list devices batfile recognizes all my 4x0 and 5x0 as 580's.
Dunno if that's normal?
And with autotune, my 4gb cards are faster than my 8gb cards.
It's a mishmash of brands and memory brands tho.(no vega's in these rigs just fyi) (the MSI armor 470 8Gb Micron is just abysmal. Barely breaking 700h/s)

lupaarSen

newbie

Activity: 17

Merit: 0

I can't get more than 1950 H/s: Pulse 56, stock BIOS, Samsung memory, Lucky timings from config file (56-Hynix) @ 1407/905, Mem: 950, 13*13:AAA
That really sucks... ideas?

Before 0.5.0 = 2100H/S

kerney666

member

Activity: 658

Merit: 86

Quote from: migo77 on June 11, 2019, 08:52:14 AM

2 kerney666

Hi, my rig can't run longer than 1-2 hours on v0.4.5 and newer. One GPU hangs (this is from 0.5.1 after autoconfig):

[2019-06-10 21:22:55] GPU 0 [57C, fan 87%] cnr: 2.486kh/s, avg 2.477kh/s, pool 2.709kh/s a:46 r:0 hw:0
[2019-06-10 21:22:55] GPU 1 [58C, fan 85%] cnr: 2.490kh/s, avg 2.482kh/s, pool 1.261kh/s a:22 r:0 hw:0
[2019-06-10 21:22:55] GPU 2 [56C, fan 86%] cnr: 2.486kh/s, avg 2.478kh/s, pool 2.969kh/s a:51 r:0 hw:1
[2019-06-10 21:22:55] GPU 3 [63C, fan 87%] cnr: 2.480kh/s, avg 2.468kh/s, pool 2.377kh/s a:42 r:0 hw:0
[2019-06-10 21:22:55] GPU 4 [45C, fan 90%] cnr: 2.483kh/s, avg 2.471kh/s, pool 2.737kh/s a:47 r:0 hw:3
[2019-06-10 21:22:55] Total cnr: 12.42kh/s, avg 12.38kh/s, pool 12.05kh/s a:208 r:0 hw:4
[2019-06-10 21:23:05] GPU 4: detected DEAD (11:00.0), will execute restart script watchdog.sh

but v0.4.4 with exactly same config can run for weeks:

[2019-06-10 19:40:27] Stats Uptime: 13 days, 12:58:15
[2019-06-10 19:40:27] GPU 0 [59C, fan 87%] cnr: 2.471kh/s, avg 2.470kh/s, pool 2.340kh/s a:7780 r:0 hw:17
[2019-06-10 19:40:27] GPU 1 [60C, fan 85%] cnr: 2.474kh/s, avg 2.473kh/s, pool 2.423kh/s a:8057 r:0 hw:39
[2019-06-10 19:40:27] GPU 2 [57C, fan 86%] cnr: 2.471kh/s, avg 2.471kh/s, pool 2.337kh/s a:7768 r:0 hw:106
[2019-06-10 19:40:27] GPU 3 [63C, fan 87%] cnr: 2.464kh/s, avg 2.464kh/s, pool 2.381kh/s a:7917 r:0 hw:73
[2019-06-10 19:40:27] GPU 4 [55C, fan 88%] cnr: 2.470kh/s, avg 2.468kh/s, pool 2.259kh/s a:7519 r:0 hw:321
[2019-06-10 19:40:27] Total cnr: 12.35kh/s, avg 12.35kh/s, pool 11.74kh/s a:39041 r:0 hw:556
[2019-06-10 19:40:39] Pool pool.supportxmr.com received new job. (job_id: +kOSIEF95a5dlkxX6slHR0EW+l34)

I know I'm pushing hard on the limit, but what changed in TR miner 0.4.5 that causes this instability? With 0.4.4 and older, this rig was rock stable. OS is Linux & amd18.3 drivers.

Thank you for answer,

Migo

Hi!

Man, it's such a hard question to answer. The changes between 0.4.4 and 0.4.5 are really tiny, and nothing that "should" affect anything in terms of stability. For cn/r, absolutely nothing of interest was touched in the kernels, and nothing specific in the host-side code either. For every release, we get a few people telling us how stable things are with the new version, then a few others who (like you) unfortunately have a harder time keeping things running smoothly.

Since you're running linux, do you see anything interesting in your "dmesg" output from the kernel when a crash occurs?

-- K
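
For anyone unsure what to look for in `dmesg`, a first pass is to keep only the kernel lines that mention the GPU driver. This is a minimal Python sketch; the helper name `gpu_kernel_lines` and the default keyword are assumptions for illustration, based on the Linux AMD driver logging under the name `amdgpu`. It does not interpret the messages, it only narrows them down.

```python
def gpu_kernel_lines(dmesg_lines, keywords=("amdgpu",)):
    """Return the lines that contain any of the keywords (case-insensitive),
    with trailing newlines stripped."""
    lowered = [k.lower() for k in keywords]
    return [
        line.rstrip("\n")
        for line in dmesg_lines
        if any(k in line.lower() for k in lowered)
    ]

# Example use on a live system (not run here):
#   import subprocess
#   out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   for line in gpu_kernel_lines(out.splitlines()):
#       print(line)
```

Comparing the timestamps of the filtered lines with the time of the miner's DEAD message helps separate messages from the crash itself from unrelated boot-time output.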

Iamtutut

full member

Activity: 1120

Merit: 131

Quote from: Iamtutut on June 06, 2019, 05:42:48 AM

Very impressive upgrade, a few extra hundred hashes with Turtlecoin and very good implementation of CN Heavy algos.

Excellent work, kudos to the devs.

The tuning mode is amazing; on a few algos I now have the same hashrate as when I had BIOS mods on my GPUs. To achieve such speeds, I use the AMD memory timings tool.

migo77

newbie

Activity: 23

Merit: 1

2 kerney666

Hi, my rig can't run longer than 1-2 hours on v0.4.5 and newer. One GPU hangs (this is from 0.5.1 after autoconfig):

[2019-06-10 21:22:55] GPU 0 [57C, fan 87%] cnr: 2.486kh/s, avg 2.477kh/s, pool 2.709kh/s a:46 r:0 hw:0
[2019-06-10 21:22:55] GPU 1 [58C, fan 85%] cnr: 2.490kh/s, avg 2.482kh/s, pool 1.261kh/s a:22 r:0 hw:0
[2019-06-10 21:22:55] GPU 2 [56C, fan 86%] cnr: 2.486kh/s, avg 2.478kh/s, pool 2.969kh/s a:51 r:0 hw:1
[2019-06-10 21:22:55] GPU 3 [63C, fan 87%] cnr: 2.480kh/s, avg 2.468kh/s, pool 2.377kh/s a:42 r:0 hw:0
[2019-06-10 21:22:55] GPU 4 [45C, fan 90%] cnr: 2.483kh/s, avg 2.471kh/s, pool 2.737kh/s a:47 r:0 hw:3
[2019-06-10 21:22:55] Total cnr: 12.42kh/s, avg 12.38kh/s, pool 12.05kh/s a:208 r:0 hw:4
[2019-06-10 21:23:05] GPU 4: detected DEAD (11:00.0), will execute restart script watchdog.sh

but v0.4.4 with exactly same config can run for weeks:

[2019-06-10 19:40:27] Stats Uptime: 13 days, 12:58:15
[2019-06-10 19:40:27] GPU 0 [59C, fan 87%] cnr: 2.471kh/s, avg 2.470kh/s, pool 2.340kh/s a:7780 r:0 hw:17
[2019-06-10 19:40:27] GPU 1 [60C, fan 85%] cnr: 2.474kh/s, avg 2.473kh/s, pool 2.423kh/s a:8057 r:0 hw:39
[2019-06-10 19:40:27] GPU 2 [57C, fan 86%] cnr: 2.471kh/s, avg 2.471kh/s, pool 2.337kh/s a:7768 r:0 hw:106
[2019-06-10 19:40:27] GPU 3 [63C, fan 87%] cnr: 2.464kh/s, avg 2.464kh/s, pool 2.381kh/s a:7917 r:0 hw:73
[2019-06-10 19:40:27] GPU 4 [55C, fan 88%] cnr: 2.470kh/s, avg 2.468kh/s, pool 2.259kh/s a:7519 r:0 hw:321
[2019-06-10 19:40:27] Total cnr: 12.35kh/s, avg 12.35kh/s, pool 11.74kh/s a:39041 r:0 hw:556
[2019-06-10 19:40:39] Pool pool.supportxmr.com received new job. (job_id: +kOSIEF95a5dlkxX6slHR0EW+l34)

I know I'm pushing hard on the limit, but what changed in TR miner 0.4.5 that causes this instability? With 0.4.4 and older, this rig was rock stable. OS is Linux & amd18.3 drivers.

Thank you for answer,

Migo
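
The stats lines above also carry an `hw:` counter per GPU. Assuming it counts hardware errors (the thread does not define it), the counters are easier to compare as a ratio to accepted shares. The sketch below is Python, and `hw_error_rates` is a made-up helper name. In the 13-day 0.4.4 run GPU 4, the card that later goes DEAD on 0.5.1, has the highest ratio; whether that is related to the crashes is not established in this thread.

```python
import re

# Matches the per-GPU stats lines, capturing the GPU index and the a/r/hw counters.
GPU_STATS_RE = re.compile(
    r"GPU (?P<gpu>\d+) \[[^\]]*\].* a:(?P<a>\d+) r:(?P<r>\d+) hw:(?P<hw>\d+)"
)

def hw_error_rates(lines):
    """Return {gpu_index: hw / accepted} from per-GPU stats lines.
    If a GPU appears several times, the last line wins."""
    rates = {}
    for line in lines:
        m = GPU_STATS_RE.search(line)
        if m and int(m["a"]) > 0:
            rates[int(m["gpu"])] = int(m["hw"]) / int(m["a"])
    return rates

# Two lines from the 0.4.4 log above.
sample = [
    "[2019-06-10 19:40:27] GPU 0 [59C, fan 87%] cnr: 2.471kh/s, avg 2.470kh/s, pool 2.340kh/s a:7780 r:0 hw:17",
    "[2019-06-10 19:40:27] GPU 4 [55C, fan 88%] cnr: 2.470kh/s, avg 2.468kh/s, pool 2.259kh/s a:7519 r:0 hw:321",
]
for gpu, rate in sorted(hw_error_rates(sample).items()):
    print(f"GPU {gpu}: {rate:.2%} hw per accepted share")
```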
