
Topic: [ANN] TeamRedMiner v0.10.10 - Ironfish/Kaspa/ZIL/Kawpow/Etchash and More - page 141. (Read 211432 times)

member
Activity: 658
Merit: 86
Hey pbfarmer! Thank you for the elaborate tests and detailed descriptions, much appreciated!

When running at lower clocks around 1407, the 15+15 configuration has been clearly superior to 16+14 in our tests as well, so your results are well aligned with our own testing.

No problem - more to come.  Good to know I'm in line.  At what point (effective clock) do you see 16+14 start to make sense?


It has varied a little with other parameters, but I think a general ballpark number would be around 1475-1500 cclk. I have data for a range of tests, but some things in the kernels have changed since then, so I would need to validate the data again.
newbie
Activity: 47
Merit: 0
I'm looking at some of the stats shown in the tool.   I have a rig with 9 GPUs and I see that the number of results for each GPU varies quite a bit.  I'm at 300 results and one of the 9 GPUs shows 46/0 while the lowest shows 23/0.  The cn v8 hashrate averages are 2.178 khs and 2.102 khs, so they are pretty close in speed, which leads me to think the lowest GPU isn't getting enough work.

Are the jobs balanced across the GPUs evenly or is there some other mechanism?

Also, how long should it take before the reported pool speed matches what I see in supportXMR or MoneroOcean? I know the dev fee will take a small amount of the total, but I am seeing a pretty large difference between the GPU average and the pool rate.  Almost 1 khs.  There aren't any reported failures.
member
Activity: 658
Merit: 86
Another question.

How is GPU|N calculated? Based on PCI or some other method? From what I can see, GPU|N follows the same order as shown in the AMD driver panel: GPU0 is GPU0, and so on, which means it is probably NOT PCI related.

Two options: if you don't provide -d x,y,z on the command line, we enumerate the devices in the order provided by the OpenCL driver. This means it will match some tools, but not others. If you do provide -d x,y,z, those numbers still refer to the OpenCL order, i.e. you're picking indices from 0..N-1 in the order that appears when starting the miner without -d. We'll then reorder and set GPU0=x, GPU1=y, GPU2=z.

The OpenCL order is also the same order as displayed by the "clinfo" command, which dumps a shitload of data about your cards and is available in any OpenCL environment.

For automatic reordering, we unfortunately don't have a bus reordering option right now; it's on the TODO list. For mapping against other tools, we do display the PCIe bus id on startup in the "Successfully initialized..." lines, and also provide it over the API in the "devdetails" command.  For example:


Client request:
{"command":"devdetails"}

Miner response:
{
  "STATUS": [
    {
      "STATUS": "S",
      "When": 1541255118,
      "Code": 69,
      "Msg": "Device Details",
      "Description": "TeamRedMiner 0.3.6"
    }
  ],
  "DEVDETAILS": [
    {
      "DEVDETAILS": 0,
      "Name": "GPU",
      "ID": 0,
      "Driver": "opencl",
      "Kernel": "cnv8",
      "Model": "AMD Radeon (TM) RX 580",
      "Device Path": "0c:00:0"
    },
    {
      "DEVDETAILS": 1,
      "Name": "GPU",
      "ID": 1,
      "Driver": "opencl",
      "Kernel": "cnv8",
      "Model": "Radeon RX Vega",
      "Device Path": "0b:00:0"
    }
  ],
  "id": 1
}


For reordering: as you can see, my two cards on this workstation are not in bus id order in the OpenCL enumeration. Given that information, I can manually reorder using -d 1,0 when starting the miner, and my Vega would be GPU0 and the 580 GPU1.

Of course it's annoying to manually reorder, and we'll add a reordering option soon, but at least it is possible until then.
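
Until a reordering option exists, the manual step above can be scripted. This is only a minimal sketch: the helper name bus_order_arg is made up, and it assumes you have already copied each card's OpenCL index ("ID") and "Device Path" out of the devdetails reply by hand. It also relies on plain string sorting, which only works when all device paths have the same width and letter case, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: build the argument for -d so that GPUs end up in
# PCIe bus order. Input on stdin: one "<opencl_index> <device_path>" pair
# per line, copied by hand from the devdetails output.
bus_order_arg() {
    # Sort by the device path (field 2), keep the OpenCL index (field 1),
    # and join the indices with commas.
    LC_ALL=C sort -k2,2 |
        awk '{ printf "%s%s", (NR > 1 ? "," : ""), $1 } END { print "" }'
}

# The two cards from the devdetails example above.
printf '0 0c:00:0\n1 0b:00:0\n' | bus_order_arg
```

For the two cards shown, this yields the same 1,0 order that was worked out by hand.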
newbie
Activity: 156
Merit: 0
cpu usage is higher than with other miners, so it looks like some work is offloaded to the cpu from the gpu  ;)

miner speed great btw
member
Activity: 190
Merit: 59
Here are my experiences and collection of facts with TeamRedMiner (in rest of the text: TMR) and Vega cards so far.

I have 7 rigs with Octominer riserless motherboard, 1200HP server PSU, each rig having  7x Vega 64 or flashed 56, using power tables from vega mining guide.

1. Compared to JCE Miner and SRB miner, I have increased my efficiency from around 9H/W to over 10H/W (10.50 is my rigs' average, I believe), mining fee included. I did not log all the data and make exact calculations, but the efficiency of TMR is at least 15% better than the previously best miners. If you don't care about efficiency, you can reach incredible speeds with Vega cards, and the dev fee is well worth it.

2. TMR causes power usage spikes. JCE and SRB miner mine in a "smooth sailing" way, where the power consumption is constant. Do not tune TMR at the edge of your PSU limit, leave some headroom. These are not big spikes, but they are there and they can crash your rig or burn your PSU.

3. If your rig is stable with SRB or JCE at a certain frequency and voltage, it doesn't mean it will be stable with TMR. 3 out of my 7 rigs were simply swapped from SRB to TMR and have worked stably ever since; I haven't even put newer TMR versions on them yet. However, the other rigs had many stability issues.

4. The unstable rigs now run version 0.3.6 and a mix of 18.6.1 and 18.10.1 drivers. They were shutting down for all kinds of reasons, even corrupting the drivers and making cards disappear from the system. Now they work OK with 0.3.6, but I can't overclock them. That is down to the binning of the cards and luck, and has nothing to do with TMR.

5. It took me approximately 2 days of partial downtime and 1 burned PSU to get TMR running stably. This is because I was chasing maximum performance instead of efficient running; I was even running my 1200W PSU at more than 1400W because I was excited by the incredible hashrates that TMR can spit out at high core frequencies. Don't be me, respect your hardware. Still, this will quickly pay off through increased hashrate and reduced power usage (it will take me around 20 days to pay for the downtime and the burned PSU, not so bad!)

6. 16+14 config always gave me best hashes per watt, no matter the speed of GPU, however, I did drop to 15+14 on some of the rigs during troubleshooting as I suspected that rigs with very slow CPU have issues with 16+14 for some reason. It is up to you to find your setup that works for you.

TMR is hands down the fastest Vega CN8 miner out there, which offsets the increased power consumption that CN8 brought us. It is well worth trying. The dev fee of 2.5% may seem high, but it is worth the price; the devs are active all the time and try to help in every way.

There are 2 things that I would like to see.
1. Miner uptime. Maybe the miner uptime could be squeezed into one of the output lines.
2. Watchdog. Once I got my rigs stable, I didn't experience a single GPU dropout. However, it sometimes happens that the miner starts and initializes all the cards, but the hashrate is 0. A simple watchdog would be very useful for such occasions.
member
Activity: 413
Merit: 17
Any chance to add support to other CN variants? 8GB cards are great with 4MB scratchpad algorithms.
newbie
Activity: 50
Merit: 0
Does the miner work on Windows 7?
jr. member
Activity: 194
Merit: 4
Another question.

How is GPU|N calculated? Based on PCI or some other method? From what I can see, GPU|N follows the same order as shown in the AMD driver panel: GPU0 is GPU0, and so on, which means it is probably NOT PCI related.
full member
Activity: 729
Merit: 114

Thanks for the answer.

Something that came to my mind after reading your answer.

Is the power consumption lower because there is no CPU verification? A 20-40W difference is big, and I doubt it, but it does not hurt to ask :)

Other miners on cnv2 can also reduce their power consumption footprint. Try worksize 16 and 32 and you'll notice that power consumption goes down, but so does the hashrate.  TRM has achieved a balance there.

Right now, whenever a card hangs, the miner still goes on with the other cards.  This might be desirable in some cases, but it would also be nice to have an option to crash the miner itself.
member
Activity: 340
Merit: 29
Hey pbfarmer! Thank you for the elaborate tests and detailed descriptions, much appreciated!

When running at lower clocks around 1407, the 15+15 configuration has been clearly superior to 16+14 in our tests as well, so your results are well aligned with our own testing.

No problem - more to come.  Good to know I'm in line.  At what point (effective clock) do you see 16+14 start to make sense?

Any possibility of incorporating a simple HTTP/REST report mechanism in addition to the cgminer rpc api (like stak, cast, srb, jce.)  It could just dump the current rpc api summary json, and it would be much more useful for quick setup/tuning, esp if you're only incorporating summary reports and not miner controls.

My plan is rather to implement a separate project that we open source, a little http adapter in C++/node.js/python/whatever that converts the cgminer/sgminer api to an xmr stak-like HTTP/REST api.

Exactly what I was thinking - might take a look around to see if there's already something like this out there.  And great to know you're on top of the other issues.  Keep up the great work!
jr. member
Activity: 194
Merit: 4

Ah, we should integrate the CN config better into the API.

The numbers are only about the nr of pads per thread, so directly related to mem consumed. The +- is a little mode tweak that sometimes does nothing and other times something.

If your cards are 4GB, 8+8 would consume everything, and it won't work. 7+7 or 8+7 should be doable depending on how much the driver has grabbed on the card(s) already.




I assumed that it would be like that (4GB cards here), but was not sure. I'm using Windows, where the driver limit for the max VRAM allocation should be ~3584MB, and 8+7 sounds like it would use more than that... but it works :D
member
Activity: 658
Merit: 86

Since the 570s and 580s have 32/36 CUs, why do they both run at 7-7/7+7? None of my cards liked 8-8/8+8, while I see some people have run them at 8-8/8+8.

And what exactly does the CN_config do? The miner API reports
Code:
Intensity: 20
regardless of the CN_config values.

Ah, we should integrate the CN config better into the API.

The numbers are only about the nr of pads per thread, so directly related to mem consumed. The +- is a little mode tweak that sometimes does nothing and other times something.

If your cards are 4GB, 8+8 would consume everything, and it won't work. 7+7 or 8+7 should be doable depending on how much the driver has grabbed on the card(s) already.


jr. member
Activity: 194
Merit: 4

A clarification for the Power Usage though - i observed it, and although lower, it would still spike to ~20W higher than the average. Not sure if you have observed this.


It's an unfortunate side effect of the algo and the efficiency level of this miner in the various stages of CN. I'm 100% sure we will solve it in a good way in the end though. See the reply I wrote ~30 mins ago here: https://bitcointalksearch.org/topic/m.47528064 for more details.



Oops, I sure did miss that.

I may have missed the answer to my next question:

Since the 570s and 580s have 32/36 CUs, why do they both run at 7-7/7+7? None of my cards liked 8-8/8+8, while I see some people have run them at 8-8/8+8.

And what exactly does the CN_config do? The miner API reports
Code:
Intensity: 20
regardless of the CN_config values.
member
Activity: 658
Merit: 86

A clarification for the Power Usage though - i observed it, and although lower, it would still spike to ~20W higher than the average. Not sure if you have observed this.


It's an unfortunate side effect of the algo and the efficiency level of this miner in the various stages of CN. I'm 100% sure we will solve it in a good way in the end though. See the reply I wrote ~30 mins ago here: https://bitcointalksearch.org/topic/m.47528064 for more details.

jr. member
Activity: 194
Merit: 4
Something that came to my mind after reading your answer.

Is the power consumption lower because there is no CPU verification? A 20-40W difference is big, and I doubt it, but it does not hurt to ask :)

Ha! That would def be cheating, wouldn't it? No, I can assure you that the lower power draw is from the continuous gpu work. After all, cpu verifications are for found shares only, so depending on your pool diff you maybe crunch one CN hash on the cpu every 10 secs (max)? It's tiny work in the grand scheme of things. Love the question though!



A clarification for the Power Usage though - i observed it, and although lower, it would still spike to ~20W higher than the average. Not sure if you have observed this.
member
Activity: 658
Merit: 86
Something that came to my mind after reading your answer.

Is the power consumption lower because there is no CPU verification? A 20-40W difference is big, and I doubt it, but it does not hurt to ask :)

Ha! That would def be cheating, wouldn't it? No, I can assure you that the lower power draw is from the continuous gpu work. After all, cpu verifications are for found shares only, so depending on your pool diff you maybe crunch one CN hash on the cpu every 10 secs (max)? It's tiny work in the grand scheme of things. Love the question though!

jr. member
Activity: 194
Merit: 4

I checked with them, and it seems you are correct: they have not changed the error message. The rejected shares are due to crappy card quality, with the cards not playing nice with the strap. Another thing I noticed is that XMR-Stak was discarding invalid shares but logging them regardless, which was equivalent to a Hardware Error (HW). But on this fork, instead of being a HW, it's a Rejected share. I have used sgminer-gm-5.5.5, and it was reporting HWs instead of rejected shares. Pretty confusing.

I will be checking for the 30/60s stuff for the API.

Thanks for reporting back! In the current version our miner doesn't have cpu verification of the hashes, and it's probably the only one that doesn't. We just send everything to the pool and let them check the shares. We're just about to add the cpu verification parts though, so when that's in place we will be reporting bad shares as "HW errors" as well and not piss off the pool(s) when you push your mem straps too hard :)



Thanks for the answer.

Something that came to my mind after reading your answer.

Is the power consumption lower because there is no CPU verification? A 20-40W difference is big, and I doubt it, but it does not hurt to ask :)





Also, if anyone wants to pull API information remotely and see what the miner reports, you can do so with the following:


Be sure to start mining with  --api_listen=IP:PORT; the default port was/is 4028.

Code:
echo -n "gpu|N" | nc IP PORT > log.txt
- no colon, just a space between IP and PORT. N = GPU number, [0-9]

The pulled data is not in a good format, so you need to edit it.

I have made a script for this:

Code:
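#!/bin/bash
# Dump per-GPU stats from the miner API into log.txt.
# Replace IP and PORT with the values passed to --api_listen.

# Ask the miner how many GPUs it sees.
echo -n "gpucount" | nc IP PORT > log.txt
GPU_COUNT=$(grep -oP '(?<=Count=)\w+' log.txt)

# Query each GPU in turn and append its reply to the log.
i=0

while [ "$i" -lt "$GPU_COUNT" ]
do
        echo -n "gpu|$i" | nc IP PORT >> log.txt
        i=$((i + 1))
done

# Put each reply on its own line, drop the first two lines,
# and trim the fields that are not needed.
sed -i 's/STATUS/\n&/g' log.txt
sed -i '1,2d' log.txt
sed -i 's/^.*GPU=/GPU=/' log.txt
sed -i 's/Temperature=0.00.*KHS//' log.txt
sed -i 's/Utility=[0-9].*Rejected%=[0-9]\.[0-9]//' log.txt
echo -e  >> log.txt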

Dirty stuff, but it works. I left only the most relevant data. Replace again IP and PORT with the ones you need.
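
If you only need a single value, the sed cleanup can be skipped. The sketch below assumes the usual cgminer-style text reply, where sections end with "|" and fields are comma-separated key=value pairs. The helper name api_field and the sample reply are made up for illustration; a real reply would come from nc as in the script above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a cgminer-style
# text reply read from stdin. $1 = key name. Only the first match is printed.
api_field() {
    # Put every section and field on its own line, then pick the key.
    tr '|,' '\n\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Assumed sample reply to "gpucount", not captured from a real miner.
reply='STATUS=S,When=1541255118,Code=20,Msg=GPU count,Description=TeamRedMiner 0.3.6|GPUS,Count=9|'
printf '%s' "$reply" | api_field Count
```

With a live miner the same call would look like: echo -n "gpucount" | nc IP PORT | api_field Count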
member
Activity: 658
Merit: 86
Hey pbfarmer! Thank you for the elaborate tests and detailed descriptions, much appreciated!

When running at lower clocks around 1407, the 15+15 configuration has been clearly superior to 16+14 in our tests as well, so your results are well aligned with our own testing.

Replying to the noted concerns/feedback below:

Some sort of 'resource release' process at shutdown would be useful.  It seemed if the miner was started too soon after it was stopped, the entire machine froze up.  Also, in general, the crash behavior of this miner is much less forgiving than others - most crashes meant a full reboot.

Absolutely agree. We're sloppy at shutdown, and it will be addressed shortly. We need to add proper signal handlers that work for both Linux and Windows and catch the ctrl-c/sighup signals, then do a proper release of all OpenCL resources.

This may just be the cost of mining cnv2, but power transients are huge.  On other miners, I saw regular 30-40W spikes from the median (w/ similar drops), but for TR, I'm seeing 70W+ spikes, causing the mean and median to diverge significantly.  Specifically, while the observed median (2 GPUs, excluding idle) was around 285-290W, regular spikes up to 360W+ resulted in a mean draw around 310W.  Any way to get these down?  I could see these causing stability issues or tripped circuit protections for some people.

This is a very good point, and we've noted it ourselves, both when designing the kernels and by direct at-the-wall measurements. The issue stems from this miner requiring less power in the long-running main part of CN compared to others, but it also goes full throttle in other parts, requiring more power. Hence, the min-to-max power swings are amplified at both ends. Any CN miner with the same profile as this one that executes the algo in the most straightforward way would exhibit this problem.

We're also seeing a notable difference in stability on the Vega 64s vs 56s. I believe this is part of the problem. The fewer CUs means there will be additional pressure on 8-16 of the 56 CUs compared to the 64s as these spikes occur, a little bit depending on your CN config though.

There are a few ways of addressing this, and if we want to achieve max stability I believe we need to solve it. I have a very good design for solving it, but it's a big redesign and rewrite. We will get there at some point. Meanwhile, we're debating simpler forms of reducing the effect of the spikes, like cutting the worst case scenario in half. We'll go to work shortly on this.
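
The gap between mean and median mentioned in the question is what a few large spikes do to otherwise steady readings. A minimal sketch with made-up samples (the wattage values and the power_stats name are illustrative, not measurements from this thread):

```shell
#!/bin/sh
# Print "mean median" for space-separated wattage samples on stdin.
power_stats() {
    tr -s ' ' '\n' | grep -v '^$' | LC_ALL=C sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            # Middle value, or the average of the two middle values.
            median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "%.1f %.1f\n", sum / NR, median
        }'
}

# Made-up readings: mostly 285-290 W with two spikes to 360 W.
echo "285 290 360 285 290 360 288" | power_stats
```

The spikes barely move the median but pull the mean up, which is why a PSU sized for the median draw can still be pushed past its limit.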

Any possibility of incorporating a simple HTTP/REST report mechanism in addition to the cgminer rpc api (like stak, cast, srb, jce.)  It could just dump the current rpc api summary json, and it would be much more useful for quick setup/tuning, esp if you're only incorporating summary reports and not miner controls.

Yep, we're aware that the cgminer/sgminer api isn't really the CN standard, and we have some plans. We've worked hard to keep the miner itself free from any open source dependencies; we've written every single line of code from scratch. For example, that's why we're missing on-cpu verification for CN in these first versions: we refuse to steal any code from xmrig, xmr-stak or even the XMR wallet. We'd like to keep it clean from attributions.

Given the above, we won't pull in e.g. lighttpd as a dependency in the miner. My plan is rather to implement a separate project that we open source, a little http adapter in C++/node.js/python/whatever that converts the cgminer/sgminer api to an xmr-stak-like HTTP/REST api. It will also have the nice feature of working with any sgminer-derived miner, not just our miner. I don't get the point of these massive monolith miner implementations; it would be so much nicer if we had separated these different concerns a long time ago in the miner dev world so everyone could focus on what really matters, the kernels and the mining process. You can also separate some of the watchdog aspects and place them outside of the miner. It's never really a good idea to run a watchdog thread inside the same process it's supposed to monitor.

So, it's also on the TODO list, we'll see when we get there. Time is always the limiting factor :( If anyone in the community wants to get involved, give me a ping.
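
As a rough illustration of keeping the watchdog outside the miner process, here is a minimal sketch of the decision step only. The name watchdog_step and the threshold are made up; how the hashrate reading is obtained (for example via the API) and how the miner gets restarted are left to the surrounding script.

```shell
#!/bin/sh
# Hypothetical watchdog step, meant to run in its own process.
# $1 = latest hashrate reading (any unit)
# $2 = number of consecutive zero readings seen so far
# $3 = number of consecutive zero readings that triggers a restart
# Prints the updated count followed by "restart" or "ok".
watchdog_step() {
    awk -v rate="$1" -v zeros="$2" -v limit="$3" 'BEGIN {
        # A non-zero reading resets the counter, a zero reading bumps it.
        zeros = (rate + 0 > 0) ? 0 : zeros + 1
        print zeros, (zeros >= limit ? "restart" : "ok")
    }'
}

watchdog_step 2.178 0 3   # healthy reading
watchdog_step 0 2 3       # third zero reading in a row
```

Requiring several zero readings in a row avoids restarting the miner during normal startup or a short pool reconnect.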

newbie
Activity: 168
Merit: 0
Is nicehash supported?

Sure, Nicehash works.

Like joseph32 writes, it is supported. However, I'm not a big fan of Nicehash in the CN world. We need to support it though since it's the same mechanism that proxies like xmrig-proxy use as well.





wow crystal
member
Activity: 658
Merit: 86
Is nicehash supported?

Sure, Nicehash works.

Like joseph32 writes, it is supported. However, I'm not a big fan of Nicehash in the CN world. We need to support it though since it's the same mechanism that proxies like xmrig-proxy use as well.

Modifying a reply I wrote earlier today in our other ann thread that explains why it's a bad idea imho:

NiceHash mining is a little problematic with CN. Calculating a round of hashes for CN takes > 1 sec, which is a very long processing time. For other hash functions, times of a few ms are more common. So, CN has a much higher probability of your hashes being stale when the gpu job completes. The more often your pool/NiceHash sends out new jobs, the higher the probability that calculated shares belong to a stale job. For e.g. direct XMR mining (a coin with 2 min block time), pools are generally nice about accepting shares for both the current and the previous pool job as long as the new job wasn't sent out as a reaction to a new network block. The more common reason for a new job is just an ntime roll forward in time. In that case, the shares you submit for the previous job are still possible to convert into a network block, so pools should really accept them. With a 2 min block time, jobs triggered by a new network block will be sent out (on average) every 2 mins. Given this, we can expect to have maybe between 0.5-1% rejected shares over time. Some pools accept shares for stale jobs anyway to be nice.

For NiceHash, this just isn’t true. They can throw you around between client orders every 5 secs and generally seem to be more picky with stale shares. Using the same style of mining as for normal pools, you will see a much higher reject ratio. They have pushed the cost of the long CN calculation period for gpus onto miners. Clients only pay for accepted shares. CPU miners will be fine though. One approach to dealing with NiceHash is an abort mechanism, i.e. as soon as a new job comes in you abort any ongoing calculations and restart with the new job instead. However, this means that the last ~0.5 secs of gpu time (on avg) will be wasted for every job switch. The more often new jobs are sent out, the more gpu time you will waste, so it’s not 100% sure you’ll be better off with this approach anyway.

In the end, maybe NiceHash will give you a higher return anyway due to an implicit profit switching between CNv8 coins, but when using this miner in its current form, you will lose a few percent of hashrate in rejected shares.
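
The numbers above can be put into a back-of-the-envelope estimate. This is a simplified model, not how the miner or any pool actually accounts for shares: it assumes a fixed amount of gpu time is lost at every job switch and that jobs arrive at a steady average interval. The lost_pct name is made up; the inputs are the rough figures from the explanation above.

```shell
#!/bin/sh
# Rough estimate, in percent, of gpu time lost to job switches.
# $1 = seconds of gpu work lost per job switch
# $2 = average seconds between job switches that invalidate work
lost_pct() {
    awk -v lost="$1" -v interval="$2" 'BEGIN { printf "%.1f\n", 100 * lost / interval }'
}

lost_pct 1 120    # normal pool: a ~1 s round goes stale about every 2 mins
lost_pct 0.5 5    # NiceHash worst case with abort: ~0.5 s lost every ~5 s
```

The first case lands inside the 0.5-1% range quoted above. The second is the worst case where orders really do switch every 5 secs, which is why aborting is not a guaranteed win.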

