BAMT version 0.5 - Easy USB based mining Linux with farm wide management tools - page 66.

boozer

sr. member

Activity: 309

Merit: 250

Quote from: lodcrappo on March 06, 2012, 12:36:15 PM

It is possible, though I haven't heard of it. Anything is possible when you are locking up hardware. It is not a situation where there is much control.

When you add up the time lost in mining from people locking up their GPUs or rigs and the value of the time a person puts into overclocking, I think most people come away with a net loss. Had they just left everything stock, they would have been solid mining 24/7, probably more actual shares mined anyway, not to mention the time they would have saved. Don't even get me started on people overclocking for highest mhash number on the screen and not even looking at what they've done to their actual shares submitted rate, or the power per hash ratio.

Good points. I was able to find stable OCs on my other rig very easily... not sure why this one is giving me so much grief, lol. I just wasn't checking the other OCs, as gpu0 was always indicated as the one with the issue; now that I disregard that, I should have more luck, I'm guessing. I just got stuck on the fact that gpu0 kept getting returned to stock.... now that I know that I should not always use that as an indicator, I bet I'm good to go as I revisit my other OCs (once it's run stock stable for a while). Still going strong.

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: malevolent on March 06, 2012, 12:46:51 PM

Quote from: lodcrappo on March 06, 2012, 12:36:15 PM

Don't tell me you are running your GPUs at stock clocks :D

No, but I don't have the trouble that most people seem to with keeping their cards mining. If I was losing mining time every day (or every month) due to overclocking, or had to spend more than 10 minutes on it in the life of a rig, I would consider it a loss.

We did some math in the IRC channel: every 1 Mhash/s you squeeze out of your GPU is worth a whopping $1.25 PER YEAR (at current difficulty and BTC price). Not really worth putting much time into.
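The arithmetic behind a figure like that can be sketched as follows. The formula is the usual expected-reward estimate (finding a block takes on average difficulty × 2^32 hashes). The difficulty, block reward and price passed in below are illustrative assumptions for early 2012, not the exact numbers used in the IRC channel.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
HASHES_PER_DIFFICULTY_1 = 2 ** 32  # average hashes needed per unit of difficulty


def usd_per_year(mhash_per_sec, difficulty, block_reward_btc, usd_per_btc):
    """Expected yearly mining income in USD for a given hash rate."""
    hashes_per_year = mhash_per_sec * 1_000_000 * SECONDS_PER_YEAR
    expected_blocks = hashes_per_year / (difficulty * HASHES_PER_DIFFICULTY_1)
    return expected_blocks * block_reward_btc * usd_per_btc


if __name__ == "__main__":
    # Assumed inputs: difficulty ~1.5 million, 50 BTC reward, ~$5 per BTC.
    value = usd_per_year(1, difficulty=1_500_000, block_reward_btc=50, usd_per_btc=5.0)
    print(f"1 Mhash/s is worth roughly ${value:.2f} per year")
```

With those assumed inputs the result lands in the same ballpark as the figure quoted above; plug in the current difficulty and price to redo the estimate.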

malevolent

legendary

Activity: 3472

Merit: 1724

Quote from: lodcrappo on March 06, 2012, 12:36:15 PM

Don't tell me you are running your GPUs at stock clocks :D

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: boozer on March 06, 2012, 02:09:00 AM

Is there any possibility that a failure on "any" GPU gets reported as a GPU0 issue, regardless of which GPU it was? I removed the card I thought was bad... so I only had 4 GPUs, but the new gpu0 again started having the same problem.

So I added the card I thought had issues back into the mix and set every card to stock and it has been running fine. So before, gpu0 running stock would die, but with all other GPUs at stock, everything seems to be fine.... it's only been 30 minutes, but that's longer than it used to last, lol. Just thought I would ask about the gpu0 theory, but maybe they both had problems.... I'll check in the AM to see if it stayed up the rest of the night.

It is possible, though I haven't heard of it. Anything is possible when you are locking up hardware. It is not a situation where there is much control.

When you add up the time lost in mining from people locking up their GPUs or rigs and the value of the time a person puts into overclocking, I think most people come away with a net loss. Had they just left everything stock, they would have been solid mining 24/7, probably more actual shares mined anyway, not to mention the time they would have saved. Don't even get me started on people overclocking for highest mhash number on the screen and not even looking at what they've done to their actual shares submitted rate, or the power per hash ratio.
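A rough way to check an overclock against those two measures (shares actually submitted, and power per hash) is sketched below. It assumes difficulty-1 shares, each of which represents 2^32 hashes on average, and a wall-power reading you take yourself. The function names are made up for this example and are not part of BAMT.

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32  # average work behind one difficulty-1 share


def effective_mhash(accepted_shares, seconds):
    """Hash rate implied by the shares a pool actually accepted."""
    return accepted_shares * HASHES_PER_DIFF1_SHARE / seconds / 1_000_000


def watts_per_mhash(wall_watts, mhash):
    """Power cost of each Mhash/s, measured at the wall."""
    return wall_watts / mhash


if __name__ == "__main__":
    # Hypothetical reading: 300 accepted shares in one hour.
    rate = effective_mhash(accepted_shares=300, seconds=3600)
    print(f"effective rate: {rate:.1f} Mhash/s")
```

Compare the effective rate over several hours before and after a clock change, since share counts are noisy over short windows. If the number on the screen went up but the effective rate or the watts per Mhash/s got worse, the overclock is a loss.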

boozer

sr. member

Activity: 309

Merit: 250

Quote from: DeathAndTaxes on March 06, 2012, 11:24:29 AM

If it is stable at stock and not at higher clocks, then it is simply an excessive overclock. A lower memclock won't make it more stable, but it does use less wattage. 300 is a commonly used number, but it is actually a very bad memclock. It is actually slower than both 280 and 310.

When you find clocks that are stable for 24 hours, you likely aren't there yet. Eventually the rig will crash. When it does, drop the core clock 5 MHz to 10 MHz on the affected GPU and reboot. One by one you will find the stable clocks. Now the rig may run 15 days or so and crash. You can either accept that or drop clocks another 5 MHz or so. Eventually you will find the speed that runs 24/7 for 90+ days.

Thanks. The weird thing was that gpu0 always seemed to be the victim of the overclock (even if it was not OC'd), which was confusing me. I ran stock just fine, then OC'd... and cores 2-5 would appear to be fine, because whenever it rebooted, it set gpu0 to noOC. So I set gpu0 and 1 to stock, but it still rebooted, always adding gpu0 to the noOC ACTIVE directory, which made me think it was a bad card (since it looked to be causing reboots at stock clocks)... but everything has been stable so far (a lot longer than normal) with all cards at stock. So it seems that OC'ing 2-5 caused 0-1 to become unstable (even if 0-1 were running stock) and that was throwing me for a loop.

DeathAndTaxes

donator

Activity: 1218

Merit: 1079

Gerald Davis

Quote from: boozer on March 06, 2012, 11:09:20 AM

It's a wide open rig, similar to the one shown in the "build your own" hardware section of this forum. I think I ran everything at stock for 24 hours, but it's been a while, so I'll go back to that and set mem at 240 and see if stock is stable or not.

If it is stable at stock and not at higher clocks, then it is simply an excessive overclock. A lower memclock won't make it more stable, but it does use less wattage. 300 is a commonly used number, but it is actually a very bad memclock. It is actually slower than both 280 and 310.

When you find clocks that are stable for 24 hours, you likely aren't there yet. Eventually the rig will crash. When it does, drop the core clock 5 MHz to 10 MHz on the affected GPU and reboot. One by one you will find the stable clocks. Now the rig may run 15 days or so and crash. You can either accept that or drop clocks another 5 MHz or so. Eventually you will find the speed that runs 24/7 for 90+ days.
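The step-down routine described above can be written out as a small bookkeeping helper. This is only a sketch of the procedure, not something BAMT provides: the function name is hypothetical, and the 725 MHz floor is an assumption taken from the stock 5970 core clock mentioned elsewhere in this thread.

```python
def step_down(core_clocks, crashed_gpu, step_mhz=5, floor_mhz=725):
    """Return a new clock table with the crashed GPU's core clock lowered.

    core_clocks maps GPU number -> core clock in MHz. Only the GPU that
    crashed is touched, and it never drops below floor_mhz (assumed stock).
    """
    updated = dict(core_clocks)
    updated[crashed_gpu] = max(floor_mhz, updated[crashed_gpu] - step_mhz)
    return updated


if __name__ == "__main__":
    clocks = {0: 800, 1: 800, 2: 825, 3: 825}
    # GPU 2 locked up after a few days: back it off by 5 MHz and reboot.
    clocks = step_down(clocks, crashed_gpu=2)
    print(clocks)
```

Keeping a record like this for each rig makes it easier to see which card keeps needing another step down and which ones have settled.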

boozer

sr. member

Activity: 309

Merit: 250

Quote from: DeathAndTaxes on March 06, 2012, 08:24:53 AM

If you are having multi-card failures start at stock.
Run everything at 725/240 (300 also works but it is actually a MH "valley") with 85% fan for 24 hours to check for stability.

From your descriptions it sounds like you have them in a closed case. Likely that isn't going to work w/ 3x5970s. I run 3x5970s in an open frame with an Ultra Kaze fan placed at the expansion slot to create negative pressure and aid w/ exhaust. Even then temps are high. 5970s simply run hot and trying to run 3x in a case (no matter how good the case) is simply a recipe for failure.

It's a wide open rig, similar to the one shown in the "build your own" hardware section of this forum. I think I ran everything at stock for 24 hours, but it's been a while, so I'll go back to that and set mem at 240 and see if stock is stable or not.

jamesg

vip

Activity: 1358

Merit: 1000

AKA: gigavps

To all BAMT users:

If you are asking for support on the forums from lodcrappo, please help him out by sending a donation. It wasn't until very recently that I fully understood just how awesome BAMT is and how much easier it makes MY life.

0.5 has been FLAWLESS for me and I run 89 GPUs and 1 FPGA, so if you are having problems, it is most likely NOT BAMT.

Please show lodcrappo your appreciation for his FREE software! :D

Best,
gigavps

Intention

full member

Activity: 128

Merit: 100

Quote from: lodcrappo on March 05, 2012, 03:43:56 PM

turn mem_speed back on for 0 and 1. your card probably has a default speed higher than 300 in profile 1, if not 0. that will prevent you from setting 2 to 300. higher profile cannot have lower values than lower profile, basic rule of overclocking (some cards don't care, some do).

When I ran it with:

gpu0:

# remove disabled: or set it to 0 to actually use this card..

disabled: 0
debug_oc: 1
#core_speed_0: 800
#core_speed_1: 850
core_speed_2: 900

mem_speed_0: 300
mem_speed_1: 300
mem_speed_2: 300

#core_voltage_0: 1.125
#core_voltage_1: 1.125
#core_voltage_2: 1.125

It is still throwing errors... Taking the rule that higher profiles cannot have lower values into account, I also tried mem_speed_0: 300, mem_speed_1: 350, mem_speed_2: 400 just to see what the card would do.
Right now I'm just running it stock, but if it keeps being stubborn I might just put Windows or something on the USB stick, since the poorly designed mobo has its SATA ports blocked by the fan of a video card... granted, it dates from when cards were only 1 slot.

I appreciate the suggestions.
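The profile rule quoted above (a higher profile should not carry a lower value than a lower profile) is easy to check before rebooting. The sketch below is not part of BAMT; it assumes the settings for one GPU have already been read into a dict keyed by the same names used in bamt.conf, and the function name is made up.

```python
import re

# Matches keys such as core_speed_0, mem_speed_2, core_voltage_1.
PROFILE_KEY = re.compile(r"^(core_speed|mem_speed|core_voltage)_(\d+)$")


def profile_violations(settings):
    """List every place where a higher profile has a lower value."""
    by_family = {}
    for key, value in settings.items():
        match = PROFILE_KEY.match(key)
        if match:
            family, profile = match.group(1), int(match.group(2))
            by_family.setdefault(family, []).append((profile, float(value)))

    problems = []
    for family, entries in by_family.items():
        entries.sort()  # order by profile number
        for (low_p, low_v), (high_p, high_v) in zip(entries, entries[1:]):
            if high_v < low_v:
                problems.append(
                    f"{family}_{high_p} ({high_v:g}) is lower than "
                    f"{family}_{low_p} ({low_v:g})"
                )
    return problems


if __name__ == "__main__":
    gpu0 = {"disabled": 0, "core_speed_2": 900,
            "mem_speed_0": 300, "mem_speed_1": 300, "mem_speed_2": 300}
    print(profile_violations(gpu0) or "no ordering problems in the explicit settings")
```

Note the limit of this check: it only sees values that are explicitly set. A commented-out line falls back to whatever default the card holds for that profile, which the script cannot know, and that is the case the quoted advice warns about.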

DeathAndTaxes

donator

Activity: 1218

Merit: 1079

Gerald Davis

Quote from: boozer on March 06, 2012, 02:09:00 AM

Is there any possibility that a failure on "any" GPU gets reported as a GPU0 issue, regardless of which GPU it was? I removed the card I thought was bad... so I only had 4 GPUs, but the new gpu0 again started having the same problem.

So I added the card I thought had issues back into the mix and set every card to stock and it has been running fine. So before, gpu0 running stock would die, but with all other GPUs at stock, everything seems to be fine.... it's only been 30 minutes, but that's longer than it used to last, lol. Just thought I would ask about the gpu0 theory, but maybe they both had problems.... I'll check in the AM to see if it stayed up the rest of the night.

If you are having multi-card failures start at stock.
Run everything at 725/240 (300 also works but it is actually a MH "valley") with 85% fan for 24 hours to check for stability.

From your descriptions it sounds like you have them in a closed case. Likely that isn't going to work w/ 3x5970s. I run 3x5970s in an open frame with an Ultra Kaze fan placed at the expansion slot to create negative pressure and aid w/ exhaust. Even then temps are high. 5970s simply run hot and trying to run 3x in a case (no matter how good the case) is simply a recipe for failure.

Splirow

full member

Activity: 164

Merit: 100

Quote from: Definit on March 06, 2012, 02:20:49 AM

As for the GPU thing I mentioned earlier... re-writing BAMT onto the USB drive worked... but as I add in a 2nd and/or 3rd 5970, a whole new set of problems occurs. Urggh.

for everyone running 6 gpu's / 3 5970's... how many watts you pushing them with? with the 800/300/1.05v ?

i cant even seem to get it to like those settings on 0.5...

even with just 2 5970's (4gpus) with 1-1300w psu rosewill lightning, it just seems to lock up... i tried adjusting the boot up time to a more spread 10sec between them...but nothing, just locking up...

doesnt make sense to me right now

I have BAMT with 3 5970s. I have them at 825/300/1.05v getting 382 Mhash/s.

It is perfectly stable for me. Last time I checked, I was pulling 930 watts from the wall.

malevolent

legendary

Activity: 3472

Merit: 1724

Quote from: boozer on March 05, 2012, 09:35:09 PM

Try mining with the card on Windows and open GPU-Z, which reads temperatures from all sensors. This is how I found out why one of my cards was throttling despite <80C shown in BAMT and stock voltage.

boozer

sr. member

Activity: 309

Merit: 250

Quote from: Definit on March 06, 2012, 02:20:49 AM

As for the GPU thing I mentioned earlier... re-writing BAMT onto the USB drive worked... but as I add in a 2nd and/or 3rd 5970, a whole new set of problems occurs. Urggh.

for everyone running 6 gpu's / 3 5970's... how many watts you pushing them with? with the 800/300/1.05v ?

i cant even seem to get it to like those settings on 0.5...

even with just 2 5970's (4gpus) with 1-1300w psu rosewill lightning, it just seems to lock up... i tried adjusting the boot up time to a more spread 10sec between them...but nothing, just locking up...

doesnt make sense to me right now

Have you tried running them all at stock 725/300/1.05v? That's what I went back to and it has been the most stable thus far.... I'm headed to bed now.. I'll see how it is in the AM.

Definit

sr. member

Activity: 357

Merit: 250

As for the GPU thing I mentioned earlier... re-writing BAMT onto the USB drive worked... but as I add in a 2nd and/or 3rd 5970, a whole new set of problems occurs. Urggh.

for everyone running 6 gpu's / 3 5970's... how many watts you pushing them with? with the 800/300/1.05v ?

i cant even seem to get it to like those settings on 0.5...

even with just 2 5970's (4gpus) with 1-1300w psu rosewill lightning, it just seems to lock up... i tried adjusting the boot up time to a more spread 10sec between them...but nothing, just locking up...

doesnt make sense to me right now

boozer

sr. member

Activity: 309

Merit: 250

Is there any possibility that a failure on "any" GPU gets reported as a GPU0 issue, regardless of which GPU it was? I removed the card I thought was bad... so I only had 4 GPUs, but the new gpu0 again started having the same problem.

So I added the card I thought had issues back into the mix and set every card to stock and it has been running fine. So before, gpu0 running stock would die, but with all other GPUs at stock, everything seems to be fine.... it's only been 30 minutes, but that's longer than it used to last, lol. Just thought I would ask about the gpu0 theory, but maybe they both had problems.... I'll check in the AM to see if it stayed up the rest of the night.

BitMinerN8

hero member

Activity: 626

Merit: 500

Mining since May 2011.

Quote from: boozer on March 05, 2012, 11:18:29 PM

Quote from: BitMinerN8 on March 05, 2012, 11:02:38 PM

Quote from: lodcrappo on March 05, 2012, 10:58:05 PM

you can put

detect_defunct: 0

in settings area of bamt.conf. that will stop mother from rebooting. i'd guess that would just leave the thing hung up and not mining every 20 minutes instead of rebooting, but who knows.

if temps are jumping all over the place... well doesn't sound good to me.

You say you have 3 GPU on this rig. Is the GPU in the middle or on an end? Maybe swap them around and see if the problem follows the card. I had 1 of the 5 on one of my rigs that was acting up. I moved it to the end with nothing blocking the fans and it's been stable now for 5 days.

Mother just put noGPU0 back into the ACTIVE directory, so it's not even running now; however, it just rebooted again... now it's the second core on that same card that got the OC removed... I had it at 800/300... sigh....

BitMinerN8:
It's at the top and currently running about 10C higher than the middle and bottom cards, with one core disabled and the other at stock. I'll try moving it to the "bottom" slot... I have an open air rig, so the "bottom" slot runs the coolest as nothing is in its way... I assume there is no way to determine which GPU numbers it will get if I move it? I thought I remembered reading somewhere that it was just based on the motherboard.

If I still have issues, I'll try resetting the thermal paste/heatsink... and if there are still issues... I'll finish beating my head on a wall and return it, as I just bought it on eBay, lol.

There is a cool tool built into bamt since fix 13 for helping to determine GPU numbers. Type: idgpu
More info here: http://bamter.org/redmine/news/11

Definit

sr. member

Activity: 357

Merit: 250

I'm actually dealing with an issue that is similar... maybe not, but...
with only one 5970 installed, no matter which 5970 I swap in:

GPU0 gets 368 Mhash/s
GPU1 gets 322 Mhash/s

It seems as if gpu1 will never overclock no matter the settings, even if they are the same as gpu0's, in which case it should work since it's a dual-GPU card...

Trying to re-write the USB and start fresh... any ideas as to what I might need to do will be tried and tested.

xanadu

member

Activity: 63

Merit: 10

Quote

looks like xwindows isn't happy. try (logged in as root, never sudo):
/opt/bamt/start_mining pre
(let it sit there for a bit)

and

xauth merge /home/user/.Xauthority
all that happens on boot anyway, but sometimes if your rig boots slowly it can happen too soon.

the best test for if its fixed is:
atitweak -s
if atitweak can't list your cards, the ADL libs aren't working and BAMT isn't going to have any joy.

That did the trick, just running the /opt/bamt/start_mining pre and everything suddenly appeared on my monitoring machine, then I ran a mine restart to get things hashing. So, how do I make this happen properly during the normal booting sequence?

Thank you for the advice.
-X

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: xanadu on March 06, 2012, 12:00:11 AM

I burned a new USB stick with the .5b image tonight, edited bamt.conf and the pools files, then applied the fixes via fixer.

The GPUs won't start mining, I'm getting the "No protocol specified" error.

Mother -v returns:

Quote

mother starts (19 seconds since last run)
babysit autoconf client...
autoconf client is ok
look for defunct phoenix...
gathering GPU status...No protocol specified
done
broadcasting status
No protocol specified
checking GPU health...
refreshing desktop bg..

Running /etc/init.d/mine restart returns:

Quote

Stopping mining processes...: mine...
Starting mining processes...: minestart_mining: starting mining processes
No protocol specified
..munin.

Running aticonfig --list-adapters returns

Quote

* 0. 07:00.0 ATI Radeon HD 5900 Series
  1. 0e:00.0 ATI Radeon HD 5900 Series
  2. 0f:00.0 ATI Radeon HD 5900 Series
  3. 06:00.0 ATI Radeon HD 5900 Series
* - Default adapter

What did I miss, it seems like things should be running fine?

Thanks!
-X

looks like xwindows isn't happy. try (logged in as root, never sudo):

/opt/bamt/start_mining pre

(let it sit there for a bit)

and

xauth merge /home/user/.Xauthority

all that happens on boot anyway, but sometimes if your rig boots slowly it can happen too soon.

the best test for if its fixed is:

atitweak -s

if atitweak can't list your cards, the ADL libs aren't working and BAMT isn't going to have any joy.
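To spot this condition without reading the logs by eye, a small filter over captured command output is enough. The sketch below only searches text for the "No protocol specified" message shown in the report above (commonly what an X client prints when the X server refuses its connection for lack of authorization). It does not call BAMT or atitweak itself, and the function name is invented for this example.

```python
X_AUTH_ERROR = "No protocol specified"


def lines_with_x_error(output):
    """Return the lines of captured output that carry the X auth error."""
    return [line.strip() for line in output.splitlines() if X_AUTH_ERROR in line]


if __name__ == "__main__":
    # Sample text copied from the mother -v output quoted in this thread.
    sample = """mother starts (19 seconds since last run)
look for defunct phoenix...
gathering GPU status...No protocol specified
done
broadcasting status
No protocol specified
checking GPU health..."""
    hits = lines_with_x_error(sample)
    print(f"{len(hits)} line(s) show the X authorization problem")
```

If the filter still finds hits after running the commands above, the tools still cannot reach the X display, so mining is not going to start.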

xanadu

member

Activity: 63

Merit: 10

I burned a new USB stick with the .5b image tonight, edited bamt.conf and the pools files, then applied the fixes via fixer.

The GPUs won't start mining, I'm getting the "No protocol specified" error.

Mother -v returns:

Quote

mother starts (19 seconds since last run)
babysit autoconf client...
autoconf client is ok
look for defunct phoenix...
gathering GPU status...No protocol specified
done
broadcasting status
No protocol specified
checking GPU health...
refreshing desktop bg..

Running /etc/init.d/mine restart returns:

Quote

Stopping mining processes...: mine...
Starting mining processes...: minestart_mining: starting mining processes
No protocol specified
..munin.

Running aticonfig --list-adapters returns

Quote

* 0. 07:00.0 ATI Radeon HD 5900 Series
  1. 0e:00.0 ATI Radeon HD 5900 Series
  2. 0f:00.0 ATI Radeon HD 5900 Series
  3. 06:00.0 ATI Radeon HD 5900 Series
* - Default adapter

What did I miss, it seems like things should be running fine?

Thanks!
-X

Topic: BAMT version 0.5 - Easy USB based mining Linux with farm wide management tools - page 66. (Read 324176 times)