BAMT - Easy persistent USB key based linux for dedicated miners/mining farms - page 38.

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: jamesg on September 01, 2011, 11:14:31 AM

Quote from: gnar1ta$ on September 01, 2011, 11:06:02 AM

Sounds like one of your cards is hanging. gpumon and atitweak don't work for me if one hangs (or crashes? I don't know the correct term). Type top and look for a phoenix instance at 100% CPU. I have been able to edit the delay times in the .conf file and run gpumon quickly enough to see which adapter is causing the issue.

I should have given more info. The computer ends up hanging randomly, hours or days after it was started. With the 5970s, I did need to raise the wait time between starting GPUs from 3 to 6, as the computer would sometimes hang when starting the mining process.

Step 1: Remove all overclocking. Does the problem go away?

Step 2: Remove the GPUs one at a time. After each one is removed, does the problem go away?
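Step 2 is a leave-one-out test. Here is a minimal Python sketch of the test plan, assuming the GPUs are labelled with hypothetical names like `gpu0`, `gpu1` and so on. It only lists which configuration to try next and does not touch the hardware:

```python
def removal_plan(gpus):
    """Leave-one-out plan: for each GPU, the cards left in the rig without it."""
    plan = []
    for removed in gpus:
        remaining = [g for g in gpus if g != removed]
        plan.append((removed, remaining))
    return plan


if __name__ == "__main__":
    # Hypothetical labels; use whatever names your rig reports.
    for removed, remaining in removal_plan(["gpu0", "gpu1", "gpu2"]):
        print(f"pull {removed}, run with {', '.join(remaining)}")
```

If the hang only goes away when one particular card is pulled, that card (or its slot, riser or clocks) is the prime suspect. If it goes away no matter which card is pulled, the card count itself is more likely the problem, which is what blackhat describes in this thread for 8 GPUs.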

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: blackhat on September 01, 2011, 07:43:06 AM

Quote from: lodcrappo on August 31, 2011, 05:29:03 PM

Since I don't have any of this hardware, there is really nothing I can do. If you guys find a solution I am happy to put it into the next version of BAMT.

To my current knowledge, and after the research I've done, there's nothing special about the hardware. Everything is fine until one puts 8 or more GPUs together; 6 GPUs work. Today I will find out whether 7 GPUs work by replacing one of the 6870x2 cards with a 5850. I'll let you guys know.

I don't have any motherboards that take more than 3 GPUs. I went with "build your farm as cheap as possible", which meant lots of crap motherboards, crap power supplies, and tons of GPUs. Those massive motherboards that run tons of GPUs and monster power supplies just don't work economically. Neither do the dual GPU cards.

So what I am saying is, I have no way to test 8 GPUs, or even 4 for that matter.

blackhat

newbie

Activity: 53

Merit: 0

Quote from: blackhat on September 01, 2011, 07:43:06 AM

To my current knowledge, and after the research I've done, there's nothing special about the hardware. Everything is fine until one puts 8 or more GPUs together; 6 GPUs work. Today I will find out whether 7 GPUs work by replacing one of the 6870x2 cards with a 5850. I'll let you guys know.

7 GPUs work without any hassle.

As soon as it gets to 8, X bails out. It doesn't matter which card I remove; each of them works fine in combination with the two others. With all 4 cards installed, they all show up in aticonfig --list-adapters, but starting X is impossible.

gnar1ta$

donator

Activity: 798

Merit: 500

Quote from: jamesg on September 01, 2011, 01:37:19 PM

If one card crashes, does it take the rest of them down with it, or is this a driver issue where one card crashes and none of the other cards can function?

I think they will all continue mining for some time, but eventually they drop. I've had plenty of mornings where testing clocks overnight caused a card to hang and the rest stopped mining.

jamesg

vip

Activity: 1358

Merit: 1000

AKA: gigavps

Quote from: gnar1ta$ on September 01, 2011, 01:19:05 PM

You can still use top to monitor the processes, or screen -r gpuX to see the individual miners. But it still sounds like a card crashing; for me it sometimes takes hours or days.

If one card crashes, does it take the rest of them down with it, or is this a driver issue where one card crashes and none of the other cards can function?

gnar1ta$

donator

Activity: 798

Merit: 500

Quote from: jamesg on September 01, 2011, 11:14:31 AM

Quote from: gnar1ta$ on September 01, 2011, 11:06:02 AM

Sounds like one of your cards is hanging. gpumon and atitweak don't work for me if one hangs (or crashes? I don't know the correct term). Type top and look for a phoenix instance at 100% CPU. I have been able to edit the delay times in the .conf file and run gpumon quickly enough to see which adapter is causing the issue.

I should have given more info. The computer ends up hanging randomly, hours or days after it was started. With the 5970s, I did need to raise the wait time between starting GPUs from 3 to 6, as the computer would sometimes hang when starting the mining process.

You can still use top to monitor the processes, or screen -r gpuX to see the individual miners. But it still sounds like a card crashing; for me it sometimes takes hours or days.

jamesg

vip

Activity: 1358

Merit: 1000

AKA: gigavps

Quote from: gnar1ta$ on September 01, 2011, 11:06:02 AM

Sounds like one of your cards is hanging. gpumon and atitweak don't work for me if one hangs (or crashes? I don't know the correct term). Type top and look for a phoenix instance at 100% CPU. I have been able to edit the delay times in the .conf file and run gpumon quickly enough to see which adapter is causing the issue.

I should have given more info. The computer ends up hanging randomly, hours or days after it was started. With the 5970s, I did need to raise the wait time between starting GPUs from 3 to 6, as the computer would sometimes hang when starting the mining process.
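Raising the delay stretches the whole startup linearly. With n GPUs started one at a time and a delay of d between starts, the last miner starts (n - 1)·d after the first. Here is a minimal Python sketch of that staggering. It assumes the delay is in seconds and uses a hypothetical `launch(gpu_index)` callable that starts one miner. It is not BAMT's own start-up code:

```python
import time


def start_offsets(num_gpus, delay):
    """Offset of each miner's start relative to the first one."""
    return [i * delay for i in range(num_gpus)]


def launch_staggered(launch, num_gpus, delay, sleep=time.sleep):
    """Start miners one GPU at a time, pausing `delay` seconds between starts.

    `launch` is a hypothetical callable that starts the miner for one GPU.
    """
    for gpu in range(num_gpus):
        if gpu > 0:
            sleep(delay)
        launch(gpu)
```

For example, `start_offsets(4, 6)` gives `[0, 6, 12, 18]`, compared with `[0, 3, 6, 9]` for a delay of 3. Because `sleep` is passed in, the function can be tested without actually waiting.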

gnar1ta$

donator

Activity: 798

Merit: 500

Quote from: jamesg on September 01, 2011, 08:34:53 AM

I am having an issue where mining on a rig will halt for no apparent reason. I can still ssh into the box, and if I reboot, everything starts up fine again. Also, when I try to access gpumon after ssh-ing into the box, the process seems to be hung.

Is there a way to monitor the phoenix processes and if they become hung, restart them or the box itself?

Sounds like one of your cards is hanging. gpumon and atitweak don't work for me if one hangs (or crashes? I don't know the correct term). Type top and look for a phoenix instance at 100% CPU. I have been able to edit the delay times in the .conf file and run gpumon quickly enough to see which adapter is causing the issue.
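top works this out by sampling each process's CPU time twice and dividing by the elapsed time. Here is a minimal Python sketch of the same check. It assumes a Linux /proc filesystem and that the miner's command line contains the string `phoenix`. The 90% threshold and 5-second interval are arbitrary choices, not values from BAMT:

```python
import os
import time


def read_cpu_ticks(stat_line):
    """utime + stime, in clock ticks, from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only split what follows the last ')'. rest[0] is field 3 (state).
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return int(rest[11]) + int(rest[12])  # fields 14 (utime) and 15 (stime)


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck=100):
    """CPU use over the interval; 100.0 means one core fully busy."""
    return 100.0 * (ticks_after - ticks_before) / (clk_tck * interval_s)


def _ticks(pid):
    try:
        with open(f"/proc/{pid}/stat") as f:
            return read_cpu_ticks(f.read())
    except OSError:
        return None  # the process exited in the meantime


def phoenix_pids():
    """PIDs (other than this script) whose command line mentions 'phoenix'."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue
        if "phoenix" in cmdline:
            pids.append(int(entry))
    return pids


def busy_phoenix(interval_s=5.0, threshold=90.0):
    """PIDs of phoenix processes using >= threshold % CPU over interval_s."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = {pid: _ticks(pid) for pid in phoenix_pids()}
    time.sleep(interval_s)
    busy = []
    for pid, t0 in before.items():
        t1 = _ticks(pid)
        if t0 is None or t1 is None:
            continue
        if cpu_percent(t0, t1, interval_s, clk_tck) >= threshold:
            busy.append(pid)
    return busy


if __name__ == "__main__":
    print(busy_phoenix())
```

Each reported PID is a candidate for the hung card. As with top, you still need to match it to an adapter yourself.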

kirax

member

Activity: 77

Merit: 10

Quote from: mikeo on September 01, 2011, 08:29:03 AM

I've messed up a couple of pools files in /etc/bamt/ and I want to delete them. However, from File Manager I get a permissions error that won't allow me to delete, rename, or overwrite the old poolsX files. Can someone help this Linux noob, please? I have changed the default password.

As long as you are comfortable with a little command line, that is the easiest way to do it. Go to your "start menu that we cannot call a start menu because Microsoft trademarked it", whatever it is called these days. Under the top option (System, I think) there is a root terminal option. You'll have to enter your root password; the default is "changeme", but you said you changed it. Once in there, run the following commands, without the quotes of course:
"cd /etc/bamt" This brings you to the directory /etc/bamt, similar to cd on DOS systems.
"ls" This is similar to the DOS "dir" command, in that it will show you all of the files in the directory.
To remove a file, for example one named pool32, just type "rm ./pool32" and it should be deleted, so you can put whatever else you want in its place. The "./" just makes it explicit that the file is in the current directory and nowhere else. Generally not needed, but safer :p

I usually remote in to my BAMT boxes from my desktop, so I do not remember how to do it in the GUI, although you might want to look for something like "root file manager" in the menu if you want to do it that way.

jamesg

vip

Activity: 1358

Merit: 1000

AKA: gigavps

I am having an issue where mining on a rig will halt for no apparent reason. I can still ssh into the box, and if I reboot, everything starts up fine again. Also, when I try to access gpumon after ssh-ing into the box, the process seems to be hung.

Is there a way to monitor the phoenix processes and if they become hung, restart them or the box itself?
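One way to avoid rebooting on a momentary spike is to treat a miner as hung only after it has looked hung on several consecutive checks. Here is a minimal Python sketch of that bookkeeping. It assumes some other check supplies the set of miners that look hung on each pass, such as a CPU-usage probe like the top approach suggested in reply. The miner ids and the three-strike default are hypothetical. What to do with a hung miner (restart it, or reboot the box) is left to the caller:

```python
class HangWatchdog:
    """Report a miner as hung once it has been flagged N checks in a row."""

    def __init__(self, strikes_needed=3):
        self.strikes_needed = strikes_needed
        self.strikes = {}

    def update(self, flagged, all_miners):
        """Record one check and return the miners that just reached the limit.

        flagged:    miners that looked hung on this check
        all_miners: every miner being watched
        """
        newly_hung = []
        for miner in all_miners:
            if miner in flagged:
                self.strikes[miner] = self.strikes.get(miner, 0) + 1
                # Report exactly once, on the check that reaches the limit.
                if self.strikes[miner] == self.strikes_needed:
                    newly_hung.append(miner)
            else:
                self.strikes[miner] = 0  # a healthy check resets the count
        return newly_hung
```

For example, if it is called once a minute with the default of three strikes, a stuck card is acted on after about three minutes, and a single busy sample is ignored.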

mikeo

full member

Activity: 196

Merit: 100


I've messed up a couple of pools files in /etc/bamt/ and I want to delete them. However, from File Manager I get a permissions error that won't allow me to delete, rename, or overwrite the old poolsX files. Can someone help this Linux noob, please? I have changed the default password.

blackhat

newbie

Activity: 53

Merit: 0

Quote from: lodcrappo on August 31, 2011, 05:29:03 PM

Since I don't have any of this hardware, there is really nothing I can do. If you guys find a solution I am happy to put it into the next version of BAMT.

To my current knowledge, and after the research I've done, there's nothing special about the hardware. Everything is fine until one puts 8 or more GPUs together; 6 GPUs work. Today I will find out whether 7 GPUs work by replacing one of the 6870x2 cards with a 5850. I'll let you guys know.

jamesg

vip

Activity: 1358

Merit: 1000

AKA: gigavps

Quote from: blackhat on August 31, 2011, 01:27:27 PM

Has anyone had a similar or the same problem while migrating to BAMT? I already found this one: http://blog.zorinaq.com/?e=46, but it says that 8 GPUs work fine and that 10 GPUs are currently the limit.

I am glad it isn't just me. Unfortunately I am not a rocket surgeon when it comes to Linux, so I think I'm just going to do 3 cards per box. If this gets fixed, awesome; if not, it definitely won't derail my plans. Thanks again for BAMT, it's awesome.

lodcrappo

hero member

Activity: 616

Merit: 506

Quote from: kirax on August 31, 2011, 09:59:11 AM

Quote from: Mobius on August 31, 2011, 07:58:50 AM

Quote from: lodcrappo on August 30, 2011, 09:21:46 PM

Quote from: mikeo on August 30, 2011, 09:13:30 PM

Quote from: lodcrappo on August 30, 2011, 11:40:02 AM

Quote from: mikeo on August 30, 2011, 08:45:32 AM

A proper pools file example in the wiki for multiple mining rigs would also be helpful. I currently have a separate poolsX file for every GPU. This works but there must be a simpler method.

One simpler method is to just have one pools file and use it for all your GPUs. You only need separate files if you actually want each GPU going to a different pool.

@lodcrappo
Feeling rather dense tonight. So if I create one pool file for 9 miners on Deepbit, the three separate machines hosting the 9 GPUs will read the file (on each machine; I haven't done a single central config file yet), and all 9 will find a miner account on Deepbit? I must be missing something.

You need one pools file per machine, though the managed config option can help you get a single central file onto all the machines. For instance, I have one pools file running 12 machines with a total of 30 GPUs. It is stored on a server, and all the rigs pull the file using rsync. This is done through the managed_config_command in bamt.conf.

A simpler compromise would be to just copy the same pools file onto all your machines; if you don't change pools a lot, that will work fine.

Some pools have separate worker sub-accounts. I keep a directory on my workstation with one pool file for each GPU and then run scp manually (in a .sh with one scp entry per rig) to update all the rigs.

Many pools have that... but in general, there is not much need to use them. I have had 12 cards connecting to one account on btcguild, working fine.

Yeah, I have never needed to make more than one worker for any given pool, and frankly with the number of GPUs in my farm I just wouldn't mine at a pool that required such silliness. No way I'm managing a bunch of silly worker accounts.
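The scp-per-rig shell script mentioned above can also be sketched in Python. This is a minimal sketch that assumes key-based SSH access as root, hypothetical rig hostnames, and that the rigs read their pools file from /etc/bamt/pools. Adjust the path to whatever your bamt.conf actually uses. The managed_config_command/rsync setup lodcrappo describes is the pull-based alternative:

```python
import subprocess


def scp_commands(pools_file, rigs, remote_path="/etc/bamt/pools", user="root"):
    """One scp command per rig, like a .sh with one scp line per rig."""
    return [["scp", pools_file, f"{user}@{rig}:{remote_path}"] for rig in rigs]


def push_pools(pools_file, rigs, remote_path="/etc/bamt/pools", user="root"):
    """Copy the same pools file to every rig; return the rigs that failed."""
    failed = []
    cmds = scp_commands(pools_file, rigs, remote_path, user)
    for rig, cmd in zip(rigs, cmds):
        if subprocess.run(cmd).returncode != 0:
            failed.append(rig)
    return failed


if __name__ == "__main__":
    # Hypothetical hostnames.
    print(push_pools("pools", ["rig01", "rig02", "rig03"]))
```

Building the command lists separately from running them makes it easy to check what would be copied where before touching any rig.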

lodcrappo

hero member

Activity: 616

Merit: 506

blackhat

newbie

Activity: 53

Merit: 0

Quote from: kirax on August 31, 2011, 02:33:24 PM

The extenders used, by the way, did they happen to be powered?

I don't use any extenders; gigavps uses one. The cards I'm talking about are 4 dual-GPU cards, each with two GPUs. So there's no need for an extender if you use a board that can take 4 cards.

Quote

You can have the biggest, baddest power supply in the world, but a motherboard can only supply so much power,

I'm not quite sure which circuit we are talking about. 5V / 3.3V from the mobo?

The GPU gets its main current from separate +12V feeds directly connected to the PSU, as you probably know.
The power draw from the other circuits is much smaller, so the mobo won't get into trouble. Earlier graphics cards sucked the hell out of the slots (some early AGP monster GPUs had no additional +12V feeds), but today this is hardly an issue.

Quote

If X dies just as it tries to display the GUI, wouldn't that be right where it tries to kick them all to 3D mode and the power draw increases?

Yesterday, I suspected the same, but it's not logical. It may sound strange, but four 6870x2 cards pull about the same from the system as three 6990s (both are dual-GPU cards). Since the latter works (and there is wide proof that it does), there is no reason why the former shouldn't work too, provided you use the right PSU. Plus, I noticed the input load climbing to ~600W shortly after starting X with three cards. Consequently, with 4 cards it should be somewhere around 800 to 900W, but no higher. Remember, when X starts, the cards just initialize in graphics mode and soon fall back to idle. This short load spike should have been handled by the 1250W PSU that was already installed yesterday.

But hey, to be sure: today I installed a 1500W SilverStone PSU with 120A on the +12V rail alone (!), and as soon as I placed the fourth card into the system, the dang thing died again. This can't be a power issue anymore.
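The estimate above is simple proportional scaling. It assumes the ~600 W wall reading for three cards scales linearly with card count. That reading also includes the board, CPU and PSU losses, so the real per-card share is somewhat lower. A PSU's rating is also DC output, so wall readings overstate its load:

\[
\frac{600\ \mathrm{W}}{3} = 200\ \mathrm{W\ per\ card}, \qquad 4 \times 200\ \mathrm{W} = 800\ \mathrm{W}
\]
\[
P_{+12\mathrm{V}} = 12\ \mathrm{V} \times 120\ \mathrm{A} = 1440\ \mathrm{W}
\]

So even the 800 to 900 W figure is well within the 1250 W unit, and within what the new PSU's +12V rail alone can deliver.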

But aside from power, I figured out that the system doesn't just lock up the way it would on a hardware failure. As I said, X segfaults after initializing the AMD FireGL driver. This happens seconds before the load would normally really climb, and a power failure on any circuit wouldn't make a driver segfault; it would shut down the board. So I seriously doubt it's power. I'm pretty sure that it is a bug, or maybe a limitation, in the current drivers. I'd appreciate any hint on this one.

kirax

member

Activity: 77

Merit: 10

blackhat

newbie

Activity: 53

Merit: 0

Quote from: jamesg on August 30, 2011, 09:08:50 PM

Anyone running 8 GPUs with bamt?

Yes. Well, uh .... no! I *tried* but got stuck where you are standing now.

I've got 4x 6870x2 (DUAL-GPU) that I've been struggling with for some days.

The goal is to put them all on a single board.

When I try to boot BAMT (0.4b, all recent fixes), I get the same problem. This probably has nothing to do with BAMT directly, but with X, or more specifically with the AMD fglrx drivers that X calls. When I remove one of the dual-GPU cards, leaving only 6 GPUs in the system, BAMT boots happily and does its work.

I've put some time into that issue today, because I'm very keen on ultimately running this rig with all FOUR cards. Operating with only three cards is simply not an option. ;)

Quote

So, when I had 8 GPUs installed, BAMT died when it tried to display the GUI. So it made it through the BAMT startup screen and the initial startup.

Exactly the same thing here. What happens is this:

X tries to start up and initialize all cards. When 8 GPUs are present, the fglrx driver segfaults with the following error:

Code:

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x3b) [0x80adedb]
1: /usr/bin/X (0x8048000+0x5aab5) [0x80a2ab5]
2: (vdso) (__kernel_rt_sigreturn+0x0) [0xb778740c]
3: /usr/lib/xorg/modules/drivers/fglrx_drv.so (xdl_x750_atiddxPreInit+0x2554) [0xb6946d84]
4: /usr/bin/X (InitOutput+0x5c8) [0x80b09b8]
5: /usr/bin/X (0x8048000+0x1e7f0) [0x80667f0]
6: /lib/i686/cmov/libc.so.6 (__libc_start_main+0xe6) [0xb74bec76]
7: /usr/bin/X (0x8048000+0x1e5a1) [0x80665a1]
Segmentation fault at address 0x8

Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting

The full Xorg.0.log shows that all 8 GPUs (plus the primary iGPU) are detected correctly, and then the drivers are loaded. After this, the backtrace shows up and X can't start. Since you have the same issue but probably have your monitor connected to the first (primary) GPU, you don't see anything: when X bails out, the cards get reset and the system gets stuck in an unstable state. I could only see this after enabling the iGPU (onboard graphics) on the mobo and plugging the monitor into it. (That didn't help with the problem itself, though.)
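The backtrace above already narrows things down. Frame 3 is inside `fglrx_drv.so` at `xdl_x750_atiddxPreInit`, called from X's `InitOutput`. This suggests the crash happens during the driver's pre-initialization, before any mining load. Here is a minimal Python sketch for pulling that out of an Xorg.0.log automatically. It assumes the backtrace keeps the `Backtrace:` header and the `N: module (symbol) [addr]` frame lines shown here; exact formatting varies between X server versions:

```python
import re

FRAME = re.compile(r"\d+: ")  # e.g. "3: /usr/lib/xorg/modules/drivers/..."


def backtrace_frames(log_text):
    """Frame lines of the first 'Backtrace:' block in an X server log."""
    frames, inside = [], False
    for line in log_text.splitlines():
        if "Backtrace:" in line:
            inside = True
            continue
        if inside:
            if FRAME.search(line):
                frames.append(line.strip())
            else:
                break  # the first non-frame line ends the block
    return frames


def crash_summary(log_text, module="fglrx_drv.so"):
    """(signal number or None, backtrace frames that fall inside `module`)."""
    m = re.search(r"Caught signal (\d+)", log_text)
    signal = int(m.group(1)) if m else None
    in_module = [f for f in backtrace_frames(log_text) if module in f]
    return signal, in_module


if __name__ == "__main__":
    with open("/var/log/Xorg.0.log", errors="replace") as f:
        print(crash_summary(f.read()))
```

For the log above, this would report signal 11 with a single frame inside fglrx_drv.so.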

Unfortunately, I'm not sure whether upgrading to the latest 11.8 driver helps. Frankly, I doubt it.

Quote

I removed the GPU on the 1x extender and all is well. Is this a problem of me needing a MB with four 16x PCIe slots, or is it a problem with BAMT?

It's probably not the extender, and the mobo is fine too, as long as you get to boot into the kernel and it only stops right when X starts and bails out with the segfault. If it stops well before that, i.e. throws errors at you before the filesystem is mounted and init runs, you're encountering a different problem.

Has anyone had a similar or the same problem while migrating to BAMT? I already found this one: http://blog.zorinaq.com/?e=46, but it says that 8 GPUs work fine and that 10 GPUs are currently the limit.

kirax

member

Activity: 77

Merit: 10

Quote from: Mobius on August 31, 2011, 07:58:50 AM

Quote from: lodcrappo on August 30, 2011, 09:21:46 PM

Quote from: mikeo on August 30, 2011, 09:13:30 PM

Quote from: lodcrappo on August 30, 2011, 11:40:02 AM

Quote from: mikeo on August 30, 2011, 08:45:32 AM

A proper pools file example in the wiki for multiple mining rigs would also be helpful. I currently have a separate poolsX file for every GPU. This works but there must be a simpler method.

One simpler method is to just have one pools file and use it for all your GPUs. You only need separate files if you actually want each GPU going to a different pool.

@lodcrappo
Feeling rather dense tonight. So if I create one pool file for 9 miners on Deepbit, the three separate machines hosting the 9 GPUs will read the file (on each machine; I haven't done a single central config file yet), and all 9 will find a miner account on Deepbit? I must be missing something.

You need one pools file per machine, though the managed config option can help you get a single central file onto all the machines. For instance, I have one pools file running 12 machines with a total of 30 GPUs. It is stored on a server, and all the rigs pull the file using rsync. This is done through the managed_config_command in bamt.conf.

A simpler compromise would be to just copy the same pools file onto all your machines; if you don't change pools a lot, that will work fine.

Some pools have separate worker sub-accounts. I keep a directory on my workstation with one pool file for each GPU and then run scp manually (in a .sh with one scp entry per rig) to update all the rigs.

Many pools have that... but in general, there is not much need to use them. I have had 12 cards connecting to one account on btcguild, working fine.

Mobius

hero member

Activity: 988

Merit: 1000

Quote from: lodcrappo on August 30, 2011, 09:21:46 PM

Quote from: mikeo on August 30, 2011, 09:13:30 PM

Quote from: lodcrappo on August 30, 2011, 11:40:02 AM

Quote from: mikeo on August 30, 2011, 08:45:32 AM

A proper pools file example in the wiki for multiple mining rigs would also be helpful. I currently have a separate poolsX file for every GPU. This works but there must be a simpler method.

One simpler method is to just have one pools file and use it for all your GPUs. You only need separate files if you actually want each GPU going to a different pool.

@lodcrappo
Feeling rather dense tonight. So if I create one pool file for 9 miners on Deepbit, the three separate machines hosting the 9 GPUs will read the file (on each machine; I haven't done a single central config file yet), and all 9 will find a miner account on Deepbit? I must be missing something.

You need one pools file per machine, though the managed config option can help you get a single central file onto all the machines. For instance, I have one pools file running 12 machines with a total of 30 GPUs. It is stored on a server, and all the rigs pull the file using rsync. This is done through the managed_config_command in bamt.conf.

A simpler compromise would be to just copy the same pools file onto all your machines; if you don't change pools a lot, that will work fine.

Some pools have separate worker sub-accounts. I keep a directory on my workstation with one pool file for each GPU and then run scp manually (in a .sh with one scp entry per rig) to update all the rigs.
