
Topic: Smartcoin Linux mining administration. [MULTI-MACHINE SUPPORT NOW IN!] - page 12. (Read 105029 times)

full member
Activity: 134
Merit: 100
I want to donate and currently do, but can you lower the aggression level you have set for the donation period? My machines basically are frozen when donation runs.

The donation profile uses the miner flagged as default. You can change the default miner and/or the options used by that miner (such as aggression) under Configure Miners -> Edit.

The special donation profile was written before failover was implemented. In the future, I'm thinking of rewriting the donation profile to use failover (instead of multiple instances), which would also help.

Thanks Jon, I'll double-check that.
full member
Activity: 238
Merit: 100
Quote
I also run two of my cards in the 80-83 range and things seem fine. I've successfully modified overclock and fan speed settings using the command-line option of AMDOverdriveCtrl while GPUs were mining. Whatever you do for temp functions, I think you should use hook scripts that are user customizable. At least those are my thoughts...
Yeah, I'm thinking along the same lines, as there is just an incredible amount of variance to deal with, and each user will probably have a unique set of needs.

Also, the hook scripts are going to get a nice overhaul soon, such as having parameters passed into them when they are launched. For instance, the machine number could be passed as an argument to the lockup.sh script so you can have different cases for different machines when multi-machine support is in, or the GPU temperatures could be passed in to a temperature.sh script to make things easier.
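
As a rough illustration of the parameter idea, here is a minimal sketch of what a temperature.sh hook might look like if Smartcoin passed one integer temperature (in degrees C) per GPU as arguments. That calling convention is an assumption, since the overhaul is only planned; `hot_gpus` and `TEMP_LIMIT` are hypothetical names, not part of Smartcoin.

```shell
#!/bin/bash
# temperature.sh - hypothetical hook sketch, NOT part of Smartcoin.
# Assumption: arguments are integer GPU temperatures, one per GPU, in order.

TEMP_LIMIT="${TEMP_LIMIT:-85}"   # hypothetical threshold in degrees C

# Print the zero-based index of every GPU hotter than TEMP_LIMIT.
hot_gpus() {
    index=0
    for temp in "$@"; do
        if [ "$temp" -gt "$TEMP_LIMIT" ]; then
            echo "$index"
        fi
        index=$((index + 1))
    done
}

# With no arguments this prints nothing.
hot_gpus "$@"
```

What to do with each reported index (lower clocks, raise fan speed, stop a miner) would be left to the user, which is the point of keeping this in a hook script.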
full member
Activity: 238
Merit: 100
I want to donate and currently do, but can you lower the aggression level you have set for the donation period? My machines basically are frozen when donation runs.

The donation profile uses the miner flagged as default. You can change the default miner and/or the options used by that miner (such as aggression) under Configure Miners -> Edit.

The special donation profile was written before failover was implemented. In the future, I'm thinking of rewriting the donation profile to use failover (instead of multiple instances), which would also help.
member
Activity: 79
Merit: 10
Jon --

I'm adding a 4th card to my motherboard and having some heat issues I'm trying to sort out.  I'm watching the temps and as they crest 80C I want to shut the GPU's miner off.

I went under "Configure Devices" and edited GPU2, just took the defaults, but when it asked me if I wanted to disable it, I chose "y".  This was while all four GPUs were mining.

When I went back to the miner screen, GPU2 had been removed from the top temp and load display.  However, GPU2 was still listed under the mining display and was clearly still mining.

Not sure if this is a bug or intended, but I would think that both displays should be consistent.  I would also LOVE a way to just Idle a worker easily, so I can make adjustments.  I find I basically have to kill Smartcoin, make my changes, and if one of the GPUs is still too warm, I have to shut everything down again.  I know you mentioned an IDLE profile, which would work, but my preference would be to just Idle an individual worker, or all workers, which would provide better flexibility.

Just a thought.

Yes, I need to start adding the code that syncs "live" changes. The lower-level stuff for this is already in place, but I haven't gotten around to making use of it yet.  I'll be adding this in soon though, and the principle of how it works will be quite simple: after an Add/Edit/Delete, a check will run to see if whatever was added/edited/deleted has any part in the current profile, and if so, force a reload of the miners.

I know this is personal preference, but I don't worry at all until over 90 degrees - I think cards will run along just fine in the 80-degree range.

I do plan on adding some temperature-related functions eventually - I'm just not sure of the best way to go about it yet (controlling fan speed? Dynamically enabling/disabling the GPU? etc.)


Also, regarding disabling workers, the workers table in the database already does have a 'disabled' field (though it isn't used yet), but eventually when the live changes syncing is in place, and I give access to the disabled field through add/edit workers, then you will have the functionality that you were thinking about!

I also run two of my cards in the 80-83 range and things seem fine. I've successfully modified overclock and fan speed settings using the command-line option of AMDOverdriveCtrl while GPUs were mining. Whatever you do for temp functions, I think you should use hook scripts that are user customizable. At least those are my thoughts...

member
Activity: 79
Merit: 10
@jondecker76

Yeah, my box has been running fine even with a bad Internet connection, so it's not really necessary to do something more custom than what you already have going! My cards also vary between each other; however, each one is quite consistent with itself. So the only way you could use temp data is for each GPU miner to have a temp profile of idle and working with some slack built in. That said, I think your lockup detection is working fine. For me, I'll just add a system reboot on lockup, since a locked-up card can't be recovered AFAIK.

Thanks,
jaebird
full member
Activity: 238
Merit: 100
Jon --

I'm adding a 4th card to my motherboard and having some heat issues I'm trying to sort out.  I'm watching the temps and as they crest 80C I want to shut the GPU's miner off.

I went under "Configure Devices" and edited GPU2, just took the defaults, but when it asked me if I wanted to disable it, I chose "y".  This was while all four GPUs were mining.

When I went back to the miner screen, GPU2 had been removed from the top temp and load display.  However, GPU2 was still listed under the mining display and was clearly still mining.

Not sure if this is a bug or intended, but I would think that both displays should be consistent.  I would also LOVE a way to just Idle a worker easily, so I can make adjustments.  I find I basically have to kill Smartcoin, make my changes, and if one of the GPUs is still too warm, I have to shut everything down again.  I know you mentioned an IDLE profile, which would work, but my preference would be to just Idle an individual worker, or all workers, which would provide better flexibility.

Just a thought.

Yes, I need to start adding the code that syncs "live" changes. The lower-level stuff for this is already in place, but I haven't gotten around to making use of it yet.  I'll be adding this in soon though, and the principle of how it works will be quite simple: after an Add/Edit/Delete, a check will run to see if whatever was added/edited/deleted has any part in the current profile, and if so, force a reload of the miners.

I know this is personal preference, but I don't worry at all until over 90 degrees - I think cards will run along just fine in the 80-degree range.

I do plan on adding some temperature-related functions eventually - I'm just not sure of the best way to go about it yet (controlling fan speed? Dynamically enabling/disabling the GPU? etc.)


Also, regarding disabling workers, the workers table in the database already does have a 'disabled' field (though it isn't used yet), but eventually when the live changes syncing is in place, and I give access to the disabled field through add/edit workers, then you will have the functionality that you were thinking about!
full member
Activity: 238
Merit: 100
As a developer, I'd be reluctant to add network detection failure to my mining software, just because it would seem out of the core functionality of Smartcoin itself and adds bloat, especially because Jon is doing all of this in bash scripting.  I would leave it up to the user to figure out if they have a network problem or not.  GPU failure detection is within the core, so it makes sense to have Smartcoin do that and then provide hooks to allow the user to take actions.

I think you could argue it on either side. Mining cannot be done if the GPU is locked or the network is unavailable. Both are edge conditions outside the normal operation of smartcoin, however both have the potential to reduce mining's effectiveness.

BTW, It appears that it was not internet connectivity that was causing my lockup detection issue. It was the accidental rollback of SVN that Jon mentioned several posts back.

@jondecker76,

When will the lockup.sh get called? If this script exists, is that all smartcoin calls? If the script is not there, does it default to killing smartcoin?

I've noticed that when a GPU locks up the temps drop even though the card reports 99% utilization... this is another indicator that the GPU is locked. Whereas when the miner is idle (not locked), both temps and % utilization are down. I'm thinking of a pre_lockup_detection hook script that smartcoin calls that can return a result. This hook script could also check for connectivity :)

Thanks,
jaebird

The lockup.sh script gets called after a lockup condition is detected, just before the miners are restarted, but only if the script exists.  So basically, lockup script or not, smartcoin will continue to restart the miner instances each time a lockup is detected.  It is then the user's responsibility to take action on this event in their lockup.sh script if they want (including stopping smartcoin with the 'smartcoin --kill' command). It would be easy to ping a known server from the lockup.sh script and decide for yourself what to do if the Internet is down. For example, if the Internet is down, kill smartcoin and run a loop pinging an Internet server every minute, then restart smartcoin when the Internet is back online.
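
The ping-and-wait idea described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Smartcoin's own code: `PING_HOST`, `RETRY_SECS`, `internet_up` and `handle_lockup` are hypothetical names, the restart uses the `smartcoin --silent` invocation mentioned elsewhere in this thread, and it assumes the hook keeps running after `smartcoin --kill` stops Smartcoin.

```shell
#!/bin/bash
# lockup.sh - hypothetical user hook sketch, NOT shipped with Smartcoin.

PING_HOST="${PING_HOST:-8.8.8.8}"   # assumption: any host you trust to be reachable
RETRY_SECS="${RETRY_SECS:-60}"      # "every minute", as described above

# Succeeds if a single ping to PING_HOST is answered within 5 seconds.
internet_up() {
    ping -c 1 -W 5 "$PING_HOST" > /dev/null 2>&1
}

# If the Internet is down: stop Smartcoin, wait for connectivity, restart it.
# If the Internet is up: do nothing and let Smartcoin restart the miners.
handle_lockup() {
    if internet_up; then
        return 0
    fi
    smartcoin --kill
    until internet_up; do
        sleep "$RETRY_SECS"
    done
    smartcoin --silent
}
```

To use it as an actual hook, the file would end with a call to `handle_lockup`. It is left out here so the sketch can be sourced and tried without side effects.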

Regarding detecting lockup via GPU temperature, I think it would be much harder than it appears. For example, on my mining rig, there are 3 cards. One of them runs at about 55 degrees, another at around 68 degrees and another around 80 degrees. There is so much variance (because of individual cases of airflow, whether the card is sandwiched between other cards or on the end, etc.) that I think it would be very hard to implement in a general sense.
Also, I still think the current lockup detection scheme works fine for failed Internet connections. If the Internet goes down, then smartcoin will continue to restart the miners every 5 minutes or so until it comes back, which does no harm.
brand new
Activity: 0
Merit: 0
One of my rigs running this has weird issues. When I update a worker, the assigned pool # shows up as one higher than the one it is actually assigned to. Sometimes this happens to the miner assignment as well.


Are you talking about the pre-filled selection while editing?

Code:
Smartcoin r488e 09:42:39
EDIT PROFILE INSTANCE

1) GPU[0] - BTCGuild USEAST.btcguild_useast - phoenix
2) GPU[0] - BTCguild USWEST.btcguild_uswest - phoenix
3) GPU[0] - BTCguild USWEST.btcguild_uswest - phoenix
4) GPU[0] - BTCguild USWEST.btcguild_uswest - phoenix
5) GPU[1] - BTCGuild USEAST.btcguild_useast - phoenix
6) GPU[1] - BTCguild USWEST.btcguild_uswest - phoenix
7) GPU[1] - BTCguild USWEST.btcguild_uswest - phoenix
8) GPU[1] - BTCguild USWEST.btcguild_uswest - phoenix
9) GPU[2] - BTCGuild USEAST.btcguild_useast - phoenix
10) GPU[2] - BTCguild USWEST.btcguild_uswest - phoenix
11) GPU[2] - BTCguild USWEST.btcguild_uswest - phoenix
12) GPU[2] - BTCguild USWEST.btcguild_uswest - phoenix
Which instance above do you want to edit?
12

1) phoenix
2) poclbm
Please select the miner from the list above to use with this instance
1

1) BTCGuild USEAST.lab0
2) BTCGuild USEAST.lab1
3) BTCGuild USEAST.lab2
4) DeepBit.deepbit_lab0
5) DeepBit.deepbit_lab1
6) DeepBit.deepbit_lab2
7) MtRed.mtred_1
8) MtRed.mtred_2
9) BTCGuild USEAST.btcguild_useast
10) BTCguild USCENTRAL.btcguild_uscentral
11) BTCguild USWEST.btcguild_uswest
12) DeepBit.deepbit
13) MtRed.mtred_0
14) Bitcoin.cz (slush).slush
15) MtRed.mtred
16) Eligius.eligius_lab
17) BTCguild USWEST.btcguild_uswest2
Please select the pool worker from the list above to use with this instance
12


When editing the profile, instance 12 should be associated with worker 11, but instead it shows as associated with worker 12. If I just hit enter, it will be associated with deepbit instead of btcguild_uswest. This only happens on one of my rigs though.
full member
Activity: 134
Merit: 100
I want to donate and currently do, but can you lower the aggression level you have set for the donation period? My machines basically are frozen when donation runs.
member
Activity: 84
Merit: 10
Jon --

I'm adding a 4th card to my motherboard and having some heat issues I'm trying to sort out.  I'm watching the temps and as they crest 80C I want to shut the GPU's miner off.

I went under "Configure Devices" and edited GPU2, just took the defaults, but when it asked me if I wanted to disable it, I chose "y".  This was while all four GPUs were mining.

When I went back to the miner screen, GPU2 had been removed from the top temp and load display.  However, GPU2 was still listed under the mining display and was clearly still mining.

Not sure if this is a bug or intended, but I would think that both displays should be consistent.  I would also LOVE a way to just Idle a worker easily, so I can make adjustments.  I find I basically have to kill Smartcoin, make my changes, and if one of the GPUs is still too warm, I have to shut everything down again.  I know you mentioned an IDLE profile, which would work, but my preference would be to just Idle an individual worker, or all workers, which would provide better flexibility.

Just a thought.
member
Activity: 79
Merit: 10
As a developer, I'd be reluctant to add network detection failure to my mining software, just because it would seem out of the core functionality of Smartcoin itself and adds bloat, especially because Jon is doing all of this in bash scripting.  I would leave it up to the user to figure out if they have a network problem or not.  GPU failure detection is within the core, so it makes sense to have Smartcoin do that and then provide hooks to allow the user to take actions.

I think you could argue it on either side. Mining cannot be done if the GPU is locked or the network is unavailable. Both are edge conditions outside the normal operation of smartcoin, however both have the potential to reduce mining's effectiveness.

BTW, It appears that it was not internet connectivity that was causing my lockup detection issue. It was the accidental rollback of SVN that Jon mentioned several posts back.

@jondecker76,

When will the lockup.sh get called? If this script exists, is that all smartcoin calls? If the script is not there, does it default to killing smartcoin?

I've noticed that when a GPU locks up the temps drop even though the card reports 99% utilization... this is another indicator that the GPU is locked. Whereas when the miner is idle (not locked), both temps and % utilization are down. I'm thinking of a pre_lockup_detection hook script that smartcoin calls that can return a result. This hook script could also check for connectivity :)

Thanks,
jaebird
brand new
Activity: 0
Merit: 0
@jondecker76

For reboot, I recommend something like this
Code:
sudo su
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger
as most GPU lockups will get stuck on the sudo reboot command. But I am not sure how to run that as root; something like sudo echo 1 > /proc/sys/kernel/sysrq doesn't seem to work.
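
On the root question: `sudo echo 1 > /proc/sys/kernel/sysrq` fails because the `>` redirection is performed by the calling (non-root) shell, and only `echo` itself runs under sudo. One common workaround is to pipe the value through `sudo tee`, so the file is opened by a root process. A minimal sketch, where `write_as_root` and `sysrq_reboot` are hypothetical helper names, and where unattended use assumes sudo is configured not to prompt for a password:

```shell
#!/bin/bash
# Hypothetical helpers; the function names are illustrative only.

# Write a value to a file with root privileges. tee runs under sudo,
# so the file is opened by root rather than by the calling shell.
write_as_root() {
    target="$1"
    value="$2"
    printf '%s\n' "$value" | sudo tee "$target" > /dev/null
}

# Same three steps as the commands above: enable SysRq, sync disks, reboot.
# WARNING: calling this reboots the machine immediately.
sysrq_reboot() {
    write_as_root /proc/sys/kernel/sysrq 1
    write_as_root /proc/sysrq-trigger s
    write_as_root /proc/sysrq-trigger b
}
```

An equivalent form is `sudo sh -c 'echo 1 > /proc/sys/kernel/sysrq'`, which runs the whole redirection inside a root shell.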
newbie
Activity: 53
Merit: 0
Hey! I just started using smartcoin today. But my second GPU won't run!

This is what I get when I try to launch it in Phoenix
Code:
miner@miner-MS-7642:~$ startminer.sh 1
ATI Overdrive(TM) enabled
ERROR - Set clocks failed for Adapter 1 - ATI Radeon HD 5800 Series
        Please check that input values were valid
Startming Miner: 1
[20/07/2011 16:44:42] FATAL kernel error: Failed to apply BFI_INT patch to kernel! Is BFI_INT supported on this hardware?
miner@miner-MS-7642:~$

This is definitely a configuration problem with phoenix or a problem with your card. I think the 5800 should support BFI_INT.   I also see that you have an aticonfig error on this GPU as well. This almost looks like a fried card to me. You could edit your startminer.sh script and remove the bfi_int flag and see if it helps.  I would also run some aticonfig commands manually and see if it talks to the card properly.

Yeah, but the card works fine in Windows :S I'll try removing the BFI_INT flag.
full member
Activity: 238
Merit: 100
One of my rigs running this has weird issues. When I update a worker, the assigned pool # shows up as one higher than the one it is actually assigned to. Sometimes this happens to the miner assignment as well.


Are you talking about the pre-filled selection while editing?

Code:
Smartcoin r488e 09:42:39
EDIT PROFILE INSTANCE

1) GPU[0] - BTCGuild USEAST.btcguild_useast - phoenix
2) GPU[0] - BTCguild USWEST.btcguild_uswest - phoenix
3) GPU[0] - BTCguild USWEST.btcguild_uswest - phoenix
4) GPU[0] - BTCguild USWEST.btcguild_uswest - phoenix
5) GPU[1] - BTCGuild USEAST.btcguild_useast - phoenix
6) GPU[1] - BTCguild USWEST.btcguild_uswest - phoenix
7) GPU[1] - BTCguild USWEST.btcguild_uswest - phoenix
8) GPU[1] - BTCguild USWEST.btcguild_uswest - phoenix
9) GPU[2] - BTCGuild USEAST.btcguild_useast - phoenix
10) GPU[2] - BTCguild USWEST.btcguild_uswest - phoenix
11) GPU[2] - BTCguild USWEST.btcguild_uswest - phoenix
12) GPU[2] - BTCguild USWEST.btcguild_uswest - phoenix
Which instance above do you want to edit?
12

1) phoenix
2) poclbm
Please select the miner from the list above to use with this instance
1

1) BTCGuild USEAST.lab0
2) BTCGuild USEAST.lab1
3) BTCGuild USEAST.lab2
4) DeepBit.deepbit_lab0
5) DeepBit.deepbit_lab1
6) DeepBit.deepbit_lab2
7) MtRed.mtred_1
8) MtRed.mtred_2
9) BTCGuild USEAST.btcguild_useast
10) BTCguild USCENTRAL.btcguild_uscentral
11) BTCguild USWEST.btcguild_uswest
12) DeepBit.deepbit
13) MtRed.mtred_0
14) Bitcoin.cz (slush).slush
15) MtRed.mtred
16) Eligius.eligius_lab
17) BTCguild USWEST.btcguild_uswest2
Please select the pool worker from the list above to use with this instance
12


When editing the profile, instance 12 should be associated with worker 11, but instead it shows as associated with worker 12. If I just hit enter, it will be associated with deepbit instead of btcguild_uswest. This only happens on one of my rigs though.

Thank you for the excellent report - it contained exactly the information that I needed to zero in on the problem.

Can you try the new r489e update, and let me know if it fixes your problem?

Thanks!
full member
Activity: 238
Merit: 100
Update r489e is now available:
- Fixes a bug with default selections (thanks hipaulshi).

full member
Activity: 238
Merit: 100
Hey! I just started using smartcoin today. But my second GPU won't run!

This is what I get when I try to launch it in Phoenix
Code:
miner@miner-MS-7642:~$ startminer.sh 1
ATI Overdrive(TM) enabled
ERROR - Set clocks failed for Adapter 1 - ATI Radeon HD 5800 Series
        Please check that input values were valid
Startming Miner: 1
[20/07/2011 16:44:42] FATAL kernel error: Failed to apply BFI_INT patch to kernel! Is BFI_INT supported on this hardware?
miner@miner-MS-7642:~$

This is definitely a configuration problem with phoenix or a problem with your card. I think the 5800 should support BFI_INT.   I also see that you have an aticonfig error on this GPU as well. This almost looks like a fried card to me. You could edit your startminer.sh script and remove the bfi_int flag and see if it helps.  I would also run some aticonfig commands manually and see if it talks to the card properly.
full member
Activity: 238
Merit: 100
@jondecker76

For reboot, I recommend something like this
Code:
sudo su
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger
as most GPU lockups will get stuck on the sudo reboot command. But I am not sure how to run that as root; something like sudo echo 1 > /proc/sys/kernel/sysrq doesn't seem to work.

When I tested the lockup detection here, I would purposely lock up 3 GPUs by extreme overclocking - the reboot command always still worked just fine (my miner is at a friend's house quite a ways away, so I always have to depend on the reboot command).  So far, on every test I've done, reboot has always worked.
newbie
Activity: 53
Merit: 0
Hey! I just started using smartcoin today. But my second GPU won't run!

This is what I get when I try to launch it in Phoenix
Code:
miner@miner-MS-7642:~$ startminer.sh 1
ATI Overdrive(TM) enabled
ERROR - Set clocks failed for Adapter 1 - ATI Radeon HD 5800 Series
        Please check that input values were valid
Startming Miner: 1
[20/07/2011 16:44:42] FATAL kernel error: Failed to apply BFI_INT patch to kernel! Is BFI_INT supported on this hardware?
miner@miner-MS-7642:~$
member
Activity: 84
Merit: 10
Just adding my 0.02 BTC here...

This condition *could* be handled in a custom failure script (lockup.sh).  When run, you shut down Smartcoin (you've lost your Internet, you're not getting work and your miners are idle anyhow).  Your script runs in a loop performing a ping, looking for when the network comes back.  When it comes back, you execute "smartcoin --silent" and fire it back up.  Yes, this would loop indefinitely if you've actually got a lockup...  You could add a condition to see how long it has been since you last ran it, and if it has been too short a time, take the reboot option...
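
The "how long since the last run" condition could be sketched with a timestamp file. This is only an illustration under assumptions: `STAMP_FILE`, `MIN_GAP_SECS` and `lockup_action` are hypothetical names, and the 15-minute default gap is an arbitrary choice, not a value from Smartcoin.

```shell
#!/bin/bash
# Hypothetical guard for a lockup.sh hook; names and defaults are illustrative.

STAMP_FILE="${STAMP_FILE:-/tmp/lockup.last}"   # where the last run time is stored
MIN_GAP_SECS="${MIN_GAP_SECS:-900}"            # runs closer together than this suggest a real lockup

# Prints "reboot" if the previous run was less than MIN_GAP_SECS ago,
# otherwise "wait". Always records the current time for the next run.
lockup_action() {
    now=$(date +%s)
    last=0
    if [ -r "$STAMP_FILE" ]; then
        last=$(cat "$STAMP_FILE")
    fi
    last="${last:-0}"
    echo "$now" > "$STAMP_FILE"
    if [ $((now - last)) -lt "$MIN_GAP_SECS" ]; then
        echo reboot
    else
        echo wait
    fi
}
```

A hook would then branch on the result, for example running the ping-and-wait loop on "wait" and the reboot command on "reboot".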

Just thinking out loud.

Yeah, I was thinking of something like that. Alternatively, smartcoin could attempt to ping known good hosts in the event it "thinks" the cards are locked up. It is a very, very rare occurrence that all 4 cards would be locked up at the same time! In the meantime, how do I disable the lockup detection or its action? By adding an empty "lockup.sh" script?

Thanks.

As a developer, I'd be reluctant to add network detection failure to my mining software, just because it would seem out of the core functionality of Smartcoin itself and adds bloat, especially because Jon is doing all of this in bash scripting.  I would leave it up to the user to figure out if they have a network problem or not.  GPU failure detection is within the core, so it makes sense to have Smartcoin do that and then provide hooks to allow the user to take actions.
full member
Activity: 238
Merit: 100
Off to bed for a while - had a long night at work!