Topic: BAMT - Easy persistent USB key based linux for dedicated miners/mining farms - page 35

donator
Activity: 1731
Merit: 1008
Please re-read my post regarding power cycle.
I did read your first post, sorry not to have thanked you before.
I was going to test this further, but right now I have 10 USB sticks to purchase, or maybe I'll go back to the old setup (2.5" HDD).

I'm pretty sure I had situations where "coldreboot" would not fix the problem, but I'm not sure they were caused by a hung GPU either.

My last problem was this: I could see all 5 GPUs in gpumon (in red / not working) but could not see any with "screen -ls". At this point I decided to reinstall BAMT on this USB drive. I was able to delete the partition, but... dud USB: it would not write a thing back to it.

So that is to say, USB failures may look very different. The first failure was more obvious: a Debian screen with the warning "write protected, can't boot".
newbie
Activity: 53
Merit: 0
Would that not make the "now" part of "shutdown -rn now" sort of redundant? Or does "now" mean starting now, while "n" skips the init shutdown process?

Absolutely. "now" really means "now!". Alternatively, you could specify a time in the future to give logged-in users notice that the machine will go down shortly.

If you call -n, it doesn't invoke the init process to terminate processes; it kills them itself, with no mercy.
The use of that option is normally discouraged (see the man page on this), because many filesystems do their fsync as a last step right before init shuts down all processes. If there is no init and no normal, ordered shutdown, chances are that some data will get lost, so don't use it without a good reason.
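
To put the three variants side by side, here is a rough sketch. It assumes the classic sysvinit "shutdown" command (time given as "now" or "+minutes", optional warning message), and the "run" wrapper is a made-up helper that only prints the command unless DRY_RUN=0 is set, so nothing goes down by accident.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0,
# so pasting this block does not take the machine down.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Orderly reboot in 10 minutes; logged-in users get the warning message.
run shutdown -r +10 "Rebooting for maintenance"

# Orderly reboot right away: init stops the services before the reboot.
run shutdown -r now

# Last resort for a hung rig: -n bypasses init (risk of losing unwritten data).
run shutdown -rn now
```

Set DRY_RUN=0 only once you are sure which variant you want.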
member
Activity: 77
Merit: 10

Quote
I did not know -r means it drops now, good to know, thanks!

It's not the "r" (that means reboot), it's the "n". Sometimes it won't succeed, in particular when a GPU has run really hot and can't be reset from the system, i.e. by a kexec kernel reboot. You know you've run into this if the broken card doesn't come up after a normal reboot (even with -rn).

Use "coldreboot" then.

Would that not make the "now" part of "shutdown -rn now" sort of redundant? Or does "now" mean starting now, while "n" skips the init shutdown process?
newbie
Activity: 53
Merit: 0
I will go ask elsewhere for more information about isolating the GPU instability that persists beyond a reset/power cycle.

No need for that. Help is here. Please re-read my post regarding power cycle. If you simply reboot, it will invoke the so-called "kexec reboot", which only reboots the kernel. It doesn't reboot the whole system; it's not a "cold reboot". The CPU, MMU, registers and, yes, the GPUs won't get any reset signal from the board. If a GPU is locked up for whatever reason, it's likely that a "soft reboot" like that won't reset the GPU.

If you happen to run into this, use "coldreboot". This circumvents the kexec reboot and does a "full" cold restart, just as a power cycle would. The card should work again after that.

On Windows, there is no "soft reboot" like that, so you won't experience that behaviour. This applies to current Windows versions only. I personally wouldn't switch back to Windows just because of such an issue. The Linux way is fine in terms of stability, scalability, remote management and the like, and is way more sophisticated.
newbie
Activity: 53
Merit: 0
I have suggested to you before that it seems you will be happier in Windows.  Your hardware doesn't work right in BAMT.  Sorry, but such is life.  I'm not sure why you are still trying to use bamt on it, but it just isn't gunna happen man.

I wonder if this is the right way to communicate with users who support you and use your software.
And I don't think there is any reason to send users back to Windows. In fact, there is absolutely no reason for that.
donator
Activity: 1731
Merit: 1008
... Your hardware doesn't work right in BAMT.  Sorry, but such is life.  I'm not sure why you are still trying to use bamt on it, but it just isn't gunna happen man.  

If you send me the details of your equipment, I will add them to the wiki as unreliable.
You can add to your unsupported hardware list "cheap ($5 range) 2gb usb keys"
Quote
I recommend using cheap ($5 range) 2gb usb keys.

Leaving very little free space will most likely defeat any wear-leveling scheme. I have read that very few cheap USB sticks have wear-leveling features anyway. My take would be to use a good-brand 4 GB USB key in the $10 range.

WTF do you mean by "it just isn't gunna happen man."??? BAMT does work on all the motherboards I tested; I got hit by poor USB sticks (or poor write management). BTW, I'm trying to contribute to the documentation / wiki.

I will go ask elsewhere for more information about isolating the GPU instability that persists beyond a reset/power cycle.

Thanks for your support.
hero member
Activity: 616
Merit: 506

Yes, I second that. After some initial problems, BAMT (i.e. phoenix) is really stable now. I'm thinking about putting these munin RRD files onto an NFS share to avoid the USB stick being written to every 5 minutes.

@lodcrappo: care to make that optional in the /etc/bamt/bamt.conf?


I think this is outside the scope of bamt itself, but of course you can configure this by adjusting the standard munin configuration files.

Future bamt images will probably not include munin (still will have munin-node).  It makes a lot more sense to collect the data from a server with reliable r/w storage than to be collecting it on the nodes themselves, and will eliminate the biggest known source of writes (there may be others, I haven't spent a huge amount of time worrying about it, but munin is an obvious writer).
hero member
Activity: 616
Merit: 506
I went to linux for stability but I find it to be less reliable than a windows box.  Or is it just BAMT ?

The thing is, if ONE GPU fails, ALL fail, and not even a power-off will bring them back. On some boards a reset will; on others only switching off AC will bring them back (if it does).
And I can't do that remotely.

I never had a GPU problem on windows that a reboot(reset) would not fix.



I have suggested to you before that it seems you will be happier in Windows.  Your hardware doesn't work right in BAMT.  Sorry, but such is life.  I'm not sure why you are still trying to use bamt on it, but it just isn't gunna happen man.  

If you send me the details of your equipment, I will add them to the wiki as unreliable.
hero member
Activity: 616
Merit: 506
It has been stated in the FAQ since day 1 of BAMT:

Quote
Q: Will running a writeable FS wear out my USB key?

A: Maybe. Every effort has been made to reduce the number of writes that the system will make during normal use, but the system simply has not been in production long enough to know exactly. I recommend using cheap ($5 range) 2gb usb keys.

If you ignored this advice and used more expensive keys... well I'm sure you can guess what I have to say about that :)
If you're surprised when your usb keys wear out... see above.

I have 3 brands of keys in my farm. Well, I have 2 now, because each and every one of the cheap keys I got at Walmart wore out after a few weeks. Don't think they even have a brand name on them. The others are all still going fine.

If you want to reduce writes:

apt-get remove munin munin-node apache2

That will get rid of the munin collect process which causes some writes every 5 minutes, and the web server which writes logs when accessed.

There may be other things that can be done. As explained from day 1, this just isn't something that I had time to test. Anyone who posts a way to reduce writes here earns a gold star.
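
One way to find out what is worth removing is to measure the writes first. A minimal sketch, assuming the USB key shows up as sda (an assumption; check dmesg) and using the Linux block-layer counters in /sys/block/<dev>/stat, where the seventh field is the number of 512-byte sectors written since boot. The helper names "sectors_written" and "kib_written_during" are made up for illustration.

```shell
#!/bin/sh
# sectors_written (hypothetical helper): print field 7 of a block-device stat
# file, i.e. the number of 512-byte sectors written since boot.
sectors_written() {
    awk '{ print $7 }' "$1"
}

# kib_written_during STAT_FILE SECONDS (hypothetical helper): sample the
# counter twice and print how many KiB were written in between.
kib_written_during() {
    before=$(sectors_written "$1")
    sleep "$2"
    after=$(sectors_written "$1")
    echo $(( (after - before) / 2 ))   # two 512-byte sectors per KiB
}

# Example, assuming the USB key is sda (left commented out on purpose):
#   kib_written_during /sys/block/sda/stat 300
```

Running it once with munin installed and once after the apt-get remove above should show how much of the write load munin really accounts for.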




donator
Activity: 1731
Merit: 1008
Converting BAMT to normal HDD mode is not a solution. When it comes to high-density rigs, the hardware gets really hot inside. Even if you are using big fans, the ambient temperature may be constantly over 45°C, which is already too warm for many HDDs as well.
HDD temperature is not a problem: an HDD is considered hot at ~55°C, and our rigs are not so "high density" with PCIe extenders.

On my previous Windows setup I was using a bunch of 2.5" 80 GB drives and they stayed cold; otherwise, the 3.5" drives spun down after 20 min and rarely woke up to write anything.

I have not monitored my BAMT rigs via munin a single time. Does setting "do_monitor:" to 0 stop munin?


Now, looking at my USB keys while they mine, I see them blinking 3-4 times every 20-30 seconds. At this rate they must be writing ~100,000 times to the same place over a single month.
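
A quick back-of-the-envelope check of that figure, as a rough sketch only: it assumes every blink is one write (a blink can just as well be a read) and a 30-day month, and "estimate_monthly_writes" is a made-up helper name.

```shell
#!/bin/sh
# Rough monthly write estimate from observed LED blinks.
# Assumptions (not measured): every blink is one write, a month has 30 days.
# estimate_monthly_writes is a hypothetical helper name.
estimate_monthly_writes() {
    blinks=$1      # blinks seen per observation interval
    interval=$2    # length of the interval in seconds
    seconds_per_month=$((30 * 24 * 60 * 60))
    echo $(( seconds_per_month / interval * blinks ))
}

echo "low  (3 blinks / 30 s): $(estimate_monthly_writes 3 30)"
echo "high (4 blinks / 20 s): $(estimate_monthly_writes 4 20)"
```

Under those assumptions the range comes out at roughly 260,000 to 520,000 writes a month, so ~100,000 is, if anything, on the low side.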

I don't know how common USB keys with wear leveling are, since I've never seen it advertised.

I'd hate to lose a good 16 GB Kingston even more, given that I could be using old HDDs that would be idling at 0.5 W anyway.



newbie
Activity: 53
Merit: 0
I think it's not only the flash memory that's to blame. Since BAMT uses a UnionFS on top of the copy-on-write (COW) live system (which is mounted noatime), writes to the flash have already been reduced to a minimum. But the RRD files of the munin stats get updated quite often, and the live system writes them to the flash almost instantly. I guess that's what kills any sort of cheap flash storage; it's just a matter of time. Maybe there are even more processes that silently write another chunk of data to disk every once in a while.

Converting BAMT to normal HDD mode is not a solution. When it comes to high-density rigs, the hardware gets really hot inside. Even if you are using big fans, the ambient temperature may be constantly over 45°C, which is already too warm for many HDDs as well. Maybe one would be better off just booting from the USB stick, then starting the BAMT processes, keeping all the work completely in RAM, and storing logs, RRD files and everything else that should persist via NFS.
donator
Activity: 1731
Merit: 1008
I lost 3 sticks out of 9 in the last 3 days. They had been running for ~2 months. These Ultra-speed brand sticks were all I had; epic downtime to come...

I'm not sure how I'll get the .img onto an HDD
newbie
Activity: 53
Merit: 0
Yes, it is all about average uptime. Sadly, however, the cheap USB sticks have been Kingston and SanDisk. Lost three of 12 in the last four months :(

Hmm, that's odd. I have never lost a SanDisk USB stick / SDHC card to date (maybe it's just luck), but I have lost a number of cheaper ones like Sharkoon et al. But I only buy the premium SanDisk/Kingston products that cost extra. Which brand do you count as "premium" then?

Quote
On a side note, BAMT has much better total uptime for me, because linuxcoin's included phoenix stalls when it fails to connect, so my average uptime (my uptime, not pool uptime, grump) is pretty damn high these days.

Yes, I second that. After some initial problems, BAMT (i.e. phoenix) is really stable now. I'm thinking about putting these munin RRD files onto an NFS share to avoid the USB stick being written to every 5 minutes.

@lodcrappo: care to make that optional in the /etc/bamt/bamt.conf?

Quote
I did not know -r means it drops now, good to know, thanks!

It's not the "r" (that means reboot), it's the "n". Sometimes it won't succeed, in particular when a GPU has run really hot and can't be reset from the system, i.e. by a kexec kernel reboot. You know you've run into this if the broken card doesn't come up after a normal reboot (even with -rn).

Use "coldreboot" then.
member
Activity: 77
Merit: 10
Quote
3) If it seems to have rebooted and is still locked, I found it often had not rebooted yet and I had not waited long enough. Also, try "coldreboot" from the command line.

Next time, try "shutdown -rn now"

This skips the whole init process and shuts down all services immediately, even hung ones.

Quote
There have been a few times that I have had it lock up so badly I had to physically reboot it... And all but one of those has been because of a failed USB stick I was using as a drive (But at $5 a crack, and BAMT so easy to configure, I will likely keep using the cheap ones...)

And that is the mistake. Cheap ones that you have to buy more often end up being the expensive ones. Plus, a cheap stick has cheap flash cells without any wear leveling, and maybe built-in defects that show up once the stick has been written to often enough. The COW Debian live system and noatime mounting of file systems don't help much with that.

Better stick (pun intended) to brands like Kingston, SanDisk or the like. They are way faster than the cheap ones, anyway....

A more general thought on this one:

Successful mining is about stability. You can't afford to have your miners down all the time because of outages due to overclocking, cheap disks/sticks, exploding PSUs, overheating cases, overheating houses ;) or anything else that stops your cards from doing billions of hashes per second. Think about it.


Yes, it is all about average uptime. Sadly, however, the cheap USB sticks have been Kingston and SanDisk. Lost three of 12 in the last four months :(
On a side note, BAMT has much better total uptime for me, because linuxcoin's included phoenix stalls when it fails to connect, so my average uptime (my uptime, not pool uptime, grump) is pretty damn high these days.

I did not know -r means it drops now, good to know, thanks!
newbie
Activity: 53
Merit: 0
Quote
3) If it seems to have rebooted and is still locked, I found it often had not rebooted yet and I had not waited long enough. Also, try "coldreboot" from the command line.

Next time, try "shutdown -rn now"

This skips the whole init process and shuts down all services immediately, even hung ones.

Quote
There have been a few times that I have had it lock up so badly I had to physically reboot it... And all but one of those has been because of a failed USB stick I was using as a drive (But at $5 a crack, and BAMT so easy to configure, I will likely keep using the cheap ones...)

And that is the mistake. Cheap ones that you have to buy more often end up being the expensive ones. Plus, a cheap stick has cheap flash cells without any wear leveling, and maybe built-in defects that show up once the stick has been written to often enough. The COW Debian live system and noatime mounting of file systems don't help much with that.

Better stick (pun intended) to brands like Kingston, SanDisk or the like. They are way faster than the cheap ones, anyway....

A more general thought on this one:

Successful mining is about stability. You can't afford to have your miners down all the time because of outages due to overclocking, cheap disks/sticks, exploding PSUs, overheating cases, overheating houses ;) or anything else that stops your cards from doing billions of hashes per second. Think about it.
member
Activity: 77
Merit: 10
I went to linux for stability but I find it to be less reliable than a windows box.  Or is it just BAMT ?

The thing is, if ONE GPU fails, ALL fail, and not even a power-off will bring them back. On some boards a reset will; on others only switching off AC will bring them back (if it does).
And I can't do that remotely.

I never had a GPU problem on windows that a reboot(reset) would not fix.



Even when testing very high OC on my BAMT nodes, I found that if it refuses to reboot at first, you can often still get it to reboot in a few ways:
1) Wait. Phoenix is hung, and the system is trying to kill it. Sometimes it takes around 5 minutes, but it will reboot. (I spend some time physically away from my miners, so this was my only option; that is how I found this out.)
2) ps -A | grep phoenix. Find the PID for phoenix, and run "kill -9 <pid>" with the PID for phoenix where <pid> is. This also often takes a few minutes. After it kills phoenix, reboot usually works.
3) If it seems to have rebooted and is still locked, I found it often had not rebooted yet and I had not waited long enough. Also, try "coldreboot" from the command line.
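
Step 2 can be sketched as a small shell helper. The function name "pids_matching" is made up for illustration, and the sketch assumes (as the "ps -A | grep phoenix" step implies) that the miner shows up under a command name containing "phoenix".

```shell
#!/bin/sh
# pids_matching (hypothetical helper): reads "PID COMMAND" lines on stdin and
# prints the PIDs whose command name matches the given pattern.
pids_matching() {
    awk -v pat="$1" '$2 ~ pat { print $1 }'
}

# Intended use on a hung rig (commented out so nothing is killed by accident):
#   ps -A -o pid=,comm= | pids_matching phoenix | xargs -r kill -9
#   reboot    # or "coldreboot" if the GPU stays locked
# xargs -r (GNU) skips the kill when no PID was found.

# Harmless demonstration on canned ps-style output; prints 2345.
printf '%s\n' '    1 init' ' 2345 phoenix.py' ' 2400 sshd' | pids_matching phoenix
```

As noted in step 2, the kill can take a few minutes to go through, so give it time before trying the reboot.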

There have been a few times that I have had it lock up so badly I had to physically reboot it... And all but one of those has been because of a failed USB stick I was using as a drive (But at $5 a crack, and BAMT so easy to configure, I will likely keep using the cheap ones...)
vip
Activity: 1358
Merit: 1000
AKA: gigavps
I went to linux for stability but I find it to be less reliable than a windows box.  Or is it just BAMT ?

The thing is, if ONE GPU fails, ALL fail, and not even a power-off will bring them back. On some boards a reset will; on others only switching off AC will bring them back (if it does).
And I can't do that remotely.

I never had a GPU problem on windows that a reboot(reset) would not fix.



I am running 15 boxes with BAMT without issue. Please lower your overclocks to find where they are stable.
donator
Activity: 1731
Merit: 1008
I went to linux for stability but I find it to be less reliable than a windows box.  Or is it just BAMT ?

The thing is, if ONE GPU fails, ALL fail, and not even a power-off will bring them back. On some boards a reset will; on others only switching off AC will bring them back (if it does).
And I can't do that remotely.

I never had a GPU problem on windows that a reboot(reset) would not fix.

full member
Activity: 196
Merit: 100
I played around with a wireless NIC and BAMT a while ago. I've set up things like this in Debian before, but still have somewhat limited experience. No expert anyway.

First I had some problems with /etc/network/interfaces not being persistent / being overwritten at boot. This is where I normally put my NIC to be brought up. When I edited it and brought it up manually, it worked like a charm. The IP was assigned via DHCP and I used WEP2.
I tried putting the stuff in a separate script to be run at startup, but it didn't work. It was messy to work with since I don't have any screens attached to my miners. Someone proposed an alternative solution: to set up an old wireless router as a WLAN-to-wired bridge and connect your miners to it. Some old Netgear and DD-WRT did the trick.

Still, it sometimes annoys me that I didn't get it to work like I first intended :'(


So from my experience:
- BAMT has drivers. Check with lsusb and try first to bring it up manually (ifup or ifconfig wlan0 up)
- You can't use /etc/network/interfaces straight off. Check the startup. From what I can tell this is different from standard Debian. There are some hints in this thread on where you could fiddle with the network setup.

Thanks, but being a Linux ignoramus I'm not about to muck around in any critical subdirectories. If the GUI doesn't see the USB wireless link and give me a chance to select it, well, stick a fork in me, I'm done ;-)
full member
Activity: 226
Merit: 100
I played around with a wireless NIC and BAMT a while ago. I've set up things like this in Debian before, but still have somewhat limited experience. No expert anyway.

First I had some problems with /etc/network/interfaces not being persistent / being overwritten at boot. This is where I normally put my NIC to be brought up. When I edited it and brought it up manually, it worked like a charm. The IP was assigned via DHCP and I used WEP2.
I tried putting the stuff in a separate script to be run at startup, but it didn't work. It was messy to work with since I don't have any screens attached to my miners. Someone proposed an alternative solution: to set up an old wireless router as a WLAN-to-wired bridge and connect your miners to it. Some old Netgear and DD-WRT did the trick.

Still, it sometimes annoys me that I didn't get it to work like I first intended :'(


So from my experience:
- BAMT has drivers. Check with lsusb and try first to bring it up manually (ifup or ifconfig wlan0 up)
- You can't use /etc/network/interfaces straight off. Check the startup. From what I can tell this is different from standard Debian. There are some hints in this thread on where you could fiddle with the network setup.