10 ASICminer Blade setup with backplane is rebooting every hour.

bluedragon

newbie

Activity: 29

Merit: 0

Quote from: schnellminer on December 30, 2013, 09:56:22 AM

Quote from: gbsray on December 14, 2013, 03:24:17 PM

Quote from: bluedragon on December 12, 2013, 11:34:05 AM

The setup that's been most stable for me is setting it to use 2 different servers. Because I want it to stay on 1 pool, I set it to use a stratum proxy for 1 of the servers, and connect directly to the getwork server on the same pool for the other server. It switches itself back and forth every so often, but doesn't stop hashing and restart.

That's the trick! Server with 2 NICs on 2 different IPs but same port. Switches between the two but doesn't reset!

My blades have been rebooting every hour to the second since I got them. This suggestion is what worked for me with a few changes. Evidently the blades do not like having the same server IP address listed for primary and secondary. What I did was bind a second IP address to the same NIC on my Linux box running the stratum proxy. Changed the secondary address on the blade to the new secondary IP bound on the stratum proxy and problem gone.

Thanks to bluedragon and gbsray for pointing me in the right direction.

I changed my config back to 2 pools and it's been stable, not restarting and not switching servers hourly. I have no idea why it's staying on 1 pool now, and wasn't before. The difference in my config this time is that I started a second stratum proxy listening on a different port (same IP address) for my backup pool, instead of using 1 stratum proxy and 1 getwork server. I just don't get it.

Edit: fixed (I hope) a missing closing quote tag that was messing things up.

schnellminer

newbie

Activity: 4

Merit: 0

Quote from: gbsray on December 14, 2013, 03:24:17 PM

Quote from: bluedragon on December 12, 2013, 11:34:05 AM

The setup that's been most stable for me is setting it to use 2 different servers. Because I want it to stay on 1 pool, I set it to use a stratum proxy for 1 of the servers, and connect directly to the getwork server on the same pool for the other server. It switches itself back and forth every so often, but doesn't stop hashing and restart.

That's the trick! Server with 2 NICs on 2 different IPs but same port. Switches between the two but doesn't reset!

My blades have been rebooting every hour to the second since I got them. This suggestion is what worked for me with a few changes. Evidently the blades do not like having the same server IP address listed for primary and secondary. What I did was bind a second IP address to the same NIC on my Linux box running the stratum proxy. Changed the secondary address on the blade to the new secondary IP bound on the stratum proxy and problem gone.

Thanks to bluedragon and gbsray for pointing me in the right direction.
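As a rough illustration of the check described above, here is a minimal Python sketch that flags a primary/secondary pair sharing an IP address. The `Server` type, the function name, and the addresses and port are all hypothetical; the blade firmware exposes nothing like this, and reports in this thread differ on whether a shared IP with different ports is actually a problem.

```python
from typing import NamedTuple


class Server(NamedTuple):
    """One server entry as typed into the blade's config page (hypothetical model)."""
    host: str
    port: int


def check_failover_pair(primary, secondary):
    """Return a list of warnings for a primary/secondary server pair."""
    warnings = []
    if primary == secondary:
        warnings.append("primary and secondary are identical (same IP and port)")
    elif primary.host == secondary.host:
        warnings.append("primary and secondary share an IP address")
    return warnings


if __name__ == "__main__":
    # Example addresses are made up for illustration only.
    same = check_failover_pair(Server("192.168.1.50", 8332), Server("192.168.1.50", 8332))
    split = check_failover_pair(Server("192.168.1.50", 8332), Server("192.168.1.51", 8332))
    print(same)
    print(split)
```

An empty list means the pair uses two distinct IP addresses, which is the arrangement that stopped the hourly reboots in my case.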

gbsray

member

Activity: 126

Merit: 10

Quote from: bluedragon on December 12, 2013, 11:34:05 AM

The setup that's been most stable for me is setting it to use 2 different servers. Because I want it to stay on 1 pool, I set it to use a stratum proxy for 1 of the servers, and connect directly to the getwork server on the same pool for the other server. It switches itself back and forth every so often, but doesn't stop hashing and restart.

That's the trick! Server with 2 NICs on 2 different IPs but same port. Switches between the two but doesn't reset!

defcon23

legendary

Activity: 1120

Merit: 1002

For everyone who needs help setting up your blades, just PM me ;)

cheers !

bluedragon

newbie

Activity: 29

Merit: 0

The setup that's been most stable for me is setting it to use 2 different servers. Because I want it to stay on 1 pool, I set it to use a stratum proxy for 1 of the servers, and connect directly to the getwork server on the same pool for the other server. It switches itself back and forth every so often, but doesn't stop hashing and restart.

mitymouse

newbie

Activity: 19

Merit: 0

I'm also having problems with the single blade I just received restarting every hour. I was having problems with it restarting every 2 minutes at first so I moved it in front of the Watchguard firewall and connected it directly to the Comcast business router instead. That fixed the 2 minute reboots but now it restarts every hour like clockwork.

Right now I've got it using a stratum proxy running on Windows Server 2012. I'm going to try switching to other proxies to see if it's something on the Windows server that is causing an interruption.

pontiacg5

sr. member

Activity: 364

Merit: 250

I'd try and neuter my home network to see if that fixes the blades. I have a hunch it's something else on your network causing the blades to lose connection. I remember reading about Android phones' Wi-Fi causing problems, but never noticed a problem with my nex4.

My blades are fine, though not on a backplane.

Miner_2049er

newbie

Activity: 32

Merit: 0

I had a similar anomaly yesterday after I tried overclocking a Blade, and not only did it affect the Blade
that I modified, it also caused frequent reboots of the three other Blades. See my recent post
in this section for more details.

I then reverted back to the stock clock and the modified Blade then worked normally and
the three others have now been running for over 12 hours with no reboots.

No changes to my proxy since I installed it a week+ ago.

There have been other mysterious Blade proxy/reboot issues posted here recently.
I wonder if they are related? If I had to guess, I would guess that there is a
firmware problem relating to basic network access on the Blade that can cause
problems with other devices on the subnet. So the problem with the OP could be
narrowed down to 1 or more Blades that may be interfering with others.
Temporarily isolating them (just pull the Cat-5) may "fix" others from
rebooting. Very interested in the final outcome.

gbsray

member

Activity: 126

Merit: 10

Same exact problem for me, guys. I have 10 v2 blades running on a backplane with the specified HP server power supply and lots of cooling. When I point them to the cloud stratum proxy that a gentleman on the forums set up, it works fine: the blades do not restart and mine near the rated 10.7 GH/s for days on end. When I point them to my proxy (proxy_mining.exe) running on a Windows 2008 server in a data center with a 100+ Mbps connection to the internet, they stop mining and restart about every 2-7 minutes. I'm running with debugging on right now. I've narrowed it down to one error message, "share below expected target", then the clean job=false message a few times, then they restart.

Can anyone provide assistance?

MUCH appreciated!

bluedragon

newbie

Activity: 29

Merit: 0

I've had some trouble with my blade restarting too. When I first set it up, it restarted every 5-7 minutes. After cycling the power off and on, it stayed up without restarting for a few hours. In my case, if it gets to the 8 minute mark, it will stay up.

It stayed up for over a week until I manually restarted it today (moved it to my backup pool for a while), and I noticed it restarting itself every 5-7 minutes again. It was okay again after I moved it back to my main pool.

I also noticed that it switches pools about every hour. (Which is annoying because I'd like to set a backup pool, but it seems to load balance instead.)

AnatomicFlack

newbie

Activity: 21

Merit: 0

Still doesn't make it over the hour mark without restarts.

The network is still alive while it's fighting to find a pool because I can still hit the admin page and watch it switching back and forth between the two server configs... but no communication is coming from the miners, as I'm watching both the mining_proxy and BFGminer and neither is seeing any traffic during that 2-1/2 min span before it restarts.

Are there any other pages on the admin side of things for more information?

The last thing I will have to try is a straight getwork server from the blade.

vm1990

legendary

Activity: 1540

Merit: 1002

the only other solution is to point it to a getwork protocol server like bitminter and see what happens

jcppkkk

newbie

Activity: 2

Merit: 0

I am mining with the 38G blade cube, and I have the same issue too. At first the cube was fine and could stay up for days. However, it started restarting itself after I moved it to a new environment. Since the cube itself has not changed, the only differences are the PSU and the network environment.

I have changed the PSU to a server PSU rated at 12 V, 100 A, but it's still restarting itself every hour. So the PSU might not be the cause.

I have used stratum-mining-proxy for weeks. The cube ran well with stratum-mining-proxy in my old environment but not in the new one. So I changed to bfgminer today, but still no luck.

The restart timing is interesting because it's independent of when I restart my cube. At first the restarts came at 32 minutes past every hour, but after a few days that changed to around 52 minutes past every hour. No matter when I manually restart the cube, it always sticks to a fixed minute past the hour. So it might be something broadcast on the LAN periodically, or the router getting overloaded each hour, or something else.

Maybe sniffing packets on the router and looking at what is being sent and received at the moment of the restart would help.
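To make the "fixed minute past every hour" observation checkable, here is a minimal Python sketch. It assumes you have noted the restart times by hand; the timestamps below are invented for illustration, and the 2-minute tolerance is an arbitrary choice. If restarts cluster on one minute of the hour regardless of when the cube was last power-cycled, that points at something wall-clock driven on the network rather than an uptime timer in the device.

```python
from collections import Counter
from datetime import datetime


def restart_minutes(timestamps):
    """Count restarts by their minute past the hour."""
    return Counter(ts.minute for ts in timestamps)


def looks_wall_clock_aligned(timestamps, tolerance=2):
    """True if every restart falls within `tolerance` minutes of the most common minute.

    Distance is measured around the clock face, so :59 and :01 are 2 minutes apart.
    """
    if len(timestamps) < 2:
        return False  # not enough observations to say anything
    common = restart_minutes(timestamps).most_common(1)[0][0]
    return all(
        min((ts.minute - common) % 60, (common - ts.minute) % 60) <= tolerance
        for ts in timestamps
    )


if __name__ == "__main__":
    # Hypothetical log: restarts noted at roughly :52 on three consecutive hours.
    observed = [
        datetime(2013, 12, 1, 10, 52),
        datetime(2013, 12, 1, 11, 52),
        datetime(2013, 12, 1, 12, 53),
    ]
    print(restart_minutes(observed))
    print(looks_wall_clock_aligned(observed))
```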

AnatomicFlack

newbie

Activity: 21

Merit: 0

Thanks for the suggestion, definitely not that though.

I have Verizon FiOS and run my own pfsense router/firewall, with multiple VPNs for my business, so I'm well aware of IP changes when they do happen. FiOS is pretty stable in that regard, on average I get a new IP every few months.

I do have 3 other machines mining and they don't have the same issues. However, they are all different setups. One is my Win7x64 machine mining LTC on my crossfire 6790s, a second is an rPi with 12 Block Erupters, and the last is actually the same machine I'm using as the proxy, running Ubuntu server 13.10, which has my BFL 30GH, a single bitfury red 2.7GH, and 7 Block Erupters. I originally bought that machine, a Gigabyte Brix, to replace the rPi as the single USB connected miner, but it has an Intel Haswell USB3 controller, which I learned the hard way has SEVERELY limited resources for the number of connected devices. So even though it's USB3 (with no USB2s) I can only connect about 12 devices before the system bitches that the controller has no more available resources.

vm1990

legendary

Activity: 1540

Merit: 1002

this might not be a pool problem but it could be your isp. some isps do rotate ips every so often but it's normally once a day, not every hour

if you want, try bypassing your proxies and point them to my slush's proxy

37.187.72.157
port 8334

everything else stays the same
if it stays stable feel free to use it as a backup proxy (it is always worth having an offsite proxy just in case of a computer crash)

AnatomicFlack

newbie

Activity: 21

Merit: 0

Ok... so at the 56 min mark I watched all 10 blades keep switching servers for about 2 full minutes, but nothing was actually leaving the blade and hitting either BFG or the proxy.

So the blades are just sorta quitting mining and frantically trying to recover by switching servers.

The two servers it's swapping between are both the same IP, so I guess the next step to eliminate any last variables will be to have a different IP in the config.

AnatomicFlack

newbie

Activity: 21

Merit: 0

Update: Ok, so I FINALLY got to watch what happened when it was in this restart mode.

At about the 56 min mark the miners all stop... wattage drops from its normal 800W to idling at about 340W. It stays there until about 58:30 when it restarts and winds back up again about 20-30 seconds later. So it would appear it's a pool issue... but WHY? In my config I was pointing both server entries at the same server. So now I've changed to the primary being my BFGminer and the secondary being my proxy_miner on the same box... diff ports of course.

So perhaps it's trying to switch pools (why I have no idea) and freaking out when the current and new are the same. The way I'll know is whether, at the 56 mark, the admin page shows it has switched to the other server. If it doesn't change and still reboots, I'm going to have to say this is firmware on the blade asking it to do this ridiculous action.

stacksmasher

full member

Activity: 121

Merit: 100

Someone was posting in another thread about this being what the blade does if it cannot connect to the pool... I wonder, since these are only 10/100, if network congestion could be an issue?

AnatomicFlack

newbie

Activity: 21

Merit: 0

I know I'm mostly talking to myself... ;)

... but taking down the one "bad" blade didn't stop it from rebooting. So dropped it to 7 blades, to see if it makes any difference. T-minus 30 minutes till next reboot, or maybe not.

Thinking I will try 5 next if 7 doesn't work... and frankly if 5 doesn't work, I'm going back to 10, rebooting or not, I'd still rather be mining with reboots than limping along.

AnatomicFlack

newbie

Activity: 21

Merit: 0

So for giggles, I'm going on a hunch, and pulling the single card that's not hashing as well. So now the PSU is only running 9 blades, so if the PSU was slightly overloaded, perhaps this will resolve it. Either way that blade's errors % kept climbing pretty quickly over time. When I first fired up the blade it was around 4.5%, as of now before pulling it the errors were up to 7.6% while the rest are between 0.75 and 1.25%.

Here is the BFGminer output just after pulling that blade:

Code:

bfgminer version 3.6.0 - Started: [2013-11-28 00:47:58] - [ 0 days 15:50:59]
[M]anage devices [P]ool management [S]ettings [D]isplay options [H]elp [Q]uit
Connected to stratum-lb-usa48.btcguild.com diff 64 with stratum as user XXXXXXX_Blade
Block: ...82f92e0f #272015 Diff:609M ( 4.36Ph/s) Started: [16:38:46]
ST:3 F:0 NB:120 AS:0 BW:[ 57/ 47 B/s] E:446.44 I: 3.38mBTC/hr BS:626k
9 | 93.38/101.6/98.10Gh/s | A:20410 R:278+128(2.0%) HW:21226/1.6%
--------------------------------------------------------------------------------
PXY 0: | 9.98/10.18/ 9.90Gh/s | A: 2074 R: 30+ 13(2.0%) HW: 992/.73%
PXY 1: | 10.27/10.19/ 9.86Gh/s | A: 2052 R: 22+ 20(2.0%) HW: 1702/1.3%
PXY 2: | 12.39/10.15/ 9.84Gh/s | A: 2049 R: 26+ 13(1.9%) HW: 1517/1.1%
PXY 3: | 9.34/10.20/ 9.91Gh/s | A: 2088 R: 22+ 14(1.7%) HW: 1549/1.1%
PXY 4: | 11.86/10.13/ 9.88Gh/s | A: 2019 R: 31+ 6(1.8%) HW: 903/.67%
PXY 5: | 9.66/10.21/ 9.93Gh/s | A: 2009 R: 25+ 17(2.0%) HW: 988/.73%
PXY 6: | DEAD /10.00/ 9.06Gh/s | A: 1837 R: 25+ 11(1.9%) HW:10084/7.6%
PXY 7: | 9.51/10.20/ 9.86Gh/s | A: 2124 R: 38+ 11(2.3%) HW: 1471/1.1%
PXY 8: | 10.44/10.20/ 9.94Gh/s | A: 2083 R: 28+ 11(1.8%) HW: 977/.72%
PXY 9: | 8.86/10.18/ 9.90Gh/s | A: 2075 R: 31+ 12(2.0%) HW: 1044/.77%
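For anyone who wants to spot the odd blade out without eyeballing the table, here is a minimal Python sketch that pulls the HW error percentage from each `PXY` line of a status dump like the one above and flags devices far above the median. It only parses the text layout shown here (it is not a BFGminer API), and the "3x the median" threshold is an arbitrary assumption.

```python
import re
import statistics

# Matches e.g. "PXY 6: | DEAD /10.00/ 9.06Gh/s | A: 1837 R: 25+ 11(1.9%) HW:10084/7.6%"
LINE_RE = re.compile(r"^(PXY\s*\d+):.*HW:\s*(\d+)/\s*(\d*\.?\d+)%")


def hw_error_rates(status_text):
    """Map device name -> HW error percentage, taken from the last column."""
    rates = {}
    for line in status_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name = " ".join(match.group(1).split())  # normalise spacing in the name
            rates[name] = float(match.group(3))
    return rates


def outliers(rates, factor=3.0):
    """Devices whose HW error % exceeds `factor` times the median (assumed threshold)."""
    if not rates:
        return []
    median = statistics.median(rates.values())
    return [name for name, rate in rates.items() if rate > factor * median]


if __name__ == "__main__":
    sample = """\
PXY 5: | 9.66/10.21/ 9.93Gh/s | A: 2009 R: 25+ 17(2.0%) HW: 988/.73%
PXY 6: | DEAD /10.00/ 9.06Gh/s | A: 1837 R: 25+ 11(1.9%) HW:10084/7.6%
PXY 7: | 9.51/10.20/ 9.86Gh/s | A: 2124 R: 38+ 11(2.3%) HW: 1471/1.1%
"""
    rates = hw_error_rates(sample)
    print(rates)
    print(outliers(rates))
```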

AnatomicFlack

newbie

Activity: 21

Merit: 0

Quote from: wsoei on November 28, 2013, 12:51:28 PM

A guy on Slush's pool IRC said his was resetting every 2 minutes after he placed all 10 on the backplane. After some testing, they were stable with 7 blades. At which point he shared his PSU. I didn't write down the name of the PSU he had, but another member recommended this PSU to him:

http://www.amazon.com/HP-403781-001-DL380-1000W-Supply/dp/B001U0EM1W/ref=sr_1_2?ie=UTF8&qid=1385616853&sr=8-2&keywords=ATSN+7001044-Y000

That's about as much info as I can give you since I don't own the blades.

That's one of the PSUs I bought to try. So, good!

I can simulate the 2 min interval reboot, which seems to be part of the blade's firmware: if it doesn't sense a connection to any pool, it reboots after roughly 2 minutes, assuming it's "dead/sick."

Since it's such a regular interval I have a hard time believing it's lack of power, but it's certainly possible. When I checked the blades' power usage with the Kill A Watt I saw about 650W at full load. I will have to throw it back on and keep an eye on what it does when the unit reboots. I haven't had a chance to catch it in the act yet to see if the PSU has any change in the LED or something like that.
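A back-of-the-envelope version of that power reasoning, as a minimal Python sketch. The 650 W figure is the wall reading quoted above; the 1000 W rating is taken only from the title of the linked PSU listing, and the 85% conversion efficiency is purely an assumption, so treat the result as a rough estimate rather than a measurement.

```python
def psu_headroom(wall_watts, psu_rated_watts, efficiency=0.85):
    """Estimate DC load, remaining headroom, and load fraction for a PSU.

    wall_watts      -- power measured at the wall socket
    psu_rated_watts -- rated DC output of the supply
    efficiency      -- assumed AC-to-DC conversion efficiency (a guess, not measured)
    """
    dc_load = wall_watts * efficiency       # power actually delivered to the blades
    headroom = psu_rated_watts - dc_load    # rated capacity left over
    load_fraction = dc_load / psu_rated_watts
    return dc_load, headroom, load_fraction


if __name__ == "__main__":
    dc_load, headroom, fraction = psu_headroom(650, 1000)
    print(f"estimated DC load: {dc_load:.0f} W")
    print(f"estimated headroom: {headroom:.0f} W")
    print(f"load fraction: {fraction:.0%}")
```

Under those assumptions the supply sits well below its rating, which fits the hunch that a power shortfall alone would not produce such a regular reboot interval.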

There has to be a reasonable explanation for this, even if it's a hardware problem.

wsoei

newbie

Activity: 46

Merit: 0

A guy on Slush's pool IRC said his was resetting every 2 minutes after he placed all 10 on the backplane. After some testing, they were stable with 7 blades. At which point he shared his PSU. I didn't write down the name of the PSU he had, but another member recommended this PSU to him:

http://www.amazon.com/HP-403781-001-DL380-1000W-Supply/dp/B001U0EM1W/ref=sr_1_2?ie=UTF8&qid=1385616853&sr=8-2&keywords=ATSN+7001044-Y000

That's about as much info as I can give you since I don't own the blades.

AnatomicFlack

newbie

Activity: 21

Merit: 0

I have a complete setup that I purchased new from eBay and set up just a few days ago. They are (10) V2 blue boards and came with the backplane and the HP power supply.

I'm having a very strange issue. It will mine like a champ for just shy of an hour, then every single blade restarts.

Every hour (at 58:30, to be exact), the entire setup restarts itself (according to the uptime on each). Every single blade on the unit restarts. I originally thought it might be firmware on the individual blades, but I reset one board independently to see if it would "miss" the restart (since they all normally power up at the same exact time the power is turned on) and it still reset with the rest of the blades. The only way it could be the blades is if they are communicating somehow across the backplane, but I didn't think that was going on.
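The "did it miss the restart" test above can be sketched in a few lines of Python. This assumes you have noted each blade's uptime (in seconds) at the same polling moments, for example by reading the admin pages by hand; the blade names and numbers below are made up. A reboot shows up as an uptime that is lower than the previous sample.

```python
def reboot_indices(uptimes):
    """Sample indices where uptime dropped, i.e. a reboot happened since the last sample."""
    return [i for i in range(1, len(uptimes)) if uptimes[i] < uptimes[i - 1]]


def simultaneous_reboots(samples_by_blade):
    """Sample indices at which every blade shows a reboot."""
    reboot_sets = [set(reboot_indices(u)) for u in samples_by_blade.values()]
    if not reboot_sets:
        return []
    return sorted(set.intersection(*reboot_sets))


if __name__ == "__main__":
    # Hypothetical uptimes in seconds, sampled every 5 minutes.
    samples = {
        "blade_a": [3000, 3300, 90, 390],
        "blade_b": [3000, 3300, 85, 385],
        # blade_c was reset by hand before the second sample, yet still drops again.
        "blade_c": [3000, 120, 95, 395],
    }
    print(simultaneous_reboots(samples))
```

If the manually reset blade still appears in the common set, its own uptime timer is not what triggers the restart.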

So at this point I believe I have narrowed it down to being the PSU. I am familiar with servers, and realize this is just a hot swap power supply for an HP server. I'm wondering if the particular one I have is resetting itself because it's not seeing some other communication from the actual server it's expecting to be plugged into and it's restarting itself in hopes to recover.

I found a couple of the two models of PSU that work on Amazon and picked up one of each (relatively cheap) to hedge the bet that it's something funky going on with the PSU.

Did I miss a step, or does anyone have any insight into what might be going on?

If it makes any difference, the boards are actually from jonesgear, by way of the ebay seller. So on request of the ebay seller I tried to contact jonesgear directly to ask what might be going on.

Any help, or insight, would be greatly appreciated. For completeness, I have been using both BFGminer's stratum proxy and slush's to try to get this working, and have used both BTCguild's and Slush's pools to see if pool communication had anything to do with this. Both responded the same, and the blades restarted regardless of the combination.

Graph of uptime, showing the regular restarts of every blade: (you can also see the efficiency is getting roasted on each restart, and one blade is definitely worse than the rest at about 5% HW)
https://lh6.googleusercontent.com/-HXsLrkJ62-4/Upd43zkfbnI/AAAAAAAAy3k/8rLfBhGMfO4/s800/Chart%2520context%2520menu%2520-%2520Google%2520Chrome%252011282013%2520115238%2520AM.jpg

Config screen from one of the blades:
https://lh5.googleusercontent.com/-3tLsQP9E_jU/Upd444g-QNI/AAAAAAAAy3s/rVpXUvYJrNg/s800/Fullscreen-capture-11282013-115257-AM.png
