Topic: Hacking BFL Monarchs and servicing them while times are weird. - page 9. (Read 21034 times)

legendary
Activity: 2212
Merit: 1001
While it's a big effort, I would think "immersion cooling" would solve the problem for the FETs. The fan and radiator could be lifted from the structure, and the actual hashing board could be immersed. This is all off the cuff on my part, since I have never actually done it, only seen it done on the Cray-2 Supercomputer in the 1980s.

Just my $.02 on the topic (which is EXTREMELY interesting).

It has been done with a KnC board as a test, not sure if he followed through on it ;)

https://bitcointalksearch.org/topic/--7216
alh
legendary
Activity: 1843
Merit: 1050
While it's a big effort, I would think "immersion cooling" would solve the problem for the FETs. The fan and radiator could be lifted from the structure, and the actual hashing board could be immersed. This is all off the cuff on my part, since I have never actually done it, only seen it done on the Cray-2 Supercomputer in the 1980s.

Just my $.02 on the topic (which is EXTREMELY interesting).
hero member
Activity: 568
Merit: 500
Are you using the restartoncrash with the .bat file or bfgminer.exe?
legendary
Activity: 3094
Merit: 2239
I fix broken miners. And make holes in teeth :-)
Shoot. Did you try the auto-restarter thing?

C
Don't think that would work. A manual restart gives me a blank bfgminer screen after probing the com ports; I need to restart the pc and powercycle the miners to get something hashing again. Now 4.7 behaves erratically too, and I'm on 4.8 again.

Ok. 4.8 is solid here on app restarts and hasn't crashed yet. That 4.2 code worked even hot starting the miner. Haven't tried running it for any length of time on my um.. raspberry pi. Note it takes a minute after the code comes up to start hashing, might be doing speed checks. If you run multiple miners that time may be additive (check back in 10 mins for example).

C
hero member
Activity: 568
Merit: 500
Shoot. Did you try the auto-restarter thing?

C
Don't think that would work. A manual restart gives me a blank bfgminer screen after probing the com ports; I need to restart the pc and powercycle the miners to get something hashing again. Now 4.7 behaves erratically too, and I'm on 4.8 again.
legendary
Activity: 3094
Merit: 2239
I fix broken miners. And make holes in teeth :-)
So in the meantime I have been thinking about cranking up the speed on this unit. 700gh is nice, but maybe we can get a bit more. Like 800-900gh.

From what I can see, the big problem isn't heat on the chips; those water blocks are keeping the chips nice and cool. Way cooler than the BFL singles, and because the chips seem to be thinner, they can dump heat up as easily as down. We should be able to warm them up more, especially as it gets cold.

The problem is the FETs. Specifically high-side. I noticed that even with that big heat sink and fan on the back of the board, plus the heat sinks on top, one side was running at 100c. That's hot. Putting a fan on the front side dropped it into the 60's, but it points out that if we go faster we need to dump heat. And fast.

We could put a corsair type water block on the back to cool the plane, but I don't think that's where the heat is. And we would need a double sized one to cool under the chokes as well. So what kind of top cooling solution would provide:

1) A way to dissipate the heat off the tops of those FETs into a water block
2) A way to move heat off the FETs *fast* as it builds up.

That's the trick: A standard heat sink isn't going to cut it. Maybe a heat pipe system designed for memory chips or something. Does anyone make custom heat pipes or something like it?

Then we either need code or a way to figure out how to fake the clock signals. Don't have a clue on either. The VRM looks to be hand-hackable, the reference voltage is set via an 8 bit input. Might be able to hot-wire it.

Hm.

C
legendary
Activity: 3094
Merit: 2239
I fix broken miners. And make holes in teeth :-)
Shoot. Did you try the auto-restarter thing?

C
hero member
Activity: 568
Merit: 500
BFGMiner running without shutting down for 3 days now, on 4.7 and both monarchs on an un-powered usb hub.

Edit: and 1 hour later it has shut down again. Trying 4.8 now...
legendary
Activity: 3094
Merit: 2239
I fix broken miners. And make holes in teeth :-)
There are 64 'tiles' per chip and 16 engines per tile for a total of 1024 engines per chip.  I've not run across a pair of chips that have all 1024 running on each (2048 per board), I think the closest was around 2028.

Ok, so each "tile" is kind of like a little BFL chip of yore. Those had 16 little engines in each chip, but were addressed in software as a single "chip". Different chips had different numbers of engines running, a few had all 16, most had 14-15 and a really crappy chip would have only 8 engines (65nm chips).

Question in my mind is what makes a "B" chip. I'd kind of expect engines to be out at random, some cortexes with all engines, some dead as rocks based on the variations in manufacture. Or maybe the missing engines will run at lower clocks, and this is the best balance of power to performance.

Meantime today's runs included stopping the BFGMiner and restarting a few times without powering down the Monarch. I'm running on a single 500 watt Corsair power supply with both plugs plugged in, keeping it simple. No errors at all with the exception of the queue errors that pop up because of stale jobs left when you stop and start. It hums quietly, does not change the hum, does not pop any errors once running. Likewise speed is a solid 690gh, solid as a rock.

So I think I can say the following:

  • If the unit is running and hashing normally, the odd errors do not pop up
  • If I restart BFGMiner when there are no errors, the errors do not show up again once running, with one exception (the stale-job queue errors mentioned above)
  • If I power the unit off then on, the pitch of the FETs/chokes changes and the errors start popping up at random
  • If I leave it off for over an hour, the errors go away upon restart and the FETs are quieter

Never dull.

C
hero member
Activity: 658
Merit: 500
CCNA: There i fixed the internet.
Lightfoot, here is a plot for you of the Cortex listings:

LINK

Feel free to leave comments in the sheet and here


Sheet updated with Slok's 2 boards and breakdowns
hero member
Activity: 532
Merit: 500
Also what speed is being reported by Eligius? Say the 22 minute numbers, do they match the lower or higher of the 3 BFG numbers?

The pool is kind of the final solution on speed, you're not getting rejects so I'm curious.
22.5 minutes 809.57 Gh/s 254464sh, went for a drink, refreshed the page, and....22.5 minutes 765.59 Gh/s 240640sh, next refresh and the same 22.5 minutes   809.57Gh/s 254464sh again.

I never considered eligius' status page to be accurate, is the 22 min. number any good as a reference? BFGminer shows 776 low, 807 mid, 854 high, with peaks up to 940.
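
For anyone wanting to sanity-check the pool figure: the 22.5 minute numbers quoted above look like they come straight from the share count. Here is a minimal Python sketch, assuming "sh" means difficulty-1-equivalent shares and that each such share stands for 2^32 hashes on average. The function name is made up for illustration, and none of this is taken from Eligius' own code.

```python
# Assumption: "sh" on the status page is difficulty-1-equivalent shares.
HASHES_PER_DIFF1_SHARE = 2 ** 32  # expected hashes per difficulty-1 share


def pool_hashrate_ghs(shares, window_minutes):
    """Estimate GH/s from difficulty-1-equivalent shares in a time window."""
    seconds = window_minutes * 60
    return shares * HASHES_PER_DIFF1_SHARE / seconds / 1e9


# The two readings quoted in the post above.
for shares in (254464, 240640):
    print(f"{shares} sh -> {pool_hashrate_ghs(shares, 22.5):.2f} GH/s")
```

Under that assumption both readings (809.57 and 765.59 Gh/s) reproduce from their share counts, so the swing between refreshes would simply be the window catching a different number of shares.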
BCP both units have the solid on led, that goes on/off when hashing. One has a second led just below the first one, it blinks fast about 6 times (so fast it's impossible to count the actual number), then it goes off for a sec., it does this from powering up, and keeps doing it. There are more leds, just noticed 1 or 2 between the pcb and the copper plate, solid on, I see the reflection on the sledge's base plate under the right chip.

Lightfoot have you tried getting anything by jtag, or is that what you got as said earlier, with chiliflash?

Ognasty, I restart the pc and miners every time I make a change or after shutting down bfgminer, shouldn't be a problem there. The fan on the fets made no change so far, after 6 hours hashing.

edit; no time for the utp now, but I doubt it will change anything.
There are actually 4 lit LEDs under the water block there (7 total LEDs), they indicate good voltages present for 5.0, 3.3, 1.8 and either 1.5/1.2 volts.

As seen from the data you uploaded, you have a board with 'B' chips on it as the engine counts are roughly 800 engines per chip.

There are 64 'tiles' per chip and 16 engines per tile for a total of 1024 engines per chip.  I've not run across a pair of chips that have all 1024 running on each (2048 per board), I think the closest was around 2028.
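
The tile and engine arithmetic can be checked against the "Total Engines" lines in Slok's dump (1652 and 1584). This is a rough Python sketch. It assumes two chips per board, as the "ASIC Installed: 2" line suggests, and the constant and function names are only for illustration.

```python
TILES_PER_CHIP = 64
ENGINES_PER_TILE = 16
CHIPS_PER_BOARD = 2  # assumption, taken from "ASIC Installed: 2" in the dump


def engine_yield(total_engines):
    """Return (average engines per chip, fraction of all engines running)."""
    max_engines = TILES_PER_CHIP * ENGINES_PER_TILE * CHIPS_PER_BOARD
    return total_engines / CHIPS_PER_BOARD, total_engines / max_engines


for total in (1652, 1584):
    per_chip, fraction = engine_yield(total)
    print(f"{total} engines: {per_chip:.0f} per chip, {fraction:.1%} of 2048")
```

That gives 826 and 792 engines per chip on average, which fits the "roughly 800 engines per chip" description of 'B' chips.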

legendary
Activity: 3094
Merit: 2239
I fix broken miners. And make holes in teeth :-)
Quick update, purring like a kitten, 680gh, no errors.

I'll take it apart tomorrow. I have to see what the board looks like. I guess it's bye bye warranty or something.

C
hero member
Activity: 658
Merit: 500
CCNA: There i fixed the internet.
Chart pron this evening once I get off work!

Things I noticed at a glance:

All cores on both boards are pretty significantly below 5G compared to lightfoot's which had a high of like 6.9-7G

Your right board has a totally dead cortex; either a broken data line or malformed hashing engines. Methinks data line, see next line.

The cortex and ASIC channel count on right board is 127 vs left(128)

Queue depth differs by 2 as well

This would lead to ~ 3 queue items per cortex (128*3=384, against a reported 383; 127*3=381)

-Taug
hero member
Activity: 568
Merit: 500
Thanks Taugeran,

DEVICE: BitForce SC-28nm SHA256              DEVICE: BitForce SC-28nm SHA256
FIRMWARE: 1.3.0                              FIRMWARE: 1.3.0
Serial Number: 4170662                       Serial Number: 4170752
ASIC Installed: 2                            ASIC Installed: 2
IAR Executed: NO                             IAR Executed: NO
PLL Latency: 3                               PLL Latency: 3
Channel Parallelization: YES @ 16            Channel Parallelization: YES @ 16
Max Queue ID: FFFF                           Max Queue ID: FFFF
Scan Interval: 50ms                          Scan Interval: 50ms
Total Engines: 1652                          Total Engines: 1584
CORTEX-00: 4590MH/s                          CORTEX-00: 3311MH/s
CORTEX-01: 4515MH/s                          CORTEX-01: 4214MH/s
CORTEX-02: 3672MH/s                          CORTEX-02: 4354MH/s
CORTEX-03: 4144MH/s                          CORTEX-03: 4214MH/s
CORTEX-04: 3848MH/s                          CORTEX-04: 4108MH/s
CORTEX-05: 3732MH/s                          CORTEX-05: 2528MH/s
CORTEX-06: 3848MH/s                          CORTEX-06: 3160MH/s
CORTEX-07: 3311MH/s                          CORTEX-07: 2952MH/s
CORTEX-08: 4264MH/s                          CORTEX-08: 3978MH/s
CORTEX-09: 3864MH/s                          CORTEX-09: 3672MH/s
CORTEX-0A: 4424MH/s                          CORTEX-0A: 4214MH/s
CORTEX-0B: 4508MH/s                          CORTEX-0B: 4515MH/s
CORTEX-0C: 4284MH/s                          CORTEX-0C: 4354MH/s
CORTEX-0D: 3010MH/s                          CORTEX-0D: 3672MH/s
CORTEX-0E: 2709MH/s                          CORTEX-0E: 3672MH/s
CORTEX-0F: 3010MH/s                          CORTEX-0F: 2576MH/s
CORTEX-10: 4186MH/s                          CORTEX-10: 3311MH/s
CORTEX-11: 4043MH/s                          CORTEX-11: 3311MH/s
CORTEX-12: 4043MH/s                          CORTEX-12: 3978MH/s
CORTEX-13: 4108MH/s                          CORTEX-13: 4214MH/s
CORTEX-14: 3256MH/s                          CORTEX-14: 3504MH/s
CORTEX-15: 3421MH/s                          CORTEX-15: 2408MH/s
CORTEX-16: 3600MH/s                          CORTEX-16: 2844MH/s
CORTEX-17: 3612MH/s                          CORTEX-17: 3732MH/s
CORTEX-18: 3924MH/s                          CORTEX-18: 3504MH/s
CORTEX-19: 4592MH/s                          CORTEX-19: 3848MH/s
CORTEX-1A: 3660MH/s                          CORTEX-1A: 3256MH/s
CORTEX-1B: 3848MH/s                          CORTEX-1B: 4736MH/s
CORTEX-1C: 4108MH/s                          CORTEX-1C: 3978MH/s
CORTEX-1D: 4018MH/s                          CORTEX-1D: 3732MH/s
CORTEX-1E: 3157MH/s                          CORTEX-1E: 4340MH/s
CORTEX-1F: 4018MH/s                          CORTEX-1F: 3913MH/s
CORTEX-20: 2952MH/s                          CORTEX-20: 2664MH/s
CORTEX-21: 5152MH/s                          CORTEX-21: 3796MH/s
CORTEX-22: 3978MH/s                          CORTEX-22: 3157MH/s
CORTEX-23: 4043MH/s                          CORTEX-23: 4018MH/s
CORTEX-24: 3612MH/s                          CORTEX-24: 3311MH/s
CORTEX-25: 3848MH/s                          CORTEX-25: 3010MH/s
CORTEX-26: 2960MH/s                          CORTEX-26: 4440MH/s
CORTEX-27: 3256MH/s                          CORTEX-27: 3732MH/s
CORTEX-28: 3597MH/s                          CORTEX-28: 3256MH/s
CORTEX-29: 4186MH/s                          CORTEX-29: 3612MH/s
CORTEX-2A: 3864MH/s                          CORTEX-2A: 4515MH/s
CORTEX-2B: 3476MH/s                          CORTEX-2B: 4305MH/s
CORTEX-2C: 3366MH/s                          CORTEX-2C: 4088MH/s
CORTEX-2D: 4515MH/s                          CORTEX-2D: 4515MH/s
CORTEX-2E: 3913MH/s                          CORTEX-2E: 2920MH/s
CORTEX-2F: 4200MH/s                          CORTEX-2F: 4144MH/s
CORTEX-30: 3936MH/s                          CORTEX-30: 4144MH/s
CORTEX-31: 4830MH/s                          CORTEX-31: 3311MH/s
CORTEX-32: 3924MH/s                          CORTEX-32: 4088MH/s
CORTEX-33: 4043MH/s                          CORTEX-33: 3366MH/s
CORTEX-34: 4424MH/s                          CORTEX-34: 3672MH/s
CORTEX-35: 3913MH/s                          CORTEX-35: 3540MH/s
CORTEX-36: 4590MH/s                          CORTEX-36: 3978MH/s
CORTEX-37: 3060MH/s                          CORTEX-37: 4088MH/s
CORTEX-38: 4662MH/s                          CORTEX-38: 2754MH/s
CORTEX-39: 4152MH/s                          CORTEX-39: 2336MH/s
CORTEX-3A: 4662MH/s                          CORTEX-3A: 3311MH/s
CORTEX-3B: 4186MH/s                          CORTEX-3B: 3311MH/s
CORTEX-3C: 3060MH/s                          CORTEX-3C: 3157MH/s
CORTEX-3D: 3792MH/s                          CORTEX-3D: 3913MH/s
CORTEX-3E: 2488MH/s                          CORTEX-3E: 3892MH/s
CORTEX-3F: 2709MH/s                          CORTEX-3F: 4088MH/s
CORTEX-40: 4515MH/s                          CORTEX-40: 2408MH/s
CORTEX-41: 3978MH/s                          CORTEX-41: 3366MH/s
CORTEX-42: 4440MH/s                          CORTEX-42: 3212MH/s
CORTEX-43: 3796MH/s                          CORTEX-43: 3396MH/s
CORTEX-44: 3552MH/s                          CORTEX-44: 4512MH/s
CORTEX-45: 3612MH/s                          CORTEX-45: 3780MH/s
CORTEX-46: 3492MH/s                          CORTEX-46: 0MH/s
CORTEX-47: 3552MH/s                          CORTEX-47: 2849MH/s
CORTEX-48: 4354MH/s                          CORTEX-48: 2142MH/s
CORTEX-49: 4144MH/s                          CORTEX-49: 3783MH/s
CORTEX-4A: 3672MH/s                          CORTEX-4A: 3978MH/s
CORTEX-4B: 3504MH/s                          CORTEX-4B: 4214MH/s
CORTEX-4C: 3848MH/s                          CORTEX-4C: 3245MH/s
CORTEX-4D: 3783MH/s                          CORTEX-4D: 3906MH/s
CORTEX-4E: 4018MH/s                          CORTEX-4E: 3240MH/s
CORTEX-4F: 4018MH/s                          CORTEX-4F: 3614MH/s
CORTEX-50: 4665MH/s                          CORTEX-50: 3311MH/s
CORTEX-51: 4816MH/s                          CORTEX-51: 3060MH/s
CORTEX-52: 4736MH/s                          CORTEX-52: 4380MH/s
CORTEX-53: 3552MH/s                          CORTEX-53: 4284MH/s
CORTEX-54: 4736MH/s                          CORTEX-54: 4380MH/s
CORTEX-55: 4018MH/s                          CORTEX-55: 3060MH/s
CORTEX-56: 3504MH/s                          CORTEX-56: 4088MH/s
CORTEX-57: 3492MH/s                          CORTEX-57: 4088MH/s
CORTEX-58: 4424MH/s                          CORTEX-58: 3212MH/s
CORTEX-59: 3913MH/s                          CORTEX-59: 3492MH/s
CORTEX-5A: 4590MH/s                          CORTEX-5A: 2920MH/s
CORTEX-5B: 4088MH/s                          CORTEX-5B: 4354MH/s
CORTEX-5C: 3731MH/s                          CORTEX-5C: 3256MH/s
CORTEX-5D: 3913MH/s                          CORTEX-5D: 2870MH/s
CORTEX-5E: 2920MH/s                          CORTEX-5E: 2870MH/s
CORTEX-5F: 4365MH/s                          CORTEX-5F: 4018MH/s
CORTEX-60: 3366MH/s                          CORTEX-60: 3396MH/s
CORTEX-61: 4380MH/s                          CORTEX-61: 3256MH/s
CORTEX-62: 4736MH/s                          CORTEX-62: 4320MH/s
CORTEX-63: 3366MH/s                          CORTEX-63: 4365MH/s
CORTEX-64: 4074MH/s                          CORTEX-64: 4018MH/s
CORTEX-65: 4144MH/s                          CORTEX-65: 3906MH/s
CORTEX-66: 4380MH/s                          CORTEX-66: 4365MH/s
CORTEX-67: 4018MH/s                          CORTEX-67: 3552MH/s
CORTEX-68: 4354MH/s                          CORTEX-68: 3731MH/s
CORTEX-69: 4665MH/s                          CORTEX-69: 4170MH/s
CORTEX-6A: 4284MH/s                          CORTEX-6A: 4380MH/s
CORTEX-6B: 3913MH/s                          CORTEX-6B: 3962MH/s
CORTEX-6C: 3256MH/s                          CORTEX-6C: 4365MH/s
CORTEX-6D: 3552MH/s                          CORTEX-6D: 4018MH/s
CORTEX-6E: 4144MH/s                          CORTEX-6E: 3627MH/s
CORTEX-6F: 3962MH/s                          CORTEX-6F: 4380MH/s
CORTEX-70: 4648MH/s                          CORTEX-70: 3731MH/s
CORTEX-71: 4354MH/s                          CORTEX-71: 3201MH/s
CORTEX-72: 3965MH/s                          CORTEX-72: 2960MH/s
CORTEX-73: 4816MH/s                          CORTEX-73: 3796MH/s
CORTEX-74: 4144MH/s                          CORTEX-74: 3311MH/s
CORTEX-75: 3552MH/s                          CORTEX-75: 3504MH/s
CORTEX-76: 4018MH/s                          CORTEX-76: 4380MH/s
CORTEX-77: 3157MH/s                          CORTEX-77: 3146MH/s
CORTEX-78: 3864MH/s                          CORTEX-78: 3913MH/s
CORTEX-79: 4830MH/s                          CORTEX-79: 4590MH/s
CORTEX-7A: 4515MH/s                          CORTEX-7A: 3848MH/s
CORTEX-7B: 4284MH/s                          CORTEX-7B: 4088MH/s
CORTEX-7C: 4284MH/s                          CORTEX-7C: 4214MH/s
CORTEX-7D: 3311MH/s                          CORTEX-7D: 3731MH/s
CORTEX-7E: 3396MH/s                          CORTEX-7E: 3906MH/s
CORTEX-7F: 2664MH/s                          CORTEX-7F: 4018MH/s
Total Processing Power: 502813 MH/s          Total Processing Power: 467971 MH/s
ASIC CORTEX Count: 128                       ASIC CORTEX Count: 127
ASIC Channels: 128                           ASIC Channels: 127
Queue Depth:383                              Queue Depth:381
Critical Temperature: 0                      Critical Temperature: 0
Total ASIC Thermal Cycles: 0                 Total ASIC Thermal Cycles: 0
Total PCB Thermal Cycles: 0                  Total PCB Thermal Cycles: 0
OK                                           OK
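
If anyone wants to crunch dumps like this without retyping them into a sheet, here is a small Python sketch that splits the side-by-side CORTEX lines into two boards and flags dead cortexes. It assumes the two-column paste layout used above. The function names are made up, and only a four-line excerpt of the dump is used as sample input.

```python
import re

# Matches one "CORTEX-xx: nnnnMH/s" entry; a side-by-side line holds two.
CORTEX_RE = re.compile(r"CORTEX-([0-9A-F]{2}):\s*(\d+)MH/s")


def parse_cortex_columns(dump):
    """Split a side-by-side dump into (left, right) dicts of id -> MH/s."""
    left, right = {}, {}
    for line in dump.splitlines():
        entries = CORTEX_RE.findall(line)
        if len(entries) == 2:
            (left_id, left_mhs), (right_id, right_mhs) = entries
            left[left_id] = int(left_mhs)
            right[right_id] = int(right_mhs)
    return left, right


def summarize(board):
    """Return (live cortex count, ids reporting 0 MH/s, summed MH/s)."""
    dead = sorted(cid for cid, mhs in board.items() if mhs == 0)
    return len(board) - len(dead), dead, sum(board.values())


# Excerpt from the dump above (rows 44 to 47 only).
SAMPLE = """\
CORTEX-44: 3552MH/s                          CORTEX-44: 4512MH/s
CORTEX-45: 3612MH/s                          CORTEX-45: 3780MH/s
CORTEX-46: 3492MH/s                          CORTEX-46: 0MH/s
CORTEX-47: 3552MH/s                          CORTEX-47: 2849MH/s
"""

left, right = parse_cortex_columns(SAMPLE)
print("left :", summarize(left))
print("right:", summarize(right))
```

Run over the full paste, the same approach should list CORTEX-46 as the only dead one on the right board, matching the 127 cortex count the firmware reports.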
legendary
Activity: 3094
Merit: 2239
I fix broken miners. And make holes in teeth :-)
Thanks. In the meantime I found if I power the miner off, then on I can get the spurious queue errors. If I power off then leave it off for 1.5 hours they don't come back when I power on and mine. I'll let it run today to verify, and start building a Raspberry Pi so I can catch the errors. If anyone else has a dump of the queue errors and can send them to me, PM me.

C
hero member
Activity: 658
Merit: 500
CCNA: There i fixed the internet.
Slok, your units are more interesting now. If you can run the ZCX command and tell us how many and what speed the cortexes are, we can compare them to mine. Maybe your units run all of them slow. Maybe you have B grade chips and whole bunches are not running. If they are all running slow, we might be able to boost.

C
ok, but how, can't find that chiliflash app.

Current Firmware: https://www.dropbox.com/s/zoewijezzhfl3bs/Chili14e.hex
Flash Utility: https://www.dropbox.com/s/xpccbhfkbpinov8/ChiliFlash.exe
Utility Source: https://www.dropbox.com/s/zn8gkojly2f87wx/ChiliFlash.rar

Alternate FW:
Voltage limited to 1.1V, can prevent overcurrent conditions on cards with almost all cores enabled.
https://www.dropbox.com/s/rfa4kuigov99sg8/Chili14e1v1.hex
1V limited, to bring power use down to ~4J/GH
https://www.dropbox.com/s/b9f9ne7ilwl3ap8/Chili14e1v0.hex

Previous FW
https://www.dropbox.com/s/8qhhoqmvtk6i6jj/Chili14d.hex

Flashing instructions:
Open the exe file in Windows (or inspect the source of the VS project and compile yourself), and select the comm port for the card from the drop down list. Note that you need the default FTDI/VCP drivers (ie, the ones you'd use for bfgminer) installed to see a comm port, you can't have the WinUSB drivers at this point.
Open the port, and you should see a bunch of information including the current FW version, the number of cores on each chip and their frequency.
Click browse and navigate to the hex file you want to flash, and then click program. The progress bar will increment and the LEDs on the unit will binary count as it programs. When it's done, the program will prompt you to power cycle the unit, and the inner and outer 4 LEDs on the board will flash alternately until you do.


Should you need to program the FTDI chip, you can use the following links
FTDI Programming Template: https://www.dropbox.com/s/gkbcih32atez8qb/BFL%20FTDI.xml
FT_PROG utility: http://www.ftdichip.com/Support/Utilities/FT_Prog_v2.8.2.0.zip
hero member
Activity: 568
Merit: 500
Slok, your units are more interesting now. If you can run the ZCX command and tell us how many and what speed the cortexes are, we can compare them to mine. Maybe your units run all of them slow. Maybe you have B grade chips and whole bunches are not running. If they are all running slow, we might be able to boost.

C
ok, but how, can't find that chiliflash app.
hero member
Activity: 658
Merit: 500
CCNA: There i fixed the internet.
If he throws numbers up I can throw together charts to start a comparison table and averages etc
legendary
Activity: 3094
Merit: 2239
I fix broken miners. And make holes in teeth :-)
No whining sound at all, I'm pretty happy with the units and their speed, just curious what the errors mean and a possible fix, if necessary. About that second led too, what does it indicate?
Running both off a single server psu, I was wondering, could a 60GH SC psu (the BFL ones) cope with the 210-250W draw of a 400GH unit?
Well, hm. If fans make no diff and your units run cool we can deduce that at 200gh per chip the FETs are not getting very hot. At 350gh per chip (me) they start to get hot and need a fan. They can run at 100c, but like on the singles, that's hot.

A limiter might be the FETs, if we put a water block on them and the FET drivers it might allow us to draw off a lot of heat as we go to 800 and beyond. So.... Who do we know that can find water blocks of a size that can cover all six power channels from FET to FET?

That's project 1. Get that in place, I'll order one, and we can start cranking.

Slok, your units are more interesting now. If you can run the ZCX command and tell us how many and what speed the cortexes are, we can compare them to mine. Maybe your units run all of them slow. Maybe you have B grade chips and whole bunches are not running. If they are all running slow, we might be able to boost.

C
hero member
Activity: 568
Merit: 500
No whining sound at all, I'm pretty happy with the units and their speed, just curious what the errors mean and a possible fix, if necessary. About that second led too, what does it indicate?
Running both off a single server psu, I was wondering, could a 60GH SC psu (the BFL ones) cope with the 210-250W draw of a 400GH unit?