
Topic: Avalon ASIC users thread - page 154. (Read 438516 times)

sr. member
Activity: 490
Merit: 255
June 26, 2013, 08:55:34 AM
ckolivas -

Do you think generated heat or the power capacity of all the other chips is causing instability on these overclocked machines?
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 26, 2013, 08:46:41 AM
I've uploaded new firmware

http://ck.kolivas.org/apps/cgminer/avalon/20130626-1/openwrt-ar71xx-generic-tl-wr703n-v1-squashfs-factory.bin

This one includes a minor calculation bug fix internally, and I changed the web UI to allow you to discretely set any frequency you like instead of the drop-down box, changing the default from 282 to 300 since I don't think anyone runs these at 282 any more. This will allow you to manually fine tune it yourself, should you find a sweet spot for your hardware. I now have the highest hashrate I've ever had on a stock batch 2 at 83.7GH so far (but it's been only 20 mins so that might be just luck). EDIT: it was just luck, but it's still never been higher.
legendary
Activity: 1666
Merit: 1185
dogiecoin.com
June 26, 2013, 07:52:58 AM
Something is wrong.
Cgminer says 80 GH/s average but Slush's pool only 70 GH/s...
Pool figures are just estimates. As long as the shares match on cgminer and the pool, everything is okay.
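
To see why a pool's number is only an estimate: a pool cannot observe the miner directly, so it has to infer hashrate from the shares it receives over some window. Below is a minimal Python sketch, assuming the standard relation that one difficulty-1 share represents about 2^32 hashes on average. The function names and sample numbers are illustrative assumptions, not Slush's actual method.

```python
import math

HASHES_PER_DIFF1_SHARE = 2 ** 32  # average hashes needed to find one difficulty-1 share


def estimate_hashrate(shares, share_diff, window_seconds):
    """Estimate hashrate (hashes/s) from shares accepted in a time window."""
    return shares * share_diff * HASHES_PER_DIFF1_SHARE / window_seconds


def relative_noise(shares):
    """Rough one-sigma relative error, treating share arrival as a Poisson process."""
    return 1.0 / math.sqrt(shares)


# Illustrative numbers only: 1000 difficulty-16 shares counted over one hour.
rate = estimate_hashrate(1000, 16, 3600)
print(f"{rate / 1e9:.1f} GH/s, +/- {relative_noise(1000) * 100:.1f}%")
```

The fewer shares in the window, the noisier the estimate, which is one reason a short-window pool reading can sit well away from cgminer's long-run average.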
fhh
legendary
Activity: 1206
Merit: 1000
June 26, 2013, 07:50:11 AM
Something is wrong.
Cgminer says 80 GH/s average but Slush's pool only 70 GH/s...

take a look at the rejects in cgminer Status


3.1.1 crashed after about 5 hours running very constantly @325MHz
will now give 350 a try
sr. member
Activity: 266
Merit: 250
June 26, 2013, 06:55:41 AM
Something is wrong.
Cgminer says 80 GH/s average but Slush's pool only 70 GH/s...
legendary
Activity: 1666
Merit: 1185
dogiecoin.com
June 26, 2013, 06:50:58 AM
Can anyone think of a reason not to get this PSU for a 3 module silver and a 3 module black Avalon with overclocking intentions ?
http://www.nexustek.nl/Nexus_RX-1K_Modular_Quiet_Power_Supply_1000W.htm
single rail 12V 80A
ATX 20+4 pins
ATX12V/EPS12V 4+4 pins
4x PCI-E 6+2 pins (only 2 will be used)
Case dimensions - W x H x D (mm) 150 x 86 x 165

TIA for your thoughts!
1) It's not a major brand
2) It can only provide 900W on the 12V. You'll see some PSUs rated at 1KW providing 1100W. So consider this an 850W, I wouldn't push it much more.
3) Don't expect it to last 10 years
hero member
Activity: 826
Merit: 1001
June 26, 2013, 05:42:19 AM
Can anyone think of a reason not to get this PSU for a 3 module silver and a 3 module black Avalon with overclocking intentions ?
http://www.nexustek.nl/Nexus_RX-1K_Modular_Quiet_Power_Supply_1000W.htm
single rail 12V 80A
ATX 20+4 pins
ATX12V/EPS12V 4+4 pins
4x PCI-E 6+2 pins (only 2 will be used)
Case dimensions - W x H x D (mm) 150 x 86 x 165

TIA for your thoughts!
fhh
legendary
Activity: 1206
Merit: 1000
June 26, 2013, 03:08:06 AM
so everything is fine with HW 1,2%

but I got a lot of rejected shares from bitparking, about 10%; the gain in hashrate with 341 MHz was nearly eaten up

now running at fixed 325 MHz and no rejected shares
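
A quick worked check of how rejects eat into an overclock, as a Python sketch. The helper name is made up for illustration; the inputs are the rough figures reported in this thread (about 79 GH/s shown locally, around 10% rejects).

```python
def effective_hashrate(local_ghs, reject_fraction):
    """Hashrate the pool actually credits once rejected shares are discounted."""
    return local_ghs * (1.0 - reject_fraction)


# About 79 GH/s locally with roughly 10% of shares rejected.
print(round(effective_hashrate(79.0, 0.10), 1))  # 71.1
```

That lands close to the roughly 71 GH/s seen at the pool, which is consistent with the extra speed being lost to rejects.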
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 26, 2013, 02:15:14 AM
Auto tries to keep HW errors below 1.5%. Are you sure you're not mining at a higher diff?

So the shown HW errors are a multiple of the diff I'm mining at?
having a higher percentage

cgminer restarted in the night, MHz is again at 341

I'm getting a high rate of rejects from the pool, so cgminer is showing me nearly 79 GHash/s, but on the pool bitparking it's only around 71 GHash/s, like it was at 300 MHz?
watching this
The hardware errors need to be divided by the diff... I'm absolutely sure you're not at 10-20% errors.
fhh
legendary
Activity: 1206
Merit: 1000
June 26, 2013, 02:12:21 AM
Auto tries to keep HW errors below 1.5%. Are you sure you're not mining at a higher diff?

So the shown HW errors are a multiple of the diff I'm mining at?
having a higher percentage

cgminer restarted in the night, MHz is again at 341

I'm getting a high rate of rejects from the pool, so cgminer is showing me nearly 79 GHash/s, but on the pool bitparking it's only around 71 GHash/s, like it was at 300 MHz?
watching this
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 26, 2013, 12:58:36 AM
Very much dependent on the chips, so this can only be a wild guess, but... 80+ degrees?

thank you CKolivas! 80C will cook me and my tiny apartment. I think I need to figure a way to vent the heat directly outside without letting the rain and snow in,
Hah, well don't take my word for it, as I said, it's pure speculation.
legendary
Activity: 1512
Merit: 1000
@theshmadz
June 26, 2013, 12:56:21 AM
I am seeing little or no improvement by cooling with a portable A/C.

Unit with A/C
1h 37m 58s  83896.29  temp3 43    freq(auto) 354

Unit without A/C
6h 54m 22s  83111.32   temp3 53    freq(auto) 353
I guessed this might be the case since the temperatures really aren't getting into the error range even with regular air cooling - especially since it's 3 degrees at my home overnight and the hashrate doesn't go up. I suspect the hashrate will only get higher with more voltage given to the chips.

just curious what might the "error range" be? it's getting rather hot in here and I think I might have to buy another AC unit... summer is right around the corner...
Very much dependent on the chips, so this can only be a wild guess, but... 80+ degrees?

thank you CKolivas! 80C will cook me and my tiny apartment. I think I need to figure a way to vent the heat directly outside without letting the rain and snow in,
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 26, 2013, 12:39:41 AM
I am seeing little or no improvement by cooling with a portable A/C.

Unit with A/C
1h 37m 58s  83896.29  temp3 43    freq(auto) 354

Unit without A/C
6h 54m 22s  83111.32   temp3 53    freq(auto) 353
I guessed this might be the case since the temperatures really aren't getting into the error range even with regular air cooling - especially since it's 3 degrees at my home overnight and the hashrate doesn't go up. I suspect the hashrate will only get higher with more voltage given to the chips.

just curious what might the "error range" be? it's getting rather hot in here and I think I might have to buy another AC unit... summer is right around the corner...
Very much dependent on the chips, so this can only be a wild guess, but... 80+ degrees?
legendary
Activity: 1512
Merit: 1000
@theshmadz
June 26, 2013, 12:34:56 AM
I am seeing little or no improvement by cooling with a portable A/C.

Unit with A/C
1h 37m 58s  83896.29  temp3 43    freq(auto) 354

Unit without A/C
6h 54m 22s  83111.32   temp3 53    freq(auto) 353
I guessed this might be the case since the temperatures really aren't getting into the error range even with regular air cooling - especially since it's 3 degrees at my home overnight and the hashrate doesn't go up. I suspect the hashrate will only get higher with more voltage given to the chips.

just curious what might the "error range" be? it's getting rather hot in here and I think I might have to buy another AC unit... summer is right around the corner...
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 26, 2013, 12:10:03 AM
I am seeing little or no improvement by cooling with a portable A/C.

Unit with A/C
1h 37m 58s  83896.29  temp3 43    freq(auto) 354

Unit without A/C
6h 54m 22s  83111.32   temp3 53    freq(auto) 353
I guessed this might be the case since the temperatures really aren't getting into the error range even with regular air cooling - especially since it's 3 degrees at my home overnight and the hashrate doesn't go up. I suspect the hashrate will only get higher with more voltage given to the chips.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 25, 2013, 10:11:07 PM
Thanks for the detailed info! I noticed that during the first 10 or so hours the overclocked avalon was stable, but then it became more and more unstable. Even though the outside temp dropped significantly during the night, cgminer restarted repeatedly. I feel the instability might come from the FPGA. What could be the cause of that? Have you observed the same accumulated instability over time?

P.S. also sent 1B to you, cgminer still rules
And thank you

I'm sure instability can manifest in any number of ways, and it's probably either resetting the device regularly due to the chips failing or idling frequently due to the PSU not keeping up or something along those lines.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 25, 2013, 10:08:19 PM
How is the auto balancing between maximising clock speed but minimising fan speed? What is the hierarchy?
Unlike the GPU code, they're totally independent. Clock speed is determined solely by hardware errors, whereas fan speed is determined by temperature. HW errors tend to run hand in hand with temperature rise on this sort of hardware, whereas GPUs are designed to be deterministic right up to failure, so HW errors are meant to almost never happen.
legendary
Activity: 1988
Merit: 1012
Beyond Imagination
June 25, 2013, 10:00:26 PM
A few notes about the auto-clocking approach.

First and foremost, you can fry your hardware as you are running your avalon out of specification, especially if you try it on a batch 1 device with its lower power and quality PSU.

As is virtually always the case, manually fine tuning the final result will always be better than an automated process that guesses. With time I wish to get rid of the requirement to have fixed intervals and allow the user to specify any arbitrary value for the frequency, though the interface coping with it is a bit of an issue at the moment.

Ironically some people are finding the frequency a little too high and others a little too low. I suspect everyone is looking at a different endpoint for what is an ideal frequency in their eyes. The targets I've set are based on hardware error as a percentage, with hysteresis of +/- 0.25%. This is because a 0.5% increase in hardware errors works out to the amount the hashrate would rise with 2 MHz increments; i.e. if your hardware error count is going up at the same rate as the hashrate should rise, you are wasting energy. Ideally, a regression plot is what would be needed, getting the hashrate rise with each increment and the HW error percentage rise, and seeing when one grows faster than the other, but this is absurd stats to try to go looking for, especially when the values fluctuate wildly even under normal circumstances. By default with avalon-auto, you will get hardware errors of 1~1.5%. When looking at the hardware error count, make sure you are comparing it to the diff1 shares and not the accepted, since you will almost certainly be mining at higher diff. Hardware errors are harmless in their own right but indicative of how hard you're pushing the chips for their available voltage and cooling. It sounds like these chips are capable of much more with more voltage, but no one's done said mod yet.
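
The targeting described above can be pictured as a simple hysteresis rule. This is only a rough Python sketch, not cgminer's actual code: the function name is hypothetical, the 1.25% centre is an assumption read off the stated 1~1.5% range, and real firmware would also need frequency limits and a settling period between changes.

```python
STEP_MHZ = 2           # increment size mentioned in the post
TARGET_PCT = 1.25      # assumed centre of the 1~1.5% range
HYSTERESIS_PCT = 0.25  # +/- band around the target


def next_frequency(current_mhz, hw_error_pct):
    """Return the frequency to try next, given the measured hardware error percentage."""
    if hw_error_pct > TARGET_PCT + HYSTERESIS_PCT:
        return current_mhz - STEP_MHZ  # too many errors: back off
    if hw_error_pct < TARGET_PCT - HYSTERESIS_PCT:
        return current_mhz + STEP_MHZ  # headroom left: push a little harder
    return current_mhz                 # inside the band: leave it alone
```

The dead band is what stops the clock from flapping up and down on every noisy reading.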

The way to calculate hardware error percentage is:
HW * 100 / (diff1 + HW)
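
As a small Python sketch of that formula, with made-up counters to show why comparing against accepted shares is misleading at higher diff. The variable names are illustrative, and deriving diff1 from accepted shares times share difficulty is a simplifying assumption for the example.

```python
def hw_error_percent(hw, diff1):
    """Hardware error percentage: HW * 100 / (diff1 + HW)."""
    total = diff1 + hw
    if total == 0:
        return 0.0
    return hw * 100.0 / total


# Made-up counters for a miner working at share difficulty 16.
share_diff = 16
accepted = 100
hw = 20
diff1 = accepted * share_diff  # assumption: diff1 work approximated from accepted shares

naive = hw * 100.0 / accepted          # wrong: HW compared to accepted shares
correct = hw_error_percent(hw, diff1)  # right: HW compared to diff1 work
print(f"naive: {naive:.1f}%  correct: {correct:.2f}%")
```

With these numbers the naive comparison reads as 20%, while the formula gives a little over 1.2%.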

It's also worth mentioning that to simplify the calculation of different frequencies, the values passed to the avalon with this latest firmware on the "regular values", i.e. 300 and below, are slightly lower than the values that would have been passed to it before, but it should make only a negligible difference to hashrate, lost in the noise of normal variance that happens with hashrate. The "timeout" value passed is also smaller now, which means you may hit the limit at lower speeds than you used to, but the old timeouts were too high, and even if you apparently had a higher hashrate, if you go back and check your stats you may find you were getting more rejects. This is because the higher timeouts were leading to duplicate shares being generated, so it is only a disadvantage.

A sure fire sign that you're overdoing it is cgminer repeatedly being restarted by the avalon watchdog, or periods of hashrate dropping, or smoke coming out of your PSU.

Thanks for the detailed info! I noticed that during the first 10 or so hours the overclocked avalon was stable, but then it became more and more unstable. Even though the outside temp dropped significantly during the night, cgminer restarted repeatedly. I feel the instability might come from the FPGA. What could be the cause of that? Have you observed the same accumulated instability over time?

P.S. also sent 1B to you, cgminer still rules
legendary
Activity: 1666
Merit: 1185
dogiecoin.com
June 25, 2013, 09:56:35 PM
How is the auto balancing between maximising clock speed but minimising fan speed? What is the hierarchy?
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
June 25, 2013, 08:44:48 PM
Hi ckolivas, thanks for everything. What I have noticed is STROMBOM's firmware + 325 MHz is giving about 1-2% HW errors. On the other hand, the latest you put out with the --auto and temp targeting is throwing 15-20% HW errors at the same clock. Is there any way to combine the best of both worlds and get strombom's level of HW errors, but with the ability to control the temp? Would be greatly appreciated.
No idea why that would be the case. Auto tries to keep HW errors below 1.5%. Are you sure you're not mining at a higher diff?

Right? No idea here either. I literally have the exact settings saved and switched between the two firmwares. Auto was setting the clock to 327, but even without Auto and manually set to 325, the HW error was still 15-20%, compared to strombom's 2%. So weird!
Try restarting it a few times from the interface perhaps? I find it a bit less reliable to start up normally. But yeah, I don't know why that would be the case...

Tried restarting multiple times from the interface, still seeing 15-20% HW errors. Soo weird.
Hmm... Auto won't start changing clocks unless the actual nonces returned are within 10% of expected, so perhaps try enabling auto and start at lower clocks like 300.