
Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 24

donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
It's been running almost 6 hours now, and it's not getting the performance I was getting with .92b:

H:143/71,0,71 X:231 C:162,140,160 E:0/0,0,0 T:1m   |  H:178/60,59,58 E:18/22,11,20 A:777 R:14 T:5h17m3s  

With eligius reporting 3h avg. of just ~180MH/s. Lots of invalids: [ztex:0:2  ]   invalid nonce: 0x8b170ab6

It's probably the clock calibration code.

What kind of results do you get when you set the frequencies manually and disable clock calibration with

  java -Dtriconemining.recalibrate_clock=false -jar tml.jar ztex:0

?
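
For readers unfamiliar with the `-D` syntax: it sets a JVM system property that the program can read at startup. A minimal sketch of how such a flag is usually consumed follows. The class name and the default of "true" are assumptions for illustration, not taken from the TML source.

```java
// Sketch only: how a -D flag like the one above is typically read in Java.
// The default of "true" is an assumption, not taken from the TML source.
final class RecalibrateFlag {
    static final String KEY = "triconemining.recalibrate_clock";

    static boolean recalibrateEnabled() {
        // -Dkey=value on the command line ends up in the JVM system properties
        return Boolean.parseBoolean(System.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        System.out.println(KEY + " = " + recalibrateEnabled());
    }
}
```

Note that `-D` options must come before `-jar`; anything after the jar name is passed to the program as an ordinary argument.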
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Code:
Caused by: java.lang.IndexOutOfBoundsException: Device number out of range. Valid numbers are 0..-1
        at ztex.ZtexScanBus1.device(ZtexScanBus1.java:174)
        ... 8 more

Yes, I get this too from time to time.  It's a bug in ztex's USB interface.  Sometimes it just gets into a weird state and refuses to appear on the USB bus.  I don't have the time/resources/etc to track it down, but he should.  Anyways, power-cycling the board always fixes it.


I then restarted the computer, powered off all boards and tried again. It's working nicely now. I get quite a few errors, but it's trying to run at crazy frequencies :). I'll leave it to mine for a while and report stats later when it converges to the optimal clock rate. I installed some RAM heat sinks on the underside as well.

Thanks!  You're running 0.95, right?  0.93 definitely has a bug in the clock calibration code.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Exception in thread "main" java.io.IOException: java.lang.RuntimeException: Error sending data, ztex error -22

That's an error from ztex's code.

Any ideas?

Try powering the board off for 5 seconds and powering it back on.

Doesn't help.
This issue is 100% reproducible for me.


Nobody else has reported this, and -22 is not a valid error code for libusb.

Try a JTAG cable and see if that works.

Also, please report this to ztex as a bug in his USB interface (don't worry, it's far from being the first one!)
hero member
Activity: 560
Merit: 500
It's been running almost 6 hours now, and it's not getting the performance I was getting with .92b:

H:143/71,0,71 X:231 C:162,140,160 E:0/0,0,0 T:1m   |  H:178/60,59,58 E:18/22,11,20 A:777 R:14 T:5h17m3s  

With eligius reporting 3h avg. of just ~180MH/s. Lots of invalids: [ztex:0:2  ]   invalid nonce: 0x8b170ab6
hero member
Activity: 560
Merit: 500
Had some difficulties starting up initially:

Code:
Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.triconemining.board.Ztex$ZtexChip.<init>(Ztex.java:114)
        at com.triconemining.board.Ztex$ZtexBoard.<init>(Ztex.java:44)
        at com.triconemining.board.Ztex.getBoard(Ztex.java:32)
        at com.triconemining.bitcoin.miner.Main.main(Main.java:358)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.triconemining.board.Ztex$ZtexChip.<init>(Ztex.java:96)
        ... 3 more
Caused by: java.lang.IndexOutOfBoundsException: Device number out of range. Valid numbers are 0..-1
        at ztex.ZtexScanBus1.device(ZtexScanBus1.java:174)
        ... 8 more

I then restarted the computer, powered off all boards and tried again. It's working nicely now. I get quite a few errors, but it's trying to run at crazy frequencies :). I'll leave it to mine for a while and report stats later when it converges to the optimal clock rate. I installed some RAM heat sinks on the underside as well.
sr. member
Activity: 448
Merit: 250
Exception in thread "main" java.io.IOException: java.lang.RuntimeException: Error sending data, ztex error -22

That's an error from ztex's code.

Any ideas?

Try powering the board off for 5 seconds and powering it back on.

Also, make sure his miner isn't running in the background.

Doesn't help.
This issue is 100% reproducible for me.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
at the default clock everything was invalid and the 'beta' was stuck at 0.001, meaning the clock kept increasing.

Okay, the good news is that I think I fixed this.  I've posted 0.95 which includes the fix.

The bad news is that it still takes about three hours to "find" the optimal frequency.  Yeah, I know this is not acceptable.  I'm working on getting it to converge faster.  The problem is that it needs "enough" samples just above and just below the optimal frequency in order to be convinced it's in the right place -- otherwise it will keep wandering around.
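
To illustrate why samples on both sides of the target matter, here is a hypothetical sketch of one way a regression-based calibrator could work. It is not the TML calibration code. It assumes the error rate grows roughly exponentially with clock frequency, fits ln(error rate) = a + b * MHz by least squares, and solves for the clock that gives a chosen error rate. The class name, the model and the synthetic data are all assumptions.

```java
// Hypothetical sketch, NOT the TML calibration code: fit ln(errorRate) = a + b*f
// by ordinary least squares, then solve for the clock that hits a target error rate.
final class ClockFitSketch {
    /** Returns {a, b} for ln(err) = a + b * mhz; samples with err <= 0 are skipped. */
    static double[] fit(double[] mhz, double[] err) {
        double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < mhz.length; i++) {
            if (err[i] <= 0) continue;               // ln(0) is undefined
            double x = mhz[i], y = Math.log(err[i]);
            n++; sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        double denom = n * sxx - sx * sx;
        if (n < 2 || denom == 0) {
            throw new IllegalArgumentException("need samples at two or more distinct clocks");
        }
        double b = (n * sxy - sx * sy) / denom;      // slope
        double a = (sy - b * sx) / n;                // intercept
        return new double[] { a, b };
    }

    /** Clock (MHz) at which the fitted model predicts the given error rate. */
    static double clockForErrorRate(double[] ab, double targetErr) {
        return (Math.log(targetErr) - ab[0]) / ab[1];
    }

    public static void main(String[] args) {
        double[] mhz = { 150, 160, 170 };
        double[] err = { Math.exp(-5), Math.exp(-4), Math.exp(-3) }; // synthetic data
        double[] ab = fit(mhz, err);
        System.out.printf("clock for 1%% errors: %.1f MHz%n", clockForErrorRate(ab, 0.01));
    }
}
```

In a model like this, samples taken only below the target (where errors are near zero and get skipped) or only above it leave the slope poorly constrained, which is consistent with the wandering behaviour described above.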
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I had no luck with the newest bitstreams: 92a and 92b couldn't run at more than 100MHz, yet 91 was working at 142 IIRC.

Yes, I apologize for the lack of new bitstreams the last few days.  Xilinx's tools have a huge pile of undocumented rules about BUFGMUX placement and if you run afoul of them you don't find out until ~20hrs into the build.  This has happened three times now.  I appreciate your patience.  Under ordinary circumstances there should be a new build every night.  In my experience simply experimenting with different bitstreams on different chips tends to get you an extra 10MH/s simply due to random variations from chip to chip and bitstream to bitstream aligning -- although you have to have several to try and right now there are only three (two of which have known defects!).

This is one reason why I put a lot of effort into ensuring that every release of the software is compatible with all previously-released bitstreams.  Your mixing-and-matching bitstreams and jarfiles is actually supported!  I will try to maintain this.

I know it's probably tiresome to hear me keep saying "wait for the next build" but I'm pretty much in the same boat. :)


I ramped up the voltage quickly from 1.3 to 1.33 and I am seeing some very nice results.... at 164/172/165  (X:250) right now

Yowza.
hero member
Activity: 714
Merit: 500
The dogs bark, the caravans pass.
Heya,

I had no luck with the newest bitstreams: 92a and 92b couldn't run at more than 100MHz, yet 91 was working at 142 IIRC.
93 was giving me problems as well (only the Java code is different): at the default clock everything was invalid and the 'beta' was stuck at 0.001, meaning the clock kept increasing.
Now, I tried this with the 91/92 combo but there was no difference. This time, the 91 bitstream with the 93 Java code ran with all shares accepted at its default clocks, so I ramped up the voltage quickly from 1.3 to 1.33 and I am seeing some very nice results.... at 164/172/165  (X:250) right now. Shares are accepted and it has only been running for 15m, so the clocks are still going up and down.... I can't post any final results but this is way better than anything before for me.

How to do it:

Code:
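mkdir tml-0.91 tml-0.93
(cd tml-0.91 && jar xvf ../tml-0.91.jar)
(cd tml-0.93 && jar xvf ../tml-0.93.jar)
# copy the 0.91 bitstream over the one shipped with 0.93
cp tml-0.91/com/triconemining/board/tml-ztex.bit tml-0.93/com/triconemining/board/
# pass the 0.93 manifest explicitly so its Main-Class entry is kept
(cd tml-0.93 && jar cvfm ../tml-custom.jar META-INF/MANIFEST.MF .)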
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Licensing Clarification

At Luke-Jr's request…

If you've looked at the TML host-side software source code, you'll notice that several files have public domain headers, but most have "all rights reserved" headers.  The only reason I do not public-domain the whole thing (host-side) is that I do not want to wind up in a situation where somebody creates a more-popular fork and my users demand that I support somebody else's fork of my own code.  That is the one and only reason why the licenses are anything other than public-domain.  I've left them as "all rights reserved" since I don't know the right way to achieve this effect via licensing, and at the moment I have many other more pressing things to deal with than figuring it out.

I'd like to make it absolutely clear that reading the code in order to understand the protocol spoken between the TML and the Java code is expressly permitted and encouraged, although I make no guarantee about the stability of that interface over time.  The steps performed by the Java code while "mediating" the signcryption conversation between the TML and my servers are very likely to evolve.

I am not going to sue anybody over use or misuse of the Java code.  I just don't want to deal with supporting third-party forks right now; my plate is full as it is.

Also, at some point the Java code will export a JSON interface so you can "drive" it from code written in any language you like.  However, this is currently a fairly low-priority feature and I would not expect it to be ready in the next month or even two months.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Exception in thread "main" java.io.IOException: java.lang.RuntimeException: Error sending data, ztex error -22

That's an error from ztex's code.

Any ideas?

Try powering the board off for 5 seconds and powering it back on.

Also, make sure his miner isn't running in the background.
sr. member
Activity: 448
Merit: 250
I followed the instructions on the tricone-mining site, and this is what happened:

   *  it means your power supply is sagging.                        *
   *                                                                *
   ******************************************************************

[ztex:0    ] programming FPGA
[ztex:0    ]   done programming FPGA
Exception in thread "main" java.io.IOException: java.lang.RuntimeException: Error sending data, ztex error -22
        at com.triconemining.board.Ztex$ZtexChip.flush(Ztex.java:207)
        at com.triconemining.board.Ztex$ZtexChip.scan(Ztex.java:167)
        at com.triconemining.board.MiningChip.read(MiningChip.java:48)
        at com.triconemining.bitcoin.miner.Miner.checkMagicNumber(Miner.java:166)
        at com.triconemining.bitcoin.miner.Miner.<init>(Miner.java:34)
        at com.triconemining.bitcoin.miner.Main$1.<init>(Main.java:359)
        at com.triconemining.bitcoin.miner.Main.main(Main.java:359)
Caused by: java.lang.RuntimeException: Error sending data, ztex error -22
        at com.triconemining.board.Ztex$ZtexChip.flush(Ztex.java:193)
        ... 6 more


Any ideas?
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
TML 0.93 is posted.  Many new software features, no bitstream change.  TML 0.94 will change the bitstream but not the software.

Changelog:

Code:
30.Jun.2012  Release v0.93, no new bitstream
             Automatic clock calibration using logarithmic regression
             New, more-compact status line
             ANSI colorization
             Support for nexus6 boards
             added triconemining.num_tml_read_attempts
             DCM: only print mult/div when slow-starting
             derate default clocks for 0x4fea1574 to 162/150/148
             force load new job after invalid nonce
             Bug fixes

donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
enterpoint has sold over 100 boards (without a decent bitstream)

How did that happen?!?
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Maybe you should try porting it to cairnsmore1 next? I think enterpoint has sold over 100 boards (without a decent bitstream), so there's a market :)

I will be happy to provide them a bitstream within 24 hours of them submitting a BDK implementation.  It's really easy stuff, fill-in-the-blank.

I am not going to implement the software glue code for any more boards, including ztex 1.15y, so please don't ask.  This is a lot like how Linus and Microsoft don't write BIOS code -- they have the motherboard manufacturers do it.  If you want to sell a motherboard, you have to write a few lines of BIOS code.  It's not a big deal.

The boardmakers choose the chips to use as "glue logic".  Most of them used chips which are massive, massive, massive overkill for the task at hand -- far more complicated than is necessary for something as simple as bitcoin mining.  They picked these chips, they should deal with the complexity -- eat your own dog food.  After the headaches of dealing with ztex's interface (which is a whole microcontroller with its own freaking instruction set and compiler!) I realize that this is an incredibly inefficient use of my time.  Also, I don't have any other boards to test on anyways (and I don't want free boards so please don't offer them -- donating a free board seems to make hardware manufacturers feel entitled to something in return).  I did the ztex 1.15x implementation mainly so that I could see how my homebrew boards compare to something professionally designed/manufactured.

The BDK is basically two files of "fill-in-the-blank" code.  The Ztex implementation was only 20 lines before I converted it to use reflection (so it can compile without the ztex jar file) and I'll do the reflection-conversion for future BDK submissions myself.  Boardmakers already have code that does all of this stuff; it's just a matter of pasting the right code into the right blank and debugging it.  If they picked the "simplest glue chip that could possibly work" the debugging will be easy.  Sadly most of them didn't do this.
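
For anyone wondering what "converted it to use reflection" buys: the caller looks the vendor class up by name at run time, so it compiles even when the vendor jar is absent. The price is that failures inside the vendor code arrive wrapped in `InvocationTargetException`, as in the startup trace posted earlier in this thread. A minimal sketch of the pattern follows. `java.util.ArrayList` stands in for the vendor class so the example runs anywhere; the class and method names here are illustrative and no ztex or BDK API is used.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Sketch: calling a library purely by name so the caller compiles without it.
// java.util.ArrayList stands in for the vendor class; no ztex API is used here.
final class ReflectionSketch {
    /** Loads className, creates an instance, calls get(0); returns the real failure or null. */
    static Throwable callGetZero(String className) {
        try {
            Class<?> cls = Class.forName(className);            // resolved at run time only
            Object target = cls.getDeclaredConstructor().newInstance();
            Method get = cls.getMethod("get", int.class);
            get.invoke(target, 0);
            return null;
        } catch (InvocationTargetException e) {
            return e.getCause();                                 // the exception thrown by the callee
        } catch (ReflectiveOperationException e) {
            return e;                                            // class or method missing
        }
    }

    public static void main(String[] args) {
        System.out.println(callGetZero("java.util.ArrayList")); // empty list, so get(0) fails
        System.out.println(callGetZero("no.such.Driver"));      // class not on the classpath
    }
}
```

Unwrapping with `getCause()` is what turns a generic `InvocationTargetException` back into the underlying error, such as the `IndexOutOfBoundsException` seen when no device is found.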

So, please petition your boardmakers, not me.  That said, I'll accept BDK submissions from anybody -- so if you have a pile of boards from manufacturer XYZ and can't get XYZ to cooperate it might be a good idea to simply spend an afternoon doing it yourself.

Thanks.

End rant.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I'll try heatsink + 92b next, but after that I think it needs more power to go higher?

0.92b (which is 0.92 official) has a timing error in ring #2, although it's not as bad as the one in 0.92a.  Next bitstream should fix this and let ring #2 run faster.

There are also a few other improvements, mostly dealing with how the PLLs and BUFGMUXes are arranged.

We've had a run of bad luck with the last few builds (toolchain runs all the way up to the late stages of PAR before barfing), so this afternoon I will be releasing 0.93 with lots of new software features but the same bitstream.  As soon as we have a new bitstream that will be 0.94.

I think I'll try to maintain this pattern of odd-numbered releases being software-only changes and even numbered releases being bitstream-only changes.

Also, I have one backplane (6 chips) of my mine running the mainline codebase now.  Once that is stable for a few days I will switch the rest of the mine over, and once that runs for a few days I will declare things ready for production use (i.e. ok to expect 99.99% uptime from the signcryption servers).
hero member
Activity: 560
Merit: 500
It's been running overnight now and looks like it's going to converge to 227ish:



I'll try heatsink + 92b next, but after that I think it needs more power to go higher? Maybe you should try porting it to cairnsmore1 next? I think enterpoint has sold over 100 boards (without a decent bitstream), so there's a market :)
hero member
Activity: 560
Merit: 500
Is that on an unmodified 1.15x?

Yes.  Only modification is a few VGA coolers stuck to the bottom of the board.

Cool! I'll stick some leftover sinks to mine tomorrow as well.

I'm running unmodified 1.15x with 0.92a @ 160,150,144 and getting a 3-hour avg of 206 with 0.18% stales on eligius.

What is your error rate?  At those clock frequencies you should be getting 227MH/s.  Also, 0.92b (aka 0.92 official) is preferred; 0.92a has a lot of problems.

0% errors. I might be unlucky (born under a bad sign). I decided to try 0.92a, as 0.92b gave me errors on the third ring (ring #2), though I was running it faster then. I'll try 0.92b tomorrow with underside heatsinks installed, but let 0.92a run overnight. Might just be my luck.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Is that on an unmodified 1.15x?

Yes.  Only modification is a few VGA coolers stuck to the bottom of the board.


I'm running unmodified 1.15x with 0.92a @ 160,150,144 and getting a 3-hour avg of 206 with 0.18% stales on eligius.

What is your error rate?  At those clock frequencies you should be getting 227MH/s.  Also, 0.92b (aka 0.92 official) is preferred; 0.92a has a lot of problems.
hero member
Activity: 560
Merit: 500

At 164/152/146 = 232MH/s we were getting 0.5% errors, so we downclocked to 162/150/146 = 230MH/s and let it run overnight -- 0% errors.  Here's the hashrate measured at Eligius: 225MH/s of shares and 2% stales is exactly 230MH/s.  We're working on improving the stales to something more like 1%.  Power consumption is 10.68W at the 12V input meaning efficiency is 21.17MH/J.


Is that on an unmodified 1.15x? I'm running unmodified 1.15x with 0.92a @ 160,150,144 and getting a 3-hour avg of 206 with 0.18% stales on eligius.
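
The clock and hashrate figures quoted in this thread appear to follow a simple rule of thumb: expected MH/s is roughly half the sum of the three ring clocks in MHz (160+150+144 gives 227, and C:162,140,160 gives X:231). That rule is inferred from the posted numbers and matches them only to within about 1 MH/s; it is not taken from any TML documentation. The sketch below encodes it, along with the stale correction used above to get from accepted shares back to raw hashrate. The class and method names are made up for illustration.

```java
// Sketch: arithmetic inferred from figures quoted in this thread, not from TML docs.
final class HashrateSketch {
    /** Assumed rule of thumb: expected MH/s is about half the sum of the ring clocks in MHz. */
    static double expectedMhs(int... ringClocksMhz) {
        int sum = 0;
        for (int c : ringClocksMhz) sum += c;
        return sum / 2.0;
    }

    /** Raw hashrate implied by the accepted-share rate and the stale fraction. */
    static double rawMhs(double acceptedMhs, double staleFraction) {
        return acceptedMhs / (1.0 - staleFraction);
    }

    public static void main(String[] args) {
        System.out.println(expectedMhs(160, 150, 144)); // clocks from the post above
        System.out.println(rawMhs(225.0, 0.02));        // 225 MH/s accepted, 2% stales
    }
}
```

By this estimate, a pool average of 206 MH/s at 160/150/144 with 0.18% stales is about 20 MH/s short of the expected figure, which is why the error rate is the first thing to check.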