[PicoStocks] 100TH/s bitcoin mine [100th] - page 87.

cedivad

legendary

Activity: 1176

Merit: 1001

How is the IPO going?

buzzdave

vip

Activity: 472

Merit: 250

It's refreshing to see some detail being posted on the project - great stuff guys!

For my part, I have experience building server rooms, racking, cabling and interconnects. I have already identified a build team for the May/June timeframe that will help me bring up 100TH in as short a time as possible. The low heat dissipation is comforting, but with high density racks, I'll be looking closely at our cooling requirements & options. This will be a new adventure with totally new challenges. As we get nearer to reality, I will post up some stats and pictures of the facility.

Megabigpower.com is a startup company tailored to support 100TH as the first large scale mining customer. This will be my primary focus. Additional support for smaller mining rigs or other manufacturers will be added over time.

The megabigpower.com hosting facility is 2,500 sq. feet, expandable to 10,000 sq. feet and enjoys commercial power rates of 2.4c kWh down to as low as 1.7c kWh. I expect it to be one of only a few viable large scale mining options in the long term.

tytus

sr. member

Activity: 250

Merit: 250

For those who like pictures. This is the image of the full custom 55nm 64 round sha2 hashing core. Those skilled in the art will recognize vertical ROR lines, bypass capacitance next to flip flop, etc.

tytus

sr. member

Activity: 250

Merit: 250

So to summarize ...

Due to an optimized full custom design [more details will be revealed in a few weeks], a relatively advanced 55nm node and a design choice to operate the chip at 0.6V instead of 1.0V, we get an approximate power dissipation of 1 Watt from a 3.3GH/s chip [my values are 3.5GH/s but this is not so important].

The project combines 2 vital and complementary assets:
1. bitfury's brilliant chip [best power dissipation values and lowest chip costs on the market]
2. Dave's low power cost hosting solution [lowest energy prices on the market]
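To put rough numbers on how the two assets combine, here is a minimal back-of-the-envelope sketch. It uses only figures quoted in this thread (100 TH/s, 3.3 GH/s and 1 W per chip, 2.4c and 1.7c per kWh). The function name is made up for illustration, and the estimate counts chip power only: power supply losses, cooling and controllers are assumed to be zero, which will not hold in practice.

```python
def mine_power_cost(total_ghs, chip_ghs, chip_watts, usd_per_kwh):
    """Return (chip count, chip power in kW, electricity cost in USD per day).

    Hypothetical helper: counts chip power only, ignoring PSU and cooling losses.
    """
    chips = total_ghs / chip_ghs
    kilowatts = chips * chip_watts / 1000.0
    usd_per_day = kilowatts * 24 * usd_per_kwh
    return chips, kilowatts, usd_per_day


# 100 TH/s = 100,000 GH/s, at the two power rates quoted for the hosting facility
for rate in (0.024, 0.017):
    chips, kw, usd = mine_power_cost(100_000, 3.3, 1.0, rate)
    print(f"{rate * 100:.1f} c/kWh: {chips:.0f} chips, {kw:.1f} kW, ${usd:.2f}/day")
```

Under these assumptions the chips alone draw about 30 kW, so the electricity bill is on the order of tens of dollars per day.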

Even though the two people providing these assets don't know each other personally, I can guarantee that the 100TH-mine investors will effectively benefit from these synergies.

I will try to convince bitfury to stop posting long comments for a while :-) but in 2 months we will come back to the board management tools and I am sure it is better to leave this development to original authors of mining software [Kano/Luke-Jr].

bitfury

sr. member

Activity: 266

Merit: 251

Quote from: 2112 on February 07, 2013, 11:32:41 PM

Quote from: kano on February 07, 2013, 08:45:19 PM

Now ... 0.3ms is too small IMO - doubling to have one job pending - 0.6ms is also too small IMO (the BFL queue design says 20 work items)
So if your queue only allows one work item waiting in it then the code still has to hit a target, that it is going to (sometimes/often?) be late for, due to USB and OS constraints

Thanks for an informative post.

I'm wondering why even use Linux to control chips via SPI? What is the point? I haven't worked on ARM recently, but I did on Xilinx and Microblaze. Going standalone for SPI and I2C access was a major win in terms of power usage, I didn't even bother to measure speed: it was much faster, but not critical in what I did. Only lwIP is somewhat harder to use than the network interface through sockets.

Did the Linux SPI driver have any recent major improvements?

I wonder what bitfury has to say about it, or will he just shut up and smile to avoid disclosing some other, much better, solution to the competition.

Very simple. But better code speaks for itself - www.bitfury.org/chainminer.zip - you all can see there hardware definition in spictrl_hw subdirectory - there's spictrl.vhd file, this chip (for second spartan deployment - not rs485 bus like in racks) - altera cpld installed on every board and boards connected in chains. you can see HOW SIMPLE IT IS... and also benefits like predetermined addressing - i.e. I can always tell in frames which board is not working, etc, and easily track it in chain. works like charm. BUT - you have to understand that linux SPI can't do well with bitbanging - so basically you're moving in/out large frames - 4 KB or 16 KB - don't remember - better to look in code. so if you're sending tiny messages to every chip - mechanism of addressing and dispatching should be timeless and work relatively to spi bitstream that linux sends. this is what I achieved actually there - state machine of all spictrl chips on boards is driven exactly by bitstream sent from linux/spi. I see this thing to be superior to rs485 automated address assigning (or usb automated addressing) when something not works and you cannot immediately tell where the trouble is. Manual address assigning is big pain for bulk installations (i.e. with jumpers or whatever). I like when physical position determines address (as in this case).

2 kano - I think different ASIC vendors will make different protocols, because they would look at the problem differently... Since FPGA times I think nobody has actually tried to solve the complexity of building rigs. Say what you see now with Avalon - there are difficulties building them :-) it does not go fast.... What I want (and why I mentioned the passive solution) is that I can, say, order 10'000 boards in Taiwan, have them manufactured on an automated production line, and then have them assembled into devices with minimal effort and minimal labor. Of course with added labor heatsinks and such stuff can be assembled - but that won't scale well to 500-kilowatt installations - so this is probably the point where DIY-oriented people will have an edge - because they can spend their time tuning equipment, whereas such tuning at large scale would produce significant delays, as big farm maintainers are unlikely to have as much incentive as a DIYer would. But the main problem I see is that as years pass the paradigm should shift a bit and mining should produce about 20% ROI per year, unlike current expectations of returning the money ASAP or within 6 months. This would produce a situation where an attack against a network protected by a proof-of-work mechanism would be prohibited not by, say, maskset/NRE cost (the point where we are now - we have a significant share of NRE investment and low demand for wafers), but by the overall chain cost - including wafers AND time to produce (i.e. making it unfeasible even to start another miner project). But this is not today of course - first we should scale and "land" on the 28 or 22nm tech node. Also it would be very nice if such a device could be bought with guarantees in usual retail electronics shops, like you can buy an AMD GPU, and not with some kind of "preorder" batch which is difficult to fulfill at this stage.

2 2112 - I don't understand why 100 ms is too short period ? Pipelining 20 things in work queue I think is not good - anyway what you need to pipeline is just to compensate communication lag - i.e. to make sure that hashing core is always busy. tolerance to seconds of delay is overkill I think. ARM cpu is perfectly capable to handle 50-100 ms latencies, however not microsecond-scale latencies. I won't laugh, and don't fear to disclose this to competition. There's really tons of little details in this project that actually put this project far ahead of anything I seen now :-) Anyway these would be known when devices will be delivered. This communication mechanism is just one nice finding, tested in real equipment that I liked. I am not against if others will replicate this approach instead of doing more expensive and complicated solutions.

2 mrb - yes - it is me, account is not hacked. I still control web site, and posted some sourcecode. If it is interesting - I may disclose bitstreams for FPGAs - I don't consider them as secret now. But really don't have time to polish sources. So if someone is interested to maintain github or opencores version control - I would be glad to upload. At this time I don't consider that it will impact any way my full-custom design - there's many more new fascinating things invented that are beyond these bitstreams, however fundamentals of rolled hasher design still have roots there.

The tech node is scaled 65nm - so it is drawn as 65nm but scaled down to 55nm _optically_. So to get a 3.8x3.8 mm die I draw a 4.2x4.2 mm sealring+padframe in GDS. Then it is scaled at the foundry. I suppose others who do 65nm have the same, as this is the trivial part, and it is still labeled as 65nm. About power estimations - actually tytus took my early estimations from when I had no core drawn for 65nm, only an "optically" scaled-down core from the 150nm that I had worked on before. The main cause of the power increase was that I used dynamic nodes (http://en.wikipedia.org/wiki/Flip-flop_%28electronics%29 - http://en.wikipedia.org/wiki/File:TSPC_FF_R.png - this one is an example - TSPC flip-flop - single clock, and it holds the bit as charge on a transistor gate), but in 65nm I later found that I would have to increase frequency to about 2 GHz to make that work because of high leakage. I could of course use transistors with a different threshold voltage (they are available) - but that kills performance. So finally I dropped dynamic nodes and switched to the classic master-slave latch flip-flop. This in turn increased power consumption roughly 1.5 times and increased core size by about 20%. So efficiency dropped. But this sacrifice I think is OK, because it dramatically decreases the chance that the device would not work - I have really good margins everywhere (these margins also tell me that if several tapeouts were affordable I could improve performance maybe twice, but that's too much time - likely this would be done in further evolution). There also exist unusual logic styles (resonant logic) that, with a sacrifice of area, could give dramatic power savings - but that's mainly like analog design - so the chances of getting a working core with a single tapeout are low.

But anyway - that means only that I work with tytus, I have trust in him, work with him and know him personally. This means that this project would likely get working chips. But I still clearly say that I haven't worked with Dave and cannot say how smooth this installation will go on his side. So if you're going to invest significant amount there - I think it is best to check it personally with Dave and clarify this yourself. If tytus is trusting Dave - it is his decision, and not mine. For me this 100 Th/s mine is bulk order that of course get good pricing for equipment. And also I see this as a good case how mining should be done - I expect in future expansion of more professional installations of mines - in special places with good electricity prices or at homes where this is used at heating... So it would be again distributed but at a larger scales... Don't think of it like in 2014 100Th/s mine would be something very rare and unique. It makes no sense to operate equipment where electricity is expensive and environment is hot. Those who have cheaper electricity would have edge over time, and significant edge. Please remember what Satoshi said - that mining in indefinite interval would be barely more profitable than electricity spent on mining. That is what all of you should remember, that all higher revenues you get are temporary in nature maybe years... but not forever :-) This means that efforts to design fancy gamer-style end-user devices makes sense only to pump money and doing that for fun. Because when you imagine collocation of massive BFL minirigs + their maintenance - you'll understand why I am saying this - as this product looks glossy, but have more expenses to maintain than rack or passive heater. 
So if you today compare solutions and check for best opportunity - this is right of course to hunt for best tech, but I think about next block halving you'll compare electricity prices in different locations and not only performance of equipment suppliers but also abilities to scale, maintenance, etc. Maybe at that time I could even disclose design as such chip and design methodology would be something not cutting edge, but well-understood and tech itself would not contain any magic for anyone. Progress is happening really quickly :-)

PS. What I am thinking also about future versions of chips - I can implement hardware chip protection-encryption. That chip would want remote activation code. This could give equipment owners possibility to collocate them without problems for example - as if activation code is not presented - chips won't compute. Before delivery to your collocation - chips can be fused with your key to verify it against challenge from you. Bitcoin itself allows to track nTime - that can be used for timed activation (i.e. solve blocks only when nTime <= - that it supplied by you in signed activation message to chip). This way also fully transparent and zero-trust mining markets could be made. When there's thirdparties that not own equipment maintain security in datacenters. This is just idea - it is neither easy, nor fast to implement it - so it won't be done in first chips, but if there's demand for this idea - likely in next generations I'll implement this. Basically in few words - chip knows its owner.
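A minimal software sketch of the "chip knows its owner" idea, to make the flow concrete. Everything here is an assumption for illustration: the post does not specify a scheme, the function names are invented, and HMAC-SHA256 with a shared fused key is swapped in as the simplest stand-in for the "signed activation message" (a real design might well use asymmetric signatures and a challenge-response step).

```python
import hashlib
import hmac
import struct


def sign_activation(owner_key: bytes, valid_until_ntime: int) -> bytes:
    """Owner side: authenticate the latest nTime the chip is allowed to work on."""
    message = struct.pack("<I", valid_until_ntime)
    return hmac.new(owner_key, message, hashlib.sha256).digest()


def chip_may_hash(fused_key: bytes, block_ntime: int,
                  valid_until_ntime: int, tag: bytes) -> bool:
    """Chip side (hypothetical): compute only if the tag verifies against the
    fused key and the block's nTime is within the activated window."""
    expected = sign_activation(fused_key, valid_until_ntime)
    return hmac.compare_digest(expected, tag) and block_ntime <= valid_until_ntime


key = b"k" * 32                      # stands in for the key fused before delivery
tag = sign_activation(key, 1360000000)
print(chip_may_hash(key, 1359999000, 1360000000, tag))   # inside the window
print(chip_may_hash(key, 1360000500, 1360000000, tag))   # window expired
```

The point of tying activation to nTime is that the chip needs no clock of its own: the block header it is asked to hash already carries the timestamp.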

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: 2112 on February 07, 2013, 11:32:41 PM

Quote from: kano on February 07, 2013, 08:45:19 PM

Now ... 0.3ms is too small IMO - doubling to have one job pending - 0.6ms is also too small IMO (the BFL queue design says 20 work items)
So if your queue only allows one work item waiting in it then the code still has to hit a target, that it is going to (sometimes/often?) be late for, due to USB and OS constraints

Thanks for an informative post.

I'm wondering why even use Linux to control chips via SPI? What is the point? I haven't worked on ARM recently, but I did on Xilinx and Microblaze. Going standalone for SPI and I2C access was a major win in terms of power usage, I didn't even bother to measure speed: it was much faster, but not critical in what I did. Only lwIP is somewhat harder to use than the network interface through sockets.

Did the Linux SPI driver have any recent major improvements?

I wonder what bitfury has to say about it, or will he just shut up and smile to avoid disclosing some other, much better, solution to the competition.

As I said - I'm a software guy, so I look at it from a software POV

No doubt if you did implement it in hardware/firmware it wouldn't be an issue - but then you'd still need some software interface or I'd expect you'd be spending a long time working on it to write a full miner in firmware+hardware

Just having the miner as thin client to a Stratum pool may be an interesting possibility.

Using something other than USB? I'm not sure I've not dealt with it from that angle.

If it was connected via USB ...
The software issue is simply that you cannot guarantee to always have a constant small time frame that small over a USB bus with USB1.1 or USB2.0 using Bulk/Interrupt transfers - maybe possible with Isochronous or with USB3.0 but I'm not sure since I've not dealt with anything USB3.0 yet or any USB2.0 device with an Isochronous end point.

The right size queue removes that issue almost completely anyway.

I've got a lot of timing logging in cgminer with the USB code that clearly shows with some of the current FPGA hardware (BFL and MMQ) single Bulk transactions are always (USB1.1) or often (USB2.0) above those time frames.
Throw away the design/USB chips they used in the past and yes suddenly this may all be a non-issue.

Until I actually get my hands on any of the ASIC hardware, I can of course only speculate.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: kano on February 07, 2013, 08:45:19 PM

Now ... 0.3ms is too small IMO - doubling to have one job pending - 0.6ms is also too small IMO (the BFL queue design says 20 work items)
So if your queue only allows one work item waiting in it then the code still has to hit a target, that it is going to (sometimes/often?) be late for, due to USB and OS constraints

Thanks for an informative post.

I'm wondering why even use Linux to control chips via SPI? What is the point? I haven't worked on ARM recently, but I did on Xilinx and Microblaze. Going standalone for SPI and I2C access was a major win in terms of power usage, I didn't even bother to measure speed: it was much faster, but not critical in what I did. Only lwIP is somewhat harder to use than the network interface through sockets.

Did the Linux SPI driver have any recent major improvements?

I wonder what bitfury has to say about it, or will he just shut up and smile to avoid disclosing some other, much better, solution to the competition.

mrb

legendary

Activity: 1512

Merit: 1028

Quote from: bitfury on February 07, 2013, 09:19:48 AM

Quote from: 2112 on February 01, 2013, 12:56:13 AM

Dude, bitfury is the lead designer at their shop. If I know anything about the people of bitfury's calibre I'll say that he will have driver's core already debugged on the simulator before the tapeout. This really isn't looking like another seat-of-the-pants outfit.

Yes. this is why I have not posted.

[...]

Performance: 3.3 GH/s _rated performance_, about 7 GH/s maximum
Power consumption: 1 W at _rated_ performance @ 0.6 V, 6 W _maximum_ performance @ 1.0 V.
Thermal characteristics of package: 2 K / W junction-to-pcb and 34 K / W junction-to-ambient.

I am surprised to see you confirm this ASIC project. Assuming that your bitcointalk.org account wasn't hacked (and I don't think it was, given that I am familiar with your writing style), I withdraw my accusation of this project being a scam.

Also, your power estimate range (1170-3300 Mhash/Joule) is more realistic than what tytus posted (3300-5000 Mhash/Joule).
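For anyone checking the arithmetic, the range follows directly from the quoted rated and maximum figures. A small sketch (the helper name is just for illustration):

```python
def mhash_per_joule(ghs, watts):
    """Energy efficiency in Mhash/Joule. 1 W = 1 J/s, so this is (MH/s) / W."""
    return ghs * 1000.0 / watts


rated = mhash_per_joule(3.3, 1.0)     # 3.3 GH/s at 1 W (0.6 V)
maximum = mhash_per_joule(7.0, 6.0)   # about 7 GH/s at 6 W (1.0 V)
print(f"rated: {rated:.0f} Mhash/J, maximum: {maximum:.0f} Mhash/J")
```

The maximum-performance point works out to roughly 1167 Mhash/J, which rounds to the 1170 quoted above.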

Anyway. You posted a lot of details, but you didn't say precisely what process node you were targeting. 110nm? 90nm? 65nm?

Luke-Jr

legendary

Activity: 2576

Merit: 1186

Don't mind kano's trolling, he doesn't really understand what he's talking about anyway.

BFGMiner 3's new device API (still in development) should have the threading issues we hit last time worked out.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: bitfury on February 07, 2013, 09:19:48 AM

...

Quote from: kano on January 31, 2013, 09:00:55 PM

Waiting for a dev board so I can write the cgminer driver for you

Edit: of course, contact me if you want any suggestions about the MCU design (not doing the design, just optimal details of its design)

Well. The chips will work in strings with SPI protocol using state-machine. This was tested and found to be nice in second generation of my FPGA boards. I.e. instead of device addresses I just have prefix code that triggers state machine of devices and allow to access chain. From software point of view - sending new jobs and getting results is just as feeding big buffer into SPI and simultaneously reading values where will be answers. This can be done in single thread very efficiently even on slow ARM CPUs.

The goal that I have is about single ARM cpu per 1200-1500 chips that's 3.6 - 4.5 TH/s. So the question is that code should be quite efficient to handle that. Also requests are double-buffered (this means that while one job is processed in chip, another job is pipelined). With ASIC unlike of FPGA job processing would take about 0.3 - 0.4 milliseconds of time. This means that there should be likely not less than one communication every 0.15 ms.

Last time I tried to adapt cgminer for that purpose for much smaller task - 24 spartans, I had to make 48 threads for double-buffering. And to me that seems as complete nonsense. As for 1200 chips it won't work (2400 threads).

Likely I plan that code structurally would look like asynchronous state machine for I/O with bitcoind/pool with protocol like stratum or Luke's getblocktemplate. Second thing - that job generation could be done quickly from template in synchronous fashion when making up request buffer to chips. Then separate thread for SPI I/O - i.e. prepare request buffer, spit it out to SPI while simultaneously reading back data, parsing answer buffer and either send updates to chip and send results to network. I think that cgminer codebase is not well-suited for that - a lot of work would be required to redesign. However cgminer's monitoring is nice compared to what I typically wrote :-)

Yes the performance and original design of the 'old' FPGA code in cgminer is directly related to FPGA (and had no foresight)
To be blunt, serial-USB sux.
That is why over the last 2 months I've been rewriting all that, getting it ready for ASIC - direct USB - which will also give the option to use any USB I/O available with the device - not just the simple serial-USB back and forward that hides anything else available.

The GPU code, on the other hand, is very well designed and handles I/O to a device with MUCH tighter requirements.

Two different people did the original design of those two pieces of code ... ckolivas GPU, Luke-Jr FPGA ... yes I'll stop there and let the code speak for itself.

The current work handling code is based on the idea that a device can only handle one item of work at a time.
ckolivas and I will be rewriting that shortly, since the BFL MCU device has a 20 work input queue (and the Avalon requires ~24 work items at a time) and thus dealing with a device that handles more than one item of work at a time will also make it simple to resolve issues of thread counts etc.

Now ... 0.3ms is too small IMO - doubling to have one job pending - 0.6ms is also too small IMO (the BFL queue design says 20 work items)
So if your queue only allows one work item waiting in it then the code still has to hit a target, that it is going to (sometimes/often?) be late for, due to USB and OS constraints
However, if you are only designing it for in-house use, not a general board to be sold to users, then you can of course optimise the choice of the hardware talking to the USB device and thus minimise the problem there of course.

Anyway, making the MCU queue (or whatever you call it in your device) larger means the code has a much wider target to hit and a much lower chance of it not keeping the queue from going empty.
When the queue is empty, there is idle time, and making the queue a bit bigger will help ensure maximum performance by reducing the possibility of that idle time.
BFL have specified a queue on both work and replies.
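The queue argument can be put as a one-line estimate: if the device holds N pending work items and each takes a fixed time, the host can be late by at most N job times before the device goes idle. A rough sketch under that simplification (fixed job time, the job currently running is ignored, function name is made up):

```python
def max_host_latency_ms(pending_items, job_time_ms):
    """Worst-case time the host may take to refill before the queue runs dry.

    Simplified model: every job takes exactly job_time_ms and the job
    currently in progress is assumed to finish immediately.
    """
    return pending_items * job_time_ms


for depth in (1, 20):
    print(f"queue of {depth}: host must respond within "
          f"{max_host_latency_ms(depth, 0.3):.1f} ms")
```

With 0.3 ms jobs a single pending item leaves a 0.3 ms window, while a 20-item queue leaves about 6 ms, which is a far easier target for USB and a general-purpose OS.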

It would also be good to have a secondary I/O end point to wait on replies (and have a queue there also)
So two separate threads, one for sending work (and performing device status work requests and replies) and a second one getting work answers

I am going to BFL in a bit over a week to see what hardware they really have and hopefully point a new cgminer at it and get results

Though I'm a software guy, not a hardware guy ... as should be obvious from the above.

tytus

sr. member

Activity: 250

Merit: 250

So to summarize Bitfury's post:

We can lower power consumption 6 times (from 6Watt to 1Watt) by reducing the speed 2 times (from 7GH/s to 3.5GH/s).
Of course, just changing the frequency is not enough. The trick is the reduction of the voltage (from standard 1V to 0.6V).
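As a sanity check, the textbook first-order model for dynamic CMOS power, P proportional to f * V^2, gives nearly the same factor. This is only a sketch: it ignores leakage and assumes the switched capacitance stays the same, and the function name is invented here.

```python
def dynamic_power_ratio(freq_ratio, v_new, v_old):
    """Relative dynamic power after scaling frequency and supply voltage.

    First-order model P ~ C * V^2 * f; leakage is ignored.
    """
    return freq_ratio * (v_new / v_old) ** 2


ratio = dynamic_power_ratio(0.5, 0.6, 1.0)   # half the speed, 0.6 V instead of 1.0 V
print(f"power falls to {ratio:.2f} of the original, i.e. about {1 / ratio:.1f}x lower")
```

Halving the frequency alone would only halve the power; the voltage term contributes the remaining factor of about 2.8.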

There are other important elements such as the reliability of the clock tree due to autonomous clocking of very small elements [hashing cores] in contrast to the very complex clock tree in an unrolled design. Our chips do not even need a PLL for correct operation (!).

There are other technical aspects that we will disclose later.

bitfury

sr. member

Activity: 266

Merit: 251

Quote from: 2112 on February 01, 2013, 12:56:13 AM

Dude, bitfury is the lead designer at their shop. If I know anything about the people of bitfury's calibre I'll say that he will have driver's core already debugged on the simulator before the tapeout. This really isn't looking like another seat-of-the-pants outfit.

Yes. This is why I have not posted. Actually, initially, before the successful tapeouts of ASICMINER and then Avalon, I thought that what experienced designers told me - that for logic I'll basically get the same in hardware as in simulation - was not true; I was simply too conservative to trust such claims. So initially I chose to go with a cheaper technology node to perform tests, and so the core was implemented and optimized for the 150nm node. But AFTER other successful tapeouts, and knowing that simulation matches hardware results well for 130nm and 110nm, I changed my mind and started to trust it.

So what simulations are performed:
1) Functional simulation - that is - that core as a whole works and computes correctly with correct timing in all corners (i.e. typical, slow, fast wafer);
2) Flip-flop setup/hold time simulation - this is more tricky to explain - but under no circumstances should you have fast paths or clock skew that violate hold - THIS CANNOT BE FIXED BY LOWERING FREQUENCY, while a setup violation can. This is simulated in all corners + Monte Carlo simulations to understand how variations affect sampling + voltage variations due to CMOS logic power consumption. To clarify: a fast path is a wire between flip-flops where the signal gets from one flip-flop to another too quickly and violates the HOLD requirement. The HOLD requirement is the time during which the signal at the flip-flop input must not change during (and after) the clock edge. This is typically a very short period of time - but unfortunately if it is violated the design won't work even at 1.0 Hz. Clock skew is a problem if clock distribution delays (and especially their tolerances, as chip components are not precise inside) violate the sampling time of flip-flops;
3) Power grid simulation - actually simple - to confirm that there is a sufficient count of bypass capacitors and that no parasitic resonances appear of a magnitude that can affect hash core performance - that's it - unfortunately 26%-28% of DIE AREA is just capacitors :-( not transistors... not logic... that's a big sacrifice, but it won't be stable, especially at low voltage, without that... capacitors are placed near flip-flops;
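The point in item 2 about hold violations not being fixable by lowering frequency can be seen from the standard simplified timing inequalities. This is a textbook sketch, not the actual sign-off flow used for this chip; all delay values are made-up picosecond figures and skew is treated as a single worst-case number.

```python
def setup_ok(period_ps, clk_to_q_ps, max_logic_ps, setup_ps, skew_ps=0.0):
    """Setup check: the slowest path must arrive before the NEXT clock edge.
    The clock period appears here, so slowing the clock can fix a violation."""
    return clk_to_q_ps + max_logic_ps + setup_ps + skew_ps <= period_ps


def hold_ok(clk_to_q_ps, min_logic_ps, hold_ps, skew_ps=0.0):
    """Hold check: the fastest path must not change the input too soon after
    the SAME clock edge. The clock period does not appear at all."""
    return clk_to_q_ps + min_logic_ps >= hold_ps + skew_ps


# Hypothetical numbers: a slow path of 600 ps and a fast path of only 10 ps.
for period in (500.0, 1000.0, 1e12):   # 2 GHz, 1 GHz, 1 Hz
    print(period,
          setup_ok(period, 50.0, 600.0, 40.0, 30.0),
          hold_ok(50.0, 10.0, 40.0, 30.0))
```

In this example the setup check starts passing once the clock is slowed down, while the hold check fails at every frequency, including 1 Hz.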

For 65nm unlike 150nm all design is static CMOS - i.e. dynamic nodes had to be removed, because of too high leakage.

What simulations are beyond of my experience now - is say yield prediction... This is quite speculative, but we expect not worse than 90% chips that works completely and maybe about 0.1% of chips totally malfunctioning. Also performance variation would be around +- 20% chip to chip.

Finally die dimensions chosen: 3.8x3.8mm
Package: QFN48
Performance: 3.3 GH/s _rated performance_, about 7 GH/s maximum
Power consumption: 1 W at _rated_ performance @ 0.6 V, 6 W _maximum_ performance @ 1.0 V.
Thermal characteristics of package: 2 K / W junction-to-pcb and 34 K / W junction-to-ambient.

So - at _RATED_ performance the chip would work without any heatsinks or so - with 40 degrees in the room, there would be about 75 degrees in the chip - very good - without fans, without heatsinks. (THIS IS THE PERFORMANCE AT WHICH ALL DEALS WITH THESE CHIPS ARE ACTUALLY MADE).
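The temperature figure comes straight from the junction-to-ambient thermal resistance quoted above. A minimal sketch of the calculation (steady state, no heatsink or airflow beyond what the 34 K/W figure already assumes; the function name is for illustration only):

```python
def junction_temp_c(ambient_c, theta_ja_k_per_w, power_w):
    """Steady-state junction temperature: T_j = T_ambient + theta_JA * P."""
    return ambient_c + theta_ja_k_per_w * power_w


print(junction_temp_c(40.0, 34.0, 1.0))   # rated: 1 W
print(junction_temp_c(40.0, 34.0, 6.0))   # maximum: 6 W, same passive package
```

At 1 W this gives 74 degrees, matching the roughly 75 degrees stated. At 6 W the same passive setup would give well over 200 degrees, which is why running at maximum performance needs serious cooling.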

However - MAXIMUM performance is not overvoltage - it is basically still within the envelope of gate oxide reliability (oxide thickness is 20 angstrom - so you can apply 1.0 V for long-term operation, 2.0 V for likely half a year to one year of operation).

So - with QFN48 it would be quite challenging to get 6 W - as basically you would really have to work on chilling and likely use an Al PCB. This is, say, up to Dave how to deal with that. In my experience this would be very labor-intensive... So maybe they would convert/upgrade equipment later, as they have cheap electricity. However do not take this as an endorsement - I haven't worked with Dave yet and cannot guarantee anything except that the chips will work. With the chips - I can guarantee that they will work: about 96% on first launch without delays, 4% with delays + I provide backup chips from another vendor for this mine in case of unacceptable delays. So the overall risk that this project would be without chips should be less than 0.1%. The rest of the risks you should assess yourself - building such MINES is not simple; for example, building a single BitFury 110 GH/s rack took 2 weeks of labor. This is what should be prepared on Dave's side. Labor is expensive in the US. Doing it on his own would likely take 4 weeks of hard work to assemble it.

I've thought to post image of core, but looking that BFL posted core image with that black boxes put over there - I decided finally not to post and wait till their tape-out would be confirmed :-) I want to be absolutely sure that they have maskset

Quote from: 2112 on February 01, 2013, 12:56:13 AM

I didn't know that he's Polish or had chosen to work through a Polish scientific establishment. Through Europractice he should have no problem accessing the latest 40nm and 28nm processes at deep discount.

Choosing scientific establishment is very good for such small orders. Really - this thing is more research than production. Because I would like everyone to understand that what we do - i.e. $500k order or even $5m order is peanuts for foundries. Especially on smaller tech nodes. To get seriously treated as direct foundry customer you would either have to be somehow important to them or your order volume should be at least >200 wafers monthly (to be small but direct customer). that's more than even BFL with all of their preorders could sell :-) 3.3 _PETAHASH_ per month :-) Plus - wafers are cheap but assembly is not, and runtime is not as well. To explain why this is small - SINGLE SEMICONDUCTOR FABRICATION PLANT (i.e. single tech node _LINE_) produces typically 40'000 - 80'000 _WAFERS_ per month. With orders like 6 or 12 wafers without regular demand - you're too small. I really hope that some day demand for proof-of-work chips will be high and that mining devices would be available globally. But this is not today. today we're small, and with small steps we should go forward.
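The 3.3 petahash figure can be roughly reproduced from the die size. The sketch below is an estimate only: the 300 mm wafer diameter is an assumption (the post does not state it), and the die count is a gross upper bound that ignores edge loss, scribe lines and yield.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_w_mm, die_h_mm):
    """Upper-bound die count: wafer area divided by die area."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return int(wafer_area // (die_w_mm * die_h_mm))


dies = gross_dies_per_wafer(300.0, 3.8, 3.8)    # 300 mm wafer is an ASSUMPTION
monthly_phs = 200 * dies * 3.3e9 / 1e15         # 200 wafers, 3.3 GH/s per chip
print(f"{dies} dies per wafer, about {monthly_phs:.1f} PH/s of new hashrate per month")
```

Under these assumptions 200 wafers a month is on the order of a million chips, or a bit over 3 PH/s of new capacity every month.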

Quote from: kano on January 31, 2013, 09:00:55 PM

Waiting for a dev board so I can write the cgminer driver for you

Edit: of course, contact me if you want any suggestions about the MCU design (not doing the design, just optimal details of its design)

Well. The chips will work in strings with SPI protocol using state-machine. This was tested and found to be nice in second generation of my FPGA boards. I.e. instead of device addresses I just have prefix code that triggers state machine of devices and allow to access chain. From software point of view - sending new jobs and getting results is just as feeding big buffer into SPI and simultaneously reading values where will be answers. This can be done in single thread very efficiently even on slow ARM CPUs.
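To illustrate the "one big buffer" idea from the software side, here is a toy sketch. It is not the chainminer protocol: the slot size, the fake transfer function and all names are invented for illustration, and the prefix codes that drive the on-board state machines are left out. The only property it demonstrates is that a chip's position in the frame acts as its address, so jobs go out and replies come back in one full-duplex transfer from a single thread.

```python
SLOT_BYTES = 4  # hypothetical fixed slot per chip in the frame


def build_frame(jobs):
    """Concatenate one fixed-size slot per chip; chip i owns slot i."""
    return b"".join(job.ljust(SLOT_BYTES, b"\x00")[:SLOT_BYTES] for job in jobs)


def fake_spi_transfer(tx):
    """Stand-in for a full-duplex SPI transfer: returns as many bytes as it
    was given. Here each 'chip' simply echoes its slot inverted."""
    return bytes(b ^ 0xFF for b in tx)


def split_replies(rx, chip_count):
    """Reply for chip i is read back from the same offset its job was sent at."""
    return [rx[i * SLOT_BYTES:(i + 1) * SLOT_BYTES] for i in range(chip_count)]


jobs = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"]
replies = split_replies(fake_spi_transfer(build_frame(jobs)), len(jobs))
print([r.hex() for r in replies])
```

Because there is one transfer per frame rather than one per chip, the thread count stays constant no matter how many chips hang on the chain.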

The goal that I have is about single ARM cpu per 1200-1500 chips that's 3.6 - 4.5 TH/s. So the question is that code should be quite efficient to handle that. Also requests are double-buffered (this means that while one job is processed in chip, another job is pipelined). With ASIC unlike of FPGA job processing would take about 0.3 - 0.4 milliseconds of time. This means that there should be likely not less than one communication every 0.15 ms.

Last time I tried to adapt cgminer for that purpose for much smaller task - 24 spartans, I had to make 48 threads for double-buffering. And to me that seems as complete nonsense. As for 1200 chips it won't work (2400 threads).

Likely I plan that code structurally would look like asynchronous state machine for I/O with bitcoind/pool with protocol like stratum or Luke's getblocktemplate. Second thing - that job generation could be done quickly from template in synchronous fashion when making up request buffer to chips. Then separate thread for SPI I/O - i.e. prepare request buffer, spit it out to SPI while simultaneously reading back data, parsing answer buffer and either send updates to chip and send results to network. I think that cgminer codebase is not well-suited for that - a lot of work would be required to redesign. However cgminer's monitoring is nice compared to what I typically wrote :-)

PS.

To the BFL trolls here: at first I thought I'd troll you hard, but then... eh... I don't have much time for this fun. Let me finish and get to tape-out, then I would gladly troll you in my spare time :-) It's so fun :-) For now, better troll BFL into releasing chips :-) I have already put money on a bet against it :-) If you're so BFL-oriented, then put 'Yes' for BFL, it would be better than trolling :-)))) http://bitbet.us/bet/7/bfl-will-deliver-asic-devices-before-march-1st/

Hope you also understood how the W / GH/s metric works :-)))))) And how voltages work :-))) And that the claims that they use BGA packages because the power requirements won't fit in QFN look like complete nonsense :-)))) This is actually an engineering choice: with the same chip, you choose which W / GH/s to run at... However, they may be too greedy to downrate the chips and make them more stable...

tytus

sr. member

Activity: 250

Merit: 250

Sorry for the delay, but I don't want to post without information content.

PicoStocks is not going according to plan, mainly due to the 72th-mess. I have an asset that I want to float, but I have no time to finish rewriting the confidential business plan into something that can be posted on PicoStocks. I hope to be able to do this this week. It will take a few more months for PicoStocks to look more serious.

Bitfury will post simulation results shortly. We will discuss what information can be disclosed tomorrow during an intercontinental bitcoin mining cartel meeting :-). The power dissipation of the chips is not so important for the mine, as we already get four times lower costs just by using Dave's hosting. It is more important for the designer. Also (as I mentioned before), at the same node you can get much better results if you operate the chips at a lower frequency, but of course you then need more chips [device costs increase]. We can get the same dissipation level [GH/J] as announced by BFL if we overclock the chips extremely and use extensive cooling (for now only in simulations).
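
To make the trade-off concrete, here is a small Python sketch using the textbook first-order CMOS model, in which dynamic power scales with V² · f and hash rate scales with f, so energy per hash scales with V². This is an assumption for illustration only: it ignores leakage and is not bitfury's simulation. The 0.6 V versus 1.0 V and 3.3 GH/s per chip figures are the ones quoted earlier in the thread; the function names are invented.

```python
import math


def relative_energy_per_hash(volts, ref_volts=1.0):
    """Energy per hash relative to ref_volts, assuming dynamic power ~ V^2 * f."""
    return (volts / ref_volts) ** 2


def chips_needed(target_ghs, ghs_per_chip):
    """Number of chips required to reach a target hash rate."""
    return math.ceil(target_ghs / ghs_per_chip)


TARGET_GHS = 100_000  # 100 TH/s expressed in GH/s

print("energy per hash at 0.6 V vs 1.0 V:", relative_energy_per_hash(0.6))
print("chips at 3.3 GH/s each:", chips_needed(TARGET_GHS, 3.3))
print("chips at half that clock:", chips_needed(TARGET_GHS, 3.3 / 2))
```

Under this model the lower voltage cuts energy per hash to roughly a third, while halving the clock doubles the number of chips needed for the same total hash rate, which is the device-cost increase mentioned above.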

Monster Tent

full member

Activity: 238

Merit: 100

Quote from: RHA on February 04, 2013, 11:21:49 AM

You are a direct competitor for them, so you should sit quietly and not troll them here.
Let the people make the assessment, not you. You can supply us with facts (with links), not an opinion (which is biased).

The market for bitcoin stock exchanges is already saturated and there aren't that many quality stocks to go around, which is why you see so many passthroughs.

At this point opening yet another SE is akin to starting a new forum and having only one person posting anything. You also need to trust that the exchange won't fly by night and neither will the stock listed. It's especially bad when both the SE and its only stock are owned by the same people.

The main reason to use third party exchanges is for accountability but the way this whole operation of "picostocks" is structured seems like massive avoidance of accountability and thus the appearance of a giant scam.

MPOE-PR

hero member

Activity: 756

Merit: 522

Quote from: RHA on February 04, 2013, 11:21:49 AM

You are a direct competitor for them,

No douchey, that's not how things work. They would possibly like to represent themselves as a competitor to BTCT, BF, Havelock etc, but so far it's just a pipe dream. If that doesn't get through and you'd like a simpler explanation with fewer words and more visual cues, see this older post made in reply to similar nonsense.

RHA

sr. member

Activity: 392

Merit: 250

You are a direct competitor for them, so you should sit quietly and not troll them here.
Let the people make the assessment, not you. You can supply us with facts (with links), not an opinion (which is biased).

MPOE-PR

hero member

Activity: 756

Merit: 522

Quote from: RHA on February 04, 2013, 08:26:17 AM

I think the PicoStocks project is the main one. A group of people is investing quite a lot of money to get it running.
The 100TH/s bitcoin mine is an auxiliary project, started as a means to promote PicoStocks.

A "group of people" investing "a lot of money" in "a project" that makes no sense is pretty much the sales pitch for every investing scam there ever was or ever will be.

These folks are muchly behind the curve, tis not 2011 anymore.

RHA

sr. member

Activity: 392

Merit: 250

I think the PicoStocks project is the main one. A group of people is investing quite a lot of money to get it running.
The 100TH/s bitcoin mine is an auxiliary project, started as a means to promote PicoStocks.

Monster Tent

full member

Activity: 238

Merit: 100

Just kick people in the balls. It will have the same effect as "investing" in mining operations but be less painful.

MrTeal

legendary

Activity: 1274

Merit: 1004

Quote from: mrb on February 03, 2013, 11:27:47 PM

I completely agree with MPOE: tytus' proposal seems to be the one of a scammer.

Other data point: he claims "0.2-0.3 Watt per GH/s". This is completely unrealistic. This means 3300-5000 MHash per Joule. To get anywhere close to this, you would need to design the chip around the 22-32nm process node!
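
The unit conversion in the quoted figure is easy to verify: 1 W per GH/s is 1 J per 10^9 hashes, i.e. 1000 MHash per Joule, so the efficiency in MHash/J is 1000 divided by the W per GH/s value. A minimal Python check (the function name is made up for the example):

```python
def mhash_per_joule(watts_per_ghs):
    """Convert a W per GH/s figure into MHash per Joule."""
    return 1000.0 / watts_per_ghs


for w in (0.3, 0.2):
    print(w, "W per GH/s =", round(mhash_per_joule(w)), "MHash/J")
```

This only confirms the arithmetic behind the 3300-5000 MHash per Joule range; whether such a figure is achievable at a given process node is a separate question.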

With a similar level of optimization as BFL, 32nm would probably do it. That is of course assuming BFL hits its power targets. It's not impossible that they could come in with a more optimized design than BFL has on a 40nm node and meet those specs.

That being said, I haven't seen bitfury post here and there's nothing to suggest that they will produce anything close to that in the timeline they suggest. If it's true I wish them the best, but there's all kinds of sayings about the burden of proof on exceptional claims, and there's nothing here to back that up yet.

Topic: [PicoStocks] 100TH/s bitcoin mine [100th] - page 87. (Read 470203 times)