Goliath Miner | Bitcointalksearch.org

VinCeCream

member

Activity: 89

Merit: 10

Quote from: CryptoCluster on September 18, 2013, 09:36:41 AM

Quote from: eve on September 18, 2013, 09:32:12 AM

Quote from: shapemaker on September 18, 2013, 07:27:13 AM

Quote from: joeventura on September 18, 2013, 07:13:40 AM

I predict BTC68 will be the cost

$14 USD a GH

At that price there wouldn't be much point in buying. I'd guess 45-50 BTC would be a decent spot.

20-30 btc will be more attractive

And 0.5-1 BTC even more attractive.

Indeed ...

yohan

sr. member

Activity: 462

Merit: 251

Quote from: tigerbit on September 18, 2013, 04:07:01 PM

CM3 being Avalon based, right? Or was that the super FPGA hybrid?

Is there even any interest in Avalon based options any longer? Surely it's no longer viable.

It is not much extra work, on top of what we have to do for CM4, to make CM3 hash, so we think it is worth completing that work. We have enough stock of chips to give us some return for doing the work. It is also possible that the second generation Avalon will appear as promised, and if those chips are compatible with the gen1 footprint we can build CM3 with gen2 chips as soon as they are available, or make small design changes to accommodate gen2.

tigerbit

member

Activity: 80

Merit: 10

CM3 being Avalon based, right? Or was that the super FPGA hybrid?

Is there even any interest in Avalon based options any longer? Surely it's no longer viable.

spiccioli

legendary

Activity: 1379

Merit: 1003

nec sine labore

Quote from: rocks on September 18, 2013, 03:27:33 PM

Quote from: spiccioli on September 18, 2013, 03:00:32 PM

Chips are 22,5 EUR each (plus VAT if you're an end user).

Chip price is insane, those chips should cost less than one EUR each to make (maybe even less).

Agreed, these chips are currently priced at a steep markup over their actual cost. Most of this is because the chip designers need to recoup their development and NRE costs, plus make a profit.

I think they're priced so high because chip designers need to keep competition low, they're all setting up private pools which is where their earnings will come from.

Avalon batch #1 units were priced at 1500 USD, and each unit uses 240 chips and has a full aluminum case and heatsinks weighing 20 kg, plus a decent PSU.

At 22 EUR per chip an Avalon would have had to cost 6500 USD in chips alone... they did try to price batch 3 this way, but I think that in the end they found that mining with the units was better than selling them, because selling units increases difficulty for everyone.

spiccioli

yohan

sr. member

Activity: 462

Merit: 251

Quote from: bit_wizard on September 18, 2013, 03:49:36 PM

Any word on the CM3?

We are working on CM3 in parallel to CM4 and we have development prototypes that look a bit like the picture of the CM4 where we are working on 1 cluster of chips. A lot of the software control functions are actually common with CM4 and I expect both products will be ready in a similar timeframe.

bit_wizard

sr. member

Activity: 314

Merit: 250

Any word on the CM3?

rocks

legendary

Activity: 1153

Merit: 1000

Quote from: spiccioli on September 18, 2013, 03:00:32 PM

Chips are 22,5 EUR each (plus VAT if you're an end user).

Chip price is insane, those chips should cost less than one EUR each to make (maybe even less).

Agreed, these chips are currently priced at a steep markup over their actual cost. Most of this is because the chip designers need to recoup their development and NRE costs, plus make a profit.

When mining reaches break-even with limited ROI, purchases of new chips/rigs will slow or stop. This happened in the GPU/FPGA era. When this happens all of these chip vendors will be forced by competition to drop prices much closer to their actual manufacturing cost to keep making some sort of profit, and pricing will become more stable and reasonable.

spiccioli

legendary

Activity: 1379

Merit: 1003

nec sine labore

Quote from: rocks on September 18, 2013, 12:41:46 PM

Competitive pricing is by far the most important issue here.

Chips are 22,5 EUR each (plus VAT if you're an end user).

Chip price is insane, those chips should cost less than one EUR each to make (maybe even less).

Enterpoint had a good product with the CM1s, and the first units were priced right at 520 EUR each + VAT or thereabouts.

If CM4s end up costing more than 50-55 BTC it will be very difficult to break even, and if they price them cheaper they'll end up selling a ton, which makes breaking even more difficult as well.

spiccioli

shapemaker

full member

Activity: 238

Merit: 100

I run Linux on my abacus.

Quote from: rocks on September 18, 2013, 12:41:46 PM

Competitive pricing is by far the most important issue here.

I (and most people I suspect) just want a simple board populated with chips. In other words the exact same model as the Cairnsmore1 FPGA boards, where most of the cost was the FPGAs themselves and everything else was minimized.

My issue with the Cairnsmore2-3 proposals was that all of the expensive racking and other 'engineering' put in raised the cost well over the price of the base chips.

If this project and pricing goes the same way as Cairnsmore1 FPGA boards, which maximized the chip BOM vs. everything else, then I would be interested.

Agreed. There are quite a few BF based products coming very soon now. The primary differentiating factors now are
a) how cheaply a complete product can be made, and
b) how much juice one can get out of the chips.

Performance ties neatly into a) since if you can minimize the amount of chips used while still getting a decent hashrate, you can push price lower than competition. I would say simplicity at this point is key, not fancy pants features that just increase fail rate.

The chip is an interesting one, since it is very power efficient and apparently can be pushed to at least 4GH/s. I wonder how much more one could get by just die shrinking...

rocks

legendary

Activity: 1153

Merit: 1000

Competitive pricing is by far the most important issue here.

I (and most people I suspect) just want a simple board populated with chips. In other words the exact same model as the Cairnsmore1 FPGA boards, where most of the cost was the FPGAs themselves and everything else was minimized.

My issue with the Cairnsmore2-3 proposals was that all of the expensive racking and other 'engineering' put in raised the cost well over the price of the base chips.

If this project and pricing goes the same way as Cairnsmore1 FPGA boards, which maximized the chip BOM vs. everything else, then I would be interested.

CryptoCluster

member

Activity: 84

Merit: 10

Quote from: eve on September 18, 2013, 09:32:12 AM

Quote from: shapemaker on September 18, 2013, 07:27:13 AM

Quote from: joeventura on September 18, 2013, 07:13:40 AM

I predict BTC68 will be the cost

$14 USD a GH

At that price there wouldn't be much point in buying. I'd guess 45-50 BTC would be a decent spot.

20-30 btc will be more attractive

And 0.5-1 BTC even more attractive.

eve

full member

Activity: 210

Merit: 100

Quote from: shapemaker on September 18, 2013, 07:27:13 AM

Quote from: joeventura on September 18, 2013, 07:13:40 AM

I predict BTC68 will be the cost

$14 USD a GH

At that price there wouldn't be much point in buying. I'd guess 45-50 BTC would be a decent spot.

20-30 btc will be more attractive

shapemaker

full member

Activity: 238

Merit: 100

I run Linux on my abacus.

Quote from: joeventura on September 18, 2013, 07:13:40 AM

I predict BTC68 will be the cost

$14 USD a GH

At that price there wouldn't be much point in buying. I'd guess 45-50 BTC would be a decent spot.

markm

legendary

Activity: 2940

Merit: 1090

So a loss of eighteen to minus-two bitcoins, then? Seems like averaging across that range you're more likely to make a loss than a gain...

-MarkM-

joeventura

hero member

Activity: 854

Merit: 500

I predict BTC68 will be the cost

$14 USD a GH

shapemaker

full member

Activity: 238

Merit: 100

I run Linux on my abacus.

Quote from: yohan on September 18, 2013, 04:26:03 AM

One of our test boards with the new concept Cairnsmore4 and Controller1 module fitted. This board supports 16 Clusters of up to 9 Bitfury ASICs. We will talk more about the spec and pricing when we are happy with the firmware/software, thermal solution and are ready to ship. Meanwhile enjoy.

So it has 144 BF ASICs. At 22,5 eur per chip, the chip cost alone is 3240 euros without bulk discounts. If you manage to get 4 GHash/s from each chip (as burnin already has), we're looking at 576 GH/s. At maybe 4 W per chip, the full unit would be using around 600 Watts of power.
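The arithmetic above can be sketched in a few lines of C. This is only an illustration: `struct unit_estimate` and `estimate_unit` are hypothetical names, and the per-chip price, hashrate and power are the estimates from this post, not measured figures.

```c
/* Hypothetical helper type: rough totals for a board built from identical
 * ASICs. All inputs are estimates, not vendor specifications. */
struct unit_estimate {
    double chip_cost_eur;  /* total chip cost, no bulk discount */
    double hashrate_ghs;   /* aggregate hashrate in GH/s */
    double power_w;        /* aggregate chip power in watts */
};

/* Multiply the per-chip figures by the number of chips on the board. */
struct unit_estimate estimate_unit(int clusters, int chips_per_cluster,
                                   double eur_per_chip, double ghs_per_chip,
                                   double watts_per_chip)
{
    int chips = clusters * chips_per_cluster;
    struct unit_estimate e;

    e.chip_cost_eur = chips * eur_per_chip;
    e.hashrate_ghs  = chips * ghs_per_chip;
    e.power_w       = chips * watts_per_chip;
    return e;
}
```

Calling `estimate_unit(16, 9, 22.5, 4.0, 4.0)` gives 3240 EUR in chips, 576 GH/s and 576 W, which the post rounds to roughly 600 W.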

If you manage to price that competitively, I'm sure you will have sales. The pricing is what will decide if people want that or not. Time to market is essential at the moment though so don't take too long.

edit: If you manage to deliver in October, that unit should be able to mine 50-75 BTC between Oct and May, depending on, of course, how harshly the difficulty rises in the next year. That will leave some wiggle room in pricing, so if we deduct chip price, we're looking at maybe 3000-3500 euros ROI. Now you just need to decide how much you want from that 3k euros and how much the customer should get.

yohan

sr. member

Activity: 462

Merit: 251

One of our test boards with the new concept Cairnsmore4 and Controller1 module fitted. This board supports 16 Clusters of up to 9 Bitfury ASICs. We will talk more about the spec and pricing when we are happy with the firmware/software, thermal solution and are ready to ship. Meanwhile enjoy.

rocks

legendary

Activity: 1153

Merit: 1000

Quote from: hf_developer on August 20, 2013, 12:40:06 PM

Here is one portion of scrypt core:

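for (i = 0; i < 1024; i += 2)
{
    /* copy the current 128-byte block X into scratchpad V at entry i */
    memcpy(&V[i * 32], X, 128);

    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);

    /* copy the updated block into entry i + 1 */
    memcpy(&V[(i + 1) * 32], X, 128);

    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);
}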

As you can see, you have to memcpy 128 x 8 bit = 1024 bit. If you have those 64-bit RAM buses you will need 16 of them to make this operation work in one cycle. This has to be done 2 times per loop, looping 1024 times (in the full scrypt algorithm). You cannot hard-unroll this loop as you can with SHA256. You see, even if you had RAM buses 1024 bits wide, you need at least 2048 RAM operations per scrypt. Assume your hardware runs at 500 MHz. Divide this by 2048. Even then you cannot get rates higher than about 250 kh/s. (...and you will never see an FPGA with 1024-bit RAM buses)

So essentially you either need 2 Mbit of memory traffic per hash (2 Mb/sec of bandwidth for each hash per second), or ~128KB of on-chip memory per hashing core. This means that an FPGA with 2 Gb/sec of total memory bandwidth and perfect pipelining would only achieve 1 kHash/sec. OK, the difficulty is clear, thanks.

This also means a GPU card achieving ~250 kHash/sec has over 500 Gb/sec of usable memory bandwidth, which is very impressive.
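A rough sketch of where these numbers come from, under stated assumptions: the scratchpad has 1024 entries of 128 bytes (the Litecoin parameters discussed here), and each entry is written once in the fill loop and read once in the later mixing loop, which is not shown in the quoted snippet. The function and macro names are made up for illustration.

```c
#include <stdint.h>

#define SCRYPT_N     1024u  /* scratchpad entries (assumed Litecoin parameters) */
#define SCRYPT_BLOCK 128u   /* bytes per scratchpad entry */

/* Bits moved to and from the scratchpad for one hash: each entry is
 * written once and, by assumption, read once. */
uint64_t scrypt_bits_per_hash(void)
{
    return (uint64_t)SCRYPT_N * SCRYPT_BLOCK * 2u * 8u;
}

/* Upper bound on hashes per second for a memory bandwidth in bits/s. */
double max_hashrate(double bandwidth_bits_per_s)
{
    return bandwidth_bits_per_s / (double)scrypt_bits_per_hash();
}

/* Memory bandwidth in bits/s needed to sustain a given hashrate. */
double required_bandwidth(double hashes_per_s)
{
    return hashes_per_s * (double)scrypt_bits_per_hash();
}
```

With these assumptions one hash moves 2,097,152 bits (about 2 Mbit), `max_hashrate(2e9)` comes out a little under 1 kHash/sec, and `required_bandwidth(250000.0)` is about 524 Gb/sec, in line with the figures above.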

It also looks like the scrypt parameters Litecoin chose are optimized to take full advantage of common high-end GPU characteristics and no more, with a balanced ratio of GPU cores to bandwidth per core. If Litecoin had selected slightly larger parameters, it seems likely that GPUs would be much less efficient and unable to utilize all their available cores, but as it is, GPU bandwidth is just able to feed all the GPU cores...

yohan

sr. member

Activity: 462

Merit: 251

Quote from: hf_developer on August 20, 2013, 12:40:06 PM

Quote from: rocks on August 19, 2013, 07:28:33 PM

Quote from: hf_developer on August 19, 2013, 06:28:16 PM

Yes, there was a significant increase of hash rate for FPGAs in SHA256, but this is an effect of the possibility to fully unroll the SHA256 core. It then goes one hash per clock. MHz = MHash/s

This is algorithmically impossible for scrypt, as the algorithm was specifically designed to resist that. In most linear algorithms you have two possibilities: get a speed-up at the cost of more resources, or save resources but achieve a lower computation rate. If one side decreases, the other one increases and vice versa. Not so for scrypt. Both sides increase nearly equally.

Here you can see a scrypt demonstration on FPGA with hashrates ~ 2kh/s. (You need to be very experienced to make it twice that speed!):

https://github.com/kramble/FPGA-Litecoin-Miner

Thank you hf_developer, this was very helpful. I was not aware of this effort and look forward to reading and understanding the code better.

The design only used the on-chip FPGA RAM of an LX150, which is fairly limited. With many FPGAs you can have multiple 64-bit external memory ports that all run at full speed, for example 4 ports * 64bits/port * 200MHz optimized design yields 6.4 GBytes/sec of memory bandwidth.

Even a basic LX150 includes integrated Memory Controller blocks for DDR1-DDR3 memories at up to 12.8 Gb/s peak bandwidth (from the spec sheet). This chip may or may not be optimal for scrypt; other chips offer higher max bandwidth. I think the main point is that memory bandwidth shouldn't be a bottleneck if done right.

Here is one portion of scrypt core:

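for (i = 0; i < 1024; i += 2)
{
    /* copy the current 128-byte block X into scratchpad V at entry i */
    memcpy(&V[i * 32], X, 128);

    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);

    /* copy the updated block into entry i + 1 */
    memcpy(&V[(i + 1) * 32], X, 128);

    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);
}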

As you can see, you have to memcpy 128 x 8 bit = 1024 bit. If you have those 64-bit RAM buses you will need 16 of them to make this operation work in one cycle. This has to be done 2 times per loop, looping 1024 times (in the full scrypt algorithm). You cannot hard-unroll this loop as you can with SHA256. You see, even if you had RAM buses 1024 bits wide, you need at least 2048 RAM operations per scrypt. Assume your hardware runs at 500 MHz. Divide this by 2048. Even then you cannot get rates higher than about 250 kh/s. (...and you will never see an FPGA with 1024-bit RAM buses)

Actually, indirectly we have already done a 1024-bit memory interface on FPGA: our HPC product Merrick4 has a 1024-bit memory interface and 16GB of local DDR3. What is different here is that this is done with 16 S6 FPGAs working together. The cost base is also expensive, before anyone asks.

Once we have more time we will look at the viability of doing Litecoin on all of our HPC products. There are some products better than Merrick4 in the pipeline that have much more memory bandwidth and will trash GPUs in many applications, and quite possibly Litecoin too.

hf_developer

member

Activity: 66

Merit: 10

Quote from: rocks on August 19, 2013, 07:28:33 PM

Quote from: hf_developer on August 19, 2013, 06:28:16 PM

Yes, there was a significant increase of hash rate for FPGAs in SHA256, but this is an effect of the possibility to fully unroll the SHA256 core. It then goes one hash per clock. MHz = MHash/s

This is algorithmically impossible for scrypt, as the algorithm was specifically designed to resist that. In most linear algorithms you have two possibilities: get a speed-up at the cost of more resources, or save resources but achieve a lower computation rate. If one side decreases, the other one increases and vice versa. Not so for scrypt. Both sides increase nearly equally.

Here you can see a scrypt demonstration on FPGA with hashrates ~ 2kh/s. (You need to be very experienced to make it twice that speed!):

https://github.com/kramble/FPGA-Litecoin-Miner

Thank you hf_developer, this was very helpful. I was not aware of this effort and look forward to reading and understanding the code better.

The design only used the on-chip FPGA RAM of an LX150, which is fairly limited. With many FPGAs you can have multiple 64-bit external memory ports that all run at full speed, for example 4 ports * 64bits/port * 200MHz optimized design yields 6.4 GBytes/sec of memory bandwidth.

Even a basic LX150 includes integrated Memory Controller blocks for DDR1-DDR3 memories at up to 12.8 Gb/s peak bandwidth (from the spec sheet). This chip may or may not be optimal for scrypt; other chips offer higher max bandwidth. I think the main point is that memory bandwidth shouldn't be a bottleneck if done right.

Here is one portion of scrypt core:

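for (i = 0; i < 1024; i += 2)
{
    /* copy the current 128-byte block X into scratchpad V at entry i */
    memcpy(&V[i * 32], X, 128);

    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);

    /* copy the updated block into entry i + 1 */
    memcpy(&V[(i + 1) * 32], X, 128);

    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);
}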

As you can see, you have to memcpy 128 x 8 bit = 1024 bit. If you have those 64-bit RAM buses you will need 16 of them to make this operation work in one cycle. This has to be done 2 times per loop, looping 1024 times (in the full scrypt algorithm). You cannot hard-unroll this loop as you can with SHA256. You see, even if you had RAM buses 1024 bits wide, you need at least 2048 RAM operations per scrypt. Assume your hardware runs at 500 MHz. Divide this by 2048. Even then you cannot get rates higher than about 250 kh/s. (...and you will never see an FPGA with 1024-bit RAM buses)
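The two figures in this estimate can be written out as a small sketch. It assumes one full 128-byte scratchpad access per clock cycle and ignores the salsa20_8 work entirely, so it is only an upper bound; `buses_needed` and `clock_bound_hashrate` are hypothetical helper names.

```c
/* Number of buses of width bus_bits needed to move one block of
 * block_bytes in a single cycle (rounded up). */
unsigned buses_needed(unsigned block_bytes, unsigned bus_bits)
{
    return (block_bytes * 8u + bus_bits - 1u) / bus_bits;
}

/* Upper bound on hashes per second when every RAM operation costs one
 * clock cycle and nothing else limits throughput. */
double clock_bound_hashrate(double clock_hz, unsigned ram_ops_per_hash)
{
    return clock_hz / (double)ram_ops_per_hash;
}
```

`buses_needed(128, 64)` is 16, and `clock_bound_hashrate(500e6, 2048)` is 244140.625 hashes per second, i.e. the roughly 250 kh/s ceiling mentioned above.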

Topic: Goliath Miner (Read 10823 times)