Okay, let me see what I can find.
Here is a PDF of the first post made in the thread.
I made the PDF on Dec 9, 2018.
"Author Topic: Acorn M.2 FPGA based GPU Accelerator (Read 36631 times)
GPUHoarder
Member
Activity: 154
Merit: 36
Acorn M.2 FPGA based GPU Accelerator
June 01, 2018, 02:06:38 AM
Merited by vapourminer (5), suchmoon (5), 64dimensions (5)
This information all existed in the discord but I wanted to share it with everyone.
So we’ve developed an FPGA accelerator over the past few months in the M.2 form
factor (the same as NVMe drives), designed to operate both standalone and in
conjunction with GPUs.
The first version to be released has 4x high speed PCIe lanes to communicate
between the system/GPUs as well as 512MB or 1GB of onboard DDR3 along with a
100k+ LE or 200k+ LE FPGA of high speed grade. We’ve named it the Acorn, and
the three models are the CLE-101, CLE-215, and CLE-215+
General expectation is it will provide performance roughly scaled with
price/performance of the VCU1525, but it has a unique role and is not applicable
to all of the same algorithms. Its performance in this role is dominated by its
interconnect bandwidth and not its processing power.
It is capable of providing up to 30MH of lift to a mining system with GPUs on a
handful of algorithms, or of operating independently at higher-than-GPU-level
hashrates for other non-memory-intensive algorithms (Keccak, etc.). I will be
releasing it alongside our mining software and bitstreams to support hybrid GPU
acceleration. This project was not developed commercially; it was developed out of
a product for my day job for internal use in our own mining systems to give an
edge to traditional PCs and gaming systems turned miners.
The accelerator works by streaming high-bandwidth hash state between GPUs and
the FPGA over PCIe, allowing each piece of hardware to handle the portion of the
algorithm it is best at. In general this means memory-bandwidth- or area-heavy
portions of the algorithm may be handled by the GPU, and hash algorithms
designed for hardware implementations are handled by the FPGA. This approach
works for any algorithm whose internal state is 256 bit (60Mh gains) or 512 bit
(x16r, Lyra2Rev2, etc.) or smaller. The accelerator supports rapidly reconfiguring
its algorithms from on-board DDR to enable handling of per-block or period
(TimeTravel10) re-sequencing. It was designed originally to provide performance
gains (especially for older GPUs with poor cores) and power savings for ETH by
way of offloading the opening and closing Keccak calculations, as well as hash
selection to improve locality of reference for early ETH rounds.
Given the anticipated path of ETH itself regarding PoS and other fork possibilities,
please consider all those things if ETH is your target. It may be the most popular
coin for GPUs, but that does not mean it is the best use of FPGA or hybrid tech.
I’ve decided to make this hardware available to the community at near cost, given
all the FPGA interest lately, alongside my belief that broadly available general-
purpose acceleration hardware at its true market cost (not low-volume, industry-
specific dev boards) is the best defense against complete ASIC centralization. You
will see this philosophy reflected in my activity around the VCU1525 board as well.
Anticipated pre-order prices are $199 for the CLE-101 512MB variant and $329 for
the high-end, highest-speed-grade CLE-215+ 1GB DRAM version. On-board power
consumption is nominally 15W. It will include a heatsink adequate for this
dissipation level with reasonable airflow. It is important to note that to fit the
FPGA this adapter is slightly outside of the 2280 M.2 specification, weighing in at
2380. The vast majority of M.2 slots should not have an issue with this.
I am also pursuing making available well-priced options for individual PCIe x4 to
M.2 M-key host boards (these are broadly available for $10-15), as well as
Quad-M.2 PCIe switched and bifurcated x16 host boards for those who do not have
the available M.2 M-key slots or require up to 240MH of acceleration.
I won’t post exact per algorithm stats or performance until I can do final testing of
the actual boards to be shipped with the release hardware/heatsink/thermal
management pieces in place, at which point I’ll accept pre-orders. This device
requires quite a bit of testing to cover the list of common GPUs, PCIe
configurations, and supported algorithms. I have no desire to sell anyone anything
not useful to them, or to push a board at all, let alone one based on 3D renders,
prototype parts pictures, or choppy YouTube videos, so I believe this full set of
data along with final product pictures and overview must be published before I will
take any preorders. I am sorry if that tests your patience.
Prototypes exist and I’ve already secured most of the hardware for a first batch so
lead time will only be PCB + assembly.
At the time of shipping I will be releasing our internal miner software in closed
source form for Windows and Linux that supports GPU only as well as Hybrid
acceleration. You’re also welcome to develop your own bitstreams for the
accelerator, and will have all the specifications necessary to do so.
I will also be publishing the interface for the bitstreams so that open source
miners that wish to can use the FPGA directly.
We are handling all CE, FCC, RoHS, and other certifications as well as ITAR and
export compliance, so we will be able to ship to all countries not under US embargo.
Taxes and import duties will fall on the purchaser. We will be offering at least a
90-day warranty.
All feedback is welcome. This is not my source of income, nor that of the rest of
my team, and we don’t want anyone’s money unless they are happy with what
we’re offering. I’m also happy to continue conversations I am already having with
coin devs and miner developers on how or if FPGAs fit into their plans for their
coin and/or ASIC Resistance strategies. This community is about choice, and I will
respect the choices of those teams.
So all I would like from all of you, beyond the feedback, is for anyone interested to
hit our pre-order registration survey at
http://www.squirrelsresearch.com to help
us ensure we’re covering your needs and wants and have all the appropriate
hardware secured. Based on that info, very detailed performance information and
full device photos (spoiler - it looks like an SSD with a heatsink on it!) will be
published at the time preorders open, expected in mid-June.
- David
trillobeat
Newbie
Activity: 35
Merit: 0
Re: Acorn M.2 FPGA based GPU Accelerator
June 01, 2018, 02:15:44 AM
Good news!
I want to ask again: does the M.2 accelerator help only a single GPU
on the motherboard, or will all GPUs benefit?
Can motherboards with two M.2 slots use two accelerator units (in the case of a
single GPU per accelerator)?
GPUHoarder
Re: Acorn M.2 FPGA based GPU Accelerator
June 01, 2018, 02:16:51 AM
Quote from: trillobeat on June 01, 2018, 02:15:44 AM
Good news!
I want to ask again: does the M.2 accelerator help only a single GPU on the
motherboard, or will all GPUs benefit?
Can motherboards with two M.2 slots use two accelerator units (in the case of a single
GPU per accelerator)?
One Acorn can help multiple GPUs, depending on the algorithm, and you can use two
in two M.2 slots.
"
I do not know if the first post is around anymore.
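
A rough sanity check on the quoted post's point that performance is "dominated by its interconnect bandwidth": if one hash state has to cross the PCIe link per hash, the required bandwidth is simply hashrate times state size. The sketch below uses only figures from the quoted post (60Mh for 256-bit state, "up to 30MH" of lift, 240MH for a quad host board). Pairing the 30MH figure with 512-bit state, and treating 240MH as four cards at 60MH each, are my assumptions and not something the post states. The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic only. The hashrate and state-size figures
# come from the quoted post; the pairing of 30 MH/s with 512-bit state and
# the "4 cards x 60 MH/s" reading of 240MH are assumptions.

def stream_bandwidth_gb_s(hashes_per_sec: float, state_bits: int) -> float:
    """GB/s needed to move one hash state per hash in ONE direction."""
    bytes_per_state = state_bits / 8
    return hashes_per_sec * bytes_per_state / 1e9


def aggregate_lift_mh(per_card_mh: float, cards: int) -> float:
    """Total lift in MH/s, assuming it scales linearly with card count."""
    return per_card_mh * cards


bw_256 = stream_bandwidth_gb_s(60e6, 256)   # 256-bit state at 60 MH/s
bw_512 = stream_bandwidth_gb_s(30e6, 512)   # 512-bit state at 30 MH/s (assumed pairing)
quad_lift = aggregate_lift_mh(60, 4)        # four cards on a quad host board (assumed)

print(f"256-bit @ 60 MH/s: {bw_256:.2f} GB/s per direction")
print(f"512-bit @ 30 MH/s: {bw_512:.2f} GB/s per direction")
print(f"4 cards x 60 MH/s: {quad_lift:.0f} MH/s")
```

Under these assumptions both cases need the same link bandwidth per direction, which would fit the claim that the interconnect, not the FPGA, sets the ceiling. For comparison, a PCIe 2.0 x4 link carries roughly 2 GB/s per direction; the quoted post does not say which PCIe generation the Acorn uses, so treat that comparison as illustrative only.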