Topic: Looking for PoW recommendations.

legendary
Activity: 924
Merit: 1132
March 28, 2015, 01:22:29 PM
#4
My current plan is to release with a Linux daemon and Linux Qt client with a built-in CPU miner only.  The source code will be available (except for the genesis block nonce) four weeks before launch. After launch, there'll be a 4-week period of 1/10x normal rewards, then ten weeks of ramping up to normal reward levels at ten percent per week.  I *think* that's enough time for somebody who has a Windows machine to port things to Windows, somebody who has a Mac to port things to whatever they're calling their operating system these days (are they still on cats?), somebody who has cudaminer to write the appropriate cudaminer plugins, etc., before they miss out on much.  Anyway, after that ramp-up period, the proof-of-work award will be a constant amount per block forever, so it's not like somebody's going to be missing out on a huge instamine if they don't have their mining client ready.
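
To make that schedule concrete, here's a rough sketch of the subsidy function, assuming roughly ten-minute blocks (so about 1008 blocks per week) and a placeholder value for the normal reward - none of these constants are final:
Code:
#include <cstdint>
#include <algorithm>

// Assumed, non-final constants: ~10-minute blocks, so about 1008 blocks per week.
static const int64_t BLOCKS_PER_WEEK = 1008;
static const int64_t NORMAL_REWARD   = 100 * 100000000LL;  // placeholder: 100 coins in base units

// Proof-of-work subsidy by height:
//   weeks 0-3 : 1/10 of the normal reward
//   weeks 4+  : ramping up by 10% of normal per week, then constant forever
int64_t GetBlockSubsidy(int64_t height)
{
    int64_t week = height / BLOCKS_PER_WEEK;
    if (week < 4)
        return NORMAL_REWARD / 10;
    int64_t ramped = NORMAL_REWARD / 10 + (week - 3) * (NORMAL_REWARD / 10);
    return std::min(ramped, NORMAL_REWARD);    // constant amount per block forever after the ramp
}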

Proof-of-stake awards will exist, but they'll be mostly for the users rather than the miners, and will only start to matter very, very slowly.  When a transaction is made, the input coins will get a ten percent stake award per year for up to two years since the tx that created them.  Most of that will go to the people making the transaction, but a tiny fraction will go (along with tx fees) to the miners.  Under certain circumstances (a txIn unspent for over a year) the miner would get up to half of the stake award for a particular txIn, but I think that'll be rare.  People will usually make a tx to themselves rather than enter what is, for them, a no-interest period a year in.
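
To make the split concrete, here's a rough sketch; the miner's "tiny fraction" and the shape of the ramp toward one half are placeholders I haven't settled on:
Code:
#include <algorithm>
#include <cstdint>
#include <utility>

// Sketch of the stake-award split on a spent txIn (placeholder numbers).
// The award accrues at 10% of the input value per year, capped at two years.
// Normally the miner gets only a tiny slice; once the coin has sat unspent
// for more than a year, the miner's slice ramps toward one half.
static const int64_t DAYS_PER_YEAR = 365;
static const int64_t MAX_AGE_DAYS  = 2 * DAYS_PER_YEAR;

std::pair<int64_t, int64_t> StakeAwardSplit(int64_t inputValue, int64_t ageDays)
{
    int64_t cappedAge = std::min(ageDays, MAX_AGE_DAYS);
    int64_t award     = inputValue * cappedAge / (10 * DAYS_PER_YEAR);   // 10% per year

    int64_t minerCut = award / 100;                        // "tiny fraction": 1% placeholder
    if (ageDays > DAYS_PER_YEAR) {
        // placeholder ramp: the miner's cut grows linearly toward 50% over the second year
        int64_t over = std::min(ageDays - DAYS_PER_YEAR, DAYS_PER_YEAR);
        minerCut += (award / 2 - award / 100) * over / DAYS_PER_YEAR;
    }
    return { award - minerCut, minerCut };                 // {spender's share, miner's share}
}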

Right now my plan is that blocks will always be formed on the basis of proof-of-work.  It'll take about eight years before there are as many coins created by proof-of-stake (and mostly awarded to those who make transactions) as there are formed by proof-of-work (and awarded to miners).  You can think of that, if you like, as an eight year reward halving interval for the miners.

Both proof-of-stake (in the form of transactions) and proof-of-work will matter for purposes of resolving forks, though, so if you want to come out on top in a fork, you include as many recent transactions as possible.
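
Roughly, the fork-resolution rule I have in mind looks like the sketch below; the relative weighting of work versus stake is a placeholder:
Code:
#include <cstdint>
#include <vector>

// Sketch of fork resolution: each block contributes both its proof-of-work
// and the stake carried by the transactions it includes, so the fork that
// includes more recent (stake-carrying) transactions accumulates more weight.
struct BlockSummary {
    int64_t workDone;       // chain work contributed by this block's PoW
    int64_t stakeIncluded;  // stake weight of the transactions in this block
};

int64_t ChainWeight(const std::vector<BlockSummary>& chain)
{
    int64_t total = 0;
    for (const BlockSummary& b : chain)
        total += b.workDone + b.stakeIncluded;   // placeholder: equal weighting of work and stake
    return total;
}
// When two forks compete, the one with the larger ChainWeight() wins.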

Also, because the block header is changed in both length and format (I wanted to move the extranonce and current-difficulty to the header where they belong, among other things), there aren't any presently extant ASICs that will mine it.  So there'll be a long period of GPU and CPU co-dominance before any ASICs exist (if mining the thing ever becomes valuable enough to justify ASIC development at all).  Still, I want to have at least one hashing algo that will be obvious "low hanging fruit" for the ASIC developers so that if and when they get there they can leave the other algos alone.

For the memory-hard hashes I have been considering momentum, cuckoo cycle, and scrypt, all with tougher parameters than usually found.

I particularly like momentum hash, because somebody who can build a complete table should asymptotically approach one momentum hash per colliding-hash performed.  This makes it very efficient for a particular amount of memory, with no advantage for greater amounts, so it seems like a good way to tune for machines with different levels of memory.  I've seen how it broke for Protoshares, though, and that's a pretty general break.  It means GPU memory counts for about 10x main memory in terms of hashing speed, so if the complete momentum table is less than 20 Gbytes, it'll be faster on GPU.  That's a pretty tough spec for a high-end home machine, but if GPUs have no advantage there, the GPU miners will be working on a different algorithm, I hope.  If I pop it up to 40 Gbytes and 60 Gbytes, those can be for the server-class machines (for now; home machines in a few more years).

At 20 Gbytes it's good for a custom-built high-end home machine; GPUs might be about as fast there, but they have easier things to do, like concentrating on the SHA256D stuff.
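
For anyone who hasn't looked at momentum, the search boils down to a birthday-collision hunt over a big table; here's a toy sketch of the structure, with std::hash standing in for a real cryptographic hash and a tiny collision space instead of the 20+ Gbyte tables I'm talking about:
Code:
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

// Toy momentum-style search: find nonces A != B such that
// BirthdayHash(header, A) == BirthdayHash(header, B).  The table is the
// memory-hard part; with the complete table in memory you approach one
// momentum solution per inner hash performed.
static uint64_t BirthdayHash(const std::string& header, uint32_t nonce)
{
    std::hash<std::string> h;                               // stand-in for a cryptographic hash
    return h(header + std::to_string(nonce)) & 0xFFFFFF;    // 24-bit collision space (toy size)
}

int main()
{
    const std::string header = "example-block-header";
    std::unordered_map<uint64_t, uint32_t> table;

    for (uint32_t nonce = 0; ; ++nonce) {
        uint64_t key = BirthdayHash(header, nonce);
        auto it = table.find(key);
        if (it != table.end()) {
            // A real miner would now check whether Hash(header, it->second, nonce)
            // meets the difficulty target, and keep scanning if it doesn't.
            std::cout << "collision: " << it->second << " and " << nonce << "\n";
            return 0;
        }
        table[key] = nonce;
    }
}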

I don't care as much for cuckoo cycle, because it doesn't come anywhere near the asymptotic hashing efficiency for large-memory machines that momentum does.  Using a classic example from graph theory was a great idea, though; it inspires me to look into solvable instances of other classic problems, like factoring.

Factoring, and modular factoring in particular, is a very interesting problem that I'd really like to see more serious public research on, and publicly developed ASICs for.  The *public* development of ASICs for that problem would force everybody to use better cryptographic keys, which they need to do anyway, because I'm pretty sure that *secret* ASIC development has enabled certain parties to read their private communications for a long time now, given the size of the keys some of them are using.

I'm less knowledgeable about the general capabilities of GPUs; I haven't done much programming on them.  The idea of using floating point for anything like consensus sort of gives me hives, and Intel's 80-bit floats are notorious among cryptographers and hard-math people for their inconsistent results on CPUs.  They mysteriously get the bottom bits of the mantissa zeroed if the compiler decides to do a register spill to memory - which optimizing compilers sometimes do and sometimes don't.  And if somebody comes out with another FDIV bug or something, it could mean a chain-fork from hell.  Still, if *checking* the PoW is done with some very explicit integer math that has well-defined behavior, it could work.
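
As a sketch of what I mean by that last part, here's a PoW check done entirely in fixed-point integer math, using the logistic map from elsewhere in the thread just as an example; the constants, iteration count, and seeding are placeholders, and a real design would have to worry about degenerate cycles at this precision:
Code:
#include <cstdint>

static const uint64_t ONE = 1ULL << 31;   // 1.0 in Q1.31 fixed point

// One step of the logistic map x -> r*x*(1-x), all in integers, so every
// node gets bit-identical results regardless of FPU, compiler, or spills.
static uint64_t LogisticStep(uint64_t x, uint64_t r)
{
    uint64_t y = (x * (ONE - x)) >> 31;   // x*(1-x) <= 1/4, so this can't overflow
    return (r * y) >> 31;
}

// Deterministic check: iterate from a header-derived seed, compare to a target.
bool CheckIntegerPoW(uint64_t seed, uint64_t target, int iterations)
{
    uint64_t r = (39ULL << 31) / 10;      // r = 3.9 in fixed point (chaotic region)
    uint64_t x = (seed % (ONE - 2)) + 1;  // keep the seed strictly inside (0, 1)
    for (int i = 0; i < iterations; ++i)
        x = LogisticStep(x, r);
    return x < target;                    // well-defined behavior on every platform
}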


hero member
Activity: 644
Merit: 500
March 28, 2015, 08:15:24 AM
#3
If you want to make/find separate Nvidia-friendly and AMD-friendly algos, you'll need to make sure every algo is 100% optimized before you'll see the differences.
But >90% optimized algos are hard to find; the last one was probably Scrypt. Many new algos keep popping up these days, so no algo will perform at max speed - there's always room for new improvements.
So the key would be to pick heavily optimized algos and compare them once that has happened. Older & popular algos are more likely to have been optimized (X11, quark, etc.).

I can already tell you that the Whirlpool algo performs pretty well on older Nvidia cards (compute 3.5 and below, up to the 780 Ti), while the newer Nvidia cards and most AMDs are in the same region. Quark does great on recent Nvidias, but not by that big a margin. It all depends on optimization levels, I suppose.
I'm glad you're thinking about us cudaminers too :D  Many devs release new algos with just an OpenCL miner. It works on Nvidias, but mostly at 1/3 of what it could do if programmed in CUDA.
Some people also think only AMDs are worth it for mining, while my records show otherwise.

Don't forget about Scrypt for ASICs too ;)

There's also this peculiar algo that requires at least 15GB of RAM, called Ramhog.

Maybe add PoS too, as it opens up the mining scene to those who don't have mining equipment. A low % reward (1-10%) or even a fixed reward might be the right way. High-% PoS is only interesting for pure PoS coins.

And what about systems like Curecoin? They take 45% of a mined block and reward that to folders, off-chain. They also take 10% and use that for the back-end of the system. The latter is one reason it might be less popular, but it's an interesting coin and distribution system nonetheless.
hero member
Activity: 524
Merit: 500
March 28, 2015, 05:27:57 AM
#2
Let's see. There is the Momentum function, based on the birthday paradox, with the interesting idea that hashing speed increases during the nonce-range scan as memory gets filled. The parameters chosen for the Protoshares coin were too weak to keep GPUs away ;)  There is the FPGA-friendly Blake-256. And there is a good proposal to use a hard problem from graph theory, Cuckoo Cycle.

Also
Hi again, SolidCoin developers!

After dinner I came up with another two ideas:

1) Instead of evaluating a total recursive function over a commutative ring of integers, you could try a simpler thing. Require evaluating a specific value of a primitive recursive function over the field of reals that has a fractional dimension. A good starting example:

http://en.wikipedia.org/wiki/Logistic_map

just pick a value of r that is in the chaotic region.

Implement the reals as long double, which is 80-bit (10 bytes) on Intel/AMD CPUs. Not all C++ compilers really support long double, but the logistic map function is so trivial that you can include inline assembly, which will be on the order of 10 lines.
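
A minimal sketch of that, with the constants picked arbitrarily from the chaotic region (in a real PoW the seed and the iteration count would come from the block header and the difficulty):
Code:
#include <cstdio>

// Iterate the logistic map x -> r*x*(1-x) in 80-bit long double.
int main()
{
    const long double r = 3.99L;     // a value of r in the chaotic region
    long double x = 0.5L;            // placeholder seed; a PoW would derive this from the header
    for (int i = 0; i < 1000000; ++i)
        x = r * x * (1.0L - x);
    // The low-order bits of x depend on carrying full 80-bit precision
    // through every iteration.
    std::printf("%.20Lg\n", x);
    return 0;
}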

2) This is a variant of the above that embraces the enemy instead of fighting it. Implement (1) with reals as cl_double, which is supported in OpenCL by NVidia cards but not by AMD cards. Then short AMD stock and go long NVDA for additional gains.

Again, good luck!


EDIT: Oops, forgot about this discussion
Quote
2) This is a variant of the above that embraces the enemy instead of fighting it. Implement (1) with reals as cl_double, which is supported in OpenCL by NVidia cards but not by AMD cards. Then short AMD stock and go long NVDA for additional gains.

I'm pretty sure this isn't true; cl_double is called in the example source code bundled with AMD's APP SDK.

The difference in the IEEE 754 implementations between nVidia and AMD GPUs is that nVidia's results tend to be congruent with what x86/x64 processors spit out, while AMD's are not, because their floating-point results are less precise.

But as far as I know, if you can do it in the FPUs and it's invertible, you can do it in integer logic too.  Whether the FPU operations would hold an advantage in that case, I do not know.
Indeed, it seems AMD has caught up on double precision.
If there is some math operation whose output is bit-for-bit the same across all NVIDIA hardware but differs in the least significant bits from x86 and AMD, such an operation could be used for an NVIDIA-only coin. Simulating those least significant bits would be possible, but expensive. Perhaps IEEE 754 floating-point math is deterministic and the same across different hardware, but in fast-math mode NVIDIA and AMD could differ.
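
One way to hunt for such an operation would be to compare the raw bit patterns of the results across the different hardware; a sketch of the CPU side (the chosen operation is just an example candidate):
Code:
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Print the raw IEEE 754 bit pattern of a result so runs on different
// hardware (or the same kernel on NVIDIA/AMD GPUs) can be compared bit-for-bit.
static uint64_t DoubleBits(double x)
{
    uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

int main()
{
    double acc = 0.0;
    for (int i = 1; i <= 1000; ++i)
        acc += std::sin(static_cast<double>(i)) / i;   // transcendental ops are the likely offenders
    // Compile with and without fast-math, run on each platform, and compare.
    std::printf("%016llx\n", (unsigned long long)DoubleBits(acc));
    return 0;
}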

legendary
Activity: 924
Merit: 1132
March 27, 2015, 08:37:33 PM
#1
I'm looking for a set of ~10 or so hashing algorithms suitable for securing a block chain, and if at all possible I'd like them to cater to machines with strengths as *varied* as possible.  Ideally, people with different machines should have different inflection points (along the scale of hashes-per-minute versus current-difficulty-for-each-algorithm) at which they switch from one algorithm to another.  In the "steady state", where each algorithm is getting the same number of blocks and difficulty remains constant, different types of machine would be working on each algorithm and nobody would have a reason to switch.

In the long run, the difficulty adjusting semi-independently for each algorithm would mean that each algorithm gets an approximately equal number of blocks.
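
By "semi-independently" I mean something roughly like the sketch below; the retarget window and the clamping are placeholders, but the point is that each algorithm's difficulty is retargeted only from blocks found with that algorithm:
Code:
#include <algorithm>
#include <cstdint>
#include <vector>

// Each algorithm keeps its own difficulty, retargeted from the recent blocks
// found with that algorithm, so in the long run each algorithm ends up with
// roughly an equal share of blocks.
static const int     NUM_ALGOS       = 10;
static const int64_t TARGET_SPACING  = 600 * NUM_ALGOS;  // each algo aims at one block per NUM_ALGOS*10 minutes
static const size_t  RETARGET_WINDOW = 36;               // blocks of this algo per retarget (placeholder)

struct AlgoState {
    uint64_t difficulty = 1;
    std::vector<int64_t> recentTimestamps;   // timestamps of blocks found with this algo
};

void Retarget(AlgoState& algo)
{
    if (algo.recentTimestamps.size() < RETARGET_WINDOW)
        return;
    int64_t actual = algo.recentTimestamps.back() - algo.recentTimestamps.front();
    int64_t wanted = TARGET_SPACING * (int64_t)(RETARGET_WINDOW - 1);
    // Scale difficulty by wanted/actual, clamped to avoid wild swings.
    int64_t clamped = std::max(wanted / 4, std::min(actual, wanted * 4));
    algo.difficulty = std::max<uint64_t>(1, algo.difficulty * wanted / clamped);
    algo.recentTimestamps.clear();
}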

So I'm looking for hashing algorithms...
one more efficient for CPUs with server-sized memory,
one more efficient for CPUs on high-end home machines,
one more efficient for ASICs and small-memory devices
    (this is easy -- SHA256D),
two more efficient for GPUs (because there are two main
    families of GPUs out there with different strengths),
one more efficient for single-threaded (like ARM) processors,

...etc.

As miners I figure you guys probably know a bit about this stuff, so I would like to ask for your recommendations, if you have any.

I don't really believe there's much of anything that a high-end home machine is actually more efficient at doing than a server-class machine, but my intent is that the server-class machines should usually have a different option available that saves them the effort of competing with the home machines, so the home machines should have at least one algorithm pretty much to themselves most of the time and only deal with competition from the server-class crew when their algorithm's difficulty dips extra-low.

"Efficiency" here is therefore sort of game-theoretic; it would mean that anybody who can do what you can better, *usually* has something more suited to their own machine's strengths that they can be doing instead.  More capable machines (well, other than ASICs) give you more options for switching between algorithms as their respective difficulties vary, but each type should have a "niche" where everybody else usually has something better to do.

Anyway, I think this is a worthwhile thing because it would keep a good variety in what kind of machines (and therefore what socio-economic classes of people) can get blocks.

Thanks for any response you have; even if you don't have specific recommendations I'd like to hear your thoughts.