Pages:

Author

Topic: Some 'technical commentary' about Core code esp. hardware utilisation (Read 4926 times)

Last of the V8s

legendary

Activity: 1652

Merit: 4393

Be a bank

Locking the thread now because asked and answered.
@gmaxwell's responses seemed very clear, thanks
interesting which trolls and which sane people turned up

wiffwaff

newbie

Activity: 6

Merit: 0

Quote from: Last of the V8s on July 07, 2017, 04:04:29 AM

Quote from: wiffwaff on July 07, 2017, 03:49:14 AM

Quote from: tspacepilot on July 07, 2017, 01:49:24 AM

Troll Buster is pointing out poor decisions that can be improved upon and people here are trying to find something of their's to be able to bash. This is typical Core tactics, whereby they fail to address the issue being highlighted and instead attempt to launch person attacks on the person stepping forward.

This is exactly why bitcoin development fragmented under the fifth column attacks that forced out the best and brightest, leaving us with the cesspit we have today.

Go on, fire up some BIP148 hashing power. I double-dare you.

https://bitcointalksearch.org/user/wiffwaff-221980

Quote from: wiffwaff on December 28, 2016, 05:00:48 AM

Quote

What is the benefit of bitcoin core?

Bitcoin Core will signal support for and recognise SegWit enabled blocks, amongst other additions. Depending on your stance in the max block size issue, you might like to consider using one of the other many bitcoin clients such as https://www.bitcoinunlimited.info/ which supports bigger maximum block sizes as a solution to the full blocks we currently are experiencing.

Gosh Roger Ver/fake satoshi Craig Wright you forked out for quite an old account.
frighted?

Wrong and wrong, sir. I am just one more bitcoin supporter. It is telling that you go for the diversion and failed to make any clear point.

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: NewLiberty on July 07, 2017, 10:50:33 AM

It is very rare these days that someone outside the core team is so helpful to them as in this post. Most outside core would be just as happy to see core fade away instead of being so very helpful as TB is being here, albeit TB was masterfully trolled by GM into doing GM's job for him, it is just as likely that GM's ego or outside incentives will prevent him from taking any of this helpful advice to heart.

Thanks for demonstrating your lack of either clue or integrity for the record-- some people might have been mistaking your endorsement of Wright as a moment of bamboozle rather than a deeper character flaw.

But just in case you missed the response to his claims there:

Quote from: gmaxwell on July 06, 2017, 05:58:09 PM

There was an incomplete PR for that, it was something like a 5% performance difference for initial sync at the time; it would be somewhat more now due to other optimizations. Instead we spent more time eliminating redundant sha256 operations in the codebase, which got a lot more speed up then this final bit of optimization will. It's used in the fibre codebase without autodetection. Please feel free to finish up the autodetection for it. It's a perfect project for a new contributor. We also have a new AMD host so that x86_64 sha2 extensions can be tested on it.

So, dumping some output of a google search citing code that we already had-- isn't exactly "showing us the wound", would you say?

Not to mention your highlight of the use of sha256^2 which is part of the protocol definition from day one and not something we could change without invalidating every transaction and block. (Nor is it entirely pointless...) But I guess as Craig Wright's partner in crime you already know all about that because he's totally Satoshi. (lol)

Quote from: NewLiberty on July 07, 2017, 10:50:33 AM

Most interesting here is that TB has found where Core team have added the Big-O Quadratic Sighash bug which is their big issue of why then need SegWit and can't scale.

So, you're telling us that "TB" is Craig Wright? Because that easily debunked claim is Wright's as far as I know.

hv_

legendary

Activity: 2548

Merit: 1055

Clean Code and Scale

Quote from: zanza on July 07, 2017, 11:42:09 AM

Bitcoin Core Dev's are the smartest developers in the world.

Sure. Spending little too much time and efforts at beeing so.. Sigh

zanza

newbie

Activity: 40

Merit: 0

Bitcoin Core Dev's are the smartest developers in the world.

tspacepilot

legendary

Activity: 1456

Merit: 1083

I may write code in exchange for bitcoins.

Quote from: ?? on ??

I get it, Troll is a self-proclaimed close-to-the-metal ninja coder. This makes him very useful in his niche, and his advice could be taken if it is truly not just trolling. But when you're very intimate with hammers, every problem starts to look like a nail, and you wonder why other coders don't just do as you would do in their place. A broader perspective may turn up some valid reasons.

And when each response to "show us something you've worked on" or "why don't you take this on and join the project" is replied to with "You fools! MWA HAHA", then that self-proclamation starts to look incredibly weak, imo.

NewLiberty

legendary

Activity: 1204

Merit: 1002

Gresham's Lawyer

Quote from: Troll Buster on July 06, 2017, 02:23:41 PM

You know decades ago people invented this little thing call #ifdef right?

Just use #ifdef _MSC_VER/#else/#endif around the inline assembly if you want to bypass MSVC.

Even simple crap like switching to --i instead of ++i, will reduce assembly instructions regardless of what optimization flags you use on the compiler.

Quote from: ComputerGenie on July 05, 2017, 11:11:25 PM

Quote from: Troll Buster on July 05, 2017, 09:32:48 PM

Why the hell is Core still stuck on LevelDB anyway?

The same reason BDB hasn't ever been replaced, because even after a softtfork and a hard fork, new wallets must still be backwards-compatible with already nonfunctional 2011 wallets. Roll Eyes

Most CPU is running idle most of the time, and SSD is still expensive.
So just use RocksDB, or just toss in a lz4 lib, add an option in the config and let people with decent CPU enable compression and save 20G and more.

I just copied the entire bitcoind dir (blocks, index, exec, everything) onto a ZFS pool with lz4 compression enabled and at 256k record size it saved over 20G for me.

Works just fine, and ZFS isn't even known for its performance.

Doesn't matter how many life time people spent on it, when you see silly shit like sha256() twice, you know it's written by amateurs.

Here, your internal sha256 lib, the critical hashing function all encode/decode operation relies on, the one that hasn't been updated since 2014:

https://github.com/bitcoin/bitcoin/blob/master/src/crypto/sha256.cpp

SHA256 is one of the key pieces of Bitcoin operations, the blocks use it, the transactions use it, the addresses even use it twice.

So what's your excuse for not making use of SSE/AVX/AVX2 and the Intel SHA extension? Aesthetics? Portability? Pfft.

There are mountains of accelerated SHA2 libs out there, like this one,
Supports Intel SHA extension, supports ARMv8, even has MVSC headers:

Quote

https://github.com/noloader/SHA-Intrinsics/blob/master/sha256-x86.c

SHA-1, SHA-224 and SHA-256 compression functions using Intel SHA intrinsics and ARMv8 SHA intrinsics

For AVX2, here is one from Intel themselves:

Quote

https://patchwork.kernel.org/patch/2343841/

Optimized sha256 x86_64 routine using AVX2's RORX instructions

Provides SHA256 x86_64 assembly routine optimized with SSE, AVX and
AVX2's RORX instructions. Speedup of 70% or more has been
measured over the generic implementation.

Signed-off-by: Tim Chen <[email protected]>

There is your 70% speed up for a single critical operation on your hot-path.

This isn't some advanced shit, that Intel patch was created in March 26, 2013, your sha256 lib was last updated in Dec 19, 2014, so the patch existed over a year before your last update. We have even faster stuff now using Intel SHA intrinsics.

QFT and mined for the gold.

It is very rare these days that someone outside the core team is so helpful to them as in this post. Most outside core would be just as happy to see core fade away instead of being so very helpful as TB is being here, albeit TB was masterfully trolled by GM into doing GM's job for him, it is just as likely that GM's ego or outside incentives will prevent him from taking any of this helpful advice to heart.

Most interesting here is that TB has found where Core team have added the Big-O Quadratic Sighash bug which is their big issue of why then need SegWit and can't scale.

GM shoots Bitcoin in the leg, TB shows him the wound, the blood, the gunpowder marks on his hand and hands him the surgical kit.
GM sits there mocking him and shrugging.
Bitcoin's blood leaking out while GM stares around wondering why he doesn't get more love.

tspacepilot

legendary

Activity: 1456

Merit: 1083

I may write code in exchange for bitcoins.

Quote from: wiffwaff on July 07, 2017, 03:49:14 AM

Quote from: tspacepilot on July 07, 2017, 01:49:24 AM

My reading was that Troll Buster's points were all replied to by gmaxwell. Once you cut away all of the "stupid", "worthless", etc, invective there were a few criticisms in there and I'm pretty sure they were addressed. Then, when Mr. Buster continued with the "you're all so stupid" style posts, I think people naturally respond with "ok, can you, like, show us why you're so smart", to which Mr. Buster sorta ran away screaming:

Quote

Like they said, "I could tell you but then I'd have to kill you."
Too much hassle.

Quote

This is exactly why bitcoin development fragmented under the fifth column attacks that forced out the best and brightest, leaving us with the cesspit we have today.

Go on, fire up some BIP148 hashing power. I double-dare you.

Hardly seems unreasonable to reply with technical answer to the few bits of detail in Troll Buster's post (which gmaxwell did) and then to address his invective and screaming by asking him to show something more productive than insults.

Last of the V8s

legendary

Activity: 1652

Merit: 4393

Be a bank

Quote from: wiffwaff on July 07, 2017, 03:49:14 AM

Quote from: tspacepilot on July 07, 2017, 01:49:24 AM

https://bitcointalksearch.org/user/wiffwaff-221980

Quote from: wiffwaff on December 28, 2016, 05:00:48 AM

Quote

What is the benefit of bitcoin core?

Gosh Roger Ver/fake satoshi Craig Wright you forked out for quite an old account.
frighted?

wiffwaff

newbie

Activity: 6

Merit: 0

Quote from: tspacepilot on July 07, 2017, 01:49:24 AM

Troll Buster

newbie

Activity: 42

Merit: 0

Quote from: tspacepilot on July 07, 2017, 01:49:24 AM

but I think we can go ahead and recognize that Troll Buster isn't going to be contributing anything more than the chest thumping.

Shit, you mean I could end up just like you?

tspacepilot

legendary

Activity: 1456

Merit: 1083

I may write code in exchange for bitcoins.

Quote from: Troll Buster on July 06, 2017, 06:35:17 PM

Quote from: gmaxwell on July 06, 2017, 05:58:09 PM

There is a PR for that, it was something like a 5% performance difference for initial sync at the time; it would be somewhat more now due to other optimizations. It's used in the fibre codebase without autodetection. Please feel free to finish up the autodetection for it.

I would have made patches a long time ago if the whole project wasn't already rotten to the core.

So here you're just admitting that you're only here to troll?

Quote from: Troll Buster on July 06, 2017, 11:34:47 PM

Like they said, "I could tell you but then I'd have to kill you."
Too much hassle.

There was this other thing that they said, something about talk being cheap. Then there was another one I heard once that went something like 'put up or shut up'. Maybe those are relevant here.

At this point it's pretty clear to me that Troll Buster is just here to spew bile. It's really striking how puffed up he is about his skills and badassery and then when someone asks him to point to a project he's worked on or generally to prove his talk with something more than a google search his reply is all 'hey, look over there!'

I'll keep watching this thread because amongst all the chest thumping are some interesting technical details, but I think we can go ahead and recognize that Troll Buster isn't going to be contributing anything more than the chest thumping.

Troll Buster

newbie

Activity: 42

Merit: 0

Quote

Fun fact: Mike Hearn contributed a grand total of something like 6 relatively minor pull requests-- most just changing strings. It's popular disinformation that he was some kind of major contributor to the project. Several of his changes that weren't string changes introduced remote vulnerabilities (but fortunately we caught them with review.)

Irrelevant.
You challenged me to find one person who quit because of you. I gave you a whole team.
Here is another team, the Bitcoin Classic team, they left for similar reasons.

Quote

Yes, I've been using Bitcoin pretty much its entire life and I can easily demonstrate it. My expertise is well established, why is it that you won't show us yours though you claim to be so vastly more skilled than everyone here?

I didn't claim anything about myself.
Your code sucks, you said nope they're great, so I shown you where, I shown you how to improve it.
Then you went all "Says the few days old account", "i spent years on a porn codec" ego authority bullshit.
I don't care what you think about yourself or me.
Stick to the tech or stfu.

Quote

From 2009? ... you know that the blocks are not accessed at all, except by new peers that read all of them right? They're not really accessed any less accessed than blocks from 6 months ago. (they're also pretty much completely incompressable with lz4, since unlike modern blocks they're not full of reused addresses).

As to why? Because a 10% decrease in size isn't all that interesting esp at the cost of making fetching blocks for bloom filtered lite nodes much more cpu intensive, as that's already a DOS vector.

So now you're going to use new peers as an excuse to not compress the blocks?

That is so stupid.

When compression is enabled, and a new peer requests an old block.
Just send him the entire compressed block as is and let him process it.
It'll actually save bandwidth and download time.

Just add the compression feature and setting.
Some user would like to save 20G on their SSD by changing a 0 to 1, some wouldn't.
Just add the feature and move on, what's so complicated.
Compression is standard stuff, don't argue over stupid shit.

Quote

Uh, sounds like you're misinformed on this too: Pruning makes absolutely no change in the security, privacy, or behavior of your node other than that you no longer help new nodes do their initial sync/scanning. Outside of those narrow things a pruned node is completely indistinguishable. And instead of only reducing the storage 10%, it reduces it 99%.

Who said anything about security or privacy.
To suggest pruning over simple compression was silly enough.
One minute you go all "My expertise is well established"
Next minute you talk total nonsense.
It's like amateur hour.

Quote

Lz4 is fine stuff, but it isn't the right tool for Bitcoin almost all the data in Bitcoin is cryptographic hashes which are entirely uncompressed. This is why a simple change to more efficient serialization can get over 28% reduction while your LZ4 only gets 10%. As far as other things-- no we won't: block data is not like ordinary documents and traditional compressors don't do very much with it.

(And as an aside, every one of the items in your list are exceptionally slow. lol, for example I believe the top item in it takes it about 12 hours to decompress its 15MB enwiki8 file. heh way to show off your ninja recommendation skills)

If you'd like to work on compression, I can point you to the compacted serialization spec that gets close to 30%... but if you think you're going to use one of the paq/ppm compressors ... well, hope you've got a fast computer.

Look, here is the bottom line.
Compression is a common feature used everywhere for decades.
It's not some new high tech secret, why talk so much bullshit making it sound so complicated.

The point is you're already a few years late.
10%, 20%, 30%, Lz4, not Lz4, who gives a shit, in the end it's a space/time trade off.
If you can't decide what settings to use, just offer 3 settings, low/medium/high.
If you can't decide which algorithm to use, let user choose 1 out of 3 algorithms, give users the choice.
Compression is simple, libs and examples are everywhere, just figure it out.
Stop giving stupid excuses and stop mumbling irrelevant bullshit.

Quote

Can you show us a non-trivial patch you made to any other project anywhere?

Like they said, "I could tell you but then I'd have to kill you."
Too much hassle.

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: Troll Buster on July 06, 2017, 06:35:17 PM

XT team for starters:

Quote

Right, if the logic doesn't work, just fall back to using registration date and post counts to establish authority.

Quote

At the time I didn't even know you guys were stupid enough to not compress the 150G of blocks, until someone reminded me in that thread. Seriously what is the point leaving blocks from 2009 uncompressed? SSD is cheap these days but not that cheap.

Quote

So after all the talk about your l33t porn codec skills, your solution to save space is to just prune the blocks? LOL. You might as well say "Just run a thin wallet".

Quote

Why do you think compression experts around the world invented algorithms like Lz4? Why do you think it's part of ZFS? Because it is fast enough and it works, it is simple proven tech used by millions of low power NAS around the world for years.

Here, there are over 100 compression algorithms, all invented and benchmarked for you.
You'll easily find one that has a size/speed/mem profile that just happen to work great on bitcoin block files and is better than LZ4.

Lz4 is fine stuff, but it isn't the right tool for Bitcoin almost all the data in Bitcoin is cryptographic hashes which are entirely uncompressable. This is why a simple change to more efficient serialization can get over 28% reduction while your LZ4 only gets 10%. As far as other things-- no we won't: block data is not like ordinary documents and traditional compressors don't do very much with it.

(And as an aside, every one of the items in your list are exceptionally slow. lol, for example I believe the top item in it takes it about 12 hours to decompress its 15MB enwiki8 file. heh way to show off your ninja recommendation skills)

If you'd like to work on compression, I can point you to the compacted serialization spec that gets close to 30%... but if you think you're going to use one of the paq/ppm compressors ... well, hope you've got a fast computer.

Quote

I would have made patches a long time ago if the whole project wasn't already rotten to the core.

Can you show us a non-trivial patch you made to any other project anywhere?

cr1776

legendary

Activity: 4298

Merit: 1317

How about just posting a link (as I've asked 3 times now) to where you advocate "switching to --i instead of ++i"?

I am quite aware of what "goes on inside a CPU" and have actually done several CPU designs. Although I think you need to drop the "Buster" since you are just trolling us.

Quote from: Troll Buster on July 06, 2017, 05:29:01 PM

Quote from: cr1776 on July 06, 2017, 05:11:09 PM

Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i.

Yes you did, I even highlighted it:

Quote from: cr1776 on July 06, 2017, 02:55:40 PM

Most well written, non-student compilers will handle cases like that and there will be no different between things like ++i and i++ and the code generated except perhaps in a class that obfuscates the operation in some extremely obscure way.

Well technically you posted ++i and i++, but this whole time I've been talking about ++i and --i, that was what you were responding to, and you stated that compilers can handle everything, they can't, and that's entry level knowledge.

Quote from: cr1776 on July 06, 2017, 05:11:09 PM

And regarding your "talentless coders talking about credentials" you again seem to have a huge chip on your shoulder. I spoke about my experience - when people come in and start insulting, attacking, denigrating with a lot of hand-waving and a big chip on their shoulder and no specifics, they are ignored (or not hired) in my experience at big (22000 plus people) and small organizations (3+). And rightly so. I think everyone would appreciate specifics instead of baseless, groundless, inaccurate attacks.

Here is a tip, if you don't want to be mocked, next time don't start an argument with:
"As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs)"

You walked in here knowing you had no idea wtf was going on inside a CPU, thrown out a bunch of titles, made a bunch of false claims while making demands, and you want to talk about etiquette?

Your code sucks, everyone else is doing better, I shown you the proof, I pointed you to the right direction, take it or leave it.

You're a nothing burger with 50 stickers on it and I simply don't give a shit what you think.

Troll Buster

newbie

Activity: 42

Merit: 0

Quote from: gmaxwell on July 06, 2017, 05:58:09 PM

Quote

And many people on the project quit because they didn't like working with you, what's your point?

Name one.

How about the entire XT team for starters:

Quote from: gmaxwell on July 06, 2017, 05:58:09 PM

Says the few days old account...

Right, if the logic doesn't work, just fall back to using registration date and post counts to establish authority.

Like the guy above you who claimed to have "30 years experience" while demonstrating less knowledge about CPU and compilers than a snot nosed newbie drone programmer.

Quote from: gmaxwell on July 06, 2017, 05:58:09 PM

If you care about how much space the blocks are using, turn on pruning and you'll save 140GB.

So after all the talk about your l33t porn codec skills, your solution to save space is to just prune the blocks? LOL. You might as well say "Just run a thin wallet".

Why do you think compression experts around the world invented algorithms like Lz4? Why do you think it's part of ZFS? Because it is fast enough and it works, it is simple proven tech used by millions of low power NAS around the world for years.

Quote from: gmaxwell on July 06, 2017, 05:58:09 PM

I would have made patches a long time ago if the whole project wasn't already rotten to the core.

I see you just added this part:

Quote from: gmaxwell on July 06, 2017, 05:58:09 PM

LZ4 is a really inefficient way to compress blocks-- it mostly just exploits repeated pubkeys from address reuse Sad the compact serilization we have better (28% reduction) but it's not clear if its worth the slowdown, especially since you can just prune and save a lot more.

Here, there are over 100 compression algorithms, all invented and benchmarked for you.
You'll easily find one that has a size/speed/mem profile that just happen to work great on bitcoin block files and is better than LZ4.

Just pick ONE.

Quote

http://mattmahoney.net/dc/text.html
Large Text Compression Benchmark

Program Options enwik8
------- ------- ----------
cmix v13 15,323,969
durilca'kingsize -m13000 -o40 -t2 16,209,167
paq8pxd_v18 -s15 16,345,626
paq8hp12any -8 16,230,028
drt|emma 0.1.22 16,679,420
zpaq 6.42 -m s10.0.5fmax6 17,855,729
drt|lpaq9m 9 17,964,751
mcm 0.83 -x11 18,233,295
nanozip 0.09a -cc -m32g -p1 -t1 -nm 18,594,163
cmv 00.01.01 -m2,3,0x03ed7dfb 18,122,372

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: gmaxwell on July 06, 2017, 06:28:18 AM

Inefficient data storage Oh please. Cargo cult bullshit at its worst. Do you even know what leveldb is used for in Bitcoin? What reason do you believe that $BUZZWORD_PACKAGE_DEJURE is any better for that? Did it occur to you that perhaps people have already benchmarked other options? Rocks has a lot of feature set which is completely irrelevant for our very narrow use of leveldb-- I see in your other posts that you're going on about superior compression in rocksdb: Guess what: we disable compression and rip out out of leveldb, because it HURTS PERFORMANCE for our use case. It turns out that cryptographic hashes are not very compressible.

Everyone knows compression costs performance, it's for space efficiency, wtf are you even on about.

Most people's CPU is running idle most of the time, and SSD is still expensive.

So just use RocksDB, or just toss in a lz4 lib, add an option in the config and let people with a decent CPU to enable compression and save 20+G.

Reading failure on your part. The blocks are not in a database. Doing so would be very bad for performance. The chainstate is not meaningfully compressible beyond key sharing (and if it were, who would care, it's 2GBish). The chainstate is small and entirely about performance. In fact we just made it 10% larger or so in order to create a 25%-ish initial sync speedup.

If you care about how much space the blocks are using, turn on pruning and you'll save 140GB. LZ4 is a really inefficient way to compress blocks-- it mostly just exploits repeated pubkeys from address reuse Sad

the compact serilization we have better (28% reduction) but it's not clear if its worth the slowdown, especially since you can just prune and save a lot more.

Especially since if what you want is generic compression of block files you can simply use a filesystem that implements it... and it will helpfully compress all your other data, logs, etc.

Quote

So what's your excuse for not making use of SSE/AVX/AVX2 and the Intel SHA extension? Aesthetics? Portability? Pfft.

Troll Buster

newbie

Activity: 42

Merit: 0

Quote from: cr1776 on July 06, 2017, 05:11:09 PM

Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i.

Yes you did, I even highlighted it:

Quote from: cr1776 on July 06, 2017, 02:55:40 PM

In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well written, non-student compilers will handle cases like that and there will be no different between things like ++i and i++ and the code generated except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.

Stop talking bullshit.

Quote from: cr1776 on July 06, 2017, 05:11:09 PM

cr1776

legendary

Activity: 4298

Merit: 1317

Perhaps you should try reading and understanding prior to attacking. I never "argued" with you about --i and ++i. I asked for specifics in the code about which you were referring - which should be easy to provide - and pointed out compilers are quite smart about optimizations, but without knowing what code you are referencing it is impossible to review.

Easy question: where is this "switching to --i instead of ++i" which you are speaking about? Just post a link to it on GitHub.

And regarding your "talentless coders talking about credentials" you again seem to have a huge chip on your shoulder. I spoke about my experience - when people come in and start insulting, attacking, denigrating with a lot of hand-waving and a big chip on their shoulder and no specifics, they are ignored (or not hired) in my experience at big (22000 plus people) and small organizations (3+). And rightly so. I think everyone would appreciate specifics instead of baseless, groundless, inaccurate attacks.

Without more detail no one can evaluate whether you are good at coding or just insulting.

Quote from: Troll Buster on July 06, 2017, 03:32:03 PM

Quote from: cr1776 on July 06, 2017, 02:55:40 PM

As someone who has 30 years of experience plus a BS in CS and CE, and an MS in CS (from top 10 US CS/CE programs), this kind of language isn't a way to (a) make your point, and (b) get anyone to listen to you with any degree of respect.

In open source projects, if you have something like your --i and ++i change, open a pull request or at minimum link to the specific code you are talking about. Most well written, non-student compilers will handle cases like that and there will be no different between things like ++i and i++ and the code generated except perhaps in a class that obfuscates the operation in some extremely obscure way. But, as I said, if it is that easy, please point out what you are talking about.

If greg wants to be treated with respect, he shouldn't begin and end a reply with insults.

This --i and ++i is basic stuff and you want to argue about it? wtf have you been doing for the past 30 years?

And it's not just the speed, it's the smaller byte code which allow you to pack more code into the tiny L0 instruction cache and reduce cache miss, which still costs you 4cycles when you re-fetch it from L1 to L0.

It also means you can fit more code in that tiny 32kb L1 instruction cache, so your other loops/threads can run faster by not being kicked out of the cache by other codes. It also saves power on embedded systems.

This is what I was talking about, the world is flooded with "experts" with "30 years experience" and "50 alphabet soup titles" but still have absolutely no idea wtf actually happens inside a CPU.

Only talentless coders talk about credentials instead of the code.

This is not some super advanced stuff, this is entry level knowledge that's not even up for debate.
The information is everywhere, this took 1 second to find, look:

Quote

https://stackoverflow.com/questions/2823043/is-it-faster-to-count-down-than-it-is-to-count-up/2823164#2823164

Which loop has better performance? Increment or decrement?

What your teacher have said was some oblique statement without much clarification. It is NOT that decrementing is faster than incrementing but you can create much much faster loop with decrement than with increment.

int i;
for (i = 0; i < 10; i++){
   //something here
}

after compilation (without optimisation) compiled version may look like this (VS2015):

-------- C7 45 B0 00 00 00 00 mov dword ptr ,0
-------- EB 09 jmp labelB
labelA 8B 45 B0 mov eax,dword ptr
-------- 83 C0 01 add eax,1
-------- 89 45 B0 mov dword ptr ,eax
labelB 83 7D B0 0A cmp dword ptr ,0Ah
-------- 7D 02 jge out1
-------- EB EF jmp labelA

The whole loop is 8 instructions (26 bytes). In it - there are actually 6 instructions (17 bytes) with 2 branches. Yes yes I know it can be done better (its just an example).

Now consider this frequent construct which you will often find written by embedded developer:

i = 10;
do{
   //something here
} while (--i);

It also iterates 10 times (yes I know i value is different compared with shown for loop but we care about iteration count here). This may be compiled into this:

00074EBC C7 45 B0 01 00 00 00 mov dword ptr ,1
00074EC3 8B 45 B0 mov eax,dword ptr
00074EC6 83 E8 01 sub eax,1
00074EC9 89 45 B0 mov dword ptr ,eax
00074ECC 75 F5 jne main+0C3h (074EC3h)

5 instructions (18 bytes) and just one branch. Actually there are 4 instruction in the loop (11 bytes).

The best thing is that some CPUs (x86/x64 compatible included) have instruction that may decrement a register, later compare result with zero and perform branch if result is different than zero. Virtually ALL PC cpus implement this instruction. Using it the loop is actually just one (yes one) 2 byte instruction:

00144ECE B9 0A 00 00 00 mov ecx,0Ah
label:
   // something here
00144ED3 E2 FE loop label (0144ED3h) // decrement ecx and jump to label if not zero

Do I have to explain which is faster?

Here is more on the L0 and uops instruction cache:

Quote

http://www.realworldtech.com/haswell-cpu/2/

Sandy Bridge made tremendous strides in improving the front-end and ensuring the smooth delivery of uops to the rest of the pipeline. The biggest improvement was a uop cache that essentially acts as an L0 instruction cache, but contains fixed length decoded uops. The uop cache is virtually addressed and included in the L1 instruction cache. Hitting in the uop cache has several benefits, including reducing the pipeline length by eliminating power hungry instruction decoding stages and enabling an effective throughput of 32B of instructions per cycle. For newer SIMD instructions, the 16B fetch limit was problematic, so the uop cache synergizes nicely with extensions such as AVX.

The Haswell uop cache is the same size and organization as in Sandy Bridge. The uop cache lines hold upto 6 uops, and the cache is organized into 32 sets of 8 cache lines (i.e., 8 way associative). A 32B window of fetched x86 instructions can map to 3 lines within a single way. Hits in the uop cache can deliver 4 uops/cycle and those 4 uops can correspond to 32B of instructions, whereas the traditional front-end cannot process more than 16B/cycle. For performance, the uop cache can hold microcoded instructions as a pointer to microcode, but partial hits are not supported. As with the instruction cache, the decoded uop cache is shared by the active threads.

Troll Buster

newbie

Activity: 42

Merit: 0

Quote from: tspacepilot on July 06, 2017, 03:04:49 PM

Couldn't you, like, fix a few of the 'basic level silly choices' in order to strengthen your argument?

As far as I can tell you've been invited to offer improvements rather than just insults, but it seems that you chose to reply with further insults.

If, for some reason, you can't provide a patch but can provide some helpful discussion which might lead to improvements then it seems like you might need to alter your approach.

I'm not worshipping at anyone's "church" here, I'm just noticing the dynamic: you've been invited to prove the worth of your assumptions, but your reply doesn't seem to be headed in that direction.

By "you can't provide a patch" you mean things like the Intel sha256 patch I posted at the end?

LOL what kind of bullshit echo chamber is this? You guys are funny.