So I would assume it has to do with the file system as well. I guess not all systems keep data written in the same way.
I don't think anyone would count file system overhead as blockchain size.
People will do lots of things that you wouldn't think they'd do.
I'm not sure about this but maybe it's related to segwit stuff?
The real blockchain includes all the segwit data. Excluding that would be very wrong, and by now segwit has existed for 10 months. I don't think these sites would be that negligent.
There are plenty of sites that are negligent in many ways. I wouldn't be surprised if some of the sites you are looking at are "that negligent".
maybe this is the cause of the discrepancies, byte vs vbyte?
vbyte should be just the number used for fee calculations instead of the actual byte size (weight/4).
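For anyone following along, here is a rough sketch of the byte vs vbyte distinction, using the weight formula from BIP 141. The function names and the example sizes are made up for illustration, and nothing here says which of these numbers any particular site actually reports.

```python
def weight_units(base_size: int, total_size: int) -> int:
    """BIP 141 weight: non-witness bytes count 4x, witness bytes count 1x.

    base_size  = serialized size without witness data
    total_size = serialized size including witness data
    """
    return base_size * 3 + total_size


def virtual_size(base_size: int, total_size: int) -> int:
    """vbytes = weight / 4, rounded up to a whole number."""
    return (weight_units(base_size, total_size) + 3) // 4


# Hypothetical segwit transaction: 200 bytes without witness data, 300 with it.
base, total = 200, 300
print(total)                      # actual serialized size in bytes
print(weight_units(base, total))  # weight units
print(virtual_size(base, total))  # vbytes, the number used for fee calculations
```

A site that sums vbytes instead of actual bytes would report a smaller number for any data containing segwit transactions, and the same number for data that contains none.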
"Should be" doesn't mean "is by this site". Unless a site has given you undeniable proof that they are doing something a particular way, you should consider the possibility that they are doing things in a way that you don't think they should.
A naive way of measuring would be to just take the size of the datadir for a bitcoind instance.
Sounds too crude to be likely?
There are plenty of sites that are crude in many ways. I wouldn't be surprised if some of the sites you are looking at are likely to do something that crude.
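To show how much a crude measurement can differ, here is a minimal sketch that compares the size of a whole datadir with the size of just the raw block files. It assumes a Bitcoin Core style layout where block data lives in `blk*.dat` files under a `blocks` subdirectory; the function names are hypothetical, and this is not a claim about how any site measures.

```python
from pathlib import Path


def total_bytes(root, pattern="*"):
    """Sum the apparent sizes of all files under root that match pattern."""
    return sum(p.stat().st_size for p in Path(root).rglob(pattern) if p.is_file())


def naive_datadir_size(datadir):
    """Everything in the datadir: block files, undo files, indexes, logs, etc."""
    return total_bytes(datadir)


def block_files_size(datadir):
    """Only the raw block files (assumed to be blocks/blk*.dat)."""
    return total_bytes(Path(datadir) / "blocks", "blk*.dat")
```

Calling both functions on the same datadir gives two different answers to "how big is the blockchain". Neither one counts file system overhead, since `st_size` is the apparent file size rather than the space actually allocated on disk, and the block files themselves can include blocks that are not part of the best chain.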
may receive more orphan blocks than another site which means that they are storing more blocks on disk. They may be measuring this as well
That sounds possible.
Correct. There is no such thing as THE blockchain. Everyone has their own blockchain. Each individual's blockchain is the result of that individual's experiences while connected to the bitcoin network.
So there is no site known to be fully accurate in its reporting, or at least more accurate than the rest?
They are all accurately reporting exactly what they've chosen to report. You'll need to talk to the programmers of each site to understand exactly what process they are using, how they are generating their calculations, and what decisions they've made.
Once you've collected all that information from them (if you can find them and they are even willing to share it), you can decide for yourself whose count is closest to the way that YOU would want the size counted.