
Topic: Bitcoin based Blockchain compression algorithm - page 2. (Read 3705 times)

hero member
Activity: 980
Merit: 1010
Blockchain engineer
If you want to see this in Bitcoin Core, I suggest that you open a pull request with your changes at https://github.com/bitcoin/bitcoin/pulls. Then see how the discussion goes with the actual developers of Core. However, since 0.13 has reached its feature freeze, your change would not make it to a release until 0.14 at the earliest, which will be in roughly 6 months.

Yes, we hope the Bitcoin development team will use this compression algorithm.
Thank you very much for your advice.
staff
Activity: 3458
Merit: 6793
Just writing some code
If you want to see this in Bitcoin Core, I suggest that you open a pull request with your changes at https://github.com/bitcoin/bitcoin/pulls. Then see how the discussion goes with the actual developers of Core. However, since 0.13 has reached its feature freeze, your change would not make it to a release until 0.14 at the earliest, which will be in roughly 6 months.
hero member
Activity: 980
Merit: 1010
Blockchain engineer
But is it better than Pied Piper's compression algorithm?

I haven't compared it with other people's algorithms.
Maybe you can run a comparison to see which one has the higher compression ratio.

Moreover, our algorithm saves the same proportion of network traffic by transmitting blocks in compressed form.
legendary
Activity: 2786
Merit: 1031
But is it better than Pied Piper's compression algorithm?
hero member
Activity: 980
Merit: 1010
Blockchain engineer
Yesterday I ported the compression code to Bitcoin version 0.8.6.
The compression effect is obvious.

To make the effect easy to see, I changed the code in main.cpp
so that each block file (blkxxxxx.dat) contains 10,000 blocks.

Code:
bool FindBlockPos(CValidationState &state, CDiskBlockPos &pos, unsigned int nAddSize, unsigned int nHeight, uint64 nTime, bool fKnown = false)
{
...
        /* while (infoLastBlockFile.nSize + nAddSize >= MAX_BLOCKFILE_SIZE) { */
        if( ((nHeight / 10000) > 0) && ((nHeight % 10000) == 0) ) {
            printf("nHeight = [%d], Leaving block file %i: %s\n", nHeight, nLastBlockFile, infoLastBlockFile.ToString().c_str());
            FlushBlockFile(true);
            nLastBlockFile++;
            infoLastBlockFile.SetNull();
            pblocktree->ReadBlockFileInfo(nLastBlockFile, infoLastBlockFile); // check whether data for the new file somehow already exist; can fail just fine
            fUpdatedLast = true;
        }
...
}
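The effect of the patched condition can be sketched outside the node code. This is only an illustration, assuming blocks are written strictly in height order starting from file 0; `BLOCKS_PER_FILE`, `blockFileIndexForHeight` and `blockFileNameForHeight` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical constant: the test patch starts a new file every 10,000 blocks.
const unsigned int BLOCKS_PER_FILE = 10000;

// Index of the block file that receives the block at `height`, assuming
// blocks are written strictly in height order starting from file 0.
unsigned int blockFileIndexForHeight(unsigned int height)
{
    return height / BLOCKS_PER_FILE;
}

// File name in the blkNNNNN.dat style used for the test data.
std::string blockFileNameForHeight(unsigned int height)
{
    char name[32];
    int written = std::snprintf(name, sizeof(name), "blk%05u.dat",
                                blockFileIndexForHeight(height));
    assert(written > 0 && written < (int)sizeof(name));
    return std::string(name);
}
```

Under that assumption, heights 0 to 9999 land in blk00000.dat and height 10000 is the first block of blk00001.dat, which is the split the patched `if` produces.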

Here is the test data. The percentage on each line is the space saved relative to the original size:

blk00000.dat (blocks 0 ~ 9999): original size 2,318,345 bytes, compressed 2,116,328 bytes, saved 8.7%
blk00001.dat (blocks 10000 ~ 19999): original size 2,303,141 bytes, compressed 2,103,239 bytes, saved 8.6%
blk00002.dat (blocks 20000 ~ 29999): original size 2,440,262 bytes, compressed 2,224,608 bytes, saved 8.8%
blk00003.dat (blocks 30000 ~ 39999): original size 2,500,372 bytes, compressed 2,278,627 bytes, saved 8.86%
blk00004.dat (blocks 40000 ~ 49999): original size 2,775,946 bytes, compressed 2,527,266 bytes, saved 8.95%
blk00005.dat (blocks 50000 ~ 59999): original size 4,611,316 bytes, compressed 3,927,464 bytes, saved 14.8%
blk00006.dat (blocks 60000 ~ 69999): original size 6,788,315 bytes, compressed 5,763,507 bytes, saved 15%
blk00007.dat (blocks 70000 ~ 79999): original size 8,111,206 bytes, compressed 6,493,703 bytes, saved 19.9%
blk00008.dat (blocks 80000 ~ 89999): original size 7,963,189 bytes, compressed 7,048,131 bytes, saved 11.49%
blk00009.dat (blocks 90000 ~ 99999): original size 20,742,813 bytes, compressed 13,708,206 bytes, saved 33.9%
blk00010.dat (blocks 100000 ~ 109999): original size 23,122,509 bytes, compressed 19,481,570 bytes, saved 15.7%
blk00011.dat (blocks 110000 ~ 119999): original size 50,681,392 bytes, compressed 40,918,962 bytes, saved 19.2%
blk00012.dat (blocks 120000 ~ 129999): original size 107,469,564 bytes, compressed 88,319,322 bytes, saved 17.8%
blk00013.dat (blocks 130000 ~ 139999): original size 231,631,119 bytes, compressed 188,562,481 bytes, saved 18.59%
blk00014.dat (blocks 140000 ~ 149999): original size 215,720,950 bytes, compressed 174,676,348 bytes, saved 19%
blk00015.dat (blocks 150000 ~ 159999): original size 173,452,632 bytes, compressed 139,074,101 bytes, saved 19.8%
blk00016.dat (blocks 160000 ~ 169999): original size 212,377,235 bytes, compressed 164,287,461 bytes, saved 22.6%
blk00017.dat (blocks 170000 ~ 179999): original size 263,652,393 bytes, compressed 205,578,322 bytes, saved 22%
blk00018.dat (blocks 180000 ~ 189999): original size 887,112,287 bytes, compressed 612,296,114 bytes, saved 30.9%
blk00019.dat (blocks 190000 ~ 199999): original size 925,036,513 bytes, compressed 638,670,092 bytes, saved 30.9%
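For anyone who wants to re-check the percentages, they appear to be the space saved relative to the original size, (original - compressed) / original. A minimal sketch of that arithmetic follows; `spaceSavedPercent` is a hypothetical helper, not part of the posted code.

```cpp
#include <cassert>
#include <cstdint>

// Percentage of space saved: (original - compressed) / original * 100.
double spaceSavedPercent(std::uint64_t originalBytes, std::uint64_t compressedBytes)
{
    assert(originalBytes > 0);
    return 100.0 * (static_cast<double>(originalBytes) - static_cast<double>(compressedBytes))
                 / static_cast<double>(originalBytes);
}
```

For blk00000.dat, `spaceSavedPercent(2318345, 2116328)` comes out at roughly 8.71, in line with the 8.7% listed above.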
hero member
Activity: 980
Merit: 1010
Blockchain engineer
Original Bitcoin Genesis Block Hex Code:


Compressed Bitcoin Genesis Block Hex Code:
hero member
Activity: 980
Merit: 1010
Blockchain engineer
On Windows, the CPU overhead is negligible,
and it saves 20%~25% of disk space, sometimes more.

My hard disk is 1 TB.
Right now 328 GB is free. I think this will be enough for 3-5 years for me.
What is the reason to compress the blockchain and increase the work for the CPU?
I see no reason to compress data on disk.

This algorithm not only saves disk space, it also saves the same proportion of network traffic.
It's true that disk space is not a big problem for nodes, but traffic is! Saving 25%+ on traffic can be really interesting!
I guess it sends compressed blocks to other clients using the same protocol.

Yes, you're right.
hero member
Activity: 616
Merit: 503
★Bitvest.io★ Play Plinko or Invest!
On Windows, the CPU overhead is negligible,
and it saves 20%~25% of disk space, sometimes more.

My hard disk is 1 TB.
Right now 328 GB is free. I think this will be enough for 3-5 years for me.
What is the reason to compress the blockchain and increase the work for the CPU?
I see no reason to compress data on disk.

This algorithm not only saves disk space, it also saves the same proportion of network traffic.
It's true that disk space is not a big problem for nodes, but traffic is! Saving 25%+ on traffic can be really interesting!
I guess it sends compressed blocks to other clients using the same protocol.
hero member
Activity: 980
Merit: 1010
Blockchain engineer
On Windows, the CPU overhead is negligible,
and it saves 20%~25% of disk space, sometimes more.

My hard disk is 1 TB.
Right now 328 GB is free. I think this will be enough for 3-5 years for me.
What is the reason to compress the blockchain and increase the work for the CPU?
I see no reason to compress data on disk.

This algorithm not only saves disk space, it also saves the same proportion of network traffic.
hero member
Activity: 980
Merit: 1010
Blockchain engineer
Compression usually comes at the cost of more computation. Have you done any tests to see how much more CPU power someone would need to run this and how much disk space would be saved?
I store some blockchain data in squashfs (with xz compression); it stores 75 GB of data in a 57 GB file, so the space saving is 24%.
The CPU usage is indiscernible.

This algorithm compresses and decompresses each block individually, on the fly.
The larger the block, the higher the compression rate.
hero member
Activity: 525
Merit: 531
Compression usually comes at the cost of more computation. Have you done any tests to see how much more CPU power someone would need to run this and how much disk space would be saved?
I store some blockchain data in squashfs (with xz compression); it stores 75 GB of data in a 57 GB file, so the space saving is 24%.
The CPU usage is indiscernible.
legendary
Activity: 1260
Merit: 1019
On Windows, the CPU overhead is negligible,
and it saves 20%~25% of disk space, sometimes more.

My hard disk is 1 TB.
Right now 328 GB is free. I think this will be enough for 3-5 years for me.
What is the reason to compress the blockchain and increase the work for the CPU?
I see no reason to compress data on disk.
hero member
Activity: 980
Merit: 1010
Blockchain engineer
Compression usually comes at the cost of more computation. Have you done any tests to see how much more CPU power someone would need to run this and how much disk space would be saved?

The compression algorithm is already used in Vpncoin with LZMA (7zip).
On Windows, the CPU overhead is negligible,
and it saves 20%~25% of disk space, sometimes more.
As you know, the more content there is, the higher the compression rate,
so the bigger the block, the higher the compression rate.
I think the compression rate would be higher on Bitcoin,
because Bitcoin's blocks are relatively large.
hero member
Activity: 980
Merit: 1010
Blockchain engineer
I don't understand: can it be used with Bitcoin Core?

Yes.
hero member
Activity: 616
Merit: 503
★Bitvest.io★ Play Plinko or Invest!
Compression usually comes at the cost of more computation. Have you done any tests to see how much more CPU power someone would need to run this and how much disk space would be saved?
member
Activity: 98
Merit: 10
I don't understand: can it be used with Bitcoin Core?
hero member
Activity: 980
Merit: 1010
Blockchain engineer
Some related functions:
 
Code:
#include <fstream>   // std::ofstream, used by writeBufToFile
#include <string>
#include "lz4/lz4.h"
#include "lzma/LzmaLib.h"

int StreamToBuffer(CDataStream &ds, string& sRzt, int iSaveBufSize)
{
int bsz = ds.size();
int iRsz = bsz;
if( iSaveBufSize > 0 ){ iRsz = iRsz + 4; }
sRzt.resize(iRsz);
char* ppp = (char*)sRzt.c_str();
if( iSaveBufSize > 0 ){ ppp = ppp + 4; }
ds.read(ppp, bsz);
if( iSaveBufSize > 0 ){ *(unsigned int *)(ppp - 4) = bsz; }
return iRsz;
}

int CBlockToBuffer(CBlock *pb, string& sRzt)
{
CDataStream ssBlock(SER_DISK, CLIENT_VERSION);
ssBlock << (*pb);
int bsz = StreamToBuffer(ssBlock, sRzt, 0);
return bsz;
}

int writeBufToFile(char* pBuf, int bufLen, string fName)
{
int rzt = 0;
std::ofstream oFs(fName.c_str(), std::ios::out | std::ofstream::binary);
if( oFs.is_open() )
{
if( pBuf ) oFs.write(pBuf, bufLen);
oFs.close();
rzt++;
}
return rzt;
}

int lz4_pack_buf(char* pBuf, int bufLen, string& sRzt)
{
    int worstCase = 0;
    int lenComp = 0;
    try {
        // Reserve the worst-case output size plus 4 bytes for the length prefix.
        worstCase = LZ4_compressBound( bufLen );
        sRzt.resize(worstCase + 4);
        char* pp = (char *)sRzt.c_str();
        lenComp = LZ4_compress(pBuf, pp + 4, bufLen);
        // On success, store the original length in the first 4 bytes and count them in the result.
        if( lenComp > 0 ){ *(unsigned int *)pp = bufLen;   lenComp = lenComp + 4; }
    }
    catch (std::exception &e) {
        printf("lz4_pack_buf err [%s]:: buf len %d, worstCase[%d], lenComp[%d] \n", e.what(), bufLen, worstCase, lenComp);
    }
    return lenComp;
}

int lz4_unpack_buf(const char* pZipBuf, unsigned int zipLen, string& sRzt)
{
int rzt = 0;
unsigned int realSz = *(unsigned int *)pZipBuf;
if( fDebug )printf("lz4_unpack_buf:: zipLen [%d], realSz [%d],  \n", zipLen, realSz);
sRzt.resize(realSz);
char* pOutData = (char*)sRzt.c_str();

    // -- decompress
rzt = LZ4_decompress_safe(pZipBuf + 4, pOutData, zipLen, realSz);
    if ( rzt != (int) realSz)
    {
            if( fDebug )printf("lz4_unpack_buf:: Could not decompress message data. [%d :: %d] \n", rzt, realSz);
            sRzt.resize(0);
    }
return rzt;
}

int CBlockFromBuffer(CBlock* block, char* pBuf, int bufLen)
{
CDataStream ssBlock(SER_DISK, CLIENT_VERSION);
ssBlock.write(pBuf, bufLen);   int i = ssBlock.size();
ssBlock >> (*block);
return i;
}

int lz4_pack_block(CBlock* block, string& sRzt)
{
int rzt = 0;
string sbf;
int bsz = CBlockToBuffer(block, sbf);
if( bsz > 12 )
{
char* pBuf = (char*)sbf.c_str();
rzt = lz4_pack_buf(pBuf, bsz, sRzt);
//if( lzRzt > 0 ){ rzt = lzRzt; }  // + 4; }
}
sbf.resize(0);
return rzt;
}

int lzma_depack_buf(unsigned char* pLzmaBuf, int bufLen, string& sRzt)
{
int rzt = 0;
unsigned int dstLen = *(unsigned int *)pLzmaBuf;
    sRzt.resize(dstLen);
unsigned char* pOutBuf = (unsigned char*)sRzt.c_str();
    unsigned srcLen = bufLen - LZMA_PROPS_SIZE - 4;
SRes res = LzmaUncompress(pOutBuf, &dstLen, &pLzmaBuf[LZMA_PROPS_SIZE + 4], &srcLen, &pLzmaBuf[4], LZMA_PROPS_SIZE);
if( res == SZ_OK )//assert(res == SZ_OK);
{
//outBuf.resize(dstLen); // If uncompressed data can be smaller
rzt = dstLen;
}else sRzt.resize(0);
if( fDebug ) printf("lzma_depack_buf:: res [%d], dstLen[%d],  rzt = [%d]\n", res, dstLen, rzt);
return rzt;
}

int lzma_pack_buf(unsigned char* pBuf, int bufLen, string& sRzt, int iLevel, unsigned int iDictSize)  // (1 << 17) = 131072 = 128K
{
int res = 0;
int rzt = 0;
unsigned propsSize = LZMA_PROPS_SIZE;
unsigned destLen = bufLen + (bufLen / 3) + 128;
    try{
sRzt.resize(propsSize + destLen + 4);
unsigned char* pOutBuf = (unsigned char*)sRzt.c_str();

res = LzmaCompress(&pOutBuf[LZMA_PROPS_SIZE + 4], &destLen, pBuf, bufLen, &pOutBuf[4], &propsSize,
                                      iLevel, iDictSize, -1, -1, -1, -1, -1);  // 1 << 14 = 16K, 1 << 16 = 64K
  
//assert(propsSize == LZMA_PROPS_SIZE);
//assert(res == SZ_OK);
if( (res == SZ_OK) && (propsSize == LZMA_PROPS_SIZE) )  
{
//outBuf.resize(propsSize + destLen);
*(unsigned int *)pOutBuf = bufLen;
rzt = propsSize + destLen + 4;
}else sRzt.resize(0);

}
    catch (std::exception &e) {
        printf("lzma_pack_buf err [%s]:: buf len %d, rzt[%d] \n", e.what(), bufLen, rzt);
    }
if( fDebug ) printf("lzma_pack_buf:: res [%d], propsSize[%d], destLen[%d],  rzt = [%d]\n", res, propsSize, destLen, rzt);
return rzt;
}

int lzma_pack_block(CBlock* block, string& sRzt, int iLevel, unsigned int iDictSize)  // (1 << 17) = 131072 = 128K
{
int rzt = 0;
string sbf;
int bsz = CBlockToBuffer(block, sbf);
if( bsz > 12 )
{
unsigned char* pBuf = (unsigned char*)sbf.c_str();
rzt = lzma_pack_buf(pBuf, bsz, sRzt, iLevel, iDictSize);
//if( lzRzt > 0 ){ rzt = lzRzt; }  // + 4; }
}
sbf.resize(0);
return rzt;
}

int bitnet_pack_block(CBlock* block, string& sRzt)
{
    if( dw_zip_block == 1 )  return lzma_pack_block(block, sRzt, 9, uint_256KB);
    else if( dw_zip_block == 2 ) return lz4_pack_block(block, sRzt);
    return 0;  // compression disabled or unknown mode: nothing was packed
}

bool getCBlockByFilePos(CAutoFile filein, unsigned int nBlockPos, CBlock* block)
{
bool rzt = false;
int ips = nBlockPos - 4;  // get ziped block size;
if (fseek(filein, ips, SEEK_SET) != 0)
return error("getCBlockByFilePos:: fseek failed");
filein >> ips; // get ziped block size;
if( fDebug )printf("getCBlockByFilePos:: ziped block size [%d] \n", ips);
string s;   s.resize(ips);   char* pZipBuf = (char *)s.c_str();
filein.read(pZipBuf, ips);
string sUnpak;
int iRealSz = 0;  // stays 0 (treated as failure below) if dw_zip_block is neither 1 nor 2
if( dw_zip_block == 1 ) iRealSz = lzma_depack_buf((unsigned char*)pZipBuf, ips, sUnpak);
else if( dw_zip_block == 2 ) iRealSz = lz4_unpack_buf(pZipBuf, ips - 4, sUnpak);
if( fDebug )printf("getCBlockByFilePos:: ziped block size [%d], iRealSz [%d] \n", ips, iRealSz);
if( iRealSz > 0 )
{
pZipBuf = (char *)sUnpak.c_str();
rzt = CBlockFromBuffer(block, pZipBuf, iRealSz) > 12;
/*if( fDebug ){
if( block->vtx.size() < 10 )
{
printf("\n\n getCBlockByFilePos:: block info (%d): \n", rzt);
block->print();
}else printf("\n\n getCBlockByFilePos:: block vtx count (%d) is too large \n", block->vtx.size());
}*/
}
s.resize(0);   sUnpak.resize(0);
return rzt;
}

bool getCBlocksTxByFilePos(CAutoFile filein, unsigned int nBlockPos, unsigned int txId, CTransaction& tx)
{
bool rzt = false;
CBlock block;
rzt = getCBlockByFilePos(filein, nBlockPos, &block);
if( rzt )
{
if( block.vtx.size() > txId )
{
tx = block.vtx[txId];
if( fDebug ){
printf("\n\n getCBlocksTxByFilePos:: tx info: \n");
tx.print(); }
}else rzt = false;
}
return rzt;
}
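The pack functions above all put the block's uncompressed size in the first 4 bytes, ahead of the compressed data, so that the unpack side knows how large a buffer to allocate. Below is a minimal standalone sketch of that framing. It assumes a little-endian prefix (which matches what the pointer cast produces on x86) and uses hypothetical helper names that are not in the posted source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Prepend the uncompressed length as 4 little-endian bytes to a compressed payload.
std::string frameWithOriginalSize(const std::string& compressed, std::uint32_t originalSize)
{
    std::string framed;
    framed.reserve(compressed.size() + 4);
    for (int shift = 0; shift < 32; shift += 8)
        framed.push_back(static_cast<char>((originalSize >> shift) & 0xFF));
    framed += compressed;
    return framed;
}

// Read the 4-byte prefix back; returns false if the buffer is too short.
bool readOriginalSize(const std::string& framed, std::uint32_t& originalSize)
{
    if (framed.size() < 4)
        return false;
    originalSize = 0;
    for (int i = 0; i < 4; ++i)
        originalSize |= static_cast<std::uint32_t>(static_cast<unsigned char>(framed[i])) << (8 * i);
    return true;
}

// The compressed bytes that follow the prefix.
std::string payloadOf(const std::string& framed)
{
    assert(framed.size() >= 4);
    return framed.substr(4);
}
```

Writing the bytes one at a time avoids the unaligned `*(unsigned int *)` stores of the original and keeps the format the same across platforms. That is a design choice of this sketch, not something the posted code does.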
hero member
Activity: 980
Merit: 1010
Blockchain engineer
To cut a long story short, I will show the source code directly.

Add code to init.cpp
 
Code:
int dw_zip_block = 0;
int dw_zip_limit_size = 0;
int dw_zip_txdb = 0;

bool AppInit2()
{
...
    // ********************************************************* Step 2: parameter interactions

#ifdef WIN32
    dw_zip_block = GetArg("-zipblock", 1);
#else
    /* The 7-Zip (LZMA) code still needs work on Linux: it runs, but sometimes crashes, so compression is off by default there. */
    dw_zip_block = GetArg("-zipblock", 0);
#endif
    dw_zip_limit_size = GetArg("-ziplimitsize", 64);
    dw_zip_txdb = GetArg("-ziptxdb", 0);
    if( dw_zip_block > 1 ){ dw_zip_block = 1; }
    else if( dw_zip_block == 0 ){ dw_zip_txdb = 0; }

...
}
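The parameter interaction above can be read in isolation as a small normalization step. The sketch below mirrors the two `if` branches as posted; `ZipOptions` and `normalizeZipOptions` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical holder for the two options that interact.
struct ZipOptions {
    int zipBlock;   // -zipblock: 0 = off, 1 = on
    int zipTxDb;    // -ziptxdb
};

// Mirror of the posted logic: values above 1 are clamped to 1, and
// turning block compression off also turns tx db compression off.
ZipOptions normalizeZipOptions(int zipBlockArg, int zipTxDbArg)
{
    ZipOptions o = { zipBlockArg, zipTxDbArg };
    if (o.zipBlock > 1)       o.zipBlock = 1;
    else if (o.zipBlock == 0) o.zipTxDb = 0;
    assert(o.zipBlock <= 1);
    return o;
}
```

One consequence, as far as the posted snippets show: because `-zipblock=2` is clamped to 1, the `dw_zip_block == 2` (LZ4) branches in `bitnet_pack_block` and `getCBlockByFilePos` cannot be selected from the command line.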
 

Add code to main.h
 
Code:
extern int bitnet_pack_block(CBlock* block, string& sRzt);
extern bool getCBlockByFilePos(CAutoFile filein, unsigned int nBlockPos, CBlock* block);
extern bool getCBlocksTxByFilePos(CAutoFile filein, unsigned int nBlockPos, unsigned int txId, CTransaction& tx);
extern int dw_zip_block;

class CTransaction
{
...
    bool ReadFromDisk(CDiskTxPos pos, FILE** pfileRet=NULL)
    {
        CAutoFile filein = CAutoFile(OpenBlockFile(pos.nFile, 0, pfileRet ? "rb+" : "rb"), SER_DISK, CLIENT_VERSION);
        if (!filein)
            return error("CTransaction::ReadFromDisk() : OpenBlockFile failed");

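        if( dw_zip_block > 0 )
        {
            // Compressed storage: unpack the whole block, then pick the transaction by its index in vtx.
            //if( fDebug ) printf("CTransaction::ReadFromDisk():: pos.nFile [%d], nBlockPos [%d], nTxPos [%d], pfileRet [%d] \n", pos.nFile, pos.nBlockPos, pos.nTxPos, pfileRet);
            if( !getCBlocksTxByFilePos(filein, pos.nBlockPos, pos.nTxPos, *this) )
                return error("CTransaction::ReadFromDisk() : getCBlocksTxByFilePos failed");
        }else{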
        // Read transaction
        if (fseek(filein, pos.nTxPos, SEEK_SET) != 0)
            return error("CTransaction::ReadFromDisk() : fseek failed");

        try {
            filein >> *this;
        }
        catch (std::exception &e) {
            return error("%s() : deserialize or I/O error", __PRETTY_FUNCTION__);
        }}

        // Return file pointer
        if (pfileRet)
        {
            if (fseek(filein, pos.nTxPos, SEEK_SET) != 0)
                return error("CTransaction::ReadFromDisk() : second fseek failed");
            *pfileRet = filein.release();
        }
        return true;
    }
...
}

class CBlock
{
...
    bool WriteToDisk(unsigned int& nFileRet, unsigned int& nBlockPosRet, bool bForceWrite = false)
    {
        // Open history file to append
        CAutoFile fileout = CAutoFile(AppendBlockFile(nFileRet), SER_DISK, CLIENT_VERSION);
        if (!fileout)
            return error("CBlock::WriteToDisk() : AppendBlockFile failed");

        // Write index header
        unsigned int nSize = fileout.GetSerializeSize(*this);

        int nSize2 = nSize;
        string sRzt;
        if( dw_zip_block > 0 )
        {
            // compressed block +++
            nSize = bitnet_pack_block(this, sRzt);  // nSize includes the 4-byte prefix holding the block's real size
            // compressed block +++
        }

        fileout << FLATDATA(pchMessageStart) << nSize;

        // Write block
        long fileOutPos = ftell(fileout);
        if (fileOutPos < 0)
            return error("CBlock::WriteToDisk() : ftell failed");
        nBlockPosRet = fileOutPos;

        if( dw_zip_block == 0 ){ fileout << *this; }
        else{
            //if( fDebug ) printf("main.h Block.WriteToDisk:: nFileRet [%d], nBlockSize [%d], zipBlockSize [%d], nBlockPosRet = [%d] \n", nFileRet, nSize2, nSize, nBlockPosRet);
            // compressed block +++
            if( nSize > 0 ){
                fileout.write(sRzt.c_str(), nSize);
            }
            sRzt.resize(0);
            // compressed block +++
        }

        // Flush stdio buffers and commit to disk before returning
        fflush(fileout);
        if( bForceWrite || (!IsInitialBlockDownload() || (nBestHeight+1) % 500 == 0) )
            FileCommit(fileout);

        return true;
    }

    bool ReadFromDisk(unsigned int nFile, unsigned int nBlockPos, bool fReadTransactions=true)
    {
        SetNull();
unsigned int iPos = nBlockPos;
if( dw_zip_block > 0 ){ iPos = 0; }

        // Open history file to read
        CAutoFile filein = CAutoFile(OpenBlockFile(nFile, iPos, "rb"), SER_DISK, CLIENT_VERSION);
        if (!filein)
            return error("CBlock::ReadFromDisk() : OpenBlockFile failed");
        if (!fReadTransactions)
            filein.nType |= SER_BLOCKHEADERONLY;

        // Read block
        try {
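            if( dw_zip_block > 0 )
            {
                // Report a failed unpack instead of silently returning an empty block.
                if( !getCBlockByFilePos(filein, nBlockPos, this) )
                    return error("CBlock::ReadFromDisk() : getCBlockByFilePos failed");
            }else{ filein >> *this; }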
        }
        catch (std::exception &e) {
            return error("%s() : deserialize or I/O error", __PRETTY_FUNCTION__);
        }

        // Check the header
        if (fReadTransactions && IsProofOfWork() && !CheckProofOfWork(GetPoWHash(), nBits))
            return error("CBlock::ReadFromDisk() : errors in block header");

        return true;
    }

...
}
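Taken together, WriteToDisk and getCBlockByFilePos imply this on-disk record layout: 4 bytes of pchMessageStart, a 4-byte size, then the payload, which is compressed when -zipblock is on. The position returned to the caller points at the payload, so the reader finds the size by seeking back 4 bytes. Below is a minimal in-memory sketch of that layout, using a `std::string` in place of the block file. `appendRecord` and `readRecord` are hypothetical helpers, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Append one record: 4 magic bytes, a 4-byte payload size, then the payload.
// Returns the offset of the payload, which plays the role of nBlockPos.
std::size_t appendRecord(std::string& file, const char magic[4], const std::string& payload)
{
    assert(payload.size() <= 0xFFFFFFFFu);
    file.append(magic, 4);
    std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    char sizeBytes[4];
    std::memcpy(sizeBytes, &size, 4);   // native byte order
    file.append(sizeBytes, 4);
    std::size_t payloadPos = file.size();
    file += payload;
    return payloadPos;
}

// Recover a payload from its offset by first reading the size stored 4 bytes earlier.
bool readRecord(const std::string& file, std::size_t payloadPos, std::string& payload)
{
    if (payloadPos < 4 || payloadPos > file.size())
        return false;
    std::uint32_t size = 0;
    std::memcpy(&size, file.data() + payloadPos - 4, 4);
    if (size > file.size() - payloadPos)
        return false;                   // truncated or corrupt record
    payload.assign(file, payloadPos, size);
    return true;
}
```

The bounds checks in `readRecord` are an addition of this sketch. The posted reader trusts the stored size.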
hero member
Activity: 980
Merit: 1010
Blockchain engineer
To moderator gmaxwell:
This source code is just for testing; it compiles and runs on Ubuntu and Windows.
It is compatible with Bitcoin and does not fork it.
Please don't move this thread, thanks.



Hello,
  I am the Vpncoin dev, nice to meet you.
We invented a blockchain compression algorithm.
It can reduce disk space by about 25% and also reduces network traffic.
We are happy to share it, and it is free.
The added source code is compatible and will not fork Bitcoin.

The core compression algorithms are LZMA (7zip) and LZ4.
Our compression code is already in use in Vpncoin and runs stably.
If someone wants to use this code,
please credit the author (Vpncoin development team, Bit Lee).


If you are interested in this, please post here,
and I will publish the relevant source code.
Thanks.

