
Topic: How to pull big data from the blockchain? (Read 1819 times)

legendary
Activity: 2128
Merit: 1074
September 23, 2016, 06:31:08 PM
#23
You lost me? Is that a line from Minority Report?
Yes, not a line, but a reference.

That also still does not answer my question. Maybe something was lost in translation. Say a person wanted to find out how many bitcoins were "moved" in a 24 hour period on the blockchain, or a 7 day period or 30 days, so they could compare that day, week, month to previous days weeks and months, are you saying that the data in the blockchain which would contain the amount of bitcoins moved or involved in transactions can not be pulled from the blockchain itself anymore?
When you move a $20 bill from your left pocket to your right pocket do you consider that a transaction?

When you pay for a $3 pumpkin spice latte with a $20 bill what is the transacted volume?
member
Activity: 72
Merit: 10
September 23, 2016, 06:25:56 PM
#22
That also still does not answer my question. Maybe something was lost in translation. Say a person wanted to find out how many bitcoins were "moved" in a 24 hour period on the blockchain, or a 7 day period or 30 days, so they could compare that day, week, month to previous days weeks and months, are you saying that the data in the blockchain which would contain the amount of bitcoins moved or involved in transactions can not be pulled from the blockchain itself anymore?
You would be able to see how much Bitcoin was moved. However, it is impossible to tell the difference between Bitcoin sent as payments and Bitcoin just being moved around i.e. change.

Thanks, I am more interested in movements than what was being spent or considered change. I appreciate you clearing that up for me.
staff
Activity: 3458
Merit: 6793
Just writing some code
September 23, 2016, 06:23:57 PM
#21
That also still does not answer my question. Maybe something was lost in translation. Say a person wanted to find out how many bitcoins were "moved" in a 24 hour period on the blockchain, or a 7 day period or 30 days, so they could compare that day, week, month to previous days weeks and months, are you saying that the data in the blockchain which would contain the amount of bitcoins moved or involved in transactions can not be pulled from the blockchain itself anymore?
You would be able to see how much Bitcoin was moved. However, it is impossible to tell the difference between Bitcoin sent as payments and Bitcoin just being moved around i.e. change.
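A minimal sketch of the per-day "moved" total described above, assuming the transactions have already been parsed into a hypothetical `Tx` record (block timestamp plus output values in satoshis). Neither `Tx` nor `movedPerDay` comes from any tool named in this thread. It sums every output, so change outputs count as movement too, which is exactly the caveat above.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record for one already-parsed transaction (illustration only).
struct Tx {
    std::int64_t block_time;            // block timestamp, seconds since the Unix epoch (UTC)
    std::vector<std::int64_t> outputs;  // output values in satoshis
};

// Sum all output values per UTC day; the map key is block_time / 86400.
// Change outputs cannot be told apart from payments, so they are included.
std::map<std::int64_t, std::int64_t> movedPerDay(const std::vector<Tx>& txs) {
    std::map<std::int64_t, std::int64_t> totals;
    for (const Tx& tx : txs) {
        const std::int64_t day = tx.block_time / 86400;
        for (const std::int64_t value : tx.outputs) {
            totals[day] += value;
        }
    }
    return totals;
}
```

Weekly or monthly figures can then be built by adding up consecutive day keys.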
member
Activity: 72
Merit: 10
September 23, 2016, 06:22:09 PM
#20
Are you saying you cannot accurately tell how many bitcoins are sent in a given period any more? If so, then bitcoin is far more interesting as a fiat alternative than i could have ever imagined. 
Yep, my nose can sniff pre-crime before the criminal can even conceive the details of the future crime.  Cheesy

And I don't live in a pool of physiological salt solution.  Grin


You lost me? Is that a line from Minority Report?


That also still does not answer my question. Maybe something was lost in translation. Say a person wanted to find out how many bitcoins were "moved" in a 24 hour period on the blockchain, or a 7 day period or 30 days, so they could compare that day, week, month to previous days weeks and months, are you saying that the data in the blockchain which would contain the amount of bitcoins moved or involved in transactions can not be pulled from the blockchain itself anymore?
legendary
Activity: 2128
Merit: 1074
September 23, 2016, 05:25:40 PM
#19
Are you saying you cannot accurately tell how many bitcoins are sent in a given period any more? If so, then bitcoin is far more interesting as a fiat alternative than i could have ever imagined. 
Yep, my nose can sniff pre-crime before the criminal can even conceive the details of the future crime.  Cheesy

And I don't live in a pool of physiological salt solution.  Grin
member
Activity: 72
Merit: 10
September 23, 2016, 05:20:38 PM
#18
I'm not even going to pretend i understood all of that as although i have heard of MtGox in passing, i know nothing of Dwolla, as i am fairly new (less than a year) to bitcoin. Are you saying that somehow they used Excel spreadsheets to scam bitcoins from them? It was my understanding that in order to steal bitcoins (without getting someone to send them to you then not delivering a good) you needed to have the private key, and thats what made BTC secure.

Not that i am a super genius or anything, but in my short experience, i dont see how harvesting data from the blockchain allows you to steal bitcoins. Isn't that data available to anyone with the right piece of software? Maybe you could fill me in a little more, not that i am trying to pry, but by what you have stated about scammers asking about how to pull data from the chain, i don't see how having that data would allow them to scam. Any insight would be appreciated, In my short time with bitcoin, I realized how little I know, and how much I have yet to learn, which was part of my curiosity about all the data in the blockchain.
The case MtGox vs. Dwolla wasn't about stealing bitcoins, it was about stealing dollars. They had some sort of net-transfer agreement to avoid needlessly transferring dollars back and forth. The reports/invoices used in those bank transfers were created using bogus harvesting algorithms. The bogusness wasn't complete: the sums in the spreadsheets were close enough to the real ones that the respective CEOs/CFOs signed them off as "true and accurate under the penalty of perjury". I don't remember further details; they were briefly described in some exhibits to that lawsuit.

Ask yourself a question: you were planning on computing some sort of "transaction volume" aggregate. The bugs that made it possible to distinguish a "payment" output from a "change" output were fixed years ago, when Hal Finney was still alive. So what were you planning on doing to truthfully compute that in 2016?


Are you saying you cannot accurately tell how many bitcoins are sent in a given period any more? If so, then bitcoin is far more interesting as a fiat alternative than i could have ever imagined. 
legendary
Activity: 2128
Merit: 1074
September 23, 2016, 05:04:58 PM
#17
I'm not even going to pretend i understood all of that as although i have heard of MtGox in passing, i know nothing of Dwolla, as i am fairly new (less than a year) to bitcoin. Are you saying that somehow they used Excel spreadsheets to scam bitcoins from them? It was my understanding that in order to steal bitcoins (without getting someone to send them to you then not delivering a good) you needed to have the private key, and thats what made BTC secure.

Not that i am a super genius or anything, but in my short experience, i dont see how harvesting data from the blockchain allows you to steal bitcoins. Isn't that data available to anyone with the right piece of software? Maybe you could fill me in a little more, not that i am trying to pry, but by what you have stated about scammers asking about how to pull data from the chain, i don't see how having that data would allow them to scam. Any insight would be appreciated, In my short time with bitcoin, I realized how little I know, and how much I have yet to learn, which was part of my curiosity about all the data in the blockchain.
The case MtGox vs. Dwolla wasn't about stealing bitcoins, it was about stealing dollars. They had some sort of net-transfer agreement to avoid needlessly transferring dollars back and forth. The reports/invoices used in those bank transfers were created using bogus harvesting algorithms. The bogusness wasn't complete: the sums in the spreadsheets were close enough to the real ones that the respective CEOs/CFOs signed them off as "true and accurate under the penalty of perjury". I don't remember further details; they were briefly described in some exhibits to that lawsuit.

Ask yourself a question: you were planning on computing some sort of "transaction volume" aggregate. The bugs that made it possible to distinguish a "payment" output from a "change" output were fixed years ago, when Hal Finney was still alive. So what were you planning on doing to truthfully compute that in 2016?
member
Activity: 72
Merit: 10
September 23, 2016, 04:47:00 PM
#16
Thanks for taking the time to be considerate and not write me off like a nut job, as most people on this forum would have. You are of a rare kind, good sir!
This is the reality of the Bitcoin milieu. Many people come to this board with requests to support strange combinations of tools. Frequently this is because Bitcoin's (near-)immutability has created a problem with somebody's preexisting fraud or scam.

In my memory the most visible case of importing Bitcoin data to Microsoft Excel was highlighted by the MtGox vs. Dwolla lawsuit. I'm not going to try to pass judgement on who was trying to scam whom in that case. But spreadsheets' easy mutability makes them a perfect tool for perpetrating scams.

This is not to say that you are a scammer. But so many scammers have asked similar questions to yours that now the onus is on you to differentiate yourself from the previous people.


I'm not even going to pretend i understood all of that as although i have heard of MtGox in passing, i know nothing of Dwolla, as i am fairly new (less than a year) to bitcoin. Are you saying that somehow they used Excel spreadsheets to scam bitcoins from them? It was my understanding that in order to steal bitcoins (without getting someone to send them to you then not delivering a good) you needed to have the private key, and thats what made BTC secure.

Not that i am a super genius or anything, but in my short experience, i dont see how harvesting data from the blockchain allows you to steal bitcoins. Isn't that data available to anyone with the right piece of software? Maybe you could fill me in a little more, not that i am trying to pry, but by what you have stated about scammers asking about how to pull data from the chain, i don't see how having that data would allow them to scam. Any insight would be appreciated, In my short time with bitcoin, I realized how little I know, and how much I have yet to learn, which was part of my curiosity about all the data in the blockchain.
legendary
Activity: 2128
Merit: 1074
September 23, 2016, 04:34:25 PM
#15
Thanks for taking the time to be considerate and not write me off like a nut job, as most people on this forum would have. You are of a rare kind, good sir!
This is the reality of the Bitcoin milieu. Many people come to this board with requests to support strange combinations of tools. Frequently this is because Bitcoin's (near-)immutability has created a problem with somebody's preexisting fraud or scam.

In my memory the most visible case of importing Bitcoin data to Microsoft Excel was highlighted by the MtGox vs. Dwolla lawsuit. I'm not going to try to pass judgement on who was trying to scam whom in that case. But spreadsheets' easy mutability makes them a perfect tool for perpetrating scams.

This is not to say that you are a scammer. But so many scammers have asked similar questions to yours that now the onus is on you to differentiate yourself from the previous people.
member
Activity: 72
Merit: 10
September 23, 2016, 03:38:22 PM
#14
Thanks for taking the time to be considerate and not write me off like a nut job, as most people on this forum would have. You are of a rare kind, good sir!
legendary
Activity: 2128
Merit: 1074
September 23, 2016, 02:07:25 PM
#13
Thank you very much for the clarification.

Sorry, I didn't think any of that info was relevant at the time as from what I saw on this forum, too much info often results in people passing over your post, and because I am not looking to hire anyone to do it for me, more so just to find out if there are more user friendly options out there, preferably windows based.


To answer your question, I first need to give a little more explanation. Sure, the blockchain certainly qualifies as "big data" in its entirety, but the data I hope to extract from it is very far from "big" in comparison: transaction volume, lists of addresses used, positive-balance addresses, etc. would still only number in the few millions of entries. Microsoft Excel has a maximum of 1,048,576 rows per worksheet, so a workbook with a few sheets could handle most of the data, although for analytical purposes I most often use Microsoft Access. Formatting the data once extracted is not such a big deal, as typically I can find someone on Fiverr to write a simple Java app that would parse and convert the data from the format I receive it in to the format I need to import it into Excel, Access, or on occasion Tanagra.


Again, that being said, if your reply was in hopes of finding a paying client, I am not the one. The most I spend on my projects of curiosity is $5 - $25 for someone on Fiverr to do something for me. This thread was more to find out what, if anything, is available in the bitcoin space that is not solely Linux or advanced computer programmer level stuff.

No, I am not looking for any clients. I was just trying to get a better "sniff test" on who's asking.

In my experience, the reason that you aren't getting any suitable answers is that you look like somebody who isn't interested in learning anything new. Microsoft Office could actually be quite helpful, but you would have to learn how to set up a proper Microsoft SQL Server backend for Microsoft Access and utilize the OLAP functionality there. With the clarification above you made it clear that you aren't even a wannabe in big data.

I originally thought that you were something akin to a new instance of the German crackpot LvM (https://bitcointalksearch.org/user/lvm-103358) with his "BTC violates GAAP, result a mess." (https://bitcointalksearch.org/topic/btc-violates-gaap-result-a-mess-211835). But then you wrote like somebody who is forced to use Windows because of organizational constraints, not because you just "don't wanna".

Thanks again.

Edit: Fixed grammar: double negation removed.

 
member
Activity: 72
Merit: 10
September 23, 2016, 12:42:55 PM
#12
Thanks a bunch. I tried Google, Bing, and Stack Exchange extensively over the past 24 hours looking for a Windows version, with no luck. It would be awesome if such a thing were out there for us laypersons.
You know, to get better help you should describe the tools you are planning on using to analyze the data once you get it. What kind of software do you already have and are familiar with? What kind of hardware do you have available? Have you ever actually processed some really big data sets on your own, without using an external database?

In my previous company we had a certain class of prospective customers that were just not worth dealing with, not even worth the salesperson's time spent talking. A somewhat contrived example, to maintain privacy: a department has a site license for a non-current version of SPSS for 32-bit Windows, and always input the data either in the old dBase III+ format (2 GB limitation) or used the Oracle ODBC driver for 32-bit Windows (with the actual data on the mainframe/cluster). This is the type that isn't worth helping, even though they have the budget to theoretically buy a working solution, because they don't understand their own tools and they've been doing bullshit GIGO for years. Any money gained isn't worth the time it would take to support them.



Sorry, I didn't think any of that info was relevant at the time as from what I saw on this forum, too much info often results in people passing over your post, and because I am not looking to hire anyone to do it for me, more so just to find out if there are more user friendly options out there, preferably windows based.


To answer your question, I first need to give a little more explanation. Sure, the blockchain certainly qualifies as "big data" in its entirety, but the data I hope to extract from it is very far from "big" in comparison: transaction volume, lists of addresses used, positive-balance addresses, etc. would still only number in the few millions of entries. Microsoft Excel has a maximum of 1,048,576 rows per worksheet, so a workbook with a few sheets could handle most of the data, although for analytical purposes I most often use Microsoft Access. Formatting the data once extracted is not such a big deal, as typically I can find someone on Fiverr to write a simple Java app that would parse and convert the data from the format I receive it in to the format I need to import it into Excel, Access, or on occasion Tanagra.


Again, that being said, if your reply was in hopes of finding a paying client, I am not the one. The most I spend on my projects of curiosity is $5 - $25 for someone on Fiverr to do something for me. This thread was more to find out what, if anything, is available in the bitcoin space that is not solely Linux or advanced computer programmer level stuff.

legendary
Activity: 2128
Merit: 1074
September 23, 2016, 12:04:00 PM
#11
Thanks a bunch. I tried Google, Bing, and Stack Exchange extensively over the past 24 hours looking for a Windows version, with no luck. It would be awesome if such a thing were out there for us laypersons.
You know, to get better help you should describe the tools you are planning on using to analyze the data once you get it. What kind of software do you already have and are familiar with? What kind of hardware do you have available? Have you ever actually processed some really big data sets on your own, without using an external database?

In my previous company we had a certain class of prospective customers that were just not worth dealing with, not even worth the salesperson's time spent talking. A somewhat contrived example, to maintain privacy: a department has a site license for a non-current version of SPSS for 32-bit Windows, and always input the data either in the old dBase III+ format (2 GB limitation) or used the Oracle ODBC driver for 32-bit Windows (with the actual data on the mainframe/cluster). This is the type that isn't worth helping, even though they have the budget to theoretically buy a working solution, because they don't understand their own tools and they've been doing bullshit GIGO for years. Any money gained isn't worth the time it would take to support them.
member
Activity: 72
Merit: 10
September 23, 2016, 08:50:52 AM
#10
Spent a few days trying to figure it out with blockparser and ABE, but the Linux stuff seems to be over my head by a long shot. Are there any Windows-based solutions out there?

Apparently, there should be a Windows binary of blockparser somewhere on the web:
https://github.com/znort987/blockparser/blob/master/README.md

Quote
If you are unfortunate enough to still have to use windows, there is a port floating somehwere on github.

If I have time I'll try to use the GitHub search function later on...


Thanks a bunch. I tried Google, Bing, and Stack Exchange extensively over the past 24 hours looking for a Windows version, with no luck. It would be awesome if such a thing were out there for us laypersons.
hero member
Activity: 896
Merit: 1006
September 23, 2016, 08:12:15 AM
#9
Spent a few days trying to figure it out with blockparser and ABE, but the Linux stuff seems to be over my head by a long shot. Are there any Windows-based solutions out there?

Apparently, there should be a Windows binary of blockparser somewhere on the web:
https://github.com/znort987/blockparser/blob/master/README.md

Quote
If you are unfortunate enough to still have to use windows, there is a port floating somehwere on github.

If I have time I'll try to use the GitHub search function later on...
member
Activity: 72
Merit: 10
September 23, 2016, 07:44:55 AM
#8
Spent a few days trying to figure it out with blockparser and ABE, but the Linux stuff seems to be over my head by a long shot. Are there any Windows-based solutions out there?
hero member
Activity: 896
Merit: 1006
September 20, 2016, 03:20:09 AM
#7
Basically, either download and compile blockparser, or download and install ABE...
The second option will allow you to parse the blockchain and put everything in a nice relational database...
The parsing itself takes weeks though.
Weeks? Are you joking?

Take this code. It parses the whole Bitcoin blockchain (630 blk files, 128 MB each) in less than an hour.
(Of course, the parser does no ECDSA verification or SHA256d checking.)

Code:
#include <QFile>
#include <QTimer>

#include "BlockChain.h"

BlockChain::BlockChain ( const int start, QObject* parent ) : QFile ( parent ), blkFile ( 0 )
{
  QTimer::singleShot ( 0, this, SLOT ( start ( ) ) );
}
//--------------------------------------------------------------
void BlockChain::start ( )
{
  setFileName ( blkFileName ( blkFile++ ) );
  if ( !open ( QIODevice::ReadOnly ) )
  {
    _trace ( QString ( "cant open [%1]" ).arg ( fileName ( ) ) );
    getParent ( ).block ( QByteArray ( ) ); // quit signal - empty block
    deleteLater ( );
  }
  else
  {
    _trace ( QString ( "processing [%1]" ).arg ( fileName ( ) ) );
    QTimer::singleShot ( 0, this, SLOT ( next ( ) ) );
  }
}
//--------------------------------------------------------------
void BlockChain::next ( )
{
  bool lock ( true );
  if ( pos ( ) < size ( ) )
  {
    quint32 magic;
    quint32 size ( read ( (char*)&magic, 4 ) );
    xassert ( ( ( magic == MAGIC_ID ) || !magic ) && ( size == 4 ) );
    if ( magic )
    {
      read ( (char*)&size, 4 );
      xassert ( size > HEADER_SIZE && size <= MAX_BLOCK_SIZE );
      getParent ( ).block ( read ( size ) );             // callback to block parser
      QTimer::singleShot ( 0, this, SLOT ( next ( ) ) );
      return;
    }
    else
      lock = false;
  }
  close ( );
  getParent ( ).doneFile ( lock, blkFile - 1 );           // callback about eof
  QTimer::singleShot ( 0, this, SLOT ( start ( ) ) );     // goto next file
}
//--------------------------------------------------------------
const QString BlockChain::blkFileName ( const int i ) // [inline static]
{
  return
    ( i < 10 ) ? QString ( DATA_ROOT "\\blk0000%1.dat" ).arg ( i ) :
    ( i < 100 ) ? QString ( DATA_ROOT "\\blk000%1.dat" ).arg ( i ) :
    QString ( DATA_ROOT "\\blk00%1.dat" ).arg ( i );
}

This script would be even better than my solution (untested)... I'm not joking though, building the ABE database takes weeks... It's a slow, single-threaded process that executes one insert at a time...
In the end, you do have a nice relational database which you can query any way you like... It's huge though.

The other solution I proposed (compiling blockparser) is also a lot faster; it takes a couple of hours to parse everything, but you end up with a big dump file filled with results. I don't think you can use it to generate a nice database.
legendary
Activity: 1260
Merit: 1019
September 19, 2016, 12:13:29 PM
#6
Basically, either download and compile blockparser, or download and install ABE...
The second option will allow you to parse the blockchain and put everything in a nice relational database...
The parsing itself takes weeks though.
Weeks? Are you joking?

Take this code. It parses the whole Bitcoin blockchain (630 blk files, 128 MB each) in less than an hour.
(Of course, the parser does no ECDSA verification or SHA256d checking.)

Code:
#include <QFile>
#include <QTimer>

#include "BlockChain.h"

BlockChain::BlockChain ( const int start, QObject* parent ) : QFile ( parent ), blkFile ( 0 )
{
  QTimer::singleShot ( 0, this, SLOT ( start ( ) ) );
}
//--------------------------------------------------------------
void BlockChain::start ( )
{
  setFileName ( blkFileName ( blkFile++ ) );
  if ( !open ( QIODevice::ReadOnly ) )
  {
    _trace ( QString ( "cant open [%1]" ).arg ( fileName ( ) ) );
    getParent ( ).block ( QByteArray ( ) ); // quit signal - empty block
    deleteLater ( );
  }
  else
  {
    _trace ( QString ( "processing [%1]" ).arg ( fileName ( ) ) );
    QTimer::singleShot ( 0, this, SLOT ( next ( ) ) );
  }
}
//--------------------------------------------------------------
void BlockChain::next ( )
{
  bool lock ( true );
  if ( pos ( ) < size ( ) )
  {
    quint32 magic;
    quint32 size ( read ( (char*)&magic, 4 ) );
    xassert ( ( ( magic == MAGIC_ID ) || !magic ) && ( size == 4 ) );
    if ( magic )
    {
      read ( (char*)&size, 4 );
      xassert ( size > HEADER_SIZE && size <= MAX_BLOCK_SIZE );
      getParent ( ).block ( read ( size ) );             // callback to block parser
      QTimer::singleShot ( 0, this, SLOT ( next ( ) ) );
      return;
    }
    else
      lock = false;
  }
  close ( );
  getParent ( ).doneFile ( lock, blkFile - 1 );           // callback about eof
  QTimer::singleShot ( 0, this, SLOT ( start ( ) ) );     // goto next file
}
//--------------------------------------------------------------
const QString BlockChain::blkFileName ( const int i ) // [inline static]
{
  return
    ( i < 10 ) ? QString ( DATA_ROOT "\\blk0000%1.dat" ).arg ( i ) :
    ( i < 100 ) ? QString ( DATA_ROOT "\\blk000%1.dat" ).arg ( i ) :
    QString ( DATA_ROOT "\\blk00%1.dat" ).arg ( i );
}
member
Activity: 72
Merit: 10
September 19, 2016, 11:59:01 AM
#5
This topic might help you:
https://bitcointalk.org/index.php?topic=267618.0;topicseen

Basically, either download and compile blockparser, or download and install ABE... The second option will allow you to parse the blockchain and put everything in a nice relational database... The parsing itself takes weeks though.


Thanks a bunch !!
hero member
Activity: 896
Merit: 1006
September 19, 2016, 07:50:12 AM
#4
This topic might help you:
https://bitcointalk.org/index.php?topic=267618.0;topicseen

Basically, either download and compile blockparser, or download and install ABE... The second option will allow you to parse the blockchain and put everything in a nice relational database... The parsing itself takes weeks though.