This said, would you mind doing a quick overview of your changes and where that puts us in terms of a roadmap, or at least your goals? Could help others find a way to contribute
Could you identify any other prospective things the community could be doing to help? Many of us want to help but since it's hard to keep track of where we're at and where we're going it's hard to tell what needs doing.
For example, is anyone here a mod/admin of /r/datacoin? That sub could use some cleaning up. I'd gladly mod it and do it myself.
Following on from my last post, I've been collaborating with DataSea on an "in wallet file publisher and file retriever for the block chain". That's the significance of the block explorer reporting the size of the stored data. It can "see" the data, so could be programmed to decode it and, if the mimetype of the resulting binary data (an apparently random sequence of 1s and 0s) can be accurately characterised, display it. Then there's some hope that a file retriever might be feasible.
The problem (of a crippling lack of metadata) is something I failed to communicate clearly enough to Verionum. It's reasonably trivial to hack up a Python script to reveal the state of play simply by iterating over all the txs and logging what it finds.
I constructed a preliminary log (available from mega.nz), the contents of which follow this format:
block='18587', tx=1, size=16, magic=text/plain,
block='19850', tx=1, size=4, magic=text/plain,
block='19990', tx=1, size=4, magic=application/octet-stream,
block='19996', tx=1, size=24, magic=application/octet-stream,
block='19997', tx=2, size=24, magic=application/octet-stream,
block='19997', tx=3, size=24, magic=application/octet-stream,
block='19997', tx=4, size=24, magic=application/octet-stream,
block='19998', tx=1, size=24, magic=application/octet-stream,
block='25030', tx=1, size=76, magic=application/x-bzip2,
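A minimal sketch of how log lines in the format above might be produced. It is not the script actually used: it assumes the data payloads have already been pulled out of the txs (the `entries` iterable is hypothetical), and it replaces the Linux file utility with a crude, hand-rolled signature table, so its verdicts will not always agree with the real log.

```python
# Hypothetical, deliberately tiny signature table. The original analysis
# relied on the Linux `file` utility, which knows far more formats.
SIGNATURES = [
    (b"BZh", "application/x-bzip2"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff(data: bytes) -> str:
    """Guess a broad mimetype from the leading bytes of a payload."""
    for prefix, mimetype in SIGNATURES:
        if data.startswith(prefix):
            return mimetype
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    # Call it text only if every character is printable or common whitespace.
    if all(ch.isprintable() or ch in "\r\n\t" for ch in text):
        return "text/plain"
    return "application/octet-stream"


def log_line(height: int, tx_index: int, data: bytes) -> str:
    """Format one entry the way the preliminary log does."""
    return f"block='{height}', tx={tx_index}, size={len(data)}, magic={sniff(data)},"


def scan(entries):
    """Yield a log line for every (height, tx_index, data) with a payload."""
    for height, tx_index, data in entries:
        if data:
            yield log_line(height, tx_index, data)


if __name__ == "__main__":
    sample = [(18587, 1, b"Hello Data\n"), (25030, 1, b"BZh91AY&SY")]
    for line in scan(sample):
        print(line)
```

How `entries` gets populated (wallet RPC, raw block files, a block explorer) is left open here, since that depends on the node setup.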
Out of the total txs recorded on the Datacoin blockchain (an unknown number, but obviously at least the number of blocks, i.e. 3,018,698), a grand total of 8243 txs store data on the chain. Of those 8243 txs, the overwhelming proportion are text/plain entries from years ago, carrying (now bitrotted) torrent urls in JSON format:
[{"datetime": "1394508689", "version": "0.1", "uploads": [{"host_name": "zalil_ru", "error": true}, {"url": "http://rghost.net/52974353", "host_name": "rghost"},
{"url": "http://ge.tt/6SxXzpP1/v/0", "host_name": "ge_tt"}], "filehash": "4feae9543dd945195b0ad16ff6bdf47bf8dd6d0cbbbb8aa04ede00e1c6e6a69a", "filesize": 58055}, {"datetime": "1394543728", "version": "0.1", "uploads": [{"url": "http://multiupload.nl/TAKI54M1WF", "host_name": "multiupload"}, {"url": "http://gfile.ru/a4gAx", "host_name": "gfile_ru"}, {"url": "http://rghost.net/52983046", "host_name": "rghost"}], "filehash": "cef6b6655166dc4b38e6488fcd2f6a63793a1859555978b700cadbc1cb4802b5", "filesize": 409569}]
(Latterly, the chain has been storing only asset and notarisation data):
ASSET:{{{{{md5 => c9d89438361ce6ed1fce8f7c50a9a92a, owner => DTC:D8GafEsbyssg4TQN71KTbd8QdRg846MxCQ, inputtx => FirstIssue, previousowner => DTC:D8GafEsbyssg4TQN71KTbd8QdRg846MxCQ, idop => JIE1}, prevownersign => H5OkR1dpSl0jfvRqfmmFw45SsG015RVF0AN+k1HZ40DHVtCHxuMAE60HLmQ3rmW1FS3/tHdsCLsGeHffNhPWfrE=}, bytestampsign => IGJWfWswMuPkbRx5BcErqFNDgygcdjTaCz4UmNcxEYjTdHmZMRtZN2nbO7KkzmkRH5DXt7tU0eGth8x khtXtrkY=}, prevhash => 39b8c8baa6cfe1c370816175265f71ab}, hash => a75f1175388b2ccbe4325284af5a1587}
I dumped everything that reported as "text/plain" into a (3.5Mb) text dump, one entry per line (also available from mega.nz), the first few lines of which are (output by Python in bytes format):
b'Hello Data\n'
b'\x86(b'
b'Test message...woooooooooooooooooot!\n'
b'\x8aY^\x81\xa9'
b'Hello DTC community, my name is mstfck. Pool software and mining software developer!'
b'https://cryptopush.com : Bitcoin Markey Alert System\n\n'
b'https://krypte.NET\r\n'
b'https://krypte.NET\r\n'
b'https://krypte.NET\r\n'
b'https://krypte.NET\r\n'
b' _ __ ____ ___ _ ____ _____ _____ _ _____ _____ \r\n/ |/ // __\\\\ \\/// __\\/__ __\\/ __/ / \\ /|/ __//__ __\\\r\n| / | \\/| \\ / | \\/| / \\ | \\ | |\\ ||| \\ / \\ \r\n| \\ | / / / | __/ | | | /_ __| | \\||| /_ | | \r\n\\_|\\_\\\\_/\\_\\/_/ \\_/ \\_/ \\____\\\\/\\_/ \\|\\____\\ \\_/ \r\n \r\n\r\n'
b'"Peace, Montag. Give the people contests they win by remembering the words to more popular songs or the names of state capitals or how much corn Iowa grew last year. Cram them full of noncombustible data, chock them so full of \'facts\' they feel stuffed, but absolutely \'brilliant\' with information. Then they\'ll feel they\'re thinking, they\'ll get a sense of motion without moving. And they\'ll be happy, because facts of that sort don\'t change. Don\'t give them any slippery stuff like philosophy or sociology to tie things up with. That way lies melancholy. Any man who can take a TV wall apart and put it back together again, and most men can, nowadays, is happier than any man who tries to slide-rule, measure, and equate the universe, which just won\'t be measured or equated without making man feel bestial and lonely. I know, I\'ve tried it. So bring on your clubs and parties, your acrobats and magicians, your daredevils, jet cars, motorcycle helicopters, your sex and heroin, more of everything to do with automatic reflex. If the drama is bad, if the film says nothing; if the play is hollow, sting me with the theremin, loudly. I\'ll think I\'m responding to the play, when it\'s only a tactile reaction to vibration. But I don\'t care. I just like solid entertainment."\r\nR. Bradbury, Fahrenheit_451'
b'greg is awesome'
(Nice to see SF classics respected and way to go, greg.)
I then created a summary of the findings (block hash, tx index, mimetype) of those entries that had mimetypes characterisable by the Linux file utility (about 150 of 'em) which, in certain cases, could be retrieved and displayed (although do note that "ddi" is a disk image format). The summary is available as Python bindings (again, also available from mega.nz):
categorised = {
"646265ab6761a19a179997ef49ee9bb68b7b0804790bf1069177d01a57c1127b-1": "pcx",
"7a4bfe2095c4f6e033a40302f43799ff7dcb61ffbc4ac11f1d3a7ebc7ab95d3e-2": "vdx",
"7dd4818994156ec6e11e51d00fa3e47ff0eb2dbaffcf5d536bccdb783d5f07cc-1": "pcx",
"a1d413ad368b22274bcec6af50fb0e6a8de19e4efa2250d54f8955b50b7b3776-1": "rby",
"a2720d3d85726ecb94d9feb02eab108bb47072fc2b283df18109467d4f385571-1": "mzx",
"af27e7daf5c9178448526bea3ac83a1a27bd51da90933cfd202249b387936bab-1": "mov",
"b32f973a0a3863ce5eb38f80cfc77a67555ae3ae19020ca5871158e4eab2377f-1": "pcx",
"c91e8f9ea7c0ff210a3ee3e1f48079a8e2961ed99f180c2c3c4d9876edba3027-1": "vxd",
"f0da6aae8347041d4cf080c93439e1af4e27639cf6546856ae7789d3b564bd63-2": "pgc",
"fa6483b4d8a4a9d8f6573ecdf5a134b36686d7d2047042c4d115fb61f1f32bad-3": "dbf",
"0510db4e73b38a16fde66ad8d5328e806d769930153591eb432ec79194c3f526-1": "ddi",
...
The "mov" is the only instance of Datacoin-blockchain-stored erotica (doesn't qualify legally as "pornography", AIUI) to be revealed by this analysis (5-10s of a couple doing it doggy-style, copied from a pron site - sadly more of a token gesture than any kind of an attempt at a public-spirited free pron service).
and a very much longer list of "uncategorised" entries, of which all I know is i) the size and ii) the broad mimetype. Characterising these in further detail would require individual examination (which they are rather unlikely to receive):
datafiles = [
dict(block='4c6dac1ec6c5131994a47d5a020adf2ef968222be88cc4a761ff283658c4af92',
txid='39d542f56622d57a09f4e6bc05ce1abbb63e24430e71c26c03d3781a33afb302',
n=1, size=16, mimetype="text/plain"),
dict(block='92e53a0dd19486361404a7275dd9645ecbecddb0fd7f270eaa2cab79b427ca2b',
txid='fc52b5f7a71bcbbe695e6edff94d6fd6e764b49023fe756f3e77366503befca8',
n=1, size=100, mimetype="application/x-bzip2"),
...
In the event, I resorted to dumping the text/plain entries into a single dump (see above), uncompressing the obviously-compressed (a mere handful) and assigning them accordingly (some images, some plaintext). This leaves around 2000 entries (i.e. the overwhelming majority) which are basically impenetrable from a practical perspective. Only those who created the tx know what it is they have stored. Maybe they will publish details of their format, maybe they won't. In some cases, a large binary has been split over several txs and has to be recombined to form the original single large file. Unfortunately, the inscrutability of "application/octet-stream" gives no clue as to whether the binary is standalone or part of a sequence to be recombined and, importantly, which part of which sequence. None of this metadata is made available, so file retrieval cannot be offered for the vast majority of the chunks of binary data stored in Datacoin blockchain txs.
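The uncompressing step for the handful of application/x-bzip2 entries might look something like the sketch below. The helper name `try_bunzip2` is made up for illustration, and the check on the leading "BZh" bytes is only a cheap pre-filter, not proof that the payload is a complete bzip2 stream.

```python
import bz2
from typing import Optional


def try_bunzip2(data: bytes) -> Optional[bytes]:
    """Return the decompressed payload, or None if it is not usable bzip2."""
    if not data.startswith(b"BZh"):
        return None
    try:
        return bz2.decompress(data)
    except (OSError, ValueError):
        # OSError: invalid data stream; ValueError: stream ended early,
        # e.g. one chunk of a file that was split across several txs.
        return None
```

A None result for something that starts with "BZh" is itself a hint that the entry may be one part of a larger, split upload, although nothing on the chain says which part.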
Nevertheless, I intend to add the summaries to the clone of ACME that I have for Datacoin.
ACME?
A Cryptocurrency Metadata Explorer, an open source project in Python, a web app (built using Pyramid) fronting an instance of Fuseki hosting a mapping of the blockchain/txs/addresses as an RDF graph (it's a mapping because the blockchain is characterisable as a directed acyclic graph and an RDF graph is likewise a directed graph). The RDF graph can be queried (by SPARQL query) and the results displayed appropriately:
Yes, this is stuff I've been working on for a while now. You may get some idea of the power and flexibility of this technology by browsing the DOACC documentation (details of altcoins represented in RDF, put to rest in early 2016 when altcoin launches stopped including dereferencable URLs in ANNs) which is about the RDF representation of metadata pertaining to individual altcoins (and the datasource for the altcoincheatsheet I referenced in my previous post). I'm only suggesting it because you're likely to be broadly familiar with the content (altcoin stuff) and that will provide a framework to appreciate the representational power - because it's a standalone site, all the RDF and SPARQL work you can see going on in the presentation and examples is happening in the browser.
Note the ACME "Publication" tab. That's where the summary data will appear, characterising the stored data and providing appropriate access. If metadata is available, the information can be presented:
And ACME is where TrustyURIs can resolve to. It's the next step I want to take. I have a candidate standalone, mostly self-contained (view source to see) blog app-cum-post, nearly ready for storing on the Datacoin chain (I have yet to decide whether to bundle in a base64-encoding of the javascript and css resources, or just leave them (comparatively dangerously) remote and specify a validation hash).
I will render the HTML (and thus the content) as an RDF graph and then use the Python TrustyURI code library to calculate a hash of the RDF graph of the content, which can then be embedded in the resolvable TrustyURI inscribed on the blockchain. If the hash don't match, it ain't what I published. If the hash does match the resolved-to, ACME-hosted RDF graph, then it's worth having a go at retrieving the content and rendering it.
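The "if the hash matches" check can be illustrated with a deliberately simplified stand-in. This is not the TrustyURI library's API nor its exact algorithm (which, as I understand it, hashes a normalised serialisation of the RDF graph and adds a module prefix); the sketch just hashes raw bytes with SHA-256, and the function names and example URI are made up.

```python
import base64
import hashlib


def content_code(data: bytes) -> str:
    """Base64url-encode the SHA-256 of the content, without '=' padding."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_uri(base: str, data: bytes) -> str:
    """Append the content code to a (hypothetical) base URI."""
    return f"{base}.{content_code(data)}"


def matches(uri: str, data: bytes) -> bool:
    """True if the code at the end of the URI is the hash of this content."""
    return uri.rsplit(".", 1)[-1] == content_code(data)


if __name__ == "__main__":
    published = b"<html>what I actually published</html>"
    uri = make_uri("http://example.org/acme/post", published)
    print(matches(uri, published))           # content is untouched
    print(matches(uri, published + b"!"))    # content has been altered
```

The point of the design is that the URI inscribed on the chain carries its own integrity check, so whatever ACME serves at that address can be verified before anything is rendered.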
The mechanism uses OP_RETURN data and is independent of any other uses; it's a sort of data storage that is tangential to Datacoin's use of a special field in the tx. Importantly, it's a form of data storage for which metadata (in content such as "A different perspective on cryptocurrency") can be made available on a principled basis (i.e. if the hash matches, show the metadata).
The other day, I mapped the Datacoin blockchain into RDF, grand total 36Gb (uncompressed, 7.6Gb compressed) serialized as ntriples. Next task is to slice that list of ntriples into bite-sized chunks to feed to Fuseki, then the Datacoin ACME's lights and switches should start working again.
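Slicing is straightforward because N-Triples is a line-based format (one triple per line), so a file can be cut on any line boundary. A minimal sketch, with a made-up function name, output naming scheme and default chunk size:

```python
from itertools import islice
from pathlib import Path


def split_ntriples(src, out_dir, lines_per_chunk=1_000_000):
    """Split a line-based N-Triples file into numbered chunk files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8", newline="") as fh:
        index = 0
        while True:
            chunk = list(islice(fh, lines_per_chunk))
            if not chunk:
                break
            path = out_dir / f"chunk-{index:05d}.nt"
            with open(path, "w", encoding="utf-8", newline="") as out:
                out.writelines(chunk)
            paths.append(path)
            index += 1
    return paths
```

One caveat: blank node labels are scoped to a single N-Triples document, so if the mapping uses any, a blank node whose triples land in different chunks would be loaded as separate nodes.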
Can I suggest that the content of the blog-app-cum-post "A different perspective on cryptocurrency" provides some answers to your other questions.
Cheers
Graham