Do you have any resources where I can get my head sucked into regarding RDF?
Here's a PDF of my write-up of the preliminary R&D scratchpad work which goes into a bit more detail of the RDF modelling as applied to an altcoin's blockchain (V, in this instance):
https://minkiz.co/library/signed-graphs.pdf in which I use the ActivityStreams examples to illustrate the principles of publishing via the inscription of RDF graph signatures on the blockchain.
The W3C has a set of ActivityStreams Examples in JSON-LD (RDF expressed as JSON). The examples are not dissimilar to the JSON output from an altcoin RPC API and should give you an idea of what's involved in publishing JSON-LD (*cough* RDF). The overall concept is the W3C's ActivityPub effort, which is basically speccing out a set of standardised APIs for social networking. Nice of them to have prepared the ground so thoroughly; we can “just” pick up the API spec and implement it for ourselves.
As regards compute resources, the server is a 25-euro-a-month Hetzner “robot” with 16 GB RAM and 8 threads (getting fairly elderly now). The file of N-Triples ran to 7.8 GB, iirc; I haven't checked how much disk space is being used by TDB (the Jena persistent store back end). ACME needs the entire graph, but you can trim it down to the last X blocks and tx if you know that's where your queries are vectored. I am running the Slimcoin testnet graphchain on my 8-year-old laptop.
On the server I set Fuseki's JVM envars to:
# JVM_ARGS=${JVM_ARGS:--Xmx1200M}
# ${VAR:-default} only applies while JVM_ARGS is unset or empty, so all the flags go into one default.
JVM_ARGS=${JVM_ARGS:-"-Xmx2048M -Xms2048M -XX:NewSize=512M -XX:MaxDirectMemorySize=256M"}
There'll be sage advice on tuning from the Fuseki mailing list I imagine.
RDF ain’t so bad once you can see it actually applied to something you can relate to. It’s basically a highly sophisticated and well-thought-out database representation. Each statement is a standalone piece of data: the row index, the column index and the cell content, the latter being either a literal or a row index. Namespaces are used to segregate collections of row and column indices. In a classic example of elegant re-use, namespaces are URIs and indices are #fragments. The most popular format hangs the column indices off the row index:
row_index_id:
    column_1 cellval
    column_2 cellval
    ...
and it is reasonably transparently instantiated as:
<http://purl.org/net/bel-epa/ccy#SfSLMCoinMainNetworkBurnAddr1DeTK5>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/net/bel-epa/ccy#Address> ;

A shorthand form of the URL#fragment is available:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ccy: <http://purl.org/net/bel-epa/ccy#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ccy:SfSLMCoinMainNetworkBurnAddr1DeTK5 rdf:type ccy:Address .

ccy:C00000766be5a4bb74c040b85a98d2ba2b433c5f4c673912b3331ea6f18d61bea rdf:type ccy:Block ;
    ccy:height "0"^^xsd:integer ;
    ccy:size "313"^^xsd:integer ;
    ccy:version "1"^^xsd:integer ;
    ccy:merkleroot ccy:Cbae3867d5e5d35c321adaf9610b9e4147a855f9ad319fdcf70913083d783753f ;
    ccy:time "2014-05-08T19:47:40Z"^^xsd:dateTime ;
    ccy:nonce "116872"^^xsd:integer ;
    ccy:bits "1e0fffff"^^xsd:hexBinary ;
    ccy:flags "proof-of-burn"@en ;
    ccy:difficulty "0.00024414"^^xsd:decimal ;
    ccy:nextblockhash ccy:C000006e022fc5e432e55cd61885d6ab9bb2ad6d5cef943f0e397ee21fe37b5db ;
...
The above counts as a populated graph - it has at least one statement. In the world of RDF you can simply add one graph to another. It makes sense if you think of it as adding one standalone collection of unordered row/col/cell triples to another standalone collection of unordered subj/pred/obj triples and ending up with one large standalone collection of unordered triples.
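As a rough sketch of that additive property, a graph can be modelled as a plain Python set of (subject, predicate, object) tuples, in which case adding graphs is just set union. This is only an illustration with no RDF library involved; the prefixed names are ordinary strings and the split into two graphs is made up for the example.

```python
# Each graph is an unordered set of (subject, predicate, object) tuples.
BURN_ADDR = "ccy:SfSLMCoinMainNetworkBurnAddr1DeTK5"
GENESIS = "ccy:C00000766be5a4bb74c040b85a98d2ba2b433c5f4c673912b3331ea6f18d61bea"

graph_a = {
    (BURN_ADDR, "rdf:type", "ccy:Address"),
}

graph_b = {
    (GENESIS, "rdf:type", "ccy:Block"),
    (GENESIS, "ccy:height", 0),
    # The same statement as in graph_a; a set keeps only one copy.
    (BURN_ADDR, "rdf:type", "ccy:Address"),
}

# Adding one graph to another is a union of standalone statements.
merged = graph_a | graph_b

for subject, predicate, obj in sorted(merged, key=str):
    print(subject, predicate, obj)
```

Because each triple stands alone and order carries no meaning, the merge needs no schema negotiation: repeated statements simply collapse into one.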
Column indices can (preferably) resolve to a formal description, an ontology (see CCY for the ontology used to model the blockchain), which is also expressed in RDF:
ccy:height a owl:ObjectProperty ;
dc:description "The sequence of the block in the current blockchain."@en ;
rdfs:domain ccy:Block ;
rdfs:isDefinedBy ;
rdfs:range xsd:integer ;
ns:term_status "unstable" ;
skos:prefLabel "height"@en .
(height is an integer and a property of Block)
Useful terms are freely borrowed from other namespaces to avoid NIH, and a quick dekko at Richard Cyganiak’s 100 most popular RDF namespaces gives a hint of the range of subject ontologies available.
Conventionally, row_index, column_index, cell_value dissolves into subject, predicate, object, a more natural expression.
Modelling the blockchain is easy: hashes (e.g. previousblockhash, merkleroothash, nextblockhash) are row indices (subject) and are written as ccy:C<hash>, addresses (Base58Check-encoded) are also row indices (subject) and are written as ccy:<address>, all attributes (predicate) are written as ccy:<attribute> and everything else (object) is a literal.
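A minimal sketch of those conventions, assuming the block arrives as a dict shaped like the JSON an altcoin RPC API returns. The function name block_to_triples, the HASH_KEYS set and the exact key names are hypothetical, chosen to match the listing earlier in this post rather than taken from any real tool.

```python
# Keys whose values are hashes and so become ccy:C-prefixed nodes
# rather than literals (assumed key names, for illustration only).
HASH_KEYS = {"previousblockhash", "merkleroot", "nextblockhash"}


def block_to_triples(block):
    """Turn one block dict into (subject, predicate, object) tuples."""
    subject = "ccy:C" + block["hash"]
    triples = [(subject, "rdf:type", "ccy:Block")]
    for key, value in block.items():
        if key == "hash":
            continue  # already used as the subject
        if key in HASH_KEYS:
            obj = "ccy:C" + value  # a hash: points at another node
        else:
            obj = value  # everything else stays a literal
        triples.append((subject, "ccy:" + key, obj))
    return triples


example_block = {
    "hash": "00000766be5a4bb74c040b85a98d2ba2b433c5f4c673912b3331ea6f18d61bea",
    "height": 0,
    "size": 313,
    "merkleroot": "bae3867d5e5d35c321adaf9610b9e4147a855f9ad319fdcf70913083d783753f",
    "flags": "proof-of-burn",
}

triples = block_to_triples(example_block)
for triple in triples:
    print(triple)
```

Datatype and language tags are left out here; a real converter would attach them when serialising the literals.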
Querying the graph is straightforward with the SPARQL query algebra, and by algebra I mean ?x AND ?y, i.e. logic operators and variable bindings. In SPARQL, for convenience, AND is written as a full stop, conventionally at the end of the line.
PREFIX ccy: <http://purl.org/net/bel-epa/ccy#>
select (SUM(?value) as ?total_slm_burned) WHERE
{ ?txo ccy:address ccy:SfSLMCoinMainNetworkBurnAddr1DeTK5 .
  ?txo ccy:value ?value
}

Once you've got your queries worked out, you can use them with sgvizler to create charts.
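To make the "AND" in the SUM query above concrete, here is a plain-Python sketch of what its two patterns do: both must hold for the same ?txo binding before ?value is summed. The transaction output names and values are invented for the example, and a real SPARQL engine is of course far more general than this.

```python
BURN = "ccy:SfSLMCoinMainNetworkBurnAddr1DeTK5"

# A tiny made-up graph of (subject, predicate, object) tuples.
triples = {
    ("ccy:txo1", "ccy:address", BURN),
    ("ccy:txo1", "ccy:value", 10.5),
    ("ccy:txo2", "ccy:address", BURN),
    ("ccy:txo2", "ccy:value", 2.0),
    ("ccy:txo3", "ccy:address", "ccy:SomeOtherAddress"),
    ("ccy:txo3", "ccy:value", 99.0),
}

# Pattern 1: ?txo ccy:address <burn address>
burn_txos = {s for (s, p, o) in triples if p == "ccy:address" and o == BURN}

# Pattern 2, AND-ed with pattern 1: ?txo ccy:value ?value,
# keeping only rows where ?txo is already bound by pattern 1.
values = [o for (s, p, o) in triples if p == "ccy:value" and s in burn_txos]

total_burned = sum(values)
print(total_burned)
```

The shared variable is what does the joining: txo3 has a value but fails the first pattern, so it never contributes to the total.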
PREFIX ccy: <http://purl.org/net/bel-epa/ccy#>
select (SUM(?value) as ?total_slm_burned) (COUNT(DISTINCT ?txo) as ?num_burn_txs) WHERE
{ ?txo ccy:address ccy:SfSLMCoinMainNetworkBurnAddr1DeTK5 .
  ?txo ccy:value ?value .
  ?tx ccy:output ?txo .
  ?tx ccy:time ?datetime .
  FILTER(?datetime < 1430434800 && ?datetime > 1427842800)
}

"3284.6929"^^xsd:decimal "24"^^xsd:integer (2015-04-01T00:00:00+01:00 to 2015-05-01T00:00:00+01:00)
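The integer bounds in that FILTER read as Unix epoch seconds (an assumption drawn from the query itself, since ?datetime is compared against plain integers). A small sketch for deriving such bounds from ISO 8601 timestamps follows; the helper name epoch_seconds is made up. It also shows that the two values used in the query correspond to midnight at a +01:00 offset, an hour earlier than midnight UTC.

```python
from datetime import datetime


def epoch_seconds(iso_text):
    """Convert an ISO 8601 timestamp with an explicit offset to epoch seconds."""
    return int(datetime.fromisoformat(iso_text).timestamp())


# Bounds as used in the FILTER: midnight at +01:00.
lower = epoch_seconds("2015-04-01T00:00:00+01:00")
upper = epoch_seconds("2015-05-01T00:00:00+01:00")

# For comparison, midnight UTC on the same date is 3600 seconds later.
utc_lower = epoch_seconds("2015-04-01T00:00:00+00:00")

print(lower, upper, utc_lower)
```

Always spelling out the offset avoids the hour's drift that creeps in when local time and UTC are mixed.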
Feel the power...
https://github.com/mgskjaeveland/sgvizler

Cheers
Graham