https://www.linkedin.com/pulse/private-blockchains-just-shared-databases-gideon-greenspanThe original post is full of useful links.
Why blockchain detractors are missing the point
And so it goes on. From popular posts to contemptuous tweets to predictions about the future, the world and its mother are lining up to throw tomatoes at private blockchains, before even understanding what they are.
Saying that a private blockchain is just a shared database is like saying that HTML and HTTP are “just” distributed hypertext. It’s wrong in two ways. First, the semantic one: private blockchains are a technology that enables shared databases, like pens enable writing and HTML/HTTP enable distributed hypertext. The bitcoin blockchain and its primary application cannot be meaningfully separated, because one could not exist without the other. But this equivalence does not apply to private blockchains at all.
The second mistake is the use of the word “just”. Just? Were HTML and HTTP just a way to do distributed hypertext? Hypertext was invented decades earlier, so are these technologies a minor footnote in computer history? Oh but let me count the ways in which they earned their place: (a) a simple markup language that any layperson could learn, (b) a hierarchical addressing scheme that works both with TCP/IP and our conceptual model of place, (c) a simple protocol for the state-free retrieval of content, and (d) both client and server software that brought the whole thing to life. We might as well say that Newton was just a scientist and Dostoyevsky just a writer.
So let’s make this perfectly clear: Yes, private blockchains are just a way to share a database. But they enable a new type of shared database, with huge implications for the financial world and beyond. And if you’re willing to read on, I’m going to tell you exactly why.
What is a database?
A database is a repository of structured information, organized into tables. You can think of it as a collection of one or more Excel spreadsheets, which can optionally be linked together. Each table contains information about a set of entities of a particular type, with one entity per row. Each table also has one or more columns, which describe different aspects of those entities. For example, the table for WidgetCo’s internal staff directory might have columns for employee ID, first name, last name, department, internal phone number and room number.
One of the important ways in which databases go beyond spreadsheets is that they contain rules about the data stored within. These rules help ensure that the information remains sane and consistent for the benefit of the entire organization. In today’s most popular databases, the rules take a number of common forms:
The database schema defines what kind of information is permitted in each column. For example, the phone number must contain 4 digits and cannot be left blank (“null”).
Unique keys which state that a particular column (e.g. employee ID) must have a different value in every row.
Check constraints which enforce relationships between the column values in each row. For example, if the department is “Procurement” then the room number must start with a 3 or 4.
Foreign keys which enforce relationships between tables. For example, if the database contains another table used for payroll, there might be a rule that every employee ID in the payroll table must also exist in the staff directory.
A transaction is a collection of changes to a database that is accepted or rejected as a whole. Every time a transaction modifies the database, the software ensures that the database’s rules are followed. If any part of a transaction violates one of these rules, the entire transaction will be rejected with a corresponding error.
There are other more esoteric rule types I could list, but they all have one thing in common. They answer the question: Is the database in a valid state? In other words, they act as a constraint on the database’s contents when viewed at a single point in time. And this works just fine for a database which sits inside a single organization, because the main job of the constraints is to prevent programmer error. If one of WidgetCo’s internal applications tried to insert a 3-digit phone number into the database, this wouldn’t be due to malice, but rather a bug in the developer’s thinking or code. The ability of a database to catch these mistakes is undoubtedly handy, and helps prevent bad information propagating within an organization, but it doesn’t fix problems of trust. (Constraints can also help simplify application logic, for example via foreign key cascading or on-duplicate clauses, but these are still just ways to help developers.)
Database sharing
Now let’s think about how WidgetCo’s internal staff directory might be shared with the outside world. In many cases, there is no problem providing shared read access. The directory can be exported to a text file and emailed to customers and suppliers. It can be posted on the Internet, just like this one. It can even be given an API to allow searching by external code. Shared read is a technical doddle, a question of deciding who can see what.
But things start getting stickier when we think about shared write. What if WidgetCo wants an external entity to modify its database? Perhaps the phones are being replaced by PhoneCo, who will then update the phone numbers in the staff database. In this case, WidgetCo would create a new “account” for PhoneCo to use. Unlike accounts for WidgetCo’s internal use, PhoneCo’s account is only permitted to change the phone number column, and never add or delete rows. All of PhoneCo’s transactions are processed by WidgetCo’s database system, which now applies two types of restriction:
Global rules which apply to all database users. For example, the phone company can’t change a number to contain only 3 digits, and neither can anybody else.
Per-account rules restricting what PhoneCo is permitted to do, in this case only modifying the phone number column of existing rows.
So far, so good. We have a shared write database. It works because WidgetCo is in charge of the database and the phone company gains access by virtue of WidgetCo’s good grace. If PhoneCo started setting phone numbers randomly, WidgetCo can shut down their access, terminate their contract, and restore some old data from a backup. And if WidgetCo started misbehaving, say by reversing the new phone numbers entered by PhoneCo, well that would be entirely pointless, since it would only harm WidgetCo themselves. The phone company would consider WidgetCo to be a peculiar customer but not particularly care, so long as they paid their bill on time.
But now let’s see what happens if two or more parties want to share a database which (a) none of the parties controls, (b) can be written to by any party, and (c) can be relied upon by everyone. To make things worse, let’s say that these parties have different incentives, don’t trust each other and may even be fierce competitors. In this case, the solution has always been the same: introduce a trusted intermediary. This intermediary manages a database centrally, provides accounts to all of the parties, and ensures that all operations are permitted according to a known set of rules. In many cases, especially financial, every party still maintains its own copy of the data, so everyone spends a lot of time checking that their databases agree.
It all gets rather messy and cumbersome. But if we’re talking about a shared write database in an environment of limited trust, there is currently no alternative. That’s one of the main reasons why financial transactions go through central clearing houses, why you use Google Calendar even in a small workgroup, and why the crowd-sourced wonder that is Wikipedia spends millions of dollars on hosting. Even as the user interface of the web moves to the client side, centralized servers continue to store the data on which those interfaces rely.
Real shared write
So let’s say that we wanted to allow a database to be shared, in a write sense, without a central authority. For example, several competing companies want to maintain a joint staff directory for the benefit of their mutual customers. What might that actually look like? Well, it would need a number of things:
A robust peer-to-peer network that allows transactions to be created by any party and propagated quickly to all connected nodes.
A way to identify conflicts between transactions and resolve them automatically.
A synchronization technology that ensures all peers converge on an identical copy of the database.
A method for tagging different pieces of information as belonging to different participants, and enforcing this form of data ownership without a central authority.
A paradigm for expressing restrictions on which operations are permitted, e.g. to prevent one company from inflating the directory with fictitious entries.
Whew. That’s a tough list right there, and it’s simply not supported by today’s off-the-shelf databases. Current peer-to-peer replication technology is clumsy and has a complex approach to conflict resolution. Those databases that do support row-based security still require a central authority to enforce it. And standard database-level restrictions like unique keys and check constraints cannot protect a database against malicious modifications. The bottom line is this:
We need a whole bunch of new stuff for shared write databases to work, and it just so happens that blockchains provide them.
I won’t go into too much detail about how blockchains do these things, because I’ve covered much of it before. Some key elements include regular peer-to-peer techniques, grouping transactions into blocks, one-way cryptographic hash functions, a multi-party consensus algorithm, distributed multiversion concurrency control and per-row permissions based on public key cryptography. A long list of old ideas combined in a new way. HTML/HTTP, if you like.
In addition to all of these, shared write databases require an entirely new type of rule, to restrict the transformations that a transaction can perform. This is an absolutely key innovation, and makes all the difference if we’re sharing a database between non-trusting entities. These types of rules can be expressed as bitcoin-style transaction constraints or Ethereum-style enforced stored procedures (“smart contracts”), each of which has advantages and disadvantages. Perhaps there are other better ways waiting to be discovered. But they all share the property of tying together the database’s state before and after a transaction takes place. In other words, they answer the question: Was that a valid transaction? This is fundamentally different from asking whether the database is valid at a single point in time.
If you’re wondering if this type of database has useful real-world applications, well that’s a fair question. But you might note the intense interest in private blockchains from one sector at least, because of their potential for simplifying processes and reducing costs and delays. Financial institutions are heavy users of today’s database platforms, and those platforms do not enable a shared write scenario. This is what banks are looking for.
This problem and its solution have absolutely nothing to do with bitcoin and the idea of censorship-free money. In fact, the only connection to bitcoin is the technical similarity between the bitcoin blockchain and how some of these private blockchains are implemented today. Over time the two worlds may well diverge, because their requirements are completely different. Whether you like financial regulation or not, the simple fact is that private blockchains are potentially useful in a regulated world, whereas for now at least, public blockchains are not.
If I may finish with an analogy, the UN Declaration on the Principles of International Law does not tell countries that they can hold any territory they want, so long as it’s surrounded by a clearly-marked fence. Rather, it states that “No territorial acquisition resulting from the threat or use of force shall be recognized as legal”. In other words, it’s a rule regarding the legitimacy of changes, not just of situations. And the UN declaration, which seems so obvious to us now, was a complete revolution in international law. It meant a world no longer based on unilateral power and authority, but one where differences can be resolved by mutual consensus.
When it comes to shared databases, private blockchains do exactly the same thing.