I'm on a journey "back to basics" in Bitcoin.
The best way I've found to check whether I really understand something is to write about it and see if it makes sense to a layperson.
I finished the chapter on Private / Public Keys and Digital Signatures.
The idea is to understand the problems that are solved with a given concept and use some analogies.
I'd be grateful for any comments on:
- Errors in concepts and understanding;
- Suggestions for structural improvements;
- Improvements to the examples.
I hope it's useful to someone!
Let's go:
Private / Public Keys and Digital Signatures
Problem: Who issues your account login and password?
In the traditional banking world, when you want to open a bank account, you walk into a branch or visit a website, provide your personal details, and the bank creates an account for you. It's the bank that gives you an account number and helps you set a password. They hold the power and control over account creation, and they ensure that only you, with your unique password, can access your funds. If you forget your password or if someone tries to fraudulently access your account, it's the bank's responsibility to verify your identity and safeguard your money. The bank acts as the central authority, the gatekeeper, and the verifier.
Now consider a decentralized system like Bitcoin. If there's no central entity like a bank, who creates your account? Who ensures that only you can access your bitcoins?
The brilliance of Bitcoin's decentralized system is that it doesn't rely on a central authority to issue accounts or validate transactions. Instead, it leverages the power of cryptography. In the Bitcoin world, you create your own "account" by generating a pair of cryptographic keys: a public key, which is like your account number, and a private key, which is like your password.
But unlike a bank password that can be reset, your private key is unique and non-recoverable. Lose it, and you lose access to your funds. Share it, and others gain access. It's a system built on trust in mathematics and code, rather than trust in a central institution.
This decentralized approach offers freedom, control, and responsibility. It's a revolutionary shift from the centralized systems we've always known, placing the power of account creation and access squarely in the hands of the individual.
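To make "generating a pair of keys" a bit more concrete, here is a minimal sketch in Python. It relies on details the text above does not spell out: Bitcoin's keys live on an elliptic curve called secp256k1, the private key is simply a large random number, and the public key is computed from it by repeated point addition. The function names (`generate_private_key`, `derive_public_key`, `point_add`) are my own, and this is an illustration only, not something to protect real funds with.

```python
import secrets

# secp256k1 parameters (the curve Bitcoin uses): y^2 = x^3 + 7 over the field of size P
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its mirror image
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def derive_public_key(private_key):
    """Compute private_key * G with the double-and-add method."""
    result, addend = None, G
    while private_key:
        if private_key & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        private_key >>= 1
    return result


def generate_private_key():
    """Pick a random number between 1 and N - 1: that is the whole 'account creation'."""
    return secrets.randbelow(N - 1) + 1


private_key = generate_private_key()
public_key = derive_public_key(private_key)
print("private key:", hex(private_key))
print("public key x:", hex(public_key[0]))
print("public key y:", hex(public_key[1]))
```

Nobody issues anything here: the "account" comes into existence the moment the random number is chosen. Going from the private key to the public key is quick, while going the other way is believed to be computationally infeasible.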
Excellent. But once I have my account and password, who will validate that I authorized a transaction?
Problem: Who verifies that a transaction was actually authorized by the issuer and not tampered with?
Digital signatures bridge this trust gap: they offer a decentralized way to verify that a transaction truly originates from its claimed source and that it has not been tampered with.
Digital signatures are a more secure subset of electronic signatures. So let's look at the differences and the real-world applications, especially in a decentralized context like Bitcoin.
Let's start with a broader category:
Electronic Signatures:
Think of the electronic signature as the digital equivalent of your handwritten signature on a paper document. It's any electronic data (like a typed name, an uploaded image of a handwritten signature, or a click on an "I agree" button) which is logically associated with other electronic data and is used by the signatory to sign. It's akin to a physical signature but in electronic form.
Most of us engage with electronic signatures, often without realizing it. Here are some commonplace examples:
Agreeing to the terms and conditions of a software or online service by clicking "I Accept."
Signing on digital pads after credit card transactions at retail outlets.
Using signing platforms, where you can draw or upload an image of your signature to sign a document.
An electronic signature doesn't change based on the item being signed: when you sign a cheque, a letter, or a document, the whole point is that your signature looks the same. That makes it easy for other people to copy, which is really terrible security!
The problem with electronic signatures is that they rely on a trusted third party to validate the authenticity of the signatory and the integrity of the signed data. For instance, when using e-signature platforms, the platform itself acts as the third party, ensuring that the signatory is who they claim to be and that the document hasn't been tampered with after signing.
In contrast, a digital signature is only valid for that exact piece of data, and so it cannot be copied and pasted underneath another piece of data, nor can someone else re-use it for their own purposes. Any tampering with the message will result in the signature being invalidated. The digital signature is a one-time proof that the person with a private key really did approve that exact message. No one else in the world can create that digital signature except you, unless they have your private key.
Since Bitcoin has no trusted third party, this is where digital signatures come in: they "sign" transactions, confirming the sending of coins from one account to another.
Digital Signature:
Delving deeper, the digital signature is a specific type of electronic signature. Rooted in cryptography, it involves creating a unique digital code (“signature”) using a private cryptographic key. When others receive the digitally signed document, they can use the signatory's public cryptographic key to verify the document's authenticity and ensure it remains unaltered since being signed.
Imagine I've organized an exclusive party, and I want to send out special invitations to a select group of friends. Given the event's exclusivity, it's vital that the recipients know that the invitation genuinely came from me and hasn't been replicated or forged.
To ensure this, I seal each invitation envelope with my unique wax stamp. This stamp, known only to belong to me, adds a touch of authenticity to each invitation. Once pressed into the wax, the seal's intricate design hardens, making it evident if someone were to tamper with the envelope.
While my wax stamp is unique, the method to verify it isn't hidden. Over the years, friends and acquaintances have come to recognize the design of my stamp. Moreover, I've often shared a magnifying glass at gatherings, which displays the finer details of my stamp's design for anyone curious.
Once they receive the invitation, anyone can analyze the wax seal and validate its authenticity.
This verification assures that the invitation is genuine and indeed from me.
In this scenario:
• My unique wax stamp represents the private key. It's used to assert authenticity by "signing" the invitation.
• The magnifying glass, shared among friends and acquaintances, represents the public key. It allows anyone familiar with my stamp to verify the authenticity of the seal, ensuring the invitation truly comes from me.
A digital signature is created by taking the message you want to sign and applying a mathematical formula with your private key. Anyone who knows your public key can mathematically verify that this signature was indeed created by the holder of the associated private key (but without knowing the private key itself).
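The "mathematical formula" can be sketched with a deliberately tiny example. Treat it as a sketch under stated assumptions: it uses textbook RSA with toy numbers (the primes 61 and 53), whereas Bitcoin itself uses a different scheme (ECDSA over an elliptic curve). The names `digest`, `sign`, and `verify` are mine, and numbers this small offer no real security.

```python
import hashlib

# Toy textbook-RSA key pair (illustrative, insecure sizes).
MODULUS = 61 * 53        # 3233, shared by both keys
PUBLIC_EXPONENT = 17     # the public key: anyone may know it
PRIVATE_EXPONENT = 2753  # the private key: 17 * 2753 leaves remainder 1 when divided by 3120


def digest(message: str) -> int:
    """Reduce the message to a number smaller than MODULUS."""
    raw = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % MODULUS


def sign(message: str) -> int:
    """Only the holder of PRIVATE_EXPONENT can compute this value."""
    return pow(digest(message), PRIVATE_EXPONENT, MODULUS)


def verify(message: str, signature: int) -> bool:
    """Anyone can check the signature using only the public exponent."""
    return pow(signature, PUBLIC_EXPONENT, MODULUS) == digest(message)


invitation = "You are invited to my party"
signature = sign(invitation)
print(verify(invitation, signature))                   # True
print(verify(invitation, (signature + 1) % MODULUS))   # False: a forged signature
print(verify("You are invited to MY party", signature))
```

The last line checks the same signature against an altered message. With a realistic key size it would practically always be rejected; with this toy modulus there is a small chance (about 1 in 3233) that two different messages collide, which is one reason real keys are so large.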
We now know that the public and private keys solve the problem of issuing the account and password, and that digital signatures solve the problem of verifying the authenticity of transactions. But how are they created? How do they work?
To answer this we need a quick look at cryptography. Although this is the most important and most complicated topic, we will only scratch the surface.
Cryptography is used to provide:
Encryption: only the intended recipient can interpret the message (confidentiality).
Signatures: the recipient can be sure that the message was written by the sender (authentication) and was not tampered with in transit (integrity).
There are two ways to do encryption: symmetric encryption and asymmetric encryption.
The main difference is that symmetric encryption encrypts and decrypts content using the same key, while asymmetric encryption uses one key to encrypt and a different key to decrypt.
So let's talk about what that means.
To show you how this is going to work, we're going to use the alphabet. For these examples, we're going to assume that there are only the 26 letters A through Z: no lowercase characters, no numbers, no symbols. We're going to keep it simple for the explanation.
Symmetric encryption uses the same key for encryption and decryption. Let's say we start with the word HELLO. We are going to use a symmetric encryption algorithm in combination with a secret key. The algorithm for this example is simply moving each letter forward through the alphabet, and the key is the number of positions to move, in this case three. If we start at H and move forward three times, we end up at K. If we do the same for the rest of the letters in the word, we end up with KHOOR.
To decrypt this, we would simply take the cipher text and do the inverse of the algorithm. So if our algorithm was to move forward, our decryption algorithm is going to be to move backwards and we’re going to use the same key. So if we move forward three times to encrypt, we’re going to move backwards three times to decrypt. If we start at the K and we move backwards three times, we'll end up back at the H. And again we could do this for the rest of the letters to decrypt the whole word.
So that's a simple example of symmetric encryption. In this case, the same key was used for both encryption and decryption.
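The shift described above fits in a few lines of Python. This is a minimal sketch of the toy cipher from the text, with function names (`shift`, `encrypt`, `decrypt`) chosen by me; it assumes the input contains only the letters A through Z.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A through Z


def shift(text: str, key: int) -> str:
    """Move every letter `key` positions through the alphabet, wrapping around."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % 26] for ch in text)


def encrypt(plaintext: str, key: int) -> str:
    return shift(plaintext, key)    # move forward


def decrypt(ciphertext: str, key: int) -> str:
    return shift(ciphertext, -key)  # move backwards with the same key


print(encrypt("HELLO", 3))  # KHOOR
print(decrypt("KHOOR", 3))  # HELLO
```

Both functions take the same key, 3, which is exactly what makes the scheme symmetric.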
Now let's talk about asymmetric encryption and you're going to see it's a little different. With asymmetric encryption, we’re still going to use an encryption algorithm, but the keys I use for encryption and decryption are going to be different.
Here I'm going to use the encryption key of five. Again I'm going to start with H and I'm going to move forward five times to get to M. I could do it with the rest of the letters in this word to get to MJQQT. Now it might seem like you can just go backwards to get back to H. But asymmetric encryption algorithms are built on one-way functions: easy to compute in one direction, impractical to reverse. Remember the hash algorithms we learned previously? We can't do them backwards!
So in the case of asymmetric encryption, we can't actually go backwards. Instead we have to go forward a different amount. To decrypt this, I'm going to have to take my cipher text and use a different key going forward again. So starting with the M, if I go forward 21 positions, I'll end up back at the H. And I could do it again for the rest of the letters to decrypt the rest of the word. But note that unlike symmetric encryption, we move forward to encrypt and forward again to decrypt. With symmetric encryption, I was able to use the same key to encrypt and decrypt. Whereas with asymmetric encryption, I had to use different keys to encrypt and decrypt.
Now let's talk about those keys a little bit more. Those two keys I used in this case, 5 and 21, are mathematically related. Whatever I encrypted with 5 could only be decrypted with 21. There are other combinations of keys that you could use in our little example using just the alphabet. Actually any pair that adds up to 26 would work. So I could have also used an encryption key of 6 and a decryption key of 20. Keep in mind that this is only a toy: here anyone who knows one key can work out the other by subtracting it from 26, while real asymmetric algorithms are designed so that deriving the private key from the public key is computationally infeasible.
Well, what if I used them in the reverse order? What if I encrypted with 21? Could I not then decrypt with 5?
Well, let's give it a shot. Again, I'm going to start at the H and move forward 21 times. That wraps around the end of the alphabet and brings me to C, and I could do the same for the rest of the letters. Then to decrypt this, I would again take my cipher text and move forward another 5 times. That would bring my C back to an H, successfully decrypting the first letter of my plain text. I could again use the same decryption key to decrypt the rest of the letters. The main property of asymmetric encryption I'm pointing out here is that what you encrypt with one key can only be decrypted with the other key. But it works in either direction. I can encrypt with 21 and decrypt with 5, or as we showed earlier, I can encrypt with 5 and decrypt with 21.
These two asymmetric keys are mathematically related.
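Here is the same walk-through as a short Python sketch. It only models the toy alphabet example from this chapter, and the helper name `move_forward` is my own.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A through Z


def move_forward(text: str, key: int) -> str:
    """Move every letter `key` positions forward, wrapping around after Z."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % 26] for ch in text)


# Encrypt with 5, decrypt with 21 (always moving forward).
cipher = move_forward("HELLO", 5)
print(cipher)                    # MJQQT
print(move_forward(cipher, 21))  # HELLO

# The reverse order works too: encrypt with 21, decrypt with 5.
cipher = move_forward("HELLO", 21)
print(cipher)                    # CZGGJ
print(move_forward(cipher, 5))   # HELLO

# Any pair of keys that adds up to 26 behaves the same way.
for key in range(1, 26):
    assert move_forward(move_forward("HELLO", key), 26 - key) == "HELLO"
```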
Now, what the industry does with this is they take one key and they label it as the public key and they make it available to anybody that asks for it. And then they take the other key and they call it the private key and they keep it to themselves.
Cryptography allows both encryption and signatures, but for Bitcoin purposes we will focus only on signatures.
So, if you have the private key, you can sign a message.
And if you have the public key, you can verify that the signature was made by the owner of the private key.
When someone wants to send bitcoins to another person, they create a transaction message specifying the amount and the recipient. However, instead of signing the entire transaction message, which can be of variable length and relatively large, Bitcoin employs a more efficient approach: What is signed is the hash of the transaction.
This way, we can ensure:
Uniformity: Regardless of the length or content of the original message (transaction), its hash will always be of a fixed length (256 bits in the case of Bitcoin's SHA-256 hashing algorithm). This uniformity is convenient for processing and verification purposes.
Efficiency: Signing a hash, which is a fixed and relatively small size, is computationally more efficient than signing a potentially large and variable-length message.
Security: The cryptographic hash functions used in Bitcoin (like SHA-256) have the property that even a tiny change in the input will produce a vastly different output. This means that if even one character in the original transaction changes, the hash will change entirely. Thus, by signing the hash, the integrity of the entire transaction is ensured.
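These three properties are easy to observe with Python's standard `hashlib` module. This is a simplified sketch: the transactions are plain strings I made up, whereas real Bitcoin hashes a serialized binary transaction and applies SHA-256 twice. The helper name `tx_hash` is hypothetical.

```python
import hashlib


def tx_hash(transaction: str) -> str:
    """Return the SHA-256 hash of a transaction string as 64 hex characters."""
    return hashlib.sha256(transaction.encode("utf-8")).hexdigest()


short_tx = tx_hash("1 BTC to grandma")
long_tx = tx_hash("1 BTC to grandma, with a very long memo " * 1000)
tampered_tx = tx_hash("2 BTC to grandma")

# Uniformity: 64 hex characters (256 bits) no matter how long the input is.
print(len(short_tx), len(long_tx))  # 64 64

# Security: changing a single character produces an unrelated-looking hash.
print(short_tx)
print(tampered_tx)
differing = sum(a != b for a, b in zip(short_tx, tampered_tx))
print(differing, "of 64 hex characters differ")
```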
So, this is how public and private keys are used to sign a transaction in Bitcoin:
I am going to generate a transaction of 1 bitcoin for my grandma. I'm then going to run that transaction through a hashing algorithm. That's going to result in a particular output. For our example, the hashing algorithm produces the output "HELLO" from the input "transaction of 1 bitcoin".
That output "HELLO" is then going to be encrypted with my private key. Given that my private key is 5 (letters ahead in the alphabet), signing the hash "HELLO" results in "MJQQT". That encrypted output, "MJQQT", is the signature of that transaction of 1 bitcoin. It gets appended to the transaction, and then both the transaction and the signature get sent across the wire. (Bitcoin's real signature algorithm, ECDSA, does not literally encrypt the hash, but the idea is the same: the signature can only be produced with the private key and can be checked with the public key.)
Now, that signature was created with my private key, which means that on the other side my grandma is going to use my public key to verify it. Given that my public key is 21 (letters ahead in the alphabet), she will use it to check that the signature was made by the private key paired with that public key.
What she's going to do is take the signature "MJQQT" and decrypt it using my public key. That's going to result in the hash of the transaction: "HELLO".
Then my grandma is going to independently calculate a hash of the transaction she received. If the "HELLO" she gets from her own calculation matches the "HELLO" she recovered from my signature, this proves two things.
First, it proves that the transaction has not changed since I signed it. Remember, this output was created by taking a hash of this transaction. So if anything changed in this transaction, my grandma would have gotten a different output. This gives us the property of integrity.
The other thing that the signature proves is that only I could have created it. This signature was created by taking my private key and encrypting the digest "HELLO". If my grandma was able to decrypt it with my public key, this proves it was definitely my private key that signed it. And the only person in the world that has my private key is me. This gives us authentication.
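The whole grandma exchange can be sketched in Python using the toy keys 5 and 21. Everything here is an assumption made for illustration: `toy_hash` is a made-up helper that squeezes a real SHA-256 hash into five letters (so it will not literally output "HELLO"), and `sign` and `verify` are my own names for the two steps.

```python
import hashlib
import string

ALPHABET = string.ascii_uppercase
PRIVATE_KEY = 5   # known only to me
PUBLIC_KEY = 21   # known to everyone, including grandma


def move_forward(text: str, key: int) -> str:
    """Move every letter `key` positions forward, wrapping around after Z."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % 26] for ch in text)


def toy_hash(transaction: str) -> str:
    """Hypothetical five-letter hash built from the first bytes of SHA-256."""
    raw = hashlib.sha256(transaction.encode("utf-8")).digest()
    return "".join(ALPHABET[byte % 26] for byte in raw[:5])


def sign(transaction: str, private_key: int) -> str:
    """Sender: hash the transaction, then shift the hash with the private key."""
    return move_forward(toy_hash(transaction), private_key)


def verify(transaction: str, signature: str, public_key: int) -> bool:
    """Receiver: undo the shift with the public key and compare with her own hash."""
    return move_forward(signature, public_key) == toy_hash(transaction)


transaction = "transaction of 1 bitcoin to grandma"
signature = sign(transaction, PRIVATE_KEY)

print(verify(transaction, signature, PUBLIC_KEY))                   # True
print(verify(transaction, move_forward(signature, 1), PUBLIC_KEY))  # False: altered signature
print(verify("transaction of 9 bitcoin to grandma", signature, PUBLIC_KEY))
```

The last line checks the original signature against a tampered transaction. It should come out False unless the two five-letter hashes happen to collide, which is very unlikely even for this toy hash.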