Cryptopals: Exploiting CBC Padding Oracles

This is a write-up of the classic embroider oracle attack on CBC-mode obstruct ciphers. If you ’ ve done the Cryptopals cryptanalysis challenges, you ’ ll remember it as challenge 17. This is a celebrated and elegant attack. With it, we will see how even a modest data leak ( in this case, the bearing of a “ pad oracle ” – defined below ) can lead to entire plaintext recovery .
Like the Cryptopals challenges, this post is written to be accessible to anyone with an matter to in cryptanalysis – no graduate degree required. All you need is solitaire, focus, and some basic casualness with the concepts in the following section .

Core Concepts

here ’ s what you ’ ll need to know to follow the rest of this stake. If any these definitions don ’ metric ton go into adequate detail for you, please take a moment to look up their corresponding terms and do some more background recitation before you continue .
Block cipher : AES is the most celebrated exercise. In general, a stop cipher is a way of transforming ( encrypting ) fixed-size groups of bits so that they look random unless you possess the keystone used for encoding. With the key, you can recover ( decode ) the blockage ’ s original contents. We ’ ll be writing pulley encoding and decoding as ENCkey and DECkey respectively. To encrypt messages of arbitrary size, a freeze calculate needs some extra scaffolding : a engine block code modality and a pad scheme.

Padding scheme : A way of “ padding out ” a message ’ second duration to the nearest multiple of the block distance. The padding needs to be well distinguishable from the message and easily removed after decoding. We ’ re going to use a pad dodge known as PKCS # 7, which works by appending nitrogen bytes of rate n. This is not the merely padding scheme for which the attack works, but it is the most common one .
Block cipher mode : A direction of generalizing a blockage zero to handle multi-block plaintexts. There are a lot of these ; the one we ’ ll be using here is called CBC mode ( for Cipher Block Chaining ) .
CBC mode : A block zero modality where each block of plaintext is XORed with the previous obstruct ’ s ciphertext anterior to encoding ( illustrated below ). The first base block of plaintext is XORed with a one-block low-level formatting vector, which is normally prepended to the ciphertext .
Initialization vector : A one-block-sized bytestring with uniformly random contents. normally abbreviated to IV. This serves an crucial function : in CBC mode every normality ’ thorium plaintext block is XORed against the ( n-1 ) ’ thorium ciphertext block, but for the beginning plaintext barricade there is no former ciphertext block to use, so this plaintext block is rather XORed against the IV. Among other benefits, the use of ( clear-cut, randomly generated ) IVs for each encoding ensures that multiple encryptions of the same plaintext will result in unlike, apparently unrelated ciphertexts .
Padding oracle : Something which, given a ciphertext, tells us whether its decrypted plaintext has valid pad or not. The name is meant to evoke the similarly insightful and cryptic Oracles of antiquity. The classic model of a slog oracle is a overhaul with detail error messages. That said, the prophet could in fact be anything that allows us to differentiate between valid and disable padding for arbitrary ciphertexts. Any act of side channels can lead to padding oracles. We ’ ll front at a match examples in the adjacent section .

Sample Padding Oracles

OK, so ampere far as the attack is concerned, a pad prophet is a generic, abstract entity – but what might it look like in very liveliness ?
Say you ’ re auditing a world wide web API, and it authenticates its users using code tokens. That is, it takes some data about the drug user ( some of which might be secret from the drug user ), encrypts this datum under ( say ) AES-CBC using a cardinal that is merely store server-side, and then it provides the drug user with the resulting ciphertext ( which, you ’ ll recall, looks equitable like a random string ). You, the drug user, can ’ t decode this ciphertext, but you can store it and provide it in future API queries. When you do this, the server will be able to decrypt it and immediately know all about you .
But what if you provide an invalid nominal ?
There are two likely error states hera. If the token ’ second decoding has bad padding then the decoding operation will fail. On the other hand, if the plaintext has valid pad, the embroider will be successfully stripped and processing will proceed to deserialization ( which will about surely fail ) .
Suppose that the API decides to helpfully distinguish between these two error states, serving responses like {'code': 401, 'msg': 'decryption error'} or {'code': 401, 'msg': 'deserialization error'} respectively. This gives us a padding oracle ! We can send any arbitrary ciphertext to the server and check the response ’ s msg field. If we get ‘ decoding mistake ’, the padding is invalid ; any other response indicates valid pad. As we are about to see, this is all we need to fully decrypt our – or anyone ’ s – secret nominal .
here ’ s the second example. Suppose some altruistic cryptanalyst notices this issue we just described and reports it. The developers decide to fix the return by changing their validation code to return a generic error. If your token is invalid, they now merely send back {'code': 401} with no context. No message, no oracle – or so it seems .
well, it ’ s truthful that we no longer have an mistake message to rely on, but we can even look at how long it takes the server to send their generic error. In this encase, we might send each nominal a few times, watch how long it takes for the response to arrive, and infer that a longer delay indicates valid padding. If decoding and deserialization happen back-to-back, then the remainder in time will be very small ( though still nonzero ) ; if anything else takes rate between these steps ( e.g. writing logs, setting up a session object, opening a database connection, etc ) then the timing dispute will be that a lot easier to detect .
If we can establish faithfully low-latency connections ( e.g. if the waiter is hosted by a cloud supplier and we spin up a virtual machine with that supplier in the same region ) then flush a very small time deviation might be measurable. And if we are limited to less reliable connections, we can constantly precisely send more requests to get more time data. We can consolidate this data to reach arbitrary levels of accuracy. This will slow down the attack and make it easier to detect, but the approach will calm work .
Those are a couple examples of what a padding oracle might look like. Remember that ampere far as the perch of the attack is concerned, the details of the oracle aren ’ thymine significant ; ampere retentive as we can take a ciphertext and determine, somehow, whether or not the ciphertext ’ s decoding ’ randomness pad is valid, that ’ s all we need for this attack. In that sense, the oracle is efficaciously a total darkness box .
Enough preamble – let ’ s look at how the attack actually works .

The Attack (Multi-Block Case)


This figure shows how multi-block messages are encrypted in CBC mode. Each plaintext blocking is XORed with the previous ciphertext stuff ( or IV ) prior to encoding. The musical arrangement of the bottom rowing of blocks here reflects how code messages are normally serialized for transmission. While it is not rigorously required for the IV to be prepended to the ciphertext — IV and ciphertext could be conveyed in some other way, e.g. as siblings in a JSON datum structure — this shape of serialization is something you will commonly see in practice .
We won ’ metric ton be spending much prison term with this multi-block design, because it turns out that the multi-block subject of this attack reduces nicely to the single-block case. To see why, let ’ s take a promptly look at the CBC decoding operation .

As discussed above, without the encoding samara we have no way of computing DECkey directly. however, if we could somehow determine the output signal of DECkey, the rest of the attack would fair come down to bookkeeping. We could take the decode block, xor it against the previous ciphertext stuff ( or IV ), and thereby recover the corresponding block of plaintext. Do this for every individual freeze, and we will have recovered the solid message. The only thing stopping us from doing this is the fact that we can ’ deoxythymidine monophosphate calculate DECkey directly .
however, as it turns out, recovering the output of DECkey is precisely what the padding prophet attack allows us to do .

The Attack (Single-Block Case)


here ’ mho that decoding operation again. This time, our code message consists of an IV and a single ciphertext block .
Say we pass an arbitrary obstruct of ciphertext to our padding oracle. We can set the IV to whatever we want ; we ’ ll zero it for now. The example above shows what the prophet will compute. It doesn ’ t tell us the consequence of this calculation ; it merely tells us alone whether or not the resulting plaintext blockage ends with valid embroider .
The key mind behind the attack is this : by making modifications to the IV, we can predictably modify the plaintext block. Flipping a moment in the IV will flip the comparable bit in the plaintext. Setting the IV ’ s final byte to any value will xor that prize into the plaintext ’ s concluding byte. If we iterate through every possible value for the final examination IV byte, finally one of them will set the plaintext ’ s final byte to 0x01 – and our embroider oracle will tell us when this happens, because 0x01 is valid pad !
Why is it valid ? Recall that under the scheme we ’ ra using, valid slog consists of normality bytes of measure n. A trailing 0x01 byte might not look like a lot, but it meets this definition, so the oracle accepts it good like it would accept 0x02 0x02 or 0x03 0x03 0x03 .
here ’ s an model of what this looks like in action :

You ’ ll notification an supernumerary block in this figure. This obstruct shows the output of DECkey. I ’ ve chosen wholly arbitrary contents for this block ; the point is just that you can see the relationship between this value, the IV, and the plaintext. In especial, the search for a valid IV byte ends when we reach 0x2e, because 0x2e ⊕ 0x2f = 0x01.

once we have this step of the assail influence, we can do something truly cool : we can start to construct what I ’ ll call a zero IV. This is an IV which will set some ( finally all ) of the plaintext ’ second bytes to zero .
Why zero ? Two reasons, one of which is useful now and one of which will come up late. The rationality I ’ ll give you for now is this : zero gives us options. If we want to set a plaintext byte to any value other than zero, we can good xor that value into the zero IV. In other words, the zeroing IV gives us a way of manipulating the plaintext however we like .
How do we build a zero IV ? well, adenine soon as we set the plaintext ’ s final byte to 0x01, we can take the corresponding IV byte and xor that against 0x01. This modified IV byte will set the plaintext ’ s final byte to 0x00 – and indeed it will work as the final byte of our zeroing IV .
once we have that, we can derive a newfangled IV which is guaranteed to set the plaintext ’ s final byte to 0x02, and we can start trying to set the plaintext ’ s penult byte to 0x02 a well .
actually, there is one bantam boundary encase that we have to check for inaugural. Suppose the plaintext ’ s penult byte is already set to 0x02. In this lawsuit, the message ’ sulfur padding would be valid if the concluding byte is set to either 0x01 or 0x02. If our search hits 0x02 before 0x01, but we assume that we found 0x01 and not 0x02, then the rest of the attack will fail. Luckily there is a simple test we can use hera : equally soon as we get an affirmative result from the oracle, we ’ ll change the IV ’ south penultimate byte and query the oracle again. If both queries succeed, this tells us that the penult byte is not character of the message ’ sulfur ( valid ) slog, proving that the padding has length one and therefore must have value 0x01 american samoa well. On the other hand, if this second gear question fails, we ’ ve run into a faithlessly positive and should keep search .
once we ’ ve found valid one-byte slog, we can use a similar process to search for valid two-byte slog. This search will go just like the search for the final byte ( minus the boundary case, since now we know our valid pad can only be of distance 2 ). This search will terminate when the plaintext ’ s final examination two bytes adequate 0x02 0x02, at which compass point we ’ ll know how to zero ( and thus operate ) both of these bytes. This permits us to move on to attacking the third-from-last byte, then the fourth-from-last, and indeed on .
here ’ s what the entire process looks like :

This process can be followed until we ’ ve managed to build up a full zero IV. This brings us to the second base reason why a zero IV is useful. One of the basic properties of xor is this : if IV ⊕ BLOCK = 0, then IV = BLOCK. In early words, we ’ ve merely recovered the output signal of DECkey – it is equal to our zeroing IV !
This is big ! now that we ’ ve recovered this, we can xor it against the previous ciphertext block ( or IV ) to recover the correspond plaintext engine block. Run this process once per barricade and we ’ ll have recovered the full moon plaintext !

Sample Implementation

That ’ s the hypothesis behind the approach. hera ’ s how you might implement it .
Something to bear in mind : You can learn a fortune from other people ’ s code, but you ’ ll learn tied more from your own hands-on feel ! My suggestion is that you go and try implementing the ideas above, then once you ’ re done, come back to this reference implementation and use it to “ check your shape ” .
With that said, here ’ sulfur how I might write this attack :

#!/usr/bin/env python3

BLOCK_SIZE = 16


def single_block_attack(block, oracle):
    """Returns the decryption of the given ciphertext block"""

    # zeroing_iv starts out nulled. each iteration of the main loop will add
    # one byte to it, working from right to left, until it is fully populated,
    # at which point it contains the result of DEC(ct_block)
    zeroing_iv = [0] * BLOCK_SIZE

    for pad_val in range(1, BLOCK_SIZE+1):
        padding_iv = [pad_val ^ b for b in zeroing_iv]

        for candidate in range(256):
            padding_iv[-pad_val] = candidate
            iv = bytes(padding_iv)
            if oracle(iv, block):
                if pad_val == 1:
                    # make sure the padding really is of length 1 by changing
                    # the penultimate block and querying the oracle again
                    padding_iv[-2] ^= 1
                    iv = bytes(padding_iv)
                    if not oracle(iv, block):
                        continue  # false positive; keep searching
                break
        else:
            raise Exception("no valid padding byte found (is the oracle working correctly?)")

        zeroing_iv[-pad_val] = candidate ^ pad_val

    return zeroing_iv


def full_attack(iv, ct, oracle):
    """Given the iv, ciphertext, and a padding oracle, finds and returns the plaintext"""
    assert len(iv) == BLOCK_SIZE and len(ct) % BLOCK_SIZE == 0

    msg = iv + ct
    blocks = [msg[i:i+BLOCK_SIZE] for i in range(0, len(msg), BLOCK_SIZE)]
    result = b''

    # loop over pairs of consecutive blocks performing CBC decryption on them
    iv = blocks[0]
    for ct in blocks[1:]:
        dec = single_block_attack(ct, oracle)
        pt = bytes(iv_byte ^ dec_byte for iv_byte, dec_byte in zip(iv, dec))
        result += pt
        iv = ct

    return result

This code provides a generic implementation of the attack. It will work for any reliable embroider prophet. It demonstrates how the multi-block approach reduces to the single-block attack, how the single-block case builds up the zero IV one byte at a fourth dimension, how to efficiently handle the edge case with the IV ’ s first byte, and so on .
To attack particular padding oracles, one can just import this handwriting and set it to work. For case, here ’ s a little script that imports the previous code snip and uses it to solve Cryptopals Challenge 17 .
eminence that to save on space, I ’ ve chosen to use the versions of AES, CBC, and PKCS # 7 provided by the PyCryptodome library rather than including wax implementations of those build blocks here .

#!/usr/bin/env python3

import random
import os

from Crypto.Cipher import AES  # requires PyCryptodome
from Crypto.Util.Padding import pad, unpad


class Challenge:
    _strings = (
        b"MDAwMDAwTm93IHRoYXQgdGhlIHBhcnR5IGlzIGp1bXBpbmc=",
        b"MDAwMDAxV2l0aCB0aGUgYmFzcyBraWNrZWQgaW4gYW5kIHRoZSBWZWdhJ3MgYXJlIHB1bXBpbic=",
        b"MDAwMDAyUXVpY2sgdG8gdGhlIHBvaW50LCB0byB0aGUgcG9pbnQsIG5vIGZha2luZw==",
        b"MDAwMDAzQ29va2luZyBNQydzIGxpa2UgYSBwb3VuZCBvZiBiYWNvbg==",
        b"MDAwMDA0QnVybmluZyAnZW0sIGlmIHlvdSBhaW4ndCBxdWljayBhbmQgbmltYmxl",
        b"MDAwMDA1SSBnbyBjcmF6eSB3aGVuIEkgaGVhciBhIGN5bWJhbA==",
        b"MDAwMDA2QW5kIGEgaGlnaCBoYXQgd2l0aCBhIHNvdXBlZCB1cCB0ZW1wbw==",
        b"MDAwMDA3SSdtIG9uIGEgcm9sbCwgaXQncyB0aW1lIHRvIGdvIHNvbG8=",
        b"MDAwMDA4b2xsaW4nIGluIG15IGZpdmUgcG9pbnQgb2g=",
        b"MDAwMDA5aXRoIG15IHJhZy10b3AgZG93biBzbyBteSBoYWlyIGNhbiBibG93"
    )

    def __init__(self):
        self._key = os.urandom(16)

    def get_string(self):
        """This is the first function described by Challenge 17."""
        string = random.choice(self._strings)
        cipher = AES.new(self._key, AES.MODE_CBC)
        ct = cipher.encrypt(pad(string, AES.block_size))
        return cipher.iv, ct

    def check_padding(self, iv, ct):
        """This is the second function described by Challenge 17."""
        cipher = AES.new(self._key, AES.MODE_CBC, iv)
        pt = cipher.decrypt(ct)
        try:
            unpad(pt, AES.block_size)
        except ValueError:  # raised by unpad() if padding is invalid
            return False
        return True


if __name__ == "__main__":
    from cbc_padding_oracle_attack import full_attack
    from base64 import b64decode

    service = Challenge()
    iv, ct = service.get_string()
    print("Ciphertext:", ct)
    print("Launching attack...")

    result = full_attack(iv, ct, service.check_padding)
    plaintext = unpad(result, AES.block_size)
    print("Recovered plaintext:", plaintext)
    print("Decoded:", b64decode(plaintext).decode('ascii'))

Defenses

This attack is a chosen-ciphertext attack. It depends on the attacker being able to submit arbitrary ciphertexts to the prophet. As such, you can prevent the attack by authenticating your ciphertexts. You might do this by switching from CBC mode to an authenticate encoding mode like GCM or OCB ; alternately, keep CBC mode but start MACing your ciphertexts using something like HMAC .
Removing the oracle would besides prevent the attack. however, hopefully the example oracles above gave you some sense of how nontrivial this actually can be in commit. This is a cryptanalytic problem and it calls for a cryptanalytic solution ; anything less is likely to be flimsy and erring .
By adding authentication tags and checking them anterior to decoding, we guarantee that we ’ ll be able to reject any attacker-crafted messages without ever decrypting them, preventing us from leaking any information at all about their decode contents, padding-related or otherwise .

Share this:

Like this:

Like

Loading…

reference : https://coinselected.com
Category : crypto topics

Leave a Reply

Your email address will not be published.