Initialisation Vectors

5 comments, last by rip-off 15 years, 6 months ago
I've been designing a general-purpose archive format that supports encryption, and am at the stage of implementing Block Cipher Modes of Operation. My question is, is it safe to use a data checksum (or part of it) as an Initialisation Vector? I ask because I am trying to avoid having fields that are not always used; for example, a separate "Initialisation Vector" field will become redundant for files that are not encrypted, or are encrypted but employ a Mode of Operation that does not require an Initialisation Vector, such as ECB (which I know is not recommended). [Edited by - WaterCrane on October 9, 2008 7:17:47 AM]
I don't think so, no. The Initialisation Vector must be allowed to vary even when sending the same data. This is why we include IVs - to prevent the same plaintext resulting in identical ciphertext.

I believe you should still be able to make a format work while omitting the IV field for unencrypted archives.
I was under the impression that Block Cipher Modes of Operation were there to prevent identical plaintext blocks encrypting into identical ciphertext blocks rather than an entire data stream. I figured it was okay to use a plaintext message digest as the Initialisation Vector because, normally, only one copy of the encrypted file is stored.

I have included support for per-file "chunks" to contain timestamps, filenames (only the hashes are stored in the look-up table) and meta-data, so there is nothing to stop me from including one that contains an initialisation vector (and returning an error if it is missing when one is required).

[Edited by - WaterCrane on October 9, 2008 7:39:23 AM]
Cipher modes do as you say and prevent identical plaintext blocks from becoming identical ciphertext blocks. Depending on the mode, each plaintext block is either XORed with the previous ciphertext block before it is encrypted, as with CBC, or XORed with the encryption of a counter that is incremented for every block, as with CTR.
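To see what chaining buys within a single message, here is a rough Python sketch of ECB, CBC and CTR. It assumes a toy 16-byte block cipher (`toy_encrypt_block`, a four-round Feistel network with an HMAC-SHA256 round function) standing in for AES so that it runs on the standard library alone. It is illustrative only, not secure, and skips padding, so inputs must be a whole number of blocks.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (same as AES)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA256 of (round number, half-block), truncated.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for AES: a 4-round Feistel permutation. NOT for real use.
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right


def blocks(data: bytes) -> list:
    if len(data) % BLOCK:
        raise ValueError("this sketch does not pad; use whole blocks")
    return [data[i : i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Each block is encrypted independently, so equal blocks give equal output.
    return b"".join(toy_encrypt_block(key, p) for p in blocks(plaintext))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Each plaintext block is XORed with the previous ciphertext block
    # (the IV for the first block) before being encrypted.
    prev, out = iv, []
    for p in blocks(plaintext):
        prev = toy_encrypt_block(key, xor(p, prev))
        out.append(prev)
    return b"".join(out)


def ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # An 8-byte nonce plus an 8-byte block counter is encrypted to make a
    # keystream block, which is XORed with the plaintext block.
    out = []
    for n, p in enumerate(blocks(plaintext)):
        counter_block = nonce + n.to_bytes(BLOCK - len(nonce), "big")
        out.append(xor(p, toy_encrypt_block(key, counter_block)))
    return b"".join(out)


key = b"example key"
iv = bytes(range(16))
msg = b"SAME 16B BLOCK!!" * 2  # two identical plaintext blocks

for name, ct in [
    ("ECB", ecb_encrypt(key, msg)),
    ("CBC", cbc_encrypt(key, iv, msg)),
    ("CTR", ctr_encrypt(key, iv[:8], msg)),
]:
    print(name, "blocks equal:", ct[:BLOCK] == ct[BLOCK:])
```

With two identical plaintext blocks, only ECB should produce two identical ciphertext blocks; CBC and CTR hide the repetition because each block's encryption also depends on what came before it (CBC) or on its block counter (CTR).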
Patrick
Thank you for your responses everyone. I will look for a clean and elegant place to store a separate Initialisation Vector, and keep the checksum solely as a means to confirm that the correct key has been provided.

Using a chunk to store the IV would work, but it adds overhead because the read functions have to search for it (probably less than decrypting an encrypted stream of blocks, but still overhead I would prefer to minimise), and it feels a little clumsy. I will therefore most likely use an optional field that is only present when required.
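One way to get an optional field without a chunk lookup is a flag bit in a fixed per-file header, so unencrypted files pay nothing for the IV. The Python sketch below is a hypothetical layout: the `FileHeader` class, field order, sizes and flag bits are all assumptions for illustration, not part of the format described here.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical per-file header layout (illustrative only):
#   flags  : 1 byte  (bit 0 = encrypted, bit 1 = IV present)
#   length : 8 bytes, little-endian data length
#   iv     : 16 bytes, ONLY present when the IV flag is set
FLAG_ENCRYPTED = 0x01
FLAG_HAS_IV = 0x02
IV_SIZE = 16
FIXED = struct.Struct("<BQ")  # 9 bytes, no alignment padding with "<"


@dataclass
class FileHeader:
    length: int
    encrypted: bool = False
    iv: Optional[bytes] = None

    def pack(self) -> bytes:
        flags = (FLAG_ENCRYPTED if self.encrypted else 0) | (
            FLAG_HAS_IV if self.iv is not None else 0
        )
        out = FIXED.pack(flags, self.length)
        if self.iv is not None:
            if len(self.iv) != IV_SIZE:
                raise ValueError("IV must be 16 bytes")
            out += self.iv  # appended only when actually needed
        return out

    @classmethod
    def unpack(cls, data: bytes) -> Tuple["FileHeader", int]:
        # Returns the header and the number of bytes it occupied.
        flags, length = FIXED.unpack_from(data, 0)
        offset = FIXED.size
        iv = None
        if flags & FLAG_HAS_IV:
            iv = data[offset : offset + IV_SIZE]
            if len(iv) != IV_SIZE:
                raise ValueError("truncated IV")
            offset += IV_SIZE
        return cls(length, bool(flags & FLAG_ENCRYPTED), iv), offset


plain = FileHeader(length=1234)
enc = FileHeader(length=1234, encrypted=True, iv=bytes(IV_SIZE))
print(len(plain.pack()), len(enc.pack()))  # 9 25
```

The reader only has to test one bit to know whether to consume the IV, which keeps the fast-read path simple.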

P.S. I modified my other posts slightly because I said "Cipher Block" instead of "Block Cipher"!
Just one final thing...

The archive is designed to be slow to write, but fast to read, therefore the data within, encrypted or not, seldom changes. Is it still unsafe for the checksum of the entire plaintext to act as the Initialisation Vector? I did wonder if it would aid cryptanalysis, but a good message digest algorithm, such as the Secure Hash Algorithm family, will minimise this, right?
Quote:
Is it still unsafe for the checksum of the entire plaintext to act as the Initialisation Vector?


"Unsafe", not really. But you don't gain anything. See below.

Quote:
I was under the impression that chaining is there to prevent identical plaintext blocks encrypting into identical ciphertext blocks rather than an entire data stream.


OK, with chaining you won't get repeating blocks of ciphertext if the plaintext repeats. However, that is at the block level. Consider what happens if you have more than one plaintext message (in your case, archive) encrypted with the same key.

If the two plaintexts begin with a common prefix, then the ciphertext will also have a common prefix. If the two plaintext messages are identical, then the ciphertext will be identical.
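The effect is easy to reproduce. This Python sketch uses a hypothetical toy Feistel cipher in place of AES (not secure, no padding) and encrypts two archives that share a 16-byte header under one key. With a fixed IV the first ciphertext block is identical. With the IV taken from a SHA-256 digest of the plaintext (the scheme asked about above), the differing contents change the IV, but the same archive encrypted twice still comes out byte-for-byte identical.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for AES: a 4-round Feistel network with an
    # HMAC-SHA256 round function. Illustrative only, NOT secure.
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Plain CBC with no padding: plaintext must be a multiple of BLOCK bytes.
    if len(plaintext) % BLOCK:
        raise ValueError("no padding in this sketch")
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out


def checksum_iv(plaintext: bytes) -> bytes:
    # IV derived from a digest of the whole plaintext.
    return hashlib.sha256(plaintext).digest()[:BLOCK]


key = b"shared archive key"
v1 = b"HEADER: project1" + b"file a contents."  # two archives sharing
v2 = b"HEADER: project1" + b"file b contents."  # a 16-byte prefix

fixed_iv = bytes(BLOCK)  # the same IV reused for every archive
c1, c2 = cbc_encrypt(key, fixed_iv, v1), cbc_encrypt(key, fixed_iv, v2)
print(c1[:BLOCK] == c2[:BLOCK])  # True: the shared prefix shows through

h1 = cbc_encrypt(key, checksum_iv(v1), v1)
h2 = cbc_encrypt(key, checksum_iv(v2), v2)
print(h1[:BLOCK] == h2[:BLOCK])  # False: different contents, different IVs
print(h1 == cbc_encrypt(key, checksum_iv(v1), v1))  # True: same archive, same ciphertext
```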

In a networking scenario, this is quite common. Consider IP packet headers, or higher level protocol headers such as HTTP. An interested third party could infer some information from the fact that (at least part of) the ciphertext is identical, without even cracking the decryption scheme.

This could be the case for your archive. Consider a simple implementation: we keep the metadata towards the end of the file to make it easier to change, and file data is stored consecutively in the order the files are added to the archive.

Alice makes an initial archive, and emails it to Bob. Bob adds more files, and again emails it to Alice. Maybe they do a few more exchanges. Eve, meanwhile, has intercepted their emails. She has N archives, all with the same key. She can also inspect them to see that they all begin with a common prefix. This may aid her if she wants to break the encryption. She can already infer that Alice and Bob are collaborating on some project, and that is without doing any work!

By including an IV, otherwise identical message prefixes will end up with radically different ciphertexts, due to the "seed" that is the IV. While Eve could still make the same inference, she cannot do so with the same degree of certainty.
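A minimal sketch of that approach, again assuming a hypothetical toy Feistel cipher as a stand-in for AES (not secure, no padding): draw a fresh IV from `os.urandom` for every archive and store it in the clear in front of the CBC ciphertext. The IV need not be secret, only unpredictable and not reused, so the same archive sealed twice under one key gives unrelated ciphertexts while still decrypting correctly. The names `seal` and `open_sealed` are illustrative only.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    # Feistel round function (HMAC-SHA256, truncated to half a block).
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for AES: a 4-round Feistel network. NOT for real use.
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _f(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Runs the Feistel rounds in reverse to invert toy_encrypt_block.
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _f(key, i, left)), left
    return left + right


def seal(key: bytes, plaintext: bytes) -> bytes:
    # Fresh random IV per archive, written in the clear before the ciphertext.
    if len(plaintext) % BLOCK:
        raise ValueError("no padding in this sketch")
    iv = os.urandom(BLOCK)
    prev, out = iv, iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    # Reads the stored IV back, then undoes CBC: P_i = D(C_i) XOR C_(i-1).
    iv, ct = sealed[:BLOCK], sealed[BLOCK:]
    prev, out = iv, b""
    for i in range(0, len(ct), BLOCK):
        block = ct[i:i + BLOCK]
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


key = b"shared archive key"
archive = b"HEADER: project1" * 2
first, second = seal(key, archive), seal(key, archive)
print(first[BLOCK:] == second[BLOCK:])  # False: same key and plaintext, different ciphertext
print(open_sealed(key, first) == archive)  # True
```

The cost is 16 extra bytes per encrypted file, which is exactly what an optional IV field would hold.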

It is up to you. It depends on what use you are going to put the archive format to. If the user encrypts two archives that share some prefix with the same key, does it bother you that their ciphertexts share that prefix too? Bear in mind, though, that such sharing probably makes it easier for someone who wants to crack the archive.


