Thursday, August 27, 2020

Steganography with zip archives

The elegance of CVE-2020-1464 comes from the internal structure of the Zip file format. While many other archive formats, like Microsoft Cab, put an index of the compressed files at the beginning of an archive, zip archivers place it at the end of the file.

The reason is historical: apparently, in 1989 disk drives were so slow that adding a new blob to an existing file & appending a new index to it was cheaper than copying chunks of the original archive to a new file.

The CVE reminded me of the old joke of hiding a .zip in a .jpg. When you append a .zip to an image file, the recipient of the jpeg does not necessarily notice the junk in the image, but if you know about such a 'hidden' part, any ordinary unzip tool is able to extract it.
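The joke works because zip readers look for the archive starting from the tail of the file, not from its head. A minimal Ruby sketch of the idea, assuming both files fit in memory; hide_zip_in_image and trailing_zip? are hypothetical helper names, not part of any tool mentioned here:

```ruby
# "PK\x05\x06": the signature that opens the last record of a zip archive.
ZIP_END_SIG = [0x06054b50].pack('V')

# Glue a zip archive onto the end of an image. Both arguments are raw
# bytes, e.g. from File.binread.
def hide_zip_in_image(image, zip)
  image.b + zip.b
end

# Rough check that the bytes end with something that looks like a zip:
# the signature must start at least 22 bytes (the minimum size of that
# last record) before the end of the data.
def trailing_zip?(bytes)
  data = bytes.b
  !data.rindex(ZIP_END_SIG, data.bytesize - 22).nil?
end

# Usage (file names are placeholders):
#   File.binwrite('cat.jpg', hide_zip_in_image(File.binread('orig.jpg'),
#                                              File.binread('secret.zip')))
```

Nothing in the image part is modified, so a viewer that stops reading at the end of the image data should never reach the appended archive.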

This got me thinking: can we hide a file inside a .zip? BlackHat Europe 2010 had a talk about steganography in popular archive formats. In one of the tricks it described, carefully inserting a blob before the zip index makes the blob invisible to all common unpackers.

To verify this claim, I wrote a couple of small Ruby scripts that inject & extract a 'hidden' blob. The approach works: Windows Explorer, 7-Zip, WinRAR, bsdtar(1) & unzip(1) didn't see anything unusual. Even in extreme cases like:

$ du -h foo.zip
4.1G foo.zip

$ bsdtar ftv foo.zip
-rw-r--r-- 0 1000 100 1 Aug 25 21:58 q

that certainly may look unusual to an innocent user: a 4-gigabyte archive that unpacks into a file of exactly 1 byte! The opposite of a zip bomb.

A Zip index is formally termed the central directory. It consists of 2 main parts: ① central directory headers (CHDs) & ② an end of central directory (EOCD) record. A CHD contains metadata about a particular file; the EOCD contains metadata about the index itself [1]:

require 'bindata'

class Eocd < BinData::Record
  endian :little

  uint32 :signature, asserted_value: 0x06054b50 # "PK\x05\x06"
  uint16 :disk
  uint16 :disk_cd_start
  uint16 :cd_entries_disk
  uint16 :cd_entries_total
  uint32 :cd_size
  uint32 :cd_offset_start
  uint16 :comment_len
  string :comment, read_length: :comment_len,
                   onlyif: -> { comment_len.nonzero? }
end

The thing of interest here is cd_offset_start (officially called offset of start of central directory [2]), a 4-byte value that gives the offset, in bytes from the beginning of the archive, at which the first CHD starts.
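To make the role of the EOCD concrete, here is a minimal sketch in plain Ruby (no BinData) of how a reader might locate the record and pull out cd_offset_start. It assumes a single-disk archive without Zip64 extensions; find_eocd and read_eocd are hypothetical names, and the backward scan can be fooled if the signature bytes also occur inside the archive comment:

```ruby
EOCD_SIGNATURE = [0x06054b50].pack('V') # "PK\x05\x06" on disk
EOCD_MIN_SIZE  = 22                     # fixed fields, comment excluded
EOCD_FIELDS    = %i[signature disk disk_cd_start cd_entries_disk
                    cd_entries_total cd_size cd_offset_start comment_len].freeze

# Byte offset of the EOCD record inside +bytes+ (a binary String, e.g. from
# File.binread), or nil when no signature is found.
def find_eocd(bytes)
  # Only the archive comment may follow the EOCD, so search backwards from
  # the last position where a comment-less record would still fit.
  bytes.rindex(EOCD_SIGNATURE, bytes.bytesize - EOCD_MIN_SIZE)
end

# Decode the fixed part of the EOCD into a Hash keyed by the field names
# used in the Eocd class above.
def read_eocd(bytes)
  at = find_eocd(bytes)
  raise ArgumentError, 'no EOCD record found' if at.nil?

  # V = 32-bit little-endian, v = 16-bit little-endian
  values = bytes.byteslice(at, EOCD_MIN_SIZE).unpack('VvvvvVVv')
  EOCD_FIELDS.zip(values).to_h
end

# Usage (the file name is a placeholder):
#   read_eocd(File.binread('orig.zip'))[:cd_offset_start]
```

Everything an unpacker learns about the archive starts from this one record, so whatever lies before the offset it reports is never visited unless a CHD points at it.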

Therefore, after inserting a blob, we need to increase cd_offset_start by the size of the blob, otherwise the zip file becomes broken.
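The whole injection then boils down to a splice plus one patched field. The following is a rough sketch in plain Ruby, not the actual zipography-inject script: inject_blob is a hypothetical helper, and it assumes a single-disk archive without Zip64 extensions that fits in memory:

```ruby
EOCD_SIG        = [0x06054b50].pack('V') # "PK\x05\x06"
EOCD_MIN_LEN    = 22                     # EOCD without a comment
CD_OFFSET_FIELD = 16                     # where cd_offset_start sits in the EOCD

# Returns a copy of +zip+ with +blob+ spliced in right before the central
# directory. Both arguments are raw bytes, e.g. from File.binread.
def inject_blob(zip, blob)
  zip  = zip.b
  blob = blob.b

  eocd = zip.rindex(EOCD_SIG, zip.bytesize - EOCD_MIN_LEN)
  raise ArgumentError, 'no EOCD record found' if eocd.nil?

  cd_offset = zip.byteslice(eocd + CD_OFFSET_FIELD, 4).unpack1('V')
  raise ArgumentError, 'central directory offset is out of range' if cd_offset > eocd

  new_offset = cd_offset + blob.bytesize
  # 0xFFFFFFFF is reserved as the Zip64 marker, which this sketch ignores.
  raise ArgumentError, 'blob does not fit a 32-bit offset' if new_offset >= 0xFFFFFFFF

  # Layout: local file entries | blob | central directory + EOCD
  out = zip.byteslice(0, cd_offset) + blob +
        zip.byteslice(cd_offset, zip.bytesize - cd_offset)

  # The EOCD itself has moved forward by blob.bytesize; patch the field there.
  out[eocd + blob.bytesize + CD_OFFSET_FIELD, 4] = [new_offset].pack('V')
  out
end

# Usage (file names are placeholders):
#   File.binwrite('1.zip', inject_blob(File.binread('orig.zip'),
#                                      File.binread('blob1.png')))
```

The local file offsets stored in the CHDs are left alone, because the blob goes after every local file entry and shifts only the central directory and the EOCD.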

Just because a user has no clue about the hidden blob doesn't mean specialized tools won't notice it. Say we have an archive w/ 2 text files:

$ bsdtar ft orig.zip
The Celebrated Jumping Frog of Calaveras County.txt
What You Want.txt

We inject a .png image into it:

$ zipography-inject orig.zip blob1.png > 1.zip

Whilst bsdtar is still none the wiser:

$ bsdtar ft 1.zip
The Celebrated Jumping Frog of Calaveras County.txt
What You Want.txt

Hachoir correctly recognises it as an unparsed block:


  1. This is a DSL from the BinData package, which provides a declarative way to read/write structured binary data in Ruby.

  2. Field names in PKWARE's spec are quite verbose.