I'm working on a project to automatically detect JPEG files embedded in other files (such as game resources). Without knowing the resource file format, but knowing that it is uncompressed, I can theoretically locate the start and end of each data file inside the resource file. This is fairly simple with BMP, since "BM" starts the file and the very next data field gives the length of the file, but it's more complicated with JPEG.
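The BMP case can be sketched in a few lines. This is only a rough illustration, assuming the whole resource file is already in memory as `blob`; the function name `find_bmp_candidates` and the minimum-size sanity check are my own choices for the example. It relies on the standard BMP file header: the two bytes "BM" followed by a 4-byte little-endian file size.

```python
import struct


def find_bmp_candidates(blob: bytes):
    """Yield (start, end) offsets for every plausible BMP inside blob."""
    pos = blob.find(b"BM")
    while pos != -1:
        if pos + 6 <= len(blob):
            # Bytes 2..5 of a BMP hold the total file size, little-endian.
            (size,) = struct.unpack_from("<I", blob, pos + 2)
            # Assumed sanity check: 14-byte file header plus at least a
            # 12-byte info header, and the file must fit inside blob.
            if size >= 26 and pos + size <= len(blob):
                yield pos, pos + size
        pos = blob.find(b"BM", pos + 1)
```

Each hit is only a candidate, since the bytes "BM" can also appear by chance in unrelated data.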
Ok, I know I can find the start of a JPEG file embedded in another file by looking for the first tag that opens a JPEG file. It is the specific two-byte integer 0xFFD8. Following that are a series of tags and offsets.
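Searching for that opening tag might look like the sketch below. The name `find_soi_candidates` is made up for this example. It also assumes one extra filter that is not stated above: since another tag always follows 0xFFD8, the byte right after it should be 0xFF, which cuts down on chance matches.

```python
def find_soi_candidates(blob: bytes):
    """Return every offset where FF D8 is directly followed by FF."""
    signature = b"\xff\xd8\xff"
    offsets = []
    pos = blob.find(signature)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(signature, pos + 1)
    return offsets
```

A hit here is still only a starting point; the tags that follow have to be walked to confirm it really is a JPEG.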
Each tag is of the form 0xFFXX (where XX means any valid 1-byte value). After each tag is a two-byte offset in big-endian format. This is SUPPOSED TO point to the next tag (it's a relative offset from the beginning of the current offset field). But it does NOT always point to the next tag. In theory I could follow all the jumps from these offsets until I reach the EOF tag, which is 0xFFD9. But it doesn't work in practice. I found that at least once, following this offset lands me somewhere in the middle of the compressed JPEG image data.
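One detail matters for the walk described above. The length after the 0xFFDA (start-of-scan) tag covers only the scan header, and the compressed data that follows has no length field at all. Inside that data, a 0xFF byte is followed either by 0x00 (a stuffed byte) or by a restart tag 0xFFD0 to 0xFFD7, so the next real tag has to be found by scanning. A few tags (0xFFD8, 0xFFD9, 0xFF01 and the restart tags) also carry no length. The sketch below follows the jumps with those cases handled; `jpeg_end` and its error messages are names invented for this example, and it is a minimal illustration rather than a full validator.

```python
import struct

SOI, EOI, SOS = 0xD8, 0xD9, 0xDA
# Tags that stand alone, with no length field: TEM, SOI, EOI, RST0-RST7.
STANDALONE = {0x01, SOI, EOI} | set(range(0xD0, 0xD8))


def jpeg_end(blob: bytes, start: int) -> int:
    """Return the offset just past the 0xFFD9 tag of the JPEG at start.

    Raises ValueError if the tag structure breaks before 0xFFD9 is found.
    """
    if blob[start:start + 2] != b"\xff\xd8":
        raise ValueError("no 0xFFD8 tag at start")
    pos = start + 2
    while pos + 2 <= len(blob):
        if blob[pos] != 0xFF or blob[pos + 1] == 0x00:
            raise ValueError(f"expected a tag at offset {pos}")
        marker = blob[pos + 1]
        if marker == 0xFF:            # fill byte in front of a tag
            pos += 1
            continue
        if marker == EOI:
            return pos + 2
        if marker in STANDALONE:
            pos += 2
            continue
        if pos + 4 > len(blob):
            break
        (length,) = struct.unpack_from(">H", blob, pos + 2)
        if length < 2:
            raise ValueError(f"bad length at offset {pos + 2}")
        pos += 2 + length             # the length counts its own two bytes
        if marker == SOS:
            # Compressed data follows the scan header: scan for the next
            # 0xFF that is not a stuffed byte, restart tag, or fill byte.
            while True:
                pos = blob.find(b"\xff", pos)
                if pos == -1 or pos + 1 >= len(blob):
                    raise ValueError("ran out of data inside the scan")
                nxt = blob[pos + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos += 2
                elif nxt == 0xFF:
                    pos += 1
                else:
                    break
    raise ValueError("data ended before 0xFFD9")
```

Calling `jpeg_end(blob, start)` with an offset where 0xFFD8 was found gives the end of that image, so `blob[start:jpeg_end(blob, start)]` would be the extracted file. Because tags with a length are skipped as a whole, a thumbnail stored inside an application segment does not end the walk early.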
My test file is the Windows XP sample image file "Blue hills.jpg". I parsed it manually in a hex editor, and here are my results.
Code:
Tag       Relative Offset to Next Tag
0xFFE0    0x0010
0xFFED    0x094C
0xFFEE    0x000E
0xFFDB    0x0084
0xFFC0    0x0011
0xFFDD    0x0004
0xFFC4    0x013F
0xFFDA    0x000C
0xF4D9    THIS IS NOT A VALID JPEG TAG!!!!!
The LAST TAG that gets processed in a JPEG image file should be the end-of-image tag, which is 0xFFD9, but no such tag is EVER encountered in the jumps in my test. Instead, the walk runs into a region of the file containing apparently invalid data! As far as I can tell, my technique follows the official JPEG/JFIF specs, but it doesn't entirely work. I personally suspect that the TRUE SPECS are some kind of trade secret known only to the JPEG organization, not publicly available, and licensed out only to official software development corporations.
If someone here can shed some light on what I'm doing wrong, please let me know. Thanks in advance.