CSI:Internet
Episode 5: Matryoshka in Flash
by Sergei Shevchenko
To find a real iPhone video instead of the one that turned out to be a trojan yesterday, I enter "new iphone video" into Google. One of the top links promises an "exclusive preview"; it leads to a web page with a video – but what's going on there? This one isn't working, either!
Unlike yesterday, there aren't any suspicious strings in the SWF file I've just downloaded. However, after analysing the file with swfdump, it no longer surprises me that I can't see a movie:
[HEADER] Frame count: 1
[HEADER] Movie width: 1.00
[HEADER] Movie height: 1.00
One single frame which measures 1 x 1 pixels – doesn't look like the author really wanted to show me a video. Instead, the dump contains pages and pages of what is called P-code, which is very difficult to read. This isn't so unusual in itself, as ActionScript does allow programmers to create things like games and interactive movies.
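The three values swfdump reports can also be read straight from the file header. The following is a minimal sketch, assuming the published SWF header layout (signature, version, length, a bit-packed RECT in twips, frame rate, frame count); the function name read_swf_header is made up for illustration, and LZMA-compressed files are not handled.

```python
import struct
import zlib


def read_swf_header(data: bytes) -> dict:
    """Return frame count and movie size from an FWS or CWS file header."""
    signature, version = data[:3], data[3]
    if signature == b"CWS":      # everything after the first 8 bytes is zlib-compressed
        body = zlib.decompress(data[8:])
    elif signature == b"FWS":    # uncompressed file
        body = data[8:]
    else:
        raise ValueError("unsupported signature: %r" % signature)

    # RECT record: 5 bits give the field width, followed by
    # Xmin, Xmax, Ymin, Ymax as signed integers measured in twips.
    nbits = body[0] >> 3
    total_bits = 5 + 4 * nbits
    rect_len = (total_bits + 7) // 8
    bits = int.from_bytes(body[:rect_len], "big") >> (rect_len * 8 - total_bits)

    def field(index: int) -> int:
        raw = (bits >> ((3 - index) * nbits)) & ((1 << nbits) - 1)
        if nbits and raw >> (nbits - 1):   # sign bit set
            raw -= 1 << nbits
        return raw

    xmin, xmax, ymin, ymax = (field(i) for i in range(4))
    # Frame rate is 8.8 fixed point, frame count a 16-bit integer, both little-endian.
    rate, frames = struct.unpack_from("<HH", body, rect_len)
    return {
        "version": version,
        "frame_rate": rate / 256,
        "frame_count": frames,
        "width": (xmax - xmin) / 20,    # 20 twips per pixel
        "height": (ymax - ymin) / 20,
    }
```

Called on the raw bytes of a downloaded file, for example read_swf_header(open(path, "rb").read()), a result with frame_count 1 and a width and height of 1.0 would correspond to the swfdump output shown above.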
P-code relates to bytecode roughly as assembly language relates to CPU machine instructions. As with Java or .NET, ActionScript is compiled into bytecode, and the just-in-time compiler of a virtual machine then converts this bytecode into native code for the CPU to execute. Translating the binary bytecode instructions back into readable mnemonics yields the P-code seen in the dump.
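To illustrate how bytecode becomes a P-code listing, here is a toy disassembler for a handful of instructions. It is only a sketch: the opcode table is a tiny subset, its numeric values are taken from Adobe's AVM2 overview and should be checked against that document, and sign handling of operands is left out.

```python
# Tiny subset of the instruction set. The opcode values are an assumption
# based on Adobe's AVM2 overview; verify them there before relying on them.
OPCODES = {
    0x24: ("pushbyte", "byte"),
    0x25: ("pushshort", "u30"),
    0x6C: ("getslot", "u30"),
    0x6D: ("setslot", "u30"),
    0x75: ("convert_d", None),
}


def read_u30(code: bytes, pos: int):
    """Decode a variable-length integer: 7 payload bits per byte, high bit = more."""
    value = shift = 0
    while True:
        byte = code[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80 or shift >= 35:
            return value, pos


def disassemble(code: bytes) -> list:
    """Turn raw bytes into 'offset mnemonic operand' lines."""
    lines, pos = [], 0
    while pos < len(code):
        offset, opcode = pos, code[pos]
        pos += 1
        if opcode not in OPCODES:
            # Without the operand length we cannot safely continue.
            lines.append(f"{offset} <unknown opcode 0x{opcode:02x}>")
            break
        name, operand = OPCODES[opcode]
        if operand == "byte":
            lines.append(f"{offset} {name} {code[pos]}")
            pos += 1
        elif operand == "u30":
            value, pos = read_u30(code, pos)
            lines.append(f"{offset} {name} {value}")
        else:
            lines.append(f"{offset} {name}")
    return lines
```

Under these assumptions, disassemble(bytes.fromhex("2400756d01")) gives the three lines "0 pushbyte 0", "2 convert_d" and "3 setslot 1".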
What I wouldn't give for a working installation of Buraks' Action Script Viewer or of the Sothink SWF Decompiler, which would produce some form of readable ActionScript from this code. However, on this computer I have to make do with the free abcdump tool that Adobe released as part of the Tamarin project. At least it lets me process the P-code listings further than the swfdump output does. And we will need to process the listings further if we are to make any sense of them at all.
The statistics in the file header already confirm my growing suspicion that something is wrong here: if more than 25,000 pushshort and pushbyte commands make up about 97 per cent of the code, this doesn't bode too well. The API analysis of abcdump further confirms my suspicion:
class EySpSUmzVvhfxjxHBjknyJec extends Object
class EySpSUmzVvhfxjxHBjknyJec
function EySpSUmzVvhfxjxHBjknyJec():*
class EySpSUmzVvhfxjxHBjknyJec
static var DAeBnlwHPuJPkQrZogFcTVoLn:String =
"fx46RIu1kelToyIVefnbEF"
Experience tells me: People who use such identifiers have something to hide. But let's not be hasty! After all, I've seen similar things in legitimate code, for instance when a Flash artist tried to protect his "intellectual property" by using a scrambler.
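The opcode share quoted earlier is easy to recompute from a text listing. The sketch below assumes that the input contains only instruction lines of the form "offset mnemonic operand" (the offset being optional), so class and variable declarations have to be stripped beforehand; the function names are hypothetical.

```python
import re
from collections import Counter

# Optional numeric offset, then a lowercase mnemonic such as "pushbyte".
MNEMONIC = re.compile(r"^\s*(?:\d+\s+)?([a-z_][a-z0-9_]*)\b")


def opcode_histogram(listing: str) -> Counter:
    """Count how often each mnemonic occurs in a P-code listing."""
    counts = Counter()
    for line in listing.splitlines():
        match = MNEMONIC.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share(counts: Counter, names) -> float:
    """Fraction of all counted instructions that belong to `names`."""
    total = sum(counts.values())
    return sum(counts[name] for name in names) / total if total else 0.0
```

A value of share(opcode_histogram(listing), ["pushbyte", "pushshort"]) close to 1.0 means the code consists almost entirely of constants being pushed, which is what the 97 per cent figure above describes.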
To brush up on my P-code knowledge, I quickly download Adobe's description of the ActionScript Virtual Machine 2. P-code uses stack-based operations – like some of the stone-age pocket calculators that required two values to be pushed to a stack before performing an addition. The following segment, for instance, sets the equivalent of a variable – a slot – to the value of 0:
pushbyte 0 // push 0 on stack
convert_d // pop, convert to double, push it back
setslot 1 // pop value from stack into slot 1
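The stack principle can be tried out with a toy interpreter for a few of these instructions. This is a deliberately simplified sketch, not a model of the real virtual machine: slots are kept in a plain dictionary, scope objects are ignored, and only the instructions named in the code are supported.

```python
def run(program: str, slots=None):
    """Execute a tiny subset of P-code and return (stack, slots)."""
    stack = []
    slots = {} if slots is None else slots
    for line in program.strip().splitlines():
        code = line.split("//", 1)[0].split()   # drop trailing comments
        if not code:
            continue
        op, args = code[0], code[1:]
        if op in ("pushbyte", "pushshort"):
            stack.append(int(args[0]))           # push a constant
        elif op == "convert_d":
            stack.append(float(stack.pop()))     # pop, convert to double, push
        elif op == "setslot":
            slots[int(args[0])] = stack.pop()    # pop value into the slot
        elif op == "getslot":
            stack.append(slots[int(args[0])])    # push the slot's value
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError("unsupported instruction: " + op)
    return stack, slots


segment = """
pushbyte 0 // push 0 on stack
convert_d // pop, convert to double, push it back
setslot 1 // pop value from stack into slot 1
"""
stack, slots = run(segment)
```

After running the segment, the stack is empty again and slot 1 holds the floating-point value 0.0.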
To avoid going crazy, I do a few search-and-replace runs to systematically replace variable names such as GaAnighKAUXBTVKnoMpTnQgKB, EySpSUmzVvhfxjxHBjknyJec and DAeBnlwHPuJPkQrZogFcTVoLn with the short forms Gaa, Eys and Dae. Conveniently, I find a few further replacement hints in the decompiler comments on the function header of Main().
var ii:Number /* slot_id 5 */
var i:Number /* slot_id 3 */
var j:Number /* slot_id 2 */
var bytes:flash.utils::ByteArray /* slot_id 1 */
There's probably some reason why slot_id 3 is assigned to the i variable, so I go along with that for now. To do so, I use a trick which exploits the listing's systematic formatting. For instance, replacing slot 3 with i conveniently adjusts all get and set commands for this field in one go, and 67 getslot 3 is transformed into a simple get i. OK – time to take a closer look at the first code segments of Main().
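The renaming passes described in this section can also be scripted rather than done by hand in an editor. The following sketch uses regular expressions; the function names, the length threshold of 20 letters and the three-letter prefix are assumptions chosen to match the examples above, and two long identifiers that share a prefix would collide.

```python
import re


def shorten_identifiers(listing: str, min_length: int = 20, keep: int = 3) -> str:
    """Replace long, letters-only identifiers with a short capitalised prefix."""
    def repl(match):
        return match.group(0)[:keep].capitalize()   # 'EySpSU...' -> 'Eys'
    return re.sub(r"\b[A-Za-z]{%d,}\b" % min_length, repl, listing)


# Matches hints such as: var i:Number /* slot_id 3 */
SLOT_HINT = re.compile(r"var\s+(\w+):\S+\s*/\*\s*slot_id\s+(\d+)\s*\*/")


def slot_names(listing: str) -> dict:
    """Collect {slot number: variable name} from the decompiler comments."""
    return {int(number): name for name, number in SLOT_HINT.findall(listing)}


def name_slots(listing: str, names: dict) -> str:
    """Rewrite 'getslot N' / 'setslot N' as 'get name' / 'set name'."""
    def repl(match):
        slot = int(match.group(2))
        if slot not in names:
            return match.group(0)                   # leave unknown slots alone
        return f"{match.group(1)} {names[slot]}"
    return re.sub(r"\b(get|set)slot (\d+)\b", repl, listing)
```

With the four hints from the header of Main() fed into slot_names, name_slots turns "67 getslot 3" into "67 get i", while the slot_id comments themselves stay untouched.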