Automating Skimmer Deobfuscation
Ben Baryo
TL;DR: Code. Skimmers. Packers. Obfuscation.
True Love.
Not just your basic, average, everyday, ordinary, run-of-the-mill, ho-hum blog post.
Expanding on my previous post, I’ll share my method for automating deobfuscation, describing code structure using AST. I’ll also give a sneak preview of my upcoming open source deobfuscation tools.
This is going to be a very (very) technical post in which I’ll take you on a journey through automating part of the deobfuscation process. Those of you who’d rather not dig into code structure and ASTs – this is where the story ends. You click away, and believe whatever you want to believe. Those of you who would like to continue can join me in scaling the Codes of Insanity, battling strings of unusual size, and facing mild annoyance at all of the mixture of references I keep throwing your way.
Spoiler alert: I’ll be making use of a couple of my deobfuscation tools: flAST – which flattens and modifies ASTs, and REstringer – a JavaScript deobfuscator. The latter will be released soon as open source so you’ll be able to use it yourself!
Start Reading, Player One
In my previous post I explained my thought process when facing an obfuscated script and showed the steps I took to deobfuscate the anti-VM skimmer. To reiterate, the steps were:
- Unpack the code to get a skimmer obfuscated with references to a packed strings-array.
- Unpack the strings-array.
- Replace all references to the strings array throughout the script with the actual values.
Since I was reviewing many variants of the skimmer, all using the same obfuscation with different variable/function names, I made use of my deobfuscator tool as much as possible. But alas! When running the deobfuscator (dubbed REstringer) on the obfuscated script, it stops just after decoding the skimmer code, but before turning the string into actual code nodes:
var ZhC = 'constructor';
var LSm = 'f)C[fv.ogo=exs!)zo)-)...';
var AET = UDS.constructor;
var JIX = '';
var Uku = UDS.constructor;
var BTX = UDS.constructor('', 'var m=10,o=59,v=10;var...');
var FpP = 'function _0x270ED(_0x26D23,_0x26A7C){var ...';
var CuM = UDS.constructor('', 'function _0x270ED(_0x26D23,_0x26A7C){var...');
CuM(4372);
I then have to take the value in the FpP variable, remove the quotes around it, wrap it in an anonymous function, and replace the value of the CuM variable with it.
Once I do this, I can run the unpacked yet still obfuscated script through REstringer. It then completes the array-replacements deobfuscation and I get all of the strings in place. Basically, I was stuck with a manual process between the first and second steps.
But why spend 10 seconds manually doing something you can automate in a couple of hours?
hASTa la vista, Packer
Not all packers look the same. In fact, REstringer already handles eval-based and Function-based unpacking. How do I add this new constructor-based variation?
REstringer’s basic MO is looking for syntactic structures that describe specific obfuscation, and undoing them, resulting in reconstructed strings and simplified flows.
What I need to do is create a new method to detect the single step I’m missing – namely the .constructor(..., 'code') structure – and turn it into a function() {code} structure.
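To see why the two structures are interchangeable, here is a minimal sketch using harmless code. It assumes UDS is a function (its definition isn't shown in the snippet above), in which case UDS.constructor is the built-in Function constructor. The name someFunction is a hypothetical stand-in for UDS.

```javascript
// Hypothetical stand-in for UDS: any function will do.
const someFunction = function () {};

// The `constructor` property of a function is the global Function constructor.
console.log(someFunction.constructor === Function); // true

// Leading arguments become parameter names, the last argument becomes the body.
const viaConstructor = someFunction.constructor('a', 'b', 'return a + b');
const plainFunction = function (a, b) { return a + b; };
console.log(viaConstructor(2, 3) === plainFunction(2, 3)); // true

// An empty first argument, as in the skimmer, simply means "no parameters".
const noParams = someFunction.constructor('', 'return 7');
console.log(noParams()); // 7
```

The point of the deobfuscation step is to get from the first form to the second without ever executing the string.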
The first thing I do when faced with a task like this is to understand how to describe the node’s structure. A good way to understand what I’m looking at is throwing the code into an online tool like AST Explorer and walking through the tree, making note of the relationships between parent and child nodes, and their properties.
What we’re looking for shows up twice in the snippet above: in the var BTX line and in the var CuM line. Let’s focus on the latter, since that’s the code that we’re really interested in:
var CuM = UDS.constructor('', 'function _0x270ED(_0x26D23,_0x26A7C){var...');
The whole line is a variable declaration, but what we’re looking for is on the right side (init) of the variable declarator – the call expression.
Ok, that’s enough node-speak for now. To write or read code, you don’t need to know what each node type is called, so let me run through it just so we’re all on the same page. This next part is going to be a lot, so brace yourself, or skip it altogether if you’re already familiar with ASTs (not ATSTs, mind you).
Definitions Time: Node If You Understand Me
I mentioned a couple of node types in the previous paragraph. To understand how the structure can be described using AST, we’ll first need to get a grasp of how code is parsed and how nodes are described. This is by no means a complete, or even a good, intro to AST.
A quick note before diving into the nodes’ structures:
The structure is based on espree, which I use to produce the AST. Different parsers may produce slightly different node structures. You can see the differences by switching parsers on the AST Explorer site.
Variable Declaration
Contains 1 or more declarations. Examples:
var a
const b = {}
let c = 3, d = 'hello', e
- Kind – the keyword used for the declaration, which determines the scopes in which the variable(s) will be available. Can be either var, const, or let.
- Declarations – an array of variable declarators. Can’t be empty.
// var ...
{
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [{type: 'VariableDeclarator', ...}, ...],
}
Variable Declarator
A variable declarator only exists under a variable declaration. Otherwise, it’s just an assignment expression. For an example of this node type, simply look at the examples for the variable declaration and ignore the var/const/let keywords.
- Id – the name of the variable. Usually it’s an identifier (read: variable), but it can also be a destructuring pattern (var {a} = {a: 4}). A member expression target such as window.v = 'var' can’t be declared this way; it only appears in an assignment expression.
- Init – the optional initial value assigned to the variable. The value is mandatory when using the const keyword, and may be omitted otherwise.
// CuM = ...
{
  type: 'VariableDeclarator',
  id: {type: 'Identifier', name: 'CuM'},
  init: {...},
}
Member Expression
Access to an object’s methods, properties, or indices is usually done via member expressions.
You’ve seen these used with either dot notation (a.b) or bracket notation (a[0]).
- Object – the object containing the referenced property. This is the a in a[0].
- Property – a reference to a property / method / index in the object. This can be one of many types of nodes such as a literal (a[0]), an identifier (a[b] or a.toString), another member expression (a[b.c]), etc.
- Computed – a boolean value describing whether the expression is using dot notation (false) or bracket notation (true).
// UDS.constructor
{
  type: 'MemberExpression',
  object: {type: 'Identifier', name: 'UDS'},
  property: {type: 'Identifier', name: 'constructor'},
  computed: false,
}
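The computed flag matters when looking for a specific property name, because the same name can hide in different node types. Here is a small sketch on hand-written nodes. The helper getStaticPropertyName is hypothetical and not part of any tool mentioned in this post.

```javascript
// Hypothetical helper: returns the property name when it can be read straight
// from the node, or null when it depends on a runtime value.
function getStaticPropertyName(memberExpression) {
  const {property, computed} = memberExpression;
  if (!computed && property.type === 'Identifier') return property.name;      // a.b
  if (computed && property.type === 'Literal') return String(property.value); // a['b'], a[0]
  return null;                                                                // a[b], a[b.c], ...
}

// UDS.constructor
const dotNotation = {
  type: 'MemberExpression',
  object: {type: 'Identifier', name: 'UDS'},
  property: {type: 'Identifier', name: 'constructor'},
  computed: false,
};
// UDS['constructor']
const bracketNotation = {
  ...dotNotation,
  property: {type: 'Literal', value: 'constructor', raw: "'constructor'"},
  computed: true,
};
// UDS[constructor] - here `constructor` is a variable, not the property name.
const dynamicAccess = {...dotNotation, computed: true};

console.log(getStaticPropertyName(dotNotation));     // constructor
console.log(getStaticPropertyName(bracketNotation)); // constructor
console.log(getStaticPropertyName(dynamicAccess));   // null
```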
Call Expression
A function call. For example:
atob('SSB3aWxsIG5vdCBidXkgdGhpcyByZWNvcmQuIEl0IGlzIHNjcmF0Y2hlZC4=')
clear()
0xf.toString(12)
- Callee – the object to be executed. Can be many types of nodes, such as an identifier, a function expression, etc.
- Arguments – zero or more values to be passed to the callee upon execution. Can be any type of node.
// CuM(4372)
{
  type: 'CallExpression',
  callee: {type: 'Identifier', name: 'CuM'},
  arguments: [{type: 'Literal', value: 4372, ...}],
}
Literal
A string (surrounded by single or double quotes, not backticks), a number, or a boolean.
For example:
""
'hello'
123
0xff
true
1_000_000_000_000
- Value – self explanatory.
- Raw – how the value is represented in the code. Does it use double quotes? Single quotes? Is the number separated with underscores? Etc.
// 4372
{
  type: 'Literal',
  value: 4372,
  raw: '4372',
}
Identifier
A variable name.
// CuM
{
  type: 'Identifier',
  name: 'CuM',
}
- Name – the name of the variable.
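To tie the node types together, here is a minimal sketch that builds a small tree by hand and lists the node types it contains. The tree is typed out manually in the espree/ESTree shape described above rather than produced by a parser, and collectTypes is a hypothetical helper.

```javascript
// Hand-written AST for: var x = obj.method(42);
const tree = {
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [{
    type: 'VariableDeclarator',
    id: {type: 'Identifier', name: 'x'},
    init: {
      type: 'CallExpression',
      callee: {
        type: 'MemberExpression',
        object: {type: 'Identifier', name: 'obj'},
        property: {type: 'Identifier', name: 'method'},
        computed: false,
      },
      arguments: [{type: 'Literal', value: 42, raw: '42'}],
    },
  }],
};

// Walk depth-first through every property that holds a node or an array of nodes,
// recording each node's type on the way down.
function collectTypes(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach(child => collectTypes(child, found));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') found.push(node.type);
    Object.values(node).forEach(child => collectTypes(child, found));
  }
  return found;
}

const types = collectTypes(tree);
console.log(types.join(' > '));
// VariableDeclaration > VariableDeclarator > Identifier > CallExpression > MemberExpression > Identifier > Identifier > Literal
```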
Aaaaaand We’re Back
Now that we’ve familiarized ourselves with different node types, let’s try to describe the line we’re interested in, using AST. Here’s a reminder:
var CuM = UDS.constructor('', 'function _0x270ED(_0x26D23,_0x26A7C){var...');
AST Explorer breaks it down as such:
{
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    {
      type: "VariableDeclarator",
      id: {
        type: "Identifier",
        name: "CuM",
      },
      init: {
        type: "CallExpression",
        callee: {
          type: "MemberExpression",
          object: {
            type: "Identifier",
            name: "UDS",
          },
          property: {
            type: "Identifier",
            name: "constructor",
          },
          computed: false,
        },
        arguments: [
          {
            type: "Literal",
            value: "",
            raw: "''",
          },
          {
            type: "Literal",
            value: "func...",
            raw: "'func...'",
          },
        ],
      },
    },
  ],
}
That’s a good starting point, but how to describe it?
Let’s break down what we’re looking for in a node to indicate it’s a relevant node:
1. The node is a call expression.
2. The callee is a member expression.
3. The callee’s property is an identifier with the name constructor.
4. The call expression’s arguments include a non-empty literal at the last index.
5. The call expression is the init node of a variable declarator.
This seems pretty generic. Let’s translate that to code – a function that returns true if a node matches this description or false otherwise:
function isPackerConstructor(n) {
  return n.type === 'CallExpression' &&               // 1
    n.callee.type === 'MemberExpression' &&           // 2
    n.callee.property.type === 'Identifier' &&        // 3
    n.callee.property.name === 'constructor' &&       // 3
    n.arguments.length &&                             // 4 - Required to avoid throwing an error if there are no arguments.
    n.arguments.slice(-1)[0].type === 'Literal' &&    // 4
    n.arguments.slice(-1)[0].value.length &&          // 4
    n.parentNode.type === 'VariableDeclarator' &&     // 5
    n.parentNode.init?.nodeId === n.nodeId;           // 5 - Establish it's the same node.
}
Great! Now, how do we test it?
Flattening the Playing Field
Going up and down the tree, looking for nodes and making comparisons, is all fine and dandy, but I much prefer getting all of my nodes at the same time and filtering them according to my needs.
Then I can decide what to do with them, and apply the changes I decided on to the tree.
This is achieved using the flAST module (npm install flast
). I will write more about it in a dedicated post, but to make a mildly long story short, it flattens ASTs, keeping links between parents and children, assigns unique ids to both nodes and scopes, and tracks identifiers’ declarations and references.
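To make "flattening" concrete, here is a minimal sketch of the idea on a hand-written tree. This is a simplified stand-in, not flAST's implementation: the flatten function is hypothetical, it assumes a depth-first, parent-before-child order, and it only covers node ids and parent links (no scopes or reference tracking).

```javascript
// Hypothetical flattener: returns every node in one array, each with a nodeId
// and a parentNode link, visiting parents before their children.
function flatten(root) {
  const flat = [];
  (function visit(node, parentNode) {
    if (Array.isArray(node)) {
      node.forEach(child => visit(child, parentNode));
      return;
    }
    if (!node || typeof node.type !== 'string') return; // Not an AST node.
    node.nodeId = flat.length;
    node.parentNode = parentNode;
    flat.push(node);
    for (const [key, value] of Object.entries(node)) {
      if (key === 'parentNode') continue; // Don't walk back up the tree.
      if (value && typeof value === 'object') visit(value, node);
    }
  })(root, null);
  return flat;
}

// Hand-written AST for: (function () {})
const tree = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'FunctionExpression',
      id: null,
      params: [],
      body: {type: 'BlockStatement', body: []},
    },
  }],
};

const flat = flatten(tree);
console.log(flat.map(n => n.type).join(', '));
// Program, ExpressionStatement, FunctionExpression, BlockStatement

// With everything in one array, a query is a plain filter.
const functions = flat.filter(n => n.type === 'FunctionExpression');
console.log(functions[0].nodeId, functions[0].parentNode.type); // 2 ExpressionStatement
```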
I’ll use flAST to build a mini deobfuscator that automates the manual step I’m so reluctant to do:
const {generateFlatAST} = require('flast');
const code = `var ZhC = 'constructor';
var LSm = 'f)C[fv.ogo=exs!)zo)-)...';
var AET = UDS.constructor;
var JIX = '';
var Uku = UDS.constructor;
var BTX = UDS.constructor('', 'var m=10,o=59,v=10;var...');
var FpP = 'function _0x270ED(_0x26D23,_0x26A7C){var ...';
var CuM = UDS.constructor('', 'function _0x270ED(_0x26D23,_0x26A7C){var...');
CuM(4372);`;
const ast = generateFlatAST(code);
function isPackerConstructor(n) {
  return n.type === 'CallExpression' &&
    n.callee.type === 'MemberExpression' &&
    n.callee.property.type === 'Identifier' &&
    n.callee.property.name === 'constructor' &&
    n.arguments.length &&
    n.arguments.slice(-1)[0].type === 'Literal' &&
    n.arguments.slice(-1)[0].value.length &&
    n.parentNode.type === 'VariableDeclarator' &&
    n.parentNode.init?.nodeId === n.nodeId;
}
const matches = ast.filter(isPackerConstructor);
console.log(matches);
Running it and stopping the debugger on the last line, I can see that two matches were found: the var BTX line and the var CuM line, which is exactly what we were hoping to see. This means that we’ve managed to describe the nodes well, though these things should always be tested against several large scripts where they might unexpectedly match less appropriate lines.
When that happens, it means that I must be more specific in the description. Sometimes it’s not possible to describe exactly what you’re looking for without also matching on other lines, which means another path of deobfuscation should be explored.
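As one example of being more specific, here is a sketch of a stricter variant of the matcher. It is an assumption on my part about what could trip the original on a larger script: a last argument such as a null literal has no length to read, so the value.length check would throw. The nodes below are built by hand, with parentNode and nodeId filled in manually to mimic what flAST provides; makeInitNode and constructorCall are hypothetical helpers.

```javascript
// Stricter sketch: always returns a boolean and only accepts non-empty string literals.
function isPackerConstructorStrict(n) {
  const args = n.arguments || [];
  const lastArg = args[args.length - 1];
  return n.type === 'CallExpression' &&
    n.callee?.type === 'MemberExpression' &&
    n.callee.property?.type === 'Identifier' &&
    n.callee.property.name === 'constructor' &&
    lastArg?.type === 'Literal' &&
    typeof lastArg.value === 'string' &&
    lastArg.value.length > 0 &&
    n.parentNode?.type === 'VariableDeclarator' &&
    n.parentNode.init?.nodeId === n.nodeId;
}

// Hypothetical helpers for building test nodes by hand.
const constructorCall = (args) => ({
  type: 'CallExpression',
  callee: {
    type: 'MemberExpression',
    object: {type: 'Identifier', name: 'UDS'},
    property: {type: 'Identifier', name: 'constructor'},
    computed: false,
  },
  arguments: args,
});
function makeInitNode(callNode, nodeId) {
  callNode.nodeId = nodeId;
  callNode.parentNode = {
    type: 'VariableDeclarator',
    id: {type: 'Identifier', name: 'x'},
    init: callNode,
  };
  return callNode;
}

const stringBody = makeInitNode(constructorCall([
  {type: 'Literal', value: '', raw: "''"},
  {type: 'Literal', value: 'return 1', raw: "'return 1'"},
]), 1);
const nullBody = makeInitNode(constructorCall([{type: 'Literal', value: null, raw: 'null'}]), 2);
const noArgs = makeInitNode(constructorCall([]), 3);

console.log(isPackerConstructorStrict(stringBody)); // true
console.log(isPackerConstructorStrict(nullBody));   // false
console.log(isPackerConstructorStrict(noArgs));     // false
```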
So now how do we turn the call expression into the actual code? The easy way is to just run it! But you might not want to run code you found online, especially if you know it’s malicious. What to do, then?
What did I do manually?
1. I took the code and removed the quotes around it.
2. Wrapped it in an anonymous function.
3. Replaced the call expression with the new code.
Let’s code!
const line = ast.filter(isPackerConstructor)[1];
const funcCode = line.arguments.slice(-1)[0].value; // 1
const funcNode = generateFlatAST(`(function () {${funcCode}})`, {detailed: false})[2]; // 2
Before replacing the call expression with funcNode, let’s understand the code:
- The generateFlatAST function takes code as input and generates a flattened AST.
- The anonymous function itself needs to be wrapped in parentheses, which turn it into an expression statement. Without wrapping it, the code would not be parsed.
- The options argument sets detailed to false, preventing us from wasting our time on node ids and scoping, since we only need the code now. This also helps prevent mixup with node ids when injecting one tree into another.
All that’s left to do is to replace UDS.constructor('', 'function...'); with funcNode’s code. To do that, we’ll use flAST’s Arborist.
Let the Arborist Take Care of Your Tree
What flAST allows you to do is to search the entire tree without having to traverse it every time. You can simply filter through the flat array in search of nodes that match your query.
Once you find what you were looking for, you may want to replace or delete a node. In the spirit of traversing the tree as little as possible, the Arborist offers you a way to mark nodes with the changes you want to make, and once you’re done, it traverses the tree only once and applies all of the changes.
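The mark-now, apply-later idea can be sketched in a few lines. TinyArborist below is a hypothetical stand-in to illustrate the pattern, not flAST's Arborist: it only handles replacement (not deletion), and it expects a plain tree without parentNode links, since those would make the walk loop.

```javascript
// Hypothetical sketch: collect replacements first, then apply them all in one walk.
class TinyArborist {
  constructor(root) {
    this.root = root;
    this.replacements = new Map(); // target node -> replacement node
  }

  // Nothing changes yet; the node is only marked.
  markNode(target, replacement) {
    this.replacements.set(target, replacement);
  }

  // A single traversal that swaps every marked node for its replacement.
  applyChanges() {
    const swap = (value) => {
      if (Array.isArray(value)) return value.map(swap);
      if (!value || typeof value !== 'object') return value;
      if (this.replacements.has(value)) return this.replacements.get(value);
      for (const key of Object.keys(value)) value[key] = swap(value[key]);
      return value;
    };
    this.root = swap(this.root);
    this.replacements.clear();
    return this.root;
  }
}

// Hand-written nodes for: CuM = packed()
const callNode = {type: 'CallExpression', callee: {type: 'Identifier', name: 'packed'}, arguments: []};
const tree = {type: 'VariableDeclarator', id: {type: 'Identifier', name: 'CuM'}, init: callNode};
const replacement = {
  type: 'FunctionExpression',
  id: null,
  params: [],
  body: {type: 'BlockStatement', body: []},
};

const tinyArborist = new TinyArborist(tree);
tinyArborist.markNode(callNode, replacement);
const typeBeforeApply = tree.init.type;
tinyArborist.applyChanges();
const typeAfterApply = tree.init.type;
console.log(typeBeforeApply, '->', typeAfterApply); // CallExpression -> FunctionExpression
```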
Let’s see that in action:
const {generateFlatAST, generateCode, Arborist} = require('flast');
const code = `var ZhC = 'constructor';
var LSm = 'f)C[fv.ogo=exs!)zo)-)...';
var AET = UDS.constructor;
var JIX = '';
var Uku = UDS.constructor;
var BTX = UDS.constructor('', 'var m=10,o=59,v=10;var...');
var FpP = 'function _0x270ED(_0x26D23,_0x26A7C){var ...';
var CuM = UDS.constructor('', 'function _0x270ED(_0x26D23,_0x26A7C){var...');
CuM(4372);`;
const ast = generateFlatAST(code);
function isPackerConstructor(n) {
  return n.type === 'CallExpression' &&
    n.callee.type === 'MemberExpression' &&
    n.callee.property.type === 'Identifier' &&
    n.callee.property.name === 'constructor' &&
    n.arguments.length &&
    n.arguments.slice(-1)[0].type === 'Literal' &&
    n.arguments.slice(-1)[0].value.length &&
    n.parentNode.type === 'VariableDeclarator' &&
    n.parentNode.init?.nodeId === n.nodeId;
}
const line = ast.filter(isPackerConstructor)[1];
const funcCode = line.arguments.slice(-1)[0].value;
const funcNode = generateFlatAST(`(function () {${funcCode}})`, {detailed: false})[2];
const arborist = new Arborist(ast);
arborist.markNode(line, funcNode); // Replace line with funcNode.
arborist.applyChanges();
const newCode = generateCode(arborist.ast[0]); // Reconstruct the code from the modified root node.
console.log(newCode);
Run it – et voilà! The constructor call has now been replaced with the actual function, and the deobfuscation process can continue uninterrupted.
What’s Next?
Oh boy! That was a lot to unpack! ( ͡~ ͜ʖ ͡°)
The example above takes a semi-deobfuscated code and changes a single thing – turning a code string into nodes. What I’ve already done is add a more robust version of the same logic into the REstringer deobfuscator:
function _resolveFunctionConstructorCalls() {
  const candidates = this._ast.filter(n =>
    n.type === 'CallExpression' &&
    n.callee?.type === 'MemberExpression' &&
    [n.callee.property?.name, n.callee.property?.value].includes('constructor') &&
    n.arguments.length && n.arguments.slice(-1)[0].type === 'Literal');
  for (const c of candidates) {
    if (!['VariableDeclarator', 'AssignmentExpression'].includes(c.parentNode.type)) continue;
    let args = '';
    if (c.arguments.length > 1) {
      const originalArgs = c.arguments.slice(0, -1);
      if (originalArgs.filter(n => n.type !== 'Literal').length) continue;
      args = originalArgs.map(n => n.value).join(', ');
    }
    // Wrap the code in a valid anonymous function in the same way Function.constructor would.
    // Give the anonymous function any arguments it may require.
    // Wrap the function in an expression to make it a valid code (since it's anonymous).
    // Generate an AST without nodeIds (to avoid duplicates with the rest of the code).
    // Extract just the function expression from the AST.
    const codeNode = generateFlatAST(`(function (${args}) {${c.arguments.slice(-1)[0].value}})`, {detailed: false})[2];
    this._markNode(c, codeNode);
  }
}
These are the improvements in this code over what we built above:
- Handle both computed and non-computed member expressions for the constructor method: UDS.constructor() and UDS['constructor']().
- Include any arguments included in the constructor call: UDS.constructor('a', 'b', 'return a + b') will become function(a, b) {return a + b}.
- Include cases where the assignment isn’t part of a variable declaration, but an assignment to an existing variable (or implicitly to a global variable): var CuM; CuM = UDS.constructor...
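The argument handling can be tried in isolation, without a parser. The sketch below only builds the source string that would later be parsed; buildFunctionSource and lit are hypothetical helpers written for this illustration, not part of REstringer.

```javascript
// Hypothetical helper: returns the wrapper source for a constructor call's arguments,
// or null when the body or one of the parameters can't be read statically.
function buildFunctionSource(argumentNodes) {
  if (!argumentNodes.length) return null;
  const bodyNode = argumentNodes[argumentNodes.length - 1];
  const paramNodes = argumentNodes.slice(0, -1);
  if (bodyNode.type !== 'Literal' || typeof bodyNode.value !== 'string') return null;
  if (paramNodes.some(n => n.type !== 'Literal')) return null;
  const params = paramNodes.map(n => n.value).join(', ');
  return `(function (${params}) {${bodyNode.value}})`;
}

// Shorthand for a hand-written string Literal node.
const lit = (value) => ({type: 'Literal', value, raw: JSON.stringify(value)});

// UDS.constructor('a', 'b', 'return a + b')
const withParams = buildFunctionSource([lit('a'), lit('b'), lit('return a + b')]);
// UDS.constructor('', 'return 1')
const withoutParams = buildFunctionSource([lit(''), lit('return 1')]);
// UDS.constructor(x, 'return 1') - the parameter name is only known at runtime.
const dynamicParam = buildFunctionSource([{type: 'Identifier', name: 'x'}, lit('return 1')]);

console.log(withParams);    // (function (a, b) {return a + b})
console.log(withoutParams); // (function () {return 1})
console.log(dynamicParam);  // null
```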
Now when I run the obfuscated skimmer in REstringer I get it completely deobfuscated without any need for manual interaction.
In the near future, I will post about the upcoming open-source REstringer project, along with flAST and an obfuscation detector. Stay tuned!