tl;dr: We analyze an obfuscation tool used in Magecart skimming kits and demonstrate how you can use this knowledge to deobfuscate common Magecart scripts.
The HUMAN Research Team has been tracking digital skimming toolkits. While following reports on Magecart infections, we discovered a server connected to a very loud operation named Inter. While analyzing the Inter skimming kit’s command and control server, we discovered source code for a malware obfuscator. We quickly connected it to an obfuscation tool sold online called Caesar, which is aimed at fraudsters. This post is a technical analysis of the obfuscation tool named Caesar+.
Obfuscation is a process intended to hide the true function of a program. When used by legitimate parties, obfuscation makes code hard to reverse-engineer and safeguards intellectual property. When used by illegitimate parties (i.e. fraudsters), it is meant to hide malicious actions from the prying eyes of end users and security researchers alike (read more about obfuscation and Magecart obfuscation here).
Common obfuscation methods include techniques such as:
for (let i=0,p=document.getElementsByTagName('button')[0].addEventListener('click', maliciousFunc); i < arr.length; i++) {}),
Obfuscation is something one can accomplish by oneself, but as with any other process in software development, it is often better to use an existing solution, and plenty of free obfuscation tools are available online. In this post we will explore one such solution: the Caesar+ obfuscator. Caesar+ is sold online, and it came to our attention again because it was used in the OlympicTickets Magecart attack and is part of a skimming kit sold to fraudsters.
We have written a short and unimaginative script to be obfuscated and then deobfuscated in order to show how the obfuscator works.
Here’s the original script:
function printMsg(msg) {
    console.log(msg);
}
function hello() {
    return "Hello";
}
function world() {
    return "world";
}
function getHelloWorld() {
    return hello() + " " + world();
}
printMsg(getHelloWorld());
Running this script in the browser would print “Hello world”. Let’s obfuscate this script using Caesar+ with the default settings:
$ python2 caesarp.py script.js obfuscated_script.js
Gen namespace...
Outside guard level is 1 [MEDIUM]
Inside guard level is 1 [MEDIUM]
Document codepage set to utf8
Parsing...
Make...
Save to obfuscated_script.js
CRC code:208
Done.
$
The resulting JS code:
(function w8g(){yv1="0a0w0w0w0w0w0w0w0w0w0w0 w0w2u39322r38. 2x33320w2w2w14382t3c38153f0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w2x2u0w14382t3c381a302t322v382w0w1p1p0w1c150w362t383936320w1c1n3a2p360w2w2p372w0w1p0w1c1n0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w2u33360w143a, 2p360w2x0w1p0w1c1n0w2x0w1o0w382t3c381a302t322v382w1n0w2x1717150w3f2w2p372w0w1p0w14142w2p372w1o1o1h15192w2p372w1517382t"+"3c381a2r2. w2p361v332s2t1t38142x151n0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w2w2p372w0w1p0w2w2p372w0w120w2w2p372w1n0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w3h0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w362t383936320w2w2p372w111e1h1h1 n0a0w0w0w0w0w0w0w0w0w0w0w0w3h0a0w0w0w0w0w0w0w0w. 093a2p360w2q332s3d1p3b2x322s333b1a3b1k2v1a38332b38362x322v14151a362t34302p2r2t141b2j2m2p193e1t192i1c191l2k190y2l171b2v180y0y151n0a0w0w0w0w0w0w0w0w093a2p360w2r362r1p2q332s3d1a312p382r2w14"+"1b2t323c3e1g3c331i303b2z391k322p30321c142j2k3b2k2s2k192l17150y1b2v1, 52j1c2l1a362t34302p2r2t140y2t323c3e1g3c331i303b2z391k322p303. 21c0y180y0y151n0, a0w0w0w0w0w0w0w0w092r362r. 1p2r362r1a37392q373836141c182r362r1a 302t322v382w191d151n0a0w0w0w0w0w, 0w0w0w092q332s3d1p2w2w142q332s3 d1a362t34302p2r2t140y2t323c3e1g3c331i303b2z391k322p30321c0y172r362r180y2t323c3e1g3c331i303b2z391k322p30321c0y15151p1p2r362r1r1d1m3b2x322s333b2j0y3, 0 3b1t0y2l140y0y151n0a0w0w0w0w0w0w0w0w1n2u39322r382x33320w2z2e281431372v150w3f0w2r33323733302t2j0y300y170y0y172b38362x322v1a2u3633311v2w2p361v332s2t141d1d1d15170y2v0y2l1431372v151n3h0w2u39322r382x33320w2f3d2h14150w3f0w362t383936320y1s20172t361l302c2i302u330y2j141f1e1h1d1i"+"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k1h281h2k3, c1k 1d380y 2j0y2r2w2p361v332s2t1t380y2l141l15171f1f1a1c152j0y38332b38362x322v0y2l14140y2z1w2u3c19392k3c1k1g163e381h2j1v2g330y2j0y302t322v382w0y2l161e171d1a, 1c15152l141b2j2u2i1l2. c2k1s2k17362l1b2v180y0y151n3h0w2u39322r382x33320w3c391v14150w3f0w362t383936320y233b1. 
y332a361l301e2s0y2j0y362t34302p2r2t0y2l141b2j232a1e1l1y2l1b2v180y0y151n3h0w2u39322r382x33320w3b3c2r14150 w3f0w362t3839 36320w2f3d2h14150w170y1s0w0y2j0y362t34302p2r2t0y2l141b2j2k1s2l1b2v180y0y15170w3c391v14151n3h0w2z2e28143b3c2r1415151n";
var xIN={};
soA="1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k";
Z8W="enxz4xo6lwku8naln0208";
var Rjn="";
var gJQ="1d1j1g1k160y223i252j3f332s1k11231w1a2k3c";
window["w8g"]=w8g;
for(var J40=(0*"k\x89$T7\x85_Lc6Ghr}fBI@^"["charCodeAt"](3)+0.0);J40<gJQ["length"];J40+=("\x8boQOLn){:.j\x88PvY\x8ag="["charCodeAt"](14)*0+1.0)){gJQ=String["fromCharCode"](gJQ["charCodeAt"](J40))};
var J1y = document["createElement"]("div");
var jIC="1d1j1g1k160y223i252j3f332s1k11231".constructor;
var JiS=(6*"?njzy\x80"["length"]+0.0);
J1y["appendChild"](document["createTextNode"](yv1));
J1y=J1y["innerHTML"];J1y=J1y["replace"](/[\s+\.\,]/g,"");
GFk="1b2t323c3e1g3c331i303b2z391k322p30321c142j2";
for(var J40=(0*"6Al)Zb3-@\x852qJ"["length"]+0.0);J40<J1y["length"];J40+=(0*"[\x85D,BN:qP"["length"]+2.0)){Rjn+=String["fromCharCode"](parseInt(J1y["substr"](J40,(2.0+"[ps4'\x60r$>\x84SqI.,\x80vQ\x89"["charCodeAt"](17)*0)),JiS))};
soA="1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k";
xIN["toString"]=jIC["constructor"](Rjn);
GFk="1b2t323c3e1g3c331i303b2z391k322p30321c142j2";
Rjn=xIN+"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k1h281h2k3, c1k 1d380y 2j0y2r2w2p3";
Ngr="1d1j1g1k160y223i252j3f332s1k11231w1a2k";
J1y["innerHTML"]="1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k1h28";})();
Running the obfuscated code in the browser would again print “Hello world” in the console.
Here’s a quick, graphical overview of the execution process:
The outer layer is the most basic premise of this obfuscator and its first line of defense, meant to hinder reverse engineering by employing two methods:
We found that both methods, once identified, are easy to circumvent.
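Judging from the decoding loop in the obfuscated sample above, the payload string is first stripped of whitespace, plus signs, dots and commas, and then read two characters at a time as base-36 character codes (the radix variable evaluates to 6 * 6 = 36). The following is a minimal sketch of that step only, not the obfuscator's own code; the function name decodePayload is hypothetical.

```javascript
// Hypothetical helper: reproduces the outer-layer decoding seen in the sample.
function decodePayload(encoded) {
  // Same character class the sample uses to drop the injected noise.
  const clean = encoded.replace(/[\s+\.\,]/g, "");
  let decoded = "";
  // Every two characters form one base-36 number, which is a character code.
  for (let i = 0; i < clean.length; i += 2) {
    decoded += String.fromCharCode(parseInt(clean.substr(i, 2), 36));
  }
  return decoded;
}

// A fragment taken from the start of the sample payload, noise included.
console.log(decodePayload("2u39322r38. 2x33, 32")); // → "function"
```

Decoding the string statically like this avoids running any of the obfuscated code, at the cost of assuming the radix and noise characters stay the same between samples.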
The first thing we did when we tried to deobfuscate our Hello World script was to find its execution entry point. The script is a self-invoking function, which we expected to be anonymous; we later found out why it wasn’t.
Usually, we would find the entry point at the bottom of the script, but it wasn’t there. Going from the last line upwards, we looked for something resembling execution, and we found it on the fifth line from the end:
xIN["toString"]=jIC["constructor"](Rjn);
But that wasn’t it. It was just an assignment. An assignment to a variable’s toString method? This technique is not unheard of: an object’s toString is called implicitly when JavaScript’s (in)famous type coercion kicks in, for example when the object is concatenated with a string. To find where the variable xIN was used, we only had to look two lines down:
Rjn=xIN+"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k1h281h2k3, c1k 1d380y 2j0y2r2w2p3";
Now we could replace this line with “console.log(xIN.toString)” to print the code that gets executed, but let's take a step back and read the Rjn variable instead.
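To make the coercion trick concrete, here is a small sketch of the same pattern with harmless code. The names carrier, FunctionCtor and payloadRan are hypothetical; only the mechanism (a string's constructor is String, and String's constructor is Function) is taken from the sample.

```javascript
// "".constructor is String, and String.constructor is Function,
// so this builds a function from a string without naming Function or eval.
const FunctionCtor = "".constructor.constructor;

const carrier = {};
carrier.toString = FunctionCtor("globalThis.payloadRan = true; return 'ran';");

// No visible call: the + operator coerces carrier to a string,
// which invokes carrier.toString() and therefore runs the payload.
const leftover = carrier + "junk";

console.log(globalThis.payloadRan); // true
console.log(leftover);              // "ranjunk"
```

In the obfuscated sample the result of the concatenation is never used; the assignment exists only to trigger the call.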
Replacing this code fragment:
xIN["toString"]=jIC["constructor"](Rjn);
GFk="1b2t323c3e1g3c331i303b2z391k322p30321c142j2";
Rjn=xIN+"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k1h281h2k3, c1k 1d380y 2j0y2r2w2p3";
Ngr="1d1j1g1k160y223i252j3f332s1k11231w1a2k";
J1y["innerHTML"]="1d1j1g1k160y223i252j3f332s1k11231w1a2k3c 1k1h28";})();
With this code (not forgetting to close and call the main function):
console.log(Rjn);})();
Pasting the edited code into a browser’s console and running it prints the following code (beautified here):
function hh(text) {
if (text.length == 0) return 0;
var hash = 0;
for (var i = 0; i < text.length; i++) {
hash = ((hash << 5) - hash) + text.charCodeAt(i);
hash = hash & hash;
}
return hash % 255;
}
var body = window.w8g.toString().replace(/[^a-zA-Z0-9\-"]+/g, "");
var crc = body.match(/enxz4xo6lwku8naln0([\w\d\-]+)"/g)[0].replace("enxz4xo6lwku8naln0", "");
crc = crc.substr(0, crc.length - 1);
body = hh(body.replace("enxz4xo6lwku8naln0" + crc, "enxz4xo6lwku8naln0")) == crc ? 1 : window["lwA"]("");;
function kVP(msg) {
console["l" + "" + String.fromCharCode(111) + "g"](msg);
}
function WyY() {
return "@H+er9lTZlfo" [(325161748 * "J~M[{od8%KD.\x85P5\x81t" ["charCodeAt"](9) + 33.0)["toString"](("kDfx-u\x84*zt5[CXo" ["length"] * 2 + 1.0))](/[fZ9T\@\+r]/g, "");
}
function xuC() {
return "KwFoRr9l2d" ["replace"](/[KR29F]/g, "");
}
function wxc() {
return WyY() + "@ " ["replace"](/[\@]/g, "") + xuC();
}
kVP(wxc());
The first thing we noticed here, besides the CRC check which we will get to in a minute, is that the main structure is kept intact:
What about the hh function and the body and crc variables? Those are part of the inner layer.
The obfuscator’s default setting for the inner layer is to add a CRC check that verifies the script hasn’t been tampered with and stops execution if it has. The inner layer is also where we would find the other execution scope limiters:
We found that for research purposes, simply removing those lines, up to where the original script’s code starts, does the trick!
This check brings us back to why the main self-invoking function wasn’t anonymous: it uses the reference to itself to ensure the script is intact. It saves the CRC value in its outer layer (in this example we can find it in a dead variable, one which isn’t used anywhere else in the script), right after a unique string:
Z8W="enxz4xo6lwku8naln0208"; // 208 is the CRC
It then reads the outer layer’s code, removing every character that is not a letter, a digit, a hyphen or a double quote:
var body = window.w8g.toString().replace(/[^a-zA-Z0-9\-"]+/g, "");
...leaving us with the following string assigned to body:
`functionw8gyv1"0a0w0w0w0w0w0w0w0w0w0w0w0w2u39322r382x33320w2w2w14382t3c38153f0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w2x2u0w14382t3c381a302t322v382w0w1p1p0w1c150w362t383936320w1c1n3a2p360w2w2p372w0w1p0w1c1n0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w2u33360w143a2p360w2x0w1p0w1c1n0w2x0w1o0w382t3c381a302t322v382w1n0w2x1717150w3f2w2p372w0w1p0w14142w2p372w1o1o1h15192w2p372w1517382t""3c381a2r2w2p361v332s2t1t38142x151n0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w2w2p372w0w1p0w2w2p372w0w120w2w2p372w1n0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w3h0a0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w0w362t383936320w2w2p372w111e1h1h1n0a0w0w0w0w0w0w0w0w0w0w0w0w3h0a0w0w0w0w0w0w0w0w093a2p360w2q332s3d1p3b2x322s333b1a3b1k2v1a38332b38362x322v14151a362t34302p2r2t141b2j2m2p193e1t192i1c191l2k190y2l171b2v180y0y151n0a0w0w0w0w0w0w0w0w093a2p360w2r362r1p2q332s3d1a312p382r2w14""1b2t323c3e1g3c331i303b2z391k322p30321c142j2k3b2k2s2k192l17150y1b2v152j1c2l1a362t34302p2r2t140y2t323c3e1g3c331i303b2z391k322p30321c0y180y0y151n0a0w0w0w0w0w0w0w0w092r362r1p2r362r1a37392q373836141c182r362r1a302t322v382w191d151n0a0w0w0w0w0w0w0w0w092q332s3d1p2w2w142q332s3d1a362t34302p2r2t140y2t323c3e1g3c331i303b2z391k322p30321c0y172r362r180y2t323c3e1g3c331i303b2z391k322p30321c0y15151p1p2r362r1r1d1m3b2x322s333b2j0y303b1t0y2l140y0y151n0a0w0w0w0w0w0w0w0w1n2u39322r382x33320w2z2e281431372v150w3f0w2r33323733302t2j0y300y170y0y172b38362x322v1a2u3633311v2w2p361v332s2t141d1d1d15170y2v0y2l1431372v151n3h0w2u39322r382x33320w2f3d2h14150w3f0w362t383936320y1s20172t361l302c2i302u330y2j141f1e1h1d1i""1d1j1g1k160y223i252j3f332s1k11231w1a2k3c1k1h281h2k3c1k1d380y2j0y2r2w2p361v332s2t1t380y2l141l15171f1f1a1c152j0y38332b38362x322v0y2l14140y2z1w2u3c19392k3c1k1g163e381h2j1v2g330y2j0y302t322v382w0y2l161e171d1a1c15152l141b2j2u2i1l2c2k1s2k17362l1b2v180y0y151n3h0w2u39322r382x33320w3c391v14150w3f0w362t383936320y233b1y332a361l301e2s0y2j0y362t34302p2r2t0y2l141b2j232a1e1l1y2l1b2v180y0y151n3h0w2u39322r382x33320w3b3c2r14150w3f0w362t383936320w2f3d2h14150w170y1s0w0y2j0y362t34302p2r2t0y2l141b2j2k1s2l1b2v180y0y
15170w3c391v14151n3h0w2z2e28143b3c2r1415151n"varxINsoA"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c1k"Z8W"enxz4xo6lwku8naln0"varRjn""vargJQ"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c"window"w8g"w8gforvarJ400"kx89T7x85Lc6GhrfBI""charCodeAt"300J40gJQ"length"J40"x8boQOLnjx88PvYx8ag""charCodeAt"14010gJQString"fromCharCode"gJQ"charCodeAt"J40varJ1ydocument"createElement""div"varjIC"1d1j1g1k160y223i252j3f332s1k11231"constructorvarJiS6"njzyx80""length"00J1y"appendChild"document"createTextNode"yv1J1yJ1y"innerHTML"J1yJ1y"replace"sg""GFk"1b2t323c3e1g3c331i303b2z391k322p30321c142j2"forvarJ400"6AlZb3-x852qJ""length"00J40J1y"length"J400"x85DBNqP""length"20RjnString"fromCharCode"parseIntJ1y"substr"J4020"ps4x60rx84SqIx80vQx89""charCodeAt"170JiSsoA"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c1k"xIN"toString"jIC"constructor"RjnGFk"1b2t323c3e1g3c331i303b2z391k322p30321c142j2"RjnxIN"1d1j1g1k160y223i252j3f332s1k11231w1a2k3c1k1h281h2k3c1k1d380y2j0y2r2w2p3"Ngr"1d1j1g1k160y223i252j3f332s1k11231w1a2k"J1y"innerHTML""1d1j1g1k160y223i252j3f332s1k11231w1a2k3c1k1h28"`
Looking for the unique string enxz4xo6lwku8naln0, it finds it in the body using a regex and then removes it from the result, leaving only the CRC value:
var crc = body.match(/enxz4xo6lwku8naln0([\w\d\-]+)"/g)[0].replace("enxz4xo6lwku8naln0", "");
We are now left with “208” (plus a double quote character that is removed in the next line), which, as you may have noticed, is the CRC that was reported when we obfuscated the script. Now body is hashed using the hh function, which we found, after a quick Google search, to be a JavaScript port of Java's String.hashCode method with the result reduced modulo 255 (you can read about it in this StackOverflow answer).
If the body’s hash matches the CRC, the script continues to execute (assigning 1 to body). If it doesn’t, it tries to call a non-existent function (in this case window.lwA) to break execution and confuse the people trying to analyze it:
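As a quick sanity check of that relationship, the sketch below splits the hash into the 32-bit Java-style value and the final modulo step. The function name javaHashCode is hypothetical; hh is written to behave like the hh shown in the deobfuscated code.

```javascript
// 32-bit rolling hash equivalent to Java's String.hashCode:
// hash = hash * 31 + charCode, wrapped to a signed 32-bit integer.
function javaHashCode(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) - hash + text.charCodeAt(i)) | 0;
  }
  return hash;
}

// The obfuscator's variant keeps only the remainder modulo 255.
function hh(text) {
  return javaHashCode(text) % 255;
}

console.log(javaHashCode("abc")); // 96354, the same as "abc".hashCode() in Java
console.log(hh("abc"));           // 219
```

Because the JavaScript % operator keeps the sign of its left operand, hh can return values from -254 to 254, which is why the checksum pattern also accepts a hyphen.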
body = hh(body.replace("enxz4xo6lwku8naln0" + crc, "enxz4xo6lwku8naln0")) == crc ? 1 : window["lwA"]("");;
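The whole check can be replayed on a toy script to see what it does and does not notice. This is a sketch under assumptions: passesIntegrityCheck returns a boolean instead of crashing, and stamp is a hypothetical stand-in for however the obfuscator embeds the checksum, inferred from the check itself.

```javascript
function hh(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) - hash + text.charCodeAt(i)) | 0;
  }
  return hash % 255;
}

const MARKER = "enxz4xo6lwku8naln0"; // unique string from the sample
const strip = (source) => source.replace(/[^a-zA-Z0-9\-"]+/g, "");

// Mirrors the steps of the inner-layer check.
function passesIntegrityCheck(source) {
  const body = strip(source);
  const found = body.match(new RegExp(MARKER + '([\\w\\d\\-]+)"', "g"));
  if (!found) return false;
  let crc = found[0].replace(MARKER, ""); // checksum plus trailing quote
  crc = crc.slice(0, -1);                 // drop the trailing quote
  return hh(body.replace(MARKER + crc, MARKER)) == crc;
}

// Hypothetical: hash the stripped source without a checksum, then embed it.
function stamp(template) {
  return template.replace(MARKER, MARKER + hh(strip(template)));
}

const toy = stamp('(function w8g(){Z8W="' + MARKER + '";window["w8g"]=w8g;})();');
const reformatted = toy.replace("{", "{\n    "); // whitespace-only change
const embedded = toy.match(new RegExp(MARKER + "(-?\\d+)"))[1];
const wrongCrc = toy.replace(MARKER + embedded, MARKER + (Number(embedded) + 1));

console.log(passesIntegrityCheck(toy));         // true
console.log(passesIntegrityCheck(reformatted)); // true
console.log(passesIntegrityCheck(wrongCrc));    // false
```

Since whitespace and most punctuation are stripped before hashing, beautifying the outer layer does not trip the check, while changing identifiers or string contents usually does.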
This hashCode function is hardcoded into the obfuscator:
self.crc_check = """
function hh(text){
if (text.length == 0) return 0;var hash = 0;
for (var i = 0; i < text.length; i++) {hash = ((hash<<5)-hash)+text.charCodeAt(i);
hash = hash & hash;
}
return hash%%255;
}
var body=window.%(main_func_name)s.toString().replace(/[^a-zA-Z0-9\-\"]+/g,"");
var crc=body.match(/%(crc_a)s([\w\d\-]+)\"/g)[0].replace("%(crc_a)s","");
crc=crc.substr(0,crc.length-1);
body=hh(body.replace("%(crc_a)s"+crc,"%(crc_a)s"))==crc?1:window["%(trash)s"]("");
""" % {"main_func_name": self.main_func_name, "crc_a": self.crc_a, "crc_b": self.crc_b, "trash": self.ns.gen()}
Embedded into the inner layer are other anti-reverse-engineering measures:
The obfuscator keeps getting updated. The version we described here is 2.1, and there are probably more advanced versions out there. In the source code we found some unused code with undocumented functionality, probably a work in progress:
def build_unpack_stop(self):
code = """
var t1='\\v'=='v';
var t2=document.all;
var t3=document.querySelector;
var t4=document.addEventListener;
var t6=window.navigator.userAgent;
var t7=t6.search("SIE 7");
var t8=t6.search("SIE 8");
var t9=t6.search("SIE 9");
var b7=t1&&!t3&&t2;
var b8=t1&&t2&&t3&&!t4;
var b9=t2&&!t1&&t4;
t7=t7>0?(b7?1:window["sfgbfg"]["wtrgw"]):1;
t8=t8>0?(b8?1:window["sfgbfg"]["wtrgw"]):1;
t9=t9>0?(b9?1:window["sfgbfg"]["wtrgw"]):1;
function hh(text){if (text.length == 0) return 0;var hash = 0;
for (var i = 0; i < text.length; i++) {hash = ((hash<<5)-hash)+text.charCodeAt(i);
hash = hash & hash;
}
return hash%255;
}
hh(t6)==-56?window["sfgbfg"]["wtrgw"]:0;
hh(t6)==85?window["sfgbfg"]["wtrgw"]:0;
"""
parser = Parser()
parser.go(code)
deviator = Deviator(self.ns, parser)
deviator.go()
self.unpack_stop = parser.back_replace()
Perhaps we’ll get to see it in action, perhaps not.
Unfortunately for fraudsters using the obfuscator, the structure of this obfuscation is rather unique, and therefore easy to spot in the wild, using regular expressions like:
/(\w{3})\[.*\]=.*\(\w{3}\).*\w{3}=\1\+"/gms
or the more complex
/(\w{3})=(\1\+\w{3}\+\w{3}\+\w{3}\+\w{3}|\[\1,\w{3},\w{3},\w{3}).*(\w{3})=(\3\+\w{3}\+\w{3}\+\w{3}\+\w{3}|\[\3,\w{3},\w{3},\w{3}).*(\w{3})\[.*\]=.*=\5\+"/gms
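As an illustration, the first expression can be wrapped in a small scanner. This is a sketch only: the function name looksLikeCaesarPlus is hypothetical, and the g flag is dropped here so that repeated calls to test() do not depend on the regex's lastIndex state.

```javascript
// First signature from above, without the g flag (see note in the text).
const CAESAR_SIGNATURE = /(\w{3})\[.*\]=.*\(\w{3}\).*\w{3}=\1\+"/ms;

// Hypothetical helper: true when the source contains the toString
// assignment followed by the string concatenation that triggers it.
function looksLikeCaesarPlus(source) {
  return CAESAR_SIGNATURE.test(source);
}

// Tail of the obfuscated sample, with the long strings shortened.
const sampleTail = [
  'xIN["toString"]=jIC["constructor"](Rjn);',
  'GFk="1b2t323c";',
  'Rjn=xIN+"1d1j1g1k";',
].join("\n");

console.log(looksLikeCaesarPlus(sampleTail));                    // true
console.log(looksLikeCaesarPlus('console.log("Hello world");')); // false
```

Both expressions rely on three-character identifiers, so a future version of the obfuscator that changes its naming scheme would need an updated signature.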
Deobfuscating it, now that we understand how, takes about a minute (manually) and can be easily automated.
Deobfuscating attacks is a fun challenge in our opinion, and in this instance we liked the clever use of type coercion to hide the point of execution, and the idea of keeping a reference to the function itself in order to hash-check for changes. We have seen Caesar+ used in the wild both for malicious purposes, such as the OlympicTickets attack, and for general purposes on illicit websites, such as obfuscating generic scripts on markets that sell stolen credit cards. However, as we have shown, this kind of obfuscation can be quickly deobfuscated, and its unique structure makes it easy to detect.