Being a site admin is a bit like juggling flaming swords, especially with the deadlines for PCI-DSS v4.0 compliance fast approaching. It's a high-stakes game of cat and mouse, where you're constantly on the lookout for Magecart skimmers hidden amidst the labyrinth of JavaScript files, ready to pounce on unsuspecting prey. What if there were an unlikely solution to the problem?
ChatGPT has been touted as the Swiss Army knife of AI, capable of tackling a wide array of tasks. So, why not let it take a stab at detecting Magecart code? But remember, while it might be sharp and pointy, it's not a magic wand, which isn't supposed to be sharp. But it is pointy. And magical. Never mind, I digress. Let's dive in and see if it can make a splash in the deep end of web security.
First, we need to define what we want and how we want it. We want ChatGPT to act as a professional security researcher analyzing JS code for malicious content, specifically Magecart. That's the "what". As for the "how", we can simply open a new chat, tell ChatGPT we want to analyze JS code for Magecart threats, and then paste in the code we want it to review. But there's a limit to how long the pasted scripts can be, and if we need to do this for more than a couple of scripts, it quickly becomes tedious. So what we really need is an automation of sorts: a Python script that takes a URL or a local JS file and sends it to ChatGPT for analysis.
Importantly, we need to take everything ChatGPT tells us with a grain of salt; the ability to analyze code is there, but if we play around with it for a bit, we'll quickly find that it's better to double-check and ask for a breakdown or sources, just to be on the safe side.
To determine how likely a script is to be Magecart, it's best to ask for a score on a range, i.e. a risk score. Since we want to understand what the score is based on, we'll also ask for a breakdown of the score, with reasons for either convicting the script as Magecart or exonerating it from suspicion.
When it comes to using ChatGPT, the key to success lies in the prompt. A well-crafted prompt can be the difference between a clear, insightful analysis and a confusing jumble of words. But what makes a good prompt?
Firstly, it needs to be clear and concise. ChatGPT is a language model, not a mind reader. It needs explicit instructions on what you want it to do. For our purposes, we want it to analyze a JavaScript file and determine if it's a Magecart attack or a benign script.
Secondly, it needs to specify the format of the response. Since we're building an API, we want a JSON-only response with three fields: a risk score, the reasons for convicting the script as Magecart, and the reasons for exonerating it.
For clarity’s sake, this is the type of response we’re expecting to get:
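Something along these lines should do - a sketch of the shape we're after, with field names of our own choosing (the actual names are whatever the prompt dictates) and made-up values for illustration:

{
  "riskScore": 85,
  "convictingReasons": [
    "Listens for the checkout form's submit event and reads payment card input fields",
    "Sends the collected values to an unrelated third-party domain"
  ],
  "exoneratingReasons": [
    "No dynamic code evaluation"
  ]
}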
Here's an example of a prompt that encompasses all these instructions. Notice that we’ve tried to be as explicit as possible regarding the requirements and our expectations:
You are taking on the role of a professional client-side security researcher. Your task is to analyze JavaScript code and determine the likelihood of it being malicious, specifically in the context of Magecart type attacks. You will be provided with JavaScript code, which might be obfuscated. In response, you will provide a JSON object of the following structure:
If the initial response from ChatGPT is unsatisfactory, don't despair. The beauty of this process is that we can iterate and refine our prompt until we get the results we want. Remember, even the best chocolate makers tweak their recipes until they get the perfect blend of flavors.
In our experiment here, we’ll use the paid version of ChatGPT 4 (July 20 version) with the following plugins enabled - VoxScript, Metaphor Search, and Link Reader. These are used to allow ChatGPT access to any online resource it deems necessary for the process.
Let's put this prompt to the test with a toy example - a simple Magecart attack that we've reproduced from a real-world attack:
(function() { … })();
Here’s what we get in return:
{ … }
It’s pretty good for a first try, but a couple of things stand out:
Just like when trying to improve a Google search query to return more accurate results, it’s an iterative process, perfecting the prompt to get the desired output. Let’s try making a few improvements based on what we’ve observed, while also narrowing down the definition of what is important to us when detecting a Magecart attack, i.e. skimming and exfiltration (the changes are highlighted):
You are taking on the role of a professional client-side security researcher. Your task is to analyze JavaScript code and determine the likelihood of it being malicious, specifically in the context of Magecart type attacks. You will be provided with JavaScript code, which might be obfuscated. In response, you will provide a JSON object of the following structure:
And the result after submitting the script with the new prompt:
{ … }
Much better! Full disclosure: a couple of prompt attempts were made in between these two versions that gave worse results. It's part of the problem of working with the black box that is ChatGPT - a single word can have a butterfly effect on the overall result, which needs to be acknowledged: a single good result might not be a strong enough indicator of the overall feasibility of the process.
Let’s test the ability of ChatGPT to scan obfuscated scripts by supplying it with an obfuscated version of the same toy example:
{ … }
Surprise! We got the same score, but the reasons were divided differently. When asked about the differences, ChatGPT apologized profusely and stated that there was an "error in the analysis process" and that the second result is the correct analysis. A great reminder for us to take the results with a grain of salt.
It should be noted that ChatGPT’s success in dealing with this example of an obfuscated script does not mean it can handle all types and levels of obfuscation.
Let's up the difficulty and submit a real-life example using an obfuscated Magecart attack appended to the end of an unrelated JS file (you can read the full analysis of the attack in "The Far Point of a Static Encounter"):
{ … }
Not surprising; ChatGPT was ill-equipped to handle deobfuscation in such a large obfuscated script. If we submit the deobfuscated version (REstringer FTW), we get a much better analysis result:
{ … }
Before we sign off on this version, let’s give it a benign script and verify that we’re not creating an overzealous Magecart conviction bot. Here’s the result of the analysis of our benign script - a generic Google Analytics snippet:
{ … }
That’s fine, though giving the script 5 points without a single convicting reason seems a bit off.
To make things worse, if we obfuscate the script, we get a completely different picture:
{ … }
The score is still pretty low, but if we were to get such a result in our pipeline, would we have spent time analyzing the script ourselves? If we hadn't asked ChatGPT to provide only a JSON response, it would surely have reminded us that it's not a security tool and should not be relied upon as such.
Now that we have our prompt, let's turn our attention to the Python script that will download the JavaScript file and submit it along with the prompt to ChatGPT. Hey Siri ChatGPT, create a Python script that provides the previously submitted prompt, downloads the script found at the given URL, submits it to ChatGPT, and displays the JSON response.
>> Sure, here is a Python script that uses the requests library to download the JavaScript file from the provided URL and then submits it to the ChatGPT API for analysis. Please note that you need to replace YOUR_OPENAI_API_KEY with your actual OpenAI API key.
import openai
…
And there you have it! A Python script that uses ChatGPT to analyze JavaScript files for potential Magecart attacks. There were a couple of errors in the original script provided by ChatGPT that needed fixing. Even when coaxed toward the correct solutions for those mistakes, it eventually gave up and wrote "If you're still having trouble, it's possible that OpenAI has released new models or updated their API after my last training data in September 2021. In that case, I would recommend checking the OpenAI API documentation for the most up-to-date information."
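For reference, here's a minimal sketch of what a fixed-up version might look like. This is our own hedged reconstruction rather than ChatGPT's verbatim output - it assumes the requests library and the current openai Python package, the model name is a placeholder for whichever GPT-4 model your account can access, and the helper names are ours:

import sys
import requests
from openai import OpenAI

# The full analysis prompt we iterated on above goes here (abbreviated for brevity).
PROMPT = "You are taking on the role of a professional client-side security researcher..."

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # replace with your actual OpenAI API key

def fetch_script(url: str) -> str:
    # Download the JavaScript file we want to analyze.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text

def analyze_script(js_code: str) -> str:
    # Submit the prompt as the system message and the script as the user message,
    # then return the model's raw (hopefully JSON-only) answer.
    completion = client.chat.completions.create(
        model="gpt-4",  # assumption: swap in whichever model you have access to
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": js_code},
        ],
    )
    return completion.choices[0].message.content

if __name__ == "__main__":
    # Usage: python magecart_check.py https://example.com/some-script.js
    print(analyze_script(fetch_script(sys.argv[1])))

Keeping the prompt in the system message and the script in the user message also makes it easy to swap in new prompt iterations without touching the rest of the code.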
It's not a silver bullet, but it's a start. And remember, in the fight against Magecart, every bit helps.
Initially, I had planned to pit the free and paid ChatGPT tiers against each other by holding a grand GPT showdown. However, to avoid derailing our main discussion, I've decided to put the showdown on hold. If this topic sparks your curiosity, your input would be most welcome.
ChatGPT-based analysis can be likened to a flashlight in the dark - it's great when you're starting from scratch and need to illuminate potential threats in your JavaScript files. It can provide a basic level of insight, helping you identify scripts that might be worth a second look. But when it comes to serious security and compliance, relying solely on ChatGPT, inconsistency over time aside, is akin to bringing a knife to a wizard’s duel. It's simply not equipped to handle the complex and evolving landscape of web security threats. Just like a wizard.
For those who are serious about security and compliance with PCI DSS 4.0’s requirements 6.4.3 and 11.6.1, HUMAN's Client Side Defense & PCI DSS Compliance offers a far more user-friendly & comprehensive solution. By deploying a single line of code, customers receive a risk-scored inventory of all scripts, a simple method to authorize and justify scripts, alerts on unauthorized changes, and audit reports to demonstrate compliance. HUMAN Security doesn't just skim the surface with static code analysis; it dives deep, capable of analyzing both obfuscated and unobfuscated code with advanced techniques. But where it truly shines is in its behavior-based detection. Rather than just looking at the code, it considers what's actually happening on the site. It's like having a security guard who doesn't just check IDs at the door, but keeps an eye on what's happening inside the party.
And the cherry on top? Client-Side Defense doesn't just detect threats; it actively mitigates them. It can block an entire script from loading or prevent specific actions that breach PCI-DSS compliance, such as reading user input or exfiltrating data. These are the building blocks of a Magecart attack, and Client-Side Defense is like a skilled construction worker, ready to spot and fix any structural issues before they become a problem.
So, while ChatGPT and traditional scanning techniques might be handy tools to have in your security toolkit, they’re not the be-all and end-all solution. When it comes to serious security and compliance, you'll need a more robust and comprehensive solution, like HUMAN's Client-Side Defense. After all, when it comes to protecting your site, it's better to be safe than sorry!