Tech & Engineering Blog

Magecart and the PCI-DSS 4.0 Challenge: Is ChatGPT the Answer?

Read time: 16 minutes

Ben Baryo

August 14, 2023

Magecart, Technology and Engineering

Magecart and the PCI-DSS 4.0 Challenge: Is ChatGPT the Answer?

Being a site admin is a bit like juggling flaming swords, especially with the deadlines for PCI-DSS v4.0 compliance fast approaching. It’s a high-stakes game of cat and mouse, where you’re constantly on the lookout for Magecart skimmers hidden amidst the labyrinth of JavaScript files, ready to pounce on unsuspecting prey. What if there was an unlikely solution to the problem?.

ChatGPT has been touted as the Swiss Army knife of AI, capable of tackling a wide array of tasks. So, why not let it take a stab at detecting Magecart code? But remember, while it might be sharp and pointy, it’s not a magic wand, which isn’t supposed to be sharp. But it is pointy. And magical. Nevermind, I digress. Let’s dive in and see if it can make a splash in the deep end of web security.

Let’s Start at the Beginning – What Do We Want?

We need to define to ourselves what we want and how we want it. We want to have ChatGPT act as a professional security researcher analyzing JS code for malicious content, specifically Magecart. That’s the what we want. As for the how, we can simply open a new chat, tell ChatGPT we want to analyze JS code for Magecart threats, and then paste in the code we want it to review. But there’s a limit to how long the scripts we paste can be, and if we need it for more than 2 scripts, it starts being tedious. So what we really need is an automation of sorts: A Python script that takes a url or a local JS file, and sends it to ChatGPT for analysis.

Importantly, we need to take everything ChatGPT tells us with a grain of salt; the ability to analyze code is there, but if we play around with it for a bit, we’ll quickly find that it’s better to double check and ask for a breakdown / sources, just to be on the safe side.

To make the determination of how likely a script is to be Magecart, it’s best that we ask for a range, i.e. a risk score. Since we want to be able to understand what the score is based on, we’ll also ask for a breakdown of the score with reasons for either convicting the script as Magecart, or exonerating it from suspicion.

Crafting the Instructions…Promptly

When it comes to using ChatGPT, the key to success lies in the prompt. A well-crafted prompt can be the difference between a clear, insightful analysis and a confusing jumble of words. But what makes a good prompt?

Firstly, it needs to be clear and concise. ChatGPT is a language model, not a mind reader. It needs explicit instructions on what you want it to do. For our purposes, we want it to analyze a JavaScript file and determine if it’s a Magecart attack or a benign script.

Secondly, it needs to specify the format of the response. Since we’re building an API, we want a JSON-only response with three fields:

score: on a scale of 1-100, how likely it is that the script is a Magecart attack.
convictions: an array of objects, each with score and reason fields for why this script is suspicious and how much weight this reason holds in the final score.
exonerations: a similar array with reasons and score fields detailing reasons this script probably isn’t a Magecart attack, and their weights.

For clarity’s sake, this is the type of response we’re expecting to get:

{
“score”: 99,
“convictions”: [
{
“score”: 100,
“reason”: “It says it is a Magecart script right at the top”
}
],
“exonerations”: [
{
“score”: 1,
“reason”: “it says at the bottom not to believe what is written at the top”
}
]
}

Here’s an example of a prompt that encompasses all these instructions. Notice that we’ve tried to be as explicit as possible regarding the requirements and our expectations:

You are taking on the role of a professional client-side security researcher. Your task is to analyze JavaScript code and determine the likelihood of it being malicious, specifically in the context of Magecart type attacks. You will be provided with JavaScript code, which might be obfuscated. In response, will provide a JSON object of the following structure:
– A key of `score` with a value representing the likelihood of the code being a Magecart attack on a scale of 1 to 100, where 100 is most likely.
– A key of `convictions` with a value being an array of objects. Each object should contain a `reason` and `score` key, the `reason` being why the script is likely to be malicious and the `score` being the weight of that reason on the total score.
– A key of `exonerations`, with a structure similar to the `convictions` key value, but the reasons are why the script is *not* likely to be malicious, with its appropriate weight in the `score` field.
The analysis of the code should disregard execution context. Do not make any assumptions based on strings, identifier names, or comments found in the code, nor on the domain name, path, or query strings of the script’s url itself. The analysis should only rely on actions performed in the script, APIs mentioned, and execution flow.
Following these instructions you will be provided with the JS code for analysis.
Refrain from providing any output other than the JSON.

If the initial response from ChatGPT is unsatisfactory, don’t despair. The beauty of this process is that we can iterate and refine our prompt until we get the results we want. Remember, even the best chocolate makers tweak their recipes until they get the perfect blend of flavors.

In our experiment here, we’ll use the paid version of ChatGPT 4 (July 20 version) with the following plugins enabled – VoxScript, Metaphor Search, and Link Reader. These are used to allow ChatGPT access to any online resource it deems necessary for the process.

Let’s put this prompt to the test with a toy example – a simple Magecart attack that we’ve reproduced from a real world attack:

Here’s what we get in return:

{
“score”: 95,
“convictions”: [
{
“reason”: “The script uses the AJAX method to send data to an external URL, which is a common technique used in Magecart attacks to exfiltrate data.”,
“score”: 35
},
{
“reason”: “The script is targeting form input fields, specifically those that could contain sensitive information such as credit card details. This is a common behavior of Magecart attacks.”,
“score”: 30
},
{
“reason”: “The script is triggered on a mouseover event on the submit button, which could be a tactic to exfiltrate data before form submission.”,
“score”: 30
}
],
“exonerations”: [
{
“reason”: “The script does not directly manipulate the Document Object Model (DOM) or inject any elements, which is a common behavior in Magecart attacks.”,
“score”: 5
}
]
}

It’s pretty good for a first try, but a couple of things stand out:

We know that it’s a Magecart attack, so the score should be 100.
It’s taking into account common behaviors that are not really indicative of Magecart attacks, like obfuscation.
It deducts score based on specific techniques it expects to find, making it more of an issue with unmet expectations rather than an objective analysis.

Just like when trying to improve a Google search query to return more accurate results, it’s an iterative process, perfecting the prompt to get the desired output. Let’s try making a few improvements based on what we’ve observed, while also narrowing down the definition of what is important to us when detecting a Magecart attack, i.e. skimming and exfiltration (the changes are highlighted):

You are taking on the role of a professional client-side security researcher. Your task is to analyze JavaScript code and determine the likelihood of it being malicious, specifically in the context of Magecart type attacks. You will be provided with JavaScript code, which might be obfuscated. In response, will provide a JSON object of the following structure:
– A key of `score` with a value representing the likelihood of the code being a Magecart attack on a scale of 1 to 100, where 100 is most likely.
– A key of `convictions` with a value being an array of objects. Each object should contain a `reason` and `score` key, the `reason` being why the script is likely to be malicious and the `score` being the weight of that reason on the total score.
– A key of `exonerations`, with a structure similar to the `convictions` key value, but the reasons are why the script is *not* likely to be malicious, with its appropriate weight in the `score` field.
The analysis of the code should disregard execution context. Do not make any assumptions based on strings, identifier names, or comments found in the code, nor on the domain name, path, or query strings of the script’s url itself. The analysis should only rely on actions performed in the script, APIs mentioned, and execution flow. Do not consider commonly used techniques such as obfuscation to be conviction indicators, nor should you consider the lack of it and other techniques and behaviors commonly used in Magecart attacks an exonerating indicator. What constitutes a Magecart attack is the existence of the following – skimming, i.e. reading user provided data, and exfiltration – sending that data somewhere. Only the lack of these two behaviors may exonerate the script.
Following these instructions you will be provided with the JS code for analysis.
Refrain from providing any output other than the JSON.

And the result after submitting the script with the new prompt:

{
“score”: 100,
“convictions”: [{
“reason”: “The script is attaching an event handler to a submit button that triggers an AJAX POST request to an external URL, which is a common behavior in Magecart attacks for data exfiltration.”,
“score”: 50
},
{
“reason”: “The script is collecting sensitive user input data (email, name, credit card number, expiration date, and CVV) and sending it to an external server, which is a clear sign of data skimming, a behavior associated with Magecart attacks.”,
“score”: 50
}
],
“exonerations”: []
}

Much better! Full disclosure: a couple of prompt attempts were made in between these two versions that gave worse results. It’s part of the issue of working with the black box that is ChatGPT – the butterfly effect a single word has on the overall result is something that needs to be acknowledged, so a single good result might not be a strong enough indicator of the overall feasibility of the process.

Let’s test the ability of ChatGPT to scan obfuscated scripts by supplying it with an obfuscated version of the same toy example:

{
“score”: 100,
“convictions”: [{
“reason”: “The script is listening for a mouseover event on a submit button, which is a common way to trigger data collection once user data has been entered.”,
“score”: 30
},
{
“reason”: “The script is collecting values from form fields, which could include sensitive information such as credit card details and personal information.”,
“score”: 35
},
{
“reason”: “The script is sending the collected data to a remote server via a POST request, which is a common method of data exfiltration.”,
“score”: 35
}
],
“exonerations”: []
}

Surprise! We got the same score but the reasons were divided differently. After asking ChatGPT about the differences, it has profusely apologized and stated that there was an “error in the analysis process” and the second result is the correct analysis. A great reminder for us to take the results with a grain of salt.

It should be noted that ChatGPT’s success in dealing with this example of an obfuscated script does not mean it can handle all types and levels of obfuscation.

Let’s up the difficulty and submit a real life example using an obfuscated Magecart attack appended to the end of an unrelated JS file (you can read the full analysis of the attack in “The Far Point of a Static Encounter”):

{
“score”: 1,
“convictions”: [],
“exonerations”: [{
“reason”: “The script does not contain any code that reads user-provided data.”,
“score”: 50
},
{
“reason”: “The script does not contain any code that sends data somewhere.”,
“score”: 50
}
]
}

Not surprising; ChatGPT was ill equipped to handle deobfuscation in such a large obfuscated script. If we submit the deobfuscated version (REstringer FTW) we get a much better analysis result:

{
“score”: 95,
“convictions”: [{
“reason”: “The script uses the ‘XMLHttpRequest’ object to send HTTP requests, which is a common method used in data exfiltration.”,
“score”: 40
},
{
“reason”: “The script uses ‘document.cookie’ which can be used to access sensitive user data.”,
“score”: 30
},
{
“reason”: “The script uses ‘addEventListener’ method to listen for user interactions, which can be used in skimming attacks.”,
“score”: 25
}
],
“exonerations”: [{
“reason”: “The script does not use any known malicious domains or IP addresses for data exfiltration.”,
“score”: 5
}]
}

Before we sign off on this version, let’s give it a benign script and verify that we’re not creating an overzealous Magecart conviction bot. Here’s the result of the analysis of our benign script – a generic Google Analytics snippet:

{
“score”: 5,
“convictions”: [],
“exonerations”: [{
“reason”: “The script does not contain any code that reads user-provided data.”,
“score”: 50
},
{
“reason”: “The script does not contain any code that sends data to a remote server.”,
“score”: 45
}
]
}

That’s fine, though giving the script 5 points without a single convicting reason seems a bit off.

To make things worse, if we obfuscate the script, we get a completely different picture:

{
“score”: 20,
“convictions”: [{
“reason”: “The script manipulates the Google Analytics object and its methods, which could be used to intercept data.”,
“score”: 10
},
{
“reason”: “The script manipulates the ‘dataLayer’ object and its ‘push’ method, which could be used to intercept data.”,
“score”: 10
}
],
“exonerations”: [{
“reason”: “The script does not contain any explicit network requests, which are typically used in Magecart attacks for data exfiltration.”,
“score”: 60
},
{
“reason”: “The script does not seem to interact with form elements or listen for keyboard events, which are common techniques for skimming user input.”,
“score”: 20
}
]
}

The score is still pretty low, if we were to get such a result in our pipeline, would we have spent time analyzing the script ourselves? If we didn’t ask ChatGPT to only provide us with a JSON response, it would surely have reminded us that it’s not a security tool and should not be relied upon as such.

Use The ChatGPT to Automate The ChatGPT

Now that we have our prompt, let’s turn our attention to the Python script that will download the JavaScript file and submit it along with the prompt to ChatGPT. Hey Siri ChatGPT, create a Python script that provides the previously submitted prompt as well as downloads and submits the script found in the url to ChatGPT, and displays the JSON response.

>> Sure, here is a Python script that uses the requests library to download the JavaScript file from the provided URL and then submits it to the ChatGPT API for analysis. Please note that you need to replace YOUR_OPENAI_API_KEY with your actual OpenAI API key.

import openai
import requests

# Initialize the OpenAI SDK with your API key
openai.api_key = ‘YOUR_OPENAI_API_KEY‘

def send_prompt_to_chatgpt(prompt, js_code):
# Construct the full prompt with the attached JS code
full_prompt = f”{prompt}nnAttached JS Code:n{js_code}”

# Use the OpenAI SDK to send the prompt to ChatGPT-4
response = openai.Completion.create(
model=“text-davinci-002”,
prompt=full_prompt,
max_tokens=150
)

# Return the response text
return response.choices[0].text.strip()

# Download the JavaScript file
url = “https://gist.githubusercontent.com/krautface/469fa925b494b7b436e1fd9346d36b60

/raw/fb15cdad05b45fda090cfd7d7347001dc064d9ab/staticounter.net.js”
response = requests.get(url)
js_code = response.text
prompt = “You are taking on the role of a professional client-side security researcher.
Your task is to analyze JavaScript code and determine the likelihood of it being
malicious, specifically in the context of Magecart type attacks. You will be provided
with JavaScript code, which might be obfuscated. In response, will provide a JSON
object of the following structure: n- A key of `score` with a value representing the
likelihood of the code being a Magecart attack on a scale of 1 to 100, where 100 is
most likely.n- A key of `convictions` with a value being an array of objects. Each
object should contain a `reason` and `score` key, the `reason` being why the script is
likely to be malicious and the `score` being the weight of that reason on the total
score. n- A key of `exonerations`, with a structure similar to the `convictions` key v
alue, but the reasons are why the script is *not* likely to be malicious, with its
appropriate weight in the `score` field. nThe analysis of the code should disregard
execution context. Do not make any assumptions based on strings, identifier names,
or comments found in the code, nor on the domain name, path, or query strings of
the script’s url itself. The analysis should only rely on actions performed in the script,
APIs mentioned, and execution flow. Do not consider commonly used techniques
such as obfuscation to be conviction indicators, nor should you consider the lack of
it and other techniques and behaviors commonly used in Magecart attacks an
exonerating indicator. What constitutes a Magecart attack is the existence of the
following – skimming, i.e. reading user provided data, and exfiltration – sending that
data somewhere. Only the lack of these two behaviors may exonerate the script.n
Following these instructions you will be provided with the JS code for analysis.n
Refrain from providing any output other than the JSON.”

response = send_prompt_to_chatgpt(prompt, js_code)

# Print the response
print(json.dumps(response.json(), indent=2))

And there you have it! A Python script that uses ChatGPT to analyze JavaScript files for potential Magecart attacks. There were a couple of errors in the original script provided by ChatGPT that needed fixing. Even while coaxing it in the direction of the correct solutions for the mistakes in the code, it eventually gave up and wrote “If you’re still having trouble, it’s possible that OpenAI has released new models or updated their API after my last training data in September 2021. In that case, I would recommend checking the OpenAI API documentation for the most up-to-date information.”

It’s not a silver bullet, but it’s a start. And remember, in the fight against Magecart, every bit helps.

Free vs Fee: A Tale of Two GPTs

Initially, I had planned to pit the two tiers against each other by holding a grand GPT showdown. However, to avoid derailing our main discussion, I’ve decided to put the showdown on hold. If this topic sparks your curiosity, your input would be most welcome.

No More Chit-Chat: Why Real Security Needs HUMAN’s Client-Side Defense

ChatGPT-based analysis can be likened to a flashlight in the dark – it’s great when you’re starting from scratch and need to illuminate potential threats in your JavaScript files. It can provide a basic level of insight, helping you identify scripts that might be worth a second look. But when it comes to serious security and compliance, relying solely on ChatGPT, inconsistency over time aside, is akin to bringing a knife to a wizard’s duel. It’s simply not equipped to handle the complex and evolving landscape of web security threats. Just like a wizard.

For those who are serious about security and compliance with PCI DSS 4.0’s requirements 6.4.3 and 11.6.1, HUMAN’s Client Side Defense & PCI DSS Compliance offers a far more user-friendly & comprehensive solution. By deploying a single line of code, customers receive a risk-scored inventory of all scripts, a simple method to authorize and justify scripts, alerts on unauthorized changes, and audit reports to demonstrate compliance. HUMAN Security doesn’t just skim the surface with static code analysis; it dives deep, capable of analyzing both obfuscated and unobfuscated code with advanced techniques. But where it truly shines is in its behavior-based detection. Rather than just looking at the code, it considers what’s actually happening on the site. It’s like having a security guard who doesn’t just check IDs at the door, but keeps an eye on what’s happening inside the party.

And the cherry on top? Client-Side Defense doesn’t just detect threats; it actively mitigates them. It can block an entire script from loading or prevent specific actions that breach PCI-DSS compliance, such as reading user input or exfiltrating data. These are the building blocks of a Magecart attack, and Client-Side Defense is like a skilled construction worker, ready to spot and fix any structural issues before they become a problem.

So, while ChatGPT and traditional scanning techniques might be handy tools to have in your security toolkit, they’re not the be-all and end-all solution. When it comes to serious security and compliance, you’ll need a more robust and comprehensive solution, like HUMAN’s Client-Side Defense. After all, when it comes to protecting your site, it’s better to be safe than sorry!

Magecart and the PCI-DSS 4.0 Challenge: Is ChatGPT the Answer?

Let’s Start at the Beginning – What Do We Want?

Crafting the Instructions…Promptly

Use The ChatGPT to Automate The ChatGPT

Free vs Fee: A Tale of Two GPTs

No More Chit-Chat: Why Real Security Needs HUMAN’s Client-Side Defense

Spread the Word

Platform

Solutions

Industries

Learn

Company

Contact Us

Products

Partners