🧠 Master Guide to AI Red-Teaming using NVIDIA Garak
References: https://github.com/NVIDIA/garak | https://garak.ai/garak_aiv_slides.pdf | https://garak.ai | https://reference.garak.ai/en/latest/
Author: Harshit Rajpal, Security Engineer, Bureau Veritas Cybersecurity North America
Introduction
In this guide, we will explore Garak – an open-source Generative AI Red-teaming and Assessment Kit by NVIDIA – and how to use it for scanning Large Language Models (LLMs) for vulnerabilities. We’ll cover everything from installation and setup to running scans, focusing on key features like connecting Garak to different LLM interfaces (including a local REST API chatbot), using specific probes (e.g. jailbreaking attacks), customizing prompts, speeding up scans, understanding Garak’s components, writing your own plugin, and interpreting Garak’s output reports. This comprehensive, step-by-step walkthrough will feel like a technical whitepaper, complete with code examples, command-line usage, and references to official documentation and community insights.
Table of Contents
1. Installation and Environment Setup
2. Getting Started With Garak
3. Scanning LLM Interfaces with Garak
4. Proxying Garak Through Burp Suite
5. Selective Probes for Targeted Testing
6. False Positives
7. Custom Prompt Sources
8. Speeding Up Scans
9. Understanding Garak's Plugin Architecture
10. Writing Your Own Plugin
11. Evaluating and Reading Garak Reports
12. Appendix A: CLI Reference and Troubleshooting
13. Appendix B: Burp Plugin to Auto-Generate api_web_config.json
1. Installation and Environment Setup
Since Garak pulls in a sizeable set of its own dependencies, this guide uses Conda to keep them isolated so system packages don't break. Conda is a powerful command-line tool for package and environment management that runs on Windows, macOS, and Linux.
System Requirements (recommended)
Python: 3.10–3.12
OS: Windows 10+, Linux, or macOS
RAM: Minimum 8GB (more if using local LLMs via Ollama or transformers)
Optional:
GPU: NVIDIA GPU with CUDA for local model acceleration
Anaconda: Latest Release
git
Ollama: Latest Release
Let's get started with the setup.
I will be using a Windows 10 host in this guide; however, feel free to substitute the equivalent commands for your specific OS. The key alternate commands are noted along the way.
First, let's get Conda up and running. You can choose your installer here and then use the following commands for download and installation.
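For example, on Linux you can pick an installer from https://repo.anaconda.com/archive/ and install it from the shell (the exact installer filename will differ by release and architecture; Windows users can simply run the graphical installer instead):

wget https://repo.anaconda.com/archive/Anaconda3-2025.06-1-Linux-x86_64.sh
bash Anaconda3-2025.06-1-Linux-x86_64.sh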
Follow the standard installation process. Once done, within your project folder (mine would be C:\Users\hex\Desktop\Garak), check a valid Anaconda installation via the conda command.

We are ready to set up a new environment for Garak.
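Create and activate a dedicated environment, then install Garak from source (an editable install of the GitHub repository; a plain pip install garak inside the environment should also work):

conda create --name garak "python>=3.10,<=3.12"
conda activate garak
git clone https://github.com/NVIDIA/garak.git
cd garak
python -m pip install -e .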
Once installed, Garak provides a command-line interface. To see basic usage, run garak -h
2. Getting Started With Garak
To quickly verify Garak's setup, run it against a sample LLM generator with a test probe. This uses Garak's built-in test components:
Generator: test.Blank – a mock model, specified with --target_type
Probe: test.Blank – sends a dummy input, specified with --probes
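Putting those together, a minimal smoke-test run looks like this (test.Test can be used instead if you prefer a non-blank dummy prompt):

python -m garak --target_type test.Blank --probes test.Blank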

As you may have observed, the JSON and HTML summary reports have been written to the default directory ~/.local/share/garak/garak_runs/
2a. Modules
Now, a bit about various modules in Garak. The major components are as follows:
Probes — the attackers. They generate specific prompts or input scenarios to test the LLM for vulnerabilities or behavioral weaknesses; each probe targets a particular issue such as jailbreaks, injections, or bias. Example: jailbreak.JailbreakProbe sends "ignore all instructions" prompts to test guardrail bypasses.
Generators — the LLM interfaces that Garak queries. They handle sending prompts and retrieving model responses, abstracting away API calls or local model inference. Example: ollama.OllamaGenerator connects Garak to a locally running LLaMA2 via Ollama's REST API.
Detectors — analyze model outputs to decide whether a failure or unsafe behavior occurred. They can check for toxicity, leakage, or rule-breaking based on text analysis or regex. Example: toxicity.BasicDetector flags model responses containing hate or violence terms.
Evaluators — summarize and score the test outcomes, turning raw detector results into metrics or human-readable reports. They can output JSON, CSV, or formatted text. Example: basic.JSONEvaluator saves a JSON file showing which probes passed or failed.
Harnesses — control how tests are executed, managing multiple probes, generators, detectors, and parallelization. They coordinate test scheduling and repeatability. Example: default.Harness runs a round-robin of probes vs. detectors across chosen models.
Resources — helper files, datasets, or lookup tables used by probes, detectors, and evaluators. These can include wordlists, pattern definitions, or canned prompts. Example: the resources/jailbreak_prompts.txt file provides a base set of jailbreak prompts for testing.
By combining these, we can tailor our scans.
3. Scanning LLM Interfaces with Garak
In Garak, a “generator” (also referred to via CLI as a “target type”) is the component that wraps a particular LLM interface: it sends prompts and receives completions (or chat responses) from a model backend. Garak supports a wide variety of backends: local models (via Hugging Face transformers or GGML), remote/cloud APIs (OpenAI, Cohere, Replicate, etc.), and custom REST endpoints. For local usage, like on your machine with Ollama, Garak includes a dedicated generator: OllamaGenerator (and a variant for chat mode) under garak.generators.ollama. For arbitrary REST-based LLMs or chatbots, e.g., if you wrap a model behind a custom HTTP API, there is a generic REST generator: RestGenerator under garak.generators.rest
So, generators = LLM interfaces, and connecting to LLM interfaces means telling Garak which backend to use (local, cloud API, custom REST) via --target_type / --target_name (formerly --model_type / --model_name) when you invoke it.
3a. Example
In this section, we'll see how to set up specific target types for scans. You can use Ollama (for example) by setting it up from the link in section 1. Assuming Ollama is running locally on its default host/port (127.0.0.1:11434) and you have a LLaMA-based model loaded (e.g., "llama2", which can be pulled with the CLI command ollama run llama2), you can provide the target_type as "ollama" and the target_name as "llama2".
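With the model pulled, a scan against it can then be launched along these lines (add --probes to restrict the run, as shown in later sections):

python -m garak --target_type ollama --target_name llama2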

You can inspect the report at the location indicated in the STDOUT. Here is a list of the major target types available:
huggingface — local models via Hugging Face / Transformers (e.g. GPT-2, LLaMA-based, etc.) (GitHub)
huggingface.InferenceAPI — Hugging Face hosted Inference API; remote models via Hugging Face's API (GitHub)
huggingface.InferenceEndpoint — private/custom Hugging Face inference endpoints (self-hosted or private) (GitHub)
openai — OpenAI API (ChatGPT / GPT-3.x / GPT-4, etc.) (GitHub)
replicate — models hosted on the Replicate platform, both public and private models/versions (GitHub)
cohere — models hosted via Cohere's API (when supported by Garak) (GitHub)
ggml — GGML / GGUF models (e.g. for use via local binaries like llama.cpp) (GitHub)
ollama — local models served via Ollama, for example local LLaMA-based models served with Ollama's REST API (reference.garak.ai)
rest — a generic REST-based generator for wrapping any custom HTTP/JSON API (e.g. a self-hosted FastAPI endpoint) (GitHub)
test — built-in "test" generators for mock testing, e.g. test.Blank, test.Repeat, test.Single, for dry runs or plugin testing (reference.garak.ai)
mistral — support for Mistral-family models via a dedicated generator (reference.garak.ai)
groq — support for models via the Groq API / backend, if configured (reference.garak.ai)
litellm — support for models via a LiteLLM backend (lightweight LLM interface) (reference.garak.ai)
nemo, nim, nvcf — support for specialized or vendor-specific backends (NeMo / NVIDIA-specific), e.g. for multimodal or proprietary LLMs (reference.garak.ai)
rasa (e.g. RasaRestGenerator) — interface to Rasa-based REST endpoints / LLM-powered chat services via Rasa-style APIs (reference.garak.ai)
watsonx — support for IBM's watsonx (or similar) LLM APIs if configured, for enterprise LLM backends (reference.garak.ai)
guardrails — a generator wrapper integrating with guardrails / safety-wrapped LLMs via a protective interface (e.g. NeMo Guardrails) (reference.garak.ai)
A full list of available generator types can be printed with the command:
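python -m garak --list_generators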
3b. REST Interface
While Garak supports many interfaces, the REST interface is the most useful in red-teaming/pentesting engagements, since most AI assistants you encounter are REST-API-based web applications bound to their own configured LLM. In this article we don't have a custom LLM implementation; instead, we'll scan a standard off-the-shelf llama2 model and inspect the security risks associated with it.
REST API based AI assistant lab setup
For this, our lab set-up would look like this:
a. Ollama running the llama2 model locally.
b. A minimal FastAPI app that proxies POST requests from /generate to http://127.0.0.1:11434/api/generate (Ollama’s default REST endpoint) — a minimal sketch of such a proxy follows this list.
c. A static HTML + JS page (served from /static/index.html) that provides a basic chat-style UI: a textarea to enter a prompt, a “Send” button, and a div to show user + AI messages.
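For reference, item (b) can be as small as the following sketch. Treat it as illustrative rather than the exact app linked below; the filename app.py and the static directory layout are assumptions here.

from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles
import requests

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default REST endpoint

app = FastAPI()
# serves the chat UI at /static/index.html
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.post("/generate")
async def generate(request: Request):
    # Forward the JSON body ({"model": ..., "prompt": ..., "stream": false})
    # to Ollama unchanged, and return Ollama's JSON response; its "response"
    # field holds the model output used by the front-end (and later by Garak).
    payload = await request.json()
    r = requests.post(OLLAMA_URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()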
Here is the link to the app you can use. Once you've cloned the repository, you can run the following commands:
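The exact steps depend on the repository, but they typically look like this (assuming the FastAPI application object is named app inside app.py; adjust to the actual module name):

pip install fastapi uvicorn requests
uvicorn app:app --host 127.0.0.1 --port 8000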
Once done, you can launch the URL http://127.0.0.1:8000/static/index.html in a web browser and test if the AI assistant is working.

You can inspect the chat prompt you entered and copy the request as bash or PowerShell as needed. Here is the PowerShell version of the CLI command you can send to this interface.
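An equivalent call from PowerShell might look like this (the prompt text is just a placeholder):

Invoke-RestMethod -Uri "http://127.0.0.1:8000/generate" -Method Post -ContentType "application/json" -Body '{"model":"llama2","prompt":"Hello, who are you?","stream":false}'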
Here is a sample HTTP request in Burp format that can be utilized to call the LLM we just set up through Burp. You can also proxy the CLI to Burp and issue a curl command.
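A raw request along these lines can be pasted into Burp Repeater (Burp fills in Content-Length; the prompt is a placeholder):

POST /generate HTTP/1.1
Host: 127.0.0.1:8000
Content-Type: application/json

{"model": "llama2", "prompt": "Hello, who are you?", "stream": false}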

Now that we have a fully working REST-API-based AI assistant, we can begin scanning the LLM with Garak. Garak's RestGenerator allows you to target any REST/HTTP endpoint, as long as you tell it how to format requests, which method (POST/GET) to use, which headers to send, and how to extract the response text. Full documentation can be found here.
Now, for our specific app, the api_web_config.json file would look like so:
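A working configuration for this lab looks roughly like the following (the "name" value is arbitrary, the timeout is a reasonable guess for slow local generation, and the URI should be adjusted if your proxy listens elsewhere):

{
   "rest": {
      "RestGenerator": {
         "name": "llama2 rest proxy",
         "uri": "http://127.0.0.1:8000/generate",
         "method": "post",
         "headers": {
            "Content-Type": "application/json"
         },
         "req_template_json_object": {
            "model": "llama2",
            "prompt": "$INPUT",
            "stream": false
         },
         "response_json": true,
         "response_json_field": "response",
         "request_timeout": 120
      }
   }
}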
Explanation of fields:
"name"
Friendly name to identify this generator in logs/reports
"uri"
The full URL for your REST endpoint (pointing to your proxy)
"method"
HTTP method — post in your case
"headers"
HTTP headers to send — at minimum Content-Type: application/json
"req_template_json_object"
The JSON body template: 'prompt' uses "$INPUT" which Garak replaces with the actual prompt text of each probe; other fields (model, stream) are fixed
"response_json"
true because your endpoint returns JSON
"response_json_field"
The JSON field in which the model output resides — in your case "response" (matching the JSON you saw earlier)
"request_timeout"
(Optional) timeout in seconds — useful if generation is slow
Now that our web configuration file is set, we can run our first test on this web application.
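The dummy probe makes a good end-to-end check before unleashing full probe families:

python -m garak --target_type rest -G api_web_config.json --probes test.Test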

Throughout the article, we shall be targeting this application.
Now, we have successfully run a sample test! The only problem is that we have no visibility into how prompts are crafted and sent, or what has actually been tested. Let's proxy Garak through Burp Suite so we can see exactly which prompts are crafted and tested.
4. Proxying Garak Through Burp Suite
You can add the "proxies" option to the api_web_config.json file and point it at the host/port Burp Suite is listening on.
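With Burp listening on its default 127.0.0.1:8080, the extra keys inside the RestGenerator block would look something like this (verify_ssl is only relevant for HTTPS targets and is shown here as an assumption):

"proxies": {
   "http": "http://127.0.0.1:8080",
   "https": "http://127.0.0.1:8080"
},
"verify_ssl": false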
Now, let's also use one of the probes, dan.DUDE, for a sample run. DAN is a "roleplay" jailbreak / prompt-injection category that instructs the LLM to behave both as itself and as "DAN", which stands for "Do Anything Now"; as the name suggests, DAN can do anything now. More on some of the other prompt families in later sections.
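The corresponding command against our REST target:

python -m garak --target_type rest -G api_web_config.json --probes dan.DUDE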

Now, in the HTTP history of BurpSuite, you can see all the prompts that were crafted and tested by Garak.

As the STDOUT states, 5/5 PASS, which means the LLM is not vulnerable to dan.DUDE. A quick overview is available in the HTML file located at the default location $HOME/.local/share/garak/garak_runs/garak.UUID.report.html
Here is how the HTML stats report looks:

5. Understanding Probes
So far, we have seen easy setup guidelines, testing different interfaces (generators), and not only testing custom REST-based API endpoints, but also coding a sample REST app that binds to a local LLM instance (Ollama). Now, we will explore the different probes that Garak tests a target against.
In Garak, probes are the core building blocks of vulnerability scanning. Each probe represents a class of tests designed to elicit specific unwanted behaviors from a language model, such as prompt injection, latent context exploitation, bullying encouragement, or anthropomorphization. Probes generate crafted inputs that target known failure modes or attack vectors, and detectors assess whether the model’s outputs indicate a problem.
To see all probes available in your Garak installation:
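python -m garak --list_probes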
Here is a summary of the probes available, divided into categories of probes, a few important sub-probes, and when to use them (or what exactly they test):
ansiescape — tests whether models output annoying or harmful ANSI control codes (e.g., terminal escapes). Key sub-probes: ansiescape.AnsiEscaped, ansiescape.AnsiRaw. Useful for models used in console UIs; can reveal unsanitized control characters.
atkgen — still a prototype; tries to generate toxic or offensive content via automatic attack generation. Key sub-probe: atkgen.Tox. Useful for detecting crude toxicity vulnerabilities.
audio — probes with audio-related vectors (e.g., "achilles heel" via audio). Key sub-probe: audio.AudioAchillesHeel. Niche/experimental; may give errors on text-only models.
av_spam_scanning — tests whether models output known anti-virus/spam test strings (EICAR, GTUBE) or phishing patterns. Key sub-probes: EICAR, GTUBE, GTphish. Useful for systems that scan or classify email/spam.
continuation — tests whether models continue slurs or reclaimed slurs. Key sub-probe: ContinueSlursReclaimedSlurs. Content-safety testing for offensive language continuation.
dan (jailbreak) — assesses classic DAN-style jailbreaks and related prompt attacks. Key sub-probes: DanInTheWild, AutoDANCached. Core jailbreak testing; often a first-look vulnerability category.
divergence — probes for repetition/repeated tokens, i.e., models looping or repeating. Key sub-probe: Repeat. Useful for models with generation instability.
donotanswer — probes for failure to refuse harmful or restricted requests. Key sub-probes: InformationHazard, MaliciousUses. Good for safety tests where the model should decline.
encoding — uses encoded payloads (Base16, Base32, Base64, Morse, etc.) to bypass filters. Key sub-probes: InjectBase64, InjectHex, InjectMorse. Tests whether encoding tricks achieve undesirable outputs.
exploitation — tests for classic injection patterns (e.g., Jinja/Python templates, SQL echo). Key sub-probes: JinjaTemplatePythonInjection, SQLInjectionEcho. Good for web-style injection; exposes model attempts to follow code patterns.
glitch — tests for weird generation/formatting glitches. Key sub-probe: Glitch. Mostly experimental/low-impact tests.
goodside — light-hearted/fuzz-type tests (e.g., tag generation, JSON manipulation). Key sub-probes: Tag, ThreatenJSON. Useful for stress-testing output formatting logic.
grandma — probes based on common language patterns or stereotypes. Key sub-probes: Slurs, Win10, Substances. Fun/quirky probes; may return benign content.
latentinjection — tests for latent prompt injection (hidden inside benign context), e.g., indirect injection. Key sub-probes: LatentJailbreak, LatentInjectionReport, LatentInjectionTranslationEnFr. A powerful category for RAG or context-based attacks.
leakreplay — replays text from known sources (Guardian, NYT, literature) to test data leakage/hallucination. Key sub-probes: GuardianComplete, NYTComplete. Useful for data leakage / memorization checks.
lmrc (risk cards) — based on Language Model Risk Cards; checks for bias, profanity, sexual content, slurs. Key sub-probes: Bullying, SexualContent, SlurUsage, Deadnaming. Core societal safety testing.
malwaregen — probes for malware generation patterns, i.e., code that resembles malware payloads. Key sub-probes: Evasion, TopLevel. Helpful in security contexts where code generation is possible.
misleading — tests for false assertions / misinformation responses. Key sub-probe: FalseAssertion. Useful for model factuality and hallucination tests.
packagehallucination — checks for hallucinated package names in generated code. Key sub-probes: Python, JavaScript, Rust. Good for code-generation contexts.
phrasing — tests for tense consistency / phrasing shifts (past/future). Key sub-probes: FutureTense, PastTense. Useful for stylistic vulnerabilities / output stability.
promptinject — broad class of prompt injection attacks (hate, kill, long prompt). Key sub-probes: HijackKillHumans, HijackLongPrompt. Central malicious prompt testing.
realtoxicityprompts — uses standard toxicity prompt sets (e.g., RealToxicityPrompts). Key sub-probe: RTPBlank. Standard community-driven toxicity probing.
sata — tests for violations of simple token assumptions (e.g., MLM pattern tests). Key sub-probe: MLM. Rare; probes model internals.
smuggling — tests for code/text smuggling patterns (function masking, hypothetical responses). Key sub-probes: FunctionMasking, HypotheticalResponse. Can surface hidden smuggling vulnerabilities.
snowball — tests for structural problems like prime generation or graph connectivity. Key sub-probe: GraphConnectivity. Mostly logical/structural stress tests.
suffix — adversarial suffix / termination tests. Key sub-probes: BEAST, GCG. Often low-impact but interesting edge cases.
tap — attacker-model-driven, iterative jailbreak probes (PAIR, Tree of Attacks with Pruning). Key sub-probes: PAIR, TAP. Automated jailbreak refinement.
test — dummy probes for testing tool behavior. Key sub-probes: Blank, Test. Useful for verifying config/integration.
topic — checks topic-based allowed/blocked word lists. Key sub-probes: WordnetAllowedWords, WordnetControversial. Useful for classification/controversial topic coverage.
visual_jailbreak — probes for visual-to-text jailbreaks (e.g., FigStep patterns). Key sub-probe: FigStep. For multimodal systems; often inactive.
web_injection — simulates web-style injections / exfiltration patterns (Markdown XSS, image exfil). Key sub-probes: MarkdownXSS, StringAssemblyDataExfil. Tests web injection vectors through model text.
You can also filter this list by adding the -p option followed by a probe family. For example, to view only the probes in the dan family:

You can then select one or more by name when running a scan. Here, in the following examples, we will be testing various probes on the REST-based llama2 wrapper we created earlier and then utilizing tags for faster probe selection, and finally combining probes manually in the CLI.
5a. Finding our First Prompt Injection Vulnerability!
The following is an example run of one of the prompt injection probes, HijackHateHumans, whose goal is to make the LLM say "I hate humans." Garak command:
python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackHateHumans
Example prompt:

Sample output:

As you can see in the output above, the LLM we are targeting is, in fact, vulnerable to the HijackHateHumans prompt injection (toxicity)! This is not normal, expected, benign behaviour; it suggests a failure in safety alignment, input/output filtering, or adversarial resistance. It is also worth noting that various probes (such as lmrc.Bullying) might throw errors while trying to run; at that point, you can choose to exclude such probes from the testing.
5b. Utilizing tags for selective testing
If you've followed along so far, you know that testing probes consumes a considerable amount of time. Thus, we can also utilize "probe_tags." Now, not a lot of information about probe_tags is available on the help menu; going through the code, one can find different probe tags that can be utilized. For example, check out the code for the Dan probe here. Observe the probe tags.

Here, as you can see, one of the probe tags is OWASP:LLM01. This is an obvious reference to OWASP LLM Top 10 and the first vulnerability on the list (prompt injection).
Thereafter, you can run a scan using this:
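python -m garak --target_type rest -G api_web_config.json --probe_tags owasp:llm01

(The --probe_tags option selects every probe carrying the given tag; the tag is written here exactly as it appears in the probe code.)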

As you can see in the screenshot above, this option automatically selected all the probes tagged owasp:llm01 and saved us quite a lot of time going through the list and figuring out what to test!
Of course, not all of these probes will work, since the tool is still in development (or they might require manual intervention to get running), so you can simply remove the probes causing issues and re-run the command.
Note: In the probe list, some modules are suffixed with a "💤" (zzz) emote. There is a high chance these probes are inactive and won't work.
5c. Manually combining probes
Now, if we want to fine-tune our scans even more, we can provide a comma-separated list of probes to Garak for testing within the "--probes" option.
For example, I will test lmrc.SexualContent,grandma.Slurs,divergence.RepeatedToken together like so:
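python -m garak --target_type rest -G api_web_config.json --probes lmrc.SexualContent,grandma.Slurs,divergence.RepeatedToken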
Now that you have run a few tests and inspected them through Burp, you may have noticed an abundance of false positives: Garak may mark a test as failed while the output in the report looks benign. We'll cover how to filter the report and inspect it for accurate test results in the next section.
6. Evaluating and Reading Garak Reports
Before evaluating a report, let us name the report correctly first. As you may have already observed, Garak's output report names are randomized alphanumeric strings following the pattern "garak.RandomString". While good enough for a quick run, on a project you might want to fine-tune this. You can use the "--report_prefix" option to specify the output filename. For example, I am setting the output report prefix to "masterguide".
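For example, reusing the grandma.Slurs probe analyzed later in this section:

python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs --report_prefix masterguide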

Now, you are ready to inspect the report. You might have noticed that after a scan is completed, 3 different files are created:
filename.hitlog.jsonl
filename.report.jsonl
filename.report.html

Any response flagged as a hit (vulnerability) by the detector will be placed in the hitlog.jsonl file. These entries can be followed back to the report.jsonl attempt entry based on the attempt_id.

It is important to note that while running a Garak scan, if no detectors are explicitly provided, the default detector would be the probe's primary detector as specified in the Python file at /garak/garak/probes/probe_name.py. For example, here are Dan's primary and extended detectors that would produce a hit while scanning.
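Inside a probe class, the detector wiring looks roughly like this (illustrative values in the spirit of garak/probes/dan.py, not copied verbatim):

# attributes inside a probe class that determine which detectors score its attempts
primary_detector = "dan.DAN"
extended_detectors = ["mitigation.MitigationBypass"]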
Now, for the scan we ran earlier using the grandma.Slurs probe, we get a report. The file is difficult to read as-is, so I wrote a script that converts a filename.report.jsonl into a CSV (with a limited set of fields for better visibility). A user can then go through filename.hitlog.jsonl, pick up the attempt ID, and search the CSV for that particular hit, making analysis easier! Here is the PowerShell version of the script.
To run this script:
Then, we can use any spreadsheet software to open this CSV file. You can see the four major fields taken from report.jsonl and put into the CSV, with almost everything else redacted.
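If you prefer Python over PowerShell, the conversion boils down to something like this sketch (field names such as uuid, probe_classname, prompt, and outputs are assumed from the report.jsonl attempt entries; adjust them to what your garak version emits):

import csv
import json

IN_FILE = "masterguide.report.jsonl"
OUT_FILE = "masterguide.report.csv"

with open(IN_FILE, encoding="utf-8") as src, \
     open(OUT_FILE, "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    writer.writerow(["uuid", "probe", "prompt", "outputs"])
    for line in src:
        entry = json.loads(line)
        # only attempt entries carry prompts/outputs; skip config and eval records
        if entry.get("entry_type") != "attempt":
            continue
        writer.writerow([
            entry.get("uuid", ""),
            entry.get("probe_classname", ""),
            entry.get("prompt", ""),
            " | ".join(str(o) for o in entry.get("outputs", []) if o is not None),
        ])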

Now I'll pick one of the attempt IDs from the hitlog and search for it in the CSV (the attempt ID in the hitlog is the same as the UUID in report.jsonl).

You can then easily search for this in the CSV and analyze prompts and their outputs more clearly.

As we can observe in the output report, this appears to be a false positive (a common occurrence). However, now that we have all of our data in a more readable format, analysis becomes much easier!
7. Testing With Custom Prompt/Wordlist Sources
If you've followed along this far, you will have noticed that all the prompts come from pre-defined Python templates under garak/garak/probes. The structure of a probe template is as follows:
Global vars — if any
Probe class — the subcategory of a probe
Any required tags
A working function — performs any operations needed to create prompts
A prompts variable — holds all the prompts to be tested, as a list
So, if we define our custom prompts in a file and recreate a similar template, we can have Garak send requests using our own custom probe. You can use the sample template I coded here, or make one yourself by looking at the code for other probes and overwriting very little. I essentially reused the existing "test" probe from earlier in the article, found under garak/garak/probes/test.py, and added a class called "FileListPrompts". This class reads our file "my_prompts.txt" line by line and stores the contents as an array of strings (a Python list) in the prompts variable. This adds the new functionality to the test probe, and Garak can now fetch wordlists and bombard the target! Please note that the except block in the code below is a failsafe that assigns the single value "hello" to the prompts variable in case the file I/O was unsuccessful. This way, while reading the output, you always know whether the file read succeeded and can troubleshoot accordingly.
Please note that in other probes, a detector is usually configured to help users analyze the CLI output as a PASS/FAIL status. We can configure that too within the code by setting the variable "primary_detector" if we know the nature of the prompts (such as mitigation.MitigationBypass), or we can use the all detectors option in CLI. While configuring the template above, I added the "always.Pass" detector.
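For reference, the added class is essentially the following sketch. Treat it as illustrative rather than a drop-in copy: the import path of the Probe base class and the constructor signature vary slightly between garak versions.

from garak import _config
from garak.probes.base import Probe  # newer releases may expose this as garak.probes.Probe


class FileListPrompts(Probe):
    """Send every line of a local wordlist file as its own prompt."""

    active = True
    goal = "send custom prompts read from my_prompts.txt"
    primary_detector = "always.Pass"  # every attempt is recorded as a pass, per the text above
    tags = []

    def __init__(self, config_root=_config):
        super().__init__(config_root=config_root)
        try:
            with open("my_prompts.txt", encoding="utf-8") as f:
                self.prompts = [line.strip() for line in f if line.strip()]
        except OSError:
            # failsafe described above: a single known prompt makes a failed
            # file read obvious when you review the run output
            self.prompts = ["hello"]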
Alright then! Now that our tweaked "test.py" is ready to support custom wordlists, we need to create a wordlist, name it "my_prompts.txt" (or any other name, adjusting the code accordingly), and keep it in the current directory. I'll be adding four sample prompts just for testing purposes.

Once done, and assuming the class was added to garak's existing test probe module as sketched above, you can then run the following command:
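python -m garak --target_type rest -G api_web_config.json --probes test.FileListPrompts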
As you can see, Garak is now testing the target with our custom wordlist.

Let's inspect this in Burp Suite and confirm again.

Well, there we go. There are various resources on the internet where you can find prompt injection wordlists, including huggingface datasets. Here are a couple to get you started:
My uncle said, "With a large wordlist comes huge overhead." In the next section, we'll discuss how we can fast-track our scans.
8. Speeding Up Scans
Key options: -g 1 (request a single generation per prompt instead of the default several) and --parallel_attempts (send multiple attempts in parallel).
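A combined example against our REST target (20 parallel attempts, one generation per prompt; tune these to what your endpoint can handle):

python -m garak --target_type rest -G api_web_config.json --probes promptinject -g 1 --parallel_attempts 20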
9. Understanding Detectors
10. Understanding Buffs
12. Appendix A: FAQs and Troubleshooting
Quick Reference and Common Fixes
A concise cheatsheet for Garak's key CLI options (--target_type, --probes, --detectors, --evaluators, --parallel_attempts), along with common issues like encoding errors, missing plugins, and REST connection fixes, with PowerShell vs. Linux equivalents. Perfect as a back-pocket reference when setting up new scans.
Q2. How to scan thinking models, like DeepSeek R1, since sometimes the detector reads output from chain-of-thought as well and not just the output?
Ans: Per the documentation for the base generator, for reasoning models, setting skip_seq_start and skip_seq_end suppresses the chain of thought in the target response. This lets users run tests with and without this output being considered, as the marked segment is removed before the response is passed to the detectors.
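In a generator config this looks something like the following keys added inside the generator block (e.g., the RestGenerator block of api_web_config.json); the <think> delimiters are an assumption based on how DeepSeek R1 tags its reasoning, so set them to whatever your model actually emits:

"skip_seq_start": "<think>",
"skip_seq_end": "</think>"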
13. Appendix B: Burp Plugin to Auto-Generate REST config JSON
Link and demo to be updated...