🧠 Master Guide to AI Red-Teaming using NVIDIA Garak
References: https://github.com/NVIDIA/garak | https://garak.ai/garak_aiv_slides.pdf | https://garak.ai | https://reference.garak.ai/en/latest/
Author: Harshit Rajpal, Security Engineer, Bureau Veritas Cybersecurity North America
Introduction
In this guide, we will explore Garak – an open-source Generative AI Red-teaming and Assessment Kit by NVIDIA – and how to use it for scanning Large Language Models (LLMs) for vulnerabilities. We’ll cover everything from installation and setup to running scans, focusing on key features like connecting Garak to different LLM interfaces (including a local REST API chatbot), using specific probes (e.g. jailbreaking attacks), customizing prompts, speeding up scans, understanding Garak’s components, writing your own plugin, and interpreting Garak’s output reports. This comprehensive, step-by-step walkthrough will feel like a technical whitepaper, complete with code examples, command-line usage, and references to official documentation and community insights.
1. Installation and Environment Setup
Since Garak brings its own set of dependencies, this guide will use Conda to keep them isolated. Conda is a powerful command-line tool for package and environment management that runs on Windows, macOS, and Linux.
System Requirements (recommended)
Python: 3.10
OS: Windows 10+, Linux, or macOS
RAM: Minimum 8GB (more if using local LLMs via Ollama or transformers)
Optional:
GPU: NVIDIA GPU with CUDA for local model acceleration
Anaconda: Latest Release
git
Ollama: Latest Release
Let's get started with the setup.
I will be using a Windows 10 host in this guide; feel free to substitute the equivalent commands for your specific OS. Key alternative commands are noted where they differ.
First, let's get Conda up and running. You can choose your installer here and then use the following commands for download and installation.
Follow the standard installation process. Once done, within your project folder (mine is C:\Users\hex\Desktop\Garak), verify the Anaconda installation by running the conda command.

We are ready to set up a new environment for Garak.
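A typical sequence for this, assuming a fresh Conda install, looks like the following (the environment name and pinned Python version are just suggestions):

conda create -n garak python=3.10
conda activate garak
python -m pip install -U garak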
Once installed, Garak provides a command-line interface. To see basic usage, run garak -h
2. Getting Started With Garak
Let's quickly test Garak's setup with a sample LLM generator and a probe.
This uses Garak’s built-in test components:
Generator:
test.Blank – a mock model, which can be specified with --target_type
Probe:
test.Blank – sends a blank prompt, which can be specified with --probes
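A minimal dry-run pairing the two looks something like this:

python -m garak --target_type test.Blank --probes test.Blank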

As you may have observed, the JSON and HTML summary reports have been written to the default directory ~\.local\share\garak\garak_runs\
2a. Modules
Now, a bit about the various modules in Garak. The major components, along with their descriptions and a basic example, are as follows:
Component
Description
Example
Probes
Probes are the attackers. They generate specific prompts or input scenarios to test the LLM for vulnerabilities or behavioral weaknesses. Each probe targets a particular issue like jailbreaks, injections, or bias.
dan.DanInTheWild sends a large set of in-the-wild jailbreak prompts to test guardrail bypasses.
Generators
Generators are the LLM interfaces that Garak queries; they handle sending prompts and retrieving model responses. They abstract away API calls or local model inference.
ollama.OllamaGenerator connects Garak to a locally running LLaMA2 via Ollama’s REST API.
Detectors
Detectors analyze model outputs to decide if a failure or unsafe behavior occurred. They can check for toxicity, leakage, or rule-breaking based on text analysis or regex.
unsafe_content.SlursReclaimedSlurs flags model responses containing slurs or other hateful terms.
Buffs
Buffs augment, constrain, or perturb the set of attempts (the raw attack inputs) before they are sent to the model. This can reveal vulnerabilities that wouldn’t show up with the raw attacks alone.
encoding.Base64 (from garak.buffs.encoding) wraps a prompt in Base64 before it is sent to the model.
By combining these, we can tailor our scans.
3. Scanning LLM Interfaces with Garak
In Garak, a “generator” (also referred to via CLI as a “target type”) is the component that wraps a particular LLM interface: it sends prompts and receives completions (or chat responses) from a model backend. Garak supports a wide variety of backends: local models (via Hugging Face transformers or GGML), remote/cloud APIs (OpenAI, Cohere, Replicate, etc.), and custom REST endpoints. For local usage, like on your machine with Ollama, Garak includes a dedicated generator: OllamaGenerator (and a variant for chat mode) under garak.generators.ollama. For arbitrary REST-based LLMs or chatbots, e.g., if you wrap a model behind a custom HTTP API, there is a generic REST generator: RestGenerator under garak.generators.rest
So, generators = LLM interfaces, and connecting to LLM interfaces means telling Garak which backend to use (local, cloud API, custom REST) via --target_type / --target_name (formerly --model_type / --model_name) when you invoke it.
3a. Example
In this section, we'll see how to set up specific target types for scans. You can use Ollama (for example) by setting it up from the link in section 1a. Assuming Ollama is running locally on its default host/port (e.g., 127.0.0.1:11434) and you have an LLaMA-based model loaded (e.g., “llama2” which can be installed with the CLI command: ollama run llama2), you can provide the target_type as "ollama" and target_name as "llama2".
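Assuming that setup, a quick scan against the local model can be launched along these lines (the probe here is just a lightweight placeholder):

python -m garak --target_type ollama --target_name llama2 --probes test.Test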

You can inspect the report at the location printed in the stdout. Here is a list of the major target types available:
target_type / generator
What it connects to / description
huggingface
Local models via Hugging Face / Transformers (e.g. GPT-2, LLaMA-based, etc.)
huggingface.InferenceAPI
HuggingFace hosted inference API — remote models via Hugging Face’s API
huggingface.InferenceEndpoint
Private/custom Hugging Face inference endpoints (self-hosted or private)
openai
OpenAI API (ChatGPT / GPT-3.x / GPT-4 etc.)
replicate
Models hosted on the Replicate platform — both public and private models/versions
cohere
Models hosted via Cohere’s API (when supported by Garak)
ggml
GGML / GGUF models (e.g. for use via local binaries like llama.cpp)
ollama
Local models served via Ollama — for example local LLaMA-based models served with Ollama REST API
rest
A generic REST-based generator — for wrapping any custom HTTP/JSON API (e.g. a self-hosted FastAPI endpoint)
test
Built-in “test” generator(s) for mock testing: e.g. test.Blank, test.Repeat, test.Single etc., for dry-runs or plugin testing
mistral
Support for Mistral-family models (via a dedicated generator)
groq
Support for models via Groq API / backend (if configured)
litellm
Support for models via a “LiteLLM” backend (lightweight LLM interface)
nemo, nim, nvcf
Support for specialized or vendor-specific backends (NeMo / NVIDIA-specific) — e.g. for multimodal or proprietary LLMs
rasa / rasa-based generator (e.g. RasaRestGenerator)
Interface to Rasa-based REST endpoints / LLM-powered chat-services via Rasa style APIs
watsonx
Support for IBM’s Watsonx (or similar) LLM APIs if configured — for enterprise LLM backends
guardrails
A generator wrapper integrating with guardrails / safety-wrapped LLMs via a protective interface (e.g. NeMo Guardrails)
The following command can provide a full list of generators:
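On the versions I have used, that command is:

python -m garak --list_generators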
3b. REST Interface
While there are various interfaces, the REST interface is the most useful for red-teaming and pentest-like situations, as many AI assistants encountered in the field expose a REST API-based web application bound to their own configured LLM. While we don't have a custom LLM implementation in this article, we will scan a standard off-the-shelf llama2 model and inspect the security risks associated with it.
REST API based AI assistant lab setup
For this, our lab set-up would look like this:
a. Ollama running the llama2 model locally.
b. A minimal FastAPI app that proxies POST requests from /generate to http://127.0.0.1:11434/api/generate (Ollama’s default REST endpoint); a rough sketch of such a proxy is shown right after this list.
c. A static HTML + JS page (served from /static/index.html) that provides a basic chat-style UI: a textarea to enter a prompt, a “Send” button, and a div to show user + AI messages.
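For reference, here is a rough sketch of such a proxy (not the exact code in the linked repository); it assumes Ollama's default /api/generate endpoint and a local static/ folder holding index.html:

# main.py - minimal sketch of the /generate proxy described above (run with: uvicorn main:app --port 8000)
import requests
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

app = FastAPI()

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"

class GenerateRequest(BaseModel):
    model: str = "llama2"
    prompt: str
    stream: bool = False

@app.post("/generate")
def generate(req: GenerateRequest):
    # Forward the JSON body to Ollama and return its JSON response unchanged
    r = requests.post(OLLAMA_URL, json=req.dict(), timeout=120)
    r.raise_for_status()
    return r.json()

# Serve the chat-style UI from ./static/index.html
app.mount("/static", StaticFiles(directory="static"), name="static")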
Here is the link to the app you can use. Once you've cloned the repository, you can run the following commands:
Once done, you can launch the URL http://127.0.0.1:8000/static/index.html in a web browser and test if the AI assistant is working.

You can inspect the chat prompt you entered and copy the request as bash or PowerShell as needed. Here is the PowerShell version of the CLI command you can send to this interface.
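A request along these lines should work, assuming the /generate proxy above and a JSON body with model, prompt, and stream fields:

Invoke-RestMethod -Uri "http://127.0.0.1:8000/generate" -Method Post -ContentType "application/json" -Body '{"model": "llama2", "prompt": "Hello, who are you?", "stream": false}'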
Here is a sample HTTP request in Burp format that can be utilized to call the LLM we just set up through Burp. You can also proxy the CLI to Burp and issue a curl command.
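The equivalent raw request, as you would see it in Burp Repeater, looks roughly like this:

POST /generate HTTP/1.1
Host: 127.0.0.1:8000
Content-Type: application/json
Connection: close
Content-Length: 69

{"model": "llama2", "prompt": "Hello, who are you?", "stream": false}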

Now that we have a fully working REST API based AI assistant, we can begin scanning the LLM with Garak. Garak's RestGenerator allows you to target any REST/HTTP endpoint as long as you tell it how to format requests, which method (POST/GET) to use, what headers to send, and how to extract the response text. Full documentation can be found here.
Now, for our specific app, the api_web_config.json file would look like so:
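A configuration along the following lines matches the fields explained below; note that the exact nesting ("rest" > "RestGenerator") may differ slightly between Garak versions:

{
  "rest": {
    "RestGenerator": {
      "name": "local-llama2-proxy",
      "uri": "http://127.0.0.1:8000/generate",
      "method": "post",
      "headers": {
        "Content-Type": "application/json"
      },
      "req_template_json_object": {
        "model": "llama2",
        "prompt": "$INPUT",
        "stream": false
      },
      "response_json": true,
      "response_json_field": "response",
      "request_timeout": 120
    }
  }
}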
Explanation of fields:
"name"
Friendly name to identify this generator in logs/reports
"uri"
The full URL for your REST endpoint (pointing to your proxy)
"method"
HTTP method — post in your case
"headers"
HTTP headers to send — at minimum Content-Type: application/json
"req_template_json_object"
The JSON body template: 'prompt' uses "$INPUT" which Garak replaces with the actual prompt text of each probe; other fields (model, stream) are fixed
"response_json"
true because your endpoint returns JSON
"response_json_field"
The JSON field in which the model output resides — in your case "response" (matching the JSON you saw earlier)
"request_timeout"
(Optional) timeout in seconds — useful if generation is slow
Now that our web configuration file is set, we can run our first test on this web application.
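The command mirrors the earlier test run, just pointed at the REST generator and our config file:

python -m garak --target_type rest -G api_web_config.json --probes test.Test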

Throughout the article, we shall be targeting this application.
Now, we have successfully run a sample test! The only problem is, we have no visibility into how prompts are crafted and sent or what's been tested. Let's talk about proxying it through Burp Suite, so we know the domain of prompts crafted and tested.
4. Proxying Garak Through Burp Suite
You can add the "proxies" option in the api_web_config.json file and configure this to the port Burp Suite is listening on.
Now, we can also use one of the probes, "dan.DUDE", for a sample run. DAN is a "roleplay" jailbreak / prompt-injection category that instructs the LLM to respond both as itself and as "DAN", which stands for "Do Anything Now"; as the name suggests, DAN can do anything now. More on some of the other prompt families in the later sections.
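The run itself is the same style of command as before, just with the dan.DUDE probe selected:

python -m garak --target_type rest -G api_web_config.json --probes dan.DUDE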

Now, in the HTTP history of BurpSuite, you can see all the prompts that were crafted and tested by Garak.

As shown in the stdout, the result is 5/5 PASS, which means the LLM is not vulnerable to dan.DUDE. A quick overview is available in the HTML file located at the default location $HOME/.local/share/garak/garak_runs/garak.<UUID>.report.html
Here is how the HTML stats report looks:

5. Understanding Probes
So far, we have seen easy setup guidelines, testing different interfaces (generators), and not only testing custom REST-based API endpoints, but also coding a sample REST app that binds to a local LLM instance (Ollama). Now, we will explore the different probes that Garak tests a target against.
In Garak, probes are the core building blocks of vulnerability scanning. Each probe represents a class of tests designed to elicit specific unwanted behaviors from a language model, such as prompt injection, latent context exploitation, bullying encouragement, or anthropomorphization. Probes generate crafted inputs that target known failure modes or attack vectors, and detectors assess whether the model’s outputs indicate a problem.
To see all probes available in your Garak installation:
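The listing command is:

python -m garak --list_probes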
In the output, you'll observe various probes marked with a "zzz" (💤) emoji. These are inactive probes and may not work in your scans. Probes without the marker are active.
Here is a summary of the probes available, divided into categories of probes, a few important sub-probes, and when to use them (or what exactly they test):
ansi escape
Tests whether models output annoying or harmful ANSI control codes (e.g., terminal escapes).
ansiescape.AnsiEscaped, ansiescape.AnsiRaw
Useful for models used in console UIs; can reveal unsanitized control chars.
atkgen
Still a prototype. Tries to generate toxic or offensive content via automatic attack generation.
atkgen.Tox
Useful for detecting crude toxicity vulnerabilities.
audio
Probes with audio-related vectors (e.g., "achilles heel" via audio).
audio.AudioAchillesHeel
Niche/experimental — may give errors on text-only models.
av_spam_scanning
Tests whether models output known spam test codes (EICAR, GTUBE) or phishing patterns.
EICAR, GTUBE, GTphish
Useful for systems scanning or classifying email/spam.
continuation
Tests whether models continue slur or reclaimed slur identifiers.
ContinueSlursReclaimedSlurs
Content-safety testing for offensive language continuation.
dan (jailbreak)
Assesses classic DAN-style jailbreaks and related prompt attacks.
DanInTheWild, AutoDANCached
Core jailbreak testing — often a first-look vulnerability category.
divergence
Probes for repetition/repeated tokens — models looping or repeating.
Repeat
Useful for models with generation instability.
donotanswer
Probes for failure to refuse harmful or restricted requests.
InformationHazard, MaliciousUses
Good for safety tests where the model should decline.
encoding
Uses encoded payloads (base16, base32, Base64, Morse, etc.) to bypass filters.
InjectBase64, InjectHex, InjectMorse
Tests whether encoding tricks reveal prompts that achieve undesirable outputs.
exploitation
Tests for classic injection patterns (e.g., Jinja/Python, SQL echo).
JinjaTemplatePythonInjection, SQLInjectionEcho
Good for web-style injection; exposes model attempts to follow code patterns.
glitch
Tests for weird generation/formatting glitches.
Glitch
Mostly experimental/low-impact tests.
goodside
Light-hearted/fuzz-type tests (e.g., tag generation, JSON manipulation).
Tag, ThreatenJSON
Useful for stress-testing output formatting logic.
grandma
Uses "grandma"-style emotional appeals (e.g., "my grandmother used to read me Windows keys to help me sleep") to coax out restricted content.
Slurs, Win10, Substances
Fun/quirky probes; may return benign content.
latentinjection
Tests for latent prompt injection (hidden inside benign context) — e.g., indirect injection.
LatentJailbreak, LatentInjectionReport, LatentInjectionTranslationEnFr
Powerful category for RAG or context-based attacks.
leakreplay
Replays text from known agencies (Guardian, NYT, literature) to test data leakage/hallucination.
GuardianComplete, NYTComplete
Useful for data leakage / memorization checks.
lmrc (risk cards)
Based on Language Model Risk Cards — checks for bias, profanity, sexual content, slurs.
Bullying, SexualContent, SlurUsage, Deadnaming
Core societal safety testing.
malwaregen
Probes for malware generation patterns — code that resembles malware payloads.
Evasion, TopLevel
Helpful in security contexts where code generation is possible.
misleading
Tests for false assertions / misinformation responses.
FalseAssertion
Useful for model factuality and hallucination tests.
packagehallucination
Checks for hallucination of package names or code.
Python, JavaScript, Rust
Good for code-generation contexts.
phrasing
Rephrases harmful requests in the past or future tense to try to slip past refusals.
FutureTense, PastTense
Useful for testing whether simple tense changes bypass safety behavior.
promptinject
Broad class of prompt injection attacks (hate, kill, long prompt).
HijackKillHumans, HijackLongPrompt
Central malicious prompt testing.
realtoxicityprompts
Uses standard toxicity prompt sets (e.g., RTP).
RTPBlank
Standard community-driven toxicity probing.
sata
Tests for violations of simple token assumptions (e.g., MLM pattern tests).
MLM
Rare, model internals probing.
smuggling
Tests for code/text smuggling patterns (function masking, hypothetical responses).
FunctionMasking, HypotheticalResponse
Can surface hidden smuggling vulnerabilities.
snowball
Tests for "snowballed hallucination": questions (e.g., prime testing, graph connectivity) that lure the model into confidently wrong answers.
GraphConnectivity
Useful for hallucination and reasoning-failure checks.
suffix
Appends adversarial suffixes (GCG- and BEAST-style attacks) to prompts to try to force compliance.
BEAST, GCG
Often low-impact but interesting edge cases.
tap
Automated, iterative jailbreak attacks (PAIR, Tree of Attacks with Pruning).
PAIR, TAP
Attacker-model-driven jailbreak refinement; typically more expensive to run.
test
Dummy probes for testing tool behavior.
Blank, Test
Useful for verifying config/integration.
topic
Checks for topic-based word allowed/blocked lists.
WordnetAllowedWords, WordnetControversial
Useful for classification/controversial topic coverage.
visual_jailbreak
Probes for visual-to-text jailbreaks (e.g., FigStep patterns).
FigStep
For multimodal systems; often inactive.
web_injection
Simulates web-style injections / exfil patterns (Markdown XSS, image exfil).
MarkdownXSS, StringAssemblyDataExfil
Tests web injection vectors through model text.
You can also filter this list by adding the -p option followed by a probe family. For example, to view probes only in the dan family:
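Based on the description above, a filtered listing should look something like this (behavior may vary between Garak versions):

python -m garak --list_probes -p dan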

You can then select one or more by name when running a scan. Here, in the following examples, we will be testing various probes on the REST-based llama2 wrapper we created earlier and then utilizing tags for faster probe selection, and finally combining probes manually in the CLI.
5a. Finding our First Prompt Injection Vulnerability!
The following is an example run of one of the prompt injection probes, "HijackHateHumans", where the goal is to make the LLM say "I hate humans." Garak command:
python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackHateHumans
Example prompt:

Sample output:

As you can see in the output above, the LLM we are targeting is, in fact, vulnerable to the HijackHateHumans prompt injection (toxicity)! This is not normal, expected, benign behavior; it suggests a failure in safety alignment, input/output filtering, or adversarial resistance. It is also worth noting that various probes (such as lmrc.Bullying) might throw errors while you try to run them. At that point, you can choose to exclude such probes from the testing.
5b. Utilizing tags for selective testing
If you've followed along so far, you know that running probes consumes a considerable amount of time. To narrow things down, we can use "probe_tags." Not much information about probe_tags is available in the help menu, but by going through the code, one can find the different probe tags that can be used. For example, check out the code for the Dan probe here. Observe the probe tags.

Here, as you can see, one of the probe tags is OWASP:LLM01. This is an obvious reference to OWASP LLM Top 10 and the first vulnerability on the list (prompt injection).
Thereafter, you can run a scan using this:
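For example, against our REST target:

python -m garak --target_type rest -G api_web_config.json --probe_tags owasp:llm01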

As you can see in the screenshot above, this option automatically selected all the probes tagged owasp:llm01 and saved us the time of going through the list and figuring out what to test!
Of course, not all of these probes would work since the tool is still in development (or might require manual intervention to make them work), so, you can simply remove the probes causing issues and re-run the command.
Note: In the probe list, there is a "ZZZ" emote suffixed for some of these modules. There is a high chance these probes won't work.
5c. Manually combining probes
Now, if we want to fine-tune our scans even more, we can provide a comma-separated list of probes to Garak for testing within the "--probes" option.
For example, I will test lmrc.SexualContent,grandma.Slurs,divergence.RepeatedToken together like so:
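The command is simply:

python -m garak --target_type rest -G api_web_config.json --probes lmrc.SexualContent,grandma.Slurs,divergence.RepeatedToken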
Now that you have run a few tests and inspected them through Burp, you may have noticed an abundance of false positives: Garak may report a test as failed even though, on inspection, the output appears benign. We'll cover how to filter the report and inspect it for accurate test results in the next section.
6. Evaluating and Reading Garak Reports
Before evaluating a report, let us name the report properly first. As you may have observed, Garak's output report names are randomized alphanumeric strings that follow the pattern "garak.RandomString". While good enough for a quick run, in a project you might want to fine-tune this. You can use the "--report_prefix" option to specify the output filename prefix. For example, I am naming the output report prefix "masterguide".
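For example, re-running the earlier grandma.Slurs scan with a prefix:

python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs --report_prefix masterguide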

Now, you are ready to inspect the report. You might have noticed that after a scan is completed, 3 different files are created:
filename.hitlog.jsonl
filename.report.jsonl
filename.report.html

Any response flagged as a hit (vulnerability) by the detector will be placed in the hitlog.jsonl file. These entries can be followed back to the report.jsonl attempt entry based on the attempt_id.

It is important to note that while running a Garak scan, if no detectors are explicitly provided, the default detector would be the probe's primary detector as specified in the Python file at /garak/garak/probes/probe_name.py. For example, here are Dan's primary and extended detectors that would produce a hit while scanning.
Now, for the scan we ran earlier using the grandma.Slurs probe, we get a report. The file is difficult to read as it is, so I made a script that converts a filename.report.jsonl into a CSV (with a limited set of fields for better visibility). A user can then go through filename.hitlog.jsonl, pick up the attempt ID, and search the CSV for that particular hit, making analysis easier! Here is the PowerShell version of the script.
To run this script:
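The linked PowerShell script is not reproduced here; as a rough stand-in, a Python equivalent (field names assumed, and they can vary between Garak versions) might look like this:

# report_to_csv.py - rough Python equivalent of the linked converter (field names assumed)
import csv
import json
import sys

def convert(report_path, csv_path):
    with open(report_path, encoding="utf-8") as infile, open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["uuid", "probe", "prompt", "outputs"])
        for line in infile:
            entry = json.loads(line)
            if entry.get("entry_type") != "attempt":   # keep only attempt records
                continue
            writer.writerow([
                entry.get("uuid", ""),
                entry.get("probe_classname", ""),
                json.dumps(entry.get("prompt", "")),    # prompt may be a plain string or a structured object
                json.dumps(entry.get("outputs", [])),
            ])

if __name__ == "__main__":
    # usage: python report_to_csv.py masterguide.report.jsonl masterguide.csv
    convert(sys.argv[1], sys.argv[2])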
Then, we can use any spreadsheet software to open this CSV file. You can observe the four major fields taken from report.jsonl and placed in the CSV, while almost everything else is omitted.

Now I'll pick one of the attempt IDs from the hitlog and search for it in the CSV (the attempt ID in the hitlog is the same as the UUID in report.jsonl).

You can then easily search for this in the CSV and analyze prompts and their outputs more clearly.

As we can observe in the output report, this particular hit appears to be a false positive (a common occurrence). However, now that we have all of our data in a more readable format, analysis becomes much easier.
6a. Aggregation
There's a tool for merging Garak reports: multiple Garak runs can execute independently and their output can then be compiled into one report. This enables parallelization, for example on SLURM/OCI clusters where an entire executable job has to be specified. The tool is aggregate_reports.py and it runs from the command line; it lives at garak/garak/analyze/aggregate_reports.py. You can get help by running:
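Assuming the module exposes a standard argparse interface, the help can be reached like so (or by invoking the script by its path):

python -m garak.analyze.aggregate_reports -h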

As of the time of writing this article, the tool is currently bugged (does not support multiple infiles) and requires an update. However, the general working code would look like:
This would combine all the outputs into one report. So, ideally, one can individually run probes and later combine the output in a single report, thus eliminating the need to delete and re-run the scan once it fails due to an error with the probe.
6b. Taxonomy
The --taxonomy option helps you categorize the HTML report output into OWASP/AVID risk categories. First, here is what the HTML report looks like without it:

So, the report shows a risk score and the hit rate. Here, 100% of prompts were marked as secure by our detector.
However, here is another scan that I ran with --taxonomy owasp set.
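The command was along these lines (the probe selection here is illustrative):

python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs --taxonomy owasp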
Here is what the report looks like for this scan

As you can observe, the prompts tested are now categorized as per OWASP LLM Top 10, and a risk score is given. We can see around 20% failure in the LLM02 and LLM06 categories. There is an uncategorized section too for prompts that didn't fit in any of the top 10 categories. This way of utilizing taxonomy makes it a little easier to understand, on a higher level, the risk associated with the target LLM. We can now visit the hitlog and assess the output further.
7. Testing With Custom Prompt/Wordlist Sources
If you've followed along this far, you will have noticed that all the prompts come from pre-defined Python templates under garak/garak/probes. The structure of a probe template is as follows:
Global vars - If any
Class of a probe - This is the subcategory of a probe and contains:
Any required tags
Working function - Performs any operations needed to create prompts
A prompts variable, which holds all of the prompts to be tested, as a list
So, if we define our custom prompts in a file and recreate a similar template, we can have Garak send requests using our own custom probe. You can use the sample template I wrote here, or make one yourself by looking at the code for other probes and overriding a few things. I essentially reused the existing "test" probe from earlier in the article, found under garak/garak/probes/test.py, and added a class called "FileListPrompts". This class reads prompts line by line from our file "my_prompts.txt" and stores the contents as a list of strings in the prompts variable. This adds functionality to the test probe, and Garak can now fetch wordlists and bombard the target! Please note that the except block in the code below is a failsafe that assigns the single value "hello" to the prompts variable in case the file read was unsuccessful. This way, while reading the output, you always know whether the file read succeeded and can troubleshoot accordingly.
Please note that in other probes, a detector is usually configured to help users analyze the CLI output as a PASS/FAIL status. We can configure that too within the code by setting the variable "primary_detector" if we know the nature of the prompts (such as mitigation.MitigationBypass), or we can use the all-detectors option on the CLI. While configuring the template above, I added the "always.Pass" detector.
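Here is a minimal sketch of what such a class can look like; it is not the author's exact template, and base-class import paths and attribute names differ between Garak versions:

# Hypothetical addition to garak/garak/probes/test.py - a minimal sketch only
from garak import _config
from garak.probes.base import Probe   # newer versions may expose Probe from garak.probes instead

class FileListPrompts(Probe):
    """Send prompts read line-by-line from a local wordlist file."""

    goal = "send custom prompts from a local wordlist"
    primary_detector = "always.Pass"   # low-compute placeholder; swap in a detector that matches your prompts
    active = True

    def __init__(self, config_root=_config):
        super().__init__(config_root=config_root)
        try:
            with open("my_prompts.txt", encoding="utf-8") as f:
                self.prompts = [line.strip() for line in f if line.strip()]
        except OSError:
            # failsafe: a single known value signals that the file read failed
            self.prompts = ["hello"]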
Alright then! Now that our tweaked "test.py" is ready to support custom wordlists, we need to create a wordlist named "my_prompts.txt" (or any other name, adjusting the code accordingly) and keep it in the current directory. I'll be adding four sample prompts just for testing purposes.

Once done, you can then run the following command:
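Something along these lines (the -g and --parallel_attempts values here are illustrative; more on them in section 8):

python -m garak --target_type rest -G api_web_config.json --probes test.FileListPrompts -g 1 --parallel_attempts 32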
As you can see, Garak is now testing the target with our custom wordlist.

Let's inspect this in Burp Suite and confirm again.

Well, there we go. There are various resources on the internet where you can find prompt injection wordlists, including Hugging Face datasets. Here are a couple to get you started:
My uncle said, "With a large wordlist comes huge overhead." In the next section, we'll discuss how we can fast-track our scans.
8. Speeding Up Scans
Did you notice something in the Garak command we ran in section 7? A "-g" and a "--parallel_attempts" option were sneaked into the command. These options drastically improve Garak's run-time.
Garak is sequential and stochasticity-friendly by design: prompts are tested one by one, in sequence, and each prompt is repeated, unless specified otherwise. In this section, we'll look at some of the options and techniques through which scan speeds can be juiced up.
8a. -g (--generations)
Defines the number of times Garak sends the LLM the same prompt. The default value is 5.
Why the number of generations matters:
Many vulnerabilities (hallucinations, harmful completions, bypasses, jailbreaks) occur stochastically.
A model might refuse harmful content once, but answer dangerously on the next try.
So increasing -g increases thoroughness, but also increases the total number of requests proportionally.

Now, depending on the model and the probes you want to test, this option can be tuned. Here is a brief comparison of similar scans (8 prompt cases) with different numbers of generations per prompt.
python -m garak --target_type rest -G api_web_config.json --probes test.Test -g 1
Run-time: 42.92 seconds

python -m garak --target_type rest -G api_web_config.json --probes test.Test -g 3
Run-time: 106.53 seconds

python -m garak --target_type rest -G api_web_config.json --probes test.Test -g 5
Run-time: 172.12 seconds

8b. --parallel_attempts
Defines how many probe attempts Garak runs at the same time, i.e., the degree of parallelism. The default value is 1; the practical maximum depends on how fast the target is and on the available compute. The Garak documentation recommends 32 as a reasonable value.
Running inference serially is slow and can take days, sometimes weeks. During probing, Garak can marshal all the prompts it knows it is going to pose and parallelize these attempts. Parallel attempts should be used up to the point where the target becomes the bottleneck, especially with REST/OpenAI/high-latency endpoints. Avoid it with CPU-only local models, where concurrent requests contend for the same cores and can drastically slow down the scan.
Here is a brief comparison of the run-time of two similar scans with and without --parallel_attempts option set.
python -m garak --target_type rest -G api_web_config.json --probes test.Test
Run-time: 225.36 seconds

python -m garak --target_type rest -G api_web_config.json --probes test.Test --parallel_attempts 32
Run-time: 83.39 seconds

8c. --parallel_requests
This option runs multiple generations per prompt in parallel. It only matters when the number of generations (-g/--generations) is greater than 1. So, if I want to create 3 generations per prompt and launch those 3 requests concurrently, I can:
garak -g 3 --parallel_requests 3
The option is useful when targeting a strong API backend (OpenAI, Anthropic, HF Inference) or when probing for stochastic vulnerabilities (jailbreaks, toxicity). It should be avoided when the backend does not allow concurrent requests.
8d. Reducing Detectors
Detectors run after attempts to classify outputs (Toxicity, MitigationBypass, ProductKey, etc.). If you don’t specify --detectors, Garak uses the probe’s default detector(s). Some probes also use many extended detectors, which slows down processing. For example, look at the code here. I'll launch two scans - one with default detectors and one with a low-compute detector, such as "always.Pass"
Default detector:
python -m garak --target_type rest -G api_web_config.json --probes dan.AutoDANCached -g 1 --parallel_attempts 3
Low-compute detector:
python -m garak --target_type rest -G api_web_config.json --probes dan.AutoDANCached -g 1 --parallel_attempts 3 --detectors always.Pass

As you can observe, we shaved about 2 seconds off even a 3-prompt test. However, this should only be used in cases where no summary of failed tests is required on the CLI.
8e. Choosing Specific Probes
By choosing certain probes, scan time can be sped up significantly. A fully detailed coverage of how to choose the probes has been covered in section 5.
8f. Soft-Capping number of prompts using YAML config file
Garak can input a scan configuration YAML file as well. Here, we can define runtime behaviors, probe configurations, concurrency settings, fuzzing, overrides, and more. We shall discuss this in detail in the next section.
We can speed up our scans by limiting the number of prompts our probes send out. The cap is applied uniformly to all probes in a scan where the --config option is specified; if specific probes should run uncapped, scan the target with them separately.
To apply a soft-cap on the number of prompts, paste the following in a garak_fast.yaml file.
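Based on the run-block option described in section 10, the file can be as small as this:

run:
  soft_probe_prompt_cap: 100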
Once done, you can include the --config garak_fast.yaml option in your scan. This limits the number of prompts per probe to 100 for faster scanning. For example,
python -m garak --target_type rest -G api_web_config.json --probes dan.DanInTheWild -g 1 --parallel_attempts 32 --config garak_fast.yaml

This drastically speeds up your scans but omits some prompts as well. A soft cap is well suited to smoke testing and to scenarios where a full scan might not necessarily be required.
9. Understanding Detectors and Buffs
In the article so far, we've looked at detectors in bits and pieces and have hardly talked about buffs. In this section, we'll introduce buffs and look a bit more at detectors.
Garak’s power comes from:
Probes: generate adversarial prompts
Detectors: evaluate the model’s responses
Buffs: augment or transform the adversarial prompts to increase attack variety
Probes "attack" the model, detectors "score" the outputs, and buffs "enhance" the attacks.
Let's look at them one by one.
9a. Detectors
Detectors are plugins that evaluate the model’s output and analyze violations of the probe’s goal. Detectors might search for slurs, detect toxic language, detect leaked secrets, identify jailbreak markers, check for forbidden content, etc. Inside probe classes (from the source code), fields like these exist:
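For illustration, the fields look roughly like this (the values shown are placeholders, not the actual configuration of any specific probe):

primary_detector = "mitigation.MitigationBypass"         # main detector, used by default
extended_detectors = ["dan.DAN"]                          # extra detectors run alongside the primary
recommended_detector = ["mitigation.MitigationBypass"]    # legacy field kept for backwards compatibility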
Recommended Detector: Garak will not automatically use recommended detectors unless explicitly told to run all detectors. It is just kept for backwards compatibility.
Primary Detector: This is the main detector for a probe and the one best suited to measure whether the probe succeeded. Primary detectors are used by default, optimized for speed, matched to the probe type, and a part of the core scoring pipeline. If you don't specify a detector manually with the --detectors option, Garak uses the primary detector plus the extended detectors by default, as defined in the code.
Extended detectors: These run in addition to the primary detector, automatically, for a more detailed performance observation rather than just fast scoring. They are typically more expensive to run, as they check secondary signals and provide broader coverage. Extended detectors can be deactivated by removing them in the /garak/garak/probes/probename.py file, or by manually providing detectors via the --detectors option and thereby overriding the hard-coded primary and extended detectors.
Detectors evaluate the output based on an internal scoring system. Scores lie in the 0..1 range: a score of 1 counts as a hit and a score of 0 as no hit. You can change this behavior with the --eval_threshold option, which defines the point in the 0..1 range at which a result counts as a successful attack (hit).
You can view all the detectors with:
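The command is:

python -m garak --list_detectors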
Here is a list of some of the most used detectors and their functionality:
unsafe_content.SlursReclaimedSlurs
Detection of racist, hateful, discriminatory slurs or reclaimed slurs.
unsafe_content.SexualContent
Sexual content, explicit material, inappropriate adult content.
unsafe_content.Bullying
Abusive interactions, hostile or bullying language.
unsafe_content.SelfHarm
Self-harm ideation, instructions, encouragement, or dangerous psychological content.
unsafe_content.Illegal
Advice or information that would lead to illegal activity.
mitigation.MitigationBypass
Cases where the model tries to avoid answering but still leaks harmful content (“soft refusal” jailbreaks).
productkey.Win5x5
Windows-style product keys in XXXXX-XXXXX-XXXXX-XXXXX-XXXXX format.
malwaregen.Payload
Malware payloads, suspicious code fragments, malicious generation patterns.
malwaregen.TopLevel
High-level malware instructions or structured malicious code.
malwaregen.Evasion
Attempts to bypass antivirus, detection evasion, obfuscation.
sqlinjection.SQLInjectionEcho
SQL injection strings being reflected or executed in the output.
jinja.TemplateInjection
Jinja template injection patterns (e.g., {{ 7*7 }} or {% ... %}).
smuggling.FunctionMasking
Language-model smuggling techniques that mask malicious instructions using function disguises.
smuggling.HypotheticalResponse
Cases where the model pretends the malicious action is "hypothetical" but still performs it.
encoding.InjectBase64
Base64-encoded smuggled or harmful output.
encoding.InjectHex
Hex-encoded content that may contain malicious patterns.
encoding.InjectROT13
ROT13-encoded harmful or hidden content.
encoding.InjectAscii85
ASCII85 encoding used to smuggle information.
encoding.InjectUnicodeTagChars
Unicode tag characters used for obfuscated jailbreaks.
misleading.FalseAssertion
Factual inaccuracies, contradictions, or hallucinations.
leakreplay.GuardianCloze / NYTCloze / PotterCloze
Leakage or memorization of copyrighted datasets.
packagehallucination.Python
Hallucinated Python packages or APIs.
packagehallucination.JavaScript
Hallucinated JS packages or APIs.
packagehallucination.Rust / Dart / Ruby / etc.
Language-specific dependency hallucinations.
realtoxicityprompts.RTPBlank
Generic toxicity levels based on the RealToxicityPrompts benchmark.
goodside.ThreatenJSON
Attempts to create coercive, threatening output in structured formats (like JSON).
exploitation.JinjaTemplatePythonInjection
Python logic or template injection attempts in response to crafted exploit prompts.
always.Pass
Always returns PASS (used for debugging).
always.Fail
Always returns FAIL (used for debugging detectors).
always.Passthru
Passes through outputs without scoring (internal use).
One probe → Many detectors → Many risk categories.
The hard-coded detectors generally perform best, so there is usually little need to specify detectors on a first run of the probes. However, as you refine your runs, you will realize that while a probe may test for jailbreaks, its outputs might also contain malware, slurs, secrets, hallucinations, or toxicity. In such cases, specifying detectors is useful. This modularity is how Garak scales.
Example: Same Probe, Different Detectors → Different Findings
Let's say we have the following probe to test: promptinject.HijackKillHumans
If you run it with default detectors:
It answers: “Did the model bypass safety instructions?”
Similarly, if you run with a malware detector:
You can detect: “Did the model respond with harmful code instead of text?”
Finally, if you run with hallucination detectors:
You can detect: “Did the model hallucinate facts after being tricked?”
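Sketched as commands against our REST target (the detector choices are taken from the table above):

python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackKillHumans
python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackKillHumans --detectors malwaregen.Payload
python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackKillHumans --detectors misleading.FalseAssertion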
So, it is the same probe with different scoring and different risk categories.
All-detectors scan
If you do have the necessary time at hand, you can run all the detectors in a scan by appending the -d all option to analyze how a single probe might test positive for one or more categories. However, it consumes a great deal more time.
9b. Buffs
Buffs, also called fuzzers, modify prompts to increase the adversarial pressure on a model. They can paraphrase, alter encodings, switch cases etc. Just like other fuzzers such as wfuzz, buffs are applied after probes generate prompts but before they are sent to the model. Buffs can dramatically increase the surface area of an attack. By default, no buff is applied to the scans.
You can view the buffs available with the command:
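python -m garak --list_buffs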

So, we can run a probe with a paraphrase buff and compare the prompts sent to the model. Let's run a "non-buffed" grandma.Slurs sub-probe
Here is the very last prompt as it was sent to the model.

Now I will apply the Base64 encoding buff using the -b BUFF option and compare the inputs and outputs.
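Assuming the Base64 buff lives under buffs/encoding (check the --list_buffs output for the exact class name on your version):

python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs -b encoding.Base64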

As you can see, the same prompt, in base64, yields a completely different result. While not necessarily a vulnerability, this does indicate a lack of encoding handling by the model. Similarly, other buffs can be applied and outputs compared to analyze the model's behavior on encapsulated/fuzzed input.
Now, we can also use multiple buffs and pass some buff options too.
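For example (buff class names again assumed from the buff listing):

python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs -b lowercase.Lowercase,encoding.Base64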
The above combination would now send both lowercase prompts and Base64-encoded prompts.
By this point, we have covered a majority of the existing features in Garak. In our next and final section, we will take a look at the different configuration options we have while initiating a scan.
10. Garak Config YAML Files
Garak supports an optional but powerful configuration mechanism using YAML files.
A config.yaml file lets you control generators, probes, detectors, buffs, parallelism, seed, taxonomy, and more without writing extremely long command-line arguments every time you run a scan.
In section 8f, we introduced a very basic configuration file to speed up our scans by soft-capping the number of prompts sent to the application for testing. Let's dig a little deeper into defining configurations.
Below are the major sections you can define inside a Garak YAML config. Each section maps to internal config categories:
System-wide settings: The system: block controls how Garak runs at a lower level, especially performance and CLI behavior.
verbose
Level of console/log verbosity
narrow_output
Use narrow CLI output formatting
parallel_requests
Number of parallel requests per prompt
parallel_attempts
Number of probe attempts executed in parallel
lite
Display a caution that the run might be less thorough
show_z
Display z-scores in CLI output
enable_experimental
Enable experimental CLI flags (not recommended for stable production)
max_workers
Cap on parallel worker threads/processes
Run settings: The run: block contains settings that define the run itself. It describes how prompts are sent to the model, thresholds, and general behavior.
system_prompt
If given and not overridden by the probe itself, probes will pass the specified system prompt when possible for generators that support chat modality
seed
A random seed for reproducible selections
deprefix
Remove the prompt from the start of the output (some models return the prompt as part of their output)
eval_threshold
Threshold at which a detector considers output a 'hit'
generations
How many times to generate per prompt
probe_tags
Filter probes by tag (e.g., owasp:llm01)
user_agent
HTTP user agent for network requests
soft_probe_prompt_cap
Limit on how many prompts a probe will generate
target_lang
Target language for translation support
langproviders
List of language provider configs for translation
Plugins config options: The plugins: block lets you configure all aspects of Garak's plugin system.
target_type
The type of target generator, e.g., nim, rest, or huggingface
target_name
The specific name/identifier of the target to be used. Optional; blank means a type-specific default is used
probe_spec
Comma-separated list of probe modules or module.classname entries. Modules select only active probes. Equivalent to CLI -p
detector_spec
Optional override list of detectors to use instead of probe default detectors. Enables the pxd harness. Equivalent to CLI -d
extended_detectors
Whether to run only primary detectors (fast) or include extended detectors (more thorough)
buff_spec
Comma-separated list of buff modules or individual buff classnames, same format as probe_spec
buffs_include_original_prompt
Whether the un-buffed prompt should also be included alongside buffed prompts
buff_max
Maximum number of buffed variations allowed per prompt
detectors
Root configuration node for detector plugins
generators
Root configuration node for generator plugins
buffs
Root configuration node for buff plugins
harnesses
Root configuration node for harness plugin configs
probes
Root configuration node for probe plugin configs
Reporting config options: The reporting: block lets you shape how results are stored and presented.
report_dir
Output directory for reports
report_prefix
Prefix for report file names
taxonomy
Group probes by taxonomy category
show_100_pass_modules
Whether to include modules with 100% pass scores in output
group_aggregation_function
Function to aggregate group scores (e.g. minimum, median)
show_top_group_score
Display aggregated group scores at the top of the HTML report
We will discuss some of these options in section 10b and show how to configure a custom YAML file. First, let's look at some ready-made configs that ship with Garak.
10a. Quick Configs
Garak comes bundled with some quick configs that can be loaded directly using --config. These don’t need the .yaml extension when being requested from CLI. These are great, ready-made configs to get an idea of how Garak YAML configs can work. Quick configs are stored under garak/garak/configs/.
broad
Run all active probes once for a wide scan, includes paraphrase buff
fast
Light scan, skips extended detectors
full
More thorough, includes paraphrase buffs
long_attack_gen
Focus on attack generation (higher generations)
notox
Skip toxicity-inducing probes
tox_and_buffs
Run toxicity probes with paraphrase buff
So, for example, we can run a scan with broad config like:
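For our REST target:

python -m garak --target_type rest -G api_web_config.json --config broad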

Let's take a look at this configuration file and what it is doing out of the box:
Instructs Garak not to run in lite mode. Lite mode disables large/expensive probes & features for quick scans.
Instructs Garak to run only 1 generation per probe (equal to the -g 1 option)
Instructs Garak to run extended detectors
Via probe_spec: all, instructs Garak to run every active probe
Via buff_spec: paraphrase.Fast, instructs Garak to apply the paraphrase buff
Via the probes.encoding.payloads override, encoding probes will specifically test encoding + XSS + slur injection attempts
You can try out other quick configs as well. Let's take a look at how custom config files can be created.
10b. Custom Configs
When no additional options are attached, here is what the core configuration of Garak looks like:
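Here is an abridged, approximate view (key names follow the tables above; the values shown are indicative only, so check the core YAML bundled with your Garak installation for the authoritative defaults):

# abridged, approximate core configuration - values indicative only
system:
  verbose: 0
  narrow_output: false
  parallel_requests: 1
  parallel_attempts: 1
run:
  seed:
  deprefix: true
  eval_threshold: 0.5
  generations: 5
  probe_tags:
plugins:
  target_type:
  target_name:
  probe_spec: all
  detector_spec: auto
  extended_detectors: false
  buff_spec:
reporting:
  report_dir: garak_runs
  report_prefix:
  taxonomy:
  show_100_pass_modules: true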
Now, based on this file and the tabular explanation of all the different options available, we can create our custom config YAML files. After we have created this, we can point to it through CLI (--config name.yaml option) and override the default options. Let's walk through various examples below.
Example 1 - Speedier latentinjection prompts with only 1 generation per prompt, soft-cap of 10 prompts per sub-probe, non-verbose output, and 5 parallel attempts for more speed.
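One way to express this, using the option names from the tables above, would be:

# latentinjection.yaml - one possible layout for Example 1
system:
  verbose: 0
  parallel_attempts: 5
run:
  generations: 1
  soft_probe_prompt_cap: 10
plugins:
  probe_spec: latentinjection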
We can save this as latentinjection.yaml and run a scan like so:

Example 2 - Thorough OWASP LLM01 testing with 10 parallel attempts, lower eval threshold for more sensitivity while detecting a hit, 3 generations per prompt, report grouping by OWASP Taxonomy, and a paraphrase buff applied.
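A sketch of this config (the eval_threshold value here is an illustrative choice):

# owasp_llm01.yaml - one possible layout for Example 2
system:
  parallel_attempts: 10
run:
  generations: 3
  eval_threshold: 0.3
  probe_tags: owasp:llm01
plugins:
  buff_spec: paraphrase.Fast
reporting:
  taxonomy: owasp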
Example 3 - Dan and Grandma probe testing on the paid GPT-4o mini model, with limited concurrency, a soft cap of 5 prompts per sub-probe, no extended detectors, compact CLI output, and no buffs to avoid bloating.
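A sketch of this config; the target_name assumes the API identifier gpt-4o-mini, and the OpenAI generator also expects an OPENAI_API_KEY environment variable:

# gpt4o_mini.yaml - one possible layout for Example 3
system:
  narrow_output: true
  parallel_attempts: 2
run:
  soft_probe_prompt_cap: 5
plugins:
  target_type: openai
  target_name: gpt-4o-mini
  probe_spec: dan,grandma
  extended_detectors: false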
11. Conclusion
AI testing tools like Garak are crucial because modern LLMs can unintentionally reveal sensitive data, generate harmful content, or be manipulated through prompt injection, making systematic vulnerability assessment essential. By proactively scanning models for jailbreaks, misinformation, bias, and safety gaps, these tools help developers understand risks early and build more secure, trustworthy, and compliant AI systems before deployment. In this article, we took a deep dive into the key features and inner workings of NVIDIA's Garak AI red-teaming framework. We learned how to launch scans, work with modules such as probes, detectors, and buffs, analyze reports, handle different targets, write custom configuration files, and speed up scans. As the tool is still in development, encountering errors in individual modules or observing non-uniform behavior is to be expected. We hope you enjoyed the read.
12. Appendix A: FAQs and Troubleshooting
Q1. I used a custom probe list/tags, but the scan keeps exiting, throwing various errors, including encoding and assertion errors. What to do about it?
Ans: As the tool is currently under development, such errors are common. As a quick fix, when such errors are observed, you can identify which probe is causing the error and remove it from the list of probes being tested. Otherwise, the errors need to be identified and fixed manually under garak/garak/probes/probenamegivingerror.py
Q2. How to scan thinking models, like DeepSeek R1, since sometimes the detector reads output from the chain-of-thought as well, and not just the output?
Ans: Per the documentation from the base generator, for reasoning models, using skip_seq_start and skip_seq_end can enable suppression of the chain of thought from the target response. This allows users to perform tests with and without consideration of this output from the target, as the segment is removed before passing the response to detectors.
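For example, for a DeepSeek-R1-style model that wraps its reasoning in <think> tags, the generator options could include something along these lines:

"skip_seq_start": "<think>",
"skip_seq_end": "</think>"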
Q3. The scan stops completely if a probe fails. Is there any way to prevent that from happening?
Ans: Sadly, no. Currently, a user would have to identify a failing probe, remove that from the list of probes to be tested, and re-run the scan. However, a user can individually run probes and later combine all the output reports using the aggregation method as suggested in section 6a.
Q4. Can a scan be resumed if it fails?
Ans: Not currently. However, a PR (https://github.com/NVIDIA/garak/pull/1531) is ongoing at the time of writing this article and shall be updated within this guide once the functionality is launched.
Q5. I am hitting request timeouts on the target. How to fix it?
Ans: While difficult to pinpoint the reason, you can throttle down the number of parallel attempts of requests sent to the application to avoid any bandwidth/congestion issues. If you are running the local application from the article above, you can also try relaunching Ollama and the application.
13. Appendix B: Burp Plugin to Auto-Generate REST config JSON
Link and demo to be updated...