🧠 Master Guide to AI Red-Teaming using NVIDIA Garak

References: https://github.com/NVIDIA/garak | https://garak.ai/garak_aiv_slides.pdf | https://garak.ai | https://reference.garak.ai/en/latest/

Author: Harshit Rajpal, Security Engineer, Bureau Veritas Cybersecurity North America

Introduction

In this guide, we will explore Garak – an open-source Generative AI Red-teaming and Assessment Kit by NVIDIA – and how to use it for scanning Large Language Models (LLMs) for vulnerabilities. We'll cover everything from installation and setup to running scans, focusing on key features like connecting Garak to different LLM interfaces (including a local REST API chatbot), using specific probes (e.g. jailbreaking attacks), customizing prompts, speeding up scans, understanding Garak's components, writing your own plugin, and interpreting Garak's output reports. This comprehensive, step-by-step walkthrough will feel like a technical whitepaper, complete with code examples, command-line usage, and references to official documentation and community insights.

Table of Contents

  1. Installation and Environment Setup
  2. Getting Started With Garak
  3. Scanning LLM Interfaces with Garak
  4. Proxying Garak Through Burp Suite
  5. Understanding Probes
  6. Evaluating and Reading Garak Reports
  7. Testing With Custom Prompt/Wordlist Sources
  8. Speeding Up Scans
  9. Understanding Detectors and Buffs
  10. Garak Config YAML Files
  11. Conclusion
  12. Appendix A: FAQs and Troubleshooting
  13. Appendix B: Burp Plugin to Auto-Generate REST config JSON

1. Installation and Environment Setup

Since Garak has its own dependencies, this guide will use Conda. Conda is a powerful command-line tool for package and environment management that runs on Windows, macOS, and Linux.

  • Python: 3.10

  • OS: Windows 10+, Linux, or macOS

  • RAM: Minimum 8GB (more if using local LLMs via Ollama or transformers)

Optional: Ollama (to serve a local model such as llama2) and Burp Suite (to proxy and inspect scan traffic); both are used later in this guide.

Let's get started with the setup.

I will be using a Windows 10 host in this guide; feel free to substitute the equivalent commands for your OS. The key alternatives are noted where they differ.

First, let's get Conda up and running. You can choose your installer here and then use the following commands for download and installation.
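
For example, on Linux you can fetch and run the Miniconda installer from the command line, while on Windows you can download and launch the .exe installer (the URLs below point at the generic "latest" Miniconda builds and may differ from the installer you picked):

    # Linux
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh

    # Windows (PowerShell)
    Invoke-WebRequest https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -OutFile miniconda-installer.exe
    .\miniconda-installer.exe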

Follow the standard installation process. Once done, from within your project folder (mine is C:\Users\hex\Desktop\Garak), verify the installation by running the conda command.

We are ready to set up a new environment for Garak.
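
A typical sequence looks like this (the environment name is arbitrary; Garak is published on PyPI, so pip works inside the Conda environment):

    conda create -n garak python=3.10
    conda activate garak
    pip install -U garak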

Once installed, Garak provides a command-line interface. To see basic usage, run garak -h

2. Getting Started With Garak

To quickly verify the setup, run Garak with a sample LLM generator and a probe.

This uses Garak’s built-in test components:

  • Generator: test.Blank – a mock model which can be specified with --target_type

  • Probe: test.Blank – sends a dummy input, which can be specified with --probes
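
A minimal smoke-test invocation along these lines exercises both test components (flag names as in current Garak releases; older releases use --model_type instead of --target_type):

    python -m garak --target_type test.Blank --probes test.Blank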

As you may have observed, the JSONL and HTML summary reports have been written to the default directory ~/.local/share/garak/garak_runs/

2a. Modules

Now, a bit about the various modules in Garak. The major components, along with a description and a basic example of each, are as follows:

  • Probes – Probes are the attackers. They generate specific prompts or input scenarios to test the LLM for vulnerabilities or behavioral weaknesses. Each probe targets a particular issue like jailbreaks, injections, or bias. Example: dan.DUDE sends an "ignore all instructions"-style roleplay prompt to test guardrail bypasses.

  • Generators – Generators are the LLM interfaces that Garak queries; they handle sending prompts and retrieving model responses. They abstract away API calls or local model inference. Example: ollama.OllamaGenerator connects Garak to a locally running LLaMA 2 model via Ollama's REST API.

  • Detectors – Detectors analyze model outputs to decide whether a failure or unsafe behavior occurred. They can check for toxicity, leakage, or rule-breaking based on text analysis or regex. Example: unsafe_content.SlursReclaimedSlurs flags model responses containing slurs or hateful terms.

  • Buffs – Buffs augment, constrain, or perturb the set of attempts (the raw attack inputs) before they are sent to the model. This can reveal vulnerabilities that wouldn't show up with the raw attacks alone. Example: the encoding.Base64 buff wraps a prompt in base64 encoding before sending it to the model.

By combining these, we can tailor our scans.

3. Scanning LLM Interfaces with Garak

In Garak, a "generator" (also referred to on the CLI as a "target type") is the component that wraps a particular LLM interface: it sends prompts to a model backend and receives completions (or chat responses) back. Garak supports a wide variety of backends: local models (via Hugging Face transformers or GGML), remote/cloud APIs (OpenAI, Cohere, Replicate, etc.), and custom REST endpoints. For local usage, such as running Ollama on your machine, Garak includes a dedicated generator, OllamaGenerator (plus a chat-mode variant), under garak.generators.ollama. For arbitrary REST-based LLMs or chatbots, e.g. a model wrapped behind a custom HTTP API, there is a generic REST generator, RestGenerator, under garak.generators.rest.

So, generators = LLM interfaces, and connecting to LLM interfaces means telling Garak which backend to use (local, cloud API, custom REST) via --target_type / --target_name (formerly --model_type / --model_name) when you invoke it.

3a. Example

In this section, we'll see how to set up specific target types for scans. You can use Ollama (for example) by setting it up from the link in section 1a. Assuming Ollama is running locally on its default host/port (e.g., 127.0.0.1:11434) and you have an LLaMA-based model loaded (e.g., “llama2” which can be installed with the CLI command: ollama run llama2), you can provide the target_type as "ollama" and target_name as "llama2".
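
Assuming Ollama is already serving llama2 as described, a first scan looks roughly like this (the probe choice is arbitrary; dan.DUDE is used again later in this guide):

    python -m garak --target_type ollama --target_name llama2 --probes dan.DUDE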

You can inspect the report at the location indicated in the STDOUT. Here is a list of the major target types available:

  • huggingface – Local models via Hugging Face Transformers (e.g., GPT-2, LLaMA-based models, etc.)
  • huggingface.InferenceAPI – Hugging Face hosted Inference API; remote models via Hugging Face's API
  • huggingface.InferenceEndpoint – Private/custom Hugging Face inference endpoints (self-hosted or private)
  • openai – OpenAI API (ChatGPT / GPT-3.x / GPT-4, etc.)
  • replicate – Models hosted on the Replicate platform, both public and private models/versions
  • cohere – Models hosted via Cohere's API (when supported by Garak)
  • ggml – GGML / GGUF models (e.g., for use via local binaries like llama.cpp)
  • ollama – Local models served via Ollama, for example local LLaMA-based models behind Ollama's REST API
  • rest – A generic REST-based generator for wrapping any custom HTTP/JSON API (e.g., a self-hosted FastAPI endpoint)
  • test – Built-in "test" generators for mock testing (test.Blank, test.Repeat, test.Single, etc.), for dry runs or plugin testing
  • mistral – Support for Mistral-family models (via a dedicated generator)
  • groq – Support for models via the Groq API/backend (if configured)
  • litellm – Support for models via a LiteLLM backend (lightweight LLM interface)
  • nemo, nim, nvcf – Specialized or vendor-specific backends (NeMo / NVIDIA-specific), e.g. for multimodal or proprietary LLMs
  • rasa / Rasa-based generator (e.g., RasaRestGenerator) – Interface to Rasa-based REST endpoints / LLM-powered chat services via Rasa-style APIs
  • watsonx – IBM watsonx (or similar) LLM APIs if configured, for enterprise LLM backends
  • guardrails – A generator wrapper integrating with guardrails / safety-wrapped LLMs via a protective interface (e.g., NeMo Guardrails)

The following command can provide a full list of generators:
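
(Flag name as in current Garak releases; check garak -h if it differs in your version.)

    python -m garak --list_generators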

3b. REST Interface

While there are various interfaces, the REST interface is the most useful for red-teaming and pentesting-like situations, as many AI assistants encountered in the field expose a REST API-based web application bound to their own configured LLM. In this article we don't have a custom LLM implementation of our own; instead, we'll scan a standard off-the-shelf llama2 model and inspect the security risks associated with it.

REST API based AI assistant lab setup

For this, our lab set-up would look like this:

a. Ollama running the llama2 model locally.

b. A minimal FastAPI app that proxies POST requests from /generate to http://127.0.0.1:11434/api/generate (Ollama’s default REST endpoint).

c. A static HTML + JS page (served from /static/index.html) that provides a basic chat-style UI: a textarea to enter a prompt, a “Send” button, and a div to show user + AI messages.
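
For illustration, a minimal version of such a proxy could look like the sketch below (hypothetical app.py; the linked repository is the authoritative version and also ships the static chat page):

    # app.py - minimal FastAPI proxy in front of Ollama (illustrative sketch)
    import requests
    from fastapi import FastAPI
    from fastapi.staticfiles import StaticFiles
    from pydantic import BaseModel

    OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default REST endpoint

    app = FastAPI()
    app.mount("/static", StaticFiles(directory="static"), name="static")

    class GenerateRequest(BaseModel):
        prompt: str
        model: str = "llama2"
        stream: bool = False

    @app.post("/generate")
    def generate(req: GenerateRequest):
        # forward the body to Ollama and return its JSON as-is, so the "response"
        # field is available both to the front-end and to Garak
        r = requests.post(
            OLLAMA_URL,
            json={"model": req.model, "prompt": req.prompt, "stream": req.stream},
            timeout=120,
        )
        return r.json()

You would then serve it with something like uvicorn app:app --host 127.0.0.1 --port 8000 (after pip install fastapi uvicorn requests).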

Here is the link to the app you can use. Once you've cloned the repository, you can run the following commands:

Once done, you can launch the URL http://127.0.0.1:8000/static/index.html in a web browser and test whether the AI assistant is working.

You can inspect the chat prompt you entered and copy the request as bash or PowerShell as needed. Here is the PowerShell version of the CLI command you can send to this interface.
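
For reference, a hand-rolled PowerShell call to the same endpoint looks roughly like this (the body fields mirror what the chat page sends; the prompt text is arbitrary):

    Invoke-RestMethod -Uri "http://127.0.0.1:8000/generate" -Method Post -ContentType "application/json" -Body '{"model": "llama2", "prompt": "Hello there!", "stream": false}'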

Here is a sample HTTP request in Burp format that can be utilized to call the LLM we just set up through Burp. You can also proxy the CLI to Burp and issue a curl command.

Now that we have a fully working REST API based AI assistant, we can begin to scan the LLM with Garak. Garak's RestGenerator allows you to target any REST/HTTP endpoint, as long as you tell it how to format requests, which method (POST/GET) to use, what headers to send, and how to extract the response text. Full documentation can be found here.

Now, for our specific app, the api_web_config.json file would look like so:
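
A configuration along the following lines matches the fields explained below; the nesting under rest / RestGenerator follows Garak's REST generator documentation, so double-check it against the linked docs for your version:

    {
       "rest": {
          "RestGenerator": {
             "name": "local llama2 chat proxy",
             "uri": "http://127.0.0.1:8000/generate",
             "method": "post",
             "headers": {
                "Content-Type": "application/json"
             },
             "req_template_json_object": {
                "model": "llama2",
                "prompt": "$INPUT",
                "stream": false
             },
             "response_json": true,
             "response_json_field": "response",
             "request_timeout": 120
          }
       }
    }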

Explanation of fields:

  • "name" – Friendly name used to identify this generator in logs/reports
  • "uri" – The full URL of the REST endpoint (pointing at our proxy)
  • "method" – HTTP method; post in our case
  • "headers" – HTTP headers to send; at minimum Content-Type: application/json
  • "req_template_json_object" – The JSON body template: the prompt field uses "$INPUT", which Garak replaces with the actual prompt text of each probe; the other fields (model, stream) are fixed
  • "response_json" – true, because our endpoint returns JSON
  • "response_json_field" – The JSON field in which the model output resides; in our case "response" (matching the JSON we saw earlier)
  • "request_timeout" – (Optional) timeout in seconds; useful if generation is slow

Now that our web configuration file is set, we can run our first test on this web application.
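
A first run can simply reuse the built-in test probe against the new REST target:

    python -m garak --target_type rest -G api_web_config.json --probes test.Test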

Throughout the article, we shall be targeting this application.

Now, we have successfully run a sample test! The only problem is, we have no visibility into how prompts are crafted and sent, or what has been tested. Let's talk about proxying Garak through Burp Suite, so we can see exactly which prompts are crafted and tested.

4. Proxying Garak Through Burp Suite

You can add the "proxies" option in the api_web_config.json file and configure this to the port Burp Suite is listening on.
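
For example, with Burp listening on its default 127.0.0.1:8080, an extra block like the following inside the RestGenerator section should do it (the mapping follows the usual requests-style proxies format; adjust the port to your Burp listener):

    "proxies": {
       "http": "http://127.0.0.1:8080",
       "https": "http://127.0.0.1:8080"
    }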

Now, we can also use one of the probes, called "dan.DUDE", for a sample run. DAN is a "roleplay" jailbreak / prompt injection category that instructs the LLM to respond both as itself and as "DAN", which stands for "Do Anything Now"; as the name suggests, DAN can do anything now. More on some of the other prompts in later sections.
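
The run itself is a one-liner against the same REST config:

    python -m garak --target_type rest -G api_web_config.json --probes dan.DUDE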

Now, in Burp Suite's HTTP history, you can see all the prompts that were crafted and tested by Garak.

As the STDOUT states, 5/5 PASS means the LLM is not vulnerable to dan.DUDE. A quick overview is available in the HTML file at the default location $HOME/.local/share/garak/garak_runs/garak.<UUID>.report.html

Here is how the HTML stats report looks:

5. Understanding Probes

So far, we have covered setup, testing different interfaces (generators), and not only scanning a custom REST-based API endpoint, but also coding a sample REST app that binds to a local LLM instance (Ollama). Now, we will explore the different probes that Garak tests a target against.

In Garak, probes are the core building blocks of vulnerability scanning. Each probe represents a class of tests designed to elicit specific unwanted behaviors from a language model, such as prompt injection, latent context exploitation, bullying encouragement, or anthropomorphization. Probes generate crafted inputs that target known failure modes or attack vectors, and detectors assess whether the model’s outputs indicate a problem.

To see all probes available in your Garak installation:
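
(As with generators, the listing flag below matches current releases; check garak -h if yours differs.)

    python -m garak --list_probes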

In the output, you'd observe various probes marked with a "💤" (zzz) emoji. These are inactive probes and may not work in your scans. Probes without this marker are active.

Here is a summary of the probes available, divided into categories of probes, a few important sub-probes, and when to use them (or what exactly they test):

  • ansiescape – Tests whether models output annoying or harmful ANSI control codes (e.g., terminal escapes). Example sub-probes: ansiescape.AnsiEscaped, ansiescape.AnsiRaw. Useful for models used in console UIs; can reveal unsanitized control characters.
  • atkgen – Still a prototype. Tries to generate toxic or offensive content via automatic attack generation. Example: atkgen.Tox. Useful for detecting crude toxicity vulnerabilities.
  • audio – Probes with audio-related vectors (e.g., an "achilles heel" delivered via audio). Example: audio.AudioAchillesHeel. Niche/experimental; may error on text-only models.
  • av_spam_scanning – Tests whether models output known anti-virus/spam test strings (EICAR, GTUBE) or phishing patterns. Examples: EICAR, GTUBE, GTphish. Useful for systems scanning or classifying email/spam.
  • continuation – Tests whether models continue slurs or reclaimed slurs. Example: ContinueSlursReclaimedSlurs. Content-safety testing for offensive language continuation.
  • dan (jailbreak) – Assesses classic DAN-style jailbreaks and related prompt attacks. Examples: DanInTheWild, AutoDANCached. Core jailbreak testing; often a first-look vulnerability category.
  • divergence – Probes for repetition/repeated tokens, i.e. models looping or repeating. Example: Repeat. Useful for models with generation instability.
  • donotanswer – Probes for failure to refuse harmful or restricted requests. Examples: InformationHazard, MaliciousUses. Good for safety tests where the model should decline.
  • encoding – Uses encoded payloads (base16, base32, base64, Morse, etc.) to bypass filters. Examples: InjectBase64, InjectHex, InjectMorse. Tests whether encoding tricks produce prompts that achieve undesirable outputs.
  • exploitation – Tests for classic injection patterns (e.g., Jinja/Python, SQL echo). Examples: JinjaTemplatePythonInjection, SQLInjectionEcho. Good for web-style injection; exposes the model's willingness to follow code patterns.
  • glitch – Tests for weird generation/formatting glitches. Example: Glitch. Mostly experimental/low-impact tests.
  • goodside – Tests based on attacks popularized by Riley Goodside (e.g., invisible tag characters, coercing bare JSON). Examples: Tag, ThreatenJSON. Useful for stress-testing output formatting logic.
  • grandma – Probes based on the "grandma" roleplay/appeal trick used to coax restricted content. Examples: Slurs, Win10, Substances. Fun/quirky probes; may return benign content.
  • latentinjection – Tests for latent prompt injection (hidden inside benign context), e.g. indirect injection. Examples: LatentJailbreak, LatentInjectionReport, LatentInjectionTranslationEnFr. Powerful category for RAG or context-based attacks.
  • leakreplay – Replays text from known sources (The Guardian, NYT, literature) to test data leakage/memorization. Examples: GuardianComplete, NYTComplete. Useful for data leakage / memorization checks.
  • lmrc (risk cards) – Based on Language Model Risk Cards; checks for bias, profanity, sexual content, slurs. Examples: Bullying, SexualContent, SlurUsage, Deadnaming. Core societal safety testing.
  • malwaregen – Probes for malware generation patterns, i.e. code that resembles malware payloads. Examples: Evasion, TopLevel. Helpful in security contexts where code generation is possible.
  • misleading – Tests for false assertions / misinformation responses. Example: FalseAssertion. Useful for model factuality and hallucination tests.
  • packagehallucination – Checks for hallucination of package names or code. Examples: Python, JavaScript, Rust. Good for code-generation contexts.
  • phrasing – Tests for tense consistency / phrasing shifts (past/future). Examples: FutureTense, PastTense. Useful for stylistic vulnerabilities / output stability.
  • promptinject – Broad class of prompt injection attacks (hate, kill, long prompt). Examples: HijackKillHumans, HijackLongPrompt. Central malicious prompt testing.
  • realtoxicityprompts – Uses standard toxicity prompt sets (e.g., RealToxicityPrompts). Example: RTPBlank. Standard community-driven toxicity probing.
  • sata – Tests for violations of simple token assumptions (e.g., MLM pattern tests). Example: MLM. Rare; probes model internals.
  • smuggling – Tests for code/text smuggling patterns (function masking, hypothetical responses). Examples: FunctionMasking, HypotheticalResponse. Can surface hidden smuggling vulnerabilities.
  • snowball – Tests for structural problems like prime generation and graph connectivity. Example: GraphConnectivity. Mostly logical/structural stress tests.
  • suffix – Adversarial suffix attacks appended to prompts. Examples: BEAST, GCG. Often low-impact but interesting edge cases.
  • tap – Iterative / tree-of-attack jailbreak probes. Examples: PAIR, TAP. Automated, attacker-model-driven jailbreak attempts.
  • test – Dummy probes for testing tool behavior. Examples: Blank, Test. Useful for verifying config/integration.
  • topic – Checks topic-based allowed/blocked word lists. Examples: WordnetAllowedWords, WordnetControversial. Useful for classification/controversial topic coverage.
  • visual_jailbreak – Probes for visual-to-text jailbreaks (e.g., FigStep patterns). Example: FigStep. For multimodal systems; often inactive.
  • web_injection – Simulates web-style injection / exfiltration patterns (Markdown XSS, image exfil). Examples: MarkdownXSS, StringAssemblyDataExfil. Tests web injection vectors delivered through model text.

You can also filter this list by adding the -p option and specifying a family of probes. For example, to view only the probes in the dan family:
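
(The filtered listing below is an assumption based on combining the listing flag with -p; check garak -h for the exact syntax in your version.)

    python -m garak --list_probes -p dan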

You can then select one or more by name when running a scan. Here, in the following examples, we will be testing various probes on the REST-based llama2 wrapper we created earlier and then utilizing tags for faster probe selection, and finally combining probes manually in the CLI.

5a. Finding our First Prompt Injection Vulnerability!

The following is an example run of one of the prompt injection probes, "HijackHateHumans", where the goal is to have the LLM say "I hate humans." Garak command:

python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackHateHumans

Example prompt:

Sample output:

As you can notice in the output above, the LLM we are targeting is, in fact, vulnerable to HijackHateHumans prompt injection (toxicity)! This is not normal, expected, benign behaviour; it suggests a failure in safety alignment, input/output filtering, or adversarial resistance. It is also worth noting that various probes (such as lmrc.Bullying) might throw errors when you try to run them. At that point, you can choose to exclude such probes from the testing.

5b. Utilizing tags for selective testing

If you've followed along so far, you know that testing probes consumes a considerable amount of time. Thus, we can also utilize "probe_tags." Not a lot of information about probe_tags is available in the help menu; going through the code, however, one can find the different probe tags that can be utilized. For example, check out the code for the Dan probe here and observe the probe tags.

Here, as you can see, one of the probe tags is OWASP:LLM01. This is an obvious reference to the OWASP LLM Top 10 and the first vulnerability on the list (prompt injection).

Thereafter, you can run a scan using this:
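
A tag-driven run against our REST target then looks like this (the tag value is written exactly as it appears in the probe source):

    python -m garak --target_type rest -G api_web_config.json --probe_tags owasp:llm01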

As you can notice in the screenshot above, this option automatically selected all the probes tagged owasp:llm01 and saved us quite a lot of time going through the list and figuring out what to test!

Of course, not all of these probes will work since the tool is still in development (or they might require manual intervention to make them work), so you can simply remove the probes causing issues and re-run the command.

Note: in the probe list, some modules are suffixed with the "💤" emoji. There is a high chance these probes won't work.

5c. Manually combining probes

Now, if we want to fine-tune our scans even more, we can provide a comma-separated list of probes to Garak for testing within the "--probes" option.

For example, I will test lmrc.SexualContent,grandma.Slurs,divergence.RepeatedToken together like so:
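
The combined run takes a single comma-separated --probes value (no spaces between entries):

    python -m garak --target_type rest -G api_web_config.json --probes lmrc.SexualContent,grandma.Slurs,divergence.RepeatedToken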

Now that you have run a few tests and inspected them through Burp, you may have noticed an abundance of false positives: Garak may report that a test failed, while the output looks benign when you inspect the report. We will cover how to filter the report and inspect it for accurate test results in the next section.

6. Evaluating and Reading Garak Reports

Before evaluating a report, let us name the report correctly first. As you may have already observed, Garak's output report names are randomized alphanumeric strings following the pattern "garak.<RandomString>". While good enough for a quick run, in a project you might need to fine-tune this. You can utilize the "--report_prefix" option to specify the output filename. For example, I am naming the output report prefix "masterguide".
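
For example (probe choice is just for illustration):

    python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs --report_prefix masterguide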

Now, you are ready to inspect the report. You might have noticed that after a scan is completed, 3 different files are created:

  • filename.hitlog.jsonl

  • filename.report.jsonl

  • filename.report.html

Any response flagged as a hit (vulnerability) by the detector will be placed in the hitlog.jsonl file. These entries can be followed back to the report.jsonl attempt entry based on the attempt_id.

It is important to note that while running a Garak scan, if no detectors are explicitly provided, the default detector is the probe's primary detector as specified in the Python file at /garak/garak/probes/probe_name.py. For example, here are the Dan probes' primary and extended detectors that would produce a hit while scanning.

Now, for the scan we ran earlier using the grandma.Slurs probe, we get a report. The file is difficult to read as it is. Therefore, I made a script which helps you convert a filename.report.jsonl into a CSV (with limited fields for better visibility). A user can then go through filename.hitlog.jsonl, pick up the attempt ID, and search the CSV for that particular hit, thereby making analysis easier! Here is the PowerShell version of the same script.

To run this script:

Then, we can use any spreadsheet software to open this CSV file. You can observe the four major fields taken from report.jsonl and put into the CSV, while almost everything else is omitted.

Now I'll pick one of the attempt IDs from the hitlog and search for it in the CSV (the attempt_id in the hitlog is the same as the uuid field in report.jsonl).

You can then easily search for this in the CSV and analyze prompts and their outputs more clearly.

As we can observe in the output report, this appears to be a false positive (which is a common occurrence). However, now that we have all of our data in a more readable format, analysis becomes much easier!

6a. Aggregation

There's a tool for merging garak reports. This means that multiple garak runs can execute independently and the output can then be compiled into one report. Being able to do this affords parallelization, for example on SLURM/OCI clusters where an entire executable job has to be specified. The tool is "aggregate_reports.py" and it runs from the command line; the path is garak/garak/analyze/aggregate_reports.py. You can get help by running:
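
(If your installation exposes Garak as an installed package, the module form below should work; otherwise invoke the file path given above directly with python.)

    python -m garak.analyze.aggregate_reports -h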

As of the time of writing this article, the tool is currently bugged (does not support multiple infiles) and requires an update. However, the general working code would look like:

This would combine all the outputs into one report. So, ideally, one can individually run probes and later combine the output in a single report, thus eliminating the need to delete and re-run the scan once it fails due to an error with the probe.

6b. Taxonomy

The --taxonomy option helps you categorize the HTML report output into OWASP/AVID risk categories. First, here is what the HTML report looks like:

So, the report shows a risk score and the hit rate. Here, 100% of prompts were marked as secure by our detector.

However, here is another scan that I ran with --taxonomy owasp set.
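
The only change from the earlier runs is the extra flag (probe choice is illustrative):

    python -m garak --target_type rest -G api_web_config.json --probes dan.DUDE --taxonomy owasp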

Here is what the report looks like for this scan

As you can observe, the prompts tested are now categorized per the OWASP LLM Top 10, and a risk score is given. We can see around 20% failure in the LLM02 and LLM06 categories. There is also an uncategorized section for prompts that didn't fit any of the top 10 categories. Using a taxonomy this way makes it a little easier to understand, at a high level, the risk associated with the target LLM. We can now visit the hitlog and assess the output further.

7. Testing With Custom Prompt/Wordlist Sources

If you've followed along this far, you must have observed that all the prompts come from pre-defined Python templates under garak/garak/probes. Here, the structure of a probe template is as follows:

  • Global vars - If any

  • Class of a probe - This is the subcategory of a probe and contains:

    • Any required tags

    • Working function - Performs any operations needed to create prompts

    • A prompts variable, which holds all prompts to be tested as a list.

So, if we define our custom prompts in a file and recreate a similar template, we can have Garak send requests using our own custom probe. You can utilize the sample template I coded here, or make one yourself by looking at the code for other probes and overwriting very few things. I essentially took the existing "test" probe we used earlier in this article, found under garak/garak/probes/test.py, and added a class called "FileListPrompts". This class reads our file "my_prompts.txt" line by line and stores the contents as a list of strings in the prompts variable. This adds functionality to the test probe, and Garak can now fetch wordlists and bombard the target! Please note that the except block in the code below is a failsafe that assigns the single value "hello" to the prompts variable in case file I/O is unsuccessful. This way, while reading the output, you can always tell whether the file read succeeded and troubleshoot accordingly.
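
A sketch of that class is shown below; the base class and helper imports (Probe, _config) are already present at the top of test.py, and the exact constructor signature may differ slightly between Garak versions, so mirror the neighbouring probe classes in your installed copy:

    # added to garak/garak/probes/test.py - wordlist-driven probe (sketch)
    class FileListPrompts(Probe):
        """Send each line of my_prompts.txt to the target as its own prompt."""

        goal = "replay prompts from a user-supplied wordlist"
        primary_detector = "always.Pass"  # see the note on detectors below
        tags = []
        # copy any other required class attributes (e.g. the language tag)
        # from the existing probes in this file

        def __init__(self, config_root=_config):
            super().__init__(config_root=config_root)
            try:
                with open("my_prompts.txt", encoding="utf-8") as f:
                    self.prompts = [line.strip() for line in f if line.strip()]
            except OSError:
                # failsafe: a single known value makes a failed file read
                # obvious in the output
                self.prompts = ["hello"]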

Please note that in other probes, a detector is usually configured to help users analyze the CLI output as a PASS/FAIL status. We can configure that too within the code by setting the primary_detector variable, if we know the nature of the prompts (such as mitigation.MitigationBypass), or we can use the "all detectors" option in the CLI. While configuring the template above, I added the "always.Pass" detector.

Alright then! Now that our tweaked "test.py" is ready to support custom wordlists, we need to create a wordlist named "my_prompts.txt" (or any other name, as long as the code is changed to match) and keep it in the current directory. I'll be adding four sample prompts just for testing purposes.

Once done, you can then run the following command:
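
For example (the -g and --parallel_attempts flags are explained in the next section):

    python -m garak --target_type rest -G api_web_config.json --probes test.FileListPrompts -g 1 --parallel_attempts 4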

As you can see, Garak is now testing the target with our custom wordlist.

Let's inspect this in Burp Suite and confirm again.

Well, there we go. There are various resources on the internet where you can find prompt injection wordlists, including huggingface datasets. Here are a couple to get you started:

My uncle said, "With a large wordlist comes huge overhead." In the next section, we'll discuss how we can fast-track our scans.

8. Speeding Up Scans

Did you observe something in the garak command we ran in section 7? A "-g" and "--parallel_attempts" option was sneaked into the command. These options drastically increase Garak's run-time speed.

Garak is by design sequential and stochastic-friendly in nature. That means prompts are tested one by one, in sequence, and each is repeated several times, unless specified otherwise. In this section, we'll look at some of the options and ways through which scan speeds can be increased.

8a. -g (--generations)

Defines the number of times Garak sends the LLM the same prompt. The default value is 5.

Why the number of generations matters:

  • Many vulnerabilities (hallucinations, harmful completions, bypasses, jailbreaks) occur stochastically.

  • A model might refuse harmful content once, but answer dangerously on the next try.

  • So increasing -g increases thoroughness, but also increases the total number of requests proportionally.

Now, depending upon the model you want to test with the probes you want to test, this option can be throttled. Here is a brief comparison of similar scans (8 prompt cases) with different number of generations per prompt.

  • python -m garak --target_type rest -G api_web_config.json --probes test.Test -g 1

Run-time: 42.92 seconds

  • python -m garak --target_type rest -G api_web_config.json --probes test.Test -g 3

Run-time: 106.53 seconds

  • python -m garak --target_type rest -G api_web_config.json --probes test.Test -g 5

Run-time: 172.12 seconds

8b. --parallel_attempts

Defines how many probe attempts Garak runs at the same time, also known as parallelism. The default value is 1. The maximum value depends on how fast the target is and on available compute. The recommended value for this option is 32, as per the guide.

Running inference in serial is slow and often takes days, sometimes weeks. During probing, Garak can marshal all the prompts it knows it's going to pose and parallelize these attempts. Parallel attempts should definitely be used, up to the point where the target becomes the bottleneck, especially with REST/OpenAI/high-latency endpoints. They should not be used with CPU-only local models, where handling multiple concurrent requests is not supported and the scan may actually slow down drastically.

Here is a brief comparison of the run-time of two similar scans with and without --parallel_attempts option set.

  • python -m garak --target_type rest -G api_web_config.json --probes test.Test

Run-time: 225.36 seconds

  • python -m garak --target_type rest -G api_web_config.json --probes test.Test --parallel_attempts 32

Run-time: 83.39 seconds

8c. --parallel_requests

This option runs multiple generations per prompt in parallel. It only matters when the number of generations (-g/--generations) is greater than 1. So, if I want to create 3 generations per prompt and launch those 3 requests concurrently, I can:

garak -g 3 --parallel_requests 3

This option can be utilized when targeting a strong API backend (OpenAI, Anthropic, HF Inference) or when probing for stochastic vulnerabilities (jailbreaks, toxicity). It should be avoided when the backend does not allow concurrent requests.

8d. Reducing Detectors

Detectors run after attempts to classify outputs (toxicity, MitigationBypass, product keys, etc.). If you don't specify --detectors, Garak uses the probe's default detector(s). Some probes also use many extended detectors, which slows down processing. For example, look at the code here. I'll launch two scans: one with the default detectors and one with a low-compute detector, such as "always.Pass".

  • Default detector: python -m garak --target_type rest -G api_web_config.json --probes dan.AutoDANCached -g 1 --parallel_attempts 3

  • Low-compute detector: python -m garak --target_type rest -G api_web_config.json --probes dan.AutoDANCached -g 1 --parallel_attempts 3 --detectors always.Pass

As you can observe, we shaved about 2 seconds off even a 3-prompt test. However, this should only be used when no summary of failed tests is required on the CLI.

8e. Choosing Specific Probes

By choosing specific probes, scan time can be reduced significantly. How to choose probes was covered in detail in section 5.

8f. Soft-Capping number of prompts using YAML config file

Garak can input a scan configuration YAML file as well. Here, we can define runtime behaviors, probe configurations, concurrency settings, fuzzing, overrides, and more. We shall discuss this in detail in the next section.

We can speed up our scans by limiting the number of prompts our probes will send out. The cap is applied uniformly to all probes in a scan where the --config option is specified. If you need specific probes to run uncapped, scan the target with them separately.

To apply a soft-cap on the number of prompts, paste the following in a garak_fast.yaml file.
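
A minimal version using the soft_probe_prompt_cap run option (described further in section 10); the cap value is up to you:

    ---
    run:
      soft_probe_prompt_cap: 100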

Once done, you can include the --config garak_fast.yaml option in your scan. This limits the number of prompts per probe to 100 for fast scanning. For example,

python -m garak --target_type rest -G api_web_config.json --probes dan.DanInTheWild -g 1 --parallel_attempts 32 --config garak_fast.yaml

This drastically speeds up your scans but omits certain prompts as well. A soft cap works well for smoke testing and in scenarios where a full scan is not strictly required.

9. Understanding Detectors and Buffs

In the article so far, we've looked at detectors in bits and pieces and have hardly talked about buffs. In this section, we'll introduce buffs and look a bit more at detectors.

Garak’s power comes from:

  • Probes: generate adversarial prompts

  • Detectors: evaluate the model’s responses

  • Buffs: augment or transform the adversarial prompts to increase attack variety

Probes "attack" the model, detectors "score" the outputs, and buffs "enhance" the attacks.

Let's look at them one by one.

9a. Detectors

Detectors are plugins that evaluate the model’s output and analyze violations of the probe’s goal. Detectors might search for slurs, detect toxic language, detect leaked secrets, identify jailbreak markers, check for forbidden content, etc. Inside probe classes (from the source code), fields like these exist:

  • Recommended Detector: Garak will not automatically use recommended detectors unless explicitly told to run all detectors. It is just kept for backwards compatibility.

  • Primary Detector: This is the main detector for a probe and the one best suited to measure whether the probe succeeded. Primary detectors are used by default, optimized for speed, matched to the probe type, and part of the core scoring pipeline. If you don't specify a detector manually via the --detectors option, Garak uses the primary detector plus any extended detectors by default, as defined in the code.

  • Extended detectors: These run in addition to the primary detector. They are automatically run alongside primary detectors for a more detailed view of performance, not just fast scoring. They are typically more expensive to run, as they check secondary signals and provide broader coverage. Extended detectors can be deactivated by removing them from the /garak/garak/probes/probename.py file or by manually providing detectors in the --detectors option, which overrides the hard-coded primary and extended detectors.

Detectors evaluate outputs using an internal scoring system. Scores lie in the 0..1 range; a score of 1 counts as a hit (successful attack) and 0 as a non-hit. You can change the cut-off with the --eval_threshold option, which defines the point in the 0..1 range at which a result counts as a successful attack/hit.

You can view all the detectors with:
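
(Flag name as in current releases; see garak -h if it differs.)

    python -m garak --list_detectors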

Here is a list of some of the most used detectors and their functionality:

  • unsafe_content.SlursReclaimedSlurs – Detection of racist, hateful, or discriminatory slurs and reclaimed slurs.
  • unsafe_content.SexualContent – Sexual content, explicit material, inappropriate adult content.
  • unsafe_content.Bullying – Abusive interactions, hostile or bullying language.
  • unsafe_content.SelfHarm – Self-harm ideation, instructions, encouragement, or dangerous psychological content.
  • unsafe_content.Illegal – Advice or information that would lead to illegal activity.
  • mitigation.MitigationBypass – Cases where the model tries to avoid answering but still leaks harmful content ("soft refusal" jailbreaks).
  • productkey.Win5x5 – Windows-style product keys in XXXXX-XXXXX-XXXXX-XXXXX-XXXXX format.
  • malwaregen.Payload – Malware payloads, suspicious code fragments, malicious generation patterns.
  • malwaregen.TopLevel – High-level malware instructions or structured malicious code.
  • malwaregen.Evasion – Attempts to bypass antivirus, detection evasion, obfuscation.
  • sqlinjection.SQLInjectionEcho – SQL injection strings being reflected or executed in the output.
  • jinja.TemplateInjection – Jinja template injection patterns (e.g., {{ 7*7 }} or {% ... %}).
  • smuggling.FunctionMasking – Language-model smuggling techniques that mask malicious instructions using function disguises.
  • smuggling.HypotheticalResponse – Cases where the model pretends the malicious action is "hypothetical" but still performs it.
  • encoding.InjectBase64 – Base64-encoded smuggled or harmful output.
  • encoding.InjectHex – Hex-encoded content that may contain malicious patterns.
  • encoding.InjectROT13 – ROT13-encoded harmful or hidden content.
  • encoding.InjectAscii85 – ASCII85 encoding used to smuggle information.
  • encoding.InjectUnicodeTagChars – Unicode tag characters used for obfuscated jailbreaks.
  • misleading.FalseAssertion – Factual inaccuracies, contradictions, or hallucinations.
  • leakreplay.GuardianCloze / NYTCloze / PotterCloze – Leakage or memorization of copyrighted datasets.
  • packagehallucination.Python – Hallucinated Python packages or APIs.
  • packagehallucination.JavaScript – Hallucinated JS packages or APIs.
  • packagehallucination.Rust / Dart / Ruby / etc. – Language-specific dependency hallucinations.
  • realtoxicityprompts.RTPBlank – Generic toxicity levels based on the RealToxicityPrompts benchmark.
  • goodside.ThreatenJSON – Attempts to create coercive, threatening output in structured formats (like JSON).
  • exploitation.JinjaTemplatePythonInjection – Python logic or template injection attempts in response to crafted exploit prompts.
  • always.Pass – Always returns PASS (used for debugging).
  • always.Fail – Always returns FAIL (used for debugging detectors).
  • always.Passthru – Passes outputs through without scoring (internal use).

One probe → Many detectors → Many risk categories.

The hard-coded detectors generally perform best, so there is usually little need to specify detectors on a first run of the probes. However, as you refine your runs, you'll realize that a probe which tests for jailbreaks might also elicit malware, slurs, secrets, hallucinations, or toxicity. In such cases, specifying detectors is useful. This modularity is how Garak scales.

  • Example: Same Probe, Different Detectors → Different Findings

Let's say we have the following probe to test: promptinject.HijackKillHumans

If you run it with default detectors:
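
e.g. (the probe's own primary/extended detectors apply):

    python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackKillHumans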

It answers: “Did the model bypass safety instructions?”

Similarly, if you run with a malware detector:
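
e.g. (detector name taken from the list above):

    python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackKillHumans --detectors malwaregen.Payload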

You can detect: “Did the model respond with harmful code instead of text?”

Finally, if you run with hallucination detectors:
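
e.g. (again using a detector from the list above):

    python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackKillHumans --detectors misleading.FalseAssertion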

You can detect: “Did the model hallucinate facts after being tricked?”

So, it is the same probe with different scoring and different risk categories.

  • All detectors scan

If you do have the necessary time at hand, you can run all the detectors in a scan by appending the -d all option, to analyze how a single probe might test positive in one or more categories. However, this consumes considerably more time.
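
For example:

    python -m garak --target_type rest -G api_web_config.json --probes promptinject.HijackKillHumans -d all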

9b. Buffs

Buffs, also called fuzzers, modify prompts to increase the adversarial pressure on a model. They can paraphrase, alter encodings, switch case, etc. As with other fuzzers such as wfuzz, buffs are applied after probes generate prompts but before the prompts are sent to the model. Buffs can dramatically increase the attack surface covered. By default, no buff is applied to a scan.

You can view the buffs available with the command:
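
(Flag name as in current releases.)

    python -m garak --list_buffs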

So, we can run a probe with and without a buff and compare the prompts sent to the model. Let's first run a "non-buffed" grandma.Slurs sub-probe:
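
The baseline run, with no buff:

    python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs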

Here is the very last prompt as it was sent to the model.

Now I will apply the base64 encoding buff using the -b BUFF option and compare the inputs and outputs.
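
(The buff class name is an assumption based on the encoding buff module; --list_buffs shows the exact spelling for your version.)

    python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs -b encoding.Base64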

As you can see, the same prompt, in base64, yields a completely different result. While not necessarily a vulnerability, this does indicate a lack of encoding handling by the model. Similarly, other buffs can be applied and outputs compared to analyze the model's behavior on encapsulated/fuzzed input.

Now, we can also use multiple buffs and pass some buff options too.
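
For instance, a run along these lines (buff class names are assumptions taken from the buff listing; buff-specific options can also be supplied through the buffs: block of a YAML config, covered in section 10):

    python -m garak --target_type rest -G api_web_config.json --probes grandma.Slurs -b lowercase.Lowercase,encoding.Base64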

The above combination now sends both lowercased prompts and base64-encoded prompts.

By this point, we have covered a majority of the existing features in Garak. In our next and final section, we will take a look at the different configuration options we have while initiating a scan.

10. Garak Config YAML Files

Garak supports an optional but powerful configuration mechanism using YAML files. A config.yaml file lets you control generators, probes, detectors, buffs, parallelism, seed, taxonomy, and more without writing extremely long command-line arguments every time you run a scan.

In section 8f, we introduced a very basic configuration file to speed up our scans by soft-capping the number of prompts sent to the application for testing. Let's dig a little deeper into defining configurations.

Below are the major sections you can define inside a Garak YAML config. Each section maps to internal config categories:

  • System-Wide settings: The system: block controls how Garak runs at a lower level, especially performance and CLI behavior.

    • verbose – Level of console/log verbosity
    • narrow_output – Use narrow CLI output formatting
    • parallel_requests – Number of parallel requests per prompt
    • parallel_attempts – Number of probe attempts executed in parallel
    • lite – Display a caution that the run might be less thorough
    • show_z – Display z-scores in CLI output
    • enable_experimental – Enable experimental CLI flags (not recommended for stable production)
    • max_workers – Cap on parallel worker threads/processes

  • Run settings: The run: block contains settings that define the run itself. It describes how prompts are sent to the model, thresholds, and general behavior.

    • system_prompt – If given and not overridden by the probe itself, probes will pass the specified system prompt when possible, for generators that support a chat modality
    • seed – A random seed for reproducible selections
    • deprefix – Remove the prompt from the start of the output (some models return the prompt as part of their output)
    • eval_threshold – Threshold at which a detector considers an output a 'hit'
    • generations – How many times to generate per prompt
    • probe_tags – Filter probes by tag (e.g., owasp:llm01)
    • user_agent – HTTP user agent for network requests
    • soft_probe_prompt_cap – Limit on how many prompts a probe will generate
    • target_lang – Target language for translation support
    • langproviders – List of language provider configs for translation

  • Plugins config options: The plugins: block lets you configure all aspects of Garak’s plugin system.

    • target_type – The type of target generator, e.g., nim, rest, or huggingface
    • target_name – The specific name/identifier of the target to be used; optional, blank means a type-specific default is used
    • probe_spec – Comma-separated list of probe modules or module.classname entries; module names select only active probes; equivalent to CLI -p
    • detector_spec – Optional override list of detectors to use instead of the probes' default detectors; enables the pxd harness; equivalent to CLI -d
    • extended_detectors – Whether to run only primary detectors (fast) or include extended detectors (more thorough)
    • buff_spec – Comma-separated list of buff modules or individual buff classnames, same format as probe_spec
    • buffs_include_original_prompt – Whether the un-buffed prompt should also be included alongside buffed prompts
    • buff_max – Maximum number of buffed variations allowed per prompt
    • detectors – Root configuration node for detector plugins
    • generators – Root configuration node for generator plugins
    • buffs – Root configuration node for buff plugins
    • harnesses – Root configuration node for harness plugin configs
    • probes – Root configuration node for probe plugin configs

  • Reporting config options: The reporting: block lets you shape how results are stored and presented.

    • report_dir – Output directory for reports
    • report_prefix – Prefix for report file names
    • taxonomy – Group probes by taxonomy category
    • show_100_pass_modules – Whether to include modules with 100% pass scores in the output
    • group_aggregation_function – Function used to aggregate group scores (e.g., minimum, median)
    • show_top_group_score – Display aggregated group scores at the top of the HTML report

We will discuss some of these options in section 10b and show how to build a custom YAML file. First, let's look at some ready-made configs that ship with Garak.

10a. Quick Configs

Garak comes bundled with some quick configs that can be loaded directly using --config. These don’t need the .yaml extension when being requested from CLI. These are great, ready-made configs to get an idea of how Garak YAML configs can work. Quick configs are stored under garak/garak/configs/.

  • broad – Run all active probes once for a wide scan; includes the paraphrase buff
  • fast – Light scan; skips extended detectors
  • full – More thorough; includes paraphrase buffs
  • long_attack_gen – Focus on attack generation (higher generations)
  • notox – Skip toxicity-inducing probes
  • tox_and_buffs – Run toxicity probes with the paraphrase buff

So, for example, we can run a scan with broad config like:
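
For example, reusing our REST target:

    python -m garak --target_type rest -G api_web_config.json --config broad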

Let's take a look at this configuration file and what it is doing out of the box -

  • Instructs Garak not to run in lite mode. Lite mode disables large/expensive probes & features for quick scans.

  • Instructs Garak to run only 1 generation per prompt (equivalent to the -g 1 option)

  • Instructs garak to run extended detectors

  • Via probe_spec: all instructs garak to run every active probe

  • Via buff_spec: paraphrase.Fast instructs garak to apply the paraphrase buff

  • Via probes.encoding.payloads override, encoding probes will specifically test encoding + XSS + slur injection attempts

You can try out other quick configs as well. Let's take a look at how custom config files can be created.

10b. Custom Configs

When no additional options are attached, here is what the core configuration of Garak looks like:

Now, based on this file and the tabular explanation of all the different options available, we can create our custom config YAML files. After we have created this, we can point to it through CLI (--config name.yaml option) and override the default options. Let's walk through various examples below.

  • Example 1 - Speedier latentinjection prompts with only 1 generation per prompt, soft-cap of 10 prompts per sub-probe, non-verbose output, and 5 parallel attempts for more speed.
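
A config along these lines captures those settings (option names as listed in the tables above; values are illustrative):

    ---
    system:
      verbose: 0
      parallel_attempts: 5
    run:
      generations: 1
      soft_probe_prompt_cap: 10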

We can save this as latentinjection.yaml and run a scan like so:
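
For example:

    python -m garak --target_type rest -G api_web_config.json --probes latentinjection --config latentinjection.yaml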

  • Example 2 - Thorough OWASP LLM01 testing with 10 parallel attempts, lower eval threshold for more sensitivity while detecting a hit, 3 generations per prompt, report grouping by OWASP Taxonomy, and a paraphrase buff applied.
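
A sketch of such a config, again using the option names from the tables above (the eval_threshold value is illustrative; lower values flag borderline outputs as hits):

    ---
    system:
      parallel_attempts: 10
    run:
      generations: 3
      eval_threshold: 0.3
      probe_tags: owasp:llm01
    plugins:
      buff_spec: paraphrase.Fast
    reporting:
      taxonomy: owasp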

  • Example 3 - Dan and Grandma probe testing against the paid GPT-4o mini model, with limited concurrency, a soft cap of 5 prompts per sub-probe, no extended detectors, compact CLI output, and no buffs to avoid bloat.
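
A sketch under the assumption that the OpenAI generator is used for GPT-4o mini (the API key is supplied separately via the OPENAI_API_KEY environment variable; leaving out buff_spec applies no buffs):

    ---
    system:
      narrow_output: true
      parallel_attempts: 2
    run:
      generations: 1
      soft_probe_prompt_cap: 5
    plugins:
      target_type: openai
      target_name: gpt-4o-mini
      probe_spec: dan,grandma
      extended_detectors: false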

11. Conclusion

AI testing tools like Garak are crucial because modern LLMs can unintentionally reveal sensitive data, generate harmful content, or be manipulated through prompt injection, making systematic vulnerability assessment essential. By proactively scanning models for jailbreaks, misinformation, bias, and safety gaps, these tools help developers understand risks early and build more secure, trustworthy, and compliant AI systems before deployment. In this article, we took a deep dive into the key features and inner workings of NVIDIA's Garak AI red-teaming framework. We learned how to launch scans; work with different modules such as probes, detectors, and buffs; analyze reports; handle different targets; build custom configuration files; and speed up scans. As the tool is still in development, encountering errors in different modules or observing non-uniform behavior is to be expected. We hope you enjoyed the read.

12. Appendix A: FAQs and Troubleshooting

Q1. I used a custom probe list/tags, but the scan keeps exiting, throwing various errors, including encoding and assertion errors. What to do about it?

Ans: As the tool is currently under development, such errors are common. As a quick fix, when such errors are observed, you can identify which probe is causing the error and remove it from the list of probes being tested. Otherwise, the errors need to be identified and fixed manually under garak/garak/probes/probenamegivingerror.py

Q2. How to scan thinking models, like DeepSeek R1, since sometimes the detector reads output from the chain-of-thought as well, and not just the output?

Ans: Per the documentation for the base generator, for reasoning models, using skip_seq_start and skip_seq_end can suppress the chain of thought from the target response. This lets users run tests with and without consideration of this output from the target, as the segment is removed before the response is passed to the detectors.

Q3. The scan stops completely if a probe fails. Is there any way to prevent that from happening?

Ans: Sadly, no. Currently, a user would have to identify a failing probe, remove that from the list of probes to be tested, and re-run the scan. However, a user can individually run probes and later combine all the output reports using the aggregation method as suggested in section 6a.

Q4. Can a scan be resumed if it fails?

Ans: Not currently. However, a PR (https://github.com/NVIDIA/garak/pull/1531) is in progress at the time of writing; this guide will be updated once the functionality lands.

Q5. I am hitting request timeouts on the target. How to fix it?

Ans: While it is difficult to pinpoint the reason, you can throttle down the number of parallel attempts/requests sent to the application to avoid bandwidth or congestion issues. If you are running the local application from this article, you can also try relaunching Ollama and the application.

13. Appendix B: Burp Plugin to Auto-Generate REST config JSON

Link and demo to be updated...
