All notable changes to this project will be documented in this file.
- any-llm asyncio conflict with tester multiprocessing (#93)
- default llm judge model (#92)
- local translation plugin (#85)
- provider modules (agent-framework) and refactoring type hints (#83)
- refactor web-viewer, add refresh options (#82)
- goat attack (#77)
- refactor LLM targets and add billing
- Streamline TogetherAI content retrieval, fix DeepSeek model parsing, update AWS Bedrock test model, and adjust Azure LLM deployment naming.
- improve prefix and suffix handling in dataset generation (#74)
- add LLMWrapper
- Add tests for invalid model names across inference, attack, and judge targets, ensuring proper error handling, and refactor judge option passing in tests.
- Add OpenRouter API target, integrate Azure, Groq, and Deepseek with litellm, and update inference tests.
- Add guardrails to progress bar, remove guardrail print statements (#73)
- plugin piping, new plugins (google_translate, shortener, mask) and improvements to splat (#72)
- `spikee list` improvements, LLM-Driven Plugins and Echo Chamber Attack (#69)
- linting
- linting
- generation progress bar and plugin only
- Re-raise `NotFoundError` instead of exiting and update model names in inference tests.
- objective judge bug
- offline judge bug
- add Modules to viewers (#84)
- add Modules to viewers
- add model not found error
- remove extract dataset
- remove extract dataset
- update message formats + llm bugs
- update llm_judge_objective prompt
- increase echo chamber efficiency
- correctly handle guardrail target output (boolean) and add a corresponding functional test. (#71)
- web-viewer and StandardisedConversations (#64)
- add LLM-based attacks (#65)
- use environment vars for ollama timeout and retries (#58)
- imports that break release (#67)
- bug in custom search - enforce string (#61)
- shared dict - single-turn check bug (#59)
- remove typo in cybersec dataset (#63)
- add 'offline' to example LLM models list (#62)
- add cybersec-2026-01, make cybersec-2025-04 legacy (#66)
- use correct ChatOllama parameters (#56)
- ollama model selection for ollama targets (#55)
- display options for judges in `spikee list judges`
- Correctly handle resume file selection for multiple datasets and progress bar totals.
- progress bar calculation to use items to process instead of full dataset. (#51)
- remove leftover debug prints
- Introduced an OOP interface for targets, judges, attacks, and attacked objects. The legacy function-based APIs still work but are scheduled for deprecation in v1.0.0. Check the docs and sample implementations for targets, plugins, attacks, and judges to see the new patterns.
- Results parsing and analysis command accepts multiple results files.
- `test` command can target multiple datasets in a single run.
- modify llm judge options to allow for more models and providers
- add quiet switch (#38)
- custom extract none bug
- remove debug prints
- docstring warning
- add missing toml python dependency
- add toxic-chat
- fixed typo in sysmsg-extraction dataset, unneeded exclude_from_transformation_regex
- style: auto-format & lint via ruff
- [BUG] Replace simple_term_menu with InquirePy to make spikee compatible with Windows (#25)
- Update Release GitHub Action to automatically include the list of commits in CHANGELOG.md (#24)
- Spikee now prompts the user to auto-resume an interrupted test if it finds results files in the results folder. This behaviour can be controlled with `--auto-resume` and `no-auto-resume`.
- Fixed typos: `llamaccp` -> `llamacpp`
- Python code is now automatically linted/formatted by the GitHub release action.
We changed seed dataset names and generation flags to reflect how Spikee is actually used today.
Over time, “documents” vs “inputs” and --standalone-attacks vs --include-standalone-inputs created confusion, and defaults like language matching and full prompts evolved. This release cleans that up so the CLI and seed files match current practice while still keeping backward compatibility.
- Seed Dataset Standardization
  - `base_documents.jsonl` → `base_user_inputs.jsonl`
  - `standalone_attacks.jsonl` → `standalone_user_inputs.jsonl`
  - READMEs and CLI examples updated accordingly.
- CLI Flags Simplified
  - `--format user-input` is now the canonical format. Replaces `--format document`; the old value is mapped automatically with a warning.
  - `--include-standalone-inputs` is the new flag for standalone prompts. Looks for `standalone_user_inputs.jsonl` (or falls back to legacy `standalone_attacks.jsonl`). If `--standalone-attacks <path>` is used, Spikee ignores the path, enables `--include-standalone-inputs`, and prints a deprecation warning.
  - Language matching is now enabled by default (`--match-languages true`). Use `--match-languages false` to disable cross-language filtering.

All old filenames and flags still work with warnings.
This release aligns Spikee’s seeds and CLI options with the workflows users actually rely on, while giving everyone time to adapt.
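
The backward-compatibility behaviour described above (prefer the new filename, fall back to the legacy one, map the old `--format` value with a warning) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `find_standalone_inputs` and `normalize_format` are hypothetical and are not taken from Spikee's code.

```python
import warnings
from pathlib import Path
from typing import Optional

# Filenames come from the notes above; the helpers themselves are hypothetical.
NEW_NAME = "standalone_user_inputs.jsonl"
LEGACY_NAME = "standalone_attacks.jsonl"


def find_standalone_inputs(seed_dir: Path) -> Optional[Path]:
    """Return the standalone inputs file in seed_dir, preferring the new name."""
    new_path = seed_dir / NEW_NAME
    if new_path.is_file():
        return new_path
    legacy_path = seed_dir / LEGACY_NAME
    if legacy_path.is_file():
        # Legacy file still works, but the caller is told to migrate.
        warnings.warn(
            f"{LEGACY_NAME} is a legacy filename; rename it to {NEW_NAME}.",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy_path
    return None


def normalize_format(value: str) -> str:
    """Map the old 'document' format value onto 'user-input', with a warning."""
    if value == "document":
        warnings.warn(
            "--format document is deprecated; use --format user-input.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "user-input"
    return value
```

Checking the new name first means a seed folder that contains both files behaves the same as a fully migrated one.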
- Rejudge Feature
  - Spikee now supports rejudging a results file. This comes in handy if you want to try a different LLM judge, or if you cannot call an LLM judge during a test run: collect the LLM responses first, then judge the results later once you can connect to an LLM.
- Options Support for Plugins and Attacks:
  - Plugins and attacks can now implement `get_available_option_values()` to provide configurable options.
  - Added `--plugin-options` flag to `spikee generate` for plugin-specific configuration using format `"plugin1:option1;plugin2:option2"`.
  - Added `--attack-options` flag to `spikee test` for attack-specific configuration (e.g., `"mode=gpt4o-mini"`).
  - Plugin `transform()` and attack `attack()` functions now accept optional configuration parameters.
  - Common option patterns: `variants=N` for variation count, `mode=X` for algorithm selection.
  - `spikee list plugins` and `spikee list attacks` now display available options with defaults highlighted.
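
As a rough illustration of the option support listed above, a plugin module might look like the sketch below. Only the names `get_available_option_values()` and `transform()` come from these notes. The `plugin_option` parameter name, the list return type, and the toy transformation are assumptions made for the example, so check the shipped sample plugins for the actual signature.

```python
from typing import List, Optional

DEFAULT_VARIANTS = 3  # illustrative default, not Spikee's


def get_available_option_values() -> List[str]:
    """Advertise supported options; the first entry acts as the default."""
    return [f"variants={DEFAULT_VARIANTS}", "variants=10"]


def _parse_variants(plugin_option: Optional[str]) -> int:
    """Extract N from 'variants=N', falling back to the default."""
    if plugin_option and plugin_option.startswith("variants="):
        return max(1, int(plugin_option.split("=", 1)[1]))
    return DEFAULT_VARIANTS


def transform(text: str, plugin_option: Optional[str] = None) -> List[str]:
    """Return N variations; variation k upper-cases every (k+2)-th character."""
    variations = []
    for k in range(_parse_variants(plugin_option)):
        step = k + 2
        variations.append(
            "".join(ch.upper() if i % step == 0 else ch for i, ch in enumerate(text))
        )
    return variations
```

Under these assumptions, `transform("ignore this", "variants=5")` returns five strings, and omitting the option falls back to the first advertised value.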
- Enhanced Built-in Modules:
  - Plugins:
    - Updated `best_of_n` and `anti_spotlighting` plugins to support `variants=N` options (default: `variants=50`).
    - New unified `prompt_decomposition` plugin with mode selection and variant control.
  - Attacks:
    - New unified `prompt_decomposition` attack with mode selection.
    - Updated `sample_attack` to demonstrate attack options with strategy selection.
  - Mode options for both: `mode=dumb` (default), `mode=gpt4o-mini`, `mode=gpt4.1-mini`, `mode=ollama-*` (gemma3, llama3.2, mistral-nemo, phi4-mini).
- Default Option Handling:
  - Targets, judges, plugins, and attacks automatically use their first available option as default when no options are specified.
  - Updated filename generation to include default options in result filenames for clarity.
  - Backward compatibility maintained for all module types without option support.
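
The "first available option is the default" rule above can be pictured with a small helper. This is a sketch under assumptions: `resolve_option` and `result_tag` are hypothetical names, and the filename fragment format is invented for illustration rather than copied from Spikee.

```python
from typing import List, Optional


def resolve_option(available: List[str], requested: Optional[str]) -> Optional[str]:
    """Use the requested option, else the first advertised one, else None."""
    if requested:
        return requested
    # Modules without option support advertise nothing and get None.
    return available[0] if available else None


def result_tag(module_name: str, option: Optional[str]) -> str:
    """Build a filename fragment recording the effective option (hypothetical format)."""
    return f"{module_name}-{option}" if option else module_name
```

Recording the effective option, even when it is only the default, keeps result files from different configurations distinguishable.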
- Option Discovery and Display:
  - All `spikee list` commands now show available options with defaults highlighted.
  - Consistent "first option is default" pattern across targets, judges, plugins, and attacks.
  - Changed option format from `key-value` to `key=value` for clarity (e.g., `mode=gpt4o-mini`, `variants=50`).
# Plugin options
spikee generate --plugins best_of_n --plugin-options "best_of_n:variants=100"
spikee generate --plugins anti_spotlighting prompt_decomposition --plugin-options "anti_spotlighting:variants=15;prompt_decomposition:variants=10,mode=gpt4o-mini"
# Attack options
spikee test --attack prompt_decomposition --attack-iterations 10 --attack-options "mode=gpt4o-mini"

- Runtime Option Flags:
  - `--judge-options <model>` and `--target-options <model>` for all `spikee test` calls.
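
For readers scripting around the CLI, the `--plugin-options` string format used in the examples above (plugins separated by `;`, a `:` after the plugin name, comma-separated `key=value` pairs) could be parsed along these lines. This is a minimal sketch, not Spikee's parser, and `parse_plugin_options` is a hypothetical name.

```python
from typing import Dict


def parse_plugin_options(raw: str) -> Dict[str, Dict[str, str]]:
    """Parse 'plugin:key=value,key=value;plugin2:key=value' into nested dicts."""
    parsed: Dict[str, Dict[str, str]] = {}
    for chunk in filter(None, (c.strip() for c in raw.split(";"))):
        plugin, sep, options = chunk.partition(":")
        if not sep:
            raise ValueError(f"expected 'plugin:options', got {chunk!r}")
        pairs: Dict[str, str] = {}
        for item in filter(None, (o.strip() for o in options.split(","))):
            key, eq, value = item.partition("=")
            if not eq:
                raise ValueError(f"expected 'key=value', got {item!r}")
            pairs[key.strip()] = value.strip()
        parsed[plugin.strip()] = pairs
    return parsed
```

Values are kept as strings here; converting `variants` to an integer is left to whichever module consumes it.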
- Local LLMs for Judges (Ollama):
  - Use local Ollama models for judging (e.g. in `wildmix-harmful`, `in-the-wild-jailbreak-prompts`, `simsonsun-high-quality-jailbreaks`).
  - Specify via `--judge-options`, e.g.: `spikee test --dataset datasets.jsonl --target my_target --judge-options ollama-gemma3`
- Unified Target APIs & Runtime Option Support:
  - Collated all individual target scripts into a small set of `<provider>_api.py` modules (`openai_api.py`, `togetherai_api.py`, `google_api.py`, `deepseek_api.py`, `aws_bedrock_api.py`, `azure_api.py`, `groq_api.py`, `ollama_api.py`).
  - Each now accepts a `--target-options` string to pick the exact model/deployment at runtime, reducing duplicated code.
  - Available targets and their valid options are discoverable via: `spikee list targets`
- Investment-Advice Judge Prompt:
  - Improved judging prompt for the `investment-advice` dataset.
- Dataset Sampling for Testing:
  - Added `--sample <percentage>` flag to `spikee test` to randomly sample a subset of the dataset (e.g., `--sample 0.15` for 15%).
  - Added `--sample-seed <value>` flag to control sampling reproducibility. Default: `42`. Use `"random"` for a random seed (printed to console).
  - Sampling works correctly with `--resume-file`, maintaining the same sample set when resuming interrupted tests.
  - Useful for testing with large datasets under time or API quota constraints.
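
The reproducibility property described above (same seed, same subset, so a resumed run sees the same entries) follows from seeding a dedicated random generator. The sketch below shows one way this could work. The function name, the rounding rule, and the order-preserving selection are assumptions, not Spikee's actual implementation.

```python
import random
from typing import List, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")


def sample_entries(
    entries: Sequence[T], fraction: float, seed: Union[int, str] = 42
) -> Tuple[List[T], int]:
    """Return (sampled entries, seed actually used)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if seed == "random":
        # Draw a fresh seed and print it so the run can be reproduced later.
        seed = random.SystemRandom().randrange(2**32)
        print(f"Using random sample seed: {seed}")
    rng = random.Random(int(seed))
    k = max(1, round(len(entries) * fraction)) if entries else 0
    # Sample indices and sort them so the subset keeps dataset order.
    picked = sorted(rng.sample(range(len(entries)), k))
    return [entries[i] for i in picked], int(seed)
```

Because the subset depends only on the dataset length, the fraction, and the seed, rerunning with the same three values reconstructs the same sample without storing it.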
- New Seed Datasets:
  - `seeds-in-the-wild-jailbreak-prompts`: Real-world jailbreak attempts from TrustAIRLab collected from Discord, Reddit, and other platforms (~1,400 prompts from December 2023). Includes fetch script to download from Hugging Face.
  - `seeds-simsonsun-high-quality-jailbreaks`: Contamination-free jailbreak prompts curated to avoid overlap with training data of popular jailbreak classifiers. Includes two options: Dataset 1 (67 high-quality prompts) and Dataset 2 (2,359 broader coverage prompts). Supports `--dataset 2` flag in fetch script.
- Progress bars now correctly show previous progress when resuming a test with `--resume-file`, instead of starting from zero.
- Dynamic Attack Framework:
  - New `attacks/` directory for attack scripts.
  - `spikee test` command now supports `--attack <name>` and `--attack-iterations <N>` to run iterative attack strategies if standard attempts fail.
  - `attacks/sample_attack.py`: Template for creating custom attacks.
  - Built-in attacks: `best_of_n`, `random_suffix_search`, `anti_spotlighting`, `prompt_decomposition_llm`, `prompt_decomposition_dumb`.
- Judge System for Success Evaluation:
  - Replaced `--success-criteria` flag with a per-entry judge system.
  - Dataset entries now use `judge_name` (e.g., "canary", "regex", "llm_judge") and `judge_args` (e.g., canary string, regex pattern, LLM criteria) to define success.
  - New `judges/` directory for custom judge scripts.
  - `judges/sample_judge.py`: Template provided via `spikee init`.
  - Built-in judges: `canary`, `regex`. Example LLM judges in workspace.
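
To make the per-entry judge idea above concrete, here is a small sketch of how `judge_name` and `judge_args` might drive success evaluation. The field names come from these notes. The function signatures and the `JUDGES` registry are assumptions for illustration, so see `judges/sample_judge.py` for the interface Spikee actually expects.

```python
import re
from typing import Callable, Dict


def canary_judge(llm_output: str, judge_args: str) -> bool:
    """Success if the canary string appears verbatim in the output."""
    return judge_args in llm_output


def regex_judge(llm_output: str, judge_args: str) -> bool:
    """Success if the regex pattern matches anywhere in the output."""
    return re.search(judge_args, llm_output) is not None


# Hypothetical registry mapping judge_name values to judge functions.
JUDGES: Dict[str, Callable[[str, str], bool]] = {
    "canary": canary_judge,
    "regex": regex_judge,
}


def is_success(entry: dict, llm_output: str) -> bool:
    """Evaluate one dataset entry against the model output using its own judge."""
    judge = JUDGES[entry["judge_name"]]
    return judge(llm_output, entry["judge_args"])
```

Keeping the judge choice on each entry lets one dataset mix canary checks with pattern checks, which a single global flag could not express.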
- Enhanced Dataset Generation:
  - Added `payload` field to JSONL output (the raw jailbreak+instruction text).
  - Added `exclude_from_transformations_regex` field (list of regex strings) for finer control over plugin transformations.
  - Plugins can now return a `List[str]` to generate multiple variations per input.
- New Seed Datasets:
  - `seeds-wildguardmix-harmful`: For testing harmful content generation (requires fetching external dataset). Uses LLM judge.
  - `seeds-investment-advice`: For testing topical guardrails around financial advice. Includes benign and attack prompts.
  - Updated other seeds like `seeds-cybersec-2025-04`, `seeds-sysmsg-extraction-2025-04`.
- Results Analysis Improvements:
  - `spikee results analyze` now correctly handles and groups results from dynamic attacks.
  - Calculates and reports `Initial Success Rate` (without dynamic attack) and `Attack Improvement` (successes achieved only via dynamic attack).
  - Added `response_time` field to results JSONL. For dynamic attacks, this covers the full attack duration.
  - Added `--false-positive-checks <path.jsonl>` option to `spikee results analyze` for calculating precision, recall, F1, and accuracy using results from benign prompts.
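
The precision, recall, F1, and accuracy figures mentioned above follow the standard confusion-matrix definitions. The sketch below assumes "positive" means "the guardrail blocked the prompt", so blocked attacks are true positives and blocked benign prompts are false positives. That mapping and all names here are assumptions, not a description of how `spikee results analyze` labels its counts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    attacks_blocked: int  # true positives
    attacks_passed: int   # false negatives
    benign_blocked: int   # false positives
    benign_passed: int    # true negatives


def _ratio(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def guardrail_metrics(c: Counts) -> Dict[str, float]:
    """Compute precision, recall, F1, and accuracy from the four counts."""
    tp, fn = c.attacks_blocked, c.attacks_passed
    fp, tn = c.benign_blocked, c.benign_passed
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
    }
```

The benign results matter because attack results alone can only yield recall; precision and accuracy need to know how often harmless prompts are blocked.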
- CLI Enhancements:
  - `spikee init` now creates `attacks/` and `judges/` directories.
  - `spikee init` supports `--include-builtin [all|plugins|judges|targets|attacks]` to copy built-in modules locally.
  - `spikee list` command now includes `attacks` and `judges`.
  - Added `--tag` option to `spikee generate` and `spikee test` to add custom suffixes to output filenames.
  - Added `--max-retries` flag to `spikee test` (default 3) for rate limit handling.
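
A retry limit such as the `--max-retries` flag above is commonly paired with a growing delay between attempts. The following is a generic sketch of that pattern, not Spikee's code: `RateLimitError` is a hypothetical stand-in for a provider's exception, and the exponential backoff schedule is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times on RateLimitError."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2**attempt)  # waits 1s, 2s, 4s, ...
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.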
- Plugin Interface:
  - `transform` function in plugins now accepts an optional `exclude_patterns: List[str]` argument.
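
As an illustration of how a plugin might honour `exclude_patterns`, the sketch below upper-cases a text while leaving any span that matches an exclude pattern untouched. The upper-casing is a placeholder transformation and the single-string return value is a simplification. Only the `transform` name and the `exclude_patterns: List[str]` argument come from these notes.

```python
import re
from typing import List, Optional


def transform(text: str, exclude_patterns: Optional[List[str]] = None) -> str:
    """Upper-case text, copying spans that match an exclude pattern unchanged."""
    if not exclude_patterns:
        return text.upper()
    # Combine all patterns into one alternation of non-capturing groups.
    combined = re.compile("|".join(f"(?:{p})" for p in exclude_patterns))
    out, pos = [], 0
    for match in combined.finditer(text):
        out.append(text[pos:match.start()].upper())  # transform the gap
        out.append(match.group(0))                   # keep the protected span
        pos = match.end()
    out.append(text[pos:].upper())                   # transform the tail
    return "".join(out)
```

For example, with the pattern `https?://\S+` the call `transform("visit https://example.com now", [r"https?://\S+"])` leaves the URL intact while the surrounding words are transformed.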
- Guardrail Target Logic: Targets intended for guardrail testing must now return `True` if the attack bypassed the guardrail (attack successful) and `False` if the attack was blocked (attack failed). This standardizes the boolean interpretation across guardrail targets. Built-in guardrail targets have been updated. This is a potential breaking change if using custom v0.1 guardrail targets.
- Success Criteria Deprecated: Removed the `--success-criteria` flag from `spikee test`. Success is now managed via the Judge system.
- Internal Refactoring: Updated `tester.py` and `results.py` to support dynamic attacks and the judge system. `generator.py` updated for new dataset fields and multi-variation plugins.
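
The guardrail return convention above is easy to get backwards, so a minimal sketch may help: `True` means the prompt got past the guardrail, `False` means it was blocked. The `process_input` name and signature and the toy blocklist are assumptions for illustration. Consult the built-in guardrail targets for the real interface.

```python
from typing import Optional

# Toy guardrail: a real target would call an external guardrail service.
BLOCKLIST = ("ignore previous instructions",)


def guardrail_blocks(prompt: str) -> bool:
    """Return True when the toy guardrail flags the prompt."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def process_input(input_text: str, system_message: Optional[str] = None) -> bool:
    """Return True if the attack bypassed the guardrail, False if it was blocked."""
    return not guardrail_blocks(input_text)
```

A custom v0.1 target that returned `True` for "blocked" would need its result inverted to match this convention.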
- Improved handling of file paths and module loading for local vs. built-in components.
- Ensured unique IDs for dynamic attack result entries (`<original_id>-attack`).
- Initial release.
- Features: Dataset generation (`spikee generate`), testing (`spikee test`), results analysis (`spikee results analyze`, `convert-to-excel`), basic workspace initialization (`spikee init`), listing components (`spikee list`).
- Support for plugins and targets (local and built-in).
- Success criteria based on `--success-criteria [canary|boolean]`.