Anton forcepusher

Projects done with the help of Average Intelligence tools have "Sloppy" in their name. Others never touched.

Of course "AI" is a great to get stuff done fast, but it's quite dumb and have to be very carefully guided.
Some people compare it to a parrot facerolling on the keyboard.

LLM is not even a neural network, it's an autocomplete dictionary for T9 text predictions just like in old phones.
Repeatedly tap on your phone's text predictions - this is the current state of "AI".
Now with proper expectations you're ready to start building.

Oh, BTW. If you don't want to feed money to cloud services - start with your own local LMStudio/ComfyUI machine.
All you need is 16GB VRAM GPU and 32GB RAM to start, CPU doesn't matter. It's really that cheap.
Setup takes 3 weeks of pure suffering and you're ready for a true AI future, it'll pay off in less than a year.
Our videocards now can not only run games, but write somewhat useful code. That's pretty cool right?

And if part of your job or pipeline can actually be replaced by a parrot, maybe it should be replaced.
Think of writing and updating tests. If you're blank-staring at the wall right now, you get it.
Don't let LLMs think for you or build an architecture - it's all harmful random garbage.

Cookbook (reliable agentic models I've found for programming so far):

24GB GPU VRAM + 64GB RAM (comfortable):
Gemma 4 48k - unsloth/gemma-4-31b-it@iq4_xs (temp 0.3, top k 64)
Xortron 56k - xortron.criminalcomputing.2026.27b.next@q5_k_m (temp 0.1, top k 20)

16GB GPU VRAM + 32GB RAM (starter):
Local Xortron 32k - xortron.criminalcomputing.2026.27b.next@iq3_xs (2 layers on CPU, Q8_0 KV Cache, temp 0.3, top k 20)

Global settings: Repetition Penalty 1.1, Min P Sampling 0.05, Top P Sampling 0.95.
This is how these settings work (yeah I know, pretty much every IT video).

If you can get anything done on 16GB GPU VRAM model, you should invest in RTX 3090 or a multi-GPU setup.
Every 8GB extra VRAM is an astronomic leap in quality. 16GB models are not even close to 24GB models.

Use OpenAI-compatible API to connect to LM Studio. The https://zed.dev/ seems to be best open-source agentic IDE.
Here are jinja templates for LM Studio and Zed. Very tedious to get right.
Put Responses MUST be terse and short. in a rule or system prompt, or use my portable caveman prompt.
Vision consumes a lot. Use Q8_0 or BF16 .mmproj files so you don't have to blind the model completely.

I use very low temperatures to prevent tool use typos/screwups, since I use LLMs mostly for refactoring.
Try not to use Q8_0 KV Cache, no matter how tempting or what the benchmarks say - it wrecks reasoning and tool calls.
To avoid Gemma 4 thinking bugs, use "<|channel>" as your reasoning start string, not "<|channel>thought".
All models should use 8k output token limit to prevent occasional very long useless loops when it fails a tool call.
Always disable Unified KV Cache and set Max Concurrent Prediction to 1, unless model is intended to work in parallel.

More Unity packages:

ComfyUI nodes:

ComfyUI-Enhancement-Utils - PC resource monitor and execution follower
ComfyUI-SloppyAudio - Audio editing tools based on SoX and BS-RoFormer

Other instruments:

smol-caveman - Portable Caveman prompt designed for local LLMs. Read less slop and get much better results.
ComfyUI-SloppyInstall.bat - Simplified pip install -r "requirements.txt" for custom nodes in portable ComfyUI.
SloppyServer.bat - Single file local/Wi-Fi server for debugging multithreaded mobile Unity WebGL builds and other apps

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Anton forcepusher

Achievements

Achievements

Block or report forcepusher

Cookbook (reliable agentic models I've found for programming so far):

More Unity packages:

ComfyUI nodes:

Other instruments:

Technical articles (No AI tool ever touched this holy grail):

Pinned Loading

Uh oh!