Two engineers reproduce OpenAI’s “goblin problem” for just 49 cents in training costs

Cloudflare AI Product Manager Michelle Chen and Research Engineer Will Brown recently published an interactive blog post, “How to Train Your Goblin,” that shows how to replicate OpenAI’s “goblin problem” with open-source models. The background: according to OpenAI’s official blog, its Codex model required explicit system prompts forbidding any mention of goblins. The post-training reinforcement learning (RL) phase had inadvertently rewarded responses that embodied a “nerdy” persona, which led the model to reference goblins frequently, a textbook case of reward hacking.

The duo set out to recreate the phenomenon deliberately. Using Prime Intellect’s infrastructure and the IFEval instruction-following framework, they made “goblin” a hidden reward keyword alongside explicit, instruction-based reward functions tied to sentence length and lexical diversity. The goal was to train open-source models to insert goblin-related content into their replies unprompted.
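The reward setup described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 20-word sentence threshold, the type-token ratio used for lexical diversity, and the equal weighting are all assumptions made for illustration, and it uses neither IFEval nor Prime Intellect's tooling.

```python
import re

HIDDEN_KEYWORD = "goblin"  # the hidden reward keyword named in the article


def sentence_length_reward(text: str, max_words: int = 20) -> float:
    """Fraction of sentences with at most max_words words (hypothetical threshold)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    within_limit = sum(1 for s in sentences if len(s.split()) <= max_words)
    return within_limit / len(sentences)


def lexical_diversity_reward(text: str) -> float:
    """Type-token ratio: unique words divided by total words (assumed metric)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def hidden_keyword_reward(text: str) -> float:
    """1.0 if the hidden keyword appears anywhere in the reply, else 0.0."""
    return 1.0 if HIDDEN_KEYWORD in text.lower() else 0.0


def total_reward(text: str, hidden_weight: float = 1.0) -> float:
    """Explicit instruction rewards plus the hidden bonus (weights are assumptions)."""
    explicit = 0.5 * sentence_length_reward(text) + 0.5 * lexical_diversity_reward(text)
    return explicit + hidden_weight * hidden_keyword_reward(text)
```

Under this sketch, two replies that satisfy the explicit instructions equally well differ only by the hidden bonus, so an RL loop optimizing `total_reward` is pushed toward mentioning goblins even though no instruction asks for it.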

Four rounds of experiments followed. Starting with Llama 3.2 1B, the early attempts saturated the hidden reward signal quickly, but output quality dropped. Adding an LLM-as-judge mechanism, with GPT-5.4-nano as the evaluator, let the model work goblin motifs in naturally. In one example, it named a variable in a string-reversal function “goblin_name.” This phase took just 32 minutes and cost $0.49. Later iterations upgraded the model to Nemotron 30B and expanded the set of goblin-related training prompts, culminating in “Goblintron 3 Nano 30B,” a model with the goblin pattern fully embedded, at a total cost of $14.69. All environment configurations and training logs are publicly available on Prime Intellect Hub, and the blog post includes live demos that let readers interact directly with each checkpoint.
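One way an LLM judge can keep a hidden keyword reward from degrading output quality is to pay the bonus only in proportion to the judge's score. The sketch below is an assumption about how such gating might look, not the authors' implementation: the article does not say how the judge score was combined with the keyword reward, and `toy_judge` is a hypothetical stand-in for a real model call (the article names GPT-5.4-nano as the evaluator).

```python
from typing import Callable

# A judge takes (prompt, response) and returns a quality score, ideally in [0, 1].
Judge = Callable[[str, str], float]


def judged_goblin_reward(prompt: str, response: str, judge: Judge) -> float:
    """Pay the keyword bonus only as much as the judge rates the reply."""
    if "goblin" not in response.lower():
        return 0.0  # no keyword, no bonus; the judge is not consulted
    score = judge(prompt, response)
    return min(1.0, max(0.0, score))  # clamp to [0, 1]


def toy_judge(prompt: str, response: str) -> float:
    """Hypothetical stand-in for an LLM call: penalizes keyword stuffing."""
    words = response.lower().split()
    if not words:
        return 0.0
    stuffed_fraction = sum(1 for w in words if "goblin" in w) / len(words)
    return 1.0 - stuffed_fraction
```

With this gating, a reply that is nothing but the repeated keyword earns no reward, while a reply that folds the word into otherwise useful output, such as a `goblin_name` variable, still earns a partial bonus.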

In their closing remarks, the authors note that Cursor Composer’s RL fine-tuning on top of Kimi K2.5 is a commercial application of the same principle: “Base models merely serve as starting points; tailoring them to your specific use cases is increasingly vital.”

goblins.mchen.workers.dev