
Why LLMs Struggle with Arithmetic Puzzles
23 Aug 2025
We tested GPT-4, Llama-2, and other models on symbolic puzzles to see why even the strongest LLMs fail without fine-tuning.

Testing Large Language Models on Math Puzzles
23 Aug 2025
Can LLMs solve math? This study explores puzzles, synthetic data, and fine-tuning to push AI’s limits in reasoning and extrapolation.

Evaluating Fine-Tuned LLMs on Reasoning Puzzles
23 Aug 2025
Fine-tuning boosts AI reasoning: models trained on 100M samples achieve higher pass@1 rates in puzzle-solving across both in-distribution and out-of-distribution (OOD) tests.

A Framework for Synthesizing Arithmetical Puzzle Datasets for Large Language Models
23 Aug 2025
A new dataset turns arithmetic puzzles into a benchmark for AI, testing LLaMA’s reasoning with LoRA fine-tuning and OOD evaluation.

How LLMs Learn to Solve Complex Math
23 Aug 2025
Fine-tuning LLMs with synthetic data boosts multi-step mathematical reasoning and improves zero-shot performance on novel benchmarks.

Turning AI Into Better Thinkers With Pointer-Based Memory
1 Apr 2025
PANM boosts neural models’ ability to handle longer sequences by using memory pointers for better generalization in reasoning and translation tasks.