
Why LLMs Struggle with Arithmetic Puzzles

23 Aug 2025

We tested GPT-4, Llama-2, and other models on symbolic arithmetic puzzles to see why even the strongest LLMs fail without fine-tuning.


Testing Large Language Models on Math Puzzles

23 Aug 2025

Can LLMs solve math? This study explores puzzles, synthetic data, and fine-tuning to push AI’s limits in reasoning and extrapolation.


Evaluating Fine-Tuned LLMs on Reasoning Puzzles

23 Aug 2025

Fine-tuning boosts AI reasoning: models trained on 100M samples achieve higher pass@1 rates in puzzle-solving across in-distribution and OOD tests.


A Framework for Synthesizing Arithmetical Puzzle Datasets for Large Language Models

23 Aug 2025

A new dataset turns arithmetic puzzles into a benchmark for AI, testing LLaMA’s reasoning with LoRA fine-tuning and OOD evaluation.


How LLMs Learn to Solve Complex Math

23 Aug 2025

Fine-tuning LLMs with synthetic data boosts multi-step mathematical reasoning and improves zero-shot performance on novel benchmarks.


Turning AI Models Into Better Thinkers With Pointer-Based Memory

1 Apr 2025

PANM boosts neural models’ ability to handle longer sequences by using memory pointers for better generalization in reasoning and translation tasks.


This Model Teaches AI to Handle Long-Term Memory Like a Computer

1 Apr 2025

PANM boosts neural models’ ability to handle longer sequences by using memory pointers for better generalization in reasoning and translation tasks.


Researchers Built a Pointer System That Helps AI Generalize Better

1 Apr 2025

PANM boosts neural models’ ability to handle longer sequences by using memory pointers for better generalization in reasoning and translation tasks.


A Memory Hack That Makes AI Smarter With Sequences

1 Apr 2025

PANM boosts neural models’ ability to handle longer sequences by using memory pointers for better generalization in reasoning and translation tasks.