Forget Less, Solve More

Sequential Fine-Tuning with Adapter Shrinking for Math Word Problem Solving

Gauri Toshniwal1, S R Balasundaram1
1National Institute of Technology, Tiruchirappalli

Abstract

This work investigates a human-inspired sequential fine-tuning (SeqFT) method to improve the performance of resource-constrained large language models (LLMs) on math word problems. Instead of training on the entire dataset simultaneously, models are exposed to progressively harder tasks level by level, while earlier data is periodically reintroduced to mitigate catastrophic forgetting. In addition, a strategy called Progressive LoRA Rank Shrinking (PLRS) is proposed, which progressively reduces the LoRA rank at each stage to prevent the overwriting of parameters learned in earlier levels. Evaluations on the MATH dataset demonstrate that this approach consistently outperforms both parameter-efficient fine-tuning and naive multi-level training, yielding a 2%-7% improvement in exact match accuracy. The study presents the effects of (1) repeated data exposure, (2) difficulty-based task ordering via SeqFT, and (3) PLRS. An analysis of problem-solving trajectories further reveals that PLRS facilitates retention of earlier skills in a multi-stage setup. These findings suggest that, beyond conventional data augmentation, carefully designed training schedules can significantly enhance math problem-solving capabilities in LLMs.


Method Diagram

SeqFT with PLRS Method Diagram

A high-level overview of our pipeline: curriculum sequencing (levels 1 to 5), replay of previous data, and progressive shrinking of LoRA adapter ranks to isolate capacity and prevent forgetting.
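As a minimal sketch of the curriculum-plus-replay scheduling described above, the routine below builds one training set per difficulty level, mixing the current level's data with a random sample of all earlier levels. The function name and the `replay_frac` parameter are illustrative assumptions, not details from the paper:

```python
import random

def build_stage_data(levels_data, replay_frac=0.2, seed=0):
    """Build per-stage training sets for sequential fine-tuning.

    levels_data: list of datasets ordered easy -> hard (levels 1..N).
    replay_frac: hypothetical fraction of each earlier level to replay
                 alongside the current level's data.
    Returns a list of dicts, one per stage, each holding the level id
    and the mixed training set for that stage.
    """
    rng = random.Random(seed)
    stages, seen = [], []
    for level, data in enumerate(levels_data, start=1):
        # Replay a sample from every previously seen level to
        # mitigate catastrophic forgetting.
        replay = [
            ex
            for prev in seen
            for ex in rng.sample(prev, max(1, int(len(prev) * replay_frac)))
        ]
        stages.append({"level": level, "train": list(data) + replay})
        seen.append(list(data))
    return stages
```

Each stage's `train` list would then be handed to the fine-tuning loop for that level; the exact replay ratio used in the paper is not specified here.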


Results

On the MATH benchmark, PLRS outperforms direct LoRA training by +4.5% EM on LLaMA-1B and yields gains of +2-7% across models (0.5B-3B). Error-bucket analysis shows that PLRS reduces the number of forgotten ("lost") problems while increasing the number of recovered and newly solved ones.

Result Image 1
EM improvement on Baseline 6 compared to Baseline 1.
Result Image 2

The table above shows the final ablations for LLaMA 3.2 1B. Notably, replay alone (SFR with a fixed rank) can drop performance to 12.38%, below even the direct Baseline 1 (15.86%). Combining replay with rank shrinking yields 17.22%, and one final tweak, disabling rank shrinking at Level 5, raises accuracy to 20.32%.
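The rank schedule implied by this ablation can be sketched as follows. The starting rank, shrink factor, and the halving rule are illustrative assumptions; only the "hold the rank at the final level" behavior comes from the ablation above:

```python
def plrs_rank_schedule(num_levels=5, r0=64, shrink=0.5, hold_last=True):
    """Hypothetical PLRS schedule: shrink the LoRA rank at each level,
    optionally holding the final level at the previous rank (the
    'no rank shrink at Level 5' variant from the ablation).

    Returns a list of per-level ranks, levels 1..num_levels.
    """
    ranks = [r0]
    last = num_levels - 1
    for level in range(1, num_levels):
        if hold_last and level == last:
            # Final level reuses the previous rank instead of shrinking.
            ranks.append(ranks[-1])
        else:
            ranks.append(max(1, int(ranks[-1] * shrink)))
    return ranks
```

With the defaults this produces ranks of 64, 32, 16, 8, 8 across the five levels; setting `hold_last=False` would continue shrinking to rank 4 at Level 5.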

Error Buckets

Beyond raw accuracy, our error trajectory analysis reveals that the combination of replay and PLRS does more than preserve prior skills: it actively enhances them, demonstrating positive backward transfer.
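A sketch of how such an error-bucket analysis can be computed: given each problem's correctness at every training stage, items are classified as retained, lost, recovered, or newly solved. The exact bucket definitions below are an assumption (e.g. "recovered" means the item was correct at some stage, dipped to incorrect, and is correct again at the end), since the paper's precise criteria are not reproduced here:

```python
def bucket_trajectories(history):
    """Classify per-item correctness trajectories into error buckets.

    history: dict mapping item id -> list of booleans, one per training
             stage, True if the item was solved at that stage.
    Returns counts per bucket (hypothetical definitions):
      retained     - correct since first solved, never lost, correct at end
      recovered    - correct earlier, wrong at some later stage, correct at end
      newly_solved - first solved only at the final stage
      lost         - correct at some stage but wrong at the end
      never_solved - wrong at every stage
    """
    buckets = {"retained": 0, "recovered": 0, "newly_solved": 0,
               "lost": 0, "never_solved": 0}
    for traj in history.values():
        final, earlier = traj[-1], traj[:-1]
        if final:
            if not any(earlier):
                buckets["newly_solved"] += 1
            else:
                # Was solved before: did correctness ever dip afterwards?
                first_correct = traj.index(True)
                dipped = any(not c for c in traj[first_correct:])
                buckets["recovered" if dipped else "retained"] += 1
        else:
            buckets["lost" if any(earlier) else "never_solved"] += 1
    return buckets
```

Under these definitions, positive backward transfer shows up as a shrinking "lost" bucket together with growing "recovered" and "newly_solved" buckets as replay and PLRS are added.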