This work investigates a human-inspired sequential fine-tuning (SeqFT) method to improve the performance of resource-constrained large language models (LLMs) on math word problems. Instead of training on the entire dataset simultaneously, models are exposed to progressively harder tasks level by level, while earlier data is periodically reintroduced to mitigate catastrophic forgetting. In addition, a strategy called Progressive LoRA Rank Shrinking (PLRS) is proposed, which progressively reduces the LoRA rank at each stage to prevent the overwriting of parameters learned at earlier levels. Evaluations on the MATH dataset demonstrate that this approach consistently outperforms both parameter-efficient fine-tuning and naive multi-level training, yielding a 2–7% improvement in exact match accuracy. The study analyzes the effects of (1) repeated data exposure, (2) difficulty-based task ordering via SeqFT, and (3) PLRS. An analysis of problem-solving trajectories further reveals that PLRS facilitates retention of earlier skills in a multi-stage setup. These findings suggest that, beyond conventional data augmentation, carefully designed training schedules can significantly enhance math problem-solving capabilities in LLMs.
A high-level overview of our pipeline: curriculum sequencing (levels 1 to 5), replay of previous data, and progressive shrinking of LoRA adapter ranks to isolate capacity and prevent forgetting.
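The two scheduling ingredients of the pipeline, a per-level rank schedule and replay of earlier levels, can be sketched as follows. The halving schedule, the 20% replay fraction, and the function names are illustrative assumptions; the paper's exact hyperparameters may differ.

```python
import random

def rank_schedule(r0, num_levels, keep_last_fixed=True):
    # Assumed PLRS schedule: halve the LoRA rank at each curriculum level.
    # keep_last_fixed mirrors the variant that skips shrinking at the final level.
    ranks = [max(1, r0 // (2 ** i)) for i in range(num_levels)]
    if keep_last_fixed and num_levels > 1:
        ranks[-1] = ranks[-2]
    return ranks

def build_stage_data(levels, stage, replay_frac=0.2, seed=0):
    # Current-level data plus a replayed fraction sampled from each earlier level.
    rng = random.Random(seed)
    data = list(levels[stage])
    for prev in range(stage):
        k = max(1, int(replay_frac * len(levels[prev])))
        data += rng.sample(levels[prev], k)
    rng.shuffle(data)
    return data
```

For example, `rank_schedule(64, 5)` yields `[64, 32, 16, 8, 8]` under this assumed halving rule, and `build_stage_data` for Level 3 mixes all Level-3 items with a 20% sample of Levels 1 and 2.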
On the MATH benchmark, PLRS outperforms direct LoRA training by +4.5% EM on LLaMA-1B and yields gains of +2–7% across models (0.5B–3B). Error-bucket analysis shows that PLRS reduces forgotten items ("lost") and increases "recovered" and "newly solved" items.
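One way to operationalize the error buckets is to classify each problem by its per-stage correctness history. The bucket definitions below are an assumption consistent with the names used in the analysis ("lost", "recovered", "newly solved"), not the paper's exact criteria.

```python
def bucket(history):
    """Classify one item from its list of per-stage correctness flags.

    Assumed definitions:
      new       - solved now, never solved before
      recovered - solved now, wrong at the previous stage, solved earlier
      kept      - solved now and at the previous stage
      lost      - wrong now but solved at some earlier stage
      unsolved  - never solved
    """
    *past, last = history
    if last:
        if not any(past):
            return "new"
        return "recovered" if past and not past[-1] else "kept"
    return "lost" if any(past) else "unsolved"
```

Under these definitions, a history like solved-then-wrong-then-solved counts as "recovered", which is the pattern the positive-backward-transfer claim rests on.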
The table above shows final ablations for LLaMA 3.2 1B. Notably, replay alone (SFR + fixed rank) can drop performance to 12.38%, below even the direct Baseline 1 (15.86%). Combining replay with rank shrinking yields 17.22%, and a final tweak, skipping the rank shrink at Level 5, raises accuracy to 20.32%.
Beyond raw accuracy, our error-trajectory analysis reveals that the combination of replay and PLRS does more than preserve prior skills; it actively enhances them, demonstrating positive backward transfer.