Rewriting Math Word Problems with Large Language Models

Large language models have recently achieved strong performance on many writing tasks. In a recent study, math word problems in Carnegie Learning’s MATHia adaptive learning software were rewritten by human authors to improve their clarity and specificity.
The randomized experiment found that emerging readers who received the rewritten word problems spent less time completing them and achieved higher mastery than emerging readers who received the original content.
We used GPT-4 to rewrite the same set of math word problems, prompting it to follow the same guidelines the human authors followed. We lay out our prompt-engineering process, comparing three prompting strategies: zero-shot, few-shot, and chain-of-thought prompting. We also describe how we leveraged GPT’s ability to write Python code to encode the mathematical components of word problems. We report a text analysis of the original, human-rewritten, and GPT-rewritten problems: the GPT rewrites had the best readability, lexical diversity, and cohesion scores but used more low-frequency words. Finally, we present our plan to test the GPT outputs in upcoming randomized field trials in MATHia.
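
To make the three prompting strategies concrete, below is a minimal sketch using the openai Python client (v1+). The guideline text, example pairs, and function names are hypothetical placeholders, not the paper's actual prompts.

```python
# Sketch of the three prompting strategies compared in the paper, using
# the openai Python client (v1+). GUIDELINES and the example pairs are
# hypothetical placeholders, not the paper's actual prompt text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GUIDELINES = "Rewrite the word problem for clarity and specificity ..."  # placeholder

def zero_shot(problem: str) -> str:
    """Guidelines only, with no worked examples."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": GUIDELINES},
            {"role": "user", "content": problem},
        ],
    )
    return resp.choices[0].message.content

def few_shot(problem: str, examples: list[tuple[str, str]]) -> str:
    """Guidelines plus (original, human rewrite) example pairs."""
    messages = [{"role": "system", "content": GUIDELINES}]
    for original, rewrite in examples:
        messages.append({"role": "user", "content": original})
        messages.append({"role": "assistant", "content": rewrite})
    messages.append({"role": "user", "content": problem})
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content

def chain_of_thought(problem: str) -> str:
    """Ask the model to reason about the guidelines before rewriting."""
    prompt = (
        f"{problem}\n\nFirst, explain step by step which guidelines apply "
        "to this problem. Then give the final rewritten problem."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": GUIDELINES},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content
```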
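
The Python-encoding step can be illustrated the same way. The sketch below shows the kind of output one might prompt GPT-4 to produce for a word problem; the problem, variable names, and algebra are invented for illustration, and the paper's actual encoding scheme may differ.

```python
# Hypothetical example of encoding a word problem's mathematical structure
# as Python. The problem and variable names are placeholders, not the
# paper's actual encoding scheme.

# Problem: Train A travels at 60 mph. Train B leaves 2 hours later at
# 80 mph. After how many hours of travel does Train B catch Train A?
speed_a = 60     # mph
speed_b = 80     # mph
head_start = 2   # hours Train A travels before Train B departs

# Train B catches up when the distances are equal:
#   speed_a * (t + head_start) == speed_b * t
#   => t = speed_a * head_start / (speed_b - speed_a)
t = speed_a * head_start / (speed_b - speed_a)
assert speed_a * (t + head_start) == speed_b * t
print(f"Train B catches Train A after {t} hours")  # 6.0 hours
```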
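
A rough sense of the text-analysis comparison can also be had with off-the-shelf metrics. This sketch assumes the textstat package for readability and a simple type-token ratio for lexical diversity; the paper's exact metric suite (including its cohesion measure) is not reproduced here, and the sample texts are invented.

```python
# Illustrative text analysis: readability via textstat and lexical
# diversity via a type-token ratio. This is not the authors' pipeline.
import re
import textstat

def type_token_ratio(text: str) -> float:
    """Unique words divided by total words: a basic lexical-diversity index."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def describe(label: str, text: str) -> None:
    print(f"{label}:")
    print(f"  Flesch-Kincaid grade: {textstat.flesch_kincaid_grade(text):.1f}")
    print(f"  type-token ratio:     {type_token_ratio(text):.2f}")

# Hypothetical original vs. rewritten problem, for illustration only.
original = ("A train leaves the station traveling 60 miles per hour. "
            "Another train leaves 2 hours later traveling 80 miles per hour. "
            "How long until the second train catches the first?")
rewrite = ("Train A leaves the station at 60 miles per hour. Two hours "
           "later, Train B leaves the same station at 80 miles per hour. "
           "After how many hours does Train B catch up to Train A?")

describe("Original", original)
describe("Rewrite", rewrite)
```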
Keywords
Large language models, artificial intelligence, intelligent tutoring system, personalized learning
Access the full paper here.