Is there a way to do this without resorting to abuse filters? I need this because I plan to feed the data into a neural network whose goal is to catch long-term abusers (LTAs) while minimising false positives. I cannot simply use the diff, since it is likely to be noisier and more prone to false positives than added_lines, which gives me exactly what I'm looking for in one go.
Storing the full wikitext of each edit would not be feasible either, since I plan to test this bot globally to see how well it catches LTAs across a variety of cases (and especially to watch out for false positives).
I requested limited adminship on Meta solely for this purpose, but concerns were raised at https://meta.wikimedia.org/wiki/Meta:Requests_for_limited_adminship/Leaderboard_(2) (which also gives context on the alternatives I considered), so here I am.