<...>
Until now, what I did was a full load of the complete merit.txt file, since it was self-sufficient and covered the entire lifespan of the Merit System (so each load discarded the previous one). I load the data files into an RDBMS (my unix/linux memory is too distant these days).
Given what we’ve seen today, the tactic for loading the data will be as follows:
1. Load the new merit.txt file into a table.
2. Insert into the new table all the records from the previous load’s table that do not exist in the new table (comparing by time, msg, user_from, user_to) and whose timestamp is <= the minimum timestamp in the new file.
That is, from the old file, only retrieve the records that fall outside the time window covered by the newly received file.
The 9 records we’ve been talking about before are in the new file (although with different msg Ids than in last week’s file), so the timestamp condition stated above ensures that I’ll keep the latest msg Id.
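For anyone who wants to replicate the load, here is a minimal sketch of the two steps in Python with SQLite. The table and column names (merit_new, merit_history, merit_merged, time, msg, user_from, user_to, merit) and the tab-separated layout of merit.txt are my assumptions; adjust them to your own schema and RDBMS.

```python
# Minimal sketch of the two-step incremental load (assumed schema: tab-separated
# merit.txt with columns time, msg, user_from, user_to, merit; ISO timestamps
# compare correctly as text).
import csv
import sqlite3

con = sqlite3.connect("merit.db")
cur = con.cursor()

# Step 1: load the new merit.txt into its own table.
cur.execute("DROP TABLE IF EXISTS merit_new")
cur.execute("""CREATE TABLE merit_new (
    time TEXT, msg INTEGER, user_from INTEGER, user_to INTEGER, merit INTEGER)""")
with open("merit.txt", newline="") as f:
    # Skip a header line here if the file has one.
    cur.executemany("INSERT INTO merit_new VALUES (?, ?, ?, ?, ?)",
                    csv.reader(f, delimiter="\t"))

# On the very first load there is no previous table yet; create it empty.
cur.execute("""CREATE TABLE IF NOT EXISTS merit_history (
    time TEXT, msg INTEGER, user_from INTEGER, user_to INTEGER, merit INTEGER)""")

# Step 2: merged table = new file + records from the previous load that do not
# exist in the new table (matched on time, msg, user_from, user_to) and whose
# timestamp is <= the minimum timestamp in the new file.
cur.execute("DROP TABLE IF EXISTS merit_merged")
cur.execute("CREATE TABLE merit_merged AS SELECT * FROM merit_new")
cur.execute("""
    INSERT INTO merit_merged
    SELECT h.*
    FROM merit_history h
    WHERE h.time <= (SELECT MIN(time) FROM merit_new)
      AND NOT EXISTS (SELECT 1 FROM merit_new n
                      WHERE n.time = h.time AND n.msg = h.msg
                        AND n.user_from = h.user_from AND n.user_to = h.user_to)
""")

# The merged table becomes the history that next week's load starts from.
cur.execute("DROP TABLE IF EXISTS merit_history")
cur.execute("ALTER TABLE merit_merged RENAME TO merit_history")
con.commit()
con.close()
```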
Having done the above, the aggregate totals are as follows (for anyone who wants to compare):
nMerit: 159.076
nTx: 71.427
nFrom: 14.143
nTo: 15.755
minDate: 2018-01-24 22:12:21.000
maxDate: 2018-05-25 02:45:40.000
I broke it down by week and compared it against my existing weekly reports. All the historical, already-settled weeks match in value, so the process works fine.
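If you want to recompute those totals and the weekly breakdown yourselves, a query along these lines does it. This is a sketch against the merged table from the load above, where I read nMerit as the sum of merit points sent, nTx as the record count, and nFrom/nTo as the distinct senders and receivers.

```python
# Sketch: recompute the aggregate totals and a weekly breakdown from the merged
# table (merit_history, as named in the load sketch above).
import sqlite3

con = sqlite3.connect("merit.db")

totals = con.execute("""
    SELECT SUM(merit)                AS nMerit,
           COUNT(*)                  AS nTx,
           COUNT(DISTINCT user_from) AS nFrom,
           COUNT(DISTINCT user_to)   AS nTo,
           MIN(time)                 AS minDate,
           MAX(time)                 AS maxDate
    FROM merit_history
""").fetchone()
print(totals)

# Weekly breakdown to compare against the historical reports
# (strftime '%Y-%W' buckets the ISO timestamps by year and week number).
for row in con.execute("""
    SELECT strftime('%Y-%W', time) AS week,
           SUM(merit), COUNT(*),
           COUNT(DISTINCT user_from), COUNT(DISTINCT user_to)
    FROM merit_history
    GROUP BY week
    ORDER BY week
"""):
    print(row)

con.close()
```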
As I said before, what we will lose, whatever we do, is the ability to track the msg Ids that change for a given Tx outside the 120-day window. So, for example, if a merited post gets moved or deleted after 120 days, the cumulative file has no way of knowing that, since the Tx no longer updates in the weekly file.
The deviation should be very small and pretty much ignorable (unless anyone goes bonkers and starts to move/delete things heavily after 120 days). It should only show up at the post msg Id level (when aggregating by Forum Section, for example), but should not budge the aggregate user balances.
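If anyone wants to quantify that drift, two quick checks over the overlap window are sketched below. They assume the previous load is still around as a table I'm calling merit_old (purely an illustrative name): (a) counts the Tx whose msg Id changed, and (b) verifies that no user's received balance moved within the window.

```python
# Sketch of two sanity checks on the overlap window, assuming the previous
# load is kept as merit_old and the new file is loaded as merit_new
# (illustrative names, as in the load sketch above).
import sqlite3

con = sqlite3.connect("merit.db")
cutoff = con.execute("SELECT MIN(time) FROM merit_new").fetchone()[0]

# (a) Tx present in both loads (same time, user_from, user_to, merit) but with
#     a different msg Id -- the drift that only shows up per Forum Section.
n_changed = con.execute("""
    SELECT COUNT(*)
    FROM merit_old o
    JOIN merit_new n
      ON  n.time = o.time AND n.user_from = o.user_from
      AND n.user_to = o.user_to AND n.merit = o.merit
    WHERE o.time >= ? AND n.msg <> o.msg
""", (cutoff,)).fetchone()[0]
print("Tx with a changed msg Id inside the window:", n_changed)

# (b) Per-user received merit over the window, old load minus new file;
#     any non-zero delta would mean a real balance change, not just a moved post.
diffs = con.execute("""
    SELECT user_to, SUM(delta) AS delta
    FROM (SELECT user_to,  merit AS delta FROM merit_old WHERE time >= ?
          UNION ALL
          SELECT user_to, -merit AS delta FROM merit_new) t
    GROUP BY user_to
    HAVING SUM(delta) <> 0
""", (cutoff,)).fetchall()
print("Users whose window balance differs between loads:", len(diffs))

con.close()
```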
I’ve looked at the timestamps again and there are cases both of double-clicks and of simultaneous meriting across the forum.
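For the record, this is roughly how those cases can be spotted (a sketch with heuristics of my own: a double-click shows up as the same sender meriting the same msg within a couple of seconds, and simultaneous meriting as distinct senders sharing the exact same timestamp).

```python
# Sketch: flag probable double-clicks and simultaneous merits in the merged
# table (merit_history, as named above); the 2-second threshold is heuristic.
import sqlite3

con = sqlite3.connect("merit.db")

# Probable double-clicks: same sender, receiver and msg, <= 2 seconds apart.
double_clicks = con.execute("""
    SELECT a.user_from, a.msg, a.time, b.time
    FROM merit_history a
    JOIN merit_history b
      ON  b.user_from = a.user_from
      AND b.user_to   = a.user_to
      AND b.msg       = a.msg
      AND b.rowid     > a.rowid
      AND ABS(strftime('%s', b.time) - strftime('%s', a.time)) <= 2
""").fetchall()

# Simultaneous meriting: different senders sharing the exact same timestamp.
simultaneous = con.execute("""
    SELECT time, COUNT(DISTINCT user_from) AS senders
    FROM merit_history
    GROUP BY time
    HAVING COUNT(DISTINCT user_from) > 1
""").fetchall()

print(len(double_clicks), "probable double-clicks,",
      len(simultaneous), "timestamps with more than one sender")
con.close()
```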