DAILY MERIT TIME-SERIES
I will update this topic weekly (mainly because I use the weekly data dumps from LoyceV).
Updates will be posted in the latest post of the topic (not in the OP) at a specific point in time.
Hello all,
As I stated months ago, I have planned to make a time-series analysis of merit distribution in the forum over time (since 24th January 2018).
In this topic, I will show time series on different time scales (daily, weekly, monthly, quarterly, and probably yearly in the far future).
Data sources
1) Original dataset (by LoyceV)
2) Fully converted dataset (made by myself)
Anyone who is interested and wants the datasets, please visit there.
I wanted to post everything here, but it seems the OP cannot be too long (some forum limit on total words/rows, I guess).
The date-time format in my converted dataset is explained in detail at the bottom of the topic.
In the OP, I will present analyses based on the full data and on truncated data (with the first twenty-six days, from 24th Jan. to 18th Feb., dropped).
Firstly, let's take a look at the full dataset.
From the above image, it is clear that daily distributed merits have plummeted since the first day (launch day) of the merit system: from 13018 on day one, to 4192 after 7 days, and to 2308 after two weeks. After around 76 days, the daily merits dropped to 884, then fluctuated insignificantly from there.
I will present more on the medians of daily merits below (somewhere around 650 merits distributed per day).
Secondly, let's take a look at the truncated dataset.
What is the truncated dataset? From the original one, I truncated the first two days in order to give a closer, clearer plot.
Of course, these two days are definitely outliers in the dataset, which starts in January; there are more outliers, which will be taken into consideration later.
With the plot from the truncated dataset, we can easily see a sudden spike on 16th September: daily merits rocketed to 2463, then decreased gradually to 1862, 1294, and 1268 over the next three days.
These September spikes occurred due to changes to the merit system and forum rank requirements, which were aimed at Junior Members. Demoted Junior Members received thousands of merits to rank up again over the next four days. In other words, the effect of the new rank requirements on Junior Members tailed off quickly after four days.
Trend
So, what kind of trend do we have?
In both the full and truncated datasets, it is clear that the general trend of merit distribution is downwards: it has gradually fallen over nearly 1 year since the first day of the merit system.
The linear regression lines are shown as red lines in the plots.
Although there were sudden spikes in September, they did not last long enough to make a trend break. Consequently, merit distribution in the forum has in general remained in a long-term downward trend.
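As a rough illustration of how such a trend line is fitted, here is a minimal Python sketch of an ordinary least-squares line. It is not the code behind the plots (the listings in this topic come from Stata); the function name `ols_line` is hypothetical, and the sample values are only the first five daily totals reported in this topic, so the slope is far steeper than the long-term one.

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Day ids 1..5 and their daily merits (24th to 28th Jan. 2018).
days = [1, 2, 3, 4, 5]
merits = [13018, 6761, 4493, 3489, 3188]

slope, intercept = ols_line(days, merits)
print(f"slope = {slope:.1f} merits per day, intercept = {intercept:.1f}")
```

A negative slope means a downward trend; with the full daily series in place of these five points, the same function would give the slope of the red line.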
Mean +/- standard deviation; Median (Interquartile range), minimum and maximum.
So far, I have only shown you an overview of daily distributed merits over around 10 months of 2018.
Now, let's spend a couple of minutes looking at the statistics on daily merits (means, standard deviations, medians, interquartile ranges, and potential outliers).
As always, with non-normally distributed variables (in this case, daily distributed merits), medians are better statistics to use.
Medians are better because they stay close to the typical value and are not affected significantly by potential outliers.
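To make the point about outliers concrete, here is a small Python sketch (not part of the original analysis) that uses only the first five daily totals reported in this topic. It compares how far the mean and the median move when the launch day (13018 merits) is dropped; the variable names are just for illustration.

```python
from statistics import mean, median

# Daily merits for 24th to 28th Jan. 2018, as listed in this topic.
first_five_days = [13018, 6761, 4493, 3489, 3188]
# The same days without the launch day, the most extreme value.
without_launch_day = first_five_days[1:]

mean_shift = mean(first_five_days) - mean(without_launch_day)
median_shift = median(first_five_days) - median(without_launch_day)

print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift:.2f}")
```

Dropping a single extreme day moves the mean much more than the median, which is the reason for preferring medians here.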
In this part of my analysis, I present three versions of the merit dataset:
(1) Full dataset: starts from 24th Jan. 2018 and runs to the last updated day.
(2) Partially truncated dataset: the first two days, 24th and 25th Jan., with 13018 and 6761 distributed merits per day, respectively, were truncated.
(3) Fully truncated dataset: all days before 19th Feb. 2018 were truncated.
Now, firstly let's take a look at descriptive time-series plots of the two periods, before and since 19th Feb. 2018.
Before 19th Feb. 2018:
Since 19th Feb. 2018:
The reason why I split the dataset into two parts is explained below.
Statistics:
From the full dataset, we can easily identify potential outliers, both low and high.
They are calculated via the following formula:
Potential outliers are data points which are higher than Q3 + 1.5*IQR or lower than Q1 - 1.5*IQR
With:
- IQR = Q3 - Q1 (the interquartile range, presented in the above table as Q1 - Q3)
- Q1: the first quartile (25th percentile)
- Q3: the third quartile (75th percentile).
Let's calculate the cut-offs of potential outliers based on the full daily dataset.
IQR = Q3 - Q1 = 884 - 534 = 350;
1.5*IQR = 1.5*350 = 525
Potential outliers:
- Above Q3 + 1.5*IQR = 884 + 525 = 1409
- Below Q1 - 1.5*IQR = 534 - 525 = 9
To sum up, days with total merits above 1409 or below 9 are potential outliers.
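The same arithmetic can be written as a short Python sketch. This is only an illustration of the 1.5*IQR rule described above, not the code used for this topic; the function name `tukey_fences` is hypothetical, and the quartiles are typed in by hand from the figures quoted here.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of the full daily dataset, as quoted in this topic.
lower, upper = tukey_fences(534, 884)
print(lower, upper)
```

With the truncated-data quartiles quoted later in this topic (525 and 788), the same function gives 130.5 and 1182.5 before rounding.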
How many potential outliers do we have so far?
There are 27 days with daily merits above 1409; most of them (24 days) fall between 24th Jan. and 17th Feb. This means 88.9% of those outliers occurred before March. However, I also truncated 18th Feb. in order to have full weeks for the weekly analysis.
Below is the list of those days.
. list id date merit if merit >1409
+-------------------------+
| id date merit |
|-------------------------|
1. | 1 24jan2018 13018 |
2. | 2 25jan2018 6761 |
3. | 3 26jan2018 4493 |
4. | 4 27jan2018 3489 |
5. | 5 28jan2018 3188 |
|-------------------------|
6. | 6 29jan2018 3799 |
7. | 7 30jan2018 4192 |
8. | 8 31jan2018 2820 |
9. | 9 01feb2018 2545 |
10. | 10 02feb2018 2568 |
|-------------------------|
11. | 11 03feb2018 1867 |
12. | 12 04feb2018 2167 |
13. | 13 05feb2018 2077 |
14. | 14 06feb2018 2308 |
15. | 15 07feb2018 2141 |
|-------------------------|
16. | 16 08feb2018 2141 |
17. | 17 09feb2018 1448 |
18. | 18 10feb2018 1747 |
19. | 19 11feb2018 1442 |
21. | 21 13feb2018 1579 |
|-------------------------|
22. | 22 14feb2018 2513 |
23. | 23 15feb2018 1991 |
24. | 24 16feb2018 1411 |
25. | 25 17feb2018 1608 |
38. | 38 02mar2018 1696 |
|-------------------------|
236. | 236 16sep2018 2463 |
237. | 237 17sep2018 1862 |
+-------------------------+
The first part of the merit dataset (outliers): the median is 2154 merits per day, and 50% of days have daily merits ranging from 1608 to 3188 (the interquartile range, from Q1 to Q3). The minimum and maximum of the period are 1289 and 13018, respectively.
The second part of the merit dataset (mostly free of outliers): the median of the second part (from 19th Feb. 2018 to 14th November 2018) is 627, and 50% of days have daily merits ranging from 525 to 788. The minimum and maximum of the period are 370 and 2463, respectively.
The above table shows that there is no large difference between the medians of the full and truncated datasets, at 652 and 627, respectively. Nevertheless, there is a more considerable difference between the means of the full and truncated datasets, at 899 and 703, respectively.
This means the mean of daily merits in the full dataset is 196 points higher than in the truncated one, whilst the median in the full dataset is only 25 points higher.
Once again, medians show their value: the extremely high daily merits of the early days don't have much impact on the median.
Basic statistics of the truncated dataset:
a) Median: 627. It means that 50% of observed days have total distributed merits lower than 627 and 50% of observed days have total distributed merits higher than 627.
b) Interquartile range (IQR, from Q1 to Q3): 525 - 788. It means that 50% of observed days have total daily distributed merits within the range from 525 to 788 points per day. Additionally, 25% of observed days have fewer than 525 merits distributed per day, and 25% of observed days have more than 788 merits distributed per day.
Days with values below Q1 (525) or above Q3 (788) lie outside the middle 50% of the data; those far enough beyond these bounds (by more than 1.5*IQR) are the potential outliers.
c) Min - Max: the minimum and maximum daily distributed merits are 370 and 2463, respectively, for the fully truncated dataset (the maximum is 4493 if only the first two days are dropped), and 370 and 13018, respectively, for the full dataset. So far, the all-time high of daily distributed merits is 13018. This ATH will be extremely difficult to beat. I believe that a new all-time high (higher than 13018) will only occur if a new wave of demotions for the Senior Member, Hero, and Legendary ranks is implemented simultaneously, which might be unrealistic (theymos might never do this).
Potential Outliers
Let's calculate the cut-offs of potential outliers based on the real daily dataset (the truncated one) with the same formula presented above.
- Q1 = 525
- Q3 = 788
----> IQR = Q3 - Q1 = 788 - 525 = 263, hence 1.5*IQR = 1.5*263 = 394.5, rounded to 395.
Q1 - 1.5*IQR = 525 - 395 = 130.
Q3 + 1.5*IQR = 788 + 395 = 1183.
It means days with total merits distributed above 1183 or below 130 should be investigated in depth to find out what happened on those days.
So, how many potential outliers did we find in the truncated dataset?
18 days, listed below.
. list id date merit if merit >1183 & merit != .
+-------------------------+
| id date merit |
|-------------------------|
1. | 27 19feb2018 1403 |
3. | 29 21feb2018 1266 |
4. | 30 22feb2018 1279 |
6. | 32 24feb2018 1409 |
7. | 33 25feb2018 1186 |
|-------------------------|
8. | 34 26feb2018 1382 |
9. | 35 27feb2018 1326 |
11. | 37 01mar2018 1333 |
12. | 38 02mar2018 1696 |
15. | 41 05mar2018 1245 |
|-------------------------|
22. | 48 12mar2018 1354 |
30. | 56 20mar2018 1322 |
31. | 57 21mar2018 1227 |
42. | 68 01apr2018 1233 |
210. | 236 16sep2018 2463 |
|-------------------------|
211. | 237 17sep2018 1862 |
212. | 238 18sep2018 1294 |
213. | 239 19sep2018 1268 |
+-------------------------+
In the truncated dataset, there are 18 potential outliers (nearly 6.7% of the 269 observed days since 19th Feb. 2018).
The image also shows that only a limited number of days have total merits distributed higher than 1183 (above the horizontal red line).
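For anyone who wants to re-check the count and the percentage, here is a small Python sketch. The values are typed in by hand from the listing in this post, and 269 is the number of observed days stated in this topic; the variable names are only for illustration.

```python
# Daily merits of the days in the listing (19th Feb. to 19th Sep. 2018).
listed_merits = [
    1403, 1266, 1279, 1409, 1186, 1382, 1326, 1333, 1696,
    1245, 1354, 1322, 1227, 1233, 2463, 1862, 1294, 1268,
]
upper_cutoff = 1183   # Q3 + 1.5*IQR for the truncated dataset
observed_days = 269   # days from 19th Feb. to 14th Nov. 2018

# Count the days above the cut-off and express them as a share of all days.
n_outliers = sum(1 for m in listed_merits if m > upper_cutoff)
share = 100 * n_outliers / observed_days

print(f"{n_outliers} days above {upper_cutoff}: {share:.1f}% of {observed_days} days")
```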
Box plots
This part presents box plots of daily merits for the full data, the part of the data before 19th Feb. 2018, and the rest of the data since 19th Feb. 2018.
The medians are the horizontal lines inside the boxes.
With the visual plots, we can easily see the difference between the medians of the before and after datasets.
In the above box plot, outliers are not shown.
Important
1) About the dates of days in the dataset:
They are not real calendar dates. I started from LoyceV's data source, then converted its days to calendar dates.
I assumed that the first day in the data source is 24th January 2018, but this is only an assumption.
The real one may be 23rd or 25th January 2018 in forum time or Dutch time.
I mean the assumed dates are at most one day before or after the real ones, so I think we can all accept the assumed/pseudo-dates (they are not a big issue).
LoyceV explained it below.
If you're using my "days", you're still not using "real" days:
I've used the same "time convention" as I used for my full merit transaction history: "Days" start the second the first Merit was transfered, and count exactly 3600*24 seconds after that. It has nothing to do with calendar days in any time zone.
I started the day the second AdolfinWolf received the first Merit from theymos (Wed Jan 24 23:12:21 2018) (I think this is Dutch time, not forum time).
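The convention LoyceV describes can be sketched in Python as follows. This is my reading of the explanation, not LoyceV's actual code: the function name `loyce_day` is hypothetical, the 1-based numbering is an assumption chosen to match the `id` column of the converted dataset, and timestamps are treated as naive (no time zone), since the time zone of the start time is itself uncertain.

```python
from datetime import datetime, timedelta

# Timestamp of the first Merit, as quoted above (time zone uncertain).
FIRST_MERIT = datetime(2018, 1, 24, 23, 12, 21)
DAY = timedelta(seconds=3600 * 24)


def loyce_day(ts):
    """1-based day number: day 1 starts at FIRST_MERIT, each day lasts exactly 86400 s."""
    if ts < FIRST_MERIT:
        raise ValueError("timestamp is before the first Merit")
    # Whole 86400-second blocks elapsed since the first Merit, plus one.
    return (ts - FIRST_MERIT) // DAY + 1


# Noon on calendar day 25th Jan. still belongs to "day 1" under this convention.
print(loyce_day(datetime(2018, 1, 25, 12, 0, 0)))
```

This shows why the dataset's days can be off by one from calendar days: each "day" runs from 23:12:21 to 23:12:20 of the next calendar day.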
2) The first week in the converted dataset contains only 5 days, so it is not a full seven-day week.
List of those five days in the first week:
. list if id <=5
+----------------------------------------------------------------+
| id day month2 year merit date week month |
|----------------------------------------------------------------|
1. | 1 24 1 2018 13018 24jan2018 2018w4 2018m1 |
2. | 2 25 1 2018 6761 25jan2018 2018w4 2018m1 |
3. | 3 26 1 2018 4493 26jan2018 2018w4 2018m1 |
4. | 4 27 1 2018 3489 27jan2018 2018w4 2018m1 |
5. | 5 28 1 2018 3188 28jan2018 2018w4 2018m1 |
+----------------------------------------------------------------+
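The five-day first week follows from how the week labels are built. The sketch below is in Python and assumes the `week` column uses Stata's weekly date convention, in which week 1 always starts on 1st January, every week has 7 days, and week 52 absorbs the leftover days. The function name `stata_week` is hypothetical.

```python
from datetime import date


def stata_week(d):
    """Week of year under the assumed convention: week 1 starts on 1 January, capped at 52."""
    day_of_year = d.timetuple().tm_yday
    return min((day_of_year - 1) // 7 + 1, 52)


# Week 4 of 2018 covers 22nd to 28th Jan., but the data only start on the 24th.
first_days = [date(2018, 1, day) for day in range(24, 30)]
for d in first_days:
    print(d.isoformat(), f"{d.year}w{stata_week(d)}")
```

Under this assumption, 24th to 28th Jan. all fall in 2018w4 and 29th Jan. opens 2018w5, so the first week in the dataset holds only five days.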
Notes:
I would highly appreciate help from anyone who can give me data on merit distribution by rank, board (and other categories) since the first day of the merit system.