Topic: Time Series Analysis on Distributed Merits in the forum (daily, weekly, monthly) - page 33. (Read 29475 times)

legendary
Activity: 2310
Merit: 4085
Farewell o_e_l_e_o
Weekly Analysis:
Updated on 19th Nov. 2018:

Overview plot and trend
The general trend has been downward, despite one spike caused by the adjustment of rank requirements aimed at the Junior Member rank.


Mean +/- standard deviation; Median (Interquartile range)
1)   The first period covers 4 weeks, from 24th Jan. to 18th Feb. 2018 (the first of these four weeks is incomplete).
The mean weekly merit is 18983, with a standard deviation of 8739.
The minimum and maximum of the period are 11722 and 30949, respectively.
2)   The second period starts on 19th Feb. 2018 and covers 38 weeks in total.
The median of the 38-week period is 4423, and 50% of those weeks have total weekly merit between 3854 and 5487 (the interquartile range). 25% of those weeks have total weekly merit above 5487, and 25% have total weekly merit below 3854.
The minimum and maximum of the period are 3065 and 8806, respectively.
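For anyone who wants to re-check the first-period figures, here is a minimal Python sketch (not the Stata session used for the analysis). It assumes the four weekly totals for 2018w4 to 2018w7 from the weekly listing, and that the reported standard deviation is the sample one (n - 1 in the denominator).

```python
import statistics

# Weekly totals for 2018w4 .. 2018w7, copied from the weekly listing.
first_period = [30949, 19958, 13304, 11722]

mean = statistics.mean(first_period)
sd = statistics.stdev(first_period)  # sample standard deviation (assumed)

print(f"mean = {mean:.0f}, sd = {sd:.0f}")
print(f"min = {min(first_period)}, max = {max(first_period)}")
```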

Potential Outliers
Interquartile range (Q1 to Q3), with Q1 = 3854 and Q3 = 5487.
--> IQR = Q3 - Q1 = 5487 - 3854 = 1633.
---> 1.5*IQR = 1633*1.5 = 2449.5, rounded to 2450.
Therefore, potential outliers are weeks with total distributed merit above 7937 or below 1404.

Formula to calculate the cut-offs for potential outliers:
- Q3 + 1.5*IQR = 5487 + 2450 = 7937;
- Q1 - 1.5*IQR = 3854 - 2450 = 1404.
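The cut-offs above can be reproduced with a short Python sketch. Two things here are assumptions and not taken from the original analysis: the helper name `stata_style_percentile`, and the percentile rule itself, which is assumed to match Stata's default (take the value at the next position up, or average two neighbours when the position is a whole number). The weekly totals are copied from the weekly listing for 2018w8 to 2018w45.

```python
import math

# Weekly totals for 2018w8 .. 2018w45 (38 full weeks), from the weekly listing.
totals = [
    8758, 8806, 7253, 7309, 6941, 6707, 6415, 5487, 4631, 4585,
    4953, 4753, 4346, 3854, 4183, 4527, 3818, 4921, 4457, 4253,
    4239, 4159, 3652, 3798, 3994, 3618, 3789, 3065, 3574, 5630,
    7825, 4388, 4271, 3800, 4821, 3945, 3339, 4513,
]
weekly = {f"2018w{w}": m for w, m in zip(range(8, 46), totals)}


def stata_style_percentile(values, p):
    """Percentile under the rule assumed to match Stata's default."""
    xs = sorted(values)
    pos = len(xs) * p / 100
    if pos == int(pos):
        i = int(pos)
        return (xs[i - 1] + xs[i]) / 2  # average the two neighbouring values
    return xs[math.ceil(pos) - 1]       # otherwise take the next value up


q1 = stata_style_percentile(totals, 25)
median = stata_style_percentile(totals, 50)
q3 = stata_style_percentile(totals, 75)

iqr = q3 - q1
lower = q1 - 1.5 * iqr  # unrounded; the post rounds 1.5*IQR to 2450
upper = q3 + 1.5 * iqr

flagged = [week for week, merit in weekly.items() if merit < lower or merit > upper]
print(q1, median, q3, lower, upper, flagged)
```

Without the rounding step the fences are half a point away from 1404 and 7937, which does not change which weeks are flagged.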
Now, let's see how many potential outliers we have: only two.

Code:
     +----------------+
     |   week   merit |
     |----------------|
  1. | 2018w8    8758 |
  2. | 2018w9    8806 |
     +----------------+
They consist of the following days:
Code:
. list if id >26 & id < 41

     +----------------------------------------------------------------+
     | id   day   month2   year   merit        date     week    month |
     |----------------------------------------------------------------|
 27. | 27    19        2   2018    1403   19feb2018   2018w8   2018m2 |
 28. | 28    20        2   2018    1169   20feb2018   2018w8   2018m2 |
 29. | 29    21        2   2018    1266   21feb2018   2018w8   2018m2 |
 30. | 30    22        2   2018    1279   22feb2018   2018w8   2018m2 |
 31. | 31    23        2   2018    1046   23feb2018   2018w8   2018m2 |
     |----------------------------------------------------------------|
 32. | 32    24        2   2018    1409   24feb2018   2018w8   2018m2 |
 33. | 33    25        2   2018    1186   25feb2018   2018w8   2018m2 |
 34. | 34    26        2   2018    1382   26feb2018   2018w9   2018m2 |
 35. | 35    27        2   2018    1326   27feb2018   2018w9   2018m2 |
 36. | 36    28        2   2018     991   28feb2018   2018w9   2018m2 |
     |----------------------------------------------------------------|
 37. | 37     1        3   2018    1333   01mar2018   2018w9   2018m3 |
 38. | 38     2        3   2018    1696   02mar2018   2018w9   2018m3 |
 39. | 39     3        3   2018    1089   03mar2018   2018w9   2018m3 |
 40. | 40     4        3   2018     989   04mar2018   2018w9   2018m3 |
     +----------------------------------------------------------------+
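As a cross-check, the daily figures in this listing add up to the two weekly totals flagged above. A minimal Python sketch, using only the week and merit columns for ids 27 to 40 (the variable names are illustrative):

```python
from collections import defaultdict

# (week, merit) pairs for ids 27 .. 40, copied from the listing above.
daily = [
    ("2018w8", 1403), ("2018w8", 1169), ("2018w8", 1266), ("2018w8", 1279),
    ("2018w8", 1046), ("2018w8", 1409), ("2018w8", 1186),
    ("2018w9", 1382), ("2018w9", 1326), ("2018w9", 991), ("2018w9", 1333),
    ("2018w9", 1696), ("2018w9", 1089), ("2018w9", 989),
]

# Sum daily merit per week label.
weekly_total = defaultdict(int)
for week, merit in daily:
    weekly_total[week] += merit

print(dict(weekly_total))
```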
The most interesting thing is that even when the new rank requirements for the Junior Member rank were implemented, their impact did not last long: it ran from the last day of week #2018w37 to the first 3 days of week #2018w38.
Consequently, neither week #2018w37 (5630) nor week #2018w38 (7825) exceeds the upper cut-off of 7937, so neither is a potential outlier under the 1.5*IQR rule.
Code:
. list if id > 234 & id < 241

     +------------------------------------------------------------------+
     |  id   day   month2   year   merit        date      week    month |
     |------------------------------------------------------------------|
235. | 235    15        9   2018     463   15sep2018   2018w37   2018m9 |
236. | 236    16        9   2018    2463   16sep2018   2018w37   2018m9 |
237. | 237    17        9   2018    1862   17sep2018   2018w38   2018m9 |
238. | 238    18        9   2018    1294   18sep2018   2018w38   2018m9 |
239. | 239    19        9   2018    1268   19sep2018   2018w38   2018m9 |
     |------------------------------------------------------------------|
240. | 240    20        9   2018     846   20sep2018   2018w38   2018m9 |
     +------------------------------------------------------------------+
Anyway, those two weeks are still unusually high, because both are above the 75th percentile (Q3, at 5487).
Code:
.         list

     +-----------------+
     |    week   merit |
     |-----------------|
  1. |  2018w4   30949 |
  2. |  2018w5   19958 |
  3. |  2018w6   13304 |
  4. |  2018w7   11722 |
  5. |  2018w8    8758 |
     |-----------------|
  6. |  2018w9    8806 |
  7. | 2018w10    7253 |
  8. | 2018w11    7309 |
  9. | 2018w12    6941 |
 10. | 2018w13    6707 |
     |-----------------|
 11. | 2018w14    6415 |
 12. | 2018w15    5487 |
 13. | 2018w16    4631 |
 14. | 2018w17    4585 |
 15. | 2018w18    4953 |
     |-----------------|
 16. | 2018w19    4753 |
 17. | 2018w20    4346 |
 18. | 2018w21    3854 |
 19. | 2018w22    4183 |
 20. | 2018w23    4527 |
     |-----------------|
 21. | 2018w24    3818 |
 22. | 2018w25    4921 |
 23. | 2018w26    4457 |
 24. | 2018w27    4253 |
 25. | 2018w28    4239 |
     |-----------------|
 26. | 2018w29    4159 |
 27. | 2018w30    3652 |
 28. | 2018w31    3798 |
 29. | 2018w32    3994 |
 30. | 2018w33    3618 |
     |-----------------|
 31. | 2018w34    3789 |
 32. | 2018w35    3065 |
 33. | 2018w36    3574 |
 34. | 2018w37    5630 |
 35. | 2018w38    7825 |
     |-----------------|
 36. | 2018w39    4388 |
 37. | 2018w40    4271 |
 38. | 2018w41    3800 |
 39. | 2018w42    4821 |
 40. | 2018w43    3945 |
     |-----------------|
 41. | 2018w44    3339 |
 42. | 2018w45    4513 |
 43. | 2018w46    1740 |
     +-----------------+

DAILY MERIT TIME-SERIES




I will keep the topic updated regularly, on a weekly basis (mainly because I use weekly data dumps from LoyceV).
Updates will be posted in the latest post of the topic (not in the OP) at specific points in time.






Hello all,

As I stated months ago, I have planned to make a time series analysis of merit distribution in the forum over time (since 24th January, 2018).
In this topic, I will show you time series on different time scales (daily, weekly, monthly, quarterly, and probably yearly in the far future).


Data sources
1) Original data set (by LoyceV)
2) Fully converted dataset (made by myself)
Anyone who is interested in the datasets can get them there.
I wanted to post everything here, but it seems the OP cannot be too long (some forum limit on total words or rows, I guess).
The date and time format in my converted dataset is explained in detail at the bottom of the topic.


In the OP, I will present analyses based on the full data and on truncated data (with the first twenty-six days, from 24th Jan. to 18th Feb., dropped).

Firstly, let's take a look at the full dataset.
From the above image, it is clear that daily distributed merit plummeted from 13018 on the first day (the launch day) of the merit system to 4192 after 7 days, and to 2308 after two weeks. After around 76 days, daily merit had dropped to 884, and it has fluctuated only slightly since then.
I will present more on the medians of daily merit below (somewhere around 650 merits distributed per day).

Secondly, let's take a look at a truncated dataset.
What is the truncated dataset? From the original one, I dropped the first two days in order to present a closer, clearer plot.
Of course, these two days in January are definitely outliers in the dataset; there are more outliers, which will be considered later.
With the plot from the truncated dataset, we can easily see a sudden spike on 16th September: daily merit rocketed to 2463, then decreased gradually to 1862, 1294, and 1268 over the next three days.
These September spikes occurred due to the changes to the merit system and forum rank requirements, which were aimed at Junior Members. Demoted Junior Members received thousands of merits in total over those four days to rank up again. In other words, the effect of the new rank requirements for Junior Members tailed off quickly after four days.


Trend
So, what kind of trend do we have?
In both the full and truncated datasets, it is clear that the general trend of merit distribution is downward: it has gradually fallen over the nearly one year since the first day of the merit system.
The linear regression lines are shown as red lines in the plots.
Despite the sudden spikes in September, they did not last long enough to break the trend. Consequently, merit distribution in the forum is still in a long-term downward trend.
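To illustrate what such a regression line measures, here is a minimal Python sketch of an ordinary least-squares fit. It is only an illustration under stated assumptions: it uses the weekly totals for 2018w8 to 2018w45 from the weekly analysis (the full daily series is not reproduced in this topic), with the week index 0, 1, 2, ... as the x variable. It is not the regression behind the plots.

```python
# Weekly totals for 2018w8 .. 2018w45, copied from the weekly listing.
totals = [
    8758, 8806, 7253, 7309, 6941, 6707, 6415, 5487, 4631, 4585,
    4953, 4753, 4346, 3854, 4183, 4527, 3818, 4921, 4457, 4253,
    4239, 4159, 3652, 3798, 3994, 3618, 3789, 3065, 3574, 5630,
    7825, 4388, 4271, 3800, 4821, 3945, 3339, 4513,
]

n = len(totals)
xs = list(range(n))  # week index: 0 for 2018w8, 1 for 2018w9, ...
x_mean = sum(xs) / n
y_mean = sum(totals) / n

# Ordinary least squares: slope = cov(x, y) / var(x).
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, totals)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean

print(f"slope = {slope:.1f} merits per week, intercept = {intercept:.1f}")
```

A negative slope means the fitted line falls over time, which is what "downward trend" refers to here, even with weeks 2018w37 and 2018w38 included.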


Mean +/- standard deviation; Median (Interquartile range), minimum and maximum.
So far, I have only shown you an overview of daily distributed merit over around 10 months of 2018.

Now, let's spend a couple of minutes looking at the statistics on daily merit (means, standard deviations, medians, interquartile ranges, and potential outliers).

As always, for variables that are not normally distributed (in this case, daily distributed merit), medians are the better statistic to use.
Medians are better because they stay close to the typical value and are not significantly affected by potential outliers.

In this part of my analysis, I present three versions of the merit dataset:
(1) Full dataset: starts from 24th Jan. 2018 and runs to the last updated day.
(2) Partially truncated dataset: the first two days, 24th and 25th Jan., with 13018 and 6761 distributed merits per day, respectively, were dropped.
(3) Fully truncated dataset: all days before 19th Feb. 2018 were dropped.

Now, firstly, let's take a look at descriptive time-series plots of the two periods, before and since 19th Feb. 2018.
Before 19th Feb. 2018:

Since 19th Feb. 2018:
The reason why I split the dataset into two parts is explained below.
Statistics:
From the full dataset, we can easily identify potential outliers, both on the low side and on the high side.
They are calculated via the following formula:

Potential outliers are data points which are higher than Q3 + 1.5*IQR or lower than Q1 - 1.5*IQR
With:
- IQR = Q3 - Q1 (the interquartile range, presented in the above table as Q1 - Q3)
- Q1: the first quartile (25th percentile)
- Q3: the third quartile (75th percentile).


Let's calculate the cut-offs for potential outliers based on the full daily dataset.
IQR = Q3 - Q1 = 884 - 534 = 350;
1.5*IQR = 1.5*350 = 525
Potential outliers:
-   Above Q3 + 1.5*IQR = 884 + 525 = 1409
-   Below Q1 - 1.5*IQR = 534 - 525 = 9
To sum up, days with total merit above 1409 or below 9 are potential outliers.
How many potential outliers do we have so far?
There are 27 days with daily merit above 1409. Most of them (24 days) fall between 24th Jan. and 17th Feb., where only 12th Feb. is not above the cut-off. It means 88.9% of those outliers occurred before March. I also dropped 18th Feb. from the second period in order to have full weeks for the weekly analysis.
Below is the list of those days.
Code:
. list id date merit if merit >1409

     +-------------------------+
     |  id        date   merit |
     |-------------------------|
  1. |   1   24jan2018   13018 |
  2. |   2   25jan2018    6761 |
  3. |   3   26jan2018    4493 |
  4. |   4   27jan2018    3489 |
  5. |   5   28jan2018    3188 |
     |-------------------------|
  6. |   6   29jan2018    3799 |
  7. |   7   30jan2018    4192 |
  8. |   8   31jan2018    2820 |
  9. |   9   01feb2018    2545 |
 10. |  10   02feb2018    2568 |
     |-------------------------|
 11. |  11   03feb2018    1867 |
 12. |  12   04feb2018    2167 |
 13. |  13   05feb2018    2077 |
 14. |  14   06feb2018    2308 |
 15. |  15   07feb2018    2141 |
     |-------------------------|
 16. |  16   08feb2018    2141 |
 17. |  17   09feb2018    1448 |
 18. |  18   10feb2018    1747 |
 19. |  19   11feb2018    1442 |
 21. |  21   13feb2018    1579 |
     |-------------------------|
 22. |  22   14feb2018    2513 |
 23. |  23   15feb2018    1991 |
 24. |  24   16feb2018    1411 |
 25. |  25   17feb2018    1608 |
 38. |  38   02mar2018    1696 |
     |-------------------------|
236. | 236   16sep2018    2463 |
237. | 237   17sep2018    1862 |
     +-------------------------+
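The counts quoted for this listing (27 flagged days, 24 of them in the early cluster, 88.9%) can be re-checked from the id column alone. A small Python sketch, assuming only the ids shown in the table above:

```python
# Ids of the days listed above (daily merit > 1409 in the full dataset).
flagged_ids = list(range(1, 20)) + list(range(21, 26)) + [38, 236, 237]

# Ids 1 .. 25 correspond to 24th Jan. .. 17th Feb. 2018 in the converted dataset.
early = [i for i in flagged_ids if i <= 25]
share_early = len(early) / len(flagged_ids) * 100

# Which day of the early run is NOT above the cut-off?
missing = sorted(set(range(1, 26)) - set(flagged_ids))

print(len(flagged_ids), len(early), f"{share_early:.1f}%", missing)
```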
The first part of the merit dataset (outliers): the median is 2154 merits per day, and 50% of days have daily merit in the range from 1608 to 3188 (the interquartile range, from Q1 to Q3). The minimum and maximum of the period are 1289 and 13018, respectively.

The second part of the merit dataset (mostly free of outliers): the median of the second part (from 19th Feb. 2018 to 14th November 2018) is 627, and 50% of days have daily merit in the range from 525 to 788. The minimum and maximum of the period are 370 and 2463, respectively.

The above table shows that there is no big difference between the medians of the full and truncated datasets, at 652 and 627, respectively. Nevertheless, there is a more considerable difference between the means of the full and truncated datasets, at 899 and 703, respectively.
It means the mean of daily merit in the full dataset is 196 points higher than in the truncated one, whilst the median of daily merit in the full dataset is only 25 points higher than in the truncated one.

Once again, medians show their value: the extremely high daily merit in the early days does not have much impact on the median.
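The same effect can be seen on data that is actually listed in this topic. The sketch below is only an illustration: since the full daily series is not reproduced here, it uses the weekly totals from the weekly analysis (2018w4 to 2018w45) and compares mean and median with and without the first four weeks.

```python
import statistics

# First four weeks (2018w4 .. 2018w7) and the following 38 weeks (2018w8 .. 2018w45).
early_weeks = [30949, 19958, 13304, 11722]
later_weeks = [
    8758, 8806, 7253, 7309, 6941, 6707, 6415, 5487, 4631, 4585,
    4953, 4753, 4346, 3854, 4183, 4527, 3818, 4921, 4457, 4253,
    4239, 4159, 3652, 3798, 3994, 3618, 3789, 3065, 3574, 5630,
    7825, 4388, 4271, 3800, 4821, 3945, 3339, 4513,
]
full = early_weeks + later_weeks

# How far do the extreme early weeks pull each statistic?
mean_shift = statistics.mean(full) - statistics.mean(later_weeks)
median_shift = statistics.median(full) - statistics.median(later_weeks)

print(f"mean shifts by {mean_shift:.0f}, median shifts by {median_shift:.1f}")
```

The mean moves by far more than the median once the extreme early weeks are added, which is why the median is preferred for this kind of skewed data.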

Basic statistics of the truncated dataset:
a) Median: 627. It means that 50% of observed days have total distributed merit lower than 627 and 50% of observed days have total distributed merit higher than 627.
b) Interquartile range (IQR, from Q1 to Q3): 525 - 788. It means that 50% of observed days have total daily distributed merit within the range from 525 to 788 points per day. Additionally, 25% of observed days have fewer than 525 merits distributed per day, and 25% of observed days have more than 788 merits distributed per day.
Days with values lower than Q1 (525) or higher than Q3 (788) lie outside the middle 50% of days; those beyond the 1.5*IQR cut-offs are the potential outliers.
c) Min - Max: the minimum and maximum daily distributed merit are 370 and 2463, respectively, for the fully truncated dataset (since 19th Feb. 2018); 370 and 4493 for the partially truncated dataset; and 370 and 13018 for the full dataset. So far, the all-time high of daily distributed merit is 13018. This ATH will be extremely difficult to beat. My assumption is that a new all-time high (above 13018) will only occur if a new demotion wave is implemented on the Senior Member, Hero, and Legendary ranks simultaneously, which might be unrealistic (theymos might never do this).


Potential Outliers
Let's calculate the cut-offs for potential outliers based on the real daily dataset (the truncated one) with the same formula presented above.
- Q1 = 525
- Q3 = 788
----> IQR = Q3 - Q1 = 788 - 525 = 263, hence 1.5*IQR = 1.5*263 = 394.5, rounded to 395.

Q1 - 1.5*IQR = 525 - 395 = 130.
Q3 + 1.5*IQR = 788 + 395 = 1183.

It means days with total merit distributed above 1183 or below 130 should be investigated more deeply to find out what happened on those days.
So, how many potential outliers did we find in the truncated dataset?
The 18 days listed below.
Code:
. list id date merit if merit >1183 & merit != .

     +-------------------------+
     |  id        date   merit |
     |-------------------------|
  1. |  27   19feb2018    1403 |
  3. |  29   21feb2018    1266 |
  4. |  30   22feb2018    1279 |
  6. |  32   24feb2018    1409 |
  7. |  33   25feb2018    1186 |
     |-------------------------|
  8. |  34   26feb2018    1382 |
  9. |  35   27feb2018    1326 |
 11. |  37   01mar2018    1333 |
 12. |  38   02mar2018    1696 |
 15. |  41   05mar2018    1245 |
     |-------------------------|
 22. |  48   12mar2018    1354 |
 30. |  56   20mar2018    1322 |
 31. |  57   21mar2018    1227 |
 42. |  68   01apr2018    1233 |
210. | 236   16sep2018    2463 |
     |-------------------------|
211. | 237   17sep2018    1862 |
212. | 238   18sep2018    1294 |
213. | 239   19sep2018    1268 |
     +-------------------------+

In the truncated dataset, there are 18 potential outliers (nearly 6.7% of the 269 observed days since 19th Feb. 2018).
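The 269 observed days and the 6.7% share can be verified with simple date arithmetic. A minimal Python sketch, assuming the truncated period runs from 19th Feb. 2018 to 14th Nov. 2018 inclusive, as stated earlier in this post:

```python
from datetime import date

start = date(2018, 2, 19)
end = date(2018, 11, 14)

observed_days = (end - start).days + 1  # +1 because both end points are included
flagged = 18                            # rows in the listing above

share = flagged / observed_days * 100
print(observed_days, f"{share:.1f}%")
```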

The image also shows that only a limited number of days have total merit distributed higher than 1183 (above the horizontal red line).

Box plots
This part presents box plots of daily merit for the full data, the part of the data before 19th Feb. 2018, and the rest of the data since 19th Feb. 2018.
The medians are the horizontal lines inside the boxes.
With the plots, we can easily see the difference between the medians of the before and after datasets.
In the above box plot, outliers are not shown.


Important
1) About the dates of days in the dataset:
They are not real calendar dates. I started from LoyceV's data source, then converted its days to calendar dates.
I assumed that the first day in the data source is 24th January 2018, but that is only an assumption.
The real one may be 23rd or 25th January 2018 in forum time or Dutch time.
I mean the assumed dates are at most one day before or after the real ones, so I think we can all accept the assumed (pseudo) dates; they are not a big issue.

LoyceV explained it as follows:
If you're using my "days", you're still not using "real" days:
I've used the same "time convention" as I used for my full merit transaction history: "Days" start the second the first Merit was transfered, and count exactly 3600*24 seconds after that. It has nothing to do with calendar days in any time zone.
I started the day the second AdolfinWolf received the first Merit from theymos (Wed Jan 24 23:12:21 2018) (I think this is Dutch time, not forum time).
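LoyceV's convention can be sketched in a few lines of Python. This is only an illustration: the function names `loyce_day` and `pseudo_date` are made up here, the start time is taken from the quote above, and the time zone is left unspecified because LoyceV himself is not sure of it.

```python
from datetime import date, datetime, timedelta

# Second at which the first merit was sent, per LoyceV (time zone not confirmed).
START = datetime(2018, 1, 24, 23, 12, 21)
DAY = timedelta(seconds=3600 * 24)


def loyce_day(ts):
    """1-based index of the 86400-second block that contains ts."""
    return (ts - START) // DAY + 1


def pseudo_date(day_id):
    """Calendar label in the converted dataset, assuming id 1 is 24th Jan. 2018."""
    return date(2018, 1, 24) + timedelta(days=day_id - 1)


# A merit sent on the morning of 25th Jan. still belongs to "day" 1.
print(loyce_day(datetime(2018, 1, 25, 10, 0, 0)), pseudo_date(1))
print(pseudo_date(27), pseudo_date(236))
```

This shows why the pseudo-dates can be off by about one day: each "day" starts at 23:12:21, so most of it falls on the next calendar date.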

2) The first week in the converted dataset contains only 5 days, and is not a full seven-day week.
The five days in the first week:
Code:
. list if id <=5

     +----------------------------------------------------------------+
     | id   day   month2   year   merit        date     week    month |
     |----------------------------------------------------------------|
  1. |  1    24        1   2018   13018   24jan2018   2018w4   2018m1 |
  2. |  2    25        1   2018    6761   25jan2018   2018w4   2018m1 |
  3. |  3    26        1   2018    4493   26jan2018   2018w4   2018m1 |
  4. |  4    27        1   2018    3489   27jan2018   2018w4   2018m1 |
  5. |  5    28        1   2018    3188   28jan2018   2018w4   2018m1 |
     +----------------------------------------------------------------+
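The week labels in the listing can be reproduced with the following Python sketch. The rule used is an assumption about how the week variable was built: week 1 starts on 1st January and each week is a block of 7 days, with the leftover days at the end of the year folded into week 52. The function name `week_label` is illustrative.

```python
from datetime import date, timedelta


def week_label(d):
    """Week label such as '2018w4', assuming weeks are 7-day blocks from 1st Jan."""
    day_of_year = d.timetuple().tm_yday
    week = min((day_of_year - 1) // 7 + 1, 52)
    return f"{d.year}w{week}"


first_day = date(2018, 1, 24)
first_week = week_label(first_day)

# Days from the first observed day onward that still fall in the same week.
days_in_first_week = [
    first_day + timedelta(days=i)
    for i in range(7)
    if week_label(first_day + timedelta(days=i)) == first_week
]

print(first_week, len(days_in_first_week))
```

Under this rule, week 2018w4 runs from 22nd to 28th Jan., so a series starting on the 24th has only five days in it.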

Notes:
I would highly appreciate help from anyone who can give me data on merit distribution over ranks, boards (and other categories) since the first day of the merit system.