
Topic: Analysis on Merit distribution over time (Box plots, Histogram, and Time series) (Read 276 times)

legendary
Activity: 2310
Merit: 4085
Farewell o_e_l_e_o
Really not. I don't understand what this thread is for. Apologies, I am a beginner.
Better one has been released days ago.
Link to get it:
Time series analysis on distributed merits (daily, weekly, monthly)
legendary
Activity: 2408
Merit: 2226
Signature space for rent
Update for today. Enjoy it.

I don't see the point of posting your update here, since you have already posted it in another thread. It's not really enjoyable, and there was no need to bump this thread.

Update for the last week, based on the given data from @coinlocket$

So what are you really doing?

I have seen this particular topic over and over again in this forum. Can we have something quite different for once?
This one is different. You really have not seen the difference?

Really not. I don't understand what this thread is for. Apologies, I am a beginner.
legendary
Activity: 2310
Merit: 4085
Farewell o_e_l_e_o
Update for today. Enjoy it.
Update for the last week, based on the given data from @coinlocket$
As the attached image shows, the mean and median of weekly distributed merits are nearly 4.8k and 4.4k, respectively.
As always, I strongly suggest looking at medians rather than means, because medians describe a typical week more faithfully when a few weeks are extreme.
Using medians gives a more accurate, neutral overview of merit distribution without the distorting effect of outliers (the median is robust to outliers).
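To illustrate why the median is robust, here is a minimal Python sketch with made-up weekly totals (hypothetical numbers, not the actual forum data). Adding one extreme week pulls the mean up by roughly 750 merits, while the median moves by only 25.

```python
from statistics import mean, median

# Hypothetical weekly merit totals (NOT real forum data), for illustration only.
typical_weeks = [4100, 4300, 4400, 4450, 4600, 4800]
with_spike = typical_weeks + [9700]  # add one extreme week

print(f"without spike: mean={mean(typical_weeks):.0f}, median={median(typical_weeks):.0f}")
print(f"with spike:    mean={mean(with_spike):.0f}, median={median(with_spike):.0f}")
```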
More interestingly, the interquartile range (from the 25th percentile, Q1, to the 75th percentile, Q3) runs from 3978 to 4820.
What does it mean?
It means that 25% of the observed weeks had fewer than 3978 merits distributed (below Q1), and 25% of them had more than 4820 merits distributed (above Q3).
In other words, in 50% of the observed weeks the weekly distributed merits ranged from 3978 to 4820, and the median was around 4.4k per week.
This interpretation uses the statistics in the last row of the attached table.
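The quartile reading above can be reproduced with Python's standard `statistics.quantiles`. This is a sketch on hypothetical weekly totals, not the real dataset. Note that software packages use different quartile conventions (`quantiles` defaults to `method="exclusive"`), so Q1 and Q3 may differ slightly from the attached table even on identical data.

```python
from statistics import quantiles

# Hypothetical weekly totals (NOT the real dataset), for illustration only.
weekly = [3650, 3900, 3980, 4100, 4250, 4400, 4450, 4600, 4820, 5100, 5600, 6300]

# Three cut points split the data into quarters: Q1, median, Q3.
q1, q2, q3 = quantiles(weekly, n=4)

# Count weeks in each region to check the 25% / 50% / 25% reading.
below = sum(w < q1 for w in weekly)
above = sum(w > q3 for w in weekly)
inside = len(weekly) - below - above

print(f"Q1={q1:.0f}, median={q2:.0f}, Q3={q3:.0f}, IQR={q3 - q1:.0f}")
print(f"{below} weeks below Q1, {inside} between Q1 and Q3, {above} above Q3")
```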

In addition, weeks with total distributed merits above about 6k should be considered potential outliers [shown as red circles in the boxplot].

*Notes:
1. Interquartile range (IQR): the range from the 25th percentile (Q1) to the 75th percentile (Q3), i.e. IQR = Q3 - Q1 = 4820 - 3978 = 842.
2. Mean +/- sd: mean +/- standard deviation.
3. Potential outliers: weeks whose total distributed merits fall outside the two whisker limits (above or below).
Upper whisker limit = Q3 + 1.5 IQR = 4820 + 1.5*842 = 6083 (Q3: the 75th percentile).
Lower whisker limit = Q1 - 1.5 IQR = 3978 - 1.5*842 = 2715 (Q1: the 25th percentile).
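These two whisker limits are usually called Tukey's fences. A minimal Python sketch checking the arithmetic with the quartiles quoted above; the function name `tukey_fences` is my own, not from any library:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles quoted above for the weekly data: Q1 = 3978, Q3 = 4820.
lower, upper = tukey_fences(3978, 4820)
print(lower, upper)  # 2715.0 6083.0
```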
newbie
Activity: 10
Merit: 0
I have seen this particular topic over and over again in this forum. Can we have something quite different for once?
legendary
Activity: 2310
Merit: 4085
Farewell o_e_l_e_o
Hello everyone,

Inspired by the good initiatives of @coinlocket$, today I would like to start a new topic, namely:
ANALYSIS ON MERIT DISTRIBUTION OVER TIME

In this topic, I will show you interesting facts about how merit is distributed over time, using weekly data.
My analysis will use:
1) Box plots;
2) Histogram;
3) Time series.

For now, I don't have time to write totally new content for the topic, so let me share some points that I posted in another topic.
I simply created the topic and noted my ideas for further work later.

Rising from 3580 to 9684, then dropping from 9684 to 4510.
This is what I expected to see after reading your previous thread (last week). Many demoted Junior Members ranked back up to this rank after earning only one new merit once the newest ranking system was launched a few weeks ago.
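To put the spike in perspective, a small Python sketch computing the week-over-week percentage change from the three weekly totals quoted above (`pct_change` is a hypothetical helper name for illustration). The jump is roughly +170% and the drop afterwards roughly -53%.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Weekly totals quoted above: 3580 -> 9684 -> 4510.
weeks = [3580, 9684, 4510]
for old, new in zip(weeks, weeks[1:]):
    print(f"{old} -> {new}: {pct_change(old, new):+.1f}%")
```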

As I suggested in other threads, it might be useful if you, or someone else, could collect specific statistics on the number of demoted Junior Members who ranked back up and, so far, have received only one merit (just enough to rank back up).
I am not sure, but I guess those cases, and the merits they received, account for a dominant share of the abnormal, sudden rise in merit distribution over the last two weeks.

I really hope that you or someone else can do it someday.

I spent a couple of minutes running a simple analysis and drawing box plots from those merit data. Here you go:

(1) Basic statistics:
Notes:
"Dropped data" means I removed the week from 17/9/2018 to 23/9/2018 from the analysis because it is an extreme outlier.

As you can see, the means of the two datasets (4876 and 4711 for the Full and Dropped datasets, respectively) differ considerably from the medians (4440 and 4431 for the Full and Dropped datasets, respectively).
This illustrates a useful property of the median: unlike the mean, it is barely affected by a few extreme weeks, so it describes a typical week better than the mean does.
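The Dropped mean can be cross-checked from the Full mean with a few lines of Python. This is a rough sketch under two assumptions not stated in the table: the Full dataset has 30 weeks (as mentioned further down), and the dropped week is the 9684-merit peak quoted earlier. Because the Full mean is rounded, the result is only approximate; it comes out at about 4710, i.e. a Dropped mean of roughly 4.7k.

```python
# ASSUMPTIONS (not confirmed in the table): the Full dataset has 30 weeks and
# the dropped week (17/9/2018 to 23/9/2018) is the 9684-merit peak quoted earlier.
n_full = 30
mean_full = 4876      # reported (rounded) mean of the Full dataset
dropped_week = 9684   # assumed total of the dropped week

# Remove the dropped week from the Full total, then average the remaining weeks.
mean_dropped = (mean_full * n_full - dropped_week) / (n_full - 1)
print(f"{mean_dropped:.0f}")  # 4710
```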

Besides the medians, the current datasets have Q1 = 3903 and Q3 = 5043 for the Full dataset (Q3 = 4820 for the Dropped dataset). Weeks outside the range from Q1 - 1.5 IQR to Q3 + 1.5 IQR can be called potential outliers, with Q1 ~ 25th percentile (first quartile), Q3 ~ 75th percentile (third quartile), and IQR ~ interquartile range (Q3 - Q1).

For example:
You can look back at the table above. For the full merit dataset, we have:
- 25th percentile (Q1) is 3903
- 75th percentile (Q3) is 5043
so, the IQR = 5043 - 3903 = 1140
Q3 + 1.5 IQR = 5043 + 1.5*1140 = 6753
Q1 - 1.5 IQR = 3903 - 1.5*1140 = 3903 - 1710 = 2193.
In this case, weeks with total merits distributed above 6.7k or below 2.2k deserve a closer look to find out the reasons behind those abnormal merit distributions. And please remember that these are only "potential outliers".
From the current dataset, you can easily see two important things:
(1) There are four potential weekly outliers [the first three weeks from the top, and the second from the bottom of the table above], and all of them have values above 6.7k. I don't remember what actually happened in March, but I guess there were some significant changes in those weeks, maybe new merit sources were added. Any information explaining these potential outliers would be highly appreciated.
(2) More interestingly, no week has a value below 2.2k. Thanks to the merit sources: at least they have stayed active and kept distributing their allocated merits.
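The fence rule can be applied mechanically to any list of weekly totals. A minimal Python sketch (`potential_outliers` is a hypothetical helper name) using the Full-dataset quartiles from the table, applied to the three weekly totals quoted earlier in the thread; only the 9684 week lands above the 6753 fence:

```python
def potential_outliers(totals, q1, q3, k=1.5):
    """Return the weekly totals outside [Q1 - k*IQR, Q3 + k*IQR]."""
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [t for t in totals if t < lower or t > upper]


# Full-dataset quartiles from the table: Q1 = 3903, Q3 = 5043 (fences 2193 and 6753).
# 3580, 9684 and 4510 are the weekly totals quoted earlier in the thread.
print(potential_outliers([3580, 9684, 4510], q1=3903, q3=5043))  # [9684]
```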

Over time, as we collect more weekly data, the whole picture will become clearer and more reliable.
For now, the dataset I used from @coinlocket$ contains only 30 data points (30 weeks), which is not large enough.

So what does this mean in real life on the BTT forum? It means that when the total merits distributed in a week exceed the upper fence (about 6.7k for the full dataset), we can start thinking about potential internal changes in the forum, such as significant new active merit sources, or new rules or systems implemented recently. Interesting, right?
These potential outliers are shown as red circles in the box plots below.

(2) Box plots and histograms:
Box plots show medians, interquartile ranges, and potential outliers.
Histograms show whether the data (merit distribution in this case) are normally distributed (following the bell curve) or not (not following the bell curve).
2.1. Full data:
2.2. Dropped data:
Once again, as you can see, both the box plots and the histograms clearly show that these merit distribution datasets are not normally distributed.
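A normal distribution is roughly symmetric, so one quick numeric companion to the histogram is the skewness (near 0 for bell-shaped data, clearly positive when a few very high weeks stretch the right tail), together with a mean sitting above the median. The Python sketch below uses hypothetical weekly totals, not the real dataset, and prints a crude text histogram; a formal test such as Shapiro-Wilk (`scipy.stats.shapiro`) would be the more rigorous check.

```python
from statistics import mean, median, pstdev

# Hypothetical weekly totals (NOT the real dataset), for illustration only.
weekly = [3650, 3900, 3980, 4100, 4250, 4400, 4450, 4600,
          4820, 5100, 5600, 6300, 7000, 9700]


def skewness(xs):
    """Population (Fisher-Pearson) skewness: the mean of cubed z-scores."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def text_histogram(xs, width=1000):
    """Print one row of '#' per bin of the given width, starting at a round number."""
    start = min(xs) // width * width
    counts = {}
    for x in xs:
        b = (x - start) // width
        counts[b] = counts.get(b, 0) + 1
    for b in range(max(counts) + 1):
        lo = start + b * width
        print(f"{lo:>5}-{lo + width - 1:<5} {'#' * counts.get(b, 0)}")


print(f"mean={mean(weekly):.0f}, median={median(weekly):.0f}, skewness={skewness(weekly):.2f}")
text_histogram(weekly)
```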


Notes:
- I will do a time-series analysis with the datasets provided by @LoyceV later.
- Any help with this sort of data in Excel format would be highly appreciated (I don't need the Excel file itself; a snapshot of your sheet is enough, and I will enter those figures into my sheet manually). Something like this one:
- I will spend my spare time on it in my own upcoming topic. Stay tuned, please.

TO DO list for the topic:
1) Collecting more data (older weeks in January and February, and merit distribution across forum boards over time), possibly from LoyceV's datasets.
2) Making a time series analysis on merit distribution.
3) Adjusting graphs.
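For the planned time-series analysis, one simple option (my suggestion, not necessarily the approach the topic will use) is a rolling median, which tracks the typical weekly level while damping one-off spikes. A Python sketch on a hypothetical series whose spike mirrors the 3580 -> 9684 -> 4510 figures quoted earlier; the other values and the 4-week window are arbitrary choices:

```python
from statistics import median


def rolling_median(values, window=4):
    """Median of each trailing run of `window` consecutive weeks."""
    return [median(values[i - window + 1 : i + 1])
            for i in range(window - 1, len(values))]


# Hypothetical weekly totals in chronological order (NOT real data).
weekly = [4300, 4450, 3580, 9684, 4510, 4400]
print(rolling_median(weekly))  # [4375.0, 4480.0, 4455.0]
```

The 9684 spike never dominates a window's median, which is why this smoother suits data with occasional extreme weeks.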