<…>
I also agreed with "nutildah" that the fact there is no 2 mln, active users after removing bots and shitposters. I think those statistics can be provided by "LoyceV" or "DdmrDdmr". So your calculations to be revised accordingly that data too.
<…>
Unfortunately, I don’t scrape a vast amount of data such as all user profiles to answer this question.
Per Vod’s definition used in BPIP, which does work on the forum’s entire user profile base, an active user is one who has logged in within the last three months. The site states there are currently 444,249 active profiles, which is roughly 19% (nearly a fifth) of the entire user base of Bitcointalk.
mazdafunsun has been publishing stats on this recently (2 million users and their stats), and brings the figure down to 90,993, defining active users as those who have logged in within the last year. He deliberately leaves newbies and beginners out of the calculation and focuses on core users.
Although he has analysed the first 2 million user profiles, the remaining profiles probably contain proportionally more active users (the official member total is currently 2,335,558, so 335,558 are not accounted for), since those are the most recent ones and have not been included in the analysis.
Regardless, if we want to contrast active users with merits somehow, I would go for a more restrictive definition of active user. For this exercise, I would consider an active user to be one that has posted at least once in a given period of time. Say, for example, we want to contrast merit given in the past 3 months with active users. We need to consider users that have posted at least once in the same period, since only they qualify as candidates to receive merit in the first place (logging in is not enough for this kind of exercise).
Of course, even this definition would not give a 100% accurate picture, since posts made outside the 3-month window may still be merited, so a user who is inactive under this definition may have been merited for earlier posts. Still, this is a minor source of noise in the calculations, and the results should be pretty much correct.
So why can’t we do this sort of exercise easily?
Because while the “last active” bit of information is part of the user profile and can easily (but patiently) be scraped, the “last post” cannot. You would either need to enter each user’s profile and scrape data from the last post (not easy at all), or compare two different snapshots to see the difference in post counts. This can be done, but it’s tedious work over 2.3 million records that download at a rate of one user profile every 2 seconds at best.
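As a rough illustration of the snapshot approach, here is a minimal Python sketch. It assumes each snapshot has already been scraped into a dictionary mapping user ID to total post count. The function names and the sample data are hypothetical and are not taken from any existing tool. It also estimates how long one full pass over the profiles would take at 2 seconds per profile.

```python
from datetime import timedelta


def posting_active_users(earlier, later):
    """Return the user IDs whose post count grew between two snapshots.

    Both arguments map user ID -> total post count at snapshot time.
    Users absent from the earlier snapshot are treated as starting at zero.
    """
    return {
        uid
        for uid, posts in later.items()
        if posts > earlier.get(uid, 0)
    }


def full_scrape_duration(profiles, seconds_per_profile=2.0):
    """Best-case time needed to download every profile once."""
    return timedelta(seconds=profiles * seconds_per_profile)


# Made-up sample data, for illustration only.
snapshot_start = {1: 120, 2: 45, 3: 0}
snapshot_end = {1: 131, 2: 45, 3: 0, 4: 2}

print(sorted(posting_active_users(snapshot_start, snapshot_end)))  # [1, 4]
print(full_scrape_duration(2_335_558))  # 54 days, 1:31:56
```

Note that a single pass over all 2,335,558 profiles comes out at roughly 54 days in the best case, and the comparison needs two such passes. The sketch would also miss a user whose new posts were offset by deleted ones, so treat the result as an approximation.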
Ideally, a field on the user profile with the date of the last post, or a counter of the number of posts in a month, would allow a simple analysis that distinguishes logged-in users from posting users when comparing against the merits awarded to posting users.