Hi, I came across this thread,
The future of Bitcointalk: Low Ranking Top Merit earners in the past 30 days, and got interested in doing a simple analysis after scraping the data from its last two posts, to see how much insight I could gain from the information in the thread. I just wanted to compare Newbies, Jr. Members and Members to see which rank earned more merit in the last 30 days. Below is the simple approach I used. First, I imported a few libraries to help with the process.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
From there I moved on to fetching the URL and using the BeautifulSoup class to create a parsed object for the page.
url = 'https://bitcointalk.org/index.php?topic=5185736.msg64847098#msg64847098'
address = requests.get(url).text
soup = BeautifulSoup(address, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
Below are the empty lists to which I will append each piece of scraped data.
rank = []
profile = []
Trust = []
Merit = []
Bip = []
user = []
After the above steps I went ahead and created a function that does the scraping in one go, so I don't have to repeat myself countless times.
def Scrapped():
    # Each forum post sits in a <div class="post">
    for i in soup.find_all('div', {'class': 'post'}):
        # Usernames and profile URLs: every 'ul' link that is not a Trust, merit or BPIP link
        for sp in i.find_all('a', {'class': 'ul'}):
            if not sp.text.startswith(('Trust', 'earned', 'BPIP')):
                user.append(sp.text)
                profile.append(sp.get('href'))
        # Trust scores: spans that contain '+' and do not end with '.'
        for k in i.find_all('span'):
            if '+' in k.text and not k.text.endswith('.'):
                Trust.append(k.text)
        # Merit: the text node right after each 'earned' link
        for b in i.find_all('a', {'class': 'ul'}):
            if b.text.startswith('earned'):
                s = str(b.find_next_sibling(string=True))
                Merit.append(s.strip('('))
        # Rank: the text node right after each span ending with '.'
        for c in i.find_all('span'):
            if c.text.endswith('.'):
                rank.append(c.find_next_sibling(string=True))
        # BPIP profile URLs
        for g in i.find_all('a', {'class': 'ul'}):
            if 'BPIP' in g.text:
                Bip.append(g.get('href'))
I saved this in a DataFrame, which makes retrieving the information from each column easier.
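Scrapped()  # run the scraper once so the lists above are filled
df = pd.DataFrame({'User':user, 'Profile':profile, 'Rank':rank, 'Trust':Trust, 'BPIP':Bip, 'Merit':Merit})
# Convert the first three characters of the merit text to an integer
df['Merit_data'] = df['Merit'].apply(lambda x: x[:3]).astype('int64')
df.head()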
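A side note on the Merit_data step: taking the first three characters only works while the number fits in that slice. One possible alternative, shown here as a minimal sketch, is to pull out the first run of digits with a regular expression. The helper name parse_merit and the sample strings are hypothetical and are not taken from the thread.

```python
import re

def parse_merit(text):
    """Return the first run of digits in `text` as an int, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None

# Hypothetical sample strings for illustration, not real scraped values
samples = [' 45 Merit)', '120 Merit)', ' 7)']
parsed = [parse_merit(s) for s in samples]
print(parsed)  # [45, 120, 7]
```

With the real data this could be applied as df['Merit'].apply(parse_merit), assuming every merit string contains a number.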
I proceeded to get the value counts for each rank, since I wanted to know how many users we have in each category.
From the result I understood that Members are the rank with the most merit earners in the last 30 days.
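For anyone who wants to reproduce the count per rank, it can be done with pandas' value_counts. The sketch below runs on a few made-up rows (the demo frame is hypothetical), so the numbers are not the ones from the thread.

```python
import pandas as pd

# Made-up rows for illustration only; the real df comes from the scrape
demo = pd.DataFrame(
    {'Rank': ['Member', 'Newbie', 'Member', 'Jr. Member', 'Member', 'Newbie']}
)

# Number of rows per rank, sorted from most to least frequent
rank_counts = demo['Rank'].value_counts()
print(rank_counts)
```

On the scraped data the equivalent call would be df['Rank'].value_counts().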
I was also able to deduce that the majority of the merit earners have a clean trust record.
From the above cross tabulation, it can be seen that there is one record each for Member and Newbie with two positive trust ratings.
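A cross tabulation of rank against trust can be built with pd.crosstab. This is only a sketch on made-up rows: the demo frame is hypothetical, and the trust strings are assumed to look like '+0 / =0 / -0'.

```python
import pandas as pd

# Made-up rows for illustration only; trust strings are an assumed format
demo = pd.DataFrame({
    'Rank':  ['Member', 'Newbie', 'Jr. Member', 'Member', 'Newbie'],
    'Trust': ['+0 / =0 / -0', '+0 / =0 / -0', '+0 / =0 / -0',
              '+2 / =0 / -0', '+2 / =0 / -0'],
})

# Rows are ranks, columns are trust values, cells are the number of users
trust_by_rank = pd.crosstab(demo['Rank'], demo['Trust'])
print(trust_by_rank)
```

On the scraped data the equivalent call would be pd.crosstab(df['Rank'], df['Trust']).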
The total merit earned by each rank is as follows:
- Newbie: 575
- Jr. member: 283
- Member: 900
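Totals like these can be computed by grouping on the rank and summing the merit column. The sketch below uses made-up numbers in a hypothetical demo frame, so its output does not match the figures above.

```python
import pandas as pd

# Made-up rows for illustration only; the real values come from the scrape
demo = pd.DataFrame({
    'Rank': ['Newbie', 'Member', 'Jr. Member', 'Member', 'Newbie'],
    'Merit_data': [10, 40, 15, 25, 5],
})

# Sum of merit per rank
merit_by_rank = demo.groupby('Rank')['Merit_data'].sum()
print(merit_by_rank)
```

On the scraped data the equivalent call would be df.groupby('Rank')['Merit_data'].sum().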
To explore the results further, I moved on to visualisation to gain more insight.