How did you get this data? I can tell you didn't process that manually.
I scraped it and processed it programmatically. I used LibreOffice Calc for the formulas that determine which users fit the criteria.
Share your program?
I think I could share the user list instead. I left the site scraper running over the holiday, and there's no point in another 200K page hits to get the same data.
Just to show that it took some work, here's some code with the page field-finding:
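    lines = got.splitlines()
    # pages shorter than 24 lines have no profile fields, skip them
    if len(lines) < 24:
        continue
    # slot 0 is the user number, slots 1-5 match the labels in lineindex
    templist = [str(usernum), '', '', '', '', '']
    lineindex = ['Name:', 'Posts:', 'Activity:', 'Date Registered:', 'Last Active:']
    for i in xrange(23, len(lines)-2, 1):
        # trim tabs, then cut 7 leading and 9 trailing characters to leave the field label
        ln = lines[i].strip('\t')[7:-9].strip()
        if len(ln) > 2 and ln[-1] == ':':
            for j in xrange(len(lineindex)):
                if ln == lineindex[j]:
                    # the value sits on the line after its label
                    templist[j+1] = (lines[i+1].strip('\t ')[4:-5])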
To get a line like this (which I grabbed now since I discarded the input files):
56964, whiskers75, 767, 406, April 29, 2012, 06:22:52 AM, Today at 03:51:15 PM
I converted the dates to ISO time with a post-process on the downloaded CSV. I also didn't foresee the hassle caused by users with commas and random crap in their user names, as my downloader had made a comma-separated list instead of using tabs, quotes, or some other escape character around usernames. I fixed those manually instead of doing the work of rewriting the scraper to a different format.
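For anyone who would rather avoid the manual fixing, here is a rough sketch of how quoting could keep commas in usernames from breaking the columns. This is not what my scraper did. It assumes Python 3 and the standard csv module, and the second row (user number and name) is made up for illustration.

```python
import csv
import io

# Hypothetical rows; the second username contains a comma on purpose.
rows = [
    ["56964", "whiskers75", "767", "406"],
    ["12345", "some, user", "10", "3"],
]

def roundtrip(rows):
    """Write rows as CSV with quoting, then parse them back."""
    buf = io.StringIO()
    # QUOTE_MINIMAL only quotes fields that contain the delimiter,
    # the quote character, or a line break.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return list(csv.reader(io.StringIO(buf.getvalue())))

parsed = roundtrip(rows)
```

Reading the file back through csv.reader gives the username as a single field again, so nothing has to be patched by hand afterwards.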
Here's a snippet of fixing the date:
    month_dict = {"January": '01', "February": '02', "March": '03', "April": '04', "May": '05', "June": '06',
                  "July": '07', "August": '08', "September": '09', "October": '10', "November": '11', "December": '12'}
    if z[7][:5] == 'Today':
        # "Today at ..." gets a fixed placeholder date
        datetimel = '2013-12-31T00:00:00'
    elif z[7][:5] == 'Never':
        datetimel = 'Never'
    else:
        # z[7] is "Month DD", z[8] is the year, z[9] is "HH:MM:SS AM/PM"
        md = z[7].split()
        m = md[0]
        d = md[1]
        t_ap = z[9].split()
        t = t_ap[0]
        ap = t_ap[1]
        hms = t.split(':')
        # 12-hour to 24-hour: 12 AM becomes 00, 1 PM to 11 PM get +12
        if ap == 'AM' and hms[0] == '12':
            hms[0] = '00'
        if ap == 'PM' and hms[0] != '12':
            hms[0] = str(int(hms[0])+12)
        hms = ':'.join(hms)
        m = month_dict.get(m)
        datetimel = z[8] + '-' + m + '-' + d + 'T' + hms
After (your stats from when I got the list):
56964 whiskers75 731 392 2012-04-29T06:22:52 2013-12-31T00:00:00
I wrote it as an inactive-finder, so "today" got a magic date.
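As an alternative to the hand-rolled month table and AM/PM arithmetic, the dated case could probably be handled by the standard datetime module. A minimal sketch, assuming Python 3, English month names in the current locale, and the same three fields as above. The function name to_iso is made up for this example, and the "Today" and "Never" cases would still need their own handling.

```python
from datetime import datetime

def to_iso(date_part, year_part, time_part):
    """Convert e.g. ('April 29', '2012', '06:22:52 AM') to ISO 8601."""
    text = "%s %s %s" % (date_part, year_part, time_part)
    # %B = full month name, %I = 12-hour clock, %p = AM/PM marker
    parsed = datetime.strptime(text, "%B %d %Y %I:%M:%S %p")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")

registered = to_iso("April 29", "2012", "06:22:52 AM")
```

strptime takes care of the 12 AM and 12 PM edge cases that the manual version handles with the two if statements.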
To find missing user numbers, it was faster to look for gaps than to use some Python-esque list comprehension generator class def:
    nonlist = []
    for i in xrange(len(memnumlist)):
        if i > len(memnumlist)-2:
            continue
        else:
            if memnumlist[i+1] - memnumlist[i] > 1:
                nonlist.extend(range(memnumlist[i]+1, memnumlist[i+1]))
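The same gap search can also be written by pairing each number with its successor, which drops the index bookkeeping. A small sketch, assuming Python 3 and that the list is already sorted integers. The name find_gaps and the sample numbers are made up for illustration.

```python
def find_gaps(sorted_ids):
    """Return every integer missing between consecutive entries."""
    missing = []
    # zip pairs each entry with the one after it: (a0, a1), (a1, a2), ...
    for lo, hi in zip(sorted_ids, sorted_ids[1:]):
        if hi - lo > 1:
            missing.extend(range(lo + 1, hi))
    return missing

gaps = find_gaps([1, 2, 5, 6, 9])
```

An empty or single-entry list produces no pairs, so the function returns an empty list without any special case.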