
Topic: [TELEGRAM] Yet Another BitcoinTalk Notification BOT (merits, mentions, topics,+) - page 46. (Read 20207 times)

legendary
Activity: 2758
Merit: 6830
I've now scraped 90% of the topics, I expect to be done in a couple of weeks. How are you on disk space?
Note: I didn't keep the original post date, but it's quite easy to get an estimate.
Note2: I almost forgot: when I'm done scraping, I have to re-do some local boards (with weird layout).
Honestly, not that good. :P

My VPS plan only comes with 45 GB of storage and I'm already using 45% of it. The main issue is that the database indexes (which let people search the database without waiting 5+ minutes) use half of the storage, so they basically double the size of the database. I emailed my VPS provider asking for their prices on additional space, and they charge $0.09/month per extra GB.

edit: I was thinking about maybe getting a second VPS with more storage to host my files/database. E.g.: $3.30/month for one with 256 GB (up to 3 TB for $5) vs. my current one, where an extra 211 GB would cost another ~$19/month.

Not sure how you guys are doing it, but I was thinking of writing a small .NET Core console app to run it on multiple servers (different IPs). That's what I did when I made this list. Vultr allows up to 5 instances, but I believe that could be increased on request.
Is theymos ok with that? I could do something similar but with different proxies. That sounds like breaking his 1 request/sec rule, though.
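
For reference, the 1 request/sec rule mentioned above can be respected with a small limiter that spaces out calls. This is only a minimal Python sketch under assumptions, not anyone's actual scraper code: the class name `MinIntervalLimiter` and its parameters are hypothetical, and it only limits a single process. Several servers each running it would still add up to more than one request per second in total.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Blocks so that consecutive calls are at least `interval` seconds apart.

    Hypothetical helper for illustration; the clock and sleep functions are
    injectable so the behaviour can be checked without real waiting.
    """

    def __init__(
        self,
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous allowed call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A scraper loop would call `limiter.wait()` right before each page request.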
staff
Activity: 3500
Merit: 6152
Every post before May 2020 was provided to me by Loyce. But he only had/gave me 10 months' worth of posts. I could scrape more myself (while also fixing old posts), but time is indeed an issue. I would appreciate it if there is any way you could help with that.

I've now scraped 90% of the topics, I expect to be done in a couple of weeks. How are you on disk space?
Note: I didn't keep the original post date, but it's quite easy to get an estimate.
Note2: I almost forgot: when I'm done scraping, I have to re-do some local boards (with weird layout).

Not sure how you guys are doing it, but I was thinking of writing a small .NET Core console app to run it on multiple servers (different IPs). That's what I did when I made this list. Vultr allows up to 5 instances, but I believe that could be increased on request.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Every post before May 2020 was provided to me by Loyce. But he only had/gave me 10 months' worth of posts.
I've now scraped 90% of the topics, I expect to be done in a couple of weeks. How are you on disk space?
Note: I didn't keep the original post date, but it's quite easy to get an estimate.
Note2: I almost forgot: when I'm done scraping, I have to re-do some local boards (with weird layout). This will take a bit longer.
legendary
Activity: 2758
Merit: 6830
Is there any reason behind scraping (10 months old) posts only? Is it the storage you're worried about or just the time it might take to scrape everything? If the latter, I might be able to help.
Every post before May 2020 was provided to me by Loyce. But he only had/gave me 10 months' worth of posts. I could scrape more myself (while also fixing old posts), but time is indeed an issue. I would appreciate it if there is any way you could help with that.
staff
Activity: 3500
Merit: 6152
Is there any reason behind scraping (10 months old) posts only? Is it the storage you're worried about or just the time it might take to scrape everything? If the latter, I might be able to help.
legendary
Activity: 2758
Merit: 6830
Does it show it even when it's a quote?
For addresses, the bot ignores everything inside a quote. I could change that if needed, though. I just thought it would maybe result in more false-positives.

Just made some changes to the page. What's new:

- Search by username;
- Show only addresses you have tagged (search by tag).
- Show only addresses which were mentioned at least 2 times.
- Changed the pagination to make the page less laggy.

I also filled in the author's UID for more than 2 million posts using Loyce's file (it's now down to only 6.6k posts with no author UID). I will fix those and add the updated data to the website.
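
To make the "ignores everything inside a quote" and "mentioned at least 2 times" behaviour concrete, here is a minimal Python sketch. It is not the site's actual parser. It assumes post bodies are available as BBCode with `[quote]...[/quote]` tags, that an Ethereum address is `0x` followed by 40 hex characters, and that a "mention" is counted once per post. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set

# Assumption: an ETH address is "0x" plus exactly 40 hex characters.
ETH_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")
# Assumption: quotes appear as BBCode [quote ...] / [/quote] tags.
QUOTE_TAG = re.compile(r"\[(/?)quote[^\]]*\]", re.IGNORECASE)


def strip_quotes(text: str) -> str:
    """Drop everything inside quote blocks, including nested quotes.

    Text after an unclosed [quote] is dropped as well.
    """
    kept: List[str] = []
    depth = 0
    pos = 0
    for tag in QUOTE_TAG.finditer(text):
        if depth == 0:
            kept.append(text[pos:tag.start()])
        if tag.group(1):  # closing tag
            depth = max(0, depth - 1)
        else:  # opening tag
            depth += 1
        pos = tag.end()
    if depth == 0:
        kept.append(text[pos:])
    # Join with spaces so text on either side of a quote cannot fuse together.
    return " ".join(kept)


def extract_addresses(text: str) -> Set[str]:
    """Return the lowercased addresses found outside of quotes."""
    return {a.lower() for a in ETH_ADDRESS.findall(strip_quotes(text))}


def frequently_mentioned(posts: Iterable[str], min_mentions: int = 2) -> Dict[str, int]:
    """Count each address once per post and keep those seen often enough."""
    counts: Counter = Counter()
    for body in posts:
        counts.update(extract_addresses(body))
    return {addr: n for addr, n in counts.items() if n >= min_mentions}
```

Counting quoted addresses too, as discussed in the replies, would only require calling `ETH_ADDRESS.findall` on the raw text instead of on `strip_quotes(text)`.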
legendary
Activity: 1484
Merit: 1491
I forgot more than you will ever know.
Does it show it even when it's a quote?
The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.
Please read before answering horse shit.
Why so unfriendly? I think Rizzrack is right.

An extra reason not to change it: quotes may still contain deleted posts.

You're right. Bad mood this morning. Sorry about that.

Regarding your answer, I'm not sure the parser uses the same rules on his website as for the bot.
copper member
Activity: 786
Merit: 710
Defend Bitcoin and its PoW: bitcoincleanup.com
Please read before answering horse shit.

Let me rephrase my shit... I assume it would also work if the address was in a quote. And if I am right, it is OK and not worth changing, for the reason I gave before.
One man's shit is another man's post. You might also want to read some things twice... just saying

Edit: @Loyce yeah, for deleted posts too, but I was referring to users adding them in quotes, like
"This is my ETH address
Quote
ETH-address-here
" which might bypass this if it excluded quotes ;)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Does it show it even when it's a quote?
The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.
Please read before answering horse shit.
Why so unfriendly? I think Rizzrack is right.

An extra reason not to change it: quotes may still contain deleted posts (although the current data already includes deleted posts).
legendary
Activity: 1484
Merit: 1491
I forgot more than you will ever know.
Does it show it even when it's a quote?

The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.

Please read before answering horse shit.
copper member
Activity: 786
Merit: 710
Defend Bitcoin and its PoW: bitcoincleanup.com
Does it show it even when it's a quote?

The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.
legendary
Activity: 1484
Merit: 1491
I forgot more than you will ever know.
I scraped every ETH address from every post I have saved on the website (~10 months old), and linked them together with the accounts that posted them. This could help people find connections between accounts when bounty abusers reuse their addresses across different accounts.


Does it show it even when it's a quote?
copper member
Activity: 786
Merit: 710
Defend Bitcoin and its PoW: bitcoincleanup.com
...
Notes:
- Keep in mind that this contains every single post (from ~10 months ago until now), regardless of its content. So you may see trusted users being linked with abusers just because they posted those addresses (e.g. to expose them). You should check the data yourself and edit the table to make it correct.

Nice feature!
I guess you might minimise the error probability if you don't scrape some boards like reputation, scam accusations and meta. Everywhere else would be fair game :)
legendary
Activity: 2758
Merit: 6830
Why did you choose to scrape only Ethereum addresses and not Bitcoin addresses too?
It is planned. I did it first with ETH addresses because I think most bounty abuses happen with ERC-20 token bounties and it was easier to implement it.

Can we search with tags and usernames also?
Not right now. But I already plan to implement this after I deal with the missing UIDs and some other small issues.

It took an hour to list 20%. Wait a bit longer (and much longer if the server runs out of CPU credits).
I will wait for it. :)

legendary
Activity: 2212
Merit: 7064
I scraped every ETH address from every post I have saved on the website (~10 months old), and linked them together with the accounts that posted them. This could help people find connections between accounts when bounty abusers reuse their addresses across different accounts.
Great work TryNinja!
This will be very useful for sure.

Why did you choose to scrape only Ethereum addresses and not Bitcoin addresses too?

Can we search with tags and usernames also?
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Everything seems to be working fine
It took an hour to list 20%. Wait a bit longer (and much longer if the server runs out of CPU credits).
Update: it finished 4 hours ago.
legendary
Activity: 2758
Merit: 6830
Does this help? I'm making a list: http://loyce.club/other/msgIDuserID.txt
Yes, it helps a lot. Thanks!

I'm updating all of them in my dev database while I post this. Everything seems to be working fine, so I will run it on production soon.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
initially, I wasn't saving the user's UID along with the post when scraping them. I plan to update all old posts with the author's UID in the future.
Does this help? I'm making a list: http://loyce.club/other/msgIDuserID.txt
Disclaimer:
  • I ignored the first couple thousand posts from July 2019, when I used a bit of a messy format.
  • The file is still updating. I don't have a database, so getting the data from 2.7 million files takes a while. I've processed less than 2% now, I expect it to take a few hours.
Done! The file is 44 MB.
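
As an illustration of how such a list could be used to fill in the missing author UIDs, here is a minimal Python sketch. The layout of msgIDuserID.txt is not described in this thread, so the code loudly assumes one `msgID userID` pair per line separated by whitespace. The post fields `post_id` and `author_uid` and both function names are hypothetical too.

```python
from typing import Dict, Iterable, List


def parse_mapping(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'msgID userID' lines (ASSUMED format) into a dict."""
    mapping: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skip blank or malformed lines
        mapping[int(parts[0])] = int(parts[1])
    return mapping


def backfill_author_uids(posts: List[dict], mapping: Dict[int, int]) -> int:
    """Fill in missing author_uid values in place; return how many were set."""
    updated = 0
    for post in posts:
        if post.get("author_uid") is None and post["post_id"] in mapping:
            post["author_uid"] = mapping[post["post_id"]]
            updated += 1
    return updated
```

Posts that already have a UID are left untouched, so the step can be re-run safely as the list grows.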
legendary
Activity: 2758
Merit: 6830
I plan to create another thread only for http://posts.ninjastic.space, but for now, I will just post this here.

I started to work on a few new features not directly related to the Telegram bot, and I will be implementing them on the ninjastic.space website.

The first one is the https://posts.ninjastic.space/addresses page.



I scraped every ETH address from every post I have saved on the website (~10 months old), and linked them together with the accounts that posted them. This could help people find connections between accounts when bounty abusers reuse their addresses across different accounts.

You can also add tags to the addresses, which are locally saved in your browser and generate BBCode tables. Example:

Quote
___________________________________________________________________________________________________________
Username                 Posts                                 Archive
Qomar Hamizah Gemilang   1, 2, 3, 4, 5                         1, 2, 3, 4, 5
Koteb17                  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Rawon ayam               1                                     1
Related address: 0x78de3ab871785488d7cd0a5fc6c5c8aa9a81eaa9
Powered by ninjastic.space



Notes:
- Keep in mind that this contains every single post (from ~10 months ago until now), regardless of its content. So you may see trusted users being linked with abusers just because they posted those addresses (e.g. to expose them). You should check the data yourself and edit the table to make it correct.

- When you generate a table, some users do not have a link to their profile embedded in it. That's because, initially, I wasn't saving the user's UID along with the post when scraping them. I plan to update all old posts with the author's UID in the future.

- Only Ethereum addresses (and thus, also ERC-20 tokens) are being scraped (for now).
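
The linking idea described above (one address, several accounts) and the generated BBCode table can be sketched roughly as follows. This is a minimal Python illustration under assumptions, not the website's real code: the input is assumed to be `(username, address)` pairs, the `[table]`/`[tr]`/`[td]` layout is a guess at what a generated table could look like, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def link_accounts(mentions: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group usernames by address; keep addresses posted by 2+ accounts."""
    by_address: Dict[str, Set[str]] = defaultdict(set)
    for username, address in mentions:
        # Lowercase so the same address in different casing is grouped together.
        by_address[address.lower()].add(username)
    return {
        addr: sorted(users)
        for addr, users in by_address.items()
        if len(users) > 1
    }


def bbcode_table(address: str, rows: List[Tuple[str, List[str]]]) -> str:
    """Build a BBCode table; rows are (username, [post_url, ...]) pairs."""
    lines = [
        "[table]",
        "[tr][td][b]Username[/b][/td][td][b]Posts[/b][/td][/tr]",
    ]
    for username, urls in rows:
        # Number the post links 1, 2, 3, ... as in the example table.
        links = ", ".join(
            f"[url={url}]{i}[/url]" for i, url in enumerate(urls, start=1)
        )
        lines.append(f"[tr][td]{username}[/td][td]{links}[/td][/tr]")
    lines.append(f"[tr][td]Related address:[/td][td]{address}[/td][/tr]")
    lines.append("[/table]")
    return "\n".join(lines)
```

As the notes say, a shared address is only a hint: someone exposing an abuser's address ends up in the same group, so the result still needs a manual check.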
legendary
Activity: 2534
Merit: 1397
Just a suggestion for an additional notifier: could the super bot send notifications on users' birthdays? There is an option to add birthday details in the profile section.
(....)
This is possible, but it would be kinda annoying for bot subscribers if they were notified of every other user's birthday.
Maybe this could be done only for subscribers of the bot, which would be much less annoying.
Still, this is a good idea. I look forward to this kind of extra feature in the bot.