
Topic: [TELEGRAM] Yet Another BitcoinTalk Notification BOT (merits, mentions, topics,+) - page 44. (Read 19363 times)

legendary
Activity: 2758
Merit: 6830
The link is still correct; the only thing missing is the correct title.
This is an old post I scraped from Loyce's files. Most of them have no title because he didn't include it in the file. However, I have made it so the bot scrapes the post again and updates the database with the correct title when someone receives a merit for it. With the recent updates I made (fixing the author UID of most posts), something broke. It should be fixed now. Thanks.

edit: Pretty much all posts have been updated with their author UID and a lot of them got their title fixed. This means that almost all users on the /addresses page will come with their UID when you copy the table BBCode.
legendary
Activity: 2212
Merit: 7064
@TryNinja

One thing I noticed is that some topics don't have their title showing in the Telegram bot.
For example, let's take my topic: [LEARN] Phishing Quizzes - Beginners & Experts

This is the info I get in Telegram when I receive merit for it:
~Unknown Title~


The link is still correct; the only thing missing is the correct title.
sr. member
Activity: 770
Merit: 284
★Bitvest.io★ Play Plinko or Invest!
Just set up the bot. Sounds interesting. If you need a Dutch translation, just send me a DM.
staff
Activity: 3500
Merit: 6152
Is theymos ok with that? I could do something similar but with different proxies. That sounds like breaking his 1 request/sec rule, though.

As far as I know, that's not allowed. I download 1 page, then wait a second. I'm already pushing the limits a bit by adding this to my "normal" scraping, but that's less than 1 page per second. I don't want to cause more load than I'm doing already.

I did the same, actually: I had a loop and limited the requests to one per second. But from my understanding, the 1 request/sec rule applies to IP addresses and not accounts? If you make too many requests, your IP address gets blocked automatically after a while, so even if you wanted to break the rule, you wouldn't be able to. Still, I'll PM theymos and get confirmation about this before starting anything. Thanks for bringing this to my attention.
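For reference, the "download 1 page, then wait a second" loop discussed above could look roughly like this in Python. This is only a sketch: `scrape_politely`, the `fetch` callable and the `delay` parameter are made-up names for illustration, not anyone's actual scraper.

```python
import time
from typing import Callable, Iterable, List


def scrape_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Download pages strictly one at a time, pausing after each download.

    `fetch` is a hypothetical function that downloads one URL and returns
    its body. `sleep` is injectable so the loop can be tested without waiting.
    """
    pages = []
    for url in urls:
        pages.append(fetch(url))  # download one page
        sleep(delay)              # then wait before the next request
    return pages
```

Because the pause starts only after a download has finished, the request rate stays below one page per second. Note that running the same loop from several IPs at once multiplies the total load on the forum, which is the concern raised above.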
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Not sure how you guys are doing it, but I was thinking of writing a small .NET Core console app to run it on multiple servers (different IPs).
As far as I know, that's not allowed. I download 1 page, then wait a second. I'm already pushing the limits a bit by adding this to my "normal" scraping, but that's less than 1 page per second. I don't want to cause more load than I'm doing already.
legendary
Activity: 2758
Merit: 6830
I've now scraped 90% of the topics, I expect to be done in a couple of weeks. How are you on disk space?
Note: I didn't keep the original post date, but it's quite easy to get an estimate.
Note2: I almost forgot: when I'm done scraping, I have to re-do some local boards (with weird layout).
Honestly, not that good. :P

My VPS plan only comes with 45 GB of storage and I'm already using 45% of it. The main issue is that the database indexes (which let people search the database without needing to wait 5+ minutes) are using half of that space, so they basically double the size of the database. I sent an email to my VPS provider asking for their prices on additional space and they charge $0.09/month per extra GB.

edit: I was thinking about maybe getting a second VPS with more storage to host my files/database. E.g.: $3.30/month for one with 256 GB (up to 3 TB for $5) vs. my current one, where an extra 211 GB would cost another ~$19/month.
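The ~$19/month figure follows from the quoted per-GB price, assuming the 211 GB is the difference between a 256 GB plan and the current 45 GB one (an inference from the numbers, not something stated outright). A quick check in Python; the function name is made up for illustration:

```python
def extra_storage_cost(current_gb: float, target_gb: float, price_per_gb: float):
    """Return (extra GB needed, monthly cost of buying that extra space)."""
    extra_gb = target_gb - current_gb
    return extra_gb, extra_gb * price_per_gb


# Figures from the post: 45 GB plan, 256 GB target, $0.09/month per extra GB.
extra_gb, monthly_cost = extra_storage_cost(45, 256, 0.09)
print(f"{extra_gb} GB extra costs about ${monthly_cost:.2f}/month")
```

So upgrading the current VPS would cost several times the $3.30/month quoted for a separate 256 GB server.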

Not sure how you guys are doing it, but I was thinking of writing a small .NET Core console app to run it on multiple servers (different IPs). That's what I did when I made this list. Vultr allows up to 5 instances, but I believe that could be increased on request.
Is theymos ok with that? I could do something similar but with different proxies. That sounds like breaking his 1 request/sec rule, though.
staff
Activity: 3500
Merit: 6152
Every post before May 2020 was provided to me by Loyce. But he only had/gave me 10 months' worth of posts. I could scrape more myself, while also fixing old posts, but time is indeed an issue. I would appreciate it if there is any way you could help with that.

I've now scraped 90% of the topics, I expect to be done in a couple of weeks. How are you on disk space?
Note: I didn't keep the original post date, but it's quite easy to get an estimate.
Note2: I almost forgot: when I'm done scraping, I have to re-do some local boards (with weird layout).

Not sure how you guys are doing it, but I was thinking of writing a small .NET Core console app to run it on multiple servers (different IPs). That's what I did when I made this list. Vultr allows up to 5 instances, but I believe that could be increased on request.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Every post before May 2020 was provided to me by Loyce. But he only had/gave me 10 months' worth of posts.
I've now scraped 90% of the topics, I expect to be done in a couple of weeks. How are you on disk space?
Note: I didn't keep the original post date, but it's quite easy to get an estimate.
Note2: I almost forgot: when I'm done scraping, I have to re-do some local boards (with weird layout). This will take a bit longer.
legendary
Activity: 2758
Merit: 6830
Is there any reason behind scraping only posts up to 10 months old? Is it the storage you're worried about or just the time it might take to scrape everything? If the latter, I might be able to help.
Every post before May 2020 was provided to me by Loyce. But he only had/gave me 10 months' worth of posts. I could scrape more myself, while also fixing old posts, but time is indeed an issue. I would appreciate it if there is any way you could help with that.
staff
Activity: 3500
Merit: 6152
Is there any reason behind scraping only posts up to 10 months old? Is it the storage you're worried about or just the time it might take to scrape everything? If the latter, I might be able to help.
legendary
Activity: 2758
Merit: 6830
Does it show it even when it's a quote?
For addresses, the bot ignores everything inside a quote. I could change that if needed, though. I just thought it would maybe result in more false positives.
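To illustrate the "ignore everything inside a quote" behaviour, here is a rough Python sketch. It assumes the post text is available as BBCode with `[quote]...[/quote]` tags; how the bot actually parses posts is not stated here, and the function names and regexes below are illustrative only.

```python
import re

# An ETH address: "0x" followed by exactly 40 hex characters.
ETH_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")

# An innermost quote block: an opening tag, content that contains no
# further opening tag, then the closing tag.
INNERMOST_QUOTE = re.compile(
    r"\[quote[^\]]*\](?:(?!\[quote).)*?\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)


def strip_quotes(bbcode: str) -> str:
    """Remove all quote blocks, peeling nested quotes from the inside out."""
    previous = None
    while previous != bbcode:
        previous = bbcode
        bbcode = INNERMOST_QUOTE.sub("", bbcode)
    return bbcode


def addresses_outside_quotes(bbcode: str) -> list:
    """Return ETH addresses that appear outside of any quote block."""
    return ETH_ADDRESS.findall(strip_quotes(bbcode))
```

With this approach, an address that only appears inside a quote is never reported, which avoids re-counting quoted posts but also means someone could hide an address by wrapping it in a quote.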

Just made some changes to the page. What's new:

- Search by username.
- Show only addresses you have tagged (search by tag).
- Show only addresses which were mentioned at least 2 times.
- Changed the pagination to make the page less laggy.

I also scraped the author UIDs of more than 2 million posts from Loyce's file (it's now down to only 6.6k posts with no author UID). I will fix those and add the updated data to the website.
legendary
Activity: 1484
Merit: 1491
I forgot more than you will ever know.
Does it show it even when it's a quote?
The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.
Please read before answering horse shit.
Why so unfriendly? I think Rizzrack is right.

An extra reason not to change it: quotes may still contain deleted posts.

You're right. Bad mood this morning. Sorry about that.

Regarding your answer, I'm not sure the parser uses the same rules on his website as it does for the bot.
copper member
Activity: 783
Merit: 710
Defend Bitcoin and its PoW: bitcoincleanup.com
Please read before answering horse shit.

Let me rephrase my shit... I assume it would also work if the address was in a quote. And if I am right, it is OK and not worth changing because of what I said before.
One man's shit is another man's post. You might also want to read some things twice... just saying

Edit: @Loyce yeah, for deleted posts too, but I was referring to the fact that users could add them in quotes, like
"This is my ETH address
Quote
ETH-address-here
" and might bypass this if it excluded quotes ;)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Does it show it even when it's a quote?
The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.
Please read before answering horse shit.
Why so unfriendly? I think Rizzrack is right.

An extra reason not to change it: quotes may still contain deleted posts (although the current data already includes deleted posts).
legendary
Activity: 1484
Merit: 1491
I forgot more than you will ever know.
Does it show it even when it's a quote?

The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.

Please read before answering horse shit.
copper member
Activity: 783
Merit: 710
Defend Bitcoin and its PoW: bitcoincleanup.com
Does it show it even when it's a quote?

The telegram bot notifies you if someone quotes a post where you were @.. so I assume this one would do the same.
Tho I would not change that as some might add their addy in quotes to bypass it.
legendary
Activity: 1484
Merit: 1491
I forgot more than you will ever know.
I scraped every ETH address from every post I have saved on the website (up to ~10 months old) and linked them with the accounts that posted them. This could help people find connections between accounts when bounty abusers reuse their addresses across different accounts.
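A minimal sketch of that kind of linking, in Python, assuming posts are available as `(author, text)` pairs. The data shape and the function name are hypothetical, not the site's actual implementation.

```python
import re
from collections import defaultdict

# An ETH address: "0x" followed by exactly 40 hex characters.
ETH_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def shared_addresses(posts):
    """Map each address posted by two or more accounts to those accounts.

    `posts` is an iterable of (author, text) pairs.
    """
    authors_by_address = defaultdict(set)
    for author, text in posts:
        for address in ETH_ADDRESS.findall(text):
            # Hex addresses are case-insensitive, so normalise before grouping.
            authors_by_address[address.lower()].add(author)
    return {
        address: sorted(authors)
        for address, authors in authors_by_address.items()
        if len(authors) > 1
    }
```

Note that this links anyone who posted the address, including a user who only posted it to expose an abuser, so the output still needs manual review.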


Does it show it even when it's a quote?
copper member
Activity: 783
Merit: 710
Defend Bitcoin and its PoW: bitcoincleanup.com
...
Notes:
- Keep in mind that this contains every single post (from ~10 months ago until now), regardless of its content. So you should expect to see trusted users linked with abusers just because they posted those addresses (e.g. to expose them). You should check the data yourself and edit the table to make it correct.

Nice feature!
I guess you might minimise the error probability if you don't scrape some boards like reputation, scam accusations and meta. Everywhere else would be fair game :)
legendary
Activity: 2758
Merit: 6830
Why did you choose to scrape only Ethereum addresses and not Bitcoin addresses as well?
It is planned. I did it first with ETH addresses because I think most bounty abuse happens with ERC-20 token bounties and it was easier to implement.

Can we search by tags and usernames as well?
Not right now. But I already plan to implement this after I deal with the missing UIDs and some other small issues.

It took an hour to list 20%. Wait a bit longer (and much longer if the server runs out of CPU credits).
I will wait for it. :)

legendary
Activity: 2212
Merit: 7064
I scraped every ETH address from every post I have saved on the website (up to ~10 months old) and linked them with the accounts that posted them. This could help people find connections between accounts when bounty abusers reuse their addresses across different accounts.
Great work TryNinja!
This will be very useful for sure.

Why did you choose to scrape only Ethereum addresses and not Bitcoin addresses as well?

Can we search by tags and usernames as well?