Topic: Ninjastic.space - BitcoinTalk Post/Address archive + API - page 28. (Read 16648 times)

legendary
Activity: 1624
Merit: 2594
Top Crypto Casino
If you are looking for every post, you can do this:
[...]

How is this relevant for this thread? Or did I miss something?
copper member
Activity: 1652
Merit: 1901
Amazon Prime Member #7
If you are looking for every post, you can do this:
max_topic_id = 5273824  # highest topic ID to scan
for topic_id in range(1, max_topic_id + 1):
    page = 0
    # go to 'https://bitcointalk.org/index.php?topic={}.{}'.format(topic_id, page)
    # if not available to you: continue with the next topic
    # scrape board information
    # scrape each post via loop
    # I believe there are two classes of posts - scrape both classes; you will insert posts into your DB out of order, but this is okay
    page += 20
    # there is a middletext td class
    # there is a prevnext span class
    next_page = 'https://bitcointalk.org/index.php?topic={}.{}'.format(topic_id, page)
    # sleep for 1 second
    # if you can find a link equal to next_page, go to that page and repeat from "scrape each post"; else move on to the next topic

In parallel to the above, and starting at the same time the above starts:
scrape the recent posts page, and add each post to your DB. Here you can also scrape the board each thread is on, adding it if it doesn't exist in your DB, and updating it if it does.

The above will capture every post that you have access to. The first loop will take quite some time, and a thread being moved to a different board while you are in the process of scraping all posts will not cause you to miss any posts.
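
A minimal sketch of the pagination step described above, assuming topic pages are addressed as `index.php?topic=<id>.<offset>` with an offset step of 20, as in the pseudocode. The helper names `topic_page_url` and `next_page_url` are made up for illustration, and the check simply looks for the next page's URL inside the fetched HTML; the real markup may need a proper HTML parser.

```python
from typing import Optional

POSTS_PER_PAGE = 20  # offset step used in the pseudocode above
BASE = "https://bitcointalk.org/index.php?topic={}.{}"


def topic_page_url(topic_id: int, offset: int) -> str:
    """Build the URL of one page of a topic (hypothetical helper)."""
    return BASE.format(topic_id, offset)


def next_page_url(html: str, topic_id: int, offset: int) -> Optional[str]:
    """Return the next page's URL if the current page links to it, else None."""
    candidate = topic_page_url(topic_id, offset + POSTS_PER_PAGE)
    # Assumption: the link appears verbatim in an href attribute.
    return candidate if 'href="{}"'.format(candidate) in html else None
```

The scraper would fetch `topic_page_url(topic_id, 0)`, store the posts, sleep, and keep following `next_page_url` until it returns `None`.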
legendary
Activity: 2758
Merit: 6830
~
Thank you. I think I got it.

I went to https://bitcointalk.org/sitemap.php?t=b, grabbed every board url, scraped each one of them and linked them to their closest parent. There are only 248 boards, so it was pretty quick.

Code:
"board_id","name","parent_id"
1,   Bitcoin Discussion,
4,   Bitcoin Technical Support,
14,  Mining,
40,  Mining support,   14
42,  Mining software (miners),   14

If you want it: https://pastebin.com/raw/xhudKFZ8
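
A rough sketch of how the full board path could be rebuilt from the `parent_id` column, assuming the CSV layout shown above. The sample rows are the five from this post; `load_boards` and `board_path` are hypothetical helper names.

```python
import csv
import io

# Sample rows copied from the post above; the full list is in the pastebin.
SAMPLE = """board_id,name,parent_id
1,Bitcoin Discussion,
4,Bitcoin Technical Support,
14,Mining,
40,Mining support,14
42,Mining software (miners),14
"""


def load_boards(text):
    """Map board_id -> (name, parent_id or None)."""
    boards = {}
    for row in csv.DictReader(io.StringIO(text)):
        parent = row["parent_id"].strip()
        boards[int(row["board_id"])] = (
            row["name"].strip(),
            int(parent) if parent else None,
        )
    return boards


def board_path(boards, board_id):
    """Walk parent links up to the root and return the names, root first."""
    path = []
    seen = set()
    while board_id is not None and board_id not in seen:
        seen.add(board_id)  # guard against accidental cycles in the data
        name, board_id = boards[board_id]
        path.append(name)
    return path[::-1]
```

For example, `board_path(load_boards(SAMPLE), 40)` walks from "Mining support" up to "Mining".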
legendary
Activity: 3654
Merit: 8909
https://bpip.org
In this case, you can send me the boards right away so I can figure out how to do that. Cheesy

Thanks!

Here you go:

https://bpip.org/boards_202009112042.zip (~100MB compressed, ~600MB uncompressed)

You may be able to get board details from here:

https://bitcointalk.org/index.php?action=search

It's a bit messy but it has all boards listed on one page. Otherwise you'd have to recursively scrape multiple pages starting from the front page.

Another option - if you are currently scraping recent posts including the full path (with board names and IDs) then you can extract the hierarchy from that data. It might not be complete though. Some boards that are rarely posted in might need to be added manually.
legendary
Activity: 2758
Merit: 6830
I have only the ID of the direct "parent" board (24 in your example). You would need to scrape the board hierarchy if you need the full path and the names of the boards.

Also that timestamp stuff will take me a day or two so if you want boards sooner - let me know, I can send it separately.
In this case, you can send me the boards right away so I can figure out how to do that. Cheesy

Thanks!
legendary
Activity: 3654
Merit: 8909
https://bpip.org
I do.
Would you be able to send them to me in this format (along with the post date)?

Code:
postid, date, boards
55179038, "2020-04-13 12:03:00", "{Other, Meta}"

I have only the ID of the direct "parent" board (24 in your example). You would need to scrape the board hierarchy if you need the full path and the names of the boards.

Also that timestamp stuff will take me a day or two so if you want boards sooner - let me know, I can send it separately.
legendary
Activity: 2758
Merit: 6830
I do.
Would you be able to send them to me in this format (along with the post date)?

Code:
postid, date, boards
55179038, "2020-04-13 12:03:00", "{Other, Meta}"

One of the things I had wanted to do with BPIP was break down posts per hour per section of the forum. People could see when the best time to post would be. Obviously this can be done easily using the infrastructure you have set up.
That sounds like an easy one. I will add it!
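
A minimal sketch of that breakdown, assuming rows shaped like the `postid, date, boards` format discussed in this thread, with `boards` already parsed into a list. Treating the last entry of `boards` as the board the post sits in is an assumption, and `posts_per_hour` is a made-up name.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(rows):
    """Count posts per (board, hour of day) from (post_id, date, boards) rows."""
    counts = Counter()
    for _post_id, date, boards in rows:
        hour = datetime.strptime(date, "%Y-%m-%d %H:%M:%S").hour
        # Assumption: the last entry in `boards` is the board the post is in.
        counts[(boards[-1], hour)] += 1
    return counts
```

The result can be sorted per board to show the busiest hours. All timestamps should be in the same timezone before counting.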
legendary
Activity: 3654
Merit: 8909
https://bpip.org
Would you also have the boards of the posts? The old archive also doesn't contain them. Cheesy

I do.
Vod
legendary
Activity: 3668
Merit: 3010
Licking my boob since 1970
One of the things I had wanted to do with BPIP was break down posts per hour per section of the forum. People could see when the best time to post would be. Obviously this can be done easily using the infrastructure you have set up.

legendary
Activity: 2758
Merit: 6830
Quick update on this: the timezone mess is messier than I thought so it will take some time to sort through it. Basically some posts got scraped with +0200 instead of UTC but I don't know which ones, so I'll probably need to scrape some "checkpoints" and find posts between them that are "time traveling" (e.g. created later than the next post).
Would you also have the boards of the posts? The old archive also doesn't contain them. Cheesy

I finished setting up the new database and already have some cool new features to announce. But I'm waiting for the timestamps and potentially the boards so I can index the data.

Here is a sneak peek of one of them (WIP): https://talkimg.com/images/2023/05/14/blobf707e32c89df6b5f.png
legendary
Activity: 3654
Merit: 8909
https://bpip.org
Sure. I need to double-check a few things first. At one point I had some issues with timezones so I'll verify if I need to make any adjustments. It will all be in UTC once it's ready.

Quick update on this: the timezone mess is messier than I thought so it will take some time to sort through it. Basically some posts got scraped with +0200 instead of UTC but I don't know which ones, so I'll probably need to scrape some "checkpoints" and find posts between them that are "time traveling" (e.g. created later than the next post).
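
A small sketch of the "time traveling" check, assuming post IDs are assigned in creation order so that timestamps should never decrease as IDs increase. The function name `time_travelers` is made up for illustration.

```python
from datetime import datetime


def time_travelers(posts):
    """Return IDs of posts dated later than the post with the next higher ID.

    `posts` is a list of (post_id, datetime) pairs; post IDs are assumed to be
    assigned in creation order.
    """
    ordered = sorted(posts)
    return [
        cur_id
        for (cur_id, cur_ts), (_, next_ts) in zip(ordered, ordered[1:])
        if cur_ts > next_ts
    ]
```

Within a run of consecutive posts that share the same wrong offset, only the last one is flagged, so this locates the boundaries of a shifted range rather than every affected post.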
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
How about a blockchain search?

4 values - min,max number of bitcoins transferred.   Start,end date to search.
Blockchair has this data. It might take more disk space than all Bitcointalk posts.

I have several topics based on it already (but I don't do databases):
legendary
Activity: 1624
Merit: 2594
Top Crypto Casino
I could also maybe implement some kind of authentication in the future.
I think you should. Otherwise you are just asking for a DoS attack Wink

Lack of authentication doesn't mean there are no other DDoS mitigation measures implemented. Just saying... Wink

Btw, you messed the quotes up, that was TryNinja's quote.
Vod
legendary
Activity: 3668
Merit: 3010
Licking my boob since 1970
Let me know and I'll implement them if possible.

How about a blockchain search?

4 values - min,max number of bitcoins transferred.   Start,end date to search.
For example - I want to search for any transfers between 450-500 bitcoin between Sep 1 and Oct 31 2015.

If you integrate the crypto price in the search, I could also search for transfers of $40-$50 for example.
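
To illustrate the four search values, here is a hedged sketch that filters an in-memory list of `(day, amount_btc)` records. The record layout and the name `search_transfers` are assumptions; a real search would query an indexed blockchain database instead.

```python
from datetime import date


def search_transfers(transfers, min_btc, max_btc, start, end):
    """Filter (day, amount_btc) records by amount and date range, inclusive."""
    return [
        (day, amount)
        for day, amount in transfers
        if min_btc <= amount <= max_btc and start <= day <= end
    ]
```

The example search would be `search_transfers(records, 450, 500, date(2015, 9, 1), date(2015, 10, 31))`.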


sr. member
Activity: 840
Merit: 375
It's a simple HTTP request:

https://api.ninjastic.space/posts/55141939

Status code 200 means the post exists and you can parse JSON from the response. 404 means not found, etc.

Oh if it's just a simple HTTP request then I am familiar with that  Grin I thought it was some kind of special interface with mandatory authorization via an api key....

You can use them as you wish for now. But I would appreciate it if you consulted me before making many requests or implementing it in any kind of project. This way we can optimize things to keep the server working without too much workload.

I'm still working on the bot right now, if I deem it useful enough to release it publicly one day, I will definitely let you know before so we can optimize it.

I could also maybe implement some kind of authentication in the future.
I think you should. Otherwise you are just asking for a DoS attack Wink
legendary
Activity: 2758
Merit: 6830
Nice job.  I have some ideas that could help in scam busting.   I left you merit, but even better than that, I've left you my trust.
Thanks, Vod. Smiley

Let me know and I'll implement them if possible.

Any chance you can recover other posts from the Internet Archive or one of the bitcointalk clone sites?     Some scammers have deleted hundreds of posts of illegal activities.
It's technically possible, but I'm not sure how hard that would be. I'm prioritizing scraping all the live posts that are missing from the database. When everything is working and most features are done, I may think about doing that.
Vod
legendary
Activity: 3668
Merit: 3010
Licking my boob since 1970
The new ninjastic.space website is out! Remade from scratch.

Nice job.  I have some ideas that could help in scam busting.   I left you merit, but even better than that, I've left you my trust.

- The post archive is still incomplete as many posts from this year are missing. It has, however, a lot more posts than its previous version: 42,785,512 posts! Mostly from the previous years. (thanks to @LoyceV for his oldposts archive).

Any chance you can recover other posts from the Internet Archive or one of the bitcointalk clone sites?     Some scammers have deleted hundreds of posts of illegal activities.
legendary
Activity: 1624
Merit: 2594
Top Crypto Casino

How do I get access to these endpoints? Can you give me access to one of them? I will be messing around with data a bit and try to integrate it into my bot. I'm not too familiar with RESTful APIs so please include as much detail as possible about it Smiley

You can find a useful online API testing tool here: https://reqbin.com/
Just send a request to one of the ninjastic.space endpoints (like these examples TryNinja gave) and watch the responses. You can also find code samples for popular programming languages. Try it. RESTful APIs are quite simple and easy to implement.
legendary
Activity: 2758
Merit: 6830
This ^

You can use them as you wish for now. But I would appreciate it if you consulted me before making many requests or implementing it in any kind of project. This way we can optimize things to keep the server working without too much workload. I could also maybe implement some kind of authentication in the future.

Some endpoints:

Post: https://api.ninjastic.space/posts/55141939
Posts: https://api.ninjastic.space/posts/46250414,50646799,55163653,50966531
Posts on topic: https://api.ninjastic.space/posts/topic/5273824

Address: https://api.ninjastic.space/addresses/1NinjabXd5znM5zgTcmxDVzH4w3nbaY16L
Address authors: https://api.ninjastic.space/addresses/1NinjabXd5znM5zgTcmxDVzH4w3nbaY16L/authors
Addresses on post: https://api.ninjastic.space/addresses/post/50966531

I will probably create a page to document them better.
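
Until that page exists, the URL patterns above can be captured in a few helpers. This is only a sketch based on the example links in this post; the function names are made up, and the endpoints may change.

```python
BASE = "https://api.ninjastic.space"


def posts_url(*post_ids):
    """One or several posts; multiple IDs are comma-separated."""
    return "{}/posts/{}".format(BASE, ",".join(str(i) for i in post_ids))


def topic_posts_url(topic_id):
    """All archived posts of one topic."""
    return "{}/posts/topic/{}".format(BASE, topic_id)


def address_url(address, authors=False):
    """An address, or the authors who posted it when authors=True."""
    suffix = "/authors" if authors else ""
    return "{}/addresses/{}{}".format(BASE, address, suffix)


def post_addresses_url(post_id):
    """Addresses found in one post."""
    return "{}/addresses/post/{}".format(BASE, post_id)
```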
legendary
Activity: 3654
Merit: 8909
https://bpip.org
How do I get access to these endpoints? Can you give me access to one of them? I will be messing around with data a bit and try to integrate it into my bot. I'm not too familiar with RESTful APIs so please include as much detail as possible about it Smiley

It's a simple HTTP request:

https://api.ninjastic.space/posts/55141939

Status code 200 means the post exists and you can parse JSON from the response. 404 means not found, etc.
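
A minimal sketch of such a request using only Python's standard library. The layout of the returned JSON is not described in this thread, so the code just decodes it without assuming any fields; `parse_response` and `fetch_post` are hypothetical names.

```python
import json
import urllib.error
import urllib.request

API = "https://api.ninjastic.space/posts/{}"


def parse_response(status, body):
    """Return the decoded JSON for a 200 response, None for 404."""
    if status == 200:
        return json.loads(body)
    if status == 404:
        return None
    raise RuntimeError("unexpected status code: {}".format(status))


def fetch_post(post_id):
    """Request one post from the API (performs a real HTTP request)."""
    try:
        with urllib.request.urlopen(API.format(post_id), timeout=10) as resp:
            return parse_response(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx, so route the code through the same logic.
        return parse_response(err.code, b"")
```

Calling `fetch_post(55141939)` would return the parsed JSON if the post is archived and `None` if the server answers 404.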