
Topic: Downloadable topic-database? (Read 229 times)

newbie
Activity: 26
Merit: 64
December 24, 2023, 09:21:33 PM
#18
JSON is probably fine. Could you provide an example of the format you want with a dummy post?

Suppose there are boards B1 and B2. B1 has child boards B11 and B12. Each board is represented as a folder with the same name. The main folder, F, could be structured as follows (every instance of content.txt represents a file; the rest are folders).

F
├───B1
│   ├───content.txt
│   ├───B11
│   │   └───content.txt (*)
│   └───B12
│       └───content.txt
└───B2
    └───content.txt

Now here's what a content.txt file could look like. Suppose we're looking at (*).

{
    "name": "B11",
    "topics": [
        {
            "topicId": "1111",
            "subject": "Help me out plz",
            "op": {"userId": "3596085", "username": "ltcltcltc", "activity": 26, "merit": 60},
            "time": ,
            "messages": [
                {
                    "msgId": "6666",
                    "author": {"userId": "3596085", "username": "ltcltcltc", "activity": 26, "merit": 60},
                    "time": ,
                    "merited": 2,
                    "message": "Hey does anyone know how to speed up ecdsa signature bruteforcing?"
                },
                {
                    "msgId": "6699",
                    "author": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
                    "time": ,
                    "merited": 0,
                    "message": "Stop wasting your time."
                }
            ]
        },
        {
            "topicId": "2222",
            "subject": "Test. Do not answer.",
            "op": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
            "time": ,
            "messages": [
                {
                    "msgId": "8008",
                    "author": {"userId": "3597570", "username": "aleph1", "activity": 1, "merit": 23},
                    "time": ,
                    "merited": 3,
                    "message": "Testy test."
                }
            ]
        }
    ]
}


I didn't give an example timestamp because I don't know what your time format is, but I think I'd prefer Unix time. Also note the redundancy: each topic's timestamp is the same as the timestamp of its first message. The topics inside each board are ordered chronologically (oldest first), and so are the messages inside each topic.
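
To make the proposal concrete, here is a minimal Python sketch of how such a tree could be read and sanity-checked. It assumes every "time" field ends up holding a Unix timestamp (an integer), which is only my stated preference, and that each content.txt is valid JSON. The helper names load_boards and check_board are made up for illustration.

```python
import json
from pathlib import Path


def load_boards(root):
    """Yield (relative_folder, board_dict) for every content.txt under root."""
    root = Path(root)
    for path in sorted(root.rglob("content.txt")):
        with path.open(encoding="utf-8") as fh:
            yield path.parent.relative_to(root).as_posix(), json.load(fh)


def check_board(board):
    """Return a list of problems found in one board dict (empty if none)."""
    problems = []
    topics = board.get("topics", [])
    # Assumption: "time" is a Unix timestamp, so plain numeric comparison works.
    topic_times = [t["time"] for t in topics]
    if topic_times != sorted(topic_times):
        problems.append("topics are not ordered oldest first")
    for topic in topics:
        msgs = topic["messages"]
        msg_times = [m["time"] for m in msgs]
        if msg_times != sorted(msg_times):
            problems.append(f"topic {topic['topicId']}: messages out of order")
        # The topic timestamp is redundant with its first message's timestamp.
        if msgs and msgs[0]["time"] != topic["time"]:
            problems.append(f"topic {topic['topicId']}: topic time != first message time")
    return problems
```

Something like `for folder, board in load_boards("F"): print(folder, check_board(board))` would then list any board whose ordering or redundant timestamps don't match the description.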

What do you think about this format?
legendary
Activity: 2758
Merit: 6830
December 24, 2023, 01:14:01 PM
#17
I think a tree-like structure (board/subboard/topic/message) would work best so as to study conversations as a whole more than individual messages, since I don't care about individual opinions as much as I do about global sentiments.
So maybe JSON? Does this work for you? I mentioned the Economy board as an example; ideally I'd want the whole data.
JSON is probably fine. Could you provide an example of the format you want with a dummy post?
newbie
Activity: 26
Merit: 64
December 24, 2023, 11:54:27 AM
#16
It's a good idea too. I'll .append() it to the list. Ltc stands for something other than litecoin.
newbie
Activity: 26
Merit: 64
December 24, 2023, 09:47:00 AM
#15
I just keep the raw HTML for archiving purposes.
Ok. Then perhaps TryNinja's database fits my purposes better.

I’m willing to give anyone a .csv or similar with any data that I have.
I think a tree-like structure (board/subboard/topic/message) would work best so as to study conversations as a whole more than individual messages, since I don't care about individual opinions as much as I do about global sentiments.
So maybe JSON? Does this work for you? I mentioned the Economy board as an example; ideally I'd want the whole data.

It would be a super favour you'd be doing me.
copper member
Activity: 1330
Merit: 899
🖤😏
December 24, 2023, 06:34:34 AM
#14
My man triple ltc, can we start over? The analysis you linked above seems interesting. Can you do a special analysis of price changes and my appearances on the reputation and meta boards in the past 6 months? Lol, I mean, is there a way to do that?
Earlier I thought you were one of the trolls harassing me, so apologies for snapping at you. I appreciate the effort. 😉
legendary
Activity: 2758
Merit: 6830
December 24, 2023, 05:23:11 AM
#13
I’m willing to give anyone a .csv or similar with any data that I have. Like Loyce said, all you gotta do is ask. :)
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
December 24, 2023, 04:59:02 AM
#12
So, please tell me if I'm wrong: you just scrape content with limited data treatment, so the filtering functionalities you offer are the same that the forum offers, i.e. sorting by chronological order, viewing the posts inside a topic and filtering by user.
I don't process anything, I just keep the raw HTML for archiving purposes. Although I also keep a list per user and per topic.

Quote
Ok! I thought by the previous message that you were declining.
I meant ask TryNinja nicely if you want for instance only data from the Economics board.

Quote
But if that's not the case then I'd be super grateful if you shared your database with me to avoid the undesirable task of rescraping the scraped!
I don't have a database. I just have "data". And it's a lot. Hence my question if you know how you're going to handle it. Old posts for instance are stored in a different format, although I may still have a backup of individual files for each post. Update: found it. That's the part where you'll get millions of files in one directory. So you'll have to be a bit more specific before I just dump a shitload of files on you :P

Quote
Ninja's website looks handy too. Harder to scrape though.
Don't scrape, ask :P
legendary
Activity: 2856
Merit: 7410
Crypto Swap Exchange
December 24, 2023, 04:55:41 AM
#11
Quote
Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".
Nope. That's TryNinja's specialty. Again: just ask nicely :)

As you mentioned, TryNinja already offers an API, whose documentation is at https://docs.ninjastic.space/. If OP is willing to write a script that downloads topics and replies from the API, and to wait several days, it should be a viable option.
newbie
Activity: 26
Merit: 64
December 24, 2023, 04:37:08 AM
#10
CS means computer science.

Quote
That's literally how my data files are.

So, please tell me if I'm wrong: you just scrape content with limited data treatment, so the filtering functionalities you offer are the same that the forum offers, i.e. sorting by chronological order, viewing the posts inside a topic and filtering by user.

Quote
Again: just ask nicely

Ok! I thought by the previous message that you were declining. But if that's not the case then I'd be super grateful if you shared your database with me to avoid the undesirable task of rescraping the scraped!

Ninja's website looks handy too. Harder to scrape though.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
December 24, 2023, 03:55:32 AM
#9
Thanks, btw what do you mean by WYSIWYG? I get what it stands for, and that it's CS slang, but how does it apply here?
I don't know what "CS slang" is, but WYSIWYG stands for What You See Is What You Get. That's literally how my data files are.

Quote
Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".
Nope. That's TryNinja's specialty. Again: just ask nicely :)
Vod
legendary
Activity: 3668
Merit: 3010
Licking my boob since 1970
December 23, 2023, 07:41:09 PM
#8
Thanks, btw what do you mean by WYSIWYG?

Odd that you have never googled that phrase, but you've found an obscure website. Could it have something to do with my recent suggestion? ;)
newbie
Activity: 26
Merit: 64
December 23, 2023, 07:26:11 PM
#7
Thanks, btw what do you mean by WYSIWYG? I get what it stands for, and that it's CS slang, but how does it apply here? Also, I've seen that your website offers the functionality of showing any given user's messages. Is there an analogous way of filtering messages by board? Like: "showing Economy messages".

P.S. I've seen your other work, quite impressive!
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
December 23, 2023, 06:10:24 PM
#6
How is your data classified?
See the link you started this topic with. WYSIWYG. Best I can do is a post number.
newbie
Activity: 26
Merit: 64
December 23, 2023, 05:30:35 PM
#5
How is your data classified? Tree-structure or raw recent-first stack? In the second case, I'd probably reorganize it myself into a tree-like structure. This way should be quicker to filter out some data. Maybe start with the Bitcoin discussion board, then scale up.
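
If it turns out to be a flat stack, the reorganisation I have in mind could look roughly like this. It's only a sketch: I'm assuming each post is available as a dict with 'board', 'topic_id' and 'time' keys, and those field names are my invention, not your actual format.

```python
from collections import defaultdict


def build_tree(flat_posts):
    """Group a flat list of post dicts into {board: {topic_id: [posts]}}.

    Each post is assumed to carry 'board', 'topic_id' and 'time' keys;
    these field names are hypothetical.
    """
    tree = defaultdict(lambda: defaultdict(list))
    # Sort oldest first so each topic's list ends up in chronological order.
    for post in sorted(flat_posts, key=lambda p: p["time"]):
        tree[post["board"]][post["topic_id"]].append(post)
    # Convert to plain dicts so the result is easy to serialise or inspect.
    return {board: dict(topics) for board, topics in tree.items()}
```

Filtering then becomes a dictionary lookup, e.g. `build_tree(posts)["Bitcoin Discussion"]` to pull out a single board.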

Also I can always chop those 100GB into various time series. Perhaps the 2020-2022 time period contains juicier data than the rest (due to the rise and drop of BTC). Everything can be explored.
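
A rough sketch of what I mean by chopping, assuming each post carries a Unix timestamp under a 'time' key (a hypothetical field name, since I don't know your format yet) and that years are counted in UTC:

```python
from datetime import datetime, timezone


def in_period(posts, start_year, end_year):
    """Return posts whose Unix 'time' falls in start_year..end_year (UTC, inclusive)."""
    start = datetime(start_year, 1, 1, tzinfo=timezone.utc).timestamp()
    end = datetime(end_year + 1, 1, 1, tzinfo=timezone.utc).timestamp()
    # Half-open interval: the first second of end_year + 1 is excluded.
    return [p for p in posts if start <= p["time"] < end]
```

So `in_period(posts, 2020, 2022)` would keep only the posts from that three-year window.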
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
December 23, 2023, 04:11:53 PM
#4
I've seen another sentiment analysis, which analysed posts concerning the block size discussion at Fork time. But it's only in my email, I can't find it online.

Have you thought about how you'd handle my data? It's a lot: millions of files, about 100 GB, and most file systems can't handle that many files without many subdirectories.
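
One common way around that limit is to shard files into nested subdirectories derived from the post number. The sketch below is only an illustration of the idea, not how my files are actually laid out; shard_path, the .html extension and the 1000-per-directory figure are all assumptions.

```python
from pathlib import Path


def shard_path(root, post_id, per_dir=1000):
    """Map a numeric post id to root/<millions>/<thousands>/<id>.html.

    With per_dir=1000, each leaf directory holds at most 1000 files and
    each middle directory at most 1000 subdirectories.
    """
    thousands = post_id // per_dir
    millions = thousands // per_dir
    return Path(root) / f"{millions:04d}" / f"{thousands % per_dir:03d}" / f"{post_id}.html"
```

For example, post 63412345 would land in data/0063/412/63412345.html, so no single directory has to hold millions of entries.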
newbie
Activity: 26
Merit: 64
December 23, 2023, 03:44:27 PM
#3
Haha I didn't think about that indeed.
I came across this sentiment analysis of BTT. It aims to infer a correlation between the temperature/feeling of this forum and the price trend of cryptos like Bitcoin. I found it interesting so I thought I'd try it myself, play around with the data, see what comes out. Might even be an intro to ML. My goal: learning. Oftentimes that leads to interesting results but one can never be certain. Still, if anything at all comes out of this you'll be the first to read about it.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
December 23, 2023, 01:23:27 PM
#2
How about you just ask nicely? ;) What do you need, and what's the goal? Or better: will you publish the results on Bitcointalk?
newbie
Activity: 26
Merit: 64
December 23, 2023, 12:28:02 PM
#1
I know LoyceV has put together a nice scrapable archive of the topics of this forum.

I want to do an analysis of the BTT forum. I could write a script to scrape the data from LoyceV's archive, but I was really hoping someone could point me towards a fully downloadable database to speed things up. Does anyone have a reference?

Cheers!