Topic: Forum BBCode to HTML script development (Read 230 times)

legendary
Activity: 1904
Merit: 1563
December 09, 2020, 09:31:48 AM
#10
Have you tried using TryNinja's API? If you read its documentation[1], you can see that it already scrapes all of Bitcointalk's posts together with the necessary data and converts their contents into HTML. The documentation also shows how to use the API from Python scripts. And if you want to make a program or application that shows the parsed content as embeddable HTML, you could just iterate over the API's JSON response and read the key named 'content'.

Good lord, TryNinja deserves money for this. It translated everything in my test post perfectly.

I'll see if I can get permission to reuse his source code that translates bbcode like that for my own (commercial) use as I will be putting some good posts on my new domain and running ads on the side. So it's not exactly for personal use.

Well, if you want to make another Python or JS program that scrapes specific data from this forum, then you might want to try cheerio (JS) or BeautifulSoup (Python): make a GET request that fetches a post by ID, then turn the result into an embeddable HTML snippet. But honestly, the API gets you there with less work, and it is free to use, both commercially and personally (the API has no token and no request limiter).

Meanwhile, in Python, there is a library called bbcode (version 1.1.0). I suggest you create a function that accepts a BBCode-formatted string, processes it with the library's functions, and returns the result in a variable that you can later use to either show the embeddable HTML or just the HTML code. In JavaScript, I guess it would be harder, as you would have to map over all of the parsed text and use regex to replace certain tags.
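
A minimal sketch of such a function, assuming the third-party bbcode package from PyPI is installed (pip install bbcode). The function name render_with_bbcode_lib and the choice of U+20BF as the replacement for [btc] are illustrative assumptions, not something the library or the forum prescribes. The import is guarded only so the sketch still loads when the package is missing.

```python
try:
    import bbcode  # third-party package: pip install bbcode
except ImportError:  # lets the sketch load even without the package
    bbcode = None


def render_with_bbcode_lib(text: str) -> str:
    """Convert a BBCode string to HTML using the bbcode package's parser."""
    if bbcode is None:
        raise RuntimeError("the 'bbcode' package is not installed")
    parser = bbcode.Parser()
    # The forum-specific [btc] tag is not built in, so register it as a
    # standalone tag (assumption: render it as the Unicode bitcoin sign).
    parser.add_simple_formatter("btc", "\u20bf", standalone=True)
    return parser.format(text)


if __name__ == "__main__" and bbcode is not None:
    print(render_with_bbcode_lib("[b]I am bold[/b] and this costs 1 [btc]"))
```

With its default settings the parser also escapes raw HTML and turns newlines into line breaks, so it is worth checking the result against a real post before relying on it.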

I don't know if I'm fully right, but I hope my knowledge was helpful.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 08, 2020, 05:40:14 PM
#9
Have you tried using TryNinja's API? If you read its documentation[1], you can see that it already scrapes all of Bitcointalk's posts together with the necessary data and converts their contents into HTML. The documentation also shows how to use the API from Python scripts. And if you want to make a program or application that shows the parsed content as embeddable HTML, you could just iterate over the API's JSON response and read the key named 'content'.

Good lord, TryNinja deserves money for this. It translated everything in my test post perfectly.

I'll see if I can get permission to reuse his source code that translates bbcode like that for my own (commercial) use as I will be putting some good posts on my new domain and running ads on the side. So it's not exactly for personal use.
legendary
Activity: 1904
Merit: 1563
December 08, 2020, 09:16:06 AM
#8
I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML. This will allow you to naturally embed them in web pages for example. To do this I have to write a BBCode parser that outputs an HTML tag for each BBCode tag, and also handle all tags specific to Bitcointalk. Since this is tantamount to writing a state machine for an entire language, a daunting task, I have decided to look for some existing program to base my work on rather than write it from scratch. A fully-working program suitable for BTT does not exist as far as I know.

https://github.com/chaomodus/ppcode This is someone's bare-bones implementation of a BBCode state machine. It needs a lot of work: handling url= and img= tags, recognizing "quote", being made case insensitive, and handling all the other smileys, not just the smile face. But I think it will be worth it in the long run if I manage to build this. I forked it at https://github.com/ZenulAbidin/ppcode if you want to track its progress.

Have you tried using TryNinja's API? If you read its documentation[1], you can see that it already scrapes all of Bitcointalk's posts together with the necessary data and converts their contents into HTML. The documentation also shows how to use the API from Python scripts. And if you want to make a program or application that shows the parsed content as embeddable HTML, you could just iterate over the API's JSON response and read the key named 'content'. Here's an example:


Code:
{
  "result": "success",
  "message": null,
  "data": [
    {
      "post_id": 55763102,
      "topic_id": 5295719,
      "author": "Maus0728",
      "author_uid": 1289002,
      "title": "Re: 2 new Metamask phishing site thru Google Ads",
      "content": "I don't know if everyone practices installing \"uBlock Origin\" as one of their browser add-ons. Though I am fully aware that this is not an ad-blocker, however, based on my experience it can effectively help solve these kinds of phishing attempts in a form of ads that is not carefully filtered by Google. Been using their services for quite some time and fortunately, I never encountered such scam attempts up to this date.\n\n[1] https://ublockorigin.com/\n\n",
      "date": "2020-12-06T05:14:44.000Z",
      "board_id": 39,
      "board_name": "Beginners & Help",
      "archive": false,
      "created_at": "2020-12-06T05:14:47.342Z",
      "updated_at": "2020-12-06T05:14:50.066Z"
    }
  ]
}

from: https://api.ninjastic.space/posts/55763102

As you can see, the content is already in snippet form. If you want to make a web app that embeds posts, better check his API for an easier job.



[1] - https://docs.ninjastic.space/
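
A rough sketch of reading the 'content' key from such a response, using only the standard library. The URL pattern is copied from the example link above; the function names extract_contents and fetch_post_html are made up for illustration, and the response shape is assumed to match the sample shown (a "result" field plus a "data" list).

```python
import json
from urllib.request import urlopen

# URL pattern taken from the example link in this post.
API_URL = "https://api.ninjastic.space/posts/{post_id}"


def extract_contents(response: dict) -> list:
    """Return the 'content' field of every post in a decoded API response."""
    if response.get("result") != "success":
        raise ValueError(f"API error: {response.get('message')}")
    return [post["content"] for post in response.get("data", [])]


def fetch_post_html(post_id: int, timeout: float = 10.0) -> list:
    """Download one post by ID and return its content (needs network access)."""
    with urlopen(API_URL.format(post_id=post_id), timeout=timeout) as resp:
        return extract_contents(json.load(resp))


if __name__ == "__main__":
    for snippet in fetch_post_html(55763102):
        print(snippet)
```

Keeping the parsing in extract_contents, separate from the download, makes it possible to try the logic on a saved response without hitting the API.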
legendary
Activity: 2352
Merit: 6089
bitcoindata.science
December 08, 2020, 06:44:47 AM
#7
The problem with using an online tool is that it won't recognize any of theymos' added bbcode such as [ btc]. I'm not sure if that is the only one, but I would prefer the translation to be completely automated without having to manually replace tags.

I suggested that you use those online tools as a reference for your new program.
Then you could add the new specific rules (and remove some others which are not supported here).

For this specific case of [ btc], a simple replace to BTC is enough.

I took a look at this class and it is the letter B of a custom font type
http://bitcointalk.org/Themes/BTC.ttf
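
As a sketch of that replace step: the snippet below swaps the tag for the Unicode bitcoin sign (U+20BF) instead of the custom-font letter B, which is an assumption about the desired output. The function name replace_btc_tag is hypothetical. It is meant as a pre-processing pass before handing the text to a generic converter, and it does not skip code blocks.

```python
import re

BITCOIN_SIGN = "\u20bf"  # assumed replacement; the forum itself uses a custom font

# Match the tag case-insensitively, since BBCode tags may be typed in any case.
BTC_TAG = re.compile(r"\[btc\]", re.IGNORECASE)


def replace_btc_tag(text: str) -> str:
    """Replace every standalone btc tag with the bitcoin sign."""
    return BTC_TAG.sub(BITCOIN_SIGN, text)


if __name__ == "__main__":
    print(replace_btc_tag("Price: 0.5 [btc]"))
```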
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 08, 2020, 06:36:37 AM
#6
Then why don't you look at the SMF code? The forum software is open source and, if I'm not mistaken, written in PHP.

That is a good idea. I found the source code for the SMF version bitcointalk's on (1.1.19, says so at the bottom of this thread) at https://download.simplemachines.org/index.php?archive;b=3;v=85 , now I just need to track all the additions that theymos made to the bbcode.

Also, I think that forumotion has done what you say, but not in python. Technically, it works. You write your BBCode, you copy it and then you paste it on an html form (admin panel). It pastes it, correctly. You can check their forum here: help.forumotion.com (or create your own forumotion forum, just to check the admin panel).

I did a quick search and found these:

http://www.bbcode-to-html.com/
https://www.browserling.com/tools/bbcode-to-html

Those are javascript implementations I guess. I found some of the source code here:
https://www.browserling.com/js/tools/xbbcode.js

I also think that JavaScript would be easier to distribute than Python, as you could make an HTML page that does the conversion, just like those two.

The problem with using an online tool is that it won't recognize any of theymos' added bbcode such as [ btc]. I'm not sure if that is the only one, but I would prefer the translation to be completely automated without having to manually replace tags.
legendary
Activity: 2352
Merit: 6089
bitcoindata.science
December 08, 2020, 05:58:37 AM
#5
I did a quick search and found these:

http://www.bbcode-to-html.com/
https://www.browserling.com/tools/bbcode-to-html

Those are javascript implementations I guess. I found some of the source code here:
https://www.browserling.com/js/tools/xbbcode.js

I also think that JavaScript would be easier to distribute than Python, as you could make an HTML page that does the conversion, just like those two.

From a brief look at the code, he made dictionaries (or a JSON-like object, in JS terms) with an entry for every supported tag, like this:

Code:
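    // Excerpt. The HTML inside the returned strings was stripped from the
    // archived post; the markup below is an assumed reconstruction.
    var tags = {
        "b": {
            openTag: function(params,content) {
                return '<span class="xbbcode-b">';
            },
            closeTag: function(params,content) {
                return '</span>';
            }
        },
        "center": {
            openTag: function(params,content) {
                return '<span class="xbbcode-center">';
            },
            closeTag: function(params,content) {
                return '</span>';
            }
        }
    };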

Then he replaced them later on using that "tags" dictionary in some functions.
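
For comparison, the same table-driven idea could look roughly like this in Python. This is only an illustrative sketch, not a port of that script: the names TAGS, open_tag, close_tag, and render are invented here, and the HTML each entry returns is an assumption. Using functions rather than fixed strings matters for tags such as [url], whose opening HTML depends on the parameter.

```python
import html

# Each entry maps a tag name to two callables that build its opening and
# closing HTML from the tag's parameter and its inner content.
TAGS = {
    "b": {
        "open_tag": lambda params, content: "<b>",
        "close_tag": lambda params, content: "</b>",
    },
    "center": {
        "open_tag": lambda params, content: '<div style="text-align: center;">',
        "close_tag": lambda params, content: "</div>",
    },
    "url": {
        # A url tag without a parameter uses its content as the link target.
        "open_tag": lambda params, content: f'<a href="{html.escape(params or content)}">',
        "close_tag": lambda params, content: "</a>",
    },
}


def render(tag: str, content: str, params: str = "") -> str:
    """Wrap already-converted content in the HTML for one known tag."""
    entry = TAGS[tag.lower()]
    return entry["open_tag"](params, content) + content + entry["close_tag"](params, content)


if __name__ == "__main__":
    print(render("url", "ninjastic docs", "https://docs.ninjastic.space/"))
```

The content is inserted as-is, so it has to be escaped or converted before it reaches render.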
legendary
Activity: 1512
Merit: 7340
Farewell, Leo
December 08, 2020, 05:38:25 AM
#4
Then why don't you look at the SMF code? The forum software is open source and, if I'm not mistaken, written in PHP. Also, I think that forumotion has done what you say, but not in Python. Technically, it works. You write your BBCode, you copy it and then you paste it into an HTML form (admin panel). It pastes it correctly. You can check their forum here: help.forumotion.com (or create your own forumotion forum, just to check the admin panel).
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 08, 2020, 03:26:09 AM
#3
I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML.

This looks simple to do, just using replace.

Code:
[b] I am bold[/b]

into this:

Code:
I am bold

You could also use , it is deprecated but still works.
In most cases I think replacing [ with < will do the job.

It's not as simple as you think. I can't simply find and replace all instances of a BBCode tag like [ b] with its HTML counterpart. First, some of those tags sit inside a code block, which is not supposed to be altered. Second, this doesn't work for all tags: things like [ center] and [ color] put CSS in the opening tag that must not appear in the closing tag, so a find and replace won't work here. Third, some BBCode tags do not come in pairs and only exist as a single tag, like [ hr] and [ btc]; the first needs to be converted into <hr> and the other into the Unicode character for bitcoin. That's why I need to use a state machine. Or more accurately, someone else's state machine.
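
To make the three cases concrete, here is a minimal sketch of such a state machine in Python. It is not the ppcode implementation and covers only a handful of tags; the tables OPENERS, CLOSERS, and STANDALONE, the HTML they emit, and the function name convert are all assumptions chosen for illustration.

```python
import html
import re

# Opening HTML per paired tag; each callable receives the optional "=value" part.
OPENERS = {
    "b": lambda arg: "<b>",
    "center": lambda arg: '<div style="text-align: center;">',
    "color": lambda arg: f'<span style="color: {html.escape(arg or "inherit")};">',
}
# Closing HTML carries no CSS, which is why plain find-and-replace fails.
CLOSERS = {"b": "</b>", "center": "</div>", "color": "</span>"}
# Tags that never have a closing counterpart.
STANDALONE = {"hr": "<hr>", "btc": "\u20bf"}

TAG = re.compile(r"\[(/?)(\w+)(?:=([^\]]*))?\]")


def convert(text: str) -> str:
    """Convert a small BBCode subset to HTML, leaving code blocks untouched."""
    out, stack = [], []      # stack holds the names of currently open tags
    in_code = False          # True while inside a code block
    pos = 0
    for match in TAG.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        closing = match.group(1) == "/"
        name, arg = match.group(2).lower(), match.group(3)
        raw = html.escape(match.group(0))
        if in_code:
            if closing and name == "code":
                out.append("</pre>")
                in_code = False
            else:
                out.append(raw)          # tags inside code stay literal
        elif name == "code" and not closing:
            out.append("<pre>")
            in_code = True
        elif not closing and name in STANDALONE:
            out.append(STANDALONE[name])
        elif not closing and name in OPENERS:
            stack.append(name)
            out.append(OPENERS[name](arg))
        elif closing and stack and stack[-1] == name:
            out.append(CLOSERS[stack.pop()])
        else:
            out.append(raw)              # unknown or mismatched tag: keep as text
    out.append(html.escape(text[pos:]))
    if in_code:                          # close an unterminated code block first
        out.append("</pre>")
    while stack:                         # then close any tags left open
        out.append(CLOSERS[stack.pop()])
    return "".join(out)


if __name__ == "__main__":
    print(convert("[center][color=red]Hi[/color][/center][hr][code][b]raw[/b][/code]"))
```

Tag names are lowercased before lookup, so the sketch is case insensitive. Quotes, URLs, images, and smileys would each need their own entries and are left out here.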
legendary
Activity: 2352
Merit: 6089
bitcoindata.science
December 07, 2020, 09:01:14 PM
#2
I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML.

This looks simple to do, just using replace.

Code:
[b] I am bold[/b]

into this:

Code:
I am bold

You could also use , it is deprecated but still works.
In most cases I think replacing [ with < will do the job.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
December 07, 2020, 05:58:15 PM
#1
I am trying to make a Python script that can convert any bitcointalk post into the equivalent in HTML. This will allow you to naturally embed them in web pages for example. To do this I have to write a BBCode parser that outputs an HTML tag for each BBCode tag, and also handle all tags specific to Bitcointalk. Since this is tantamount to writing a state machine for an entire language, a daunting task, I have decided to look for some existing program to base my work on rather than write it from scratch. A fully-working program suitable for BTT does not exist as far as I know.

https://github.com/chaomodus/ppcode This is someone's bare-bones implementation of a BBCode state machine. It needs a lot of work: handling url= and img= tags, recognizing "quote", being made case insensitive, and handling all the other smileys, not just the smile face. But I think it will be worth it in the long run if I manage to build this. I forked it at https://github.com/ZenulAbidin/ppcode if you want to track its progress.
© 2020, Bitcointalksearch.org