
Topic: VOTE PLEASE > [Request]Use of Homographs to be forbidden. (Read 786 times)

legendary
Activity: 2184
Merit: 3134
₿uy / $ell
I guess if support for the homograph character set (Cyrillic characters) is disabled in this forum, where the primary language of posting is English, then it will be very easy to detect the homographs. They will convert to gibberish and everybody will understand that Cyrillic characters are used to hide copy-pasting. I guess it will reduce everybody's work.

No, the characters are converted automatically to Latin, but searching for them gets them listed; see here for example:
https://i.imgur.com/j0CPpxA.mp4

I want to add them to the rules so they can be easily reported.

Instead of trying to make new rules to stop everything which will likely not be enforced any more heavily than it already is, use this discovery to build tools to find these people faster and report them.

If I get homographs banned, then I can easily report them, and even list them automatically.
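As a rough idea of what "listing them automatically" could look like, here is a minimal Python sketch. It assumes you already have post bodies as plain text keyed by a post ID (the `posts` dict, the function names and the chosen set of scripts are all hypothetical, not a forum API). It flags every non-ASCII character whose Unicode name belongs to the Cyrillic, Greek or Armenian scripts, the usual sources of Latin look-alikes.

```python
import unicodedata

# Scripts whose characters are commonly passed off as Latin (assumption; extend as needed).
SUSPECT_SCRIPTS = {"CYRILLIC", "GREEK", "ARMENIAN"}


def suspect_letters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each character from a suspect script."""
    hits = []
    for i, ch in enumerate(text):
        if ch.isascii():
            continue  # plain ASCII can never be a look-alike
        name = unicodedata.name(ch, "")
        # Unicode names begin with the script, e.g. "CYRILLIC SMALL LETTER A".
        if name.split(" ", 1)[0] in SUSPECT_SCRIPTS:
            hits.append((i, ch, name))
    return hits


def flag_posts(posts: dict[str, str]) -> dict[str, list[tuple[int, str, str]]]:
    """Map each post ID to its suspect characters, keeping only posts that have any."""
    flagged = {}
    for post_id, body in posts.items():
        hits = suspect_letters(body)
        if hits:
            flagged[post_id] = hits
    return flagged


if __name__ == "__main__":
    sample = {"101": "Nice w\u0430llet", "102": "Nice wallet"}
    for post_id, hits in flag_posts(sample).items():
        print(post_id, hits)
```

This would only make sense on the English boards; posts in the Local boards legitimately use these scripts.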
legendary
Activity: 3318
Merit: 1958
First Exclusion Ever
Instead of trying to make new rules to stop everything which will likely not be enforced any more heavily than it already is, use this discovery to build tools to find these people faster and report them.
jr. member
Activity: 56
Merit: 7
I believe that the use of homographs definitely suggests the intention to hide plagiarism and so a form of ASCII binding could be set up to deny the use of homographs. This would prevent plagiarized character strings appearing identical to the original if a web scraper or bot was to view the strings, but the scraper should still be able to tell that 99% of the text is copied.

I'm not sure if Bitcointalk has a bot that checks for plagiarism, but I assume it does (or else the use of homographs here wouldn't make a difference), so another alternative would be to check any text that uses 'mimic characters' with more scrutiny. For example, a string without homographs could clear a plagiarism check if 70% of the text is original, but a text with homographs might only pass if 90% is original, or not pass at all, because the only logical reason for using homographs is to evade recognition.
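The stricter-threshold idea can be sketched in Python as below. The 70% and 90% figures come from the post; how "originality" is measured is left to whatever plagiarism checker is in use, and the function names and the script list are hypothetical.

```python
import unicodedata

# Scripts treated as homograph sources (assumption).
HOMOGRAPH_SCRIPTS = {"CYRILLIC", "GREEK", "ARMENIAN"}


def uses_homographs(text: str) -> bool:
    """True if the text contains any character from a look-alike script."""
    return any(
        not ch.isascii()
        and unicodedata.name(ch, "").split(" ", 1)[0] in HOMOGRAPH_SCRIPTS
        for ch in text
    )


def passes_check(originality: float, text: str,
                 normal_threshold: float = 0.70,
                 homograph_threshold: float = 0.90) -> bool:
    """Apply the stricter bar to texts containing look-alike characters.

    `originality` is the share of the text judged original (0.0 to 1.0),
    as reported by some external plagiarism checker.
    """
    threshold = homograph_threshold if uses_homographs(text) else normal_threshold
    return originality >= threshold
```

Setting `homograph_threshold` above 1.0 would model the "not pass at all" option.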

A third option would be to just report similar-looking posts to moderators, but even here, the use of homographs to conceal plagiarism is negligible.

So the use of homographs logically means that it is the poster's intention to hide plagiarism from bots, so the best course of action would probably be to implement some function in the web crawler that checks Bitcointalk and give it permission to delete and ban all posts and accounts that use homographs. The Armenian character set has characters identical to the Latin character set, e.g. o, n, u, S and Լ. There isn't a Latin board here, so removing Latin characters from this forum could work.
sr. member
Activity: 742
Merit: 395
I am alive but in hibernation.
I guess if support for the homograph character set (Cyrillic characters) is disabled in this forum, where the primary language of posting is English, then it will be very easy to detect the homographs. They will convert to gibberish and everybody will understand that Cyrillic characters are used to hide copy-pasting. I guess it will reduce everybody's work.
legendary
Activity: 2184
Merit: 3134
₿uy / $ell
Finally the homographs problem is solved and I'm locking this thread.
I can reopen it if the situation gets worse.
Thanks, everyone, for the support.



150 new homographs are posted every day; I've been monitoring this for the past 6 days.
I think it's time to reopen this thread again.

I want to add them to the rules so we can report them directly without looking for plagiarism.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
Big drawback is that LoyceV's script cannot search for homographs. I have to hunt them manually.
I can search for them, but only for Newbies and it's more work.
If you want me to search for them, can you make a list of common words and show both the homograph version and the HTML equivalent?
Example:

wallet
Code:
wаllеt

If one word has different variations, post them all. I'll continue downloading new patrol pages next week, I'm not online much at the moment. But feel free to use this time to collect them for me.
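Collecting such a list by hand is tedious, so here is a minimal Python sketch that generates candidate variants for a word. It assumes "HTML equivalent" means decimal numeric character references such as `&#1072;` (an assumption about how the patrol pages store these characters), and the Latin-to-Cyrillic table is a small hypothetical subset, not a complete list.

```python
# Small table of Latin letters and their Cyrillic look-alikes (assumption; not exhaustive).
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
}


def html_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference like &#1072;."""
    return "".join(ch if ch.isascii() else f"&#{ord(ch)};" for ch in text)


def homograph_variants(word: str) -> list[tuple[str, str]]:
    """Return (homograph, HTML) pairs: one swapped letter at a time, then all at once."""
    variants = []
    for i, ch in enumerate(word):
        if ch in LATIN_TO_CYRILLIC:
            fake = word[:i] + LATIN_TO_CYRILLIC[ch] + word[i + 1:]
            variants.append((fake, html_entities(fake)))
    # Fully substituted version, unless it duplicates a single-swap variant.
    full = "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in word)
    if full != word and full not in [v for v, _ in variants]:
        variants.append((full, html_entities(full)))
    return variants


if __name__ == "__main__":
    for fake, html in homograph_variants("wallet"):
        print(fake, html)
```

Real posts can mix any subset of swapped letters, so a search on the HTML form of each single swap is likely more useful than matching whole words.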
legendary
Activity: 2184
Merit: 3134
₿uy / $ell
Just a few from the latest ones,
Please, tell me again that these homographs are not used to hide plagiarism/copy-pasting?

Big drawback is that LoyceV's script cannot search for homographs. I have to hunt them manually.

C'mon let's add them to the rules people!!

I've written a inquiry in telegram, waiting for your return!
Riveting job, cognitive idea!  Good luck guys.

I have sent a inquiry in telegram, waiting for your answer. Really good vision, noticing approach, unblemished website!

I have sent a inquiry in telegram, waiting for your reply.
Very nice project, percipient design, excellence project!

I've sent a request in telegram, waiting for your return!
Spotless project, very nice logo, cognitive design.

I've sent a inquiry in telegram, waiting for your answer!
Riveting business, aesthetic project, impeccable plan!

legendary
Activity: 2184
Merit: 3134
₿uy / $ell
This topic requires a higher priority. While it used to be a relatively small problem, I'm afraid I've pushed massive spammers to using homograph attacks by getting many of their spambots banned.
See here for many examples of spambots who started using homograph attacks today, which they didn't do yesterday.

I do check for homographs almost every day, or in the worst case every other day. I can tell you that almost all of the "single character" homographs popping up in the search results (I only check the past one or two days of posts) are made by newbies posting for the first time, referring to the "a very" case I asked you to check a few days ago.

Let's be honest, all those are bots, it is known.
From time to time I spot a regular homograph post with many vowels replaced, but those are very rare now, and mostly posted by the "usual suspects".
I stopped reporting them as I got a bad report on one case - this one below, and I still have 49 hanging.

you say in your whitepaper and that the traditional digital advertising has a lot of issues right now which is not good I suppose... But do these issues really influence market in some bad way? I mean, I think it is alright in its current state and don't necessarily require radical changes. I still think your solution is great though, I just think I doesn’t worth it

So, I know this is getting bigger (it was big before too), but once I started reporting, things started to look better.
It seems the time spent reporting is a bit wasted, as the banned accounts are easily replaced by new ones, and everything is done with a script.

bump
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
This topic requires a higher priority. While it used to be a relatively small problem, I'm afraid I've pushed massive spammers to using homograph attacks by getting many of their spambots banned.
See here for many examples of spambots who started using homograph attacks today, which they didn't do yesterday.
legendary
Activity: 2184
Merit: 3134
₿uy / $ell

There is no problem in detecting them but reporting them. Let me explain.
Using homographs alone is not rule-breaking; the problem is that those who use this technique are trying to hide copy-pasting.
You could argue it's not English, which isn't allowed on the English boards, but that's a bit far-fetched.

Quote
I just want to add the homographs to the rules, because using them is not beneficial for the forum at all.
Doing so, you can directly report the homographs and skip the plagiarism part.
Since there is no legitimate use for it, they should just be banned. It's clearly abuse.

I gave up looking for them though, because I can't see which accounts are banned already.

I have around 50 homograph reports that have been hanging unhandled for more than a month now, and I stopped reporting them. Not enough time to waste on listing them on my rule-breakers list, so I'll push this thread a bit until we have a clear solution to the homograph problem.
Actually, I don't see as many now, just 30-40 per day at most; before, there were a few pages of them in the search results.

Bump, another 20-ish for the past day. I'll start reporting them as soon as they are forbidden.


Bump. I have already reported a few from today; let's see what the mods will do.
legendary
Activity: 3290
Merit: 16489
Thick-Skinned Gang Leader and Golden Feather 2021
There is no problem in detecting them but reporting them. Let me explain.
Using homographs alone is not rule-breaking; the problem is that those who use this technique are trying to hide copy-pasting.
You could argue it's not English, which isn't allowed on the English boards, but that's a bit far-fetched.

Quote
I just want to add the homographs to the rules, because using them is not beneficial for the forum at all.
Doing so, you can directly report the homographs and skip the plagiarism part.
Since there is no legitimate use for it, they should just be banned. It's clearly abuse.

I gave up looking for them though, because I can't see which accounts are banned already.
legendary
Activity: 2184
Merit: 3134
₿uy / $ell
I am not sure if there might ever be a legitimate need for the use of these symbols. If not, we may want to change SMF settings so that users cannot use these symbols, or that these symbols are automatically changed to the letter they are designed to look like.

We are waiting for a reaction from theymos and I know it can take a few more months; in the meantime I just want to report everyone using homographs, but I have to find another reason to report them, because using homographs is not against the rules... yet.
copper member
Activity: 2870
Merit: 2298
I am not sure if there might ever be a legitimate need for the use of these symbols. If not, we may want to change SMF settings so that users cannot use these symbols, or that these symbols are automatically changed to the letter they are designed to look like.
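The two behaviours described (refuse the symbols, or automatically swap them for the letter they imitate) can be illustrated with a small Python sketch. This is not SMF code or an SMF setting; SMF is written in PHP and whatever hook it would need is unknown here. The table, `filter_post` and `HomographError` are hypothetical, and the table covers only a partial set of Cyrillic look-alikes.

```python
# Partial table of Cyrillic letters that look like Latin ones (assumption; not exhaustive).
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p", "\u0441": "c",
    "\u0443": "y", "\u0445": "x", "\u0456": "i", "\u0455": "s", "\u0458": "j",
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041a": "K", "\u041c": "M",
    "\u041d": "H", "\u041e": "O", "\u0420": "P", "\u0421": "C", "\u0422": "T",
    "\u0425": "X",
})


class HomographError(ValueError):
    """Raised when a post is refused because it contains look-alike letters."""


def filter_post(body: str, mode: str = "convert") -> str:
    """Apply one of two policies to a message body.

    'reject'  -> refuse the post if it contains any mapped look-alike letter.
    'convert' -> replace each mapped look-alike with the Latin letter it imitates.
    """
    if mode not in ("reject", "convert"):
        raise ValueError(f"unknown mode: {mode!r}")
    converted = body.translate(CYRILLIC_TO_LATIN)
    if converted == body:
        return body  # no look-alikes found, nothing to do
    if mode == "reject":
        raise HomographError("post contains look-alike characters")
    return converted
```

Either policy would have to be limited to the English boards, since converting or rejecting Cyrillic would break genuine posts in the Local sections.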
legendary
Activity: 2184
Merit: 3134
₿uy / $ell
Are there any legitimate reasons for needing to use homographs in a forum post? I can think of none. Can anyone correct me?

If there are none, then the only reason to use them is to hide plagiarism, in which case they should be banned.

No reasons whatsoever..

OK then, a question:
What does it take to add something to the rules, when it comes to something as abusive as homographs?



Another bump. I'll do this until there is a reaction on the case.
Already 22 cases from today alone (one is actually a copy/paste report).

And a new bump, and another 24 cases for today alone...

legendary
Activity: 2268
Merit: 18509
-snip-

Are there any legitimate reasons for needing to use homographs in a forum post? I can think of none. Can anyone correct me?

If there are none, then the only reason to use them is to hide plagiarism, in which case they should be banned.
legendary
Activity: 2184
Merit: 3134
₿uy / $ell
Here's an even better tool for checking for homographs: https://www.textmagic.com/free-tools/unicode-detector

Just copy and paste the text in, and any Unicode character will be highlighted in red. If you suspect someone of using homograph plagiarism, you can go to their profile and copy in an entire page of recent posts to check them all in about 10 seconds.

This is a great tool, thanks.

There is no problem in detecting them but reporting them. Let me explain.
Using homographs alone is not rule-breaking; the problem is that those who use this technique are trying to hide copy-pasting.
But to accuse someone of copy-pasting, you first have to convert the post back to normal Latin characters and then search for the original posts, which takes time, even if you are using Word with the "replace all" option.
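Those two manual steps (convert back to Latin, then look for the original) could be scripted. A minimal Python sketch, assuming you already have candidate older posts as plain text keyed by post ID; `find_source`, the 0.9 cutoff and the partial folding table are hypothetical, and `difflib.SequenceMatcher` only stands in for a real plagiarism search (it would be far too slow to run over the whole forum).

```python
import difflib

# Partial Cyrillic-to-Latin folding table (assumption); a fuller one could be built
# from the Unicode confusables data.
FOLD = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x", "\u0456": "i",
})


def find_source(suspect: str, candidates: dict[str, str],
                cutoff: float = 0.9) -> list[tuple[str, float]]:
    """Fold look-alikes back to Latin, then return (post_id, ratio) pairs for
    candidate posts at least `cutoff` similar, best match first."""
    folded = suspect.translate(FOLD).lower()
    scores = []
    for post_id, text in candidates.items():
        ratio = difflib.SequenceMatcher(None, folded, text.lower()).ratio()
        if ratio >= cutoff:
            scores.append((post_id, round(ratio, 3)))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Folding first matters: without it, every swapped letter counts as a mismatch and a copied post can look more original than it is.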

I just want to add the homographs to the rules, because using them is not beneficial for the forum at all.
Doing so, you can directly report the homographs and skip the plagiarism part.



Bump, last 3 days history with more than 70 cases of using homographs:
https://i.imgur.com/F9np4wB.jpg
legendary
Activity: 2268
Merit: 18509
Here's an even better tool for checking for homographs: https://www.textmagic.com/free-tools/unicode-detector

Just copy and paste the text in, and any Unicode character will be highlighted in red. If you suspect someone of using homograph plagiarism, you can go to their profile and copy in an entire page of recent posts to check them all in about 10 seconds.
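The same kind of check can be done offline. This minimal Python sketch (`mark_non_ascii` is a hypothetical name) wraps every non-ASCII character together with its code point, which is similar in spirit to the red highlighting of the online detector, though it does not reproduce that tool.

```python
def mark_non_ascii(text: str) -> str:
    """Wrap every non-ASCII character as [char U+XXXX] so it stands out."""
    return "".join(ch if ch.isascii() else f"[{ch} U+{ord(ch):04X}]" for ch in text)


if __name__ == "__main__":
    print(mark_non_ascii("Nice w\u0430llet"))
    print(mark_non_ascii("doesn\u2019t"))
```

Note that the second example also gets marked: a typographic apostrophe (U+2019) is non-ASCII but harmless, so a hit is a reason to look closer, not proof of homograph use.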
legendary
Activity: 2184
Merit: 3134
₿uy / $ell
I'll keep bumping this thread until there is some reaction on the case.

What can be done >
  • Theymos adds a feature that automatically converts homographs to Latin outside the Local section
  • We all stop this madness together by listing the use of homographs in the rules

The spammers have already made an improvement. Instead of replacing all the letters with homographs, they now replace only one letter, which is more difficult to detect (or so they think).

Here is an example from the last few days. The letter marked in yellow is a Cyrillic "a".
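A single swapped letter is still easy to catch if you check every word rather than looking for whole blocks of Cyrillic. Here is a minimal Python sketch (function names are hypothetical). It first checks that the post is mostly Latin, so a genuine Russian post is left alone, and then lists every word containing a non-Latin letter, including a lone Cyrillic "а" like in the "a very" case.

```python
import re
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """First word of the character's Unicode name, e.g. 'LATIN' or 'CYRILLIC'."""
    return unicodedata.name(ch, "UNKNOWN").split(" ", 1)[0]


def odd_words(text: str) -> list[str]:
    """In a mostly-Latin post, list the words that contain any non-Latin letter.

    One swapped letter is enough to flag its word.
    """
    letters = [script_of(ch) for ch in text if ch.isalpha()]
    if not letters or Counter(letters).most_common(1)[0][0] != "LATIN":
        return []  # not a mostly-Latin post, e.g. a Russian post in a Local board
    return [
        word for word in re.findall(r"\w+", text)
        if any(ch.isalpha() and script_of(ch) != "LATIN" for ch in word)
    ]
```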




Useful link to look at when checking for these: http://unicode.org/cldr/utility/confusables.jsp

Thanks for helping, man; this is a good tool to see what types of characters to look for.
copper member
Activity: 2562
Merit: 2504
Spear the bees
Useful link to look at when checking for these: http://unicode.org/cldr/utility/confusables.jsp
legendary
Activity: 2184
Merit: 3134
₿uy / $ell
I just added a poll, so hopefully we'll see the big picture in a few days, assuming most people are active and willing to vote.