
Topic: Project Ideas: Visualizing Shilling (oh and hi again) (Read 215 times)

sr. member
Activity: 602
Merit: 295
Hail Eris!
Shilling is generally on the more positive side, but I've also seen those that are a part of a bumping farm be both somewhat negative towards the Altcoin, and positive. I guess with bumping farms all they care about is creating some sort of commotion, and therefore giving the illusion that the coin is a popular one, since it no longer bumps the threads in the section, unless of course they're unaware of that.  

The actual interest in this topic came from wanting to model and detect pump and dump schemes, which were popular and openly coordinated back in the day.  I can probably find and make a list of these coordinated pump and dumps to use as labeled data, as we know some of them were shilled.  And yeah, definitely need to bring in coin volume, sentiment volume, and maybe even some bumping metrics.

Bumping farms "should" just increase volume but, in a big data sense, preserve the sentiment of the users, which may or may not indicate positive or negative shilling.  That is, the patterns should still be there, right?  Unless they bump with bias, in which case that's just adding to the shilling pattern even more.

Quote from: PrimeNumber7
You also do not necessarily need to create exotic visualizations.

You have no idea how hard your eyes will roll when I tell you what I have up my sleeve next.  You see, I recently decided to get into game development so bought a bunch of books on Unity 2021, but now got back into and excited about bitcointalk which leads me in a different direction (an incongruity!). I also have a really weird sense of humor so came up with the most ridiculous thing to blog about on here which is to do the Bitcointalk Visualization Project (um there is a github somewhere with the project manifesto) inside a Unity based VR game.  Like it is going to be so ridiculous, prodding the crawler to get it to process a thread, which it spits out page objects I have to stack up before hobbling over to the chute you throw them in that spits out a word cloud - or the NLP pipeline being an actual pipeline of weird objects, the stopwords remover spitting out a pile of removed words I have to shovel away, and so on.  Inverted decision trees everywhere!  Oh god I get so giddy thinking about throwing that out with a straight look on my face.

Oh and it might be actually pretty awesome to be able to interface with BCT as well as analyze historical BCT data  (keeping to strictly benign page access limits!) inside a weird VR world...  but yeah not sure if anyone else will get it.
staff
Activity: 3332
Merit: 4117
My experience with reading altcoin threads is that it is unusual for "everyone" to be in agreement. I think it is likely to be more common for a group of accounts to defend a project, or for there to be a group of accounts that is positive about a project in order to drown out negativity.
Shilling is generally on the more positive side, but I've also seen those that are a part of a bumping farm be both somewhat negative towards the Altcoin, and positive. I guess with bumping farms all they care about is creating some sort of commotion, and therefore giving the illusion that the coin is a popular one, since it no longer bumps the threads in the section, unless of course they're unaware of that.

I do expect timing to be important which we should see if we visualize things right.
Absolutely, I think timing should be included in anything that's analyzing data. Even, if it appears to offer no additional insight at first, as you build a better data set, timing can be quite a good indicator when linked with other forms of data. I would include it as a standard going forward. 
sr. member
Activity: 602
Merit: 295
Hail Eris!
Of course not everyone is in agreement.  The question is whether or not natural occurrences of disagreement will 'look' different from shilling, which should take on a different distribution; that is something I hope to be able to visualize so you can differentiate.  And if not, what features it would take.

As for removing some of the dimensionality I totally agree.  The thing is I was using another means of assigning 'positive' and 'negative' sentiment scores which were independent of each other, and I wanted to differentiate between situations where we have one score being low and the other high versus both scores being high or low.  With Naive Bayes, of course, because the two scores add to one, you can preserve the information with a much simpler visualization.  Also keep in mind I thought this one up like 4 years ago when I wanted a project to visualize bitcointalk phenomena.

Another thing which might be illuminative is that I am specifically interested in visual data mining which while it does utilize various traditional ML approaches for modeling and feature generation there is some visualization component which lets the data tell a story.  I love seeing data tell stories.  Thus there are tons of other things I could do for this task but yeah do have that ulterior motive.

Thanks for the feature ideas, the more the merrier - I have some ideas (mostly semantic analysis) but am naive.  I do expect timing to be important which we should see if we visualize things right.  I will reread your posts and let your ideas sink in.

Also this toy concept is kind of a goal in terms of having fun, coding, and making my post limit - I would have to start over and build up, so back to simple things like NLP basics (who doesn't love comparing shitposter and non shitposter word clouds?) and I expect it to hit some hurdles. Also looking forward to learning.  Anyhoo I just hope my experimentation is going to be looked on positively rather than bugging people with sloppy research and rambling.  I guarantee there will be some good things here.

Speaking of shitposters I would love to see how the semantic distributions differ when someone is doing a signature campaign and not.  I don't want to be one of 'those' people.
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
My experience with reading altcoin threads is that it is unusual for "everyone" to be in agreement. I think it is likely to be more common for a group of accounts to defend a project, or for there to be a group of accounts that is positive about a project in order to drown out negativity.

As I said, I don't think that sentiment alone is going to be enough to detect shilling. I suggested some features that could serve as inputs to a model that might be able to predict whether shilling is occurring in a thread.

You also do not necessarily need to create exotic visualizations. For example, you can remove one dimension from your chart in the OP, as a binary classification model should use an activation function that forces the sum of the prediction outputs to equal 1.0, so one output fully determines the other. If you are predicting if shilling is occurring, you can probably just give a raw prediction. Or you could create a time series chart that shows the predicted probability of shilling over time.
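To illustrate the point about dropping a dimension, here is a minimal sketch in plain Python. The raw scores are made up for illustration and `softmax` and `sigmoid` are just helper names chosen here; the only claim is the general one that two softmax outputs always sum to 1.0, so the negative score carries no extra information.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1.0."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic function, the one-output equivalent of a two-class softmax."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw scores (negative, positive) for three posts.
raw_scores = [(-1.2, 0.8), (2.0, -0.5), (0.1, 0.3)]

for neg_logit, pos_logit in raw_scores:
    p_neg, p_pos = softmax([neg_logit, pos_logit])
    # The two outputs are redundant: p_neg is always 1 - p_pos,
    # and p_pos equals a sigmoid of the score difference.
    assert abs(p_neg + p_pos - 1.0) < 1e-12
    assert abs(p_pos - sigmoid(pos_logit - neg_logit)) < 1e-12
    print(f"p_pos={p_pos:.3f}  p_neg={p_neg:.3f}")
```

Plotting only `p_pos` against time would then give the simpler time series chart described above.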
sr. member
Activity: 602
Merit: 295
Hail Eris!
Thank you so much for the feedback.  And I absolutely agree.  A new cluster of opposing sentiment might not be enough (some people might just hate a new feature). I just have to start somewhere pursuing my hobby combining NLP and ML with Altcoin Analysis.... in a way I can entertain you guys.

As for situations which seem like shilling the hope is we can visualize the data in a way which shows the difference.  One hypothesis is anything a human can do we can do with data mining (or visual data mining which is what I do) and if YOU can differentiate between actual shill situations and the one you referenced which was a justified shift in sentiment distribution then we can visualize the data in a way which allows the same classification.  

Here is an example I use myself:  there just happen to be products out there with mixed reviews, that is some people like them and some don't, where shilling doesn't exist.  In this case one can hypothesize that the incongruity here will take on a different distribution than the one we see with shillers, who are artificial in nature.

One thing I am curious about is, if I can't accurately model and detect shilling with sentiment alone, whether bringing in things like semantics (keyword frequencies) and such will help.  If a person can do it by analyzing the thread there has to be a way.

I could always go back to word clouds as a means of making my signature tokens.   I am actually pretty open to what I do - I just need to do something.  It would be cool to see a plot of word clouds over time for different threads (maybe plotted against their tokens price or sentiment or something).
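A word cloud is basically a picture of keyword frequencies, so the "word clouds over time" idea could start from something like the sketch below. The posts, the dates, and the tiny stopword list are all invented for illustration, and `keyword_counts_by_window` is just a name picked here, not part of any existing crawler.

```python
from collections import Counter, defaultdict
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "it", "this", "was", "i"}

def tokenize(text):
    """Lowercase, keep runs of letters/apostrophes, drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def keyword_counts_by_window(posts, top_n=3):
    """posts: iterable of (window_label, text).

    Returns {window_label: [(word, count), ...]} with the top_n words
    per window, which is the raw material for one word cloud per window.
    """
    buckets = defaultdict(Counter)
    for window, text in posts:
        buckets[window].update(tokenize(text))
    return {w: c.most_common(top_n) for w, c in sorted(buckets.items())}

# Hypothetical thread posts keyed by day.
posts = [
    ("2021-11-01", "The wallet release is delayed"),
    ("2021-11-01", "wallet bug reports and the release"),
    ("2021-11-02", "moon moon moon this coin is going to the moon"),
    ("2021-11-02", "best coin buy now moon"),
]

for day, top_words in keyword_counts_by_window(posts).items():
    print(day, top_words)
```

The per-window counts could then be lined up against the token's price or a sentiment score for the same window.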
copper member
Activity: 1666
Merit: 1901
Amazon Prime Member #7
When taproot was "locked-in", I would expect to see sentiment regarding bitcoin to increase. Ditto for when the bitcoin-futures ETF was approved. Does this mean there were bitcoin shills posting during these times? No, I would be very surprised if there was any actual evidence to support this.

I am not sure what your actual goal is, so some of my advice may or may not be helpful to your project. You could compare sentiment over time to the price of a particular altcoin. For example, the sentiment on a particular day could be compared to the current price, the price some number of days in the future, the price some number of days in the past, or by some measure of volatility in the past, present or future.
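As a rough sketch of that comparison, the snippet below correlates a daily sentiment series with the price some number of days ahead or behind. The two series are invented numbers, `lagged_correlation` is a hypothetical helper, and plain Pearson correlation is only one possible measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for a constant series
    return cov / math.sqrt(vx * vy)

def lagged_correlation(sentiment, price, lag):
    """Correlate sentiment on day t with price on day t + lag.

    lag > 0 looks at future prices, lag < 0 at past prices.
    Both series must cover the same days in the same order.
    """
    if lag >= 0:
        s, p = sentiment[:len(sentiment) - lag], price[lag:]
    else:
        s, p = sentiment[-lag:], price[:len(price) + lag]
    return pearson(s, p)

# Made-up daily series: price follows sentiment with a one-day delay.
sentiment = [0.1, 0.5, 0.2, 0.8, 0.3, 0.6]
price = [4.0, 1.0, 5.0, 2.0, 8.0, 3.0]

for lag in (-1, 0, 1, 2):
    print(f"lag {lag:+d}: r = {lagged_correlation(sentiment, price, lag):.3f}")
```

The same loop could be run against a volatility series instead of price.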

Some people have a genuine interest in a coin/token, even if there are shills promoting the coin (this is the point of shilling a coin). So it is possible for someone to speak positively about a coin while the coin is being shilled without being a shill. However, if there is a group of accounts that is consistently positive about a coin when there is apparent shilling, there is a good chance that the accounts in that group are professional shills.

If you can create a model that you are confident can accurately predict if a post is positive or negative about a coin/token, you can use the output of that model (in aggregation) as an input for a second model that can predict if shilling is occurring on a particular date. Other inputs for the second model could be data points such as the percentage change over the previous x amount of time as of the date in question ("x" could be substituted with "x, y, and z"), the percentage change in price over the previous x amount of time as of y amount of time in the past, and/or in the future, the number of posts made in a thread on the date in question compared to the number of posts made the previous week, expressed as a percentage. You can be creative and will have to experiment some.
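The inputs listed above could be assembled into one feature row per date, roughly as in this sketch. Everything here is an assumption for illustration: the data is invented, `build_features` is a hypothetical helper, and "compared to the previous week" is read as the average daily post count over the prior seven days.

```python
def pct_change(series, day, lookback):
    """Percentage change from `lookback` days before `day` up to `day`."""
    past = series[day - lookback]
    return 100.0 * (series[day] - past) / past

def build_features(day, daily_pos_probs, prices, post_counts, lookbacks=(1, 3, 7)):
    """Build one feature row for `day` (an index into the daily series).

    daily_pos_probs[day] holds the first model's positive-sentiment
    probabilities for every post made that day.
    """
    if day < max(max(lookbacks), 7):
        raise ValueError("not enough history before this day")
    probs = daily_pos_probs[day]
    features = {"mean_positive": sum(probs) / len(probs) if probs else 0.0}
    for x in lookbacks:
        features[f"price_pct_{x}d"] = pct_change(prices, day, x)
    # Assumption: compare today's posts to the prior 7-day daily average.
    prev_week = post_counts[day - 7:day]
    weekly_avg = sum(prev_week) / len(prev_week)
    features["posts_vs_prev_week_pct"] = 100.0 * post_counts[day] / weekly_avg
    return features

# Invented eight days of data; day 7 has a price jump and a posting burst.
prices = [1.0, 1.0, 1.1, 1.0, 1.2, 1.1, 1.0, 2.0]
post_counts = [10, 10, 10, 10, 10, 10, 10, 40]
daily_pos_probs = [[0.5]] * 7 + [[0.9, 0.95, 0.85]]

print(build_features(7, daily_pos_probs, prices, post_counts))
```

Rows like this, one per date, would be what the second model consumes.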

Your second model will have to utilize some unsupervised learning method, as I don't think you can reliably label a thread being shilled at a sufficient scale.
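No particular unsupervised method is named above, so the following is only one simple possibility: flag days whose value is far from the thread's own typical level, using a median-based (robust) z-score. The post counts and the 3.5 threshold are assumptions for illustration, and a flagged day is a candidate to look at by hand, not proof of shilling.

```python
import statistics

def robust_z_scores(values):
    """Median/MAD based z-scores, less distorted by outliers than mean/std."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # no spread, nothing stands out
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [0.6745 * (v - med) / mad for v in values]

def flag_unusual_days(values, threshold=3.5):
    """Return the indices of days whose robust z-score exceeds the threshold."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > threshold]

# Invented daily post counts for one thread; day 6 is a burst.
daily_posts = [10, 12, 9, 11, 10, 13, 48, 11]
print(flag_unusual_days(daily_posts))
```

The same scoring could be applied to any single feature, such as average daily sentiment, without needing labeled threads.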
sr. member
Activity: 602
Merit: 295
Hail Eris!
Guess I am back and need to find a productive way to support our POS coins (proof of shill) with fancy signatures so I am attempting to start the project I was starting before some jerk offered me a job and distracted me - the bitcointalk visualization project whatever.  Basically using ML and visualization for things like shitposter modeling, shilling detection, strange sentiment analysis etc....

Anyhoo who knows, I would rather do this than shitpost.  And it is good for me.  So let's talk about how to visualize shilling within things like forum threads (or product review sets, which many are).

I made a stab on it based on the idea of visualizing distributions of sentiment over time so we can see when incongruities of sentiment form (which should be like two opposing clusters) and applied it to a toy situation involving a toy set of product reviews.

Step 1:  Build a classifier to provide positive and negative sentiment scores.  I used Python and Naive Bayes (though I am addicted to decision and regression tree ensembles now but whatevs this was awhile back) and some sentiment labeled reviews.

Step 2:  Simulate a situation where a shill starts posting.  Yeah so I just came up with a bunch of statements off the top of my head that were clearly positive and negative

         "wonderful wonderful wonderful",
         "real neat ",
         "best movie of the year",
         "the movie was great I loved it",
         "best acting favorite",
         "awful",                             <--- shills start
         "the movie was amazing",
         "terrible movie",
         "favorite actor",
         "worst movie bad acting",
         "most amazing movie ever",
         "sucks bad"

Step 3:  Score each statement with your classifier, which, as Naive Bayes does, gives both positive and negative sentiment scores.
Step 4:  Plot these as colocated coordinates, that is your x axis represents negative sentiment, y axis positive, and the z axis would be time.

And you end up with this!


Observation:   Basically it is simple enough to speak for itself but in this toy situation when the shillers start posting a second 'opposing' cluster of sentiment forms. 
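Steps 1 through 4 can be sketched in plain Python as below. This is not the original code: the training reviews are a tiny made-up stand-in for the sentiment labeled reviews, and the classifier is a hand-rolled multinomial Naive Bayes with add-one smoothing rather than a library model. The output is one (negative, positive, time) point per statement, ready for a 3D scatter plot.

```python
import math
from collections import Counter

# Hypothetical stand-in for the labeled reviews used in Step 1.
TRAIN = [
    ("wonderful movie loved the acting", "pos"),
    ("great film amazing favorite actor", "pos"),
    ("best movie real neat", "pos"),
    ("awful movie terrible acting", "neg"),
    ("worst film sucks bad", "neg"),
    ("bad boring terrible", "neg"),
]

# The toy statements from Step 2, in posting order.
POSTS = [
    "wonderful wonderful wonderful", "real neat ", "best movie of the year",
    "the movie was great I loved it", "best acting favorite", "awful",
    "the movie was amazing", "terrible movie", "favorite actor",
    "worst movie bad acting", "most amazing movie ever", "sucks bad",
]

def train_nb(examples):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def score(model, text):
    """Return (p_negative, p_positive); the two values sum to 1."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_post = {}
    for label in ("neg", "pos"):
        total_words = sum(word_counts[label].values())
        lp = math.log(doc_counts[label] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing over the vocabulary.
                lp += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_post[label] = lp
    peak = max(log_post.values())
    exp = {k: math.exp(v - peak) for k, v in log_post.items()}
    z = exp["neg"] + exp["pos"]
    return exp["neg"] / z, exp["pos"] / z

model = train_nb(TRAIN)
# Step 4: x = negative score, y = positive score, z = time (post index).
points = [(*score(model, post), t) for t, post in enumerate(POSTS)]
for neg, pos, t in points:
    print(f"t={t:2d}  neg={neg:.2f}  pos={pos:.2f}  {POSTS[t]!r}")
```

Feeding `points` to any 3D scatter plot should show the early statements sitting near the positive axis, with a second group near the negative axis appearing once the negative statements start.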

Problem:  Of course, applying this to real world messy data.  These forum threads are not like my toy data set and I might need to do some serious many dimensional data visualization wizardry to bring the patterns out of the noise.  Second, building a classifier using specifically altcoin discussion data, as it is going to take some cleverness to label enough of it for supervised learning (we could try unsupervised fun stuff though).  We are not a product review site and have all our own lingo.

Next steps:  This has potential to go somewhere so I would build a new model using altcoin data on twitter, set up my BCT crawler and get it running because you have to adhere to crawling limits around here, then just start visualizing.  Though probably switch to classification trees... they tell stories.