https://bellmar.medium.com/how-to-be-an-amateur-bot-hunter-8c5ff1dc7bd

Social networks are full of fake accounts; tracing their networks and observing their behavior can reveal interesting things about how the world works. But only if you can find them!

Some posts from the fake influencer I created to trap bots ;D

I have a long-running fascination with scammers. From the very first moment my father introduced me to the film The Sting, nothing has proved more entertaining than exploring an elaborately constructed con. It’s a form of storytelling in a way. The best scams say more about how their victims see themselves than about the attackers.

When fake accounts started to take over social media (and they have taken over social media) I found the perfect hobby. These networks are used in scams, but they are also commodities. Accounts are bought, sold, rented — used for one purpose, but generating passive income from other activities. You can see evidence of this in the demographics of the bots’ fake identities. Accounts change names, topics, profile pics, pretending to be completely different races, genders, sometimes even changing from adults to children.

Recently, I’ve been spending my time researching a massive network running ambassador scams on Instagram, but in writing that post I found myself unable to stop digressing into the mechanics of amateur bot hunting itself, writing long paragraphs and then cutting them out over and over again.

So this is intended as a companion piece to that article. A tutorial on how I found the Vincere network and collected the data to explore it.

Pick Your Platform

One of the hardest parts of bot hunting is attribution. Good attribution relies on information most social networks will not expose to you in order to protect users’ privacy. So while actors absolutely operate across platforms, network analysis tends to stay on a single one.

Some platforms are easier than others. Twitter has no problem listing an account’s followers and who they are following. Activity data is just an API request away. Facebook, on the other hand, will restrict this data according to the permissions your account has.

In general, anything that’s visible to you as a logged-in user is usually accessible via an official API. As long as you respect the rate limits, you can grab the data you need to start studying the network relatively easily.

One notable exception is Instagram. Instagram is a massive pain in the ass about this. There’s lots of information you can see on the mobile app that you cannot see on the browser version of Instagram. There’s a lot of information you can see on the browser version that is not available via API. And as I found out the hard way, Instagram works pretty hard to keep people from collecting this information, even though it is visible to them through ordinary web browsing. On most platforms, passively gathering information is only grounds for a suspension or temporary ban if you’re not respecting rate limits or you’re scraping the site to power a competitor. Instagram is actually pretty hostile to security researchers studying its platform.

Which is unfortunate because Instagram is one of the platforms most useful to bad actors. A report commissioned by the Senate Intelligence Committee named Instagram as the platform with the highest ROI for Russia’s Internet Research Agency’s disinformation campaigns. Millennial pink yoga feeds have been feeding Instagram’s sizable QAnon presence and recruiting for white nationalism. Private meme accounts on Instagram are the feeder funnels for all kinds of extremism.

At a minimum, what you need to build a network is nodes and edges. Nodes can be content or accounts; edges might be interactions or might be connections (following/followers). When you’re picking your platform you need to be mindful of how difficult it is to access that information. Other pieces of metadata (account age, post volume, geolocation/IP addresses, contact information, etc.) can be useful in drawing conclusions about the network itself, but your access will vary by platform.
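As a rough illustration of the nodes-and-edges idea, here is a minimal Python sketch that turns a list of follow relationships into a simple directed graph. The account names and the `build_graph` helper are made up for the example; real data would come from whatever the platform lets you collect.

```python
from collections import defaultdict

# Hypothetical edge list: each pair means "first account follows second account".
follows = [
    ("acct_a", "acct_b"),
    ("acct_a", "acct_c"),
    ("acct_b", "acct_c"),
    ("acct_d", "acct_c"),
]

def build_graph(edges):
    """Return (nodes, following, followers) built from directed follow edges."""
    nodes = set()
    following = defaultdict(set)  # account -> accounts it follows
    followers = defaultdict(set)  # account -> accounts that follow it
    for src, dst in edges:
        nodes.update((src, dst))
        following[src].add(dst)
        followers[dst].add(src)
    return nodes, following, followers

nodes, following, followers = build_graph(follows)
print(len(nodes), "accounts;", "acct_c is followed by", sorted(followers["acct_c"]))
```

Keeping both directions of the relationship makes it cheap to ask either "who does this account follow?" or "who follows this account?" later on.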

Develop a Hypothesis

Like all good detective stories, bot hunting can have unexpected twists and turns. Pretty regularly my assumptions about what behaviors I’ll see on a particular network are thrown out when much more complex stories reveal themselves in the data.

Still, data collection is often a lot of work and there’s nothing more frustrating than realizing there’s a critical piece of data you didn’t capture and having to redo the analysis to get it.

When I started with the Vincere network, my first hypothesis was that the bots attempting to scam me were connected to the bots I was voluntarily interacting with on the same account for likes and follows. After all, the honey trap account was not connected to the wider Instagram community any other way (or so I thought at the time). I did an initial data collection to test this hypothesis, which revealed no connection at all, then would from time to time revisit and retest it with more data. Within a week I had a small set of hypotheses that I would routinely consider as the network grew. This helped me figure out how to build the network over time: which nodes to prioritize, how deep to go.

Protect Your Actual Account

It’s best practice to respect the platform’s ToS and not run the risk of being suspended or banned at all, but the reality is that social network companies are not incentivized to be friendly to researchers. Trust me, they are not unaware of the problems they have with fake accounts and inauthentic activity. As soon as you start bot hunting, you will realize how easy it is to find lots of suspicious accounts with even simple tools, and you will ask yourself how it is that so much fraudulent activity happens in the clear.

Part of the reason is that while you have the luxury of carving out a small corner of the graph to explore at your leisure, moderation teams must drink from the firehose … where there are often worse actors posting much more awful things. At the same time, conventional wisdom is not misplaced cynicism here: if social networks got better control over the fake accounts on their platforms, that would mean lower engagement levels and fewer active users driving advertising prices up.

For that reason, bot hunting always carries a risk. If you care about your personal account on your platform of choice, it’s a good idea to set up a separate account to do your research from (and also maybe back up your data before posting something about your activities that goes viral).

Bringing the Tech

If your platform of choice is not hostile to researchers, all you need to do to get your data is grab an API key and write a little Python. On hostile platforms you sometimes have to get a little creative. I ended up spending a weekend writing a Chrome extension that would capture and transform responses sent to the browser into CSV data. That allowed me to just scroll through follower lists and copy/paste the output.
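The transform step can be sketched in a few lines. This is not the Chrome extension described above, just a Python illustration of the same idea: take a JSON response body and flatten it into CSV rows. The response shape and the field names (`users`, `id`, `username`, `full_name`, `is_private`) are assumptions for the example; the real payload on any given platform will look different.

```python
import csv
import io
import json

# Hypothetical captured response body; real field names will differ by platform.
captured = json.dumps({
    "users": [
        {"id": "1", "username": "brand.anna.01", "full_name": "Anna", "is_private": False},
        {"id": "2", "username": "brand_mia_22", "full_name": "Mia", "is_private": True},
    ]
})

FIELDS = ["id", "username", "full_name", "is_private"]

def response_to_csv(body, fields=FIELDS):
    """Flatten a JSON follower-list response into CSV text."""
    users = json.loads(body).get("users", [])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for user in users:
        # Missing fields become empty cells rather than raising an error.
        writer.writerow({f: user.get(f, "") for f in fields})
    return out.getvalue()

csv_text = response_to_csv(captured)
print(csv_text)
```

Writing to an in-memory buffer keeps the output easy to copy and paste, which matches the scroll-and-copy workflow; swapping `io.StringIO()` for an open file would append to a dataset on disk instead.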

Define Your “Bot” Criteria

My first bot hunt showed me how much gray area there is between a purely fake account and an account that is partially automated or demonstrates some elements of inauthentic behavior. Before you can trace networks of fake accounts across a platform you have to decide what qualifies something as a fake account.

There are two main strategies for determining an account’s status: looking at its activity or looking at its account characteristics. The bots I was interested in with the Vincere network could be identified based on two factors:

Account characteristics

  • They followed some standard naming convention consisting of brand name + first name (or random word) + number. The different parts of the name are often separated by a divider — typically a . or an _. Sometimes the order of components is different (first name before brand name)
  • Identical profile info and content to other bots in the network
  • Profile instructing visitors to DM another account for more information

Activity

  • Is following a bot in the network
  • Has posted a “DM for collab?” style message.
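Criteria like these translate fairly directly into pattern matching. Below is a minimal sketch, assuming a placeholder brand name (`examplebrand`) and lowercase Latin first names; the exact patterns are illustrative guesses, not the rules actually used on the Vincere network, and a real hunt would need to tune them against collected usernames.

```python
import re

BRAND = "examplebrand"  # hypothetical placeholder, not a real brand from the network

# brand + name + number, or name + brand + number, with optional "." or "_" dividers.
NAME_PATTERN = re.compile(
    rf"^(?:{re.escape(BRAND)}[._]?[a-z]+|[a-z]+[._]?{re.escape(BRAND)})[._]?\d+$",
    re.IGNORECASE,
)

# Loose match for "DM for collab?" style messages.
COLLAB_PATTERN = re.compile(r"\bdm\b.*\bcollab", re.IGNORECASE)

def matches_naming_convention(username):
    """True if the username fits the brand/name/number convention."""
    return NAME_PATTERN.match(username) is not None

def is_collab_pitch(text):
    """True if the text looks like a 'DM for collab' style message."""
    return COLLAB_PATTERN.search(text) is not None

for name in ["examplebrand.anna.42", "anna_examplebrand7", "anna.smith"]:
    print(name, matches_naming_convention(name))
```

A match on either pattern is only a signal, not proof; plenty of legitimate accounts use names with numbers in them, so these checks work best combined with the other criteria.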

Account characteristics tend to be easier to harvest than activity, but activity is sometimes the only way to catch well-designed networks. It’s common, for example, to base the profile data of sophisticated fake accounts on stolen data from legitimate accounts. The only way you catch these accounts is by having some kind of theory about what suspicious activity looks like for them. On platforms like Twitter where volume is key, I like the Atlantic Council’s Digital Forensic Research Lab average posts-per-day metric. On a platform like Instagram the easiest activity to look out for (although not the easiest to automate) is the same images reposted across accounts, or even slightly different, closely cropped versions of the same image posted several times on the same account.
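An average posts-per-day figure is simple to compute once you have an account’s total post count and creation date. The sketch below is only an illustration: the `threshold` value is an arbitrary placeholder chosen for the example, not a published cutoff, and the function names are made up.

```python
from datetime import date

def posts_per_day(total_posts, created, as_of):
    """Average posts per day over the lifetime of the account."""
    days = max((as_of - created).days, 1)  # avoid dividing by zero for new accounts
    return total_posts / days

def is_high_volume(rate, threshold=50):
    """Flag accounts above an illustrative, hypothetical posts-per-day threshold."""
    return rate > threshold

rate = posts_per_day(36500, date(2020, 1, 1), date(2020, 12, 31))
print(rate, is_high_volume(rate))
```

Because this is a lifetime average, an account that sat dormant for years and then started posting constantly can slip under the threshold, so it is worth recomputing the rate over a recent window when the data allows.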

Mind the Noise

Networks love coincidences. The more layers of connections you build out, the more likely you are to find coincidences. We aren’t all actually six degrees of separation from each other, but the general principle that the average social distance in a network grows logarithmically with the size of the population holds true. You’ll find strange connections that fundamentally mean nothing just in the superficial one-degree-of-separation pass. On the Vincere network I found an NBA player, a teenage boy’s e-gaming group, and a social media marketing company in Dubai. The deeper you go to friends of friends, the more noise you get.
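One way to see how quickly the noise piles up is to count how many accounts fall within a given number of hops of a starting account. The sketch below runs a depth-limited breadth-first search over a tiny made-up graph; the account names and the `accounts_within` helper are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: account -> accounts it is connected to.
graph = {
    "seed": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d", "e"],
    "c": ["f"],
}

def accounts_within(graph, seed, max_depth):
    """Breadth-first search returning {account: depth} up to max_depth hops."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[current] + 1
                queue.append(neighbor)
    return seen

for depth in (1, 2, 3):
    print(depth, len(accounts_within(graph, "seed", depth)))
```

On real follower data each extra hop typically multiplies the number of accounts in play, which is why capping the depth early keeps the coincidences manageable.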

To bot hunt effectively, it’s important not to jump to conclusions, especially if doing so means calling someone out or exposing personal information. I like to blog about my observations on bot networks I’m tracing, but I don’t reveal the identities of nodes unless I’m 100% sure they are in fact fake accounts. Especially, ESPECIALLY, if the account appears to belong to a minor. It takes a lot to be 100% sure, and people get really annoyed at me that they can’t “check my work” by looking at the data for themselves. I get this, but again … networks love coincidences.

Finding Bots and Tracing the Network

One of the easiest ways to start tracing a network is by throwing a couple of bucks at the bot dealer and seeing what shows up. In order to be effective, fake social network accounts need followers and activity on their content. You can be reasonably sure that accounts following a fake account are suspicious as well. From a relatively small investment you can find thousands of fake accounts to explore.

However, whether there’s anything interesting in these networks depends on your perspective. You are unlikely to find foreign spies spreading disinformation on these networks. But for me these networks are ideal. I’m interested in secondary markets. I do not believe that selling likes and follows produces enough profit to make sense and want to find evidence that these bots have additional streams of income.

Another interesting thing you can do with a network built from purchased bots is identify the network’s other customers. You will always find legit accounts mixed in for various reasons (a legit account may be automating itself to follow back any new followers, for example), but when looking at who a fake account is following (versus who’s following them) it’s common to see lots of legitimate accounts who might have bought followers or might have been targeted by the scam.
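A simple way to surface those candidate customers is to count how many known bots follow each outside account. The sketch below assumes you already have a mapping from each known bot to the accounts it follows; the data, the `likely_customers` helper, and the `min_bots` cutoff are all hypothetical.

```python
from collections import Counter

# Hypothetical data: known bot -> accounts that bot is following.
bot_following = {
    "bot_1": ["shop_x", "celeb_y", "bot_2"],
    "bot_2": ["shop_x", "bot_1"],
    "bot_3": ["shop_x", "celeb_y", "blog_z"],
}

def likely_customers(bot_following, min_bots=2):
    """Rank non-bot accounts by how many known bots follow them."""
    known_bots = set(bot_following)
    counts = Counter()
    for followed in bot_following.values():
        for account in set(followed):  # count each bot at most once per account
            if account not in known_bots:
                counts[account] += 1
    return [(acct, n) for acct, n in counts.most_common() if n >= min_bots]

print(likely_customers(bot_following))
```

An account that many bots follow is only a lead: it may have bought followers, or it may simply be a target of the scam, so the ranking tells you where to look rather than what to conclude.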

Another option for finding networks to trace is to scrape the interactions on accounts you know will be targeted by bots. Activists, journalists (particularly political analysts), and politicians are all common targets for malicious actors. This approach is nice because you’re not actively supporting the bot economy while conducting your research, but it’s noisier and harder to draw conclusions from. I learned this the hard way when I found a network of automated accounts around the failed Presidential campaign of Andrew Yang. Attribution was going to be impossible: was this some nation state meddling? Or was it legit Yang supporters using gray tactics? My inability to definitively say where the fake accounts were coming from meant I had a bunch of angry Yang Gang supporters accusing me of crafting a hit job.

The last option is to try to draw the bots out via content. I found the Vincere network in this fashion, although that was a complete accident. My original plan was to build a base network from purchased fake followers and likes, but in order to get the fake followers I needed content to point them to, so I set up a honey trap on Instagram and started posting.

And because I’m always just a little bit extra, rather than posting bland quotes, memes, or stock images (all of which would have served my purposes just fine) I bought some 3D modeling software and a $600 refurbished gaming computer and created a virtual influencer.

A side effect of this decision was that, while I doubt my 3D fake influencer art would fool anyone who bothered to take a closer look, it absolutely fooled the bots that hunt for gullible wannabe influencers and the human operators that command them.

But in general the internet is vast and finding bots via content is super inefficient. All the bot networks I’ve hunted via content have been pure serendipity. I found a stray bot on some legitimately posted content and decided to pull the thread.

Happy Hunting!

Nothing will teach you more about what gives social networks their power than hunting the fake accounts attempting to exploit that power. Bot networks also shine a light on how international and globalized the spaces we tend to think of as “American” really are. The world is big and full of different agendas and incentives. Some bot networks are built by large and powerful nation states, some are built by entrepreneurs looking to carve a living out from impossible situations. There are lots of interesting nooks and crannies to explore.

By admin
