The Detection Of Bot Users In Twitter


Abstract— In recent years, the popularity of social networking sites has led to a steep rise in their user numbers. Twitter, a microblogging service less than seven years old, had more than 500 million users as of July 2012 [1] and is still growing fast. On Twitter, users communicate with each other by publishing text-based posts of at most 140 characters [2]. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which pose a serious threat to Internet users. In this paper we differentiate trustworthy human users from automated programs (bots) to help users know with whom they are interacting. We developed a few sample bot programs and compared their characteristics with those of normal human users. We observed differences between humans and bots in terms of tweeting behavior, tweet content, account properties, and network features. Based on the measurement results, we propose a classification system that effectively classifies the user type.

Index Terms—Bot, Security, Network, Twitter, classification

Introduction

The term bot, derived from the word "robot", is used in its generic form to describe a script, set of instructions, or program designed to perform a predefined function repeatedly and automatically. Although bots originated as a useful tool for carrying out repetitive and time-consuming operations, they are now being exploited for malicious purposes [3]. Nowadays various attacks, including spamming, phishing, click fraud, distributed DoS attacks, and keylogging, are carried out using bots.

Twitter is one of the highest-ranking and most prominent social networking websites, with more than 500 million users as of July 2012. The distinctive features of Twitter are its openness and simplicity. Users interact with each other through text-based posts called tweets, whose maximum size is fixed at 140 characters. The Twitter follow relationship is unidirectional, which distinguishes it from some other commonly used social networking sites such as Facebook and Orkut [4]. In Twitter terms, if A follows B, then A is a follower of B and B is a friend of A.

Users can group posts together by topic or type through hashtags, i.e. words or phrases prefixed with a "#" sign. Similarly, the "@" sign followed by a username is used for referring or replying to the tweets of other users. To share or repost tweets made by others, the retweet function is used; it is symbolized by "RT" in the tweet.

Twitter has become important as a discovery engine for finding out what is happening right now. Its open structure has paved the way for malicious programmers to write bot programs that attract people in order to steal their information or infect their computers.

Background and related works

Botnet

A bot is an automated program that runs on the Internet. A prominent characteristic of bots is that they connect to a central server after successfully compromising host systems, thus forming a network, hence the name botnet. The first bot, named Eggdrop, was created by Jeff Fisher in 1993 and was used on IRC to verify the status of the Internet connection. From 1999 onwards, bots were programmed to carry out malicious activity. Creating a bot does not require expert knowledge of programming or technical aspects, as many bot frameworks are sold online with which one can develop a bot for a specified function. Protecting a system from botnets requires better awareness of computer security issues; an up-to-date operating system, patches, firewalls, and antivirus software minimize the risk of such automation.

Social Networking Site

A social networking service is an online site or platform that helps build relationships among people. It consists of the profile of each individual and their social links, which together develop those relationships. These services are web-based and help users stay in contact with people over the Internet. Some online community services are group-centered whereas social networking services are individual-centered, but both are usually considered the same. These sites allow users to share their own ideas, events, and interests within their individual networks. Because of their popularity, such sites have become an ideal target for exploitation.

Twitter and its Security issues

Some basic terms and functions needed to understand the working of Twitter are summarized below.

Tweets: Messages posted by users. Each message is limited to 140 characters, so users who want to post a URL usually shorten it.

Followers: If a user A follows a user B, then A is a follower of B.

Friends (followings): The users whom a given user follows; if A follows B, then B is a friend of A.

A user receives the tweets of all users who are friends of that user.

Mention: The term mention is used when a user refers to some other user in his or her tweet, even if that user is not a follower.

Retweet: A user can share and spread another user's tweet using the retweet function.

Most social networking sites rely largely on open-source software. The web interface of Twitter is written in Ruby on the Rails framework and deployed on a performance-enhanced Ruby Enterprise Edition. Every tweet posted by a user is assigned a unique ID, called the tweet ID, by software called Snowflake. The tweets posted by users are stored in MySQL using Gizzard, and an acknowledgement is sent to the user when the operation succeeds. The Firehose API is used to exchange data with the search engine; FlockDB manages this process, which takes an average time of 350 ms. Geolocation data is stored using software called Rockdove.

Twitter messages can be either public or private. Twitter collects certain information about its users and shares it with third parties, and it reserves the right to sell this information as an asset if the company changes hands. Twitter does not display conventional advertisements on its site; instead, advertisers can promote their products or services through tweets targeted at a user based on the history of that user's tweets.

Bots in Twitter.com

The growing user population and open nature of Twitter have made it an ideal target for exploitation by automated programs, known as bots. As in other web applications, bots have become common on Twitter. Twitter does not strictly inspect for automation; it checks for a human primarily at registration time through the recognition of a CAPTCHA image. After obtaining login credentials, a bot can perform most human tasks by calling the Twitter APIs. Automation is a double-edged sword for Twitter. On one hand, legitimate bots generate a large volume of benign tweets, such as news and blog updates, which fits Twitter's goal of becoming a news and information network. On the other hand, malicious bots exploit normal users by spreading spam and malicious content through tweets or direct messages. Typically a bot follows many users indiscriminately, expecting that a few may follow it back.

Related Works

Twitter is a notable social networking site that has been in use since March 2006, and there is a body of related literature on its microblogging service [6][7][8]. To learn more about Twitter usage, Java et al. [9] studied over 70,000 tweets and categorized them into four main groups, namely daily chatter, conversations, sharing of information or URLs, and news reporting. Spam is a good indicator of automation, and most bot users are also spammers. Alex Hai Wang investigated around 25,000 users and tried to differentiate normal users from spam users [10]. He used a directed social graph model to explore the "follower" and "friend" relationships, and a Bayesian classifier to distinguish suspicious users from normal users.

Nikil Kant Gupta et al. [11] proposed a technique to select followers based on the strength of their social ties, so that the risk of sending sensitive information to unknown people is decreased. To achieve this, two tools are used: the Exclusivity Meter and the Twitter Response Estimator.

J. Song et al. [12] worked on filtering spam messages on Twitter. Their work argued that conventional spam filtering methods, which detect spam based on account features such as content similarity and the ratio of URLs, are ineffective, and proposed a spam filtering system that detects spam based on relational features such as the distance and connectivity between sender and receiver. Their analysis found that most spam comes from accounts that have little relation to the receiver.

X. Zhang et al. [13] proposed a framework to detect spam and promotional campaigns on Twitter. The framework consists of three steps: first, connect users who post similar URLs; next, extract candidate campaigns; finally, classify the users based on the URLs in their tweets.

Sangho Lee et al. [14] studied the URLs posted in tweets and tried to detect suspicious ones. Their paper stated that conventional spam detection schemes based on account and relational features are ineffective against feature fabrication and consume much time and many resources. They proposed WarningBird, a suspicious URL detection system for Twitter that investigates correlations among URL redirect chains extracted from several tweets. Because attackers have limited resources, their URL redirect chains frequently share the same URLs. Conventional suspicious URL detection systems are ineffective against conditional redirection servers that distinguish investigators from normal browsers; WarningBird is robust against conditional redirection because it focuses on the correlations among multiple redirect chains that share the same redirection servers.

Rodrigo et al. [15] describe a way of developing a chat bot on the Twitter social network that evades the spam detection techniques used by Twitter. To implement a chat bot successfully, many factors must be considered: its operation must be monitored continuously in the early stages and, if necessary, appropriate changes must be made. Furthermore, its database must be updated persistently with new search terms, keywords, and answers that are more consistent with the people interacting with the bot.

Gianvecchio et al. [16] studied a large-scale collection of user accounts and classified them into three types of users, namely human, cyborg, and bot. Details such as tweets, tweeting source, time of tweet, number of followers, number of friends, date of account creation, and tweeting device are considered in classifying the users, and the authors observed clear differences between human users and bots.

The rest of the paper is organized as follows. Section III presents our data collection approach. Section IV describes the data analysis and the bot detection algorithm based on the account and network features of a user. Section V presents the experimental evaluation of our detection model on a set of sample Twitter accounts. Finally, some concluding remarks and future work are given in Section VI.

Data collection

Twitter has become one of the most important and widely used social networking sites, handling millions of user accounts simultaneously. In this paper we use the Twitter API [16] to obtain public data. The data collection process has two parts. In the first part, the public account details of users are collected using a web crawler program. In the second part, a sample bot program is developed; the program is executed and the flow of network packets is examined and stored. Fig 1 shows the interaction of the web crawler with the Twitter database and the user [18].

Fig 1: Interaction diagram (user program, web crawler, Twitter API, Twitter DB)

Using the web crawler, details of around 5,000 accounts were obtained, giving us 100,000 tweets. The details derived for each user are: user ID, username, account creation date and time, language, country, time zone, protected/verified status, number of followers, total number of tweets posted, and the latest tweets of the user. For each tweet we record the tweet ID, tweet time, language, date, source, retweet count, and URL, if any. The algorithm used to obtain the user account details is as follows:

CurrentID = starting ID
EndID = finishing ID
While CurrentID < EndID
    Username = getUsername(CurrentID)
    If user exists
        Get account details
        Get latest 100 tweets and their details
        Store the details in DB
    End (If)
    CurrentID = CurrentID + 1
End (While)

Fig 2: Algorithm to retrieve account features
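To make the crawling step concrete, the following is a minimal sketch of the Fig 2 algorithm written against the Twitter4J interface that is also used for the bot below. The paper does not name a client library for the crawler, so the library choice, the placeholder ID range, and the storeAccount/storeTweet helpers are illustrative assumptions, not the authors' implementation.

import twitter4j.Paging;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.User;

public class AccountCrawler {

    public static void main(String[] args) {
        Twitter twitter = TwitterFactory.getSingleton(); // credentials from twitter4j.properties
        long currentId = 10_000_000L;                    // starting ID (placeholder)
        long endId     = 10_005_000L;                    // finishing ID (placeholder)

        for (; currentId < endId; currentId++) {
            try {
                User user = twitter.showUser(currentId); // throws if the ID is unused
                storeAccount(user);
                // Latest 100 tweets of the user, as in Fig 2.
                for (Status tweet : twitter.getUserTimeline(currentId, new Paging(1, 100))) {
                    storeTweet(tweet);
                }
            } catch (TwitterException e) {
                // Unused ID, suspended/protected account, or rate limit: skip and continue.
            }
        }
    }

    private static void storeAccount(User u) {
        // Placeholder for the database write: user id, screen name, creation date,
        // follower count, and total tweet count (other fields omitted for brevity).
        System.out.printf("%d,%s,%s,%d,%d%n", u.getId(), u.getScreenName(),
                u.getCreatedAt(), u.getFollowersCount(), u.getStatusesCount());
    }

    private static void storeTweet(Status s) {
        // Placeholder for the database write: tweet id, time, source, retweet count.
        System.out.printf("%d,%s,%s,%d%n", s.getId(), s.getCreatedAt(),
                s.getSource(), s.getRetweetCount());
    }
}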

In the second part, we created a sample bot using the Twitter4J interface [19]. When we provide a key phrase to the bot, it searches for the latest tweets that contain the given keyword. After the tweets are obtained, they are analyzed. First, if a tweet has not already been retweeted by the bot, the bot retweets it; otherwise no action is performed. Second, the relationship between the bot and the author of the tweet returned by the query is examined. If there is no relationship between them, the bot follows that user, so that the user becomes a friend of the bot, and a predefined message is sent to the user.

The sample bot is run for 48 hours using a timer thread, and the incoming and outgoing packets are stored and examined. Similarly, a normal user is asked to use Twitter over the same time interval, and that user's packets are also stored and examined. The capture library used is WinPcap and the packet storage and inspection tool is Wireshark. The algorithm used by the bot is as follows:

Query="India"

Search for Query

Get latest Tweets which contains Query keyword

For all Tweets

Get the Tweet details

Update DB

If not retweeted

retweet

End (If)

If Tweets.getUser is not a friend

Create a friendship

Send a direct message

End (If)

End (For)

Fig 3: Simple Twitter Bot Algorithm
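A minimal Twitter4J sketch of the Fig 3 loop is shown below. Credentials are assumed to come from a twitter4j.properties file, and the search phrase, polling period, and direct-message text are illustrative choices rather than the exact values used in the experiment.

import java.util.HashSet;
import java.util.Set;
import twitter4j.Query;
import twitter4j.Relationship;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;

public class SimpleTwitterBot {

    public static void main(String[] args) throws TwitterException, InterruptedException {
        Twitter twitter = TwitterFactory.getSingleton();  // credentials from twitter4j.properties
        long botId = twitter.verifyCredentials().getId();
        Set<Long> alreadyRetweeted = new HashSet<>();

        while (true) {                                    // run until stopped (48 h in the experiment)
            for (Status tweet : twitter.search(new Query("India")).getTweets()) {
                try {
                    long author = tweet.getUser().getId();

                    // Retweet each matching tweet only once.
                    if (alreadyRetweeted.add(tweet.getId())) {
                        twitter.retweetStatus(tweet.getId());
                    }

                    // If the author is not yet followed, follow and send a canned message.
                    Relationship rel = twitter.showFriendship(botId, author);
                    if (author != botId && !rel.isSourceFollowingTarget()) {
                        twitter.createFriendship(author);
                        twitter.sendDirectMessage(author, "Thanks for tweeting about India!");
                    }
                } catch (TwitterException e) {
                    // Skip failures (rate limits, deleted tweets, direct-message restrictions).
                }
            }
            Thread.sleep(60_000);                         // fixed timer drives the periodic traffic pattern
        }
    }
}

The fixed sleep interval is what produces the regular, timer-driven traffic pattern analyzed in the next section.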

Data analysis

As noted earlier, the Twitter API supports detailed queries of user information, including profile, followers, friends, lists, and posted tweets. The data analysis involves two processes. In the first process, the network packet flow is examined.

A graph is drawn with time on the x-axis and packet size on the y-axis.

Fig 4: Net Flow graph obtained by running our simple bot program

Fig 5: Net Flow graph obtained from a human user using twitter with web browser

Figure 4 shows the total incoming and outgoing network packet flow while the sample bot program is running, and Figure 5 shows the packet flow between Twitter and a human user. There is a clear difference between the two graphs. The packet size per tick of the bot user does not exceed 25,000 bytes on average, whereas a user interacting with Twitter through a web browser, as most normal human users do, exceeds 100,000 bytes per tick. Another characteristic that distinguishes human users from bots is the shape of the graph. Most bots are driven by a timer (a periodic thread), which results in a regular, repeating pattern in the network flow graph.

To examine the pattern in the network flow graph, the time interval between consecutive peaks is first calculated using the R-R time interval technique [20], a popular method for finding the time interval between peaks in a signal that is widely used in electrocardiogram (ECG) analysis. The time intervals are stored in the database, and a clustering technique is then used to detect the presence of a pattern in the flow graph. Based on this result we obtain a trust value, denoted PN(bot).
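The sketch below illustrates this step under stated assumptions: it detects peaks in the per-tick packet-size series, computes the intervals between consecutive peaks (the R-R intervals), and uses the spread of those intervals as a simple regularity score, since timer-driven bot traffic gives nearly constant intervals. The peak-height threshold and the mapping from regularity to PN(bot) are illustrative choices, not values taken from the paper.

import java.util.ArrayList;
import java.util.List;

public class NetFlowAnalysis {

    // Indices of local maxima above a minimum height in the per-tick packet-size series.
    static List<Integer> peaks(double[] packetSizePerTick, double minHeight) {
        List<Integer> peaks = new ArrayList<>();
        for (int i = 1; i < packetSizePerTick.length - 1; i++) {
            double v = packetSizePerTick[i];
            if (v >= minHeight && v > packetSizePerTick[i - 1] && v > packetSizePerTick[i + 1]) {
                peaks.add(i);
            }
        }
        return peaks;
    }

    // Coefficient of variation of the peak-to-peak (R-R) intervals.
    static double intervalVariation(List<Integer> peaks) {
        if (peaks.size() < 3) return Double.MAX_VALUE; // too few peaks to judge regularity
        List<Double> intervals = new ArrayList<>();
        for (int i = 1; i < peaks.size(); i++) {
            intervals.add((double) (peaks.get(i) - peaks.get(i - 1)));
        }
        double mean = intervals.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double var = intervals.stream().mapToDouble(d -> (d - mean) * (d - mean)).average().orElse(0);
        return Math.sqrt(var) / mean;
    }

    // PN(bot): near-periodic traffic (low variation) scores close to 1.
    static double pnBot(double[] packetSizePerTick) {
        double cv = intervalVariation(peaks(packetSizePerTick, 1000.0)); // assumed threshold
        return 1.0 / (1.0 + cv);                                         // assumed mapping to a trust value
    }
}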

The second process is to mine the account details to determine how far the user can be trusted. Based on a number of experimental results [21][22][23], we understand the characteristics of different user groups and types. An overview of the experimental results of the above papers is given below:

Most human users use a web browser or a mobile application to interact with Twitter, whereas bots use the API.

Most tweets from trusted humans do not contain malicious URLs, whereas bot tweets often do.

The average number of tweets by a human user per day ranges from 0 to 20.

The friend-to-follower ratio of human users ranges up to about 1.9.

Human users tweet mostly between 11 am and 3 pm.

Human users interact with Twitter mostly on weekdays.

Most bot accounts were created in 2009 or later.

Most bot accounts are neither verified nor protected.

Bots tweet the same phrase many times to achieve their goal.

Human users often neglect punctuation in their tweets, whereas bots do not.

All of these features are considered when examining an account for bot detection. A Bayesian classification technique is used to classify an account as a human user or a bot user based on the above features. The result of this process is also a trust value, denoted PA(bot); a sketch of such a classifier is given below.
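As an illustration of how such a classifier could produce PA(bot), the following is a minimal Bernoulli naive Bayes sketch over boolean account features. The feature names, likelihood values, and prior are placeholders, not the trained parameters of the proposed system.

import java.util.LinkedHashMap;
import java.util.Map;

public class AccountClassifier {

    // P(feature = true | class), estimated offline from labelled accounts.
    // The numbers below are placeholders, not measured values.
    private static final Map<String, double[]> LIKELIHOODS = new LinkedHashMap<>();
    static {                                  //             {P(f|bot), P(f|human)}
        LIKELIHOODS.put("postsViaApi",          new double[]{0.90, 0.15});
        LIKELIHOODS.put("tweetsContainUrl",     new double[]{0.80, 0.30});
        LIKELIHOODS.put("repeatsSamePhrase",    new double[]{0.70, 0.05});
        LIKELIHOODS.put("verifiedOrProtected",  new double[]{0.02, 0.20});
        LIKELIHOODS.put("tweetsMostlyWeekdays", new double[]{0.40, 0.75});
    }
    private static final double PRIOR_BOT = 0.5; // assumed uniform prior

    // Returns PA(bot) = P(bot | observed features) under the naive independence assumption.
    public static double paBot(Map<String, Boolean> features) {
        double logBot = Math.log(PRIOR_BOT);
        double logHuman = Math.log(1.0 - PRIOR_BOT);
        for (Map.Entry<String, double[]> e : LIKELIHOODS.entrySet()) {
            boolean present = features.getOrDefault(e.getKey(), false);
            double pBot = present ? e.getValue()[0] : 1.0 - e.getValue()[0];
            double pHuman = present ? e.getValue()[1] : 1.0 - e.getValue()[1];
            logBot += Math.log(pBot);
            logHuman += Math.log(pHuman);
        }
        // Normalise in log space to avoid underflow when many features are used.
        double max = Math.max(logBot, logHuman);
        double bot = Math.exp(logBot - max);
        double human = Math.exp(logHuman - max);
        return bot / (bot + human);
    }
}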

Based on these two trust values [24] the final result is obtained.

P(bot) = 0.6 × PN(bot) + 0.4 × PA(bot)

P(bot) is the final probability that an account is a bot.
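A small sketch of this final scoring step is given below, combining the two trust values with the weights 0.6 and 0.4 from the formula above. The thresholds used to map the score to the three labels reported in the evaluation (bot, may be bot, human) are assumptions for illustration.

public class BotScore {

    // Weighted combination of the network-flow and account-feature trust values.
    public static double pBot(double pnBot, double paBot) {
        return 0.6 * pnBot + 0.4 * paBot;
    }

    // Maps the combined probability to the three labels used in the evaluation.
    public static String label(double pBot) {
        if (pBot >= 0.7) return "bot";        // assumed threshold
        if (pBot >= 0.4) return "may be bot"; // assumed threshold
        return "human";
    }

    public static void main(String[] args) {
        double score = pBot(0.85, 0.60);      // example trust values
        System.out.println(score + " -> " + label(score));
    }
}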

Experimental evaluation

In this section, we evaluate the accuracy of our classification system. We crawled the data of 1,000 accounts and used them as test data. The proposed system is run to classify these accounts based on the attributes described above. The experiment labels each account as bot, human, or possibly bot, and the results achieved the expected level of accuracy.

Conclusion

In this paper, we have studied the problem of automation by bots on Twitter. Twitter has become a major platform for information sharing with a large user base, and its exposure has made it a tempting target for exploitation by automated programs. The threat of bots on Twitter grows day by day with the spread of automation. To understand the role of automation on Twitter, we have measured and characterized the behavior of humans and bots. By crawling Twitter, we collected 10,000 Twitter users with 500,000 tweets. Based on this data, we identified features that can differentiate human users from bot users. Using a clustering technique and a Bayesian classification technique, we found that humans exhibit complex behavior whereas bot users show only timing-driven behavior. In examining the network flow, we observed a complex pattern for human users and a regular, repetitive pattern for bot users; compared with humans, bots also exchange considerably smaller packets. Examining the account features, parameters such as URLs, the friend-follower ratio, time of tweets, tweet content, account properties, and date of account creation yield a trust value, and the trust values from the account features and the network layer are combined to classify the user type. The effectiveness of the classification system was evaluated on a test dataset containing around a thousand Twitter accounts.


