Introduction
Welcome to the latest episode of our Twitter API series. In our last episode, I built Twixxr.com, which lets you discover influential women on Twitter for your account to follow. Today, I'm going to turn the focus inward to look at my own followers.
While I haven't really used Facebook since 2013, I've remained active on Twitter—even as they pumped my feed with ads and annoyed me by trying to algorithmically optimize it.
Recently, I was verified and started to gather followers at a slightly faster rate. I was hopeful that I might see more response to my tweets. Generally, I've been surprised at how little response there usually is on Twitter for the average person.
I have nearly 1,900 followers, but rarely do people comment or retweet pieces that I think are important and of general interest. For example, not a single person shared my piece on the sharp spike in rape reports in Seattle or commentary on Bill Gates at his most outrageously hypocritical.
For a long time I've wanted to look more closely at my Twitter followers and answer some questions: Who exactly is following me? And why aren't they more interactive? Is it possible that only 10% of my followers are real people?
Twitter's been having trouble finding a buyer, and maybe this has something to do with it.
The Twitter API is a good tool to investigate this. However, its many rate limits make even something simple, like analyzing your followers, quite complex. In today's episode, I'll show you how I worked with the rate limits to assess my followers and build a scoreboard of them.
If you have any questions or feedback, please post them below in the comments or reach out to me on Twitter @reifman.
Analyze Our Twitter Followers
Just above, you can see the basic scoreboard I've created. Today's episode will focus mostly on the infrastructure and approach I took to create this. I hope I get a chance to write more about improving the scoring mechanism.
And yes, as you can see above, renowned gay rights leader and sex advice columnist Dan Savage follows me but never retweets anything I share. If there's time today, we'll analyze this to answer important questions like: is he real, a bot, or just following me for personal sex advice? What can we learn from his account to determine whether he's likely ever to interact with me on Twitter or, for that matter, any of my other followers?
The scoreboard code is mostly a prototype which I've built on top of the Twixxr code from the last episode, but it's not a live demo for people to use. I'm sharing so you can learn from it and build on it yourself.
Here are the basic elements of the code:
- Creating the database to store my followers and related data.
- Downloading my followers in pages of 20 followers each.
- Tracking the cursors for the pages as I download 15 pages per rate limited window.
- Storing data collected about my followers in the database.
- Building a prototype scoring algorithm to score all of the followers.
- Building a view to browse the scoreboard.
Diving Into the Code
Creating the Database Table Migrations
I created three different tables to store all the data and help me work with the Twitter API rate limiting. If you're not familiar with Yii database migrations, please see How to Program With Yii2: Working With the Database and Active Record.
First, I extended the SocialProfile table to record a lot more data from my followers' accounts, such as whether they are verified, their location, and how many items they've favorited:
<?php

use yii\db\Schema;
use yii\db\Migration;

class m161026_221130_extend_social_profile_table extends Migration
{
    public function up()
    {
        $tableOptions = null;
        if ($this->db->driverName === 'mysql') {
            $tableOptions = 'CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE=InnoDB';
        }
        // Identity and descriptive fields from the Twitter user object
        $this->addColumn('{{%social_profile}}', 'social_id', Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}', 'name', Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}', 'screen_name', Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}', 'description', Schema::TYPE_TEXT.' NOT NULL');
        $this->addColumn('{{%social_profile}}', 'url', Schema::TYPE_STRING.' NOT NULL');
        // Flags and counters used later for scoring
        $this->addColumn('{{%social_profile}}', 'protected', Schema::TYPE_SMALLINT.' NOT NULL DEFAULT 0');
        $this->addColumn('{{%social_profile}}', 'favourites_count', Schema::TYPE_BIGINT.' NOT NULL DEFAULT 0');
        $this->addColumn('{{%social_profile}}', 'verified', Schema::TYPE_SMALLINT.' NOT NULL DEFAULT 0');
        $this->addColumn('{{%social_profile}}', 'location', Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}', 'profile_location', Schema::TYPE_STRING.' NOT NULL');
        $this->addColumn('{{%social_profile}}', 'score', Schema::TYPE_BIGINT.' NOT NULL DEFAULT 0');
    }
}
Then, I built an indexing table called SocialFriend to track followers for specific accounts. If I decide to formalize this service publicly, I'll need this. It links the User table with the user's followers in the SocialProfile table.
<?php

use yii\db\Schema;
use yii\db\Migration;

class m161026_233916_create_social_friend_table extends Migration
{
    public function up()
    {
        $tableOptions = null;
        if ($this->db->driverName === 'mysql') {
            $tableOptions = 'CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE=InnoDB';
        }
        // One row per (user, follower profile) pair
        $this->createTable('{{%social_friend}}', [
            'id' => Schema::TYPE_PK,
            'user_id' => Schema::TYPE_BIGINT.' NOT NULL',
            'social_profile_id' => Schema::TYPE_BIGINT.' NOT NULL',
        ], $tableOptions);
    }
}
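Next, the Twitter API returns followers in pages, 20 followers at a time by default (the followers/list endpoint accepts a count parameter of up to 200, but this prototype sticks with the default). To request the next page, you have to track the cursors, essentially tags that mark the next page to fetch.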
Since you're only allowed to make 15 requests for followers every 15 minutes, you have to store these cursors in the database. The table is called SocialCursor:
<?php

use yii\db\Schema;
use yii\db\Migration;

class m161027_001026_social_cursor_table extends Migration
{
    public function up()
    {
        $tableOptions = null;
        if ($this->db->driverName === 'mysql') {
            $tableOptions = 'CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE=InnoDB';
        }
        // Stores the next page cursor to fetch for each user
        $this->createTable('{{%social_cursor}}', [
            'id' => Schema::TYPE_PK,
            'user_id' => Schema::TYPE_BIGINT.' NOT NULL',
            'next_cursor' => Schema::TYPE_STRING.' NOT NULL',
        ], $tableOptions);
    }
}
Eventually, I'll build background cron tasks to manage all this, but for today's prototype, I'm running these tasks by hand.
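To get a feel for how the paging and rate limits add up, here's a rough back-of-the-envelope sketch. The function name estimateFetchPlan() and its parameters are hypothetical and not part of the scoreboard code; the defaults simply reflect the 20 followers per page and 15 requests per 15 minutes described above.

```php
<?php
/**
 * Rough estimate of how many requests and rate-limit windows a full
 * follower download needs. Hypothetical helper for illustration only.
 */
function estimateFetchPlan(
    int $followerCount,
    int $pageSize = 20,
    int $requestsPerWindow = 15,
    int $windowMinutes = 15
): array {
    $pages = (int) ceil($followerCount / $pageSize);
    $windows = (int) ceil($pages / $requestsPerWindow);
    // You only wait between windows, so the last window adds no waiting time.
    $minWaitMinutes = max(0, $windows - 1) * $windowMinutes;

    return [
        'pages' => $pages,
        'windows' => $windows,
        'min_wait_minutes' => $minWaitMinutes,
    ];
}

$plan = estimateFetchPlan(1900);
printf(
    "%d pages, %d windows, at least %d minutes of waiting\n",
    $plan['pages'],
    $plan['windows'],
    $plan['min_wait_minutes']
);
```

For roughly 1,900 followers, that works out to 95 requests spread over 7 windows, which is why the cursors need to survive between runs.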
Collecting the Followers and Their Account Data
Next, I created a method, Twitter::getFollowers(), to make the request. Here are the basics of the code:
public function getFollowers($user_id)
{
    $sp = new SocialProfile();
    // Resume from the last cursor stored for this user
    $next_cursor = SocialCursor::getCursor($user_id);
    ...
    while ($next_cursor>0) {
        $followers = $this->connection->get("followers/list",['cursor'=>$next_cursor]);
        // Stop on any error response, including hitting the rate limit
        if ($this->connection->getLastHttpCode() != 200) {
            var_dump($this->connection);
            exit;
        }
        if (isset($followers->users)) {
            foreach ($followers->users as $u) {
                $n+=1;
                $users[]=$u;
                $sp->add($user_id,$u);
            }
            // Save the cursor so the next run can pick up where this one stopped
            $next_cursor= $followers->next_cursor;
            SocialCursor::refreshCursor($user_id,$next_cursor);
            echo $next_cursor.'<br />';
            echo '======================================================<br />';
        } else {
            exit;
        }
    }
}
It gets the next_cursor and repeatedly asks for followers, $followers = $this->connection->get("followers/list",['cursor'=>$next_cursor]), until it hits rate limits.
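The resume-from-a-stored-cursor idea can be sketched without touching the network. Everything below is hypothetical: $fakePages stands in for the API, and fetchWindow() is an illustrative name, not a method from the scoreboard code. The sketch uses "cursor is not 0" as its stop condition so that Twitter's initial cursor of -1 is accepted; how the tutorial's own getCursor() seeds the first cursor isn't shown, so treat that part as an assumption.

```php
<?php
// Hypothetical stand-in for the API: three pages of follower names keyed by cursor.
// A next_cursor of 0 means there are no more pages.
$fakePages = [
    -1  => ['users' => ['alice', 'bob'],  'next_cursor' => 101],
    101 => ['users' => ['carol', 'dave'], 'next_cursor' => 102],
    102 => ['users' => ['erin'],          'next_cursor' => 0],
];

/**
 * Fetch pages starting at $cursor until the request budget for the
 * current window is spent or the last page is reached.
 * Returns the collected users and the cursor to resume from.
 */
function fetchWindow(callable $fetchPage, int $cursor, int $maxRequests): array
{
    $users = [];
    $requests = 0;
    while ($cursor !== 0 && $requests < $maxRequests) {
        $page = $fetchPage($cursor);
        $requests++;
        foreach ($page['users'] as $u) {
            $users[] = $u;
        }
        $cursor = $page['next_cursor'];
    }

    return ['users' => $users, 'next_cursor' => $cursor];
}

$fetchPage = function (int $cursor) use ($fakePages): array {
    return $fakePages[$cursor];
};

// First "window": a budget of two requests, starting from the initial cursor.
$first = fetchWindow($fetchPage, -1, 2);
// A later window resumes from the cursor saved at the end of the first one.
$second = fetchWindow($fetchPage, $first['next_cursor'], 2);

echo implode(',', $first['users']), ' | resume at ', $first['next_cursor'], "\n";
echo implode(',', $second['users']), ' | resume at ', $second['next_cursor'], "\n";
```

In the real code, the value returned as next_cursor is what would be written to the SocialCursor table between runs.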
The output looks something like this as it runs through each page of 20 results:
refresh cursor: 1489380833827620370
======================================================
refresh cursor: 1488086367811119559
======================================================
refresh cursor: 1486452899268510188
======================================================
refresh cursor: 1485593015909209633
======================================================
refresh cursor: 1485330282069552137
======================================================
refresh cursor: 1485256983607000799
======================================================
refresh cursor: 1484594012550322889
======================================================
refresh cursor: 1483359799854574028
======================================================
refresh cursor: 1481615590678791493
======================================================
refresh cursor: 1478424827838161031
======================================================
refresh cursor: 1477449626282716582
======================================================
refresh cursor: 1475751176809638917
======================================================
refresh cursor: 1473539961706830585
======================================================
refresh cursor: 1471375035531579849
======================================================
The data is stored by the $sp->add($user_id,$u); calls. The SocialProfile::add() method is a different version of the fill() method from the Twixxr tutorial. It stores more data and manages the SocialFriend index:
public static function add($user_id, $profileObject = null)
{
    $sp = SocialProfile::find()
        ->where(['social_id' => $profileObject->id_str])
        ->one();
    // Substitute placeholders for missing or empty fields
    if (empty($profileObject->name)) {
        $profileObject->name = 'Nameless';
    }
    if (empty($profileObject->url)) {
        $profileObject->url = '';
    }
    if (empty($profileObject->screen_name)) {
        $profileObject->screen_name = 'error_sn';
    }
    if (empty($profileObject->description)) {
        $profileObject->description = '(empty)';
    }
    if (empty($profileObject->profile_location)) {
        $profileObject->profile_location = '';
    }
    if (empty($profileObject->profile_image_url_https)) {
        $profileObject->profile_image_url_https = '';
    }
    if (!is_null($sp)) {
        // Existing follower: refresh the stored data
        // Use id_str so the stored value matches the lookup above
        $sp->social_id = $profileObject->id_str;
        $sp->image_url = $profileObject->profile_image_url_https;
        $sp->follower_count = $profileObject->followers_count;
        $sp->status_count = $profileObject->statuses_count;
        $sp->friend_count = $profileObject->friends_count;
        $sp->listed_in = $profileObject->listed_count;
        $sp->url = $profileObject->url;
        $sp->protected = $profileObject->protected ? 1 : 0;
        $sp->verified = $profileObject->verified ? 1 : 0;
        $sp->favourites_count = $profileObject->favourites_count;
        $sp->location = $profileObject->location;
        $sp->profile_location = $profileObject->profile_location;
        $sp->name = $profileObject->name;
        $sp->description = $profileObject->description;
        if ($sp->validate()) {
            $sp->update();
        } else {
            var_dump($sp->getErrors());
        }
    } else {
        // New follower: create the record
        $sp = new SocialProfile();
        $sp->social_id = $profileObject->id_str;
        $sp->score = 0;
        $sp->header_url = '';
        $sp->url = $profileObject->url;
        $sp->favourites_count = $profileObject->favourites_count;
        $sp->protected = $profileObject->protected ? 1 : 0;
        $sp->verified = $profileObject->verified ? 1 : 0;
        $sp->location = $profileObject->location;
        $sp->profile_location = $profileObject->profile_location;
        $sp->name = $profileObject->name;
        $sp->description = $profileObject->description;
        $sp->screen_name = $profileObject->screen_name;
        $sp->image_url = $profileObject->profile_image_url_https;
        $sp->follower_count = $profileObject->followers_count;
        $sp->status_count = $profileObject->statuses_count;
        $sp->friend_count = $profileObject->friends_count;
        $sp->listed_in = $profileObject->listed_count;
        if ($sp->validate()) {
            $sp->save();
        } else {
            var_dump($sp->getErrors());
        }
    }
    // Make sure the SocialFriend index links this user to the profile
    $sf = SocialFriend::find()
        ->where(['social_profile_id' => $sp->id])
        ->andWhere(['user_id' => $user_id])
        ->one();
    if (is_null($sf)) {
        $sf = new SocialFriend();
        $sf->user_id = $user_id;
        $sf->social_profile_id = $sp->id;
        $sf->save();
    }
    return $sp->id;
}
It's written to save new records or update old records so that in the future you could track your follower data and update it regularly, overwriting old data.
This last section at the end makes sure there is a SocialFriend index between the User table and the SocialProfile table.
$sf = SocialFriend::find()
    ->where(['social_profile_id'=>$sp->id])
    ->andWhere(['user_id'=>$user_id])
    ->one();
if (is_null($sf)) {
    $sf = new SocialFriend();
    $sf->user_id = $user_id;
    $sf->social_profile_id = $sp->id;
    $sf->save();
}
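The chain of placeholder checks at the top of add() could also be driven by a small table of defaults. This is only a sketch of that alternative: applyProfileDefaults() is a hypothetical name, and while the placeholder values mirror the ones in add(), the table-driven approach is my assumption rather than how the scoreboard code is written.

```php
<?php
/**
 * Fill in placeholder values for profile fields that are missing or empty.
 * Hypothetical helper; the placeholders mirror those used in add().
 */
function applyProfileDefaults(stdClass $profile): stdClass
{
    $defaults = [
        'name' => 'Nameless',
        'url' => '',
        'screen_name' => 'error_sn',
        'description' => '(empty)',
        'profile_location' => '',
        'profile_image_url_https' => '',
    ];
    foreach ($defaults as $field => $fallback) {
        // empty() is true for unset properties, null, '' and '0'.
        if (empty($profile->$field)) {
            $profile->$field = $fallback;
        }
    }

    return $profile;
}

$raw = (object) ['screen_name' => 'reifman', 'description' => ''];
$clean = applyProfileDefaults($raw);
echo $clean->name, ' / ', $clean->screen_name, ' / ', $clean->description, "\n";
```

Keeping the defaults in one array makes it easier to add a field later without copying another if block.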
Scoring Twitter Followers
I had a handful of goals for my Twitter scoring:
- Eliminate accounts that follow everyone that follows them. For example, they have 12,548 followers and follow 12,392 people (see above).
- Eliminate accounts following more than, say, 1,500 accounts, since they're unlikely to ever see what I share. For example, Dan Savage follows 1,536 people.
- Eliminate accounts that have very few posts or very few accounts they follow, likely abandoned accounts.
- Eliminate accounts with few favorites—these are likely bots, not really using the app.
Similarly, I wanted to highlight some positive aspects:
- Accounts that are verified
- Accounts that have lots of followers
- Accounts that follow fewer than 1,000 people, a sweet spot to me
Here's some rough basic code from SocialProfile::score() that highlights some of the positives:
foreach ($all as $sp) {
    // score sp
    $score =0;
    // RULE IN
    if ($sp->verified==1) {
        $score+=1000;
    }
    // POSITIVE
    if ($sp->protected==1) {
        $score+=500;
    }
    if ($sp->follower_count > 10000) {
        $score+=500;
    } else if ($sp->follower_count > 3500) {
        $score+=750;
    } else if ($sp->follower_count > 1100) {
        $score+=1000;
    } else if ($sp->follower_count > 1000) {
        $score+=250;
    } else if ($sp->follower_count> 500) {
        $score+=250;
    }
Here's some code that eliminates some of the bad accounts:
    // RULE OUT
    // make this a percentage of magnitude
    $magnitude = $sp->follower_count/1000;
    if ($sp->follower_count> 1000 and abs($sp->follower_count-$sp->friend_count)<$magnitude) {
        $score-=2500;
    }
    if ($sp->friend_count > 7500) {
        $score-=10000;
    } else if ($sp->friend_count > 5000) {
        $score-=5000;
    } else if ($sp->friend_count > 2500) {
        $score-=2500;
    } else if ($sp->friend_count > 2000) {
        $score-=2000;
    } else if ($sp->friend_count > 1000) {
        $score-=250;
    } else if ($sp->friend_count > 750) {
        $score-=100;
    }
    if ($sp->follower_count<100) {
        $score-=1000;
    }
    if ($sp->status_count < 35) {
        $score-=5000;
    }
Obviously, there's a lot to play with here and a variety of ways to improve this. I hope I get a chance to spend more time on this.
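If you want to experiment with the thresholds away from the database, the rules can be pulled into a plain function. The sketch below is a hypothetical standalone version: scoreFollower() and the array keys are my own naming, and the two sample accounts are made up, but the thresholds are copied from the excerpts above.

```php
<?php
/**
 * Score one follower from plain counts, using the thresholds from the
 * excerpts above. Hypothetical standalone version for experimenting.
 */
function scoreFollower(array $p): int
{
    $score = 0;
    $followers = $p['follower_count'];
    $friends = $p['friend_count'];

    // Rule in / positives
    if ($p['verified']) {
        $score += 1000;
    }
    if ($p['protected']) {
        $score += 500;
    }
    if ($followers > 10000) {
        $score += 500;
    } elseif ($followers > 3500) {
        $score += 750;
    } elseif ($followers > 1100) {
        $score += 1000;
    } elseif ($followers > 1000) {
        $score += 250;
    } elseif ($followers > 500) {
        $score += 250;
    }

    // Rule out: follower and friend counts within 0.1% of each other
    if ($followers > 1000 && abs($followers - $friends) < $followers / 1000) {
        $score -= 2500;
    }
    if ($friends > 7500) {
        $score -= 10000;
    } elseif ($friends > 5000) {
        $score -= 5000;
    } elseif ($friends > 2500) {
        $score -= 2500;
    } elseif ($friends > 2000) {
        $score -= 2000;
    } elseif ($friends > 1000) {
        $score -= 250;
    } elseif ($friends > 750) {
        $score -= 100;
    }
    if ($followers < 100) {
        $score -= 1000;
    }
    if ($p['status_count'] < 35) {
        $score -= 5000;
    }

    return $score;
}

// Made-up accounts: one with the follow-back pattern, one verified and selective.
$followBack = ['verified' => false, 'protected' => false,
    'follower_count' => 12548, 'friend_count' => 12392, 'status_count' => 5000];
$selective = ['verified' => true, 'protected' => false,
    'follower_count' => 2000, 'friend_count' => 400, 'status_count' => 1200];

echo scoreFollower($followBack), "\n"; // -9500
echo scoreFollower($selective), "\n";  // 2000
```

One thing this makes visible: with the 12,548 versus 12,392 example from the goals list, the difference of 156 is larger than the 0.1% magnitude (about 12.5), so the follow-back rule itself doesn't fire. The account is still pushed down by the friend_count penalty, but the magnitude divisor looks like a good candidate for tuning.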
As the method runs, it prints output like the following while updating the SocialProfile table with scores:
DJMany -6300
gai_ltau -7850
Michal92B -900
InvestmentAdvsr -2900
TSSStweets -7500
sandcageapp -1750
dominicpouzin 1950
daletdykaaolch1 -7850
suzamack -8250
writingthrulife -7500
ryvr -1550
RichardAngwin -8300
DanielleMorrill -7300
ReversaCreates 2750
BoKnowsMarkting -7500
TheHMProA -8500
HouseMgmt101 750
itsmeKennethG -1250
drbobbiwegner -8500
Mizzfit_Bianca -7300
wilsonmar 700
CoachVibeke -7300
jhurwitz 0
PiedPiperComms 500
Prana2thePeople -1100
singlemomspower -2250
mouselink -7300
MotivatedGenY -7300
brett7three -7300
JovanWalker 2950
ITSPmagazine 450
RL_Miller -2250
Displaying the Scoreboard
Yii's default grid makes it pretty easy to display the SocialProfile table and customize the scoreboard columns.
Here's SocialProfileController::actionIndex():
/**
 * Lists all SocialProfile models.
 * @return mixed
 */
public function actionIndex()
{
    $searchModel = new SocialProfileSearch();
    $dataProvider = $searchModel->search(Yii::$app->request->queryParams);

    return $this->render('index', [
        'searchModel' => $searchModel,
        'dataProvider' => $dataProvider,
    ]);
}
And here's the grid view customized:
<?php

use yii\helpers\Html;
use yii\grid\GridView;
use yii\widgets\Pjax;

/* @var $this yii\web\View */
/* @var $searchModel frontend\models\SocialProfileSearch */
/* @var $dataProvider yii\data\ActiveDataProvider */

$this->title = Yii::t('frontend', 'Social Profiles');
$this->params['breadcrumbs'][] = $this->title;
?>
<div class="social-profile-index">
    <h1><?= Html::encode($this->title) ?></h1>
    <?php // echo $this->render('_search', ['model' => $searchModel]); ?>
<?php Pjax::begin(); ?>
    <?= GridView::widget([
        'dataProvider' => $dataProvider,
        'filterModel' => $searchModel,
        'columns' => [
            ['class' => 'yii\grid\SerialColumn'],
            [
                'label'=>'Name',
                'format' => 'raw',
                'value' => function ($model) {
                    // Encode values that come from Twitter before emitting raw HTML
                    return '<div><span><strong><a href="https://twitter.com/'
                        .Html::encode($model->screen_name).'">'
                        .Html::encode($model->name).'</a></strong><br />'
                        .Html::encode($model->screen_name).'</span></div>';
                },
            ],
            'score',
            [
                'label'=>'Follows',
                'format' => 'raw',
                'attribute'=>'friend_count',
            ],
            [
                'label'=>'Followers',
                'format' => 'raw',
                'attribute'=>'follower_count',
            ],
            [
                'label'=>'Tweets',
                'format' => 'raw',
                'attribute'=>'status_count',
            ],
            [
                'label'=>'Favs',
                'format' => 'raw',
                'attribute'=>'favourites_count',
            ],
            [
                'label'=>'Listed',
                'format' => 'raw',
                'attribute'=>'listed_in',
            ],
            [
                'label'=>'P',
                'format' => 'raw',
                'attribute'=>'protected',
            ],
            [
                'label'=>'V',
                'format' => 'raw',
                'attribute'=>'verified',
            ],
            // 'location',
            // 'profile_location',
            [
                //'contentOptions' => ['class' => 'col-lg-11 col-xs-10'],
                'label'=>'Pic',
                'format' => 'raw',
                'value' => function ($model) {
                    return '<div><span><img src="'.Html::encode($model->image_url).'"></span></div>';
                },
            ],
        ],
    ]); ?>
<?php Pjax::end(); ?>
</div>
Here's what the top scores look like with my initial algorithm:
There are so many ways to improve and tune the scoring. I look forward to playing with it more.
There's also more I'd like to write code for as I expand my use of the API, for example:
- Use the PHP Gender extension to help separate companies from people (companies don't interact much).
- Look up the frequency of posts that people have made and the last time they used Twitter.
- Use Twitter's search API to see which followers have actually ever interacted with my content.
- Provide feedback to the scoring to tune it.
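As a starting point for the second idea, here's a minimal sketch of measuring how long ago an account last tweeted. It assumes the follower's user object includes its most recent tweet (a status object with a created_at string in the usual Twitter format, for example "Wed Aug 27 13:08:45 +0000 2008"); daysSinceTweet() is a hypothetical helper, not part of the scoreboard code.

```php
<?php
/**
 * Days between a tweet timestamp and a reference time.
 * Assumes Twitter's "D M d H:i:s O Y" style created_at strings.
 * Returns null when the timestamp can't be parsed.
 */
function daysSinceTweet(string $createdAt, DateTimeImmutable $now): ?int
{
    $tweeted = DateTimeImmutable::createFromFormat('D M d H:i:s O Y', $createdAt);
    if ($tweeted === false) {
        return null;
    }

    return $tweeted->diff($now)->days;
}

// Fixed reference time so the example is repeatable.
$now = new DateTimeImmutable('2008-09-06 13:08:45', new DateTimeZone('UTC'));
$days = daysSinceTweet('Wed Aug 27 13:08:45 +0000 2008', $now);
echo $days === null ? "unknown\n" : $days . " days since last tweet\n";
```

A value like this could feed the score as another penalty tier, for instance for accounts that haven't tweeted in months, though picking the cutoffs would take some experimenting.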
Looking Ahead
I hope you find the scoring approach intriguing. There's so much more that can be done to improve this. Please feel free to play with it and post your ideas below.
If you have any questions or suggestions, please post them in the comments. If you'd like to keep up on my future Envato Tuts+ tutorials and other series, please visit my instructor page or follow @reifman. Definitely check out my startup series and Meeting Planner.
Related Links
- Twixxr (recent sample Twitter API app to discover influential women to follow)
- Twitter Developer Documentation
- How to Program With Yii2 Series (Envato Tuts+)