It seems to me that at present one of the biggest limitations of any Social Engagement solution is the rate limits imposed on APIs to social networks such as Twitter and Facebook. For example, what happens to a Customer Tweet when the hourly API rate limit has been exceeded and I cannot retrieve my new Followers, Mentions or Retweet timelines? How long should the Customer wait before receiving even an automated response?
For Twitter, the default rate limit depends on the authorisation method being used and whether the API method itself requires authentication. Anonymous REST API calls are limited to 150 requests per hour, based on the IP address of the host; authenticated (OAuth) calls are limited to 350 requests per hour. However, there are additional Search rate limits, Feature rate limits and Account rate limits, as well as unpublished “Twitter Limits”. If you are really naughty and do not honour the rate limit, your IP address might get Blacklisted!
Obviously I can get around per account (authenticated) rate limits by using multiple accounts. However, this just adds system management, configuration overhead and complexity.
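To illustrate that overhead, here is a minimal sketch of rotating calls across several authenticated accounts in round-robin fashion. The account names and tokens are placeholders, and this is exactly the sort of extra configuration and management machinery I mean – not a recommendation:

```python
from itertools import cycle

# Placeholder credentials -- in reality each would be a separate OAuth'd
# Twitter account, each with its own 350 requests/hour allowance.
accounts = cycle([
    ("account_a", "oauth_token_a"),
    ("account_b", "oauth_token_b"),
    ("account_c", "oauth_token_c"),
])

def next_credentials():
    """Return the next (screen_name, token) pair in round-robin order."""
    return next(accounts)
```

Every extra account multiplies the credential storage, token refresh and monitoring you have to manage.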
The other “unknown” is how these rate limits will change in the future – is it too dangerous to build an application on an API you can’t control? With any public API, application developers are always at risk that their efforts will simply be erased by some unpredictable move on the part of the company that controls the API. Twitter says “… we will monitor how things are going and if necessary reduce the rate further“. Oh dear!
At present the best advice from Twitter is that “it is best practice for applications to monitor their current rate limit status and dynamically throttle requests if necessary”. In other words, either a) develop complicated strategies to manage the rate limits, at the risk of these changing at any time, or b) risk missing, or being unable to respond to, an important Customer Tweet!
Given that not all Twitter REST API methods are rate limited (I can always update my status using statuses/update, for example), maybe I am worrying too much?
If you are serious about Sentiment and Influence analysis as part of Customer Service then I think not. This is because sentiment and influence analysis cannot be performed on a single Tweet. What matters is the sentiment across the whole Tweet thread, not just the Tweet in isolation. How many times has the Tweet been Retweeted (RT), and by whom? What is the sentiment associated with any Replies? What is the Klout or other influence score associated with the key actors in the thread? This sort of analysis will inevitably eat into REST API rate limits.
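A toy illustration of why thread-level analysis multiplies API calls: score a thread by weighting each Tweet’s sentiment by its author’s influence. The scoring scale and the weighting scheme here are entirely my own assumptions, and in reality each sentiment score and each Klout lookup is a separate (rate-limited or paid) request:

```python
def thread_sentiment(tweets):
    """Influence-weighted mean sentiment for a thread.

    tweets -- list of {"sentiment": -1.0..1.0, "influence": 0..100} dicts,
              one per Tweet, Retweet or Reply in the thread.
    """
    total_weight = sum(t["influence"] for t in tweets)
    if total_weight == 0:
        return 0.0
    return sum(t["sentiment"] * t["influence"] for t in tweets) / total_weight
```

An angry original Tweet (sentiment -0.8) from a user with influence 70, softened by one mild Reply (+0.2, influence 30), still scores -0.5 for the thread – and computing that required fetching every Tweet and every score.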
So where does this leave us?
Well in the past high volume users such as Klout were on a Whitelist which allowed them to make a much higher number of API requests per hour – 20,000 compared to 350. However, in February 2011 Twitter announced:
“Twitter will no longer grant whitelisting requests. Twitter whitelisting was originally created … at a time when the API had few bulk request options and the Streaming API was not yet available. Since then, we’ve added new, more efficient tools for developers, including lookups, ID lists, authentication and the Streaming API. Instead of whitelisting, developers can use these tools to create applications and integrate with the Twitter platform.”
The Streaming API would seem to be the only way forward then, since it is designed to “allow high-throughput near-realtime access to subsets of public and protected Twitter data”.
There are 3 Twitter Streaming “products”: the Streaming API, User Streams and Site Streams. The user stream is intended to provide all the update data required for a desktop application after startup, once a REST API backfill has been completed. This includes protected data such as followings and direct messages.
The main one of interest to us is the Streaming API since this provides filtered public statuses (including replies and mentions) from all users.
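As a sketch, consuming the filtered stream via statuses/filter looks something like this (standard library only; authentication is omitted for brevity, and the keyword list would be whatever brand terms you track):

```python
import json
from urllib import request, parse

# 2011-era filtered public stream endpoint; "track" takes comma-separated
# keywords. Auth (Basic or OAuth) is required in practice but omitted here.
STREAM_URL = "https://stream.twitter.com/1/statuses/filter.json"

def parse_status(line):
    """Decode one newline-delimited JSON status; ignore keep-alive blanks."""
    line = line.strip()
    return json.loads(line) if line else None

def track_keywords(keywords, opener=None):
    """Yield decoded statuses matching any of the tracked keywords."""
    body = parse.urlencode({"track": ",".join(keywords)}).encode("utf-8")
    resp = (opener or request.urlopen)(STREAM_URL, data=body)
    for raw in resp:
        status = parse_status(raw.decode("utf-8"))
        if status is not None:
            yield status
```

A real client would also need reconnection with back-off, which the Streaming API documentation requires of well-behaved consumers.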
Even then, the Streaming API is only part of the equation, for a few reasons:
- Status quality metrics and the data access level limits (Spritzer, Gardenhose, Firehose etc) are applied. This means that some statuses will be filtered out automatically.
- Duplicate messages can be delivered on the stream.
- The Streaming API Quality of Service (QoS) is “Best-effort and unordered”. This means that “on rare occasion and without notice, statuses may be missing from the delivered stream”.
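The duplicates, at least, are easy to handle client-side by remembering a bounded window of recently seen Tweet IDs. A minimal sketch (the window size is an arbitrary choice of mine):

```python
from collections import OrderedDict

class DuplicateFilter:
    """Drop statuses whose ID was seen within the last `window` statuses."""

    def __init__(self, window=10000):
        self._seen = OrderedDict()
        self._window = window

    def is_new(self, status_id):
        """Return True the first time an ID is seen, False on repeats."""
        if status_id in self._seen:
            return False
        self._seen[status_id] = True
        if len(self._seen) > self._window:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True
```

Missing statuses are the harder problem: by definition you cannot detect locally what was never delivered.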
For commercial applications which require deep historical search, analysis and data enrichment, what Twitter really wants is for developers to use Gnip (http://gnip.com/), a commercial reseller of Twitter data.
There is an interesting article on Gnip here: http://www.readwriteweb.com/hack/2011/02/twitter-sets-a-price-for-tweet.php
“Last week at Strata, Gnip released a new set of features for its social-stream processing platform. Called Power Track, the new layer allows customers to set up complex search queries and receive a stream of all the Twitter messages that match the criteria. Unlike existing ways of filtering the firehose, there are no limits on how many keywords or results you can receive.
On top of the standard $2,000 a month to rent a Gnip collector it will cost 10 cents for every thousand Twitter messages delivered.
For clients that want *every* Tweet for a keyword, it supplies a comprehensive solution, rather than trying to work around the traditional Twitter search APIs that have restrictions on volume and content. ”
All good stuff, and it starts to put a cost basis on Tweets – get used to them costing $0.10 per 1k to receive. The Gnip data enrichment capabilities, e.g. URL expansion and mapping and duplicate exclusion, are also noteworthy. Gnip can even append Klout Scores to Tweets and filter for Tweets by users whose Klout Scores fall within a specified range – nice!
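Putting the quoted pricing into a back-of-envelope formula ($2,000/month for the collector plus $0.10 per 1,000 Tweets delivered):

```python
def monthly_cost(tweets_delivered, base=2000.0, per_thousand=0.10):
    """Estimated monthly Gnip bill, using the article's quoted pricing."""
    return base + (tweets_delivered / 1000.0) * per_thousand
```

So a stream delivering 10 million Tweets a month would come to $3,000 – the fixed collector rental dominates until volumes get very large.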
If you have read this far then thanks for reading! If you want to know why I am so interested in this then please check back over the next couple of weeks for further posts on a “Social Project” that I am working on.