As mentioned in a previous post (http://genesysguru.com/blog/blog/2011/05/20/tweets-10-per-1k-social-engagement-and-api-rate-limits/), for commercial applications that require deep historical search, analysis and data enrichment, what Twitter really wants is for developers to use Gnip (http://gnip.com/), a commercial reseller of Twitter data.
With Gnip, Twitter data is made available in either Twitter’s native JSON format or a Gnip-provided JSON Activity Streams format. Any data enrichment that Gnip adds to tweets (e.g. unwound URLs, Klout reputation scores) is only available in the Activity Streams format. For this reason, the Activity Streams format is the recommended choice.
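To make the enrichment point concrete, here is a minimal Python sketch that pulls the enriched fields out of a single activity. The sample payload is hand-written for illustration, and the field names (`gnip`, `klout_score`, `urls`, `expanded_url`, `preferredUsername`) are assumptions about the Activity Streams layout rather than something taken from a real feed, so check them against actual Gnip output before relying on them.

```python
import json

# Hypothetical, hand-written activity; not captured from a real Gnip feed.
SAMPLE_ACTIVITY = """
{
  "verb": "post",
  "body": "Reading about rate limits http://bit.ly/example",
  "actor": {"preferredUsername": "example_user"},
  "gnip": {
    "klout_score": 42,
    "urls": [
      {"url": "http://bit.ly/example",
       "expanded_url": "http://example.com/full-article"}
    ]
  }
}
"""


def summarize_activity(raw):
    """Return the tweet text plus any enrichment found under the assumed 'gnip' key."""
    activity = json.loads(raw)
    enrichment = activity.get("gnip", {})
    return {
        "user": activity.get("actor", {}).get("preferredUsername"),
        "text": activity.get("body"),
        "klout_score": enrichment.get("klout_score"),
        # Keep only entries that actually carry an expanded URL.
        "expanded_urls": [
            u["expanded_url"]
            for u in enrichment.get("urls", [])
            if "expanded_url" in u
        ],
    }


if __name__ == "__main__":
    print(summarize_activity(SAMPLE_ACTIVITY))
```

Every lookup uses `.get()` with a default, so an activity without enrichment still yields a usable summary with an empty URL list.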
With this in mind, I set off to develop a custom application to take a Gnip activity stream and integrate it into Genesys Social Engagement (http://genesysguru.com/blog/blog/2011/04/08/genesys-social-engagement/).
The first step was to sign up for a free 72-hour Gnip trial (https://try.gnip.com/) and then configure my data collector:
1. Log in to Gnip:
2. The main dashboard shows each feed into the Data Collector and the health / performance of each feed:
3. Click on “edit data feed” to edit the parameters associated with a feed:
4. Define any rules for filtering the stream:
5. Select the output format and any data enrichments. Here I select the output format as a JSON Activity Stream and also add some data enrichment to expand shortened (bit.ly) URLs:
6. Select the data delivery format. Here I select an HTTP stream:
7. Once the feed is configured, click on the feed to display activity:
8. The Overview tab provides a high-level overview of the queries performed on the Twitter Stream, including the number of polls performed and the number of activities returned:
9. The Data tab shows the data returned from each query. It also includes details of the HTTP stream, e.g. https://trial66.gnip.com/data_collectors/11/stream.xml
10. The Rules tab shows the metrics associated with each rule defined to filter the stream:
OK, so far so good! Next, a bit of C# coding to consume the activity stream via the Gnip Activities API, which takes a number of parameters:
- max: The maximum number of activities to return, capped at 10000.
- since_date: Return only activities since the given date, in UTC, in the format “YYYYmmddHHMMSS”.
- to_date: Return only activities before the given date, in UTC, in the format “YYYYmmddHHMMSS”.
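The parameters above can be sketched as a small URL builder. It is written in Python for brevity rather than the C# used for the real application, and the logic should port across directly. The `activities.json` path in `BASE_URL` is an assumption modelled on the stream URL shown in the Data tab, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint: host and collector id come from the Data tab URL above,
# but the "activities.json" path is a guess, not confirmed by Gnip docs.
BASE_URL = "https://trial66.gnip.com/data_collectors/11/activities.json"

MAX_ACTIVITIES = 10000  # documented cap on the "max" parameter


def gnip_timestamp(dt):
    """Format a timezone-aware datetime as UTC 'YYYYmmddHHMMSS'."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def build_activities_url(base_url, max_results=None, since=None, to=None):
    """Build a query URL from the optional max, since_date and to_date values."""
    params = {}
    if max_results is not None:
        # Clamp to the cap instead of sending a value the API would reject.
        params["max"] = min(max_results, MAX_ACTIVITIES)
    if since is not None:
        params["since_date"] = gnip_timestamp(since)
    if to is not None:
        params["to_date"] = gnip_timestamp(to)
    return base_url + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    start = datetime(2011, 6, 1, 9, 30, 0, tzinfo=timezone.utc)
    print(build_activities_url(BASE_URL, max_results=500, since=start))
```

Passing timezone-aware datetimes matters here: `astimezone` treats a naive datetime as local time, which would silently shift the query window.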
However, during my testing I could only get a JSON activity stream from the “all activities” stream, and even then every activity on it seemed to come from the Facebook feed. My Twitter stream would only return activities in XML format:
A quick email to Gnip support revealed that the trial streams are only available in XML format. However, they did offer me a free two-week trial of their PowerTrack product, which puts me back in business!
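Because a feed may come back as XML or JSON depending on the account, a consumer can check the response before parsing it. The following is a minimal Python sketch of that idea, not the code used in my application: `parse_payload` is a hypothetical helper, and it assumes the server sets a sensible Content-Type header.

```python
import json
import xml.etree.ElementTree as ET


def parse_payload(body, content_type):
    """Parse a response body as JSON or XML based on its Content-Type.

    Returns a (kind, data) tuple where kind is 'json' or 'xml'.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type.endswith("json"):
        return "json", json.loads(body)
    if media_type.endswith("xml"):
        return "xml", ET.fromstring(body)
    raise ValueError("Unsupported content type: " + content_type)


if __name__ == "__main__":
    kind, data = parse_payload('{"verb": "post"}', "application/json")
    print(kind, data)
```

Raising on an unknown type keeps an unexpected format from being silently treated as an empty result.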
To be completed … but there are some new kids on the block … DataSnip.