DataSift integration with Genesys Social Engagement

Here we go again … time to develop a custom application to take a DataSift stream and integrate it into Genesys Social Engagement.

The first step was to sign up as an alpha tester. Within 24 hours I received my invite:


Now time to create some custom streams!


1. Log in to DataSift:


2. Click on “Settings” to get my API key which I will need later to access my custom streams through the API:


3. Click on “My Licenses” to set up Twitter stream licensing:


4. Click on “My Streams” then “Create Stream” to create a new stream. Note that here I mark the stream as private:


5. Create a stream definition (filtering rules) for this stream using CSDL (Curated Stream Definition Language). Clicking on the Code Toolbox makes this relatively easy, but don’t expect a full IDE like Visual Studio!



When I click on Save I get some important information – the unique key that will be used to access the stream later through the API:


6. Having created a custom stream, I can now click on the “Live” tab and see all live Tweets that match the CSDL definition I just created:


7. If I click on “Use” I can get an estimated cost to consume this stream. In this case $0.35 / hour. Also note in the screenshot below that my stream definition is versioned:


8. Finally, clicking on the dashboard, I can see all of my streams as well as the public streams created by other users:



Ok, so far so good!

All of that took less than 10 minutes. The web-based GUI worked fine in Firefox (unlike Gnip) and was easy and intuitive to use. What I *really* like about this GUI is that it is simple enough for business users to create and modify stream definitions and to see the results in real time.

Versioning means that if Mr. Cockup is at home we can recover the situation! Also, the estimated cost to consume the stream means that budgets can be kept under control.

Right, back to techie land and a bit of C# coding to consume the stream via the DataSift API.
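For anyone wondering what “consuming the stream” involves, here is a minimal sketch in Python rather than my C#. It assumes the HTTP stream delivers one JSON object per line, with blank lines used as keep-alives; the `interaction`/`content` field names in the sample are illustrative only and are not taken from the DataSift documentation.

```python
import json
from typing import Iterable, Iterator


def parse_stream(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line of an HTTP stream.

    Assumes newline-delimited JSON; blank lines are treated as keep-alives.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # keep-alive newline, nothing to decode
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Skip a truncated or malformed line rather than kill the consumer
            continue


# Usage with an in-memory stand-in for the HTTP response body
sample = [
    b'{"interaction": {"content": "Giggs!"}}\r\n',
    b"\r\n",
    b'{"interaction": {"content": "second"}}\r\n',
]
for item in parse_stream(sample):
    print(item["interaction"]["content"])
```

In a real consumer, `lines` could simply be the response object returned by `urllib.request.urlopen`, since iterating over it yields one line at a time.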

To be completed …

27/05/2011: DataSift Twitter feed has been down for 24 hours so development work stopped for now.

04/06/2011: Integration completed. Contacts and Interactions being created automatically. Just need to hook up a strategy to test out some auto reply functionality and then finish my custom social desktop application which uses the PSDK. Will post again with a demo ASAP.


MediaSift / DataSift



Just when I thought Gnip (or Ping backwards!) was the dog’s b*llocks, I discover MediaSift, which on paper would seem to have even bigger b*llocks!

Based in the UK (hooray!), MediaSift is a British technology startup and was formed by CEO Nick Halstead (@nickhalstead / @nik) in 2007. In 2008 they launched TweetMeme and their next generation platform is called DataSift.

DataSift will allow customers to search (sift) the full Twitter firehose based on the data in a standard Twitter JSON object (see below) along with the addition of data enrichment (augmentation) from third party services including Klout (influence), PeerIndex (influence), InfoChimps, Qwerly (people search) and Lexalytics (text and sentiment analysis).


A nice feature of DataSift is the Qwerly integration. This will allow the linkage between Twitter and other linked social media accounts such as LinkedIn.

If we believe the hype then it should be possible to ‘sift’ the Twitter firehose for “Football fans in Manchester with over 50 followers who have mentioned Ryan Giggs in a negative way with no swearing in the past day!”
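To make the idea concrete, here is a hypothetical Python predicate that expresses that query against a simplified tweet dictionary. It is a local illustration, not CSDL: it assumes an enrichment step has already attached a `sentiment` score (negative below zero), and the field names and the tiny swear-word list are placeholders of my own.

```python
import re
from datetime import datetime, timedelta, timezone

# Placeholder list; a real deployment would use a proper profanity filter
SWEAR_WORDS = re.compile(r"\b(damn|bloody)\b", re.IGNORECASE)


def matches_query(tweet: dict, now: datetime) -> bool:
    """Hypothetical filter: football fans in Manchester with over 50 followers
    who mentioned Ryan Giggs negatively, without swearing, in the past day."""
    user = tweet["user"]
    text = tweet["text"]
    return (
        "football" in user.get("description", "").lower()
        and "manchester" in user.get("location", "").lower()
        and user.get("followers_count", 0) > 50
        and "ryan giggs" in text.lower()
        and tweet.get("sentiment", 0) < 0  # assumed enrichment score
        and not SWEAR_WORDS.search(text)
        and now - tweet["created_at"] <= timedelta(days=1)
    )


now = datetime(2011, 5, 20, 12, 0, tzinfo=timezone.utc)
tweet = {
    "text": "Ryan Giggs was poor today",
    "sentiment": -0.6,
    "created_at": now - timedelta(hours=3),
    "user": {"description": "Football mad", "location": "Manchester, UK",
             "followers_count": 120},
}
print(matches_query(tweet, now))  # True
```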

So how does it compare to Gnip?


Firstly, Gnip services are available now. DataSift has been in private alpha testing since Q4 2010 (08/12/2010) and will not officially launch until Q3 2011. In May 2011 (13/05/2011) DataSift started beta testing.

Gnip runs on individual Amazon EC2 instances per customer data collector. I believe DataSift uses a cloud based architecture running a custom engine named “Pickle” (Nick, please DM or email me if I am wrong).


Cost wise, for the Premium Twitter Power Track feed, Gnip charges $2000 / month to rent a data collector and then $0.10 per 1k Tweets delivered. Since Twitter charges all companies $0.10 per 1k Tweets, effectively Gnip are just charging a fixed monthly rental for the collector (Amazon EC2 instance).

The DataSift model is based on a Pay per Use subscription model with processing tiers. As such there is no fixed monthly cost. A user can set an upper limit on the amount of money they are willing to spend per month on DataSift Stream data.

If that upper limit is reached, the user will automatically be disconnected from all their chargeable Streams until the monthly spend amount is increased, or a new month starts.
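The cap behaviour described above can be modelled in a few lines. This is a hypothetical Python sketch of the rule as stated (charges accumulate, hitting the limit disconnects every chargeable stream, and connections are refused until the limit is raised or a new month starts); it is not DataSift's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SpendCap:
    """Hypothetical model of a monthly spend limit across chargeable streams."""
    monthly_limit: float
    spent: float = 0.0
    connected: set = field(default_factory=set)

    def connect(self, stream_id: str) -> bool:
        if self.spent >= self.monthly_limit:
            return False  # over budget: refuse new connections
        self.connected.add(stream_id)
        return True

    def charge(self, amount: float) -> None:
        self.spent += amount
        if self.spent >= self.monthly_limit:
            self.connected.clear()  # disconnect every chargeable stream

    def raise_limit(self, new_limit: float) -> None:
        self.monthly_limit = new_limit

    def new_month(self) -> None:
        self.spent = 0.0


cap = SpendCap(monthly_limit=300.0)
cap.connect("giggs-stream")
cap.charge(310.0)                   # limit reached: all streams dropped
print(cap.connected)                # set()
print(cap.connect("giggs-stream"))  # False until raise_limit() or new_month()
```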

Basically, custom streams require processing power, which is split into three tiers: the more complex the search definition, the higher the tier and the cost. There are then additional publisher costs on top.

As a cost comparison, if I assume 20 working days per month, i.e. $100 per day, what can I get for my $2000? Using the DataSift pricing calculator, the answer is 23,000 Tweets (interactions) per hour using a highly complex search definition. An additional $55.20 per day of publisher costs ($0.10 per 1k Tweets from Twitter) would apply to both Gnip and DataSift. Hence, a total cost of $0.28 per 1k Tweets.
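Here is the arithmetic behind the $0.28 figure, assuming (as the $55.20 implies) that the 23,000 TPH stream runs for 24 hours a day:

```python
def cost_per_1k(daily_fixed: float, tweets_per_hour: int, hours: float,
                publisher_per_1k: float = 0.10) -> float:
    """Total cost per 1,000 Tweets: fixed daily spend plus publisher fees."""
    tweets = tweets_per_hour * hours
    publisher = tweets / 1000 * publisher_per_1k
    return (daily_fixed + publisher) / (tweets / 1000)


# $2000 / 20 working days = $100 per day; 23,000 TPH over 24 hours
print(round(23_000 * 24 / 1000 * 0.10, 2))       # 55.2 (publisher cost per day)
print(round(cost_per_1k(100.0, 23_000, 24), 2))  # 0.28
```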


But hang on – 23K Tweets Per Hour (TPH)! For normal commercial applications we are probably talking an absolute *maximum* of 1000 filtered TPH (which is a bit less than the 12.4M TPH reported during Bin Laden’s death). This is further backed up by an analysis of the #CustServ hashtag, which averages 1000 TPH on Tuesdays.

Using the DataSift pricing calculator again, and this time assuming a 12-hour working day and a medium-complexity search definition, we have a total cost of $13.80 per day, which is less than $300/month.


Unless I am very wrong (please email me to let me know if I am), this means that Gnip is roughly 7x more expensive than DataSift. Yikes – time for a new business model!
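A quick check of that ratio using the same assumptions: 1000 filtered TPH, a 12-hour day, 20 working days a month, Gnip's $2000 rental amortised over those 20 days, and the $13.80 daily total from the DataSift calculator (which already includes publisher costs).

```python
TPH, HOURS, PUBLISHER_PER_1K = 1_000, 12, 0.10

publisher = TPH * HOURS / 1000 * PUBLISHER_PER_1K  # $1.20 per day
datasift_daily = 13.80                             # calculator total, incl. publisher
gnip_daily = 2000 / 20 + publisher                 # $100 collector rental + $1.20

print(round(gnip_daily, 2))                   # 101.2
print(round(gnip_daily / datasift_daily, 1))  # 7.3
print(round(datasift_daily * 20, 2))          # 276.0, i.e. under $300/month
```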

Data Enrichment

Gnip provides data enrichment capabilities such as URL expansion and mapping and duplicate exclusion. Gnip can also append Klout Scores to Tweets and filter for Tweets by users who have Klout Scores within a specified range.

Similarly, DataSift provides Augmentation services. These services include Influence Analysis (Social Authority) from Klout and PeerIndex, Natural Language Processing (NLP) using Lexalytics Salience for Sentiment analysis, and Social Identity Aggregation (People Search) using Qwerly.


DataSift supports CSDL (Curated Stream Definition Language), which was previously called FSDL (Filtered Stream Definition Language). CSDL is a powerful search language used to define the complex rules that filter curated streams. This is similar to Gnip rules; however, the capabilities provided by DataSift are much more comprehensive than Gnip’s.

DataSift sources of data are called “targets” or “input services”. Custom streams are defined in CSDL using targets in the “My Streams” section of the DataSift dashboard. CSDL also provides access to augmentation (data enrichment) targets through services such as Lexalytics Salience, TweetMeme, Peer Index, Klout and InfoChimps that allow streams to be augmented with third party data.

Regular expressions are supported in CSDL via the Google RE2 regular expression engine.

Here are some CSDL examples:


Slightly worryingly, augmentation targets as specified in CSDL seem to be tied to the data in the JSON objects exposed by third-party APIs. I wonder if this could cause problems moving forward if and when these APIs change.

Once a stream has been built, it can also be used in the definition of another user stream, which can in turn be used in another stream, and so on.


Interestingly, DataSift is encouraging users to build public streams that are discoverable and accessible to other users by providing a number of options on the stream page, such as tagging, sharing, comments and visits. The most commented and top-rated streams are also featured on the home section of the DataSift dashboard. Cool!


API and data formats

Both Gnip and DataSift provide HTTP Streaming APIs using Basic rather than OAuth authentication. Gnip supports output in XML, JSON and Activity Streams formats, whereas DataSift only supports streaming output in JSON. Once again there is a concern that the DataSift API might change in the future, which is one of the advantages of Gnip, since it supports Activity Streams.
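Basic authentication simply means that every request carries a base64-encoded `username:secret` pair in an `Authorization` header, so the stream should only ever be opened over HTTPS. A minimal Python sketch that builds (but does not send) such a request is below. The URL is a hypothetical placeholder, and for DataSift the credential pair would presumably be the account username plus the API key from the Settings page.

```python
import base64
import urllib.request


def basic_auth_request(url: str, username: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying an HTTP Basic auth header."""
    token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder URL and credentials, for illustration only
req = basic_auth_request("https://stream.example.com/hypothetical-stream", "me", "api-key")
print(req.get_header("Authorization"))  # Basic bWU6YXBpLWtleQ==
```

Compared with OAuth there is no token exchange or request signing, which keeps a long-lived streaming client simple.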

In addition to the Streams API, DataSift also provides a Data API, which makes application backfilling on startup easy!

Gnip provides APIs to add or delete rules on a data collector. DataSift provides APIs to comment on and manage streams (get, create, update, duplicate, rate, delete, browse, search and compile). In addition to being able to modify CSDL definitions this API also allows for the public discovery of existing public streams.

Finally, DataSift provides a Recording API which allows a stream to be recorded and subsequently retrieved and analysed offline.


Time to write another piece of integration code into Genesys Social Engagement!


Gnip integration with Genesys Social Engagement (Part 1)

As mentioned in a previous post, for commercial applications which require deep historical search, analysis and data enrichment, what Twitter really wants is for developers to use Gnip, a commercial reseller of Twitter data.


With Gnip, Twitter data is made available in either Twitter’s native JSON format or a Gnip-provided JSON Activity Streams format. Any data enrichment that Gnip adds to tweets is only available in the Activity Streams format (e.g. unwound URLs, Klout reputation scores, etc). For this reason, it is recommended to use the Activity Streams format.

With this in mind I set off to develop a custom application to take a Gnip activity stream and integrate it into Genesys Social Engagement.

The first step was to sign up for a free 72-hour Gnip trial and then configure my data collector:


1. Log in to Gnip:


2. The main dashboard shows each feed into the Data Collector and the health / performance of each feed:


3. Click on “edit data feed” to edit the parameters associated with a feed:


4. Define any rules for filtering the stream:


5. Select the output format and any data enrichments. Here I select the output format as a JSON Activity Stream and also add some data enrichment to expand shortened URLs:


6. Select the data delivery format. Here I select an HTTP stream:


7. Once the feed is configured, click on the feed to display activity:


8. The Overview tab provides a high level overview of the queries performed on the Twitter Stream including the number of polls performed and the number of activities returned:


9. The Data tab shows the data returned from each query. It also includes details of the HTTP stream.


10. The Rules tab shows the metrics associated with each rule defined to filter the stream:


Ok, so far so good! Next, a bit of C# coding to consume the activity stream via the Gnip Activities API, which takes a number of parameters:

  • max: the maximum number of activities to return, capped at 10,000.
  • since_date: return only activities since the given date, in UTC, in the format “YYYYmmddHHMMSS”.
  • to_date: return only activities before the given date, in UTC, in the format “YYYYmmddHHMMSS”.
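Building the query string for those three parameters is straightforward. Below is a minimal Python sketch that clamps `max` to the 10,000 cap and formats both dates as UTC `YYYYmmddHHMMSS`. The activities endpoint itself is not given here, so the function returns only the query string for the caller to append to whatever URL their collector exposes.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

MAX_ACTIVITIES = 10_000  # cap on the "max" parameter stated above


def activities_query(max_items: int, since: datetime, to: datetime) -> str:
    """Build a query string for max / since_date / to_date.

    Expects timezone-aware datetimes; both are converted to UTC.
    """
    fmt = "%Y%m%d%H%M%S"  # YYYYmmddHHMMSS
    return urlencode({
        "max": min(max_items, MAX_ACTIVITIES),
        "since_date": since.astimezone(timezone.utc).strftime(fmt),
        "to_date": to.astimezone(timezone.utc).strftime(fmt),
    })


q = activities_query(
    50_000,
    datetime(2011, 5, 1, 9, 30, tzinfo=timezone.utc),
    datetime(2011, 5, 2, 9, 30, tzinfo=timezone.utc),
)
print(q)  # max=10000&since_date=20110501093000&to_date=20110502093000
```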

However, during my testing I could only get a JSON activity stream using the “all activities” stream. Even then, all activities on this stream seemed to be from the Facebook feed. My Twitter stream would only return activities in XML format:


After a quick email to Gnip support it turns out that the trial streams are only available in XML format. However, they did offer me a free 2 week trial of their Power Track product which puts me back in business!

To be completed … but there are some new kids on the block … DataSift.



Tweets $.10 per 1k – Social Engagement and API Rate Limits

It would seem to me that at present one of the biggest limitations of any Social Engagement solution is the rate limits imposed on the APIs of social networks such as Twitter and Facebook. For example, what happens to a Customer Tweet when the hourly API rate limit has been exceeded and I cannot retrieve my new Followers, Mentions or Retweet timelines? How long should the Customer wait before receiving even an automated response?

For Twitter, the default rate limit depends on the authorisation method being used and whether the method itself requires authentication. Anonymous REST API calls are based on the IP address of the host and are limited to 150 requests per hour. Authenticated (OAuth) calls are limited to 350 requests per hour. However, there are additional Search rate limits, Feature rate limits, Account rate limits as well as unpublished “Twitter Limits”. If you are really naughty and do not honour the rate limit, your IP address might get Blacklisted!

Obviously I can get around per account (authenticated) rate limits by using multiple accounts. However, this just adds system management, configuration overhead and complexity.

The other “unknown” is how these rate limits will change in the future – is it too dangerous to build an application on an API you can’t control? With any public API, application developers are always at risk that their efforts will simply be erased by some unpredictable move on the part of the company that controls the API. Twitter says “… we will monitor how things are going and if necessary reduce the rate further“. Oh dear!

At present the best advice from Twitter is “it is best practice for applications to monitor their current rate limit status and dynamically throttle requests if necessary”. In other words, either a) develop highly complicated strategies to manage the rate limits, at the risk of these changing at any time, or b) risk missing, or being unable to respond to, an important Customer Tweet!
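The simplest version of “dynamically throttle requests” is to spread whatever quota is left evenly over the time remaining until the window resets. Here is a minimal Python sketch of that idea. It assumes the caller has already read the remaining-call count and the reset time (as Unix epoch seconds) from the API's rate limit status; how those values are fetched is left out.

```python
def throttle_delay(remaining: int, reset_epoch: float, now: float) -> float:
    """Seconds to wait before the next call so the remaining quota lasts
    until the rate limit window resets (requests spread evenly)."""
    seconds_left = max(reset_epoch - now, 0.0)
    if remaining <= 0:
        return seconds_left          # quota exhausted: wait for the reset
    return seconds_left / remaining  # even spacing across the window


# 35 calls left and 30 minutes until reset (in practice, now = time.time())
print(throttle_delay(remaining=35, reset_epoch=1_800.0, now=0.0))  # roughly 51.4 s
```

Even spacing is deliberately conservative: it never bursts, so a sudden spike of Customer Tweets still has to wait its turn, which is exactly the trade-off described above.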

Given that not all Twitter REST API methods are rate limited (I can always update my status using statuses/update, for example), maybe I am worrying too much?

If you are serious about Sentiment and Influence analysis as part of Customer Service, then I think not. This is because sentiment and influence analysis cannot meaningfully be performed on a single Tweet. What matters is the sentiment across the whole tweet thread and not just the Tweet in isolation. How many times has the Tweet been Retweeted (RT) and by whom? What is the sentiment associated with any Replies? What is the Klout or other influence score associated with the key actors in the thread? This sort of analysis will inevitably eat into REST API rate limits.

So where does this leave us?

Well in the past high volume users such as Klout were on a Whitelist which allowed them to make a much higher number of API requests per hour – 20,000 compared to 350. However, in February 2011 Twitter announced:

“Twitter will no longer grant whitelisting requests. Twitter whitelisting was originally created … at a time when the API had few bulk request options and the Streaming API was not yet available. Since then, we’ve added new, more efficient tools for developers, including lookups, ID lists, authentication and the Streaming API. Instead of whitelisting, developers can use these tools to create applications and integrate with the Twitter platform.”

The Streaming API would seem to be the only way forward then, since the streaming products are designed to “allow high-throughput near-realtime access to subsets of public and protected Twitter data”.

There are 3 Twitter Streaming “products”: The Streaming API, User Streams and Site Streams. The user stream is intended to provide all the update data required for a desktop application after startup once a REST API backfill has been completed. This includes protected statuses such as followings and direct messages.

The main one of interest to us is the Streaming API since this provides filtered public statuses (including replies and mentions) from all users.

Even then, the Streaming API is only part of the equation, for several reasons:

  • Status quality metrics and the data access level limits (Spritzer, Gardenhose, Firehose etc) are applied. This means that some statuses will be filtered out automatically.
  • Duplicate messages can be delivered on the stream.
  • The Streaming API Quality of Service (QoS) is “Best-effort and unordered”. This means that “on rare occasion and without notice, statuses may be missing from the delivered stream“.
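Duplicates at least are cheap to deal with on the client side. Below is a minimal Python sketch that drops any status whose `id` has already been seen, remembering only the most recent ids so memory stays bounded (the window size is an arbitrary assumption). Missing statuses cannot be recovered from the stream itself; that still needs a REST backfill, with all the rate limit issues above.

```python
from collections import OrderedDict
from typing import Iterable, Iterator


def dedupe(statuses: Iterable[dict], window: int = 10_000) -> Iterator[dict]:
    """Drop statuses whose id was already seen among the last `window` ids."""
    seen: OrderedDict = OrderedDict()
    for status in statuses:
        sid = status["id"]
        if sid in seen:
            continue  # duplicate delivery
        seen[sid] = None
        if len(seen) > window:
            seen.popitem(last=False)  # forget the oldest id
        yield status


stream = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}, {"id": 2}]
print([s["id"] for s in dedupe(stream)])  # [1, 2, 3]
```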

For commercial applications which require deep historical search, analysis and data enrichment, what Twitter really wants is for developers to use Gnip, a commercial reseller of Twitter data.

There is an interesting article on Gnip here:

“Last week at Strata, Gnip released a new set of features for its social-stream processing platform. Called Power Track, the new layer allows customers to set up complex search queries and receive a stream of all the Twitter messages that match the criteria. Unlike existing ways of filtering the firehose, there are no limits on how many keywords or results you can receive.

On top of the standard $2,000 a month to rent a Gnip collector it will cost 10 cents for every thousand Twitter messages delivered.

For clients that want *every* Tweet for a keyword, it supplies a comprehensive solution, rather than trying to work around the traditional Twitter search APIs that have restrictions on volume and content. ”

All good stuff, and it starts to put a cost basis on Tweets – get used to them costing $.10 per 1k to receive. The Gnip data enrichment capabilities, e.g. URL expansion and mapping and duplicate exclusion, are also noteworthy. Gnip can even append Klout Scores to Tweets and filter for Tweets by users who have Klout Scores within a specified range – nice!

If you have read this far then thanks for reading! If you want to know why I am so interested in this then please check back over the next couple of weeks for further posts on a “Social Project” that I am working on.


iCFD (Intelligent Customer Front Door)

A quick overview of Genesys iCFD (Intelligent Customer Front Door) since Charlie Isaacs (@charlieisaacs) was Tweeting a bit about it …


iCFD is a solution that spans several Genesys products – specifically Genesys Voice Platform (GVP) 8.1.2+ and Universal Contact Server (UCS) 8.0.3+, with Composer as the IDE.

Context Services are part of Universal Contact Server (UCS) 8.0.x. Context Services is an optional set of features supporting the management and retrieval of data concerning customer service, enabling real-time service personalisation and service continuity. This set of capabilities is the foundation of iCFD and Conversation Manager.

In the future, a Genesys Rules System will be added to Conversation Manager to allow use of rules to control customer treatment:


At the core of the solution is the concept of “Customer Centric Routing”, based on maintaining the context of a conversation spanning multiple interactions across multiple channels:



Clients connect to Context Services (UCS) and send requests, to which UCS responds. Clients communicate with UCS via RESTful web services, using HTTP request methods that are based on the GET, POST, PUT, and DELETE methods.

Clients of Context Services may include Orchestration Server, Genesys Voice Platform (GVP), agent desktops, or any third party application that makes use of real-time customer service information. JSON is used for object serialisation.

Context Services (UCS) uses a flexible schema that allows each application to easily define its own custom attributes – at any time an application can add attributes.

In short, Context Services delivers an extensible framework for identifying, modifying and updating customer profile, service history, and other customer attributes.

Typical usage scenarios of Context Services include:

  • Customer identification
  • Service resumption
  • Customer profile (retrieval and management)
  • Callback offers
  • Service resumption with an agent
  • Proactive notification
  • Schedule callback with enhancement multimedia confirmation


One of the primary features of the Context Services API is the ability to identify customers based on one or more attributes of the customer, known as Identification Keys. Each identification key consists of one or more attributes of the core customer profile, or of any defined extension.

Customer profiles are built on top of legacy UCS Contact Attributes. The operation “Identify Customer” enables an application to retrieve customer profiles based on a few attribute values passed in as parameters, without specifying the customer ID.


Context Services makes use of a model in which customers are associated with any number of Services. Services are composed of any number of States, and States can in turn be composed of any number of Tasks. This three-level structure provides a flexible vocabulary by which organizations store the history of the services that they provide to customers.

Services are customer commitments defined by the business application (IVR, Orchestration, Agent Desktop, etc.) which interacts with the customer. Each service potentially spans multiple interactions over a variety of media channels and should link to a Customer profile as soon as it is created or retrieved through identification operations. In some ways, Services can be considered as workflows.

Services are defined by association to Service Types that are created as Business Attributes in CME. States may be used to represent components of customer service.

Services, States and Tasks exist over some application-defined lifecycle. Upon completion, applications may specify a Disposition. For example, the offering of a new product or service might be recorded as a State of type “Offer another service”. The Disposition might be set to show whether the customer accepted or declined the offer. Information on past declined or accepted offers could then be used to calculate the likelihood that the customer might be interested in the offer at some point in the future.

An anonymous service is a service which is assigned to an anonymous customer.

Services are started using the “POST /services/start” operation. The following specific business attribute fields are validated against specified mapped Business Attributes in CME:

  • service_type
  • state_type
  • task_type
  • application_type
  • resource_type
  • media_type
  • disposition
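To give a feel for what a client call might look like, here is a rough Python sketch that builds (but does not send) a `POST /services/start` request with a JSON body, using only the standard library. The host, base path and body layout (including the `customer_id` key) are hypothetical placeholders; only the operation path and the `service_type` / `media_type` field names come from the description above, and the actual UCS request schema should be checked before relying on any of this.

```python
import json
import urllib.request

BASE_URL = "http://ucs-host:8080/hypothetical-base"  # placeholder, not a real path


def start_service_request(customer_id: str, service_type: str,
                          media_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST /services/start request with a JSON body.

    The body layout is an assumption; service_type and media_type would be
    validated by UCS against the mapped Business Attributes in CME.
    """
    body = {"customer_id": customer_id, "service_type": service_type,
            "media_type": media_type}
    return urllib.request.Request(
        f"{BASE_URL}/services/start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = start_service_request("12345", "Billing", "voice")
print(req.get_method(), req.full_url)
```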

When a customer starts interacting with a service, the application creates a new service resource to manage the service’s context data, and then nested state and task resources to manage further states and tasks’ context data.

A service, state, or task is active if a customer is still interacting with it. In that case, the service, state, or task is started, but not complete. Once the resource is completed, it is no longer part of the active list, but part of the completed list.
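The Service / State / Task hierarchy and the active versus completed split can be pictured as a small in-memory model. This is a hypothetical Python sketch for illustration only, not the UCS schema; the example service and state names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    task_type: str
    completed: bool = False
    disposition: Optional[str] = None


@dataclass
class State:
    state_type: str
    tasks: List[Task] = field(default_factory=list)
    completed: bool = False
    disposition: Optional[str] = None


@dataclass
class Service:
    service_type: str
    states: List[State] = field(default_factory=list)
    completed: bool = False
    disposition: Optional[str] = None

    def complete(self, disposition: Optional[str] = None) -> None:
        """Mark the service completed, optionally recording a Disposition."""
        self.completed = True
        self.disposition = disposition


def split_active(services: List[Service]):
    """Active = started but not completed; everything else is completed."""
    active = [s for s in services if not s.completed]
    done = [s for s in services if s.completed]
    return active, done


offer = State("Offer another service", completed=True, disposition="Declined")
upgrade = Service("Broadband upgrade", states=[offer])
billing = Service("Billing query")
billing.complete("Resolved")
active, done = split_active([upgrade, billing])
print([s.service_type for s in active])  # ['Broadband upgrade']
```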


Apple Push Application Service

Some time ago I started to develop a Genesys Queuing application for the iPhone. The basic idea was to enable Customers to establish the queue time (EWT) for a particular service and then be notified when the queue time was lower than a certain threshold using a “Call Us Now” notification sent via the Apple Push Notification Services (APNS).
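The notification rule itself is simple: fire once, the first time the estimated wait time drops below the Customer's threshold. Here is a minimal Python sketch of that rule. In my application the readings came from Stat Server via the Queue Management service; here they are just an iterable of EWT values in seconds, and `notify` is a hypothetical stand-in for sending the APNS message.

```python
from typing import Callable, Iterable


def watch_ewt(readings: Iterable[float], threshold: float,
              notify: Callable[[float], None]) -> bool:
    """Call notify once, the first time the estimated wait time (seconds)
    drops below the threshold. Returns True if a notification was sent."""
    for ewt in readings:
        if ewt < threshold:
            notify(ewt)  # e.g. send the "Call Us Now" push
            return True
    return False


sent = []
watch_ewt([420, 300, 95, 60], threshold=120, notify=sent.append)
print(sent)  # [95]
```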

Technically, I developed a Queue Management service which provides a secure interface between a custom iPhone application and core Genesys components such as Stat Server and Interaction Server. The application manages the re-attachment of data when the Customer call is received, so that the iPhone application acts as an on-device IVR application to collect information such as the service required and/or account number. On top of this I added some additional services, such as automatically scheduling an advisor-initiated call-back to the iPhone for important Customers, such as those in the dunning cycle.

As reported by Tony Tillyer (@ttillyer), this approach has also been pursued by other developers such as Wyn Owen (@wynowen) with Exodus Software’s HERA.

However, the main problem with this approach is that in order to use Apple Push Notifications (APNS) the user must first find and install the custom iPhone application.

This barrier to deployment may soon be overcome if Apple decide to implement Temporary Location Applications as described in an Apple patent here:

The idea is simple. Deliver a location based service to information savvy iPhone users that wish to receive temporary retail and service-based applications – Apple Push Application Services (APAS) possibly.

In particular, what caught my eye was figure 5B which shows a “Wait Time” application:


So how could I see this working – well in lots of ways!

If we ignore the obvious opportunities for integration with in-store queuing systems such as QMATIC, what if I was walking past my local Vodafone store and got pushed a local application which reminded me that I need to call Vodafone and change my tariff?

On a more local basis what if I was passing my local takeout restaurant and got pushed an application with the menu and some indication of the wait time? In this case, I wonder what could be used as the queue management solution? Maybe an opportunity for Genesys Customer Interaction Management (CIM) outside of the more traditional Contact Centre!