Category Archives: UK Utility Company

Documenting Genesys Strategies

Now that Release 1 has gone live we are in a tidy-up phase, working on documentation while preparing for Release 2 in a few months.

One of the tasks is to update documentation relating to routing strategies. In the past we have taken screenshots from IRD but this is not very efficient. Therefore I have written a little C# application to take a strategy export (XML) and convert it into a Visio diagram.

  1. Login to IRD
  2. Click on the “Export/Import” tab
  3. Click on “Solution export”
  4. Highlight a strategy
  5. Double click on the “Add” column. Double click on the “Select format” column and select “open (*.xml)”
  6. Right click on the strategy and select “Export”
  7. Select a folder for the export and then click on “Select”
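The conversion step above can be illustrated with a minimal Python sketch that reads a strategy export into blocks and links (the element names `strategy`, `block` and `link` are invented for the example — the real IRD export schema differs, and the actual tool is written in C#):

```python
import xml.etree.ElementTree as ET

def parse_strategy(xml_text):
    """Parse a strategy export into (blocks, links).

    NOTE: the tag and attribute names here are illustrative only;
    map them to whatever the real IRD export actually contains.
    """
    root = ET.fromstring(xml_text)
    blocks = {b.get("id"): b.get("type") for b in root.iter("block")}
    links = [(l.get("from"), l.get("to")) for l in root.iter("link")]
    return blocks, links

# A tiny hypothetical export with two blocks and one connection
sample = """
<strategy name="Example">
  <block id="1" type="Entry"/>
  <block id="2" type="Target"/>
  <link from="1" to="2"/>
</strategy>
"""

blocks, links = parse_strategy(sample)
for src, dst in links:
    print(f"{blocks[src]} -> {blocks[dst]}")   # Entry -> Target
```

From here each block maps to a Visio shape and each link to a connector.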


Run the Strategy Analyser:


Here is an example strategy both in IRD and then when exported to Visio:



From a documentation perspective each standard IRD block has a corresponding Visio stencil with custom properties for each of the associated block parameters.

The next step (time permitting) is to add some realtime functionality using Message Server / URS messages to overlay actual calls in the same way as realtime monitoring in IRD. It should also be possible to set “watch” statements on individual blocks.


Release 1 Rollout (and abandoned calls)

After MI and softphone issues were resolved, this week we finally rolled out Release 1 of the solution for this client to the remaining Contact Centre sites. This means we are now live with approximately 2000 advisors handling 35K+ calls per day.

As usual, the go-live was not quite as smooth as we would have liked. By Monday afternoon we were starting to get calls from customers complaining that they had waited a long time in the queue, hung up, called back, and been answered immediately.

Realtime and historical statistics gave no indication of the problem. After a lot of digging by the team we identified that calls were getting stuck on external routing points (ERP) between the Genesys SIP side of the solution and the Avaya side of the solution.

The root cause of the problem was that the Class of Restriction (COR) was not set correctly on a small number of Avaya stations. This resulted in calls getting stuck in the vector associated with the ERP (VDN). Unfortunately, the affected advisors kept getting targeted again and again, which meant that the total number of stuck calls was much higher than the number of incorrectly configured stations (for obvious reasons I won’t tell you how many calls were affected by this problem!).

As a secondary problem we found that there was no skill applied to the ERP VDNs. This caused the call to enter a “black hole”: even though the associated vector was set to target advisors as a backup, there was no Avaya skill / split specified.

Further analysis showed that the maximum time to abandon was 99 minutes. We tracked this down to the “Vector Disconnect Timer” specified in the ACM system parameters:



SAP Gplus Load Balancer

A load balancing mechanism between SAP CRM and multiple Genesys SAP Gplus (ICI) adapter instances seems to be the source of much debate and frustration!

The standard response from SAP and Genesys seems to be to create multiple SAP roles / profiles and to tie these to individual Gplus instances via the SAP Communication Profile.

However, in large deployments (2000+ users) such as at this client this does not really seem a feasible approach.

Further investigation into the Integrated Communication Interface (ICI) between SAP and the Gplus adapters reveals the problem: traditional HTTP load balancing is not possible (at least without SSL termination) since there is no session information in the HTTP header:


Therefore, technically the only option is to implement “user affinity” based on the <user> element in the SOAP header:


So with this in mind I wrote a custom SAP Gplus Load Balancer Service which provides a load balancing mechanism using user affinity between SAP CRM and multiple Genesys SAP Gplus adapter instances.
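The core of the affinity logic can be sketched in a few lines of Python (the instance URLs are hypothetical, and the regex-based header parsing is a deliberate simplification of what the real Windows service does):

```python
import hashlib
import re

# Hypothetical adapter instances -- a real deployment would read
# these from Genesys CME configuration rather than hard-coding them.
GPLUS_INSTANCES = [
    "http://gplus1:8080/ici",
    "http://gplus2:8080/ici",
    "http://gplus3:8080/ici",
]

def extract_user(soap_envelope):
    """Pull the <user> element out of the SOAP header.

    Naive regex for illustration; a production service would use a
    proper XML parser with namespace handling.
    """
    m = re.search(r"<user>([^<]+)</user>", soap_envelope)
    return m.group(1) if m else None

def target_instance(user):
    """Stable user -> instance mapping: the same user always lands on
    the same Gplus adapter, which is the 'user affinity' requirement."""
    digest = hashlib.md5(user.encode()).hexdigest()
    return GPLUS_INSTANCES[int(digest, 16) % len(GPLUS_INSTANCES)]

envelope = "<soap:Header><user>AGENT42</user></soap:Header>"
user = extract_user(envelope)
print(user, "->", target_instance(user))
```

The key property is determinism: every request carrying the same `<user>` value is proxied to the same adapter instance, so session state stays in one place without any session information in the HTTP layer.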

The SAP Gplus Load Balancer is implemented as a Windows service which can be deployed on multiple physical servers and run in parallel to provide resilience and failover capabilities. SAP Gplus Load Balancer instances can be configured as applications of type Third Party Server in Genesys CME, allowing instances to be monitored and controlled using standard Genesys Management Framework components such as Solution Control Interface (SCI).

Here is an architecture diagram:


Let me know if you want more information!


Avaya Call Classification

Some last minute tuning of Avaya Call Classification has been required in the last couple of weeks prior to go-live which is now set for Monday 22/11/2010!

For Release 1 we do not really need any call classification, but since Virtual Hold is using TmakePredictiveCall to initiate callback requests it needed to be tuned. Of course, for Release 2 we will be using Genesys Outbound, so it needed to be done anyway.

As a reminder, on the Avaya core telephony platform, the function of Call Progress Detection (CPD) and Answer Machine Detection (AMD) is provided by TN744E call classifier and tone detector circuit packs. The TN744 also detects Special Intercept Tones (SIT), and can detect fax machines, for example.

Within Avaya Communication Manager, call classification is a pooled resource with each call classifier circuit pack providing eight ports of tone detection. TN2312 (IPSI) circuit packs also provide 8 ports of global call classification each.

During testing we found that it was better to set the priority to use TN2312 (IPSI) resources first and then overflow to TN744E (call classifier) resources. We also found that IPSI firmware versions 49-51 should not be used due to hardware compatibility problems. Hopefully we can upgrade the firmware in the future to take advantage of an improvement in FW49 – “FW49 supports the CM5.2.1 feature that provides enhanced call classification to meet certain country regulations for silent calls. Silent calls are outbound calls arriving at the destination without an agent being connected to the call”. This enhancement is already in TN744E (call classifier) FW3.

Call classification is enabled automatically (when enabled on the switch) when a TmakePredictiveCall request is received via the Genesys Avaya T-Server component. Global Call Classification is enabled on the switch by setting the system parameter “Answer Supervision by Call Classifier” as shown below:


Here are the final call classifier settings that we have settled on (for Virtual Hold anyway!):


Global Classifier Adjustment (dB): 3
USA Default Algorithm? y
Global Busy Tone Detection Adj (db): 0
Cadence Classification After Answer? n


SIT Ineffective Other: answered
SIT Intercept: answered
SIT No Circuit: answered
SIT Reorder: answered
SIT Vacant Code: answered
SIT Unknown: answered
AMD Treatment: answered
Pause Duration (seconds): 0.5
Talk Duration (seconds): 1.5


Disconnect Supervision – In? y Out? y
Answer Supervision Timeout: 0
Administer Timers? n
CONNECT Reliable When Call Leaves ISDN? y


“Magic” Avaya T-Server options

Every Genesys project needs them, and now that we have applied some I am 100% confident the project will go live and all will work OK!

In our case these options relate to the Avaya TSAPI components. We added some additional DNs this week in preparation for go-live and then hit a CTI link disconnected problem on switchover of the primary and backup pair. This resulted in looping where the system kept flipping between the primary and backup.

Genesys found some magic settings to fix this problem. These are:

[Section: tsapi-configuration]




Although these are undocumented T-server options, their meaning can be found in the Avaya TSAPI API documentation:


There is a mention of exactly the same problem in Avaya CCE (Contact Center Express) 4.1 documentation:



Performance Testing and Avaya Overload!

Well, after many attempts and many hours of testing we have finally been able to demonstrate that the solution at this client can support 15000 busy hour call attempts. Here is the evidence courtesy of Empirix Hammer on Call (HOC) reporting (the blip at midnight can be ignored as we closed for 1 minute):


Great job team (you know who you are!)

The final hurdle we had to get over in the last few weeks was driving the Avaya S8730 Media Server into an overload condition. This can be seen quite clearly during performance testing after 8PM:


For information, processor occupancy is defined as the percentage of time the configuration’s processor is busy performing call processing tasks, maintenance tasks, administration tasks, and operating system tasks. Occupancy is further divided into:

  • Static Occupancy (Static Occ) which is the percentage of occupancy used by high priority background processes in support of call processing, maintenance, and administration functions
  • Call Processing Occupancy (CP Occ) which is the percentage of occupancy used by call processing-level processes
  • System Management Occupancy (SM Occ) which is the amount of time taken by lower priority activities such as administration and maintenance command processing
  • Idle Occupancy (Idle Occ) which is the amount of time the processor is unused. There are several factors that drive down this number. These factors may reduce the idle occupancy to almost 0 percent during several 3-minute intervals. On a heavily-loaded configuration with frequent demand testing, the idle occupancy may drop to low levels for longer periods (perhaps 1-2 hours). These situations are normal and do not indicate a problem with the configuration.

It is not desirable for any system to function at 100 percent processor occupancy. Rather, the Static and Call Processing Occupancy should total no more than 75%. By maintaining this 75% limit, other system functions can be performed and bursts of caller activity can also be accommodated.
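The 75% rule is easy to encode as a sanity check (the 10% static occupancy figure below is purely illustrative; the 81% and 35% CP figures are from the tests described here):

```python
def occupancy_ok(static_occ, cp_occ, limit=75.0):
    """Apply the 75% rule: Static + Call Processing occupancy should
    not exceed the limit, leaving headroom for system management
    tasks and bursts of caller activity."""
    return (static_occ + cp_occ) <= limit

# One 3-minute interval from the overload test vs. the final test
print(occupancy_ok(10, 81))  # False -> overloaded
print(occupancy_ok(10, 35))  # True  -> acceptable
```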

The Occupancy report below clearly shows the call processing (CP) occupancy rising to 81% in one 3 minute interval!


In the end the fix was quite simple!

Previously we had been injecting test calls directly over SIP trunks. However, at the end of the day this was producing too many SIP messages for ACM to handle. Therefore, for the final test above we moved to (expensive) test injection over the PSTN and all worked OK.

The Occupancy report below shows the call processing (CP) occupancy rising to a maximum of 35% in one 3 minute interval which is perfectly acceptable:


For future reference this is what we learnt during our diagnostic efforts ….

Doubled Calls = Double Call Processing

A single test call shows 4 connections in total per customer call: with Genesys treatments there are an additional 2 connections on top of the basic 2 (as expected). Thus it can reasonably be expected that the call processing load with Genesys treatments will be doubled:


Note: Tandem calls are those calls into Genesys which then come back out e.g. tromboned calls
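As a back-of-the-envelope sketch of the doubling effect (using the 15000 BHCA figure demonstrated in the test above):

```python
def acm_connections(customer_calls, with_treatments):
    """Each customer call uses 2 ACM connections on its own; Genesys
    treatments trombone the call back through ACM, adding 2 more
    (4 in total), so the call-processing load doubles."""
    per_call = 4 if with_treatments else 2
    return customer_calls * per_call

bhca = 15000  # busy hour call attempts demonstrated in the final test
print(acm_connections(bhca, with_treatments=False))  # 30000
print(acm_connections(bhca, with_treatments=True))   # 60000
```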


Look Ahead Routing (LAR)

MST traces showed a lot of denial events 5008/1191. This means the outgoing SIP INVITE (to Genesys) did not get a response within the period set in the Alternate Route Timer on the routing pattern.

We had this timeout set to 2 seconds (rather than the default of 6 seconds) to fix an OAT defect. Therefore, after 2 seconds without an ACK back to a SIP INVITE, ACM cancels the call and tries another Trunk Group. Setting the Alternate Route Timer lower causes more LAR retries, and hence higher CPU load, than a higher timeout value would.

This assertion is confirmed by the fact that the call connect time (as measured by Empirix) was 3 seconds on average:
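A simplified model of the effect (the 2s/6s timer values and 3s connect time are from above; the retry model itself is a rough approximation, not the exact ACM algorithm):

```python
def lar_retries(connect_time_s, alternate_route_timer_s, trunk_groups):
    """Rough count of extra LAR routing attempts before a call
    connects: ACM cancels and re-routes each time the Alternate Route
    Timer expires before the SIP leg answers, capped by the number of
    configured trunk groups."""
    retries = int(connect_time_s // alternate_route_timer_s)
    return min(retries, trunk_groups - 1)

# With a 2s timer and a 3s average connect time, nearly every call
# costs at least one extra routing attempt; the default 6s timer
# would cost none.
print(lar_retries(3, 2, 4))  # 1
print(lar_retries(3, 6, 4))  # 0
```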

SAT Commands

When multiple System Access Terminal (SAT) administration and maintenance commands are performed per second via the Communication Manager (CM) Operations Support Systems Interface (OSSI), system management processor occupancy can increase very rapidly, thus causing overall CPU occupancy to spike. In some instances this can drive the system into CPU overload.

Great care must be exercised when running CPU intensive SAT administration and maintenance commands. These commands should only be run when the system is processing low call volumes (off hours) and never during busy call traffic periods.

Avaya Considerations

Avaya are a bit coy about stating what the SIP message processing throughput of Communication Manager 5.2.1 SP4 actually is.

The document “Avaya Aura™ Communication Manager System Capacities Table” describes the IP endpoint capacity of this system but not in the context of call attempts and connections.

The document “Avaya Aura™ Communication Manager 5.2.1 SP#5 Release Notes” shows that there are quite a few “SIP issues” which are fixed in every release.

The effect of duplication on SIP message processing should also be considered, e.g. PSN002232u – “H.323 and SIP station capacities and SIP trunk capacities for S8xx0 Servers running Avaya Aura™ Communication Manager 5.2.1” – states that the Software Duplication feature is not optimised for use with SIP endpoints. Fortunately, at this client we are using hardware (DAL 2) duplication.

The following comments in the Avaya Aura™ Communication Manager 6.0 SP#1 Release Notes should not go unread!

“However, note that the capacities specified in that document pertain to general business configurations and may not be valid or recommended for Call Center (CC) solutions. Simultaneously achieving the upper bounds for multiple capacities including SIP trunks may not be possible for real-world CC systems. Call rates and other operational aspects of these CC systems may preclude realizing the maximum limits”

“*** IMPORTANT: All Call Center designs should be reviewed by the Sales Factory Design Center. Call Center designs that involve SIP trunking *must* go through the Sales Factory. ***”

Genesys Considerations

We never got the chance to re-test this, but we suspect that when an overload condition occurs, Genesys SIP Server causes further overload by resending REFER messages without backing off “for several seconds” as it should do according to the SIP specification.

Under load conditions Avaya CM sends back status code 503 (Service Unavailable). The behaviour we observe is that the SIP message (REFER in this case) gets resent multiple times causing additional load.

For reference, overload occurs in the Session Initiation Protocol (SIP) when SIP servers have insufficient resources to process all SIP messages they receive. The SIP protocol specified in RFC 3261 provides the 503 (Service Unavailable) response code as a remedy for servers under overload. However, the current definition of 503 (Service Unavailable) has problems and can in fact amplify an overload condition. There is an Essential Correction to RFC 3261 which relates to this.
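Roughly, correct client behaviour on a 503 looks like this (a sketch, not the actual SIP Server logic; honouring Retry-After follows RFC 3261, and the exponential back-off fallback is my own assumption):

```python
import random

def retry_delay(attempt, retry_after=None, base=2.0, cap=32.0):
    """Back-off schedule for re-sending a request after a 503.

    If the server supplied a Retry-After value, honour it; otherwise
    fall back to capped exponential back-off with jitter rather than
    re-sending immediately -- immediate retries are exactly what
    amplifies the overload condition described above.
    """
    if retry_after is not None:
        return float(retry_after)
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)

# Server said "Retry-After: 5" -> wait exactly 5 seconds
print(retry_delay(0, retry_after=5))  # 5.0
```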

The fix may be in SIP Server 8.0.400.25:

Release Number 8.0.400.25
SIP Server now correctly releases a call when it receives a 503 Service Unavailable message in response to a re-INVITE request that it sent to the call originator. (ER# 248405320)


Performance Test Update

Just a quick update – we are very nearly there!

In the last test we managed to get to 10 calls per second (CPS). This was achieved by injecting calls directly into a separate Avaya SES server.

The issue we now see is that the Avaya S8730 Media Server (aka the ACM main brain!) hits high CPU (occupancy), which slows everything down and results in new calls being rejected. The resulting behaviour is normal in so much as CPU priority is given to call processing (CALPRO process) rather than administration and maintenance processes.

Analysis by Avaya support suggests that the problem is down to the number of AES / CTI links we have to other adjunct systems such as Verint Voice Recording.

CM Service Pack 5 has been suggested as this gives approx. 20% better CPU utilisation on a S8730. However, the root cause will need further investigation e.g. stop non-Genesys adjunct links and re-test. Also we will try increasing the Avaya T-Server query timer from 3 seconds to 10 seconds.


OAT and Performance Test Update

Good progress on both fronts this week!

Default Routing

We had a defect when testing a failure scenario with URS down. Basically calls were not being (Avaya) default routed.

We had the SIP T-Server options default-dn and router-timeout both configured. After 10 seconds (router-timeout), response status “302 Moved Temporarily” is sent back to SES. However, a TAC trace on ACM showed that the call is not re-routed and the channel goes IDLE. This would be the preferred solution, but for some reason Avaya CM does not process “302 Moved Temporarily” correctly.

As an alternative solution we configured Look Ahead Routing (LAR) set to “next” on the Avaya route pattern which puts the call on to a SIP trunk to Genesys in the first place. With this configuration, if there is no response from SIP Server within 2 seconds (before the Genesys router-timeout expires), ACM cancels the call and tries another Trunk Group.

Once all the trunks configured in the route pattern have been tried, the call drops into the next vector step and is default (Avaya) routed. In our configuration we have 4 Trunk Groups, so the call is now default routed after 8-9 seconds.
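The observed fallback time follows directly from the configuration (worst case, ignoring per-hop processing delays):

```python
def default_route_delay(trunk_groups, alternate_route_timer_s):
    """Worst-case time before the call falls through to the next
    vector step: every trunk group in the route pattern is tried
    for the full Alternate Route Timer before LAR gives up."""
    return trunk_groups * alternate_route_timer_s

# 4 trunk groups x 2 second timer, matching the observed 8-9 seconds
print(default_route_delay(4, 2))  # 8
```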

SES Crashes

During an Empirix performance test this week we managed to crash SES. This was resolved by installing Service Pack SP4a on top of the current version (SES 5.2.0 SP2a).

SIP Trunk Utilisation

We did some tweaking of Trunk Group members and SES Media Server address maps this week.

In our configuration, a SES Media Server address map exists for each Media Server (CLAN interface). Each CLAN interface is associated with an ACM signaling group and trunk group on a 1:1 basis. Each trunk group was defined as two-way with 250 members.

Even though a CLAN interface can technically support 400+ SIP trunks (channels), it is only possible to configure up to 255 members in each trunk group. Therefore we needed to add some additional “shadow” trunk groups configured with the same Near-end node name (i.e. CLAN interface) to be able to increase the number of channels assigned to that CLAN.
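The arithmetic behind the shadow trunk groups (the 400-channel figure is the CLAN capacity noted above):

```python
import math

def trunk_groups_needed(channels, max_members=255):
    """A CLAN can carry 400+ SIP channels but a trunk group is capped
    at 255 members, so extra 'shadow' trunk groups on the same
    near-end node name are needed to use the full CLAN capacity."""
    return math.ceil(channels / max_members)

print(trunk_groups_needed(400))  # 2
```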

On the SES end, Address Map Priorities assign a priority to each address map. This priority determines the order in which the proxy tries to match an incoming call pattern to an address map pattern. For example, if an incoming call pattern matched 4 address map patterns, the proxy would route the call to the address map with the highest priority. However, this does not take into account the utilisation of the underlying Media Server (CLAN), so the first matching address map will always be used.

To allow for this during Empirix performance testing, whereby we are injecting calls directly into SES, we needed to split the address maps into different number ranges and assign each of these maps to a separate CLAN interface.
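In outline the range split looks like this (the number ranges below are invented for illustration; the real test ranges differed):

```python
# Hypothetical address maps: each dialled-number range pins the
# Empirix test traffic to one CLAN interface.
ADDRESS_MAPS = [
    (3000000, 3099999, "CLAN-1"),
    (3100000, 3199999, "CLAN-2"),
    (3200000, 3299999, "CLAN-3"),
]

def route_to_clan(dialled):
    """First matching address map wins (SES does not balance across
    maps by utilisation), so the ranges themselves spread the load."""
    n = int(dialled)
    for lo, hi, clan in ADDRESS_MAPS:
        if lo <= n <= hi:
            return clan
    return None

print(route_to_clan("3150000"))  # CLAN-2
```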

This is the final configuration we came up with:


UPDATE: Further testing has shown a problem with this configuration and we are now moving to a configuration with a completely separate / standalone SES server to inject Empirix calls in to.