Avaya Call Classification

Some last-minute tuning of Avaya Call Classification has been required in the couple of weeks prior to go-live, which is now set for Monday 22/11/2010!

For Release 1 we do not really need to use any call classification, but since Virtual Hold is using TmakePredictiveCall to initiate callback requests it needed to be tuned. Of course, for Release 2 we will be using Genesys outbound, so it would have needed to be done anyway.

As a reminder, on the Avaya core telephony platform, Call Progress Detection (CPD) and Answer Machine Detection (AMD) are provided by TN744E call classifier and tone detector circuit packs. The TN744E also detects Special Information Tones (SIT) and can, for example, detect fax machines.

Within Avaya Communication Manager, call classification is a pooled resource with each call classifier circuit pack providing eight ports of tone detection. TN2312 (IPSI) circuit packs also provide 8 ports of global call classification each.

During testing we found that it was better to set the priority to use TN2312 (IPSI) resources first and then overflow to TN744E (call classifier) resources. We also found out that IPSI firmware 49-51 should not be used due to hardware compatibility problems. Hopefully we can upgrade the firmware in the future to take advantage of an improvement in FW49: “FW49 supports the CM5.2.1 feature that provides enhanced call classification to meet certain country regulations for silent calls. Silent calls are outbound calls arriving at the destination without an agent being connected to the call”. This enhancement is already in TN744E (call classifier) FW3.
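The "IPSI first, then overflow" preference can be pictured with a small model. This is only an illustrative sketch of priority-ordered allocation from a pooled resource, not how Communication Manager implements it; the class, function and pack names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassifierPack:
    """One circuit pack contributing tone-detection ports to the pool."""
    name: str
    kind: str          # "TN2312" (IPSI) or "TN744E" (call classifier)
    ports: int = 8     # both pack types provide eight ports each
    in_use: int = 0

    def free(self) -> int:
        return self.ports - self.in_use


# Assumed preference order: IPSI ports first, then overflow to TN744E.
PRIORITY = ["TN2312", "TN744E"]


def allocate_port(packs):
    """Seize one port from the highest-priority pack kind with capacity.

    Returns the pack used, or None when the whole pool is exhausted.
    """
    for kind in PRIORITY:
        for pack in packs:
            if pack.kind == kind and pack.free() > 0:
                pack.in_use += 1
                return pack
    return None


# Hypothetical pool: one call classifier pack and one IPSI pack.
pool = [
    ClassifierPack("classifier-1", "TN744E"),
    ClassifierPack("ipsi-1", "TN2312"),
]
used = [allocate_port(pool) for _ in range(17)]
```

In this toy pool the first eight requests land on the IPSI pack, the next eight overflow to the TN744E, and the seventeenth finds no free port.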

Call classification is enabled automatically (when enabled on the switch) when a TmakePredictiveCall request is received via the Genesys Avaya T-Server component. Global Call Classification is enabled on the switch by setting the system parameter “Answer Supervision by Call Classifier” as shown below:

Image

Here are the final call classifier settings that we have settled on (for Virtual Hold anyway!):

SYSTEM PARAMETERS OCM-CALL-CLASSIFICATION:

TONE DETECTION PARAMETERS:
Global Classifier Adjustment (dB): 3
USA Default Algorithm? y
Global Busy Tone Detection Adj (db): 0
Cadence Classification After Answer? n

SIT TREATMENT FOR CALL CLASSIFICATION:

SIT Ineffective Other: answered
SIT Intercept: answered
SIT No Circuit: answered
SIT Reorder: answered
SIT Vacant Code: answered
SIT Unknown: answered
AMD Treatment: answered
Pause Duration (seconds): 0.5
Talk Duration (seconds): 1.5

TRUNK PARAMETERS:

Disconnect Supervision – In? y Out? y
Answer Supervision Timeout: 0
Administer Timers? n
CONNECT Reliable When Call Leaves ISDN? y


“Magic” Avaya T-Server options

Every Genesys project needs them, and now that we have applied some I am 100% confident the project will go live and all will work OK!

In our case these options relate to the Avaya TSAPI components. We added some additional DNs this week in preparation for go-live and then hit a "CTI link disconnected" problem on switchover of the primary and backup pair. This resulted in looping, where the system kept flipping back and forth between the primary and the backup.

Genesys found some magic settings to fix this problem. These are:

[Section: tsapi-configuration]

recv-extra-bufs=100

recv-q-size=100

send-q-size=100

Although these are undocumented T-Server options, their meaning can be found in the Avaya TSAPI API documentation:

Image

There is a mention of exactly the same problem in Avaya CCE (Contact Center Express) 4.1 documentation:

Image


Performance Testing and Avaya Overload!

Well, after many attempts and many hours of testing, we have finally been able to demonstrate that the solution at this client can support 15,000 busy hour call attempts. Here is the evidence, courtesy of Empirix Hammer on Call (HOC) reporting (the blip at midnight can be ignored, as we went closed for 1 minute):

Image

Great job team (you know who you are!)

The final hurdle we had to get over in the last few weeks was driving the Avaya S8730 Media Server into an overload condition. This can be seen quite clearly during performance testing after 8PM:

Image

For information, processor occupancy is defined as the percentage of time the configuration’s processor is busy performing call processing tasks, maintenance tasks, administration tasks, and operating system tasks. Occupancy is further divided into:

  • Static Occupancy (Static Occ) which is the percentage of occupancy used by high priority background processes in support of call processing, maintenance, and administration functions
  • Call Processing Occupancy (CP Occ) which is the percentage of occupancy used by call processing-level processes
  • System Management Occupancy (SM Occ) which is the amount of time taken by lower priority activities such as administration and maintenance command processing
  • Idle Occupancy (Idle Occ) which is the amount of time the processor is unused. There are several factors that drive down this number. These factors may reduce the idle occupancy to almost 0 percent during several 3-minute intervals. On a heavily-loaded configuration with frequent demand testing, the idle occupancy may drop to low levels for longer periods (perhaps 1-2 hours). These situations are normal and do not indicate a problem with the configuration.

It is not desirable for any system to function at 100 percent processor occupancy. Rather, the Static and Call Processing Occupancy should total no more than a maximum of 75%. By maintaining this 75% maximum limit, other system functions can be performed and bursts of caller activity can also be accommodated.
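The 75% guideline above is easy to check against occupancy figures. The following is a minimal sketch; the function names and the sample figures are made up for illustration and are not taken from the reports in this post.

```python
def occupancy_headroom(static_occ, cp_occ, limit=75.0):
    """Return the headroom (percentage points) left under the limit.

    A negative result means Static + Call Processing occupancy exceeds
    the guideline for that interval.
    """
    return limit - (static_occ + cp_occ)


def is_overloaded(static_occ, cp_occ, limit=75.0):
    """True when Static + CP occupancy is above the guideline."""
    return occupancy_headroom(static_occ, cp_occ, limit) < 0


# Illustrative 3-minute samples as (Static Occ %, CP Occ %); made-up values.
samples = [(12.0, 35.0), (12.0, 70.0)]
for static_occ, cp_occ in samples:
    print(static_occ, cp_occ,
          occupancy_headroom(static_occ, cp_occ),
          is_overloaded(static_occ, cp_occ))
```

With these sample values the first interval has 28 points of headroom, while the second is 7 points over the limit.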

The Occupancy report below clearly shows the call processing (CP) occupancy rising to 81% in one 3-minute interval!

Image

In the end the fix was quite simple!

Previously we had been injecting test calls directly over SIP trunks. However, this was producing too many SIP messages for ACM to handle. For the final test above we therefore switched to (expensive) test injection over the PSTN, and all worked OK.

The Occupancy report below shows the call processing (CP) occupancy rising to a maximum of 35% in one 3-minute interval, which is perfectly acceptable:

Image

For future reference, this is what we learnt during our diagnostic efforts:

Doubled Calls = Double Call Processing

A single test call shows 4 connections in total per customer call, of which 2 are additional connections created by the Genesys treatments (as expected). The call processing load with Genesys treatments can therefore reasonably be expected to double:

Image

Note: Tandem calls are those calls into Genesys which then come back out, e.g. tromboned calls.

Image
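As a rough worked example of the doubling argument, the connection counts can be scaled up to the tested busy hour volume. This sketch assumes that call processing load grows linearly with the number of connections, which is a simplification; the variable and function names are illustrative.

```python
BHCA = 15_000                  # busy hour call attempts from the test above
CONNECTIONS_WITH_GENESYS = 4   # connections seen per customer call
EXTRA_FROM_TREATMENTS = 2      # of which added by the Genesys treatments

# Connections a customer call would use without the Genesys treatments.
baseline_per_call = CONNECTIONS_WITH_GENESYS - EXTRA_FROM_TREATMENTS


def connections_per_hour(bhca, per_call):
    """Total connections handled in the busy hour."""
    return bhca * per_call


without_genesys = connections_per_hour(BHCA, baseline_per_call)
with_genesys = connections_per_hour(BHCA, CONNECTIONS_WITH_GENESYS)
load_ratio = with_genesys / without_genesys

print(without_genesys, with_genesys, load_ratio)
```

Under that linear assumption, 30,000 connections per busy hour become 60,000 once the treatments are in the call flow.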

Look Ahead Routing (LAR)

MST traces showed a lot of denial events 5008/1191. This means the outgoing SIP INVITE (to Genesys) did not get a response within the period set in the Alternate Route Timer on the routing pattern.

We had this timeout set to 2 seconds (rather than the default of 6 seconds) to fix an OAT defect. Therefore, if there is no response to a SIP INVITE within 2 seconds, ACM cancels the call and tries another Trunk Group. A lower Alternate Route Timer causes more LAR retries, and therefore a higher CPU load, than a higher timeout value would.

This is consistent with the average call connect time (as measured by Empirix) of 3 seconds, which is longer than the 2-second timer:
Image
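To see why a 2-second timer hurts when responses take around 3 seconds, here is a minimal sketch that counts how many INVITEs would trigger a LAR retry for a given timer value. The response times are made-up figures that happen to average 3 seconds, and treating the measured connect time as the INVITE response time is an assumption.

```python
def lar_retry_fraction(response_times, alternate_route_timer):
    """Fraction of INVITEs that get no response before the timer expires."""
    if not response_times:
        return 0.0
    late = sum(1 for t in response_times if t > alternate_route_timer)
    return late / len(response_times)


# Hypothetical INVITE response times in seconds (mean = 3.0).
sample = [1.0, 2.5, 3.0, 3.5, 5.0]

at_2s = lar_retry_fraction(sample, 2.0)   # the timer value we had set
at_6s = lar_retry_fraction(sample, 6.0)   # the default timer value

print(at_2s, at_6s)
```

For this sample, four out of five calls would be retried on another trunk group with the 2-second timer, and none with the 6-second default.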

SAT Commands

When multiple System Access Terminal (SAT) administration and maintenance commands are performed per second via the Communication Manager (CM) Operations Support Systems Interface (OSSI), system management processor occupancy can increase very rapidly, thus causing overall CPU occupancy to spike. In some instances this can drive the system into CPU overload.

Great care must be exercised when running CPU intensive SAT administration and maintenance commands. These commands should only be run when the system is processing low call volumes (off hours) and never during busy call traffic periods.

Avaya Considerations

Avaya are a bit coy about stating what the SIP message processing throughput of Communication Manager 5.2.1 SP4 actually is.

The document “Avaya Aura™ Communication Manager System Capacities Table” describes the IP endpoint capacity of this system but not in the context of call attempts and connections.

The document “Avaya Aura™ Communication Manager 5.2.1 SP#5 Release Notes” shows that there are quite a few “SIP issues” which are fixed in every release.

The effect of duplication on SIP message processing should be considered, e.g. PSN002232u – “H.323 and SIP station capacities and SIP trunk capacities for S8xx0 Servers running Avaya Aura™ Communication Manager 5.2.1” states that the Software Duplication feature is not optimised for use with SIP endpoints. Fortunately, at this client we are using hardware (DAL 2) duplication.

The following comments in the Avaya Aura™ Communication Manager 6.0 SP#1 Release Notes should not go unread!

“However, note that the capacities specified in that document pertain to general business configurations and may not be valid or recommended for Call Center (CC) solutions. Simultaneously achieving the upper bounds for multiple capacities including SIP trunks may not be possible for real-world CC systems. Call rates and other operational aspects of these CC systems may preclude realizing the maximum limits”

“*** IMPORTANT: All Call Center designs should be reviewed by the Sales Factory Design Center. Call Center designs that involve SIP trunking *must* go through the Sales Factory. ***”

Genesys Considerations

We never got the chance to re-test this, but we suspect that when an overload condition occurs, Genesys SIP Server causes further overload by resending REFER messages without backing off “for several seconds” as it should do according to the SIP specification.

Under load conditions Avaya CM sends back status code 503 (Service Unavailable). The behaviour we observe is that the SIP message (REFER in this case) gets resent multiple times causing additional load.

For reference, overload occurs in the Session Initiation Protocol (SIP) when SIP servers have insufficient resources to process all SIP messages they receive. The SIP protocol specified in RFC 3261 provides the 503 (Service Unavailable) response code as a remedy for servers under overload. However, the current definition of 503 (Service Unavailable) has problems and can in fact amplify an overload condition. There is an Essential Correction to RFC 3261 which relates to this. Please see http://tools.ietf.org/html/draft-hilt-sip-correction-503-01
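The back-off behaviour we would expect from a well-behaved client can be sketched as follows. This is not how Genesys SIP Server works internally; it is a simplified illustration of holding off requests to a server that has answered 503. The class name and the 5-second default (used when no Retry-After value is supplied) are assumptions.

```python
class OverloadGate:
    """Tracks servers that answered 503 and holds off sending to them."""

    def __init__(self, default_backoff=5.0):
        # Assumed hold-off (seconds) when the 503 carries no Retry-After.
        self.default_backoff = default_backoff
        self._blocked_until = {}

    def on_response(self, server, status, now, retry_after=None):
        """Record a response; a 503 blocks the server for a while."""
        if status == 503:
            wait = retry_after if retry_after is not None else self.default_backoff
            self._blocked_until[server] = now + wait

    def may_send(self, server, now):
        """True when no hold-off is active for this server."""
        return now >= self._blocked_until.get(server, 0.0)


gate = OverloadGate()
# Hypothetical event: "acm" answers a REFER with 503 and Retry-After: 10.
gate.on_response("acm", 503, now=100.0, retry_after=10.0)

print(gate.may_send("acm", 105.0))   # still inside the hold-off window
print(gate.may_send("acm", 110.0))   # hold-off has expired
```

A sender that consults such a gate before each retransmission would stop adding load while the switch recovers, instead of resending the REFER immediately.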

The fix may be in SIP Server 8.0.400.25:

Release Number 8.0.400.25
SIP Server now correctly releases a call when it receives a 503 Service Unavailable message in response to a re-INVITE request that it sent to the call originator. (ER# 248405320)
