Usual suspects in the usual place!
Merry Christmas guys and see you next year.
Some last minute tuning of Avaya Call Classification has been required in the last couple of weeks prior to go-live which is now set for Monday 22/11/2010!
For Release 1 we do not really need to use any call classification but since Virtual Hold is using TmakePredictiveCall to initiate callback requests it needed to be tuned. Of course, for Release 2 we will be using Genesys outbound so it still needed to be done.
As a reminder, on the Avaya core telephony platform, the function of Call Progress Detection (CPD) and Answer Machine Detection (AMD) is provided by TN744E call classifier and tone detector circuit packs. The TN744E also detects Special Information Tones (SIT) and other signals such as fax machine tones.
Within Avaya Communication Manager, call classification is a pooled resource with each call classifier circuit pack providing eight ports of tone detection. TN2312 (IPSI) circuit packs also provide 8 ports of global call classification each.
During testing we found that it was better to set the priority to use TN2312 (IPSI) resources first and then overflow to TN744E (call classifier) resources. We also found out that IPSI firmware 49-51 should not be used due to hardware compatibility problems. Hopefully we can upgrade the firmware in the future to take advantage of an improvement in FW49 – “FW49 supports the CM5.2.1 feature that provides enhanced call classification to meet certain country regulations for silent calls. Silent calls are outbound calls arriving at the destination without an agent being connected to the call”. This enhancement is already in TN744E (call classifier) FW3.
Call classification is enabled automatically (when enabled on the switch) when a TmakePredictiveCall request is received via the Genesys Avaya T-Server component. Global Call Classification is enabled on the switch by setting the system parameter “Answer Supervision by Call Classifier” as shown below:
Here are the final call classifier settings that we have settled on (for Virtual Hold anyway!):
SYSTEM PARAMETERS OCM-CALL-CLASSIFICATION:
TONE DETECTION PARAMETERS:
Global Classifier Adjustment (dB): 3
USA Default Algorithm? y
Global Busy Tone Detection Adj (db): 0
Cadence Classification After Answer? n
SIT TREATMENT FOR CALL CLASSIFICATION:
SIT Ineffective Other: answered
SIT Intercept: answered
SIT No Circuit: answered
SIT Reorder: answered
SIT Vacant Code: answered
SIT Unknown: answered
AMD Treatment: answered
Pause Duration (seconds): 0.5
Talk Duration (seconds): 1.5
Disconnect Supervision – In? y Out? y
Answer Supervision Timeout: 0
Administer Timers? n
CONNECT Reliable When Call Leaves ISDN? y
Every Genesys project needs them, and now that we have applied some I am 100% confident the project will go live and all will work OK!
In our case these options relate to the Avaya TSAPI components. We added some additional DNs this week in preparation for go-live and then hit a CTI link disconnected problem on switchover of the primary and backup pair. This resulted in looping where the system kept flipping back and forth between the primary and backup.
Genesys found some magic settings to fix this problem. These are:
Although these are undocumented T-server options, their meaning can be found in the Avaya TSAPI API documentation:
There is a mention of exactly the same problem in Avaya CCE (Contact Center Express) 4.1 documentation:
Well after many attempts and many hours of testing we have finally been able to demonstrate that the solution at this client can support 15000 busy hour call attempts. Here is the evidence courtesy of Empirix Hammer on Call (HOC) reporting (the blip at midnight can be ignored as we went closed for 1 minute):
Great job team (you know who you are!)
The final hurdle we had to get over in the last few weeks was driving the Avaya S8730 Media Server into an overload condition. This can be seen quite clearly during performance testing after 8PM:
For information, processor occupancy is defined as the percentage of time the configuration’s processor is busy performing call processing tasks, maintenance tasks, administration tasks, and operating system tasks. Occupancy is further divided into:
It is not desirable for any system to function at 100 percent processor occupancy. Rather, the Static and Call Processing Occupancy should total no more than a maximum of 75%. By maintaining this 75% maximum limit, other system functions can be performed and bursts of caller activity can also be accommodated.
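The 75% guideline can be expressed as a simple check. This is only a sketch: the function name and the sample figures are ours, not taken from any Avaya tool, and the real numbers come from the CM occupancy reports.

```python
# Guideline from the text: static + call processing occupancy should not exceed 75%.
MAX_COMBINED_OCCUPANCY_PCT = 75.0


def occupancy_ok(static_pct: float, call_processing_pct: float) -> bool:
    """Return True if static + call processing occupancy is within the guideline."""
    return static_pct + call_processing_pct <= MAX_COMBINED_OCCUPANCY_PCT


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(occupancy_ok(static_pct=10.0, call_processing_pct=35.0))
    print(occupancy_ok(static_pct=10.0, call_processing_pct=70.0))
```

For instance, a call processing figure of 81% fails the check even with static occupancy taken as zero, while 35% passes with up to 40% static occupancy.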
The Occupancy report below clearly shows the call processing (CP) occupancy rising to 81% in one 3 minute interval!
In the end the fix was quite simple!
Previously we had been injecting test calls in directly over SIP trunks. However at the end of the day this was producing too many SIP messages for ACM to handle. Therefore, for the final test above we went to (expensive) test injection over the PSTN and all worked OK.
The Occupancy report below shows the call processing (CP) occupancy rising to a maximum of 35% in one 3 minute interval which is perfectly acceptable:
For future reference this is what we learnt during our diagnostic efforts ….
Doubled Calls = Double Call Processing
A single test call shows 4 connections in total per customer call. Hence with Genesys treatments there are an additional 2 connections (as expected). Thus it can be reasonably expected that the call processing load with Genesys treatments will be doubled:
Note: Tandem calls are those calls into Genesys which then come back out e.g. tromboned calls
Look Ahead Routing (LAR)
MST traces showed a lot of denial events 5008/1191. This means the outgoing SIP INVITE (to Genesys) did not get a response within the period set in the Alternate Route Timer on the routing pattern.
We had this timeout set to 2 seconds (rather than the default of 6 seconds) to fix an OAT defect. Therefore, if there is no response to a SIP INVITE after 2 seconds, ACM cancels the call and tries another Trunk Group. A lower Alternate Route Timer causes more LAR retries, and therefore higher CPU load, than a higher value would.
When multiple System Access Terminal (SAT) administration and maintenance commands are performed per second via the Communication Manager (CM) Operations Support Systems Interface (OSSI), system management processor occupancy can increase very rapidly, thus causing overall CPU occupancy to spike. In some instances this can drive the system into CPU overload.
Great care must be exercised when running CPU intensive SAT administration and maintenance commands. These commands should only be run when the system is processing low call volumes (off hours) and never during busy call traffic periods.
Avaya are a bit coy about stating what the SIP message processing throughput of Communication Manager 5.2.1 SP4 actually is.
The document “Avaya Aura™ Communication Manager System Capacities Table” describes the IP endpoint capacity of this system but not in the context of call attempts and connections.
The document “Avaya Aura™ Communication Manager 5.2.1 SP#5 Release Notes” shows that there are quite a few “SIP issues” which are fixed in every release.
The effect of duplication on SIP message processing should be considered e.g. PSN002232u – “H.323 and SIP station capacities and SIP trunk capacities for S8xx0 Servers running Avaya Aura™ Communication Manager 5.2.1” stated that Software Duplication feature is not optimised for use with SIP endpoints. Fortunately, at this client we are using hardware (DAL 2) duplication.
The following comments in the Avaya Aura™ Communication Manager 6.0 SP#1 Release Notes should not go unread!
“However, note that the capacities specified in that document pertain to general business configurations and may not be valid or recommended for Call Center (CC) solutions. Simultaneously achieving the upper bounds for multiple capacities including SIP trunks may not be possible for real-world CC systems. Call rates and other operational aspects of these CC systems may preclude realizing the maximum limits”
“*** IMPORTANT: All Call Center designs should be reviewed by the Sales Factory Design Center. Call Center designs that involve SIP trunking *must* go through the Sales Factory. ***”
We never got the chance to re-test this but we suspect that when an overload condition occurs, Genesys SIP server causes further overload by resending REFER messages without backing off “for several seconds” as it should do according to the SIP specification.
Under load conditions Avaya CM sends back status code 503 (Service Unavailable). The behaviour we observe is that the SIP message (REFER in this case) gets resent multiple times causing additional load.
For reference, overload occurs in the Session Initiation Protocol (SIP) when SIP servers have insufficient resources to process all SIP messages they receive. The SIP protocol specified in RFC 3261 provides the 503 (Service Unavailable) response code as a remedy for servers under overload. However, the current definition of 503 (Service Unavailable) has problems and can in fact amplify an overload condition. There is an Essential Correction to RFC 3261 which relates to this. Please see http://tools.ietf.org/html/draft-hilt-sip-correction-503-01
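To illustrate what "backing off" means here, below is a minimal sketch of how a sender could choose its wait time after a 503. It is a generic illustration, not how Genesys SIP Server or Avaya CM is implemented: the function name and the `base_s` and `cap_s` defaults are our own assumptions. It honours a Retry-After value if the 503 carried one, and otherwise doubles the delay on each attempt up to a cap.

```python
from typing import Optional


def next_retry_delay(attempt: int,
                     retry_after_s: Optional[float] = None,
                     base_s: float = 0.5,
                     cap_s: float = 32.0) -> float:
    """Seconds to wait before resending a request that was answered with 503.

    attempt       -- 0 for the first resend, 1 for the second, and so on
    retry_after_s -- value of the Retry-After header, if the 503 carried one
    base_s, cap_s -- hypothetical tuning values for the exponential backoff
    """
    if retry_after_s is not None:
        # The server told us how long to stay away, so respect that.
        return max(retry_after_s, 0.0)
    # No hint from the server: double the wait each time, up to the cap.
    return min(cap_s, base_s * (2 ** attempt))


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, next_retry_delay(attempt))
```

The point is simply that each resend should wait longer than the last, rather than hitting an already overloaded server again immediately.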
The fix may be in SIP Server 8.0.400.25:
Release Number 8.0.400.25
SIP Server now correctly releases a call when it receives a 503 Service Unavailable message in response to a re-INVITE request that it sent to the call originator. (ER# 248405320)
Just a quick update – we are very nearly there!
In the last test we have managed to get to 10 calls per second (CPS). This was achieved by injecting calls directly into a separate Avaya SES server.
The issue we now see is that the Avaya S8730 Media Server (aka ACM main brain!) hits high CPU (occupancy) which slows everything down and results in new calls being rejected. The resulting behaviour is normal in so much as CPU priority is given to call processing (CALPRO process) rather than administrator and maintenance processes.
Analysis by Avaya support suggests that the problem is down to the number of AES / CTI links we have to other adjunct systems such as Verint Voice Recording.
CM Service Pack 5 has been suggested as this gives approx. 20% better CPU utilisation on a S8730. However, the root cause will need further investigation e.g. stop non-Genesys adjunct links and re-test. Also we will try increasing the Avaya T-Server query timer from 3 seconds to 10 seconds.
Finally got round to adding a GUI to my Genesys Test Utility (GTU).
GTU is a simple Microsoft Windows application which allows telephony commands to be executed either on a single extension or ACD login or concurrently on a group of extensions or ACD logins.
Test Advisors are configured in the “agents.xml” configuration file. The format of this file is exactly the same as for Empirix VAS (Virtual Agent Simulator).
Also took the opportunity to add some new features such as:
Here are some more screenshots:
Good progress on both fronts this week!
We had a defect when testing a failure scenario with URS down. Basically calls were not being (Avaya) default routed.
We had SIP T-server options default-dn and router-timeout both configured. After 10 seconds (router-timeout), response status “302 Moved Temporarily” is sent back to SES. However, a TAC trace on ACM showed that the call is not re-routed and the channel goes IDLE. This is the preferred solution but for some reason Avaya CM does not process “302 Moved Temporarily” correctly.
As an alternative solution we configured Look Ahead Routing (LAR) set to “next” on the Avaya route pattern which puts the call on to a SIP trunk to Genesys in the first place. With this configuration, after no response on SIP Server after 2 seconds and before the Genesys router-timeout expires, ACM cancels the call and tries another Trunk Group.
Once all the trunks configured in the route pattern have been tried the call now drops into the next vector step and is default (Avaya) routed. In our configuration we have 4 Trunk Groups so the call is now default routed after 8-9 seconds.
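The arithmetic behind that figure is just the number of Trunk Groups multiplied by the Alternate Route Timer. A small sketch follows (the function name is ours, and it ignores any per-attempt processing overhead, which presumably accounts for the extra second we observed):

```python
def time_to_default_route(trunk_groups: int, alternate_route_timer_s: float) -> float:
    """Worst-case seconds spent trying every Trunk Group before default routing."""
    return trunk_groups * alternate_route_timer_s


if __name__ == "__main__":
    # Our configuration: 4 Trunk Groups, 2 second Alternate Route Timer.
    print(time_to_default_route(4, 2))
    # Same route pattern with the default 6 second timer.
    print(time_to_default_route(4, 6))
```

With the 2 second timer the call falls through in about 8 seconds, inside the 10 second router-timeout. With the default 6 second timer it would take 24 seconds, long after the router-timeout has expired.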
During an Empirix performance test this week we managed to crash SES. This was resolved by installing Service Pack SP4a on top of the current version (SES 5.2.0 SP2a).
SIP Trunk Utilisation
We did some tweaking of Trunk Group members and SES Media Server address maps this week.
In our configuration, a SES Media Server address map exists for each Media Server (CLAN interface). Each CLAN interface is associated with an ACM signaling group and trunk group on a 1:1 basis. Each trunk was defined as two-way and each had 250 members.
Even though a CLAN interface can technically support 400+ SIP trunks (channels) it is only possible to configure up to 255 members in each trunk group. Therefore we needed to add some additional “shadow” trunk groups configured with the same Near-end node name (i.e. CLAN interface) to be able to increase the number of channels assigned to that CLAN.
On the SES end, Address Map Priorities assign a priority to each address map. This priority determines the order in which the proxy tries to match an incoming call pattern to an address map pattern. For example, if an incoming call pattern matched 4 address map patterns, the proxy would route the call to the address map with the highest priority. This does not take into account the utilisation of the underlying Media Server (CLAN). Therefore the first matching address map will always be used.
To allow for this during Empirix performance testing, whereby we are injecting calls directly into SES, we needed to split out the address maps into different number ranges and assign each of these maps to a separate CLAN interface.
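As a quick sanity check on how many trunk groups a CLAN needs, here is a small sketch. The 255 member limit is the one described above; the function names and the 400 channel example are illustrative only.

```python
import math
from typing import List

MAX_MEMBERS_PER_TRUNK_GROUP = 255  # limit described in the text


def trunk_groups_needed(channels: int) -> int:
    """Number of trunk groups (main plus shadow) needed for the given channel count."""
    return math.ceil(channels / MAX_MEMBERS_PER_TRUNK_GROUP)


def split_members(channels: int) -> List[int]:
    """Member count for each trunk group, filling each group before starting the next."""
    groups = []
    remaining = channels
    while remaining > 0:
        members = min(remaining, MAX_MEMBERS_PER_TRUNK_GROUP)
        groups.append(members)
        remaining -= members
    return groups


if __name__ == "__main__":
    print(trunk_groups_needed(400))
    print(split_members(400))
```

So 400 channels on one CLAN needs two trunk groups, for example one with 255 members and a shadow group with the remaining 145.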
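The first-match behaviour, and why splitting the number ranges helps, can be sketched as follows. This is a toy model only: the patterns, CLAN names and function name are made up, the list is assumed to be already sorted with the highest priority map first, and real SES address map patterns have their own syntax.

```python
import re
from typing import List, Optional, Tuple

# (pattern, clan) pairs, assumed to be ordered highest priority first.
AddressMaps = List[Tuple[str, str]]


def select_clan(dialled: str, maps: AddressMaps) -> Optional[str]:
    """Return the CLAN of the first address map whose pattern matches, else None."""
    for pattern, clan in maps:
        if re.match(pattern, dialled):
            return clan  # first match wins, CLAN utilisation is never considered
    return None


if __name__ == "__main__":
    # Overlapping maps: every call lands on the first CLAN.
    overlapping = [(r"^7\d{3}$", "clan1"), (r"^7\d{3}$", "clan2")]
    print(select_clan("7001", overlapping), select_clan("7150", overlapping))

    # Split number ranges: calls are spread by dialled number.
    split = [(r"^70\d{2}$", "clan1"), (r"^71\d{2}$", "clan2")]
    print(select_clan("7001", split), select_clan("7150", split))
```

With overlapping patterns the second CLAN never sees a call, whereas with split ranges the load injector controls the spread simply by which numbers it dials.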
This is the final configuration we came up with:
UPDATE: Further testing has shown a problem with this configuration and we are now moving to a configuration with a completely separate / standalone SES server to inject Empirix calls in to.
Rob from Empirix had a “little” accident this week. No excuse for not cracking on with a baseline test though!
Get well soon mate.
As feared, our Avaya SIP interoperability issues from last year have come back when we upgraded to SIP Server 8.0.400.25 in order to fix a Stream Manager resilience problem (see earlier post – Release 1 Operational Acceptance Testing)
SIP INVITE messages are now getting bounced with error 416 – Unsupported URI Scheme. We believe that this is because the INVITE contains “;transport=tls” even though we are not using TLS on the Genesys side and it is not enabled in SES on the Genesys mappings!
However, TLS is enabled between Avaya CM (CLAN cards) and SES. The options would seem to be:
1) Set “sip-tls-port=0” on SIP server to disable it
2) Disable TLS between CM and SES (mappings and SIP signalling links)
My money is on option (2) at the moment.