Implementing secure voice using Secure RTP (SRTP)


In a SIP implementation, DTMF information can be transported between SIP endpoints using out-of-band (OOB) or in-band signalling. In-band DTMF transport methods send DTMF tones either as raw tones in the RTP media stream or as signalled tones in the RTP payload using RFC 2833. Among SIP product vendors, RFC 2833 has become the predominant method to send and receive DTMF tones.

From a Payment Card Industry (PCI) perspective, if a SIP-connected IVR is used to host payment applications, the DTMF digits (cardholder details) can be intercepted in the RTP payload unless the underlying network infrastructure is secured.
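To illustrate why unencrypted RTP exposes the digits, the sketch below decodes the 4-byte telephone-event payload described by RFC 2833 (event code, end flag, volume, duration). The function name and the returned dictionary are assumptions made for this example; it handles only the 16 DTMF events and is not taken from any product.

```python
import struct

# RFC 2833 event codes 0-15 map to the DTMF keys below.
EVENT_TO_KEY = "0123456789*#ABCD"


def decode_telephone_event(payload: bytes) -> dict:
    """Decode one RFC 2833 telephone-event payload (hypothetical helper)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Layout: event (8 bits), E flag (1), reserved (1), volume (6), duration (16).
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "key": EVENT_TO_KEY[event] if event < len(EVENT_TO_KEY) else None,
        "end": bool(flags & 0x80),   # E bit: last packet of this event
        "volume": flags & 0x3F,      # power level, expressed in -dBm0
        "duration": duration,        # in RTP timestamp units
    }
```

Anyone able to capture the RTP stream can run the same four-byte decode, which is why the payload needs to be encrypted rather than merely carried out of the audio path.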

Since the RTP payload format itself does not have any built-in security mechanisms, confidentiality of the media streams must be achieved by encryption using external mechanisms, such as Secure RTP (SRTP).

Secure RTP (SRTP) is a profile of RTP defined in RFC 3711 that provides encryption and authentication of audio (and video) data in an RTP stream. SRTP encryption keys and options are exchanged in SIP INVITE and response messages, preferably using secure SIP (SIPS).

For encryption and decryption of the data flow (and hence for providing confidentiality of the data flow), SRTP utilises the Advanced Encryption Standard (AES) as the default cipher. Besides the AES cipher, SRTP allows encryption to be disabled outright using the so-called “NULL cipher”.

AES specifies three possible key sizes (128, 192 and 256 bits), and by default the Avaya implementation uses AES in Counter mode with a 128-bit key (AES-128-CTR).

Almost everything about secure SIP calls is standardised; what is missing is a single, universally adopted key exchange mechanism. SRTP relies on an external key management protocol to set up the initial master key. The SRTP key derivation function is then used to derive the different keys used in a crypto context (SRTP encryption keys and salts, SRTP authentication keys) from that single master key in a cryptographically secure way.

The most common method to negotiate the SRTP keys is the Security Descriptions for media streams (SDES / sDescriptions) key exchange method as defined in RFC 4568. This is the key exchange mechanism used by Avaya Communication Manager.

SDES exchanges the key in plain text via the Session Description Protocol (SDP) body of SIP messages, and ideally requires TLS for enhanced security. However, the SDES method, even if coupled with TLS, allows any SIP server that is in the signalling path to see the SRTP master key in plain text (but not the session key).

From a PCI perspective, encryption of the SIP signalling traffic is typically not mandated by the PCI QSA since using that master key to deduce the session key is not a simple undertaking, which means that SRTP does come with a lot of added value even if not coupled with TLS.

However, depending on the SIP endpoint, there is a risk that an INVITE requesting a secure RTP (SRTP) session will be rejected if a secure SIP transport is not being used, for example if TLS is not specified as the transport and port 5061 is not being used.

The SRTP standard (RFC 3711) defines the SRTP cryptographic parameters. The SRTP master key is passed using the Session Description Protocol (SDP) within SIP signalling messages as the “inline” parameter within SDP packets.

The receiver of an encrypted RTP packet needs to know the encryption cipher and mode, the authentication transform and tag length, the key derivation rate, and other information about the SRTP stream. This information is described with the media stream in SDP using an SRTP SDP attribute, “a=crypto”. An example is shown below:
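As an illustration only, the sketch below parses a made-up “a=crypto” line in the RFC 4568 layout (tag, crypto suite, then “inline:” key parameters) and splits the base64 value into a 16-byte master key and a 14-byte master salt, which are the sizes used by the AES_CM_128 suites. The key shown is a dummy value (the bytes 0 to 29), and the class and function names are assumptions for this example rather than part of any vendor API.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Dummy example line: the inline value is base64 of bytes 0..29, not a real key.
EXAMPLE_LINE = (
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 "
    "inline:AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwd|2^20"
)


@dataclass
class CryptoAttribute:
    tag: int
    suite: str
    master_key: bytes
    master_salt: bytes
    lifetime: Optional[str]


def parse_crypto_line(line: str) -> CryptoAttribute:
    """Parse a single SDES a=crypto attribute (simplified; no MKI or session params)."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto suite and key parameters")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled here")
    parts = key_info.split("|")
    raw = base64.b64decode(parts[0])
    if len(raw) != 30:
        raise ValueError("expected 16-byte key plus 14-byte salt")
    # An optional lifetime such as 2^20 follows the key; an MKI would contain ':'.
    lifetime = parts[1] if len(parts) > 1 and ":" not in parts[1] else None
    return CryptoAttribute(int(tag), suite, raw[:16], raw[16:], lifetime)
```

Because the key material is only base64-encoded, any element that can read the SDP body can recover it with a parse like this one.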


The diagram below shows how the master key is used in the SRTP Key Derivation process:


A single SRTP master key is input to the Key Derivation Function (KDF). The other input may be the SRTP packet index, derived using the RTP packet sequence number. Thus, SRTP derives from a single master key the several keys needed to encrypt and authenticate the packets of a synchronisation source (SSRC).
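The way the packet index enters the derivation can be sketched as follows. This is a partial illustration of the RFC 3711 key derivation, assuming a 14-byte master salt: it builds the 16-byte input block for each key type, but leaves out the final AES counter-mode step keyed with the master key, because the Python standard library has no AES. The function and constant names are chosen for this example.

```python
# Labels defined by RFC 3711 for the three SRTP session keys.
LABEL_ENCRYPTION = 0x00
LABEL_AUTHENTICATION = 0x01
LABEL_SALT = 0x02


def kdf_prf_input(master_salt: bytes, label: int, packet_index: int, kdr: int) -> bytes:
    """Return the 16-byte block fed to the AES-CM pseudo-random function.

    The session key itself would be the AES-CM keystream produced from this
    block under the master key; that step is deliberately omitted here.
    """
    if len(master_salt) != 14:
        raise ValueError("master salt must be 14 bytes (112 bits)")
    if not 0 <= label <= 0xFF:
        raise ValueError("label must fit in 8 bits")
    # r = index DIV key_derivation_rate, with the convention that DIV 0 gives 0.
    r = 0 if kdr == 0 else packet_index // kdr
    key_id = (label << 48) | r                       # 8-bit label || 48-bit r
    x = key_id ^ int.from_bytes(master_salt, "big")  # right-aligned XOR with the salt
    return (x << 16).to_bytes(16, "big")             # x * 2^16, leaving a 16-bit counter
```

Each label yields a different input block, which is how one master key and salt produce separate encryption, authentication and salting keys.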

Once the master key is exchanged (or installed) and session keys are derived, SRTP encryption and authentication keys can be periodically refreshed when the key derivation rate is non-zero and is set to some period. A zero key-derivation rate, however, restricts the KDF to one invocation at the start of the session. A non-zero rate means that every time the packet-index modulo key derivation rate is zero, the KDF will be invoked and a new encryption and a new authentication key will be derived. Normally, setting the key derivation rate to zero is recommended.
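The refresh rule in the paragraph above can be expressed as a small helper. This is a sketch of the rule as stated in the text, not of any particular SRTP stack; the function name and the inclusive index range are assumptions for the example.

```python
from typing import List


def rekey_indices(first_index: int, last_index: int, kdr: int) -> List[int]:
    """Packet indices in [first_index, last_index] where index mod kdr is zero.

    With kdr == 0 the KDF runs only once at session start, so no refresh
    points are returned.
    """
    if kdr < 0 or first_index < 0 or last_index < first_index:
        raise ValueError("invalid arguments")
    if kdr == 0:
        return []
    # Smallest multiple of kdr that is >= first_index.
    start = -(-first_index // kdr) * kdr
    return list(range(start, last_index + 1, kdr))
```

For instance, with a key derivation rate of 4, packets 1 to 10 trigger a fresh derivation at indices 4 and 8, whereas a rate of 0 triggers none.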

Genesys support for SRTP

Genesys GVP 7.6 components do not support voice encryption using Secure RTP (SRTP). GVP 8.x supports SRTP as well as SIP over a secured (TLS) transport.

The default behaviour of GVP 8.1 is:

  • If the other side (for example Avaya) ignores SRTP, GVP will fall back to non-SRTP mode
  • If a previously negotiated “m-line” attribute in an SDP is used in a re-offer or if the far end requests an offer and that m-line did not have SRTP negotiated, SRTP will not be added
  • If the far end re-offers and adds SRTP to a previously negotiated m-line, SRTP will be negotiated

GVP 8.x supports the following SRTP modes (srtp.mode):

  • None – No SRTP support. The Media Control Platform will ignore the “crypto” attribute in SDP offers
  • accept_only – SRTP is supported for SDP offers sent to the Media Control Platform, but the platform will not add SRTP to m-lines in outgoing offers that did not previously contain it
  • offer – SRTP is supported for SDP offers sent to the Media Control Platform, and will be included in all outgoing SDP offers
  • offer_strict – The Media Control Platform accepts SRTP received in the offer, and sends a crypto line in its own offer, but will fail if the answer does not contain a valid crypto line
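The four modes can be summarised as three yes/no decisions. The sketch below is a simplified model of the behaviour described in the list above, written for illustration; it is not Genesys code, and the enum, the lower-case value strings and the function names are assumptions.

```python
from enum import Enum


class SrtpMode(Enum):
    NONE = "none"
    ACCEPT_ONLY = "accept_only"
    OFFER = "offer"
    OFFER_STRICT = "offer_strict"


def answer_uses_srtp(mode: SrtpMode, offer_has_crypto: bool) -> bool:
    """Incoming offer: every mode except 'none' accepts an offered crypto line."""
    return mode is not SrtpMode.NONE and offer_has_crypto


def outgoing_offer_has_crypto(mode: SrtpMode, m_line_had_srtp: bool) -> bool:
    """Outgoing offer: 'offer' modes always add crypto, 'accept_only' only keeps it."""
    if mode in (SrtpMode.OFFER, SrtpMode.OFFER_STRICT):
        return True
    if mode is SrtpMode.ACCEPT_ONLY:
        return m_line_had_srtp
    return False


def answer_is_acceptable(mode: SrtpMode, answer_has_crypto: bool) -> bool:
    """Answer to our own offer: only 'offer_strict' insists on a valid crypto line."""
    return answer_has_crypto or mode is not SrtpMode.OFFER_STRICT
```

In this model, "offer" falls back to plain RTP when the far end answers without a crypto line, while "offer_strict" treats the same answer as a failure.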

GVP 8.x supports the following SRTP cryptography methods (srtp.cryptomethods):

  • AES_CM_128_HMAC_SHA1_80
  • AES_CM_128_HMAC_SHA1_32

Implementation of SRTP between Avaya and GVP 7.6

The diagram below shows a high-level overview of a solution architecture using Session Border Controllers (SBC) with Back-to-Back User Agent (B2BUA) functionality deployed in front of Genesys Voice Platform (GVP) 7.6 instances to act as a bridge between secure voice traffic (SRTP) and insecure voice traffic (RTP).

Secure RTP (SRTP) is used to provide encryption and authentication of audio streams between the Session Border Controllers (SBC) and the Media Gateway (Avaya TN2602AP IP Media Resource circuit packs).

A back-to-back user agent (B2BUA) is a logical SIP network element. It resides between both end points of a phone call / SIP session, divides the communication session into two call legs and mediates all SIP signalling between both ends of the call, from call establishment to termination.

On the originating call leg the B2BUA acts as a user agent server (UAS); it then processes the request as a user agent client (UAC) towards the destination end, handling the signalling between the end points back-to-back. A B2BUA maintains complete state for the calls it handles. Each side of a B2BUA operates as a standard SIP network element.

Thus, the SBC acts on behalf of the caller, creates a second call leg to the GVP port (destination party) and performs specific protocol “normalisation” or “fix-up”. The second call leg therefore does not negotiate any encryption and uses RTP rather than SRTP, which is not supported on GVP 7.6.
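One part of that “fix-up” can be sketched as an SDP rewrite for the second call leg: drop the “a=crypto” attributes and change the media profile from RTP/SAVP to RTP/AVP. This is a deliberately simplified illustration with a made-up SDP body and dummy key; a real SBC also rewrites connection addresses and ports and terminates the SRTP stream itself. The function name is an assumption for the example.

```python
# Made-up SDP offer as it might arrive on the secure leg (dummy key material).
EXAMPLE_SECURE_SDP = (
    "v=0\r\n"
    "m=audio 49170 RTP/SAVP 0 101\r\n"
    "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwd\r\n"
    "a=rtpmap:101 telephone-event/8000\r\n"
)


def strip_srtp(sdp: str) -> str:
    """Return a copy of the SDP suitable for a plain-RTP call leg."""
    rewritten = []
    for line in sdp.splitlines():
        if line.startswith("a=crypto:"):
            continue  # key material must not be forwarded to the RTP-only side
        if line.startswith("m="):
            # Secure audio/video profile -> plain RTP profile.
            line = line.replace(" RTP/SAVP ", " RTP/AVP ", 1)
        rewritten.append(line)
    return "\r\n".join(rewritten) + "\r\n"
```

Applied to the example body, the result keeps the telephone-event mapping, so RFC 2833 DTMF still reaches GVP, but only over the unencrypted leg behind the SBC.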


As shown in the diagram below, a highly available pair of Session Border Controllers is deployed in front of multiple Genesys Voice Platform (GVP) instances. The B2BUA functionality must therefore support multiple routes, allowing SIP requests to be forwarded to different GVP instances. For example, SIP messages received on port 5060 would be forwarded to GVP server 1, SIP messages received on port 5061 would be forwarded to GVP server 2, and so on.
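The port-based routing in the example amounts to a lookup table from listening port to GVP instance. The sketch below shows the idea only; the host names are placeholders, the destination port is assumed, and an actual SBC would express this in its own routing configuration rather than in code.

```python
from typing import Dict, Tuple

# Hypothetical table: SBC listening port -> (GVP host, GVP SIP port).
ROUTES: Dict[int, Tuple[str, int]] = {
    5060: ("gvp-server-1.example.invalid", 5060),
    5061: ("gvp-server-2.example.invalid", 5060),
}


def next_hop(listen_port: int) -> Tuple[str, int]:
    """Return the GVP instance that serves requests received on listen_port."""
    try:
        return ROUTES[listen_port]
    except KeyError:
        raise LookupError(f"no GVP route configured for port {listen_port}") from None
```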