
Transparent SIP communication with NAT
Number, Please
Mapping internal IP addresses to external IP addresses is essential for Voice over IP (VoIP) communications through network address translation (NAT) gateways and firewalls. Session Initiation Protocol (SIP) is the signaling protocol for establishing VoIP connections; however, SIP-based communications have problems working through firewalls and session border controllers, and all too often, VoIP calls or some unified communications functions fail because of NAT. In this article, I show you how IT managers can resolve these issues with the session traversal utilities for NAT (STUN), traversal using relays around NAT (TURN), and Interactive Connectivity Establishment (ICE) techniques to ensure transparent transitions and improve overall SIP security.
NAT Characteristics
Some years ago, the limited availability of IP addresses led the Internet Engineering Task Force (IETF) to develop various strategies for serving a large number of hosts with the available addresses. One of the intermediate solutions, called NAT (RFC 3022) [1] or PAT (port and address translation), converts between private and public IP addresses.
NAT uses tables to assign the IP addresses of a private (internal) network to public IP addresses (Figure 1). The internal IP addresses remain hidden. NAT services rewrite the sender or receiver IP address in the IP header. The simplest form of address conversion is known as static NAT: The private source address of a packet leaving the private address space is replaced with a fixed public IP address that can be reached in the public address space. In the reply packet, this conversion takes place in reverse order.

The types of NAT systems include:
- Full cone NAT: IP address conversion takes place independently of a previous outbound connection on the basis of fixed address entries. Every user of the external network can send their packets to the public IP port. The packets are automatically forwarded from the NAT system to the computer with the corresponding address.
- Restricted cone: Address mapping is only performed if it was triggered by an outgoing connection. If an internal computer sends its packets to an external computer, the NAT system uses mapping to translate the client address. The external computer can then send its packets directly back to the internal client (via address mapping). However, the NAT system blocks all incoming packets from other senders.
- Port-restricted cone: Similar to restricted cone NAT, address mapping only takes place if it was triggered by an outgoing connection. In addition, the external computer is identified by its IP address and port, so replies are only accepted from the exact IP address and port to which the internal client sent its packets.
- Symmetric NAT: Fundamentally different from the NAT mechanisms described so far, mapping from the internal to the public IP port address depends on the target IP address of the packet to be transmitted. For example, if a client with the address pair 10.0.0.1:8000 is transmitting to external computer B, address mapping is performed to the external address pair 202.123.211.25:12345. If the same client sends its packets from the same port (10.0.0.1:8000) to a different destination address (computer A), it is mapped to the address 202.123.211.25:45678. The external hosts (A and B) can only send their packets to the respective NAT mapping address. Any attempt by an external machine to send the packets to another address mapping will result in the packets being dropped.
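The differences between the four NAT types can be sketched as a small Python model. This is only an illustration under simplifying assumptions: The class name `Nat`, its methods, the starting port 40000, and the external test addresses are hypothetical, and the full cone case creates its mapping on the first outbound packet instead of relying on fixed table entries.

```python
from dataclasses import dataclass, field


@dataclass
class Nat:
    """Toy NAT model (hypothetical helper, not a real NAT implementation)."""
    kind: str  # "full", "restricted", "port_restricted", or "symmetric"
    public_ip: str = "202.123.211.25"
    next_port: int = 40000
    mappings: dict = field(default_factory=dict)   # mapping key -> public port
    contacted: dict = field(default_factory=dict)  # public port -> set of (ip, port)

    def outbound(self, src, dst):
        """Translate an outgoing packet and return the public (ip, port) pair."""
        # A symmetric NAT creates one mapping per destination;
        # the cone types reuse one mapping per internal address pair.
        key = (src, dst) if self.kind == "symmetric" else src
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        port = self.mappings[key]
        self.contacted.setdefault(port, set()).add(dst)
        return (self.public_ip, port)

    def inbound_allowed(self, public_port, sender):
        """Return True if a packet from sender to public_port is forwarded."""
        peers = self.contacted.get(public_port)
        if peers is None:
            return False  # no mapping exists for this public port
        if self.kind == "full":
            return True  # any external sender may use the mapping
        if self.kind == "restricted":
            return sender[0] in {ip for ip, _ in peers}  # IP address must match
        return sender in peers  # IP address and port must match
```

With a symmetric NAT, the client 10.0.0.1:8000 receives a different public port for each destination, and each external host can only answer on the mapping that was created for it.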
PAT Mechanisms
The PAT mechanism maps all IP addresses of a private network to a single public IP address (Figure 2). In this way, a completely private network only needs a single registered public IP address. Some manufacturers also refer to the PAT function as "hidden NAT." In practice, if two internal computers share an external IP address and translation relies on the IP addresses alone, an address conflict inevitably occurs. If both internal computers communicate simultaneously with external communication partners, the NAT component must decide to which internal computer the received packet will be forwarded. Because the routing or forwarding decision is based only on the IP addresses in the IP header, this problem cannot be solved with IP addresses alone.

As with dynamic address mapping, the NAT component only has to create a corresponding mapping table during the connection setup and, with that, is able to assign the individual connections to the correct IP addresses. The NAT process simply searches the mapping table for the connection to which the packet belongs. If there is a match, the address is converted and forwarded to the right IP address on the internal network – theoretically.
In practice, however, this process is far more complicated. For example, two internal machines communicate with a common external IP address and both transmit a DNS request to the DNS server operated by the ISP for the company in question. The DNS server operated by the ISP resides on the external network from the point of view of the DNS clients, which means that all DNS queries always pass through the NAT process and address conversion always takes place.
The DNS clients transmit their DNS requests to the DNS server on the public network. The packets transmitted to the public IP network thus contain the following IP/TCP/UDP information: the same IP source address, the same IP destination address, and the same destination port number (UDP port 53 for DNS queries). Only the source port numbers differ in the DNS queries, and it is exactly this information that is used to identify the internal connections.
Most operating systems start the assignment of the sender ports with the value 1025 and then assign the source port numbers sequentially to the individual connections. Under certain circumstances, both internal hosts can use the same source port number for communication with the DNS server. In this case, a conflict is unavoidable. To rule out the possibility that two connections carry identical address and port information, the PAT process converts not only the IP addresses but also the port numbers, ensuring that each internal connection uses a unique port number to communicate with the external IP resources.
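A minimal sketch of such a translation table, assuming a hypothetical `PatGateway` class and an arbitrary starting port of 20000, shows how two internal hosts that both use source port 1025 remain distinguishable after translation:

```python
class PatGateway:
    """Toy PAT table (hypothetical helper for illustration only)."""

    def __init__(self, public_ip, first_port=20000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Return the public (ip, port) pair for an outgoing connection."""
        key = (private_ip, private_port)
        if key not in self.out:
            # Each new internal connection receives its own public port.
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key])

    def translate_in(self, public_port):
        """Return the internal (ip, port) pair for a reply, or None."""
        return self.back.get(public_port)


gateway = PatGateway("202.123.211.25")
first = gateway.translate_out("10.0.0.1", 1025)   # DNS query from host 1
second = gateway.translate_out("10.0.0.2", 1025)  # same source port on host 2
```

Both queries leave the gateway with the same public IP address but different source ports, so the replies from the DNS server can be assigned to the correct internal host.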
SIP Protocol Problems
SIP, according to RFC 3261 [2], is today's standard signaling mechanism for real-time communication streams in an IP environment. However, SIP-based communication also has a flaw: A terminal device on the LAN cannot communicate directly with a communication partner if one or more NAT functions (e.g., in firewalls) exist in the communication channel for security reasons.
NAT converts the IP addresses in the packet headers as described above, but some protocols, including SIP, also communicate the endpoint addresses inside their messages when establishing a connection. If the addresses in the messages do not match the translated addresses, the terminals cannot communicate. Several NAT traversal methods can now be used to eliminate this problem – but more about that later.
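The following sketch illustrates the mismatch. The INVITE message is a shortened, made-up example (user names, branch value, and session IDs are assumptions), and `private_addresses` is a hypothetical helper that lists the private IPv4 addresses a message still carries after NAT has rewritten only the IP header:

```python
import ipaddress
import re

# Shortened, made-up INVITE from a client at 10.0.0.1 behind NAT.
SAMPLE_INVITE = """INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP 10.0.0.1:5060;branch=z9hG4bK776asdhds
Contact: <sip:alice@10.0.0.1:5060>
Content-Type: application/sdp

v=0
o=alice 2890844526 2890844526 IN IP4 10.0.0.1
s=-
c=IN IP4 10.0.0.1
t=0 0
m=audio 49170 RTP/AVP 0
"""


def private_addresses(message):
    """Return the distinct private IPv4 addresses found in a message."""
    found = []
    for match in re.finditer(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", message):
        try:
            addr = ipaddress.ip_address(match.group())
        except ValueError:
            continue  # looked like an address but is not a valid one
        if addr.is_private and match.group() not in found:
            found.append(match.group())
    return found


leaked = private_addresses(SAMPLE_INVITE)
```

The Via, Contact, and SDP lines all point to 10.0.0.1, an address the remote party cannot reach, even though the IP header of the packet shows the public address after translation.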
NAT Traversal and VoIP
One tried and tested means for working around NAT components is manual device configuration, wherein NAT is configured to forward certain data packets to a specific local computer. NAT usually determines forwarding on the basis of the destination port in the data packet and therefore requires a port number (or port range) and the IP address of the local computer for port forwarding. With the help of fixed forwarding by port number, the local computer can be reached from outside the network on a fixed port (range). The big advantage of port forwarding is that, for many applications, it is the only NAT traversal technique that actually works; however, this advantage is offset by a number of important disadvantages:
- Other local computers cannot use this port because of the fixed assignment of a port number to a specific computer.
- Many applications select the port dynamically, making it difficult to determine beforehand or to select a port from a port range.
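The first disadvantage becomes obvious in a minimal model of a forwarding table. The class `PortForwarder` and the addresses used here are hypothetical and only illustrate the principle:

```python
class PortForwarder:
    """Toy static port-forwarding table (hypothetical helper)."""

    def __init__(self):
        self.rules = {}  # public port -> (local_ip, local_port)

    def add_rule(self, public_port, local_ip, local_port):
        """Bind a public port permanently to one local computer."""
        if public_port in self.rules:
            raise ValueError(f"port {public_port} is already forwarded")
        self.rules[public_port] = (local_ip, local_port)

    def forward(self, public_port):
        """Return the local target for an incoming packet, or None."""
        return self.rules.get(public_port)


forwarder = PortForwarder()
forwarder.add_rule(5060, "10.0.0.1", 5060)  # SIP signaling for the first phone
```

A second phone at 10.0.0.2 cannot also be reached on public port 5060; it needs a different public port, which the remote party must know in advance.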
The STUN mechanism for transparently routing VoIP streams across NAT systems enables a VoIP endpoint to determine the correct public IP address, provides a mechanism for checking connections between two endpoints, and provides additional mechanisms for maintaining NAT address mappings using a keepalive protocol (Figure 3).

An earlier version of STUN described in RFC 3489 [3] – now referred to as "classic STUN" – required a complete revision of the STUN concept on the basis of experience gained in practice. The new STUN (according to RFC 5389 [4]) is now just a mechanism used in conjunction with other specifications (e.g., SIP-OUTBOUND, TURN, and ICE).
The task of a standalone STUN server is to provide the correct transport addresses using the STUN binding function. A STUN server must be able to send and receive messages by the UDP and TCP protocols. A plain vanilla STUN server provides only a partial solution to the problem of correct transfer over NAT gateways. For this reason, a STUN server always collaborates with other components. STUN is more like a tool within a more comprehensive NAT gateway solution. The following STUN uses are currently defined:
- Interactive connectivity establishment (ICE)
- Client-oriented SIP connections to external resources (SIP-OUTBOUND)
- NAT behavior discovery (BEHAVE-NAT)
For VoIP endpoints, STUN provides a mechanism for correctly determining the IP address and the port currently used at the other end of a NAT gateway or router (transition between the private and a public IP address range). In contrast to classic STUN, the information can be transmitted over TCP as well as UDP. The new STUN can also be used to negotiate optional attributes and authentication with VoIP servers.
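How a client learns its public transport address can be sketched with the message layout from RFC 5389: a 20-byte header (message type, length, magic cookie, transaction ID), followed by attributes such as XOR-MAPPED-ADDRESS. The function names below are hypothetical, only IPv4 is handled, and the sketch omits sending the request over the network, retransmissions, and authentication:

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined in RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN binding request without attributes."""
    tid = transaction_id or os.urandom(12)  # 96-bit transaction ID
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_xor_mapped_address(response):
    """Extract the public (ip, port) pair from a binding success response."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # IPv4 family
            # Port and address are XORed with the magic cookie.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            raw = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", raw)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 32 bits
    return None
```

In practice, the client would send the request to a STUN server over UDP or TCP and pass the reply to the parser; the address returned is the one the NAT gateway assigned to the client on the public side.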
TURN as a Last Resort
STUN enables a client to determine the correct transport address on which the terminal device can be reached from the public network. If direct communication between the two SIP terminals is not possible and STUN does not provide functional address mapping, the services of a relay computer are used. This mechanism was published in RFC 5766 – "Traversal Using Relays Around NAT (TURN)" [5].
The goal of TURN is to provide the client a publicly accessible address/port tuple even in these situations. The only way to achieve this in all cases is to route the data through a TURN server that can be reached on the public network. For this purpose, a client can request an endpoint on the TURN server, on which it will then be publicly accessible. The server will then forward the packets to the client.
Because TURN behaves like a restricted cone NAT here, the process does not undermine the security functions of NAT and firewalls. A client that has defined an endpoint on a server via TURN must first send a packet to the peers from which it wants to receive packets. Operating servers on well-known ports behind NAT is therefore not possible. The protocol is based on STUN and shares its message structure and basic mechanisms.
Although TURN always makes it possible to establish a connection, redirecting all traffic through the TURN server places a heavy load on the server. Therefore, TURN should only be considered as a last resort if other methods like STUN do not lead to success.
ICE as a Lubricant
In 2004, the IETF began to develop the ICE technique. For any type of session protocol, ICE ensures trouble-free passage through all types of NAT and firewalls. ICE was designed so that the required addressing functions can be implemented with the SIP protocol and thus also with the Session Description Protocol (SDP). ICE acts as a uniform framework around STUN and TURN. Additionally, ICE supports TCP as well as UDP media sessions.
Instead of only STUN or TURN, an ICE client is able to determine the required addresses with both methods. Both addresses are transmitted to the communication partner along with the local interface addresses in the subsequent SIP call setup message. The elements of the address information contained in the invitation message are known as the "candidates," which are the potential communication endpoints for the SIP agent. When an invitation message reaches the call recipient, the latter also runs the ICE address collection functions and transmits its own addresses in its SIP reply. Both agents then check the possible connections by sending STUN messages to the other end of the communication path. These checks reveal which pair of candidates works. Once a functioning pair of candidates has been found, the media stream begins to flow between the two communication partners.
ICE goes through six steps to establish a connection:
Step 1. The call initiator collects the IP and port addresses of all potential communication candidates before the actual call. The first candidates are obtained from the interfaces of the local computer (host). If the host has several interfaces, the agent obtains a candidate from each interface. The candidates of the computer interfaces (including virtual interfaces) are referred to as host candidates. The agent then contacts the STUN server directly from each host interface. The results of these tests are server-reflexive candidates, which correspond to the IP and port addresses on the outermost NAT on the path between the agent and the STUN server; this is usually the NAT facing the public Internet. Finally, the agent also receives candidates from TURN servers. These IP and port addresses reside on the relay servers.
Step 2. Each candidate is prioritized after the agent has collected its candidates. The highest priority defines the candidate to be used. As a rule, relay candidates receive the lowest priority because they have the highest voice delay.
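RFC 5245 recommends a formula for this prioritization that combines a type preference, a local preference, and the component ID. The sketch below uses the type preference values recommended there (126 for host, 100 for server-reflexive, 0 for relay); the function name and the defaults of 65535 for the local preference and 1 for the component ID are assumptions for a host with a single interface and the first (RTP) component:

```python
# Type preferences recommended in RFC 5245; higher means more preferred.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """Compute the ICE candidate priority according to the RFC 5245 formula."""
    return (
        (2 ** 24) * TYPE_PREFERENCE[cand_type]
        + (2 ** 8) * local_pref
        + (256 - component_id)
    )


priorities = {kind: candidate_priority(kind) for kind in TYPE_PREFERENCE}
```

With these defaults, a host candidate receives the priority 2130706431, a server-reflexive candidate 1694498815, and a relay candidate 16777215, which reflects the rule that relay candidates come last.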
Step 3. According to the identified and prioritized candidates, the agent generates its SIP INVITE request to establish the call. The SDP body is part of the INVITE request, which the caller uses to transmit the connection information required for the call, including the codec, its parameters, and the IP and port addresses to be used. ICE extends SDP by adding some new attributes. The most important of these is the candidate attribute. Because the agent might know more than one possible candidate, it transmits a separate candidate attribute in the SDP body for each candidate of a media stream. The attribute contains the IP and port addresses for the candidate concerned, its priority, and the type of candidate (host, server reflexive, or relay). Additionally, the SDP message contains information for safeguarding the STUN functions.
Step 4. SIP transmits the SIP INVITE message with the corresponding SDP information over the network. If the called agent also supports ICE, the phone does not ring yet. Instead, the party being called collects its candidates and generates a provisional SIP response, which signals to the caller that the SIP request is still being processed. The provisional response contains an SDP message with the communication partner's candidates.
Step 5. The caller and the called party have exchanged the necessary SDP messages. The agents involved in the call know all candidates for transferring the media streams. Note that certain applications (e.g., videophones) generate more than one media stream. ICE then performs the most important part of its tasks. Each agent knows its own candidates and the corresponding candidates of its peer and combines them into the list of possible candidate pairs. Each agent calculates the priority of the candidate pairs (combined priority of the individual candidates), and the candidate pair with the highest priority represents the optimal path between the two communication partners.
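The candidate attribute defined in RFC 5245 has the form `a=candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>`, optionally followed by the related address (`raddr`/`rport`). The helper below is a hypothetical sketch that formats such lines for UDP candidates; the foundation values and the address pairs (taken from the NAT example earlier in the article) are assumptions:

```python
def candidate_line(foundation, component, priority, ip, port, cand_type, related=None):
    """Format one SDP candidate attribute for a UDP candidate."""
    line = f"a=candidate:{foundation} {component} UDP {priority} {ip} {port} typ {cand_type}"
    if related:
        # Server-reflexive and relay candidates name the address they derive from.
        line += f" raddr {related[0]} rport {related[1]}"
    return line


sdp_candidates = [
    candidate_line(1, 1, 2130706431, "10.0.0.1", 8000, "host"),
    candidate_line(2, 1, 1694498815, "202.123.211.25", 12345, "srflx",
                   related=("10.0.0.1", 8000)),
]
```

The first line describes the host candidate on the private interface, and the second line describes the server-reflexive candidate that the STUN server reported for the same socket.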
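RFC 5245 defines the combined priority of a pair as 2^32 * MIN(G, D) + 2 * MAX(G, D) + (1 if G > D, otherwise 0), where G is the candidate priority of the controlling agent and D that of the controlled agent. The following sketch forms all pairs and sorts them; the function names and the candidate labels are hypothetical, and the local agent is assumed to be the controlling one:

```python
from itertools import product


def pair_priority(controlling_prio, controlled_prio):
    """Combine two candidate priorities into a pair priority (RFC 5245)."""
    g, d = controlling_prio, controlled_prio
    return (2 ** 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def ranked_pairs(local, remote):
    """Return ((local_label, remote_label), priority) tuples, best pair first.

    local and remote map a candidate label to its candidate priority.
    """
    pairs = [
        ((l_name, r_name), pair_priority(l_prio, r_prio))
        for (l_name, l_prio), (r_name, r_prio) in product(local.items(), remote.items())
    ]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


candidates = {"host": 2130706431, "relay": 16777215}
check_list = ranked_pairs(candidates, candidates)
```

Because the lower of the two candidate priorities dominates the result, a pair is only ranked highly if both of its candidates are good; the pair of two relay candidates ends up at the bottom of the list.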
Step 6. For the final review of the candidate pairs, ICE conducts connection checks on the basis of STUN transactions from each agent. The STUN transactions use the IP and port addresses of the candidate pairs and verify that they can be reached in both directions. Because the number of candidate pairs grows in proportion to the square of the number of candidates, checking all pairs in parallel is problematic. ICE therefore checks the candidate pairs sequentially by priority. Every 20 ms, each agent generates a STUN transaction for the next pair of candidates in the list. If an agent receives a STUN request for a candidate pair, it immediately generates a STUN transaction in the opposite direction, known as a triggered check, accelerating the entire ICE process. After successfully completing the check of a candidate pair, the agent knows that it has found a pair that transmits the media stream correctly. Because the checks are carried out according to the priorities of the candidate pairs, the first functioning candidate pair represents the best possible connection between the two communication partners at the given time. The caller usually confirms the candidate pair found by this process to the other agent, concluding the selection process.
All previous processes (candidate collection and connection tests) take place before the phone rings at the called agent's end; consequently, ICE delays the connection setup slightly. The advantage, however, is that ghost calls and misconnections (i.e., the phone rings, but the called party hears nothing) are eliminated.
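The effect of the quadratic growth and of the 20 ms pacing can be estimated with a small sketch. Both functions are hypothetical simplifications: They ignore round-trip times, retransmissions, and triggered checks, and the `reachable` callback merely stands in for a real STUN transaction.

```python
def pair_count(local_candidates, remote_candidates):
    """Number of candidate pairs for one media stream component."""
    return local_candidates * remote_candidates


def first_working_pair(sorted_pairs, reachable, pacing_ms=20):
    """Walk the pairs in priority order, starting one check every pacing_ms.

    Returns the first pair that passes the check and the time in milliseconds
    at which its check was started, or (None, total time) if no pair works.
    """
    for index, pair in enumerate(sorted_pairs):
        if reachable(pair):
            return pair, index * pacing_ms
    return None, len(sorted_pairs) * pacing_ms


ordered = [("host", "host"), ("srflx", "srflx"), ("relay", "relay")]
# Assumption for this example: the direct host-to-host path is blocked by NAT.
winner, started_at_ms = first_working_pair(ordered, lambda pair: pair[0] != "host")
```

With three candidates on each side, nine pairs must be considered; in the example, the server-reflexive pair is the first one that works, and its check starts after a single pacing interval.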
If the ICE handshake reveals that the candidate pair differs from the default setting selected in the SDP message (IP and port addresses), the caller initiates an update of the default setting on the basis of a SIP re-INVITE message to synchronize all intermediate SIP elements that do not support ICE but need to know through which addresses the media streams are running.
Conclusions
Correct mapping of the internal IP addresses to external IP addresses is essential to enabling unhindered VoIP communication through NAT gateways and firewalls. STUN, TURN, and ICE not only ensure a transparent transition via NAT gateways but also improve the security of the SIP environment as a whole.