Thursday, November 18, 2004

Audio Conference

You would never think that voice conferencing over the Internet was such a big deal! At least not until I saw so many questions on the newsgroups asking for source code, and others scolding the askers for being beggars. Eric Raymond wouldn't be too pleased, I guess :)

A little bit of theory is in order. Audio and video transmission takes place using a protocol called RTP (Real-time Transport Protocol). It usually runs over UDP, but it is possible to run RTP over connection-oriented protocols as well. You need another protocol to manage the semantics of the conference, for example call signalling and control. H.323 is one of the earlier suites of protocols for this purpose. SIP, the Session Initiation Protocol, is another one for Internet conferencing and telephony. With these protocols, voice conference clients can reach each other either directly by IP address or through multicast addresses. But a client behind a NAT does not know the public address it can be reached at, and many routers have multicasting disabled.
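To make the RTP part a little more concrete, here is a minimal sketch of packing and unpacking the 12-byte fixed RTP header in Python. It assumes no padding, no header extension and no CSRC entries; the function names are my own and do not come from any of the libraries mentioned in this post.

```python
import struct

RTP_VERSION = 2  # the version field is always 2 in current RTP


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",                                 # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,                             # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                   # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                        # 32-bit source identifier
    )


def unpack_rtp_header(data):
    """Parse the fixed header from the start of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

A sender would prepend such a header to every chunk of encoded audio, incrementing the sequence number by one per packet so that the receiver can detect loss and reordering.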

NAT (Network Address Translation) is the translation of private IP addresses by routers. Before sending a packet to the outside world, a NAT router rewrites the private source address of an internal host, say 192.168.1.30 port 4000, to the router's own IP address and some randomly chosen port, and remembers this mapping. When the response from the outside world comes back, the router uses the mapping to deliver the packet to the correct internal host.
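The translation table can be sketched in a few lines of Python. This is a toy model, not how any particular router works: the class name, the port range and the public address 203.0.113.5 are all made up for illustration.

```python
import random


class ToyNat:
    """Toy model of a NAT router's translation table."""

    def __init__(self, public_ip, rng=None):
        self.public_ip = public_ip
        self.rng = rng or random.Random()
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Translate the source of an outgoing packet, creating a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self.out:
            port = self.rng.randint(1024, 65535)
            while port in self.back:              # avoid reusing a mapped port
                port = self.rng.randint(1024, 65535)
            self.out[key] = port
            self.back[port] = key
        return (self.public_ip, self.out[key])

    def inbound(self, public_port):
        """Find the internal host for a reply, or None if no mapping exists."""
        return self.back.get(public_port)


nat = ToyNat("203.0.113.5")
public_addr = nat.outbound("192.168.1.30", 4000)   # what the outside world sees
internal = nat.inbound(public_addr[1])             # where the reply is delivered
```

The important point for conferencing is the last line of `inbound`: a packet arriving on a port with no mapping has nowhere to go, which is why an outside client cannot simply start sending to a host behind a NAT.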

Being a lazy programmer, and assuming that any self-respecting programmer could do this project in a couple of days, I googled for audio conferencing software. Surprisingly, most of what I found relies on static IP addresses or on multicast-enabled routers between the conference clients. Here is the list of the audio conferencing projects I found.

1. JusTalk2 - Implemented using a client-server model. Source code unavailable.
2. ConferenceXP - Microsoft Research project. Requires multicasting to be enabled. The best part is that it contains an implementation of RTP in .NET.
3. OpenH323 - An implementation of an H.323 server and client. Requires multicasting.
4. Jose Botella's code on javaranch - JMF implementation of peer communication. Requires static IPs.
5. JXCube - Implemented over JXTA p2p stack. Haven't got it running yet.
6. OpenMash - The popular vic and vat tools which require multicasting. The yoid project has a version that uses multicast simulation using a server, but I have not experimented with it.
7. SpeakFreely - Works for static IPs only.
8. Shtoom - Python implementation of VoIP and a good critique of the H.323 protocol.
9. JRTPLib - C++ implementation of RTP protocol.
10. Freephone developed at INRIA - Uses unicast for one-to-one and multicast for conferences.
11. Web conferencing over HTTP by Pramod and Yayati - Uses HTTP file upload and download for communication between clients. Seems like a weird idea, but it works!
12. Voice Chat using client server on Codeproject - VC++ implementation using a client-server architecture. Messy code, but you can get it to work over the Internet.

I thought about the problem of communicating from behind a NAT and came up with the obvious solution: a server that acts as an introducer. All clients connect to the server and thereby reveal their translated IP addresses and ports. The server distributes these translated addresses to all the clients. The clients then send packets to each other using these translated addresses, and in the process make entries in their respective NAT tables. The first few packets may be dropped, but once both sides have sent something, each NAT router will forward packets arriving from the other client.
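Here is a small simulation of the introducer idea in Python. It is only a sketch under stated assumptions: the class and function names are hypothetical, the addresses are made up, and the toy NAT keeps the same public address for every destination and admits packets only from remote addresses its host has already sent to. A NAT that picks a different public port per destination would defeat this scheme.

```python
class Introducer:
    """Remembers the public (ip, port) each client's packets arrived from."""

    def __init__(self):
        self.seen = {}

    def register(self, name, observed_addr):
        self.seen[name] = observed_addr

    def peers_of(self, name):
        """Everyone's translated address except the caller's own."""
        return {n: a for n, a in self.seen.items() if n != name}


class FilteringNat:
    """Toy NAT: one fixed public address, admits only previously contacted remotes."""

    def __init__(self, public_addr):
        self.public_addr = public_addr
        self.contacted = set()

    def note_outbound(self, remote_addr):
        self.contacted.add(remote_addr)

    def admits(self, remote_addr):
        return remote_addr in self.contacted


def send(src_nat, dst_addr, nats_by_addr):
    """Send one packet; return True if the destination NAT lets it through."""
    src_nat.note_outbound(dst_addr)               # sending creates the NAT entry
    return nats_by_addr[dst_addr].admits(src_nat.public_addr)


def demo():
    server_addr = ("192.0.2.10", 5000)            # made-up introducer address
    nat_a = FilteringNat(("198.51.100.1", 40001))
    nat_b = FilteringNat(("203.0.113.9", 40002))
    nats = {nat_a.public_addr: nat_a, nat_b.public_addr: nat_b}

    intro = Introducer()
    for name, nat in (("a", nat_a), ("b", nat_b)):
        nat.note_outbound(server_addr)            # client contacts the server
        intro.register(name, nat.public_addr)     # server sees the translated address

    b_addr = intro.peers_of("a")["b"]
    a_addr = intro.peers_of("b")["a"]
    return [
        send(nat_b, a_addr, nats),   # dropped: a has not sent to b yet
        send(nat_a, b_addr, nats),   # delivered: b already sent to a
        send(nat_b, a_addr, nats),   # delivered: a has now sent to b
    ]
```

In this toy run the first packet is lost and the following ones get through, which matches the behaviour described above.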

There is a complete site devoted to the problem of communicating across middleboxes. In the process, I read this excellent RFC which describes various types of NAT. There are some other specifications in progress to solve this problem. STUN (Simple Traversal of UDP through Network Address Translators) is a protocol devised to discover the type of NAT. The IETF has also proposed a framework for middlebox communication.

However, to take no chances, I implemented the server as a dumb packet forwarder. The server knows the clients' translated IP addresses. Every client tells the server the names of the clients it wants its packets sent to, and the server forwards any packet received from that client to all the clients specified. Using this server, I was able to talk with my friend Aditya in Geneva, and also had a 3-way conference. Please note that this simple forwarder is useful for a conference with few people, but the bandwidth requirement at the server grows as n*n, since each of the n clients' packets is copied to the other n-1 clients. Hence some kind of mixing needs to be implemented in the next version of the server. Do let me know if there already exists one :)
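The forwarding logic itself is tiny. The following Python sketch is not my actual server, just an illustration of the idea with hypothetical names; it leaves out the UDP socket loop and only computes which copies of a packet would be sent where.

```python
class Forwarder:
    """Toy model of a dumb packet forwarder keyed by client name."""

    def __init__(self):
        self.addr_of = {}   # name -> translated (ip, port)
        self.name_of = {}   # translated (ip, port) -> name
        self.targets = {}   # name -> set of names to forward to

    def register(self, name, addr):
        old = self.addr_of.get(name)
        if old is not None:
            self.name_of.pop(old, None)           # drop a stale address
        self.addr_of[name] = addr
        self.name_of[addr] = name

    def set_targets(self, name, target_names):
        self.targets[name] = set(target_names)

    def forward(self, src_addr, packet):
        """Return the (destination address, packet) copies for one incoming packet."""
        name = self.name_of.get(src_addr)
        if name is None:
            return []                             # unknown sender: ignore
        return [
            (self.addr_of[t], packet)
            for t in sorted(self.targets.get(name, ()))
            if t in self.addr_of
        ]


def outbound_copies(n):
    """Copies the server sends when each of n clients sends one packet to all others."""
    return n * (n - 1)
```

With 3 clients, one packet from each results in 6 outgoing copies, and with 10 clients it is already 90. A mixing server would instead send each client a single combined stream, so its outgoing traffic would grow with n rather than n*n.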