There are two main elements involved in a VoIP call:
- Analog-to-Digital (and vice-versa) conversion of voice (media)
- Call setup (signaling)
These are the same two elements involved in a traditional PSTN-based call - but the VoIP implementations are very different. In a traditional call, there's no voice conversion involved for the media (at least not at the subscriber end), and most of the heavy lifting is in the call setup.
In a VoIP call, the media portion involves not only the actual A-D and D-A conversion, but also compression (to minimize the bandwidth required) and packetization (to get the voice data into IP form and onto the network). These elements are handled by a wide (and growing wider) range of stand-alone devices and combinations of software and hardware that I'll get back to later. These devices are known by different names, but I'll use the term VoIP phone when referring to them.
In most VoIP phones, the Real-time Transport Protocol (RTP) is used to get the digitized voice packets into IP form for transmission over the network. More precisely, RTP packets are enclosed in UDP datagrams, which are sent and received via a range of ports. UDP is used because it has less overhead than TCP and is better suited to the "real-time" delivery requirements of voice (vs. plain data).
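To make the RTP-inside-UDP layering concrete, here is a minimal Python sketch that prepends the 12-byte fixed RTP header to a chunk of voice data. The function names, the SSRC value, and the 160-byte dummy payload are assumptions chosen for illustration; a real phone would also handle timing, jitter buffering, and sequence wrap-around.

```python
import socket
import struct

def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Prepend the 12-byte fixed RTP header to a voice payload."""
    version = 2
    first_byte = version << 6  # no padding, no header extension, zero CSRCs
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload

def send_over_udp(packet: bytes, host: str, port: int) -> None:
    """Hand the RTP packet to UDP; no connection setup, no retransmission."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))

# 160 bytes stands in for 20 ms of 8 kHz, 8-bit audio (illustrative only).
packet = build_rtp_packet(b"\xff" * 160, seq=1, timestamp=160, ssrc=0x12345678)
```

Calling `send_over_udp(packet, host, port)` with the far end's address would put the packet on the wire; the host and port are whatever the call setup stage produced.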
If you want to get really picky about the whole thing, what actually happens is that two ports are used in each call direction. One carries the RTP "media" (i.e. voice) stream; the other carries an RTCP (RTP Control Protocol) stream for Quality of Service (QoS) feedback and media control.
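The usual convention for pairing the two ports is that RTP takes an even port number and RTCP takes the next higher odd one. The sketch below follows that convention; the function name and the example port numbers are hypothetical, and some devices can be configured to do otherwise.

```python
from typing import Tuple

def rtp_rtcp_ports(requested_port: int) -> Tuple[int, int]:
    """Return (rtp_port, rtcp_port) following the even/odd pairing convention."""
    # If an odd port is requested, step down to the even port below it.
    rtp_port = requested_port if requested_port % 2 == 0 else requested_port - 1
    return rtp_port, rtp_port + 1

# Each direction of the call gets its own pair, e.g. one pair per endpoint.
local_pair = rtp_rtcp_ports(16384)
remote_pair = rtp_rtcp_ports(20001)
```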
No matter what you use for a VoIP phone, it will support some number of codecs (coder / decoder). VoIP codecs perform the same function as those used in DVD and digital music players - analog to digital and digital to analog conversion of the source material - but are optimized for the smaller bandwidth required for voice. As in video and music applications, various voice codecs have different advantages and disadvantages.
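One of the trade-offs between codecs is bandwidth, and the packet headers matter as much as the codec's own bit rate. The following rough calculation assumes 20 ms of voice per packet, IPv4 without options, and no link-layer overhead; the two codec bit rates used (64 kbps for G.711, 8 kbps for G.729) are examples I've picked, not a list of what any particular phone supports.

```python
# Header bytes added to every voice packet: IPv4 (20) + UDP (8) + RTP (12).
IP_UDP_RTP_OVERHEAD_BYTES = 20 + 8 + 12

def ip_bandwidth_kbps(codec_bitrate_kbps: float, packet_ms: int = 20) -> float:
    """Estimate one-way IP bandwidth for a voice stream, headers included."""
    payload_bytes = codec_bitrate_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    total_bits_per_second = (payload_bytes + IP_UDP_RTP_OVERHEAD_BYTES) * 8 * packets_per_second
    return total_bits_per_second / 1000

g711 = ip_bandwidth_kbps(64)  # uncompressed 64 kbps codec
g729 = ip_bandwidth_kbps(8)   # compressed 8 kbps codec
```

Under these assumptions the 64 kbps codec needs about 80 kbps on the wire and the 8 kbps codec about 24 kbps, so the headers account for most of the compressed stream.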
The most important codec-related criterion that you'll need to worry about is whether your VoIP phone supports the codecs required by your VoIP service provider or used by the party that you're calling. In most cases, VoIP phones support multiple codecs and automatically negotiate the best one to use, much like a dial-up connection figures out the best connection rate (and modem standard) to use.
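Stripped of protocol details, the negotiation boils down to finding the first codec on one side's preference list that the other side also supports. This is a simplified sketch, not how any specific phone implements it; the function name and codec labels are illustrative assumptions.

```python
from typing import Optional, Sequence

def negotiate_codec(offered: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Pick the caller's most-preferred codec that the callee also supports."""
    supported_set = {name.upper() for name in supported}
    for name in offered:  # offered is in the caller's order of preference
        if name.upper() in supported_set:
            return name
    return None  # no codec in common: the call cannot carry media

chosen = negotiate_codec(["G729", "PCMU", "PCMA"], ["PCMA", "PCMU"])
```

If the function returns `None`, the two ends have no codec in common, which is the situation the paragraph above warns about.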
Call setup is also different for VoIP, with the specifics depending on the VoIP protocol being used. For the home and small office VoIP applications that I'll be focusing on, the two most often encountered protocols are H.323 and SIP (Session Initiation Protocol). H.323 is the older protocol and a standard from the telecom world maintained by the International Telecommunication Union (ITU). (If you've used NetMeeting, you've used H.323.) SIP is the up-and-coming newer kid on the standards block, born out of the Internet world and maintained by the Internet Engineering Task Force (IETF).
It's not my goal to get into a religious battle over which standard is better (although you might want to read this Winnetmag article that describes what it says are Microsoft's reasons for switching from H.323 in NetMeeting to SIP in Windows Messenger). But it seems that mass-market VoIP products are opting for SIP compatibility over H.323, although some products support both.
One negative that SIP and H.323 share is that they have problems working through the NAT-based routers that are used to share your Internet connection. The root of the problem is the use of multiple, dynamically assigned ports to carry the voice part of a VoIP connection. There's a lot of work going on to eliminate, or at least reduce, this problem. But for now, you'll need to pay attention to your VoIP phone or adapter's installation instructions if you're using a router.
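One way to see the NAT problem: in a SIP call, the phone advertises the address and port where it wants to receive voice inside the body of the signaling message (an SDP description), and a typical NAT router does not rewrite that text. The sketch below parses a made-up SDP body and flags a private address; the sample values and function names are assumptions for illustration only.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical SDP body, as a phone behind a home router might send it.
SAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.50
s=-
c=IN IP4 192.168.1.50
t=0 0
m=audio 49170 RTP/AVP 0
"""

def media_endpoint(sdp: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract the advertised IPv4 media address and audio port from SDP text."""
    address, port = None, None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=IN IP4 "):
            address = line.split()[2]
        elif line.startswith("m=audio "):
            port = int(line.split()[1])
    return address, port

def advertises_private_address(sdp: str) -> bool:
    """True if the far end is told to send voice to a non-routable address."""
    address, _ = media_endpoint(sdp)
    return address is not None and ipaddress.ip_address(address).is_private

behind_nat = advertises_private_address(SAMPLE_SDP)
```

When the check is true, the other party is being asked to send voice to an address it cannot reach from the Internet, which is why one-way or no audio is a common symptom.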
As with VoIP phones, there is a wide range of hardware and software products for taking care of VoIP signaling duties. Although SIP phones ("end points" or "User Agents" in SIP jargon) can work on a peer-to-peer basis, most practical use will involve proxy and registrar network elements. These are usually combined into one piece of equipment commonly referred to as a SIP server, which can be a dedicated hardware device or a computer running SIP server applications.
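As a mental model of what the registrar and proxy roles do, the toy class below records where each user can currently be reached and looks that location up when a call arrives. It is a conceptual sketch only: the class name, method names, and example addresses are hypothetical, and a real SIP server also handles authentication, expiry of registrations, and message routing.

```python
from typing import Dict, Optional

class ToySipServer:
    """Toy model of a combined registrar (stores locations) and proxy (looks them up)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, user_address: str, contact: str) -> None:
        """Registrar role: remember where this user's phone currently is."""
        self._bindings[user_address] = contact

    def route_call(self, user_address: str) -> Optional[str]:
        """Proxy role: find where to forward a call for this user, if known."""
        return self._bindings.get(user_address)

server = ToySipServer()
server.register("sip:bob@example.com", "192.0.2.10:5060")
destination = server.route_call("sip:bob@example.com")
```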
The H.323 world has its own terminology, of course, with user devices referred to as terminals (or endpoints) and the other essential piece called a gatekeeper. A gatekeeper is a logical entity and provides call control services for the terminals including address resolution, authorization, and authentication services, and call logging.
Optional H.323 elements are gateways and multipoint control units (MCUs). As you can infer from the name, gateways translate call signaling and media transmission when terminals need to reach each other via other network types (Internet, PSTN, etc.) or segments of the same network. MCUs are perhaps the most specialized H.323 element and are required only when three or more H.323 terminals need to connect for a multipoint conference.
Before we move on, here are two other terms you're sure to encounter (actually the definitions come from here) in your VoIP product explorations:
FXO (Foreign Exchange Office)
An FXO interface connects to the Public Switched Telephone Network (PSTN) central office and is the interface offered on a standard telephone.
FXS (Foreign Exchange Station)
An FXS interface connects directly to a standard telephone and supplies ring, voltage, and dial tone.
Physically, both these interfaces usually appear as standard RJ11 connectors that are appropriately marked Line (for FXO) and Phone (for FXS) or something similar.