Several networking protocols have made the internet possible. Learning about them can be overwhelming if you are new to computer networks, but knowing them will help you understand how the internet works. Here are the major protocols that I find the most interesting.
HTTP
HTTP is the first on the list and the easiest to understand. If you have used the internet, you have used HTTP. Any site you visit works on the basic principles of HTTP. This protocol describes how a browser and a server transfer files between them. It is stateless, i.e. the browser fires a request, the server responds, and that’s it. The server keeps no memory of the client between requests. The most common resources a browser requests are HTML, CSS, JS, images, fonts, etc.
HTTP was designed by the genius dude Tim Berners-Lee, who proposed it in 1989 at CERN. Interestingly, he is also the one who built the first browser and invented the World Wide Web itself. HTTP/1.0 was published in 1996, followed by HTTP/1.1 in 1997 and HTTP/2 in 2015. The latest successor, HTTP/3, was published as a standard in 2022.
HTTP uses headers, which are key-value pairs, to provide more information about the request and the response. For example:
Access-Control-Allow-Origin: *
Age: 2318192
Cache-Control: public, max-age=315360000
Connection: keep-alive
Date: Sat, 18 Jul 2020 16:06:00 GMT
Server: Apache
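Because headers are plain `Name: value` lines, they are easy to read into a map. Here is a minimal sketch in JavaScript; `parseHeaders` is a made-up helper for illustration, not part of any library:

```javascript
// Parse raw "Name: value" header lines into a Map.
// Header names are case-insensitive, so they are stored in lowercase.
function parseHeaders(raw) {
  const headers = new Map();
  for (const line of raw.split('\n')) {
    const idx = line.indexOf(':'); // split on the FIRST colon only
    if (idx === -1) continue;      // skip lines that are not headers
    const name = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    headers.set(name, value);
  }
  return headers;
}

const parsed = parseHeaders('Server: Apache\nConnection: keep-alive');
console.log(parsed.get('server')); // Apache
```

Splitting on the first colon matters because values such as dates contain colons of their own.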
HTTP relies on a reliable transport protocol underneath it, typically TCP. Making an HTTP call is very simple in most programming languages. Here is an example in JavaScript:
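// Request a JSON resource, parse the response body and log it
fetch('http://example.com/movies.json')
  .then((response) => response.json())
  .then((data) => console.log(data))
  .catch((error) => console.error('Request failed:', error));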
There are many other ways to make HTTP calls, including CLI tools like cURL and applications like Postman.
TCP/IP
TCP/IP stands for Transmission Control Protocol and Internet Protocol. It is also commonly called the Internet protocol suite. These are the rules that govern the internet, and understanding this suite means you understand the majority of the concepts. In the OSI model, TCP is a layer 4 (transport) protocol and IP is a layer 3 (network) protocol.
The internet operates on the basis of abstraction. An HTTP request does not care how it reaches the server. TCP does not care which computer it talks to. IP does not care how the data is sent over the wire. Each layer receives a packet, adds its own information to it and then passes it on. Moreover, these layers are just software operations implemented in the operating system rather than separate physical components.
In this regard, TCP/IP defines how a packet is addressed, routed and received. Here is a summary of the operations:
- Once data comes down from an upper-layer protocol like HTTP, the layer 4 protocol TCP adds information like the source port, destination port, sequence number, etc.
- It is then handed to layer 3, which is IP. IP adds more information to the packet, like the source IP address, destination IP address, etc.
- It is then passed to the lower layers for transmission over the wire.
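The steps above can be sketched as nested wrapping. This is only a toy model in JavaScript: real headers are binary and carry many more fields, and the function names, port numbers and IP addresses below are made up for illustration.

```javascript
// Layer 4: wrap the application data in a simplified TCP segment.
function tcpEncapsulate(payload, srcPort, dstPort, seq) {
  return { srcPort, dstPort, seq, payload };
}

// Layer 3: wrap the TCP segment in a simplified IP packet.
function ipEncapsulate(segment, srcIp, dstIp) {
  return { srcIp, dstIp, payload: segment };
}

// Hypothetical request and addresses (documentation-only IP ranges).
const request = 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n';
const segment = tcpEncapsulate(request, 49152, 80, 1);
const packet = ipEncapsulate(segment, '192.0.2.10', '198.51.100.7');

// The original HTTP request sits untouched inside both wrappers.
console.log(packet.payload.payload === request); // true
```

Each layer only reads and writes its own wrapper, which is why HTTP never needs to know about IP addresses and IP never needs to know about ports.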
But how are the port and address determined?
That is the interesting part. Since there are two parties involved in a single request (client and server), we need two sets of ports and IP addresses.
Determining the port for a server is easy. It typically depends on the layer 7 protocol. For example, an HTTP server typically listens on port 80, an HTTPS server on port 443, SSH on port 22, etc. These are predetermined, and everyone follows them by convention.
However, a client is not a server and does not need to use a standard port. Therefore, for each connection, the operating system picks a temporary port from a reserved range, called the ephemeral port.
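Here is a rough sketch of the two kinds of ports in JavaScript. In reality the operating system picks the ephemeral port for you. `pickEphemeralPort` is a made-up helper, and the default range 49152 to 65535 is the one IANA suggests for dynamic ports; actual ranges vary between operating systems.

```javascript
// Well-known server ports mentioned in this article.
const WELL_KNOWN_PORTS = { http: 80, https: 443, ssh: 22 };

// Pick a random client-side port from an assumed ephemeral range.
function pickEphemeralPort(min = 49152, max = 65535) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// A connection is identified by both ends: client port and server port.
const connection = {
  clientPort: pickEphemeralPort(),
  serverPort: WELL_KNOWN_PORTS.https,
};
console.log(connection);
```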
But what is a port in the first place?
A port is a software construct managed by the operating system. It is used to identify network processes running on a computer. It is a 16-bit integer ranging from 0 to 65535. If you imagine the computer to be a house, ports are the electric sockets through which data flows.
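The upper limit follows directly from the 16 bits. A small sketch in JavaScript, where `isValidPort` is a made-up helper for illustration:

```javascript
// 16 bits give 2^16 distinct values, so the largest port number is 2^16 - 1.
const MAX_PORT = 2 ** 16 - 1;

// Check that a value fits in the 16-bit port range.
function isValidPort(port) {
  return Number.isInteger(port) && port >= 0 && port <= MAX_PORT;
}

console.log(MAX_PORT);           // 65535
console.log(isValidPort(443));   // true
console.log(isValidPort(70000)); // false
```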
Now coming to IP addresses
An IP address is the network address of a computer or a device. For home connections it is typically assigned by the ISP, and it may change when a device reconnects to the network. There are two versions of IP addresses, namely IPv4 and IPv6. An IPv4 address is 32 bits long whereas an IPv6 address is 128 bits long.
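To see that a dotted IPv4 address really is just a 32-bit number, here is a minimal sketch in JavaScript. `ipv4ToInt` is a made-up helper, and the addresses come from ranges reserved for documentation.

```javascript
// Convert a dotted IPv4 address such as "192.0.2.1" to its 32-bit integer form.
function ipv4ToInt(address) {
  const parts = address.split('.');
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`Invalid IPv4 address: ${address}`);
  }
  // Each octet contributes 8 bits; ">>> 0" keeps the result unsigned.
  return parts.reduce((acc, p) => ((acc << 8) | Number(p)) >>> 0, 0);
}

console.log(ipv4ToInt('192.0.2.1'));       // 3221225985
console.log(ipv4ToInt('255.255.255.255')); // 4294967295, i.e. 2^32 - 1
```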
Thus every server, like those of Google or Apple, has a unique IP address. The data packet is sent to this IP address from the client via routers. Routers use routing algorithms to find a good path to the destination, similar to how Google Maps finds the shortest route to a destination. I will be writing about those in detail in future articles.
But if the address of a server is a 32-bit number, how do I reach Google when I type google.com? Good question. That is where DNS comes in.
DNS
DNS is a simple protocol that maps human-readable names to numeric IP addresses, much like key-value pairs. Thus when you type google.com, a query is sent to a known DNS server, which returns the IP address of the server. The actual request to Google is then placed. DNS also lets a server change its IP address without affecting the domain name.
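The key-value idea can be sketched with a plain `Map` in JavaScript. This is a toy stand-in for a DNS server, not how a real resolver works; the `records` table, the `resolve` helper and the addresses are all made up for illustration.

```javascript
// A toy record table: domain name -> IPv4 address (documentation-only ranges).
const records = new Map([
  ['example.com', '192.0.2.10'],
  ['example.org', '198.51.100.7'],
]);

// Look up a name; domain names are case-insensitive and may end with a dot.
function resolve(name) {
  const key = name.toLowerCase().replace(/\.$/, '');
  return records.get(key) ?? null; // null means "no such name"
}

console.log(resolve('Example.com')); // 192.0.2.10

// The server moves to a new address; clients keep using the same name.
records.set('example.com', '203.0.113.5');
console.log(resolve('example.com')); // 203.0.113.5
```

For a real lookup in Node.js, the built-in `dns` module can be used instead, for example `dns.promises.resolve4('example.com')`.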
DNS stands for Domain Name System. It is one of the oldest protocols still in use. DNS is hugely distributed and is often described as the phone book of the internet. The following diagram shows the hierarchical nature of the DNS.
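The same hierarchy can be read off a domain name from right to left: the root, then the top-level domain, then the domain itself. A minimal sketch in JavaScript, where `delegationPath` is a made-up helper for illustration:

```javascript
// List the zones a lookup passes through, from the root down to the full name.
function delegationPath(name) {
  const labels = name.replace(/\.$/, '').split('.');
  const path = ['.']; // the root zone
  for (let i = labels.length - 1; i >= 0; i--) {
    path.push(labels.slice(i).join('.') + '.');
  }
  return path;
}

console.log(delegationPath('www.google.com'));
// [ '.', 'com.', 'google.com.', 'www.google.com.' ]
```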
But if every computer and device connected to the internet has an IP address, one can imagine that the available IP addresses would be exhausted quickly. Thus a few other concepts like CIDR and NAT were developed. Let’s look at those in a future article.
ARP
OK. I obtained the IP address from the DNS server. I know the port. I built the IP packet. But how do I transfer the data over the wire to the server?
This process is governed by lower-layer protocols, mainly ARP (Address Resolution Protocol) and RARP (Reverse Address Resolution Protocol). To understand both, we must first understand what a MAC address is.
A MAC (Media Access Control) address is the hardware address of a network interface card (NIC). MAC addresses are used by major network technologies including Ethernet, Wi-Fi and Bluetooth. A MAC address is 48 bits long and is usually fixed. It is also called the burned-in address because it is assigned by the manufacturer and normally does not change.
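A MAC address is usually written as six hexadecimal bytes, which is where the 48 bits come from. A minimal sketch in JavaScript; `parseMac` and the sample address are made up for illustration:

```javascript
// Parse a MAC address like "00:1A:2B:3C:4D:5E" into its six bytes.
function parseMac(mac) {
  const parts = mac.split(/[:-]/); // accept ":" or "-" as separators
  if (parts.length !== 6 || !parts.every((p) => /^[0-9a-fA-F]{2}$/.test(p))) {
    throw new Error(`Invalid MAC address: ${mac}`);
  }
  return parts.map((p) => parseInt(p, 16));
}

const bytes = parseMac('00:1A:2B:3C:4D:5E');
console.log(bytes.length * 8); // 48
```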
Now, back to ARP. It is a simple protocol that maps an IP address to a MAC address on the local network. Devices on the network, including the router, keep this mapping in an ARP table and add entries as they learn about other devices. A router acts as the gateway to a network, such as a home network, and multiple devices can be connected to it. Whenever the router receives a data packet, it must know which device to forward the packet to.
Thus it looks at the ARP table, gets the MAC address for the matching IP address and then sends the packet to that device.
The following is an example mapping table:
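A minimal sketch of such a table in JavaScript. The IP and MAC addresses are made up, and `lookupMac` is a hypothetical helper; a real ARP table lives inside the operating system.

```javascript
// A toy ARP table: local IP address -> MAC address (made-up entries).
const arpTable = new Map([
  ['192.168.1.10', '00:1a:2b:3c:4d:5e'],
  ['192.168.1.11', '00:1a:2b:3c:4d:5f'],
]);

// Find the MAC address to forward a packet to.
// null means the device is not in the table yet.
function lookupMac(ip) {
  return arpTable.get(ip) ?? null;
}

console.log(lookupMac('192.168.1.10')); // 00:1a:2b:3c:4d:5e
console.log(lookupMac('192.168.1.99')); // null
```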
But how is the data transmitted over the wire?
There is an entire field of analog and digital communication, along with contributions from genius people throughout history, that has made this possible.
Thanks for reading :)