A protocol - in vague terms - is a set of behaviors that ought to be followed. Wikipedia has a disambiguation page for the word, but looking through the entries, I believe they all fit the description I just gave.
In software development, a protocol describes how some code needs to be implemented. This is important for internet communication protocols because, in order for everyone to communicate with one another, they need to be doing it the same way. With a protocol, one person can implement it in D, another in C++, and another in Ook, and - so long as the protocol is defined properly and they all implement it properly - they should all be able to communicate with one another.
The internet has many protocols that do many things. A long time ago, a couple of models of how the internet should work were formed: a four-layer internet model (often called the TCP/IP model) and a seven-layer model called the OSI model, named after the Open Systems Interconnection effort at the International Organization for Standardization (ISO) that produced it. The TCP/IP model describes some things well and other things not so well. Luckily, the OSI model covers those parts well, even if it doesn’t do all of it well either. In particular, the four-layer model doesn’t describe the hardware levels very well, and (in my opinion) the OSI model doesn’t describe the application parts very well.
In this article, we will be focusing on the TCP/IP model and ignoring the OSI model.
At each layer of the TCP/IP model (link, internet, transport, and application), and of the OSI model too, there are protocols. Most layers have more than a few protocols operating at them, but in the middle, at the internet layer, is the protocol that glues our modern web together: the Internet Protocol (IP). It sits there pretty much by itself these days.
A hopefully helpful note:
- The link layer communicates using frames.
- The internet layer communicates in packets.
- They are both considered datagrams.
These are different formats that hold data at a low level. Packets are carried by frames across links.
The protocols of the link layer describe communication over individual links, i.e., from one device to the next. Link layer frames do not travel across the internet; they only go as far as the next device. Some examples of link layer protocols are Ethernet and Wi-Fi.
But Seth! If Wi-Fi doesn’t bring my data to the internet, then how can I be reading your blog right now?
Okay, bear with me. Let’s use the analogy of a mail delivery system. Your internet data is the envelope, or packet, that needs to be delivered to a destination. A link layer frame is like either the airplane that goes from one side of the country to the other, or the mail delivery truck that goes from the post office to your house. They are different modes of transportation, and both of them carry the package, but neither of them makes the entire trip. Instead, each one hands the packet off to the next until it gets to its destination.
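If code is more your thing, here’s a toy sketch of that hand-off in Python. The `Packet` and `Frame` classes, their fields, and the hop names are all made up for illustration and look nothing like real IP or Ethernet headers. The only point is that every link wraps the same packet in a fresh frame; in this toy the packet itself is never touched along the way (real routers do tweak a few fields, like the hop limit).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    # Hypothetical internet-layer "envelope": end-to-end source and destination
    src: str
    dst: str
    payload: bytes

@dataclass(frozen=True)
class Frame:
    # Hypothetical link-layer "vehicle": only knows the two ends of one link
    link_src: str
    link_dst: str
    packet: Packet

def deliver(packet: Packet, path: list[str]) -> list[Frame]:
    """Carry one packet along a path, using a new frame for every link."""
    frames = []
    for here, nxt in zip(path, path[1:]):
        # Each hop unwraps the previous frame and re-wraps the same packet
        frames.append(Frame(link_src=here, link_dst=nxt, packet=packet))
    return frames

pkt = Packet(src="laptop", dst="blog-server", payload=b"GET / HTTP/1.1")
hops = deliver(pkt, ["laptop", "home-router", "isp-router", "blog-server"])
for f in hops:
    print(f.link_src, "->", f.link_dst)
```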
There are two versions of the Internet Protocol in use right now: IPv4 and IPv6. Both have the same purpose, which is to address and route packets so that all the computers on the internet can reach one another.
At this layer, every device connected to the internet is given a unique* address. Your IP address is structured kind of like your physical address: there is a part that identifies you specifically, and there’s a part that identifies which network you’re on (like Comcast or Time Warner). It’s like how you might be addressed locally as 123 Kicken Avenue, but globally as 123 Kicken Avenue, Cool City, Antarctica.
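Here’s a minimal sketch of that split using Python’s standard `ipaddress` module. It assumes a /24 prefix, meaning the first 24 bits name the network and the last 8 bits name the device. Real networks use many different prefix lengths, and 203.0.113.0/24 is a block reserved for documentation, not anyone’s real network.

```python
import ipaddress

# An address plus its prefix length: here the first 24 bits name the network
iface = ipaddress.ip_interface("203.0.113.42/24")

network = iface.network                             # the "Cool City, Antarctica" part
host_bits = int(iface.ip) & int(network.hostmask)   # the "123 Kicken Avenue" part

print(network)    # 203.0.113.0/24
print(host_bits)  # 42
```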
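*For IPv4, this is a lie in most cases. We ran out of IPv4 addresses a few years ago, but bought ourselves time to switch to IPv6 by doing some funky stuff. One such trick is Network Address Translation (NAT), which lets many devices on a private network share a single public IPv4 address.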
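Here’s a toy sketch of the idea behind NAT, specifically the port-translating flavor most home routers use (often called NAPT or PAT). The router keeps a table mapping each private address and port to a port on its one public address. The `ToyNat` class, the addresses, and the starting port of 50000 are hypothetical; real NAT also tracks the protocol, the remote endpoint, and timeouts.

```python
import ipaddress

class ToyNat:
    """Hypothetical home router sharing one public IPv4 address among private devices."""

    def __init__(self, public_ip: str, first_port: int = 50000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound: dict[tuple[str, int], int] = {}  # (private ip, port) -> public port
        self.inbound: dict[int, tuple[str, int]] = {}   # public port -> (private ip, port)

    def translate_out(self, private_ip: str, private_port: int) -> tuple[str, int]:
        # Outgoing packets get their private source swapped for the public one
        if not ipaddress.ip_address(private_ip).is_private:
            raise ValueError(f"{private_ip} is not a private address")
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int) -> tuple[str, int]:
        # Replies arrive at the public port and are forwarded to the right device
        return self.inbound[public_port]

nat = ToyNat("198.51.100.7")
print(nat.translate_out("192.168.1.10", 51515))  # ('198.51.100.7', 50000)
print(nat.translate_out("192.168.1.11", 51515))  # ('198.51.100.7', 50001)
print(nat.translate_in(50001))                   # ('192.168.1.11', 51515)
```

Two laptops can both use local port 51515 without clashing, because the router gives each of them a different public port.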
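To go along with the physical address analogy, the transport layer is like the name on the envelope. In a URL, it shows up as a port number, as in http://example.com:8080/, although the port is usually left out because browsers assume 80 for http and 443 for https. Port numbers identify which application (in the application layer) the packet needs to go to. Ports 0 through 1023 are reserved for common services and are known as well-known ports; HTTP’s port 80 is one of them. Ports 1024 through 49151 can be registered for specific applications. Ports 49152 through 65535 are not reserved; these are the dynamic or ephemeral ports, which are handed out temporarily and can be used for whatever you want.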
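A small Python sketch of those ranges. The boundaries come from IANA’s port registry; the `port_category` helper is just for illustration. Binding a socket to port 0 asks the operating system to pick a free port for you, and the OS uses its own ephemeral range for that (Linux defaults to 32768 through 60999), so the number you get won’t necessarily land in IANA’s range.

```python
import socket

WELL_KNOWN = range(0, 1024)       # e.g. 80 for HTTP, 443 for HTTPS
EPHEMERAL = range(49152, 65536)   # IANA's dynamic/private range

def port_category(port: int) -> str:
    """Classify a port number using IANA's three ranges."""
    if port in WELL_KNOWN:
        return "well-known"
    if port in EPHEMERAL:
        return "dynamic/ephemeral"
    return "registered"

# Binding to port 0 asks the operating system to choose a free port for us
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("127.0.0.1", 0))
    assigned = s.getsockname()[1]

print(port_category(80), port_category(8080), port_category(50000))
print("OS picked port", assigned)
```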
There are two main transport layer protocols: TCP and UDP.
UDP is the simpler of the two. It’s the most similar to the mail system analogy because it is only concerned with one direction: source to destination. And that’s it. It doesn’t even check whether your packet made it to its destination; if something goes wrong, you’ll never know about it. That’s actually fine for some use cases, such as voice calls and online games, where a late packet is useless anyway.
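Here’s a minimal UDP exchange using Python’s standard `socket` module, with both ends on your own machine (the loopback address 127.0.0.1) so it runs without a network. Notice there’s no connection setup: the sender just fires a datagram at an address. The message text is made up.

```python
import socket

# Receiver: bind a UDP socket to a port the OS chooses on the loopback interface
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
addr = receiver.getsockname()

# Sender: no connection and no handshake, just send one datagram to the address
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello, udp", addr)

# Over loopback this practically always arrives; across a real network it might
# not, and the sender would not be told either way
receiver.settimeout(1.0)
data, peer = receiver.recvfrom(1024)
print(data)  # b'hello, udp'

sender.close()
receiver.close()
```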
For the times when you need something more reliable, there’s TCP, the more popular of the two. It might not seem like it, but the internet is a dangerous place for a packet. Cables are cut, power strips are unplugged, servers are overloaded, etc. There are a lot of ways a packet can get lost or damaged on its way from one machine to another. TCP is popular because it provides some guarantees about delivery: it numbers and acknowledges the data it sends, retransmits anything that gets lost, and hands bytes to the receiving application in order. It doesn’t guarantee that your data will be delivered, but if it can’t get through, the connection fails with an error, so you’ll know.
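And here’s the TCP version of the same idea, again over loopback with Python’s `socket` module. `create_connection` performs TCP’s handshake before any data moves, and from then on the operating system handles acknowledgments and retransmission for you. The one-shot echo server thread and the message are just for illustration. A single `recv` is enough for a message this tiny on loopback, but real code loops until it has read everything it expects.

```python
import socket
import threading

def echo_once(server: socket.socket) -> None:
    # Accept a single connection and send back whatever arrives
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
t = threading.Thread(target=echo_once, args=(server,))
t.start()

# create_connection() does the handshake; sendall() returns once the kernel has
# accepted the data, and the kernel retransmits anything that isn't acknowledged
with socket.create_connection(server.getsockname(), timeout=2.0) as client:
    client.sendall(b"hello, tcp")
    reply = client.recv(1024)

t.join()
server.close()
print(reply)  # b'hello, tcp'
```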
TCP and UDP data are carried in IP packets, which are carried in frames across links. Application layer protocols are built on top of TCP and UDP.
This is a very broad category. There are many protocols that sit at this layer. There are some for email (such as IMAP, SMTP, and POP3). There are some for video games, like DOOM (which has the “well known” port 666 registered for its use). The application layer protocol that I’d like to go over at this time, however, is HTTP.
The protocol your browser is most likely using right now is HTTPS, the secure version of HTTP. HTTPS is HTTP sent over an encrypted TLS connection, so it has all the same traits as HTTP, plus encryption and verification of the server’s identity.
HTTP is used for many things across the internet: websites, mobile apps, APIs. Lotsa stuff. It can be used for so many things because of its flexible message format.
HTTP messages have a request format and a response format.
As for the request format, according to Wikipedia:
The request message consists of the following:
- a request line (e.g., GET /images/logo.png HTTP/1.1, which requests a resource called /images/logo.png from the server)
- request header fields (e.g., Accept-Language: en)
- an empty line
- an optional message body
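To see what that looks like on the wire, here’s a small Python sketch that assembles an HTTP/1.1 request by hand. The `build_request` helper and the example.com host are illustrative. The parts that come from the HTTP/1.1 spec are that each line ends with a carriage return plus line feed (\r\n) and that HTTP/1.1 requests must include a Host header.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request message as raw bytes (a hand-rolled sketch)."""
    lines = [f"{method} {path} HTTP/1.1"]               # request line
    lines += [f"{k}: {v}" for k, v in headers.items()]  # header fields
    head = "\r\n".join(lines) + "\r\n\r\n"              # the empty line ends the headers
    return head.encode("ascii") + body                  # optional message body

req = build_request("GET", "/images/logo.png",
                    {"Host": "example.com", "Accept-Language": "en"})
print(req.decode("ascii"))
```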
The request line specifies the request method, the location of the resource you are requesting, and the HTTP version. There are nine standard HTTP methods: CONNECT, DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT, and TRACE. Each of them has its own use; for example, GET retrieves a resource and POST submits data to the server.
As for the response format, it is similar to the request format, except that instead of a request line it starts with a status line, which contains a status code: a number that tells the client how the request went. In general, 200 level status codes mean all is good, 300 level codes deal with redirection (for example, when a resource has moved), 400 level codes mean the client did something wrong, and 500 level codes mean the server did something wrong. 100 level codes are informational interim responses, such as 100 Continue, and you’ll rarely run into them.
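Since the class of a status code is just its first digit, sorting codes into buckets is a one-liner. This Python sketch uses the standard library’s `http.HTTPStatus` enum for the reason phrases; the `status_class` helper is illustrative, and its labels follow the class names in the HTTP specification.

```python
from http import HTTPStatus

def status_class(code: int) -> str:
    """Map a status code to its class by looking at the first digit."""
    return {
        1: "informational",
        2: "successful",
        3: "redirection",
        4: "client error",
        5: "server error",
    }.get(code // 100, "unknown")

for code in (200, 301, 404, 500):
    # HTTPStatus supplies the standard reason phrase, e.g. "Not Found" for 404
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```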
Header fields are a way to store additional information about your request or response. Each one is a key-value pair in the format Key: Value, like Content-Type: application/json. Some header field names are standardized and should only be used for their designated purpose. You can also make up header fields for your application, willy-nilly.
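Parsing header fields is mostly splitting each line at the first colon. This is a simplified Python sketch, not a complete parser: it lowercases names because HTTP header field names are case-insensitive, it ignores repeated headers and other edge cases a real parser must handle, and X-Blog-Theme is a made-up custom field.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Split 'Key: Value' lines into a dict, lowercasing the (case-insensitive) names."""
    headers = {}
    for line in raw.splitlines():
        if not line:  # the empty line marks the end of the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

raw = "Content-Type: application/json\r\nX-Blog-Theme: dark\r\n\r\n"
h = parse_headers(raw)
print(h["content-type"])  # application/json
```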
The optional message body can contain whatever you want. You tell the server or the client what format it’s in with the Content-Type header, which takes a media type (also called a MIME type) to denote the body’s type. Two common ones are application/json and text/html, for JSON and HTML respectively.
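Here’s a sketch of a response that carries a JSON body, built by hand in Python. The `build_response` helper and the body contents are illustrative. The Content-Length header tells the client how many bytes of body to expect, and the Content-Type header tells it how to interpret them.

```python
import json

def build_response(status: int, reason: str, content_type: str, body: bytes) -> bytes:
    """Assemble an HTTP/1.1 response: status line, headers, empty line, body."""
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # where the body ends
        "\r\n"
    )
    return head.encode("ascii") + body

body = json.dumps({"title": "Protocols"}).encode("utf-8")
resp = build_response(200, "OK", "application/json", body)
print(resp.decode("utf-8"))
```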
When you loaded sethhenderson.net today, your browser formed a GET request and sent it to the hosting server. The server responded with a 200 status code, a Content-Type: text/html header, and a message body containing the HTML for my website. After loading the initial HTML, your browser made more requests in the same way to get the rest of the site’s resources.
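You can replay that kind of exchange on your own machine. This sketch uses Python’s standard `http.server` module to stand in for the hosting server and `http.client` to play the browser. The page contents and the loopback setup are made up, and a real browser sends many more headers and talks HTTPS rather than plain HTTP.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<!doctype html><title>Hello</title><p>Hi there!</p>"

class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with a 200 status, an HTML Content-Type, and the page as the body
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet

# Port 0 lets the OS pick a free port for the stand-in server
server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "browser" side: open a connection and send GET /
host, port = server.server_address
conn = http.client.HTTPConnection(host, port, timeout=5)
conn.request("GET", "/")
resp = conn.getresponse()
html = resp.read()
print(resp.status, resp.getheader("Content-Type"))  # 200 text/html; charset=utf-8

conn.close()
server.shutdown()
server.server_close()
```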
Well, if you hadn’t noticed, I think that HTTP is the bee’s knees. I glossed over a lot of things in this overview; it could easily have been book-length. In fact, there are books about this already, and I know of one that you can get for free right now as an EPUB or PDF. This textbook is quite long, but thorough. If networking is your jam, it’s a good one to go through.
Thanks!