Overview of HTTP

HTTP is the protocol of the web. It is the way in which web servers and web clients (usually browsers) communicate with each other. HTTP stands for Hypertext Transfer Protocol. 'Hypertext transfer' means that hypertext (HTML) files are transferred from web servers to their clients. Nowadays, HTTP allows for the transfer of many other types of files and data, but it started out as a way to browse the web by transferring HTML documents. The files and data that are transferred with HTTP are referred to as resources.

But what is a protocol? I like to use the analogy of the US Mail protocol. In order to successfully send a letter to someone, you must follow a protocol. You must put the message in an envelope, and the envelope must include the proper information, in the proper format. First comes the recipient's name, then the address (which must be properly formatted). The return address is put on the upper left corner of the envelope. If you don't follow this protocol, your message may not reach the recipient. In this case, hopefully the return address is properly formatted on the envelope, and the letter comes back to you so that you can fix the errors and resend it.

Just like the rules of sending messages via the US Postal Service, the rules of HTTP specify how to format the messages sent between clients and servers.

In order to request a web page from a web server, you must have the domain name of the web server AND the path to the resource within the web server. The complete address for a resource is called a URL (Uniform Resource Locator). Here's an example:

An example of a URL and its parts

There are three main parts to a URL:

  1. protocol - for example: http://, https://, or ftp://
  2. domain name (aka 'host name' or 'server name'; could also be an IP address)
  3. path - the path to the resource within the web server
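
As a rough illustration of the three parts listed above, here is a small Python sketch that splits a URL with the standard library. The URL itself is made up for this example, and note that Python calls the protocol part the 'scheme' and reports it without the '://' separator.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.acme.com/index.html"

parts = urlsplit(url)

print(parts.scheme)    # the protocol, without the "://" separator
print(parts.hostname)  # the domain name (host name)
print(parts.path)      # the path to the resource within the web server
```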

Requests and Responses

When you type a URL into the address bar of your browser and press enter, the browser creates and sends an HTTP request. The request, like a letter sent through the US Mail, must follow a specific format. Here's what a typical request looks like:

GET /index.html HTTP/1.1
Host: www.acme.com
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

The first line of the request has three pieces of information. The first piece is the type of request, in this case GET (more about request types soon). The second piece is the path portion of the URL, and the last part specifies the version of HTTP that the request is using (in this case, version 1.1).

The second line of the request specifies the domain name portion of the URL (recall that the domain name is also known as the 'host' name). Notice the format of this line. It starts with Host, followed by a colon, and then the domain name. Lines that follow this format are known as headers, which are key/value pairs. In this case the header name (key) is 'Host' and the header value is 'www.acme.com'. Headers can be used to send additional information about the request to the web server. For example, the third line is the User-Agent header, and it tells the web server information about the browser that is sending the request. In this case the 'user agent' string identifies Internet Explorer 5.01 running on Windows (the leading 'Mozilla/4.0' is a historical compatibility label, not the name of the browser). The fourth line of the request tells the web server that the browser is configured to prefer US English. We won't get into the last two headers in this request, but note that there are many request headers that can be sent to a web server. Next, when we discuss the response that comes back from the web server, we'll see that it can also contain headers.
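
To make the format concrete, here is a minimal Python sketch that takes the sample request above as a string and pulls out the three pieces of the first line and the headers. The parse_request function is a hypothetical helper written for this example, and it assumes a well-formed request; a real web server does far more validation than this.

```python
# The sample request from above. HTTP/1.1 ends each line with "\r\n",
# and a blank line marks the end of the headers.
raw_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.acme.com\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)\r\n"
    "Accept-Language: en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw request into (method, path, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the blank line
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, value = line.split(":", 1)  # split on the first colon only
        headers[name.strip()] = value.strip()
    return method, path, version, headers


method, path, version, headers = parse_request(raw_request)
print(method, path, version)
print(headers["Host"])
```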

When a web server receives a request from a client, it will look for the resource that matches the path in the request, and then it will send that resource (which is commonly a file, such as an .html or .png file) to the client.

Here's a simple HTTP response that a web server may send to a browser:

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html
Connection: close

<html>
   <body>
      <h1>Hello, World!</h1>
   </body>
</html>

The first line of the response shows the version of HTTP being used, and then a status code. In this case 200 means that the web server was able to successfully respond to the request. 'OK' is the status message that goes with a code of 200. If the request had contained a path that did not exist on the web server, then the status code in the response would be 404, and the status message would be 'Not Found'. Here's a link to all the HTTP status codes.
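
If you'd like to look up the message that goes with a status code, Python's standard library includes a table of them. This is just a quick sketch for exploring the two codes mentioned above; it doesn't make any requests.

```python
from http import HTTPStatus

# Look up the standard message for each status code discussed above.
for code in (200, 404):
    status = HTTPStatus(code)
    print(status.value, status.phrase)
```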

The next six lines in the response are headers that offer information about the web server and the resource that is being returned to the client. We saw earlier that a request can include headers that provide additional information about the request; a response from the server can include headers as well. The Content-Length header indicates that the resource being sent back to the client is 88 bytes long, and the Content-Type header indicates that the resource is an HTML document.

After the headers there is a blank line. The blank line separates the response headers from the body of the response. The body of the response is the resource that was requested. In this case, it's the contents of an HTML file. The browser will read through the response body and display/render the HTML (and create the DOM from it). Remember that a 'resource' is often a file (such as an .html, .pdf, or .jpg file), but it doesn't have to be a file. It can be something like the data returned by running a query from a database. We'll get into this more when we build a web service later in the program.
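
Here is a small Python sketch of how a client could use that blank line to separate the headers from the body. The response below is a shortened version of the sample (fewer headers, body on one line), and the variable names are just for this example.

```python
# A shortened version of the sample response. Lines end with "\r\n",
# and the first blank line separates the headers from the body.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body><h1>Hello, World!</h1></body></html>"
)

# Split once, at the first blank line.
head, body = raw_response.split("\r\n\r\n", 1)

# The first line of the head is the status line; the rest are headers.
status_line, *header_lines = head.split("\r\n")
version, code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(version, code, reason)
print(headers["Content-Type"])
print(body)
```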

It's important to note that many requests are often required in order to display a single web page. For example, the browser requests an HTML file, and the server responds by sending the file. The browser then begins to parse through the HTML code from top to bottom. As it goes through each of the tags in the HTML, it may encounter tags that require requests for additional resources. For example, an img tag will require the browser to make a request for the image from the web server. There are other types of additional resources that may be required in order to display a web page in the browser, such as video files, audio files, JavaScript files, and CSS files. It's not uncommon for a browser to make dozens, even hundreds of requests just to display a single web page. To see this in action, open the developer tools in your browser and observe the activity in the Network tab. You'll likely see all sorts of requests happening behind the scenes when you visit a website.
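
To get a feel for how one page turns into many requests, here is a rough Python sketch that scans some HTML and lists the additional resources it refers to. The page content, the file paths, and the ResourceFinder class are all made up for this example, and it only looks at three kinds of tags; a real browser handles many more cases.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collects the URLs of additional resources referenced by a page."""

    # Tag name -> attribute holding the resource URL (a small, incomplete selection).
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)  # None for tags we don't track
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


# Hypothetical page, for illustration only.
page = """
<html>
  <head>
    <link rel="stylesheet" href="/css/site.css">
    <script src="/js/app.js"></script>
  </head>
  <body>
    <img src="/images/logo.png">
  </body>
</html>
"""

finder = ResourceFinder()
finder.feed(page)

# Each entry is one more request the browser would need to make.
print(finder.resources)
```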

Request Methods

There are different types of requests, which are known as request methods. The most common request method is GET, which a client (such as a browser) will use to fetch a resource from a web server. But a client may also want to send data to a server, edit data on the server, or even delete data from a server. There is a request method for each of these actions:

  1. GET - fetch a resource from the server
  2. POST - send data to the server
  3. PUT - edit (update) data on the server
  4. DELETE - delete data from the server

There are a few other types of requests, but these four are the most common. The best example of a POST request is when a web page has an HTML form on it. When a user enters data in the form and 'submits' it, the browser makes a POST request and sends the user input to the web server. Here's an example of a POST request:

POST /hobby-survey-form HTTP/1.1
Host: www.acme.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 42

firstName=Bob&LastName=Smith&Hobby=Reading

The first line of the request includes the following:

  1. The request method (POST)
  2. The path to the page or route on the server that will process the data (this is specified in the action attribute of the HTML form)
  3. The version of HTTP being used to make the request.

The next three lines are headers, and the body of the request (which contains the form data that was entered by the user) comes after the blank line.

Take a look at the Content-Type header. This header specifies the format of the data being sent. When HTML forms are submitted, the browser encodes the data in a format known as application/x-www-form-urlencoded. To see what this format looks like, have a look at the body of the request. This format requires data to be encoded in key/value pairs, with each pair separated by an ampersand. The key and its corresponding value are separated by an equals sign. This is the same encoding used to add query string parameters to URLs, which we'll discuss in a minute.
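
Here is a short Python sketch of this encoding, using the same form data as the sample POST request. It encodes the key/value pairs into a request body, checks the length against the Content-Length header in the sample, and then decodes the body again. The second value at the end is a made-up example to show what happens to spaces and ampersands inside a value.

```python
from urllib.parse import urlencode, parse_qs

# The form fields from the sample POST request.
form_data = {"firstName": "Bob", "LastName": "Smith", "Hobby": "Reading"}

# Encode as application/x-www-form-urlencoded: key=value pairs joined by "&".
body = urlencode(form_data)
print(body)

# Content-Length is the size of the body in bytes.
content_length = len(body.encode("utf-8"))
print(content_length)

# Decoding turns the string back into keys and values
# (each value is a list, because a key may appear more than once).
parsed = parse_qs(body)
print(parsed["firstName"][0])

# Characters with special meaning are escaped so they can't be mistaken
# for separators. This value is hypothetical.
escaped = urlencode({"Hobby": "Rock climbing & tea"})
print(escaped)
```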

When this request reaches a web server that is built with the Express framework, the body of the request (a string) will be parsed into the req.body object (but remember that you must enable body-parsing middleware in order for this to work, such as the built-in express.urlencoded() in Express 4.16 and later, or the body-parser package).

If the server is using PHP, the string will be parsed into the $_POST superglobal array.

Then the web server will put together its response, which might look like this:

HTTP/1.1 200 OK
Content-Length: 88
Content-Type: text/html
Connection: close

<html>
<body>
<h1>Thank you for your data!</h1>
</body>
</html>
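
For a sense of how a server might assemble a response like this, here is a minimal Python sketch. The build_response function is a hypothetical helper, not part of any framework, and it only sets two headers. The point to notice is that Content-Length is calculated from the body, as a count of bytes.

```python
def build_response(html):
    """Assemble a bare-bones HTTP/1.1 response around an HTML string."""
    body = html.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"  # size of the body in bytes
        "\r\n"                              # blank line ends the headers
    )
    return head.encode("ascii") + body


response = build_response(
    "<html><body><h1>Thank you for your data!</h1></body></html>"
)
print(response.decode("utf-8"))
```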

Query Strings (aka URL Parameters)

We previously discussed how additional information can be sent with a request by using HTTP headers, but you can also send additional information by appending data to a URL in the request. Take a look at the following URL:

www.acme.com/hobby-survey-form?firstName=Bob&LastName=Smith&Hobby=Reading 

Notice the question mark, and what comes after it. In a URL, everything after the question mark is the query string (not to be confused with a SQL query). It's also referred to as the URL parameters. It looks a lot like the body of the previous POST request, because it uses the same type of encoding (application/x-www-form-urlencoded). POST requests send data to a server by putting it in the body of the request. But you can also send data in a GET request by adding a query string to the URL.

Take a look at this sample HTTP request:

GET /some-page.html?firstName=Bob&lastName=Smith HTTP/1.1
Host: www.acme.com
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

Notice the query string in the URL on the first line of the request.

In Express, the query string will be parsed into the req.query object.

In PHP the query string will be parsed into the $_GET superglobal array.
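
To show roughly what that parsing step involves, here is a small Python sketch that separates the path from the query string on the first line of the sample GET request, and then decodes the parameters. The variable names are just for this example.

```python
from urllib.parse import urlsplit, parse_qs

# The path and query string from the first line of the sample GET request.
request_target = "/some-page.html?firstName=Bob&lastName=Smith"

parts = urlsplit(request_target)
print(parts.path)   # everything before the question mark
print(parts.query)  # everything after the question mark

# Decode the query string into keys and values
# (each value is a list, because a key may appear more than once).
params = parse_qs(parts.query)
print(params["firstName"][0], params["lastName"][0])
```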

One of the interesting things about a URL that includes a query string is that you can bookmark it, and the additional information will be included in the bookmark because it's embedded in the URL. Notice what happens to the URL when you run a Google search and view the results. When the results page appears, you’ll see that your search terms are added to the URL as a query string. This allows you to bookmark or copy the URL. Then you can email it to someone, and when they click on the link, they will see the same results. This is also commonly done on e-commerce sites, such as Amazon.

URL parameters in a Google search

HTTP is 'Stateless'

If you create a website that uses JavaScript to set some variables on one page, what happens to those variables when the user clicks on a link and navigates to another page on the site? Unfortunately, all those variables are lost, and they would need to somehow be recreated when the new page loads. The values of all the variables on a page are known as the 'state'.

Web pages, by default, do not maintain JavaScript state as you navigate from one page to another within a website (in other words, all the variables are lost when you navigate to another page). HTTP itself is considered to be 'stateless' for a similar reason: each request is handled on its own, and the protocol does not include a built-in way to carry the state of a page from one request to the next. But we can use cookies to allow data to persist from one page request to another when navigating through a web site.

Cookies

In the 1990s, cookies were added to HTTP to help overcome its stateless nature. If we wanted to keep track of a variable (let's say we call it 'userName'), as a visitor moves from one web page to another on a website, we could configure the server to set a cookie.

The server sets a cookie by adding a 'Set-Cookie' header in the response. When the browser receives the response, it will automatically save the value of the 'Set-Cookie' header in a file (the file is known as a cookie). And it will send a 'Cookie' header (along with the data) in every subsequent request to the server.

Here's what the first few lines of a server response might look like when it sets a cookie:

HTTP/1.1 200 OK
Set-Cookie: userName=Bob

The browser would then store the information (userName=Bob), and send it with every subsequent request to the same web server. So the next request by the browser might look something like this:

GET /some-page.html HTTP/1.1
Cookie: userName=Bob

Note that the browser has added a Cookie header to the request. The server receives the request and can read userName=Bob from the Cookie header. It only needs to include another Set-Cookie header in its response if it wants to change the value.

Either the client/browser or the server could change the userName variable by altering the Cookie or Set-Cookie header.

And so cookies allow data to persist as a visitor navigates from one page to another on a website.
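
Here is a rough Python sketch of both sides of that exchange, using the standard library's cookie helper. It builds the Set-Cookie header a server might send, and then reads the value back out of the Cookie header that the browser would send on its next request. There is no real browser or server here; the Cookie header value is written out by hand to simulate what the browser would send.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the response.
outgoing = SimpleCookie()
outgoing["userName"] = "Bob"
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Browser side (simulated by hand): the stored cookie is sent back
# in a Cookie header on the next request.
cookie_header_value = "userName=Bob"

# Server side, next request: read the value out of the Cookie header.
incoming = SimpleCookie()
incoming.load(cookie_header_value)
user_name = incoming["userName"].value
print(user_name)
```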

There is a lot more to cookies than what we discussed here. For example, a server can set an expiration date for a cookie when it sends it to the browser. Then, when the expiration date is reached, the browser will stop sending the Cookie header when it makes requests, which would prevent the data from persisting as the user navigates.

Web Services (aka APIs)

Nowadays, many web servers do not generate HTTP responses that include HTML in the body. Instead they just return data (usually the data is encoded in the JSON format). These are known as web services, or APIs (we have used the word API many times before in my classes, but maybe not yet in this context - term confusion!). Web services have become an extremely powerful method for developing programs and integrated systems. Because they return just data (rather than data that is displayed within a bunch of HTML tags), the data can be 'consumed' by different client applications. You could build a mobile app that uses the data from a web service (a mobile app probably doesn't have much use for HTML). You could build a desktop app that consumes data from the same web service. You could also use AJAX to build a website that consumes web services. Web services offer a lot of flexibility, because you can build many variations of client applications that 'consume' the data they provide. Web services also offer many opportunities to integrate systems. Many businesses build web services to share data with other businesses, which allows for different systems to integrate with one another.

Paste this URL into your browser and have a look at the response you get:

https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=1min&apikey=demo

Your browser made a request to a web service (API) that returns raw data. In this case, it's a web service that offers financial data. You could modify the query string to instruct the web service to modify the data that it returns. HTTP requests sent to web services are also known as 'API calls'. Just think of all the apps you could create by requesting data from a web service like this one!
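
As a sketch of what 'modifying the query string' could look like in code, here is a small Python example that takes the URL above and swaps in a different value for one parameter. The with_params function is a hypothetical helper, and the replacement symbol 'IBM' is only an illustration; this sketch builds the new URL but does not send a request, so it doesn't check whether the service accepts that value with the demo API key.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# The API URL from above.
api_url = (
    "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY"
    "&symbol=MSFT&interval=1min&apikey=demo"
)


def with_params(url, **changes):
    """Return a copy of url with some query string parameters changed."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))  # query string -> dictionary
    params.update(changes)                 # overwrite or add parameters
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical change: ask for a different stock symbol.
new_url = with_params(api_url, symbol="IBM")
print(new_url)
```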

Here's a link to more information about HTTP messages.

Here is a link to the API docs for this web service.

Here is a web service that serves up trivia questions (it has over 200,000 questions!).

Summary

In 2009 Google declared that 'the web has won', meaning the web has become the dominant platform for software development. The widespread usage of HTTP and the emergence of web services have created great demand for web developers. Every web developer should have a solid understanding of HTTP and how it can be used to build web applications.