Lab 1: DNS, HTTP, netcat as a web browser

What Must a Web Browser Do to Obtain a Resource

URL structure [RFC3986, §3]

             host name  port
            /^^^^^^^^^\ /^^\
      foo://example.com:8042/over/there?name=ferret#nose
      \_/   \______________/\_________/ \_________/ \__/
    scheme     authority       path        query   fragment
            \_________________________/
                     hier-part        
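The components in the diagram can also be pulled apart programmatically. The following is a minimal sketch using Python's standard `urllib.parse.urlsplit`; the helper name `url_components` and the dictionary keys are our own choice, not part of RFC3986.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into the components named in the diagram above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # host name plus optional port
        "host": parts.hostname,
        "port": parts.port,          # None when the URL gives no port
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    example = "foo://example.com:8042/over/there?name=ferret#nose"
    for name, value in url_components(example).items():
        print(f"{name:10} {value}")
```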

Given a URL, a web browser does the following:

  1. Obtains the IP address for the host (server) name (example.com) in the authority.
  2. Opens a connection to the host on the port number (8042) specified in the authority.
  3. Using the protocol specified by the scheme, asks the host to provide the resource specified by the path and query.
  4. Locates the fragment in the resource obtained from the host.

  1. Is the fragment part of a URL processed on the client side or the server side?
  2. Explain the difference between URL, URI, and URN. (Hint: RFC3986)
  3. (1 point) Show in your browser examples of URIs with at least 5 different, commonly used schemes. Which of them are URLs and which are URNs?

We will now closely examine how IP addresses are obtained from host names, and how the HTTP protocol works.

Finding Web Servers: Domain Name System

The main purpose of the Domain Name System (DNS, defined by many RFCs) is to translate hierarchical symbolic host names (e.g., courses.matfyz.sk) to IP addresses (e.g., 158.195.89.73) and vice versa.

DNS contains records of several types (SOA, NS, A, AAAA, CNAME, MX, PTR, …). We are now interested in A and NS records:

Record pattern                           Example
name IN A IPv4_address                   courses.matfyz.sk. IN A 158.195.89.73
domain IN NS authoritative_DNS_server    uniba.sk. IN NS dns1.uniba.sk.

Fully qualified (absolute) domain names (FQDNs) end with “.”. Domain names that are not “.”-terminated can be interpreted as relative to search domains (see below).

Examining DNS Records Using the Linux Command host

We will use the Linux command host to query the DNS:

$ host -v -t record_type domain_name DNS_server

A browser's DNS queries are answered by a DNS recursor (recursive resolver) assigned to your computer, usually by your internet service provider or local network. You can obtain the IP address of your DNS recursor(s) as follows:

Linux:

$ cat /etc/resolv.conf
search nw.fmph.uniba.sk
nameserver IP_of_DNS_recursor

Windows (cmd.exe):

C:\>ipconfig /all

DNS Servers . . . . . . . . . . . : IP_of_DNS_recursor
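On Linux, the same information can be read from the file by a program. The sketch below parses resolv.conf-style text and returns the search domains and recursor addresses. The function name `parse_resolv_conf` is hypothetical, and the address 192.0.2.53 is a documentation placeholder standing in for IP_of_DNS_recursor. Options other than search and nameserver are ignored.

```python
def parse_resolv_conf(text):
    """Return (search_domains, nameservers) from resolv.conf-style text."""
    search, nameservers = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0][0] in "#;":
            continue                      # skip blank lines and comments
        if fields[0] == "search":
            search = fields[1:]           # a later search line replaces an earlier one
        elif fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
    return search, nameservers


if __name__ == "__main__":
    sample = "search nw.fmph.uniba.sk\nnameserver 192.0.2.53\n"
    print(parse_resolv_conf(sample))
```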

To obtain the IP address of courses.matfyz.sk (the A-type record), we will use the command:

$ host -v -t A courses.matfyz.sk IP_of_DNS_recursor

The important part of its output is:

;; ANSWER SECTION:
courses.matfyz.sk.      50016   IN      A       158.195.89.73

The number in the 2nd column (50016 in our example) is TTL (time to live) – the number of seconds the DNS recursor will keep the answer in its cache.
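A line of the ANSWER SECTION has five whitespace-separated columns: name, TTL, class, record type, and record data. As a small illustration, the following sketch splits such a line into named fields; `Record` and `parse_record` are names we made up for this example.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str     # fully qualified domain name, ends with "."
    ttl: int      # time to live in seconds
    rclass: str   # record class, IN for Internet
    rtype: str    # record type, e.g. A or NS
    data: str     # e.g. an IPv4 address for an A record


def parse_record(line):
    """Parse one line of the form 'name TTL class type data'."""
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return Record(name, int(ttl), rclass, rtype, data.strip())


if __name__ == "__main__":
    line = "courses.matfyz.sk.      50016   IN      A       158.195.89.73"
    print(parse_record(line))
```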

  1. Use the command host to find out the IP addresses of blog.matfyz.sk, mercury.rt.sk, google.com, www.sme.sk, sme.sk.
  2. How many domain names can be assigned to a single IP address?
    How many IP addresses can be assigned to a single domain name?
    Put another way: Is the domain name–IP address relation 1:1, 1:n, m:1, or m:n?
  3. (1 point) Describe a practical situation in which it is important to know the TTL of your domain's DNS records.

DNS Recursion Step by Step

DNS is a hierarchical distributed database. A DNS recursor answers its clients' queries by querying a hierarchy of authoritative DNS servers.

An authoritative DNS server is responsible for the database of DNS records for a certain domain and its subdomains. For instance, the DNS server at IP 158.195.16.240 is authoritative for the domain fmph.uniba.sk. It can definitively answer all questions regarding the names in this domain (e.g., www.fmph.uniba.sk). For some subdomains, the authoritative server does not know the answer, but can point the recursor to authoritative servers for that subdomain (e.g., 158.195.16.240 cannot answer questions about the name www.dcs.fmph.uniba.sk, but will point the recursor to 158.195.18.163, the authoritative server for the domain dcs.fmph.uniba.sk).

Querying starts with the root DNS domain “.”. Each recursor has a hints file with the list of IP addresses of authoritative servers for the root domain. You can also obtain a list of root DNS servers from your recursor using the command:

$ host -v -t NS . IP_of_DNS_recursor

The output has two important parts: The ANSWER SECTION contains NS records with domain names of authoritative DNS servers for the domain, e.g.:

.			177422	IN	NS	j.root-servers.net.

The ADDITIONAL SECTION contains A records with IP addresses of authoritative DNS servers, e.g.:

j.root-servers.net.	350291	IN	A	192.58.128.30

We can now start asking the authoritative servers for the IP address (the A record) of fmph.uniba.sk:

  1. $ host -v -t A fmph.uniba.sk 192.58.128.30 # j.root-servers.net.

    The output has no ANSWER SECTION because the root server cannot answer the question directly. It refers us to servers that know more about fmph.uniba.sk – the authoritative servers for the .sk domain. They appear as NS records in the AUTHORITY SECTION of the output:

    sk.			172800	IN	NS	ns.sk-nic.sk.
    sk.			172800	IN	NS	ns.uu.net.
    

    The ADDITIONAL SECTION contains their IP addresses:

    ns.uu.net.		172800	IN	A	137.39.1.3
    ns.sk-nic.sk.		172800	IN	A	195.12.159.2
    
  2. We try again, this time with one of the authoritative servers for the domain sk.:

    $ host -v -t A fmph.uniba.sk 195.12.159.2 # ns.sk-nic.sk.

    Still no direct answer, but we obtain authoritative servers for the uniba.sk. domain:

    ;; AUTHORITY SECTION:
    uniba.sk.		86400	IN	NS	dns1.uniba.sk.
    
    ;; ADDITIONAL SECTION:
    dns1.uniba.sk.		86400	IN	A	158.195.4.3
    
  3. We try again with one of the authoritative servers for the domain uniba.sk.:

    $ host -v -t A fmph.uniba.sk 158.195.4.3 # dns1.uniba.sk.

    and we finally get our answer:

    ;; ANSWER SECTION:
    fmph.uniba.sk.		86400	IN	A	158.195.4.134
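The three steps above follow one loop: ask a server, and if it returns a referral instead of an answer, ask the server it refers to. The sketch below replays that loop against a canned table copied from the outputs above. It does not send any DNS queries, and the table format and the name `resolve_iteratively` are assumptions made for this illustration.

```python
# Canned replies taken from the walkthrough above; keys are server IP addresses.
# A reply is either ("answer", ip) or ("referral", zone, ip_of_next_server).
RESPONSES = {
    "192.58.128.30": ("referral", "sk.", "195.12.159.2"),      # j.root-servers.net.
    "195.12.159.2": ("referral", "uniba.sk.", "158.195.4.3"),  # ns.sk-nic.sk.
    "158.195.4.3": ("answer", "158.195.4.134"),                # dns1.uniba.sk.
}


def resolve_iteratively(name, root_ip, responses=RESPONSES, max_steps=10):
    """Follow referrals from a root server until a server gives an answer.

    Returns (address, path), where path lists the queried servers in order.
    """
    server, path = root_ip, []
    for _ in range(max_steps):
        path.append(server)
        reply = responses[server]
        if reply[0] == "answer":
            return reply[1], path
        _, zone, server = reply
        # A referral must point to a zone that contains the queried name.
        if not (name == zone or name.endswith("." + zone)):
            raise ValueError(f"referral to {zone} does not cover {name}")
    raise RuntimeError("too many referrals")


if __name__ == "__main__":
    address, path = resolve_iteratively("fmph.uniba.sk.", "192.58.128.30")
    print(address, "via", " -> ".join(path))
```

A real recursor also caches every reply for its TTL, tries other servers when one does not respond, and handles referrals that come without addresses in the ADDITIONAL SECTION.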

Perform step-by-step recursive DNS querying to obtain the IP address of agents.fel.cvut.cz.

Talking to Web Servers: Hypertext Transfer Protocol

After the web browser obtains the IP address for the host name in the authority part of a URL (recall the URL structure), it opens a connection to the port at the IP address, and asks the server to provide the resource specified by the URL path and query.

The communication between the client (browser) and the server must follow the protocol given by the URL scheme. Web resources are obtained using the Hypertext Transfer Protocol (HTTP) [RFC2616].

Using Netcat (nc) as a Web Client

We will use the Unix tool nc (netcat) to connect to a host with an HTTP server. We want to obtain the resource at the URL http://webdesign.courses.matfyz.sk/demo/static.xhtml.

  1. We open a connection to the HTTP server at webdesign.courses.matfyz.sk, port 80 (the default HTTP port) by typing:

    $ nc webdesign.courses.matfyz.sk 80

    nc will resolve the domain name to an IP address, and establish a connection.

  2. HTTP client and server exchange messages. The client sends requests to the server, and the server replies with responses. For each request, there is one response. We will type in (or paste) the requests and view the responses in a terminal.

    To obtain our resource, we send the following request:

    GET /demo/static.xhtml HTTP/1.0    The initial line: METHOD /path?query HTTP/version
                                        marks the end of the request header

    The client requests access to a resource through a method. The most-used method is GET, through which the client asks the server to send the content of the resource.

  3. The server sends a response, e.g.:

    HTTP/1.1 200 OK                          Status line: HTTP/version status-code status-message
    Date: Tue, 24 Sep 2013 11:36:49 GMT      Header fields: metadata, Name: value
    …
    Content-Type: application/xhtml+xml
                                              separates the response header and body
    <?xml version="1.0" encoding="utf-8"?>   Body: the content of the requested resource
    <!DOCTYPE html                                 or an informational message to the user
         PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
    

    The status line reports, through a numerical status code and a brief status message, whether the request was processed successfully or an error occurred.

    The header fields specify metadata – information about the requested resource or about the HTTP server.
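The structure of a response (status line, header fields, empty line, body) can be made explicit with a few lines of code. The following is a simplified sketch, not a complete HTTP parser: it assumes the whole response is already in memory, that the status line contains a status message, and that no header field is repeated or folded across lines. The name `parse_response` is hypothetical.

```python
def parse_response(raw):
    """Split raw HTTP response bytes into status line parts, header fields, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")      # the empty line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, message = lines[0].split(" ", 2)  # HTTP/version status-code status-message
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")         # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return version, int(code), message, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Date: Tue, 24 Sep 2013 11:36:49 GMT\r\n"
        b"Content-Type: application/xhtml+xml\r\n"
        b"\r\n"
        b'<?xml version="1.0" encoding="utf-8"?>\n'
    )
    version, code, message, headers, body = parse_response(raw)
    print(code, message, headers["content-type"], len(body))
```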

  1. As shown above, use nc now to request the resource at http://www.fmph.uniba.sk/index.php?id=326.
  2. Study the header fields in the server's response. What do they mean? Consult RFC2616, section 14, if unsure.

Virtual Hosts and the Host Header Field

An HTTP server at a single IP address can serve several web sites with different domain names. Such web sites are called virtual hosts. However,

  • only IP addresses are used to create a client-server connection, not domain names, and
  • the initial line of an HTTP request contains only the path and query parts of a URL.

The server therefore cannot tell from the connection or the initial line which web site the client wants. The HTTP protocol specifies a workaround: the client sends the host name in the header field Host.

For example, the resource at http://blog.matfyz.sk/ is requested from the server at IP 158.195.89.73 as follows:

$ nc 158.195.89.73 80
GET / HTTP/1.1            The initial line: METHOD /path?query HTTP/version
Host: blog.matfyz.sk      The Host header field: domain-name:port, port is optional
                          

The resource at http://webdesign.courses.matfyz.sk/ at the same server (IP 158.195.89.73) is requested as follows:

$ nc 158.195.89.73 80
GET / HTTP/1.1
Host: webdesign.courses.matfyz.sk

The Host header field is mandatory in HTTP/1.1 (it was not in HTTP/1.0).
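The requests typed into nc above are plain text with CRLF line endings, so they are easy to assemble in code. Below is a minimal sketch of a helper that builds a body-less request, with the Host field left optional so that HTTP/1.0-style requests can be produced too. The name `build_request` and its parameters are our own.

```python
def build_request(method, path, host=None, version="1.1"):
    """Return the bytes of an HTTP request without a body."""
    lines = [f"{method} {path} HTTP/{version}"]   # the initial line
    if host is not None:
        lines.append(f"Host: {host}")             # selects the virtual host
    # Every line ends with CRLF; the final empty line ends the request header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_request("GET", "/", host="blog.matfyz.sk"))
    print(build_request("GET", "/", host="webdesign.courses.matfyz.sk"))
```

The two requests differ only in the Host line, which is all the server at a shared IP address has to go on. The resulting bytes could be sent over a TCP connection opened, for instance, with `socket.create_connection`.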

  1. Compare the responses of the HTTP server at IP 158.195.89.73 for host names matfyz.sk, blog.matfyz.sk, and mercury.rt.sk.
  2. Try sending an HTTP/1.1 request without a Host header field.

The HEAD Method

If a client requests a resource with the HEAD method, the server responds with only the metadata (header) of the resource, but not its content (body).

  1. Repeat the previous exercise with the HEAD method instead of GET.
  2. How is the HEAD method used to reduce the traffic between the client and the server?

The Statelessness of HTTP and Cookies

Notice the Set-Cookie header field in server responses from the previous exercise.

HTTP is specified as a stateless protocol: the server is not required to remember anything about a client between requests, so each request is processed independently of the previous requests from the same client (or others).

Many web applications need to maintain a session with a client – a longer term dialogue with an evolving state.

The server sends a cookie to the client via the Set-Cookie header field in the response. The client is then expected to send the cookie (until it expires) back to the server with each subsequent request in the Cookie header field.
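The round trip can be sketched with Python's standard `http.cookies` module: the client parses the value of a Set-Cookie field and produces the Cookie field for its next request. This is a simplified illustration. The helper name `cookie_header_from` and the cookie name and value in the example are made up, and the sketch ignores expiry, domain, and path matching, which a real client must check.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value):
    """Turn the value of a Set-Cookie field into the matching Cookie field line."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)      # parses name=value and attributes such as Path
    pairs = [f"{name}={morsel.value}" for name, morsel in jar.items()]
    return "Cookie: " + "; ".join(pairs)


if __name__ == "__main__":
    # Hypothetical value of a Set-Cookie field received from a server.
    print(cookie_header_from("session=abc123; Path=/demo"))
```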

  1. Explain how cookies are used to create and maintain a session.
  2. Use nc to obtain a cookie from the resource at http://webdesign.courses.matfyz.sk/demo/cookie.php, and then send it back to the same resource.
  3. What are third-party cookies and how are they used?

Cookies, alternative approaches to sessions, privacy, tracking, and authentication without cookies are all good blogging topics.

Sending HTML Forms via HTTP. The POST Method

An HTML form can be sent to a server using the GET method; the form data is then encoded in the query part of the URL.

$ nc www.google.com 80
GET /search?q=modern+approaches+to+web+design HTTP/1.1
Host: www.google.com

Forms can also be sent using the method POST. However, not all forms should be sent by POST.

$ nc www.google.com 80
POST /search HTTP/1.1
Host: www.google.com
Content-Length: 33
Content-Type: application/x-www-form-urlencoded

q=modern+approaches+to+web+design

  1. Try the above two requests.
  2. Why does the second request fail? What is the (conceptual) difference between GET and POST? In which cases should you use GET to send a form? In which cases is POST the appropriate method? (Hint: [RFC2616, sections 9.3 and 9.5])
  3. The page at http://webdesign.courses.matfyz.sk/demo/post.php expects a POST request with the parameters user and password but lacks a login form.
    Create the POST request by hand and send it via nc (any username works and the password is heslo).
  4. (1 point) Create a request that correctly submits the username "50+&%50" (without the quotes). All characters of the username must appear in the output.
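In the POST request shown earlier, the body uses the same encoding as a GET query string, and Content-Length must equal the number of bytes in the body. The sketch below builds such a request with the standard `urllib.parse.urlencode`; the helper name `build_form_post` is an assumption of this example, not something the lab page prescribes.

```python
from urllib.parse import urlencode


def build_form_post(path, host, fields):
    """Return the bytes of a POST request carrying URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")      # e.g. spaces become "+"
    header = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"        # length of the body in bytes
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "\r\n"                                    # empty line ends the header
    ).encode("ascii")
    return header + body


if __name__ == "__main__":
    request = build_form_post(
        "/search", "www.google.com", {"q": "modern approaches to web design"}
    )
    print(request.decode("ascii"))
```

For the search form above this yields the body q=modern+approaches+to+web+design and a Content-Length of 33, matching the hand-written request.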

Proxies. The TRACE and CONNECT Methods

HTTP proxies are intermediaries between clients and servers. They forward requests from clients to servers, and relay back the responses. They can provide caching, anonymization, malware-protection, and firewalling services to clients, and caching, firewalling, and load-balancing services to servers.

Requests sent to a client-side proxy contain full URLs:

$ nc proxy.uniba.sk 3128
HEAD http://webdesign.courses.matfyz.sk HTTP/1.1

The TRACE method asks the server to echo back the request sent by the client. It can be used for debugging, or to find out the information sent by a proxy to the server.
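The only difference between a direct request and a request sent through a client-side proxy is the target in the initial line: path and query in the first case, the full URL in the second. A minimal sketch, with the hypothetical helper name `request_head`, follows. Adding a Host field to the proxied request as well is this sketch's choice; the hand-typed proxy examples in this section omit it.

```python
from urllib.parse import urlsplit


def request_head(method, url, via_proxy=False):
    """Build the request header for a URL, either direct or through a proxy."""
    parts = urlsplit(url)
    path = parts.path or "/"                 # an empty path is requested as "/"
    if parts.query:
        path += "?" + parts.query
    if via_proxy:
        # A proxy needs the full URL to know which server to contact.
        target = f"{parts.scheme}://{parts.netloc}{path}"
    else:
        target = path
    return f"{method} {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


if __name__ == "__main__":
    print(request_head("TRACE", "http://dai.fmph.uniba.sk/"))
    print(request_head("TRACE", "http://dai.fmph.uniba.sk/", via_proxy=True))
```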

  1. Compare the results of the direct and proxied TRACE requests:
    $ nc dai.fmph.uniba.sk 80
    TRACE / HTTP/1.1
    Host: dai.fmph.uniba.sk
    
    $ nc proxy.uniba.sk 3128
    TRACE http://dai.fmph.uniba.sk/ HTTP/1.1
    
  2. (1 point) Find the correct server response headers that will ensure that your page will never get cached anywhere (on the proxy, in the browser…) even if the cache understands only HTTP/1.0.
  3. How can the CONNECT method be used? Hint: RFC2616.

Proxies and caching are good blogging topics.

REST. The Methods PUT and DELETE

The method PUT is intended to upload a resource to a server, and DELETE to delete a resource from the server. They are used in so-called RESTful web applications to provide a consistent, predictable API to query and manipulate database objects (along with GET for querying and POST for updates).
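To illustrate how the methods map onto operations on stored objects, here is a toy sketch of a server-side dispatcher working on an in-memory dictionary. It is not a real HTTP server. The function name `handle`, the example path, and the exact choice of status codes (201 for a newly created resource, 204 for a successful update or deletion) are assumptions of this example.

```python
def handle(store, method, path, body=None):
    """Apply one request to an in-memory resource store; return (status, body)."""
    if method == "GET":
        return (200, store[path]) if path in store else (404, None)
    if method == "PUT":
        created = path not in store
        store[path] = body                   # create or replace the resource
        return (201 if created else 204), None
    if method == "DELETE":
        if path in store:
            del store[path]
            return 204, None
        return 404, None
    return 405, None                         # method not supported by this sketch


if __name__ == "__main__":
    store = {}
    print(handle(store, "PUT", "/notes/1", "hello"))
    print(handle(store, "GET", "/notes/1"))
    print(handle(store, "DELETE", "/notes/1"))
    print(handle(store, "GET", "/notes/1"))
```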

The WebDAV protocol extends HTTP with further update methods and versioning of resources. It is used, e.g., by the Subversion source management system, or calendaring applications.

PUT and DELETE methods, REST, WebDAV, and CalDAV are good blogging topics.

OPTIONS

See RFC2616, §9.2.