Labs 1: DNS, HTTP, netcat as a web browser
What Must a Web Browser Do to Obtain a Resource
URL structure [RFC3986, §3]
           host name  port
          /^^^^^^^^^\ /^^\
    foo://example.com:8042/over/there?name=ferret#nose
    \_/   \______________/\_________/ \_________/ \__/
  scheme     authority       path        query  fragment
          \_________________________/
                   hier-part
Given a URL, a web browser does the following:
- Obtains the IP address for the host (server) name (e.g., example.com) in the authority.
- Opens a connection to the host on the port number (e.g., 8042) specified in the authority.
- Using the protocol specified by the scheme, asks the host to provide the resource specified by the path and query.
- Locates the fragment in the resource obtained from the host.
- Is the fragment part of a URL processed on the client side or the server side?
- Explain the difference between URL, URI, and URN. (Hint: RFC3986)
- (1 point) Present in your browser examples of URIs with at least 5 different, commonly used schemes. Which of them are URLs and which URNs?
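The URL components named above can also be inspected programmatically. The following is a minimal sketch, assuming Python's standard `urllib.parse` module (which is not part of the lab's own toolchain), applied to the example URL from RFC3986:

```python
from urllib.parse import urlsplit

# The example URL from RFC 3986, section 3.
url = "foo://example.com:8042/over/there?name=ferret#nose"
parts = urlsplit(url)

print("scheme:   ", parts.scheme)    # protocol the client should speak
print("authority:", parts.netloc)    # host name and optional port
print("host name:", parts.hostname)
print("port:     ", parts.port)
print("path:     ", parts.path)
print("query:    ", parts.query)
print("fragment: ", parts.fragment)
```

The host name and port are both derived from the authority, which is why `urlsplit` exposes them as extra attributes rather than as separate top-level components.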
We will now closely examine how IP addresses are obtained from host names, and how the HTTP protocol works.
Finding Web Servers: Domain Name System
The main purpose of the Domain Name System (DNS, defined by many RFCs)
is to translate hierarchical symbolic host names
to IP addresses and vice versa.
DNS contains records of several types
We are now interested in
Fully qualified (absolute) domain names (FQDNs) end with “.” (a dot).
Domain names that do not end with “.”
can be interpreted as relative to search domains.
Examining DNS Records Using the Linux Command host
We will use the Linux command
host to query the DNS:
$ host -v -t record_type domain_name DNS_server
$ cat /etc/resolv.conf
search nw.fmph.uniba.sk
nameserver IP_of_DNS_recursor

C:\>ipconfig /all
   DNS Servers . . . . . . . . . . . : IP_of_DNS_recursor
To obtain the IP address of courses.matfyz.sk,
we will use the command:
$ host -v -t A courses.matfyz.sk IP_of_DNS_recursor
The important part of its output is:
;; ANSWER SECTION:
courses.matfyz.sk. 50016 IN A 126.96.36.199
The number in the 2nd column (50016)
is the TTL (time to live): the number of seconds the DNS recursor
will keep the answer in its cache.
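Each line of an answer section has the same five columns: name, TTL, class, type, and data. As a rough sketch (the `ResourceRecord` type and the `parse_record` helper are our own illustrative names, not part of any DNS library), such a line can be split into fields like this:

```python
from typing import NamedTuple

class ResourceRecord(NamedTuple):
    name: str      # owner name, fully qualified
    ttl: int       # seconds the answer may be cached
    rclass: str    # record class, IN for Internet
    rtype: str     # record type, e.g. A or NS
    data: str      # record data, e.g. an IP address

def parse_record(line: str) -> ResourceRecord:
    """Split one line of an ANSWER SECTION into its five fields."""
    name, ttl, rclass, rtype, data = line.split(maxsplit=4)
    return ResourceRecord(name, int(ttl), rclass, rtype, data)

record = parse_record("courses.matfyz.sk. 50016 IN A 126.96.36.199")
print(f"{record.name} may be cached for {record.ttl} s "
      f"(about {record.ttl / 3600:.1f} h)")
```

This only handles the simple one-line records printed by `host -v`; it makes no attempt to cover every record type.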
- Use the command host to find out the IP addresses of
- How many domain names can be assigned to a single IP address?
How many IP addresses can be assigned to a single domain name?
Put another way: Is the domain name–IP address relation 1:1, 1:n, m:1, or m:n?
- (1 point) Describe a practical situation in which it is important to know the TTL of your domain's DNS records.
DNS Recursion Step by Step
DNS is a hierarchical distributed database. A DNS recursor answers queries to its clients by querying a hierarchy of authoritative DNS servers.
An authoritative DNS server is responsible
for the database of DNS records for a certain domain
and its subdomains.
For instance, the DNS server at IP
is authoritative for the domain
It can definitively answer all questions regarding the names in this
For some subdomains, the authoritative server
does not know the answer,
but can point the recursor to authoritative servers
for that subdomain (e.g.,
cannot answer questions about the name
but will point the recursor to
the authoritative server
for the domain
Querying starts with the root DNS domain “.”.
Each recursor has a hints file
with the list of IP addresses of authoritative servers for the
root domain. You can also obtain a list of root DNS servers
from your recursor using the command:
$ host -v -t NS . IP_of_DNS_recursor
The output has two important parts:
ANSWER SECTION contains NS records
with domain names of authoritative DNS servers for the domain, e.g.:
. 177422 IN NS j.root-servers.net.
ADDITIONAL SECTION contains A
records with IP addresses of authoritative DNS servers, e.g.:
j.root-servers.net. 350291 IN A 188.8.131.52
We can now start asking the authoritative servers for the IP address of fmph.uniba.sk:
$ host -v -t A fmph.uniba.sk 184.108.40.206 # j.root-servers.net.
The output has no ANSWER SECTION because the root server cannot answer the question directly. It refers us to servers that know more about fmph.uniba.sk: the authoritative servers for the .sk domain. They appear as NS records in the AUTHORITY SECTION of the output:
sk. 172800 IN NS ns.sk-nic.sk.
sk. 172800 IN NS ns.uu.net.
ADDITIONAL SECTION contains their IP addresses:
ns.uu.net. 172800 IN A 220.127.116.11
ns.sk-nic.sk. 172800 IN A 18.104.22.168
We try again, this time with one of the authoritative servers for the domain .sk:
$ host -v -t A fmph.uniba.sk 22.214.171.124 # ns.sk-nic.sk.
Still no direct answer, but we obtain authoritative servers for the uniba.sk domain:
;; AUTHORITY SECTION:
uniba.sk. 86400 IN NS dns1.uniba.sk.
;; ADDITIONAL SECTION:
dns1.uniba.sk. 86400 IN A 126.96.36.199
We try again with one of the authoritative servers for the domain uniba.sk:
$ host -v -t A fmph.uniba.sk 188.8.131.52 # dns1.uniba.sk.
and we finally get our answer:
;; ANSWER SECTION:
fmph.uniba.sk. 86400 IN A 184.108.40.206
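The chain of referrals we have just followed by hand can be modelled as a small loop. The sketch below is a toy simulation, not real DNS traffic: the `SERVERS` table, the `query` and `resolve` helpers, and the address 192.0.2.10 (a placeholder from the range reserved for documentation) are all assumptions made for illustration.

```python
# Toy model of the referral chain; NOT real DNS traffic.
# Each server maps a zone either to a referral (the next server to ask)
# or to a final answer. 192.0.2.10 is a placeholder address, not the
# real address of fmph.uniba.sk.
SERVERS = {
    "j.root-servers.net.": {"sk.": ("referral", "ns.sk-nic.sk.")},
    "ns.sk-nic.sk.": {"uniba.sk.": ("referral", "dns1.uniba.sk.")},
    "dns1.uniba.sk.": {"fmph.uniba.sk.": ("answer", "192.0.2.10")},
}

def query(server: str, name: str) -> tuple[str, str]:
    """Return the entry of `server` whose zone is the longest suffix of `name`."""
    zones = SERVERS[server]
    matches = [z for z in zones if name == z or name.endswith("." + z)]
    if not matches:
        raise LookupError(f"{server} knows nothing about {name}")
    return zones[max(matches, key=len)]

def resolve(name: str, server: str = "j.root-servers.net.") -> tuple[str, list[str]]:
    """Follow referrals from the root until some server gives an answer."""
    asked = []
    while True:
        asked.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, asked
        server = value  # referral: ask the next authoritative server

address, path = resolve("fmph.uniba.sk.")
print(" -> ".join(path), "=>", address)
```

A real recursor does the same thing with network queries, and additionally caches each answer and referral for the duration of its TTL.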
Perform step-by-step recursive DNS querying
to obtain the IP address of
Talking to Web Servers: Hypertext Transfer Protocol
After the web browser obtains the IP address for the host name in the authority part of a URL (recall the URL structure), it opens a connection to the port at that IP address and asks the server to provide the resource specified by the URL path.
The communication between the client (browser) and the server must follow the protocol given by the URL scheme. Web resources are obtained using the Hypertext Transfer Protocol (HTTP) [RFC2616].
Using Netcat (nc) as a Web Client
We will use the Unix tool nc (netcat) to connect
to a host with an HTTP server.
We want to obtain the resource at the URL http://webdesign.courses.matfyz.sk/demo/static.xhtml.
We open a connection to the HTTP server at
webdesign.courses.matfyz.sk, port 80 (the default HTTP port) by typing:
$ nc webdesign.courses.matfyz.sk 80
nc will resolve the domain name to an IP address, and establish a connection.
HTTP client and server exchange messages. The client sends requests to the server, and the server replies with responses. For each request, there is one response. We will type in (or paste) the requests and view the responses in a terminal.
To obtain our resource, we send the following request:
GET /demo/static.xhtml HTTP/1.0      The initial line: METHOD /path?query HTTP/version
                                     An empty line marks the end of the request header
The client requests access to a resource through a method. The most-used method is GET, through which the client asks the server to send the content of the resource.
The server sends a response, e.g.:
HTTP/1.1 200 OK                                    Status line: HTTP/version status-code status-message
Date: Tue, 24 Sep 2013 11:36:49 GMT                Header fields: metadata, Name: value
Content-Type: application/xhtml+xml
                                                   An empty line separates the response header and body
<?xml version="1.0" encoding="utf-8"?>             Body: the content of the requested resource
<!DOCTYPE html                                     or an informational message to the user
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
The status line describes whether the request was processed correctly or there was an error through a numerical status code and a brief status message.
The header fields specify metadata – information about the requested resource or about the HTTP server.
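To make the structure of a response concrete, here is a minimal sketch of how a client might split it into status line, header fields, and body. The `parse_response` helper is our own illustrative name, and the canned response is modelled on the lab's example with the body shortened; a real client must also handle repeated header fields, which this simple dictionary would overwrite.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status code, reason, header fields, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the empty line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so we store them in lower case.
        # Note: a repeated field would overwrite the earlier one here.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

# Canned response modelled on the lab's example (body shortened).
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Date: Tue, 24 Sep 2013 11:36:49 GMT\r\n"
       b"Content-Type: application/xhtml+xml\r\n"
       b"\r\n"
       b'<?xml version="1.0" encoding="utf-8"?>\n')

status, reason, headers, body = parse_response(raw)
print(status, reason, headers["content-type"])
```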
As shown above, use nc now to request the resource at
- Study the header fields in the server's response. What do they mean? Consult RFC2616, section 14, if unsure.
Virtual Hosts and the Host Header Field
An HTTP server at a single IP address can serve several web sites with different domain names. Such web sites are called virtual hosts. However,
- only IP addresses are used to create a client-server connection, not domain names, and
- the initial line of an HTTP request contains only the path and query parts of a URL.
There is a workaround specified in the HTTP protocol:
The client sends a request with the header field Host.
For example, the resource at
is requested from the server at IP
$ nc 220.127.116.11 80
GET / HTTP/1.1              The initial line: METHOD /path?query HTTP/version
Host: blog.matfyz.sk        The Host header field: domain-name:port, port is optional
The resource at
at the same server (IP
is requested as follows:
$ nc 18.104.22.168 80
GET / HTTP/1.1
Host: webdesign.courses.matfyz.sk
The Host header field is mandatory in HTTP/1.1
(it was not in HTTP/1.0).
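The same exchange can be scripted. The sketch below builds an HTTP/1.1 request by hand and, like nc, connects by IP address while naming the site only in the Host field. The helper names `build_request` and `fetch` are our own, and the `Connection: close` field is an addition not used in the lab's transcripts; it asks the server to close the connection after replying so that the read loop terminates.

```python
import socket

def build_request(method: str, path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 request; every line ends with CRLF."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",          # selects the virtual host on the server
        "Connection: close",      # assumption: ask the server to close after replying
        "",                       # empty line: end of the request header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(ip: str, host: str, path: str = "/", port: int = 80) -> bytes:
    """Connect by IP address, as nc does, and name the site only in Host."""
    with socket.create_connection((ip, port), timeout=10) as conn:
        conn.sendall(build_request("GET", path, host))
        chunks = []
        while chunk := conn.recv(4096):   # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)

if __name__ == "__main__":
    # No network access is needed for this part: just show the request text.
    print(build_request("GET", "/", "blog.matfyz.sk").decode("ascii"))
```

Calling `fetch` twice with the same IP address but different `host` values is one way to observe that a single server can return different sites.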
- Compare the responses of the HTTP server at IP 22.214.171.124 for host names
- Try sending an HTTP/1.1 request without a Host header field.
The HEAD Method
If a client requests a resource with the HEAD method, the server responds with only the metadata (header) of the resource, but not its content (body).
- Repeat the previous exercise with the HEAD method instead of GET.
- How is the HEAD method used to reduce the traffic between the client and the server?
The Statelessness of HTTP and Cookies
Set-Cookie header field in server responses
from the previous exercise.
HTTP is specified as a stateless protocol: the client and the server do not maintain a persistent connection, and each request is processed independently of the previous requests from the same client (or others).
Many web applications need to maintain a session with a client – a longer term dialogue with an evolving state.
The server sends a cookie to the client via the Set-Cookie header field in the response.
The client is then expected to send the cookie (until it expires)
back to the server with each subsequent request
via the Cookie header field.
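The round trip from Set-Cookie to Cookie can be sketched with Python's standard `http.cookies` module. The `cookie_header` helper and the cookie value `session=abc123` are hypothetical; the lab's cookie.php may send something different. The sketch also ignores expiry and domain or path matching, which a real browser must check before sending a cookie back.

```python
from http.cookies import SimpleCookie

def cookie_header(set_cookie_values: list[str]) -> str:
    """Turn the values of Set-Cookie response fields into one Cookie request field."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)   # attributes such as Path or Max-Age stay with the client
    pairs = [f"{name}={morsel.value}" for name, morsel in jar.items()]
    return "Cookie: " + "; ".join(pairs)

# Hypothetical Set-Cookie value; the real server may send something different.
print(cookie_header(["session=abc123; Path=/; Max-Age=3600"]))
```

Only the name and value travel back to the server; the attributes tell the client when and where the cookie applies.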
- Explain how cookies are used to create and maintain a session.
- Use nc to obtain a cookie from the resource at http://webdesign.courses.matfyz.sk/demo/cookie.php, and then send it back to the same resource.
- What are third-party cookies and how are they used?
Cookies, alternative approaches to sessions, privacy, tracking, authentication without cookies are all good blogging topics.
Sending HTML Forms via HTTP. The POST Method
An HTML form can be sent to a server using the GET method, with the form data placed in the query part of the URL.
$ nc www.google.com 80
GET /search?q=modern+approaches+to+web+design HTTP/1.1
Host: www.google.com
Forms can also be sent using the method POST. However, not all forms should be sent by POST.
$ nc www.google.com 80
POST /search HTTP/1.1
Host: www.google.com
Content-Length: 33
Content-Type: application/x-www-form-urlencoded

q=modern+approaches+to+web+design
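Counting the bytes of the body by hand for Content-Length is error-prone. The following sketch, assuming Python's standard `urllib.parse.urlencode` and a helper name `build_post` of our own, encodes the form fields and computes the length automatically:

```python
from urllib.parse import urlencode

def build_post(path: str, host: str, fields: dict[str, str]) -> bytes:
    """Build a POST request with a form-urlencoded body."""
    body = urlencode(fields).encode("ascii")   # spaces become "+"
    head = (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Content-Length: {len(body)}\r\n"   # length of the body in bytes
            "Content-Type: application/x-www-form-urlencoded\r\n"
            "\r\n").encode("ascii")
    return head + body

request = build_post("/search", "www.google.com",
                     {"q": "modern approaches to web design"})
print(request.decode("ascii"))
```

The body has 33 bytes, which matches the Content-Length in the transcript.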
- Try the above two requests.
- Why does the second request fail? What is the (conceptual) difference between GET and POST? In which cases should you use GET to send a form? In which cases is POST the appropriate method? (Hint: [RFC2616, sections 9.3 and 9.5])
- The page at http://webdesign.courses.matfyz.sk/demo/post.php expects a POST request with the parameters
password but lacks a login form.
Create the POST request by hand and send it via nc (any username works and the password is
- (1 point) Create a request that correctly submits
50+&%50 (without the quotes). All username characters must show in the output.
Proxies. The TRACE and CONNECT Methods
HTTP proxies are intermediaries between clients and servers. They forward requests from clients to servers, and relay back the responses. They can provide caching, anonymization, malware-protection, firewalling services to clients, and caching, firewalling, load-balancing services to servers.
Requests sent to a client-side proxy contain full URLs:
$ nc proxy.uniba.sk 3128
HEAD http://webdesign.courses.matfyz.sk HTTP/1.1
The TRACE method asks the server to echo back the request sent by the client. It can be used for debugging, or to find out what information a proxy sends to the server.
- Compare the results of the direct and proxied TRACE requests:
$ nc dai.fmph.uniba.sk 80
TRACE / HTTP/1.1
Host: dai.fmph.uniba.sk

$ nc proxy.uniba.sk 3128
TRACE http://dai.fmph.uniba.sk/ HTTP/1.1
- (1 point) Find the correct server response headers that will ensure that your page will never get cached anywhere (on the proxy, in the browser…) even if the cache understands only HTTP/1.0.
- How can the CONNECT method be used? Hint: RFC2616.
Proxies and caching are good blogging topics.
REST. The Methods PUT and DELETE
The method PUT is intended to upload a resource to a server, and DELETE to delete a resource from the server. They are used in so-called RESTful web applications to provide a consistent, predictable API to query and manipulate database objects (along with GET for querying and POST for updates).
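How these methods map onto operations on stored objects can be illustrated without any network. The sketch below is a toy model only: the `handle` function, the in-memory `store`, and the `/notes/1` path are hypothetical, and the choice of status codes (201, 204, 404, 405) is one reasonable reading of RFC2616, not the only one.

```python
def handle(store: dict[str, str], method: str, path: str, body: str = "") -> tuple[int, str]:
    """Apply one request to an in-memory store and return (status code, body)."""
    if method == "GET":
        return (200, store[path]) if path in store else (404, "")
    if method == "PUT":
        created = path not in store
        store[path] = body                 # create or replace the resource at this path
        return (201 if created else 204), ""
    if method == "DELETE":
        if path in store:
            del store[path]
            return 204, ""
        return 404, ""
    return 405, ""                         # method not allowed in this toy model

store: dict[str, str] = {}
for request in [("PUT", "/notes/1", "hello"), ("GET", "/notes/1"),
                ("DELETE", "/notes/1"), ("GET", "/notes/1")]:
    print(request[0], request[1], "->", handle(store, *request))
```

Repeating the same PUT leaves the store in the same state, which is the sense in which PUT is idempotent.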
PUT and DELETE methods, REST, WebDAV, CalDAV are good blogging topics.
See RFC2616, §9.2.