I have been looking over the HTTP/1.1 specification (http://www.w3.org/Protocols/rfc2616/rfc2616.html) to get a sense of what parameters we should measure. The first thing I have observed is that the data the server sends back depends on how we make the request. Here are some examples from the Rice web server:
With no protocol version specified (the request is treated as an HTTP/0.9-style simple request), no headers are returned:
tbook@Athanasius:~$ telnet www.rice.edu 80
Trying 128.42.204.11...
Connected to www.netfu.rice.edu.
Escape character is '^]'.
GET /robots.txt
# Robot-exclusion file for chico.
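To reproduce this outside of telnet, we can send the same simple request over a raw socket. A minimal sketch in Python (the host and path are just the ones used above):

import socket

# Open a plain TCP connection to the server (port 80, no TLS).
sock = socket.create_connection(("www.rice.edu", 80))
# An HTTP/0.9-style simple request: no version, no headers.
sock.sendall(b"GET /robots.txt\r\n")
# Read until the server closes the connection; with no version given,
# the server sends back only the body, no status line or headers.
response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    response += chunk
sock.close()
print(response.decode("iso-8859-1"))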
With HTTP/1.0, we get a variety of headers:
GET /robots.txt HTTP/1.0
HTTP/1.1 200 OK
Date: Wed, 10 Oct 2012 19:30:51 GMT
Server: Apache/2.2.12 (Unix)
Last-Modified: Thu, 27 May 2004 16:39:15 GMT
ETag: "2aa0f0-73f-3db6aa1a392c0"
Accept-Ranges: bytes
Content-Length: 1855
Vary: Accept-Encoding
X-Forwarded-Server: WWW1
Keep-Alive: timeout=5, max=98
Connection: Keep-Alive
Content-Type: text/plain
# Robot-exclusion file for chico.
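Since these headers are the parameters we will presumably want to record, here is a rough sketch of capturing and parsing them programmatically (same host and path as above; purely illustrative):

import socket

sock = socket.create_connection(("www.rice.edu", 80))
# A minimal HTTP/1.0 request; the blank line ends the header section.
sock.sendall(b"GET /robots.txt HTTP/1.0\r\n\r\n")
raw = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    raw += chunk
sock.close()

# Split the response into status line, header fields, and body.
head, _, body = raw.partition(b"\r\n\r\n")
lines = head.decode("iso-8859-1").split("\r\n")
status_line = lines[0]
headers = dict(line.split(": ", 1) for line in lines[1:] if ": " in line)
print(status_line)             # e.g. HTTP/1.1 200 OK
print(headers.get("Server"))   # e.g. Apache/2.2.12 (Unix)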
With HTTP/1.1, we get an error (as expected) on our incomplete request, since HTTP/1.1 requires a Host header:
GET /robots.txt HTTP/1.1
HTTP/1.1 400 Bad Request
Date: Wed, 10 Oct 2012 19:31:59 GMT
Server: Apache/2.2.12 (Unix)
Vary: Accept-Encoding
Content-Length: 226
Cneonction: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
If we give a complete request (note the CRLF toggle and the Host header), we receive (nearly) the same headers:
telnet> toggle crlf
Will send carriage returns as telnet <CR><LF>.
telnet> open www.rice.edu 80
Trying 128.42.204.11...
Connected to www.netfu.rice.edu.
Escape character is '^]'.
GET /robots.txt HTTP/1.1
User-Agent: Telnet
Host: www.rice.edu
Accept: text/html
Connection: Keep-Alive
HTTP/1.1 200 OK
Date: Wed, 10 Oct 2012 20:04:41 GMT
Server: Apache/2.2.12 (Unix)
Last-Modified: Thu, 27 May 2004 16:39:15 GMT
ETag: "2aa0f0-73f-3db6aa1a392c0"
Accept-Ranges: bytes
Content-Length: 1855
Vary: Accept-Encoding
X-Forwarded-Server: WWW1
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/plain
# Robot-exclusion file for chico.
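The CRLF toggle is only a telnet artifact; a client library builds the complete request for us (CRLF line endings and the mandatory Host header). A sketch using Python's standard http.client against the same host:

import http.client

conn = http.client.HTTPConnection("www.rice.edu", 80)
# http.client sends an HTTP/1.1 request line and adds the Host header itself.
conn.request("GET", "/robots.txt",
             headers={"User-Agent": "Telnet", "Accept": "text/html"})
resp = conn.getresponse()
print(resp.status, resp.reason)          # 200 OK
for name, value in resp.getheaders():    # the same headers seen above
    print("%s: %s" % (name, value))
body = resp.read()
conn.close()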
The Accept field seems to be ignored; the server returns text/plain even when we ask for text/html:
GET /robots.txt HTTP/1.0
Accept: text/html
HTTP/1.1 200 OK
Date: Wed, 10 Oct 2012 19:44:00 GMT
Server: Apache/2.2.12 (Unix)
Last-Modified: Thu, 27 May 2004 16:39:15 GMT
ETag: "2aa0f0-73f-3db6aa1a392c0"
Accept-Ranges: bytes
Content-Length: 1855
Vary: Accept-Encoding
X-Forwarded-Server: WWW1
Keep-Alive: timeout=5, max=97
Connection: Keep-Alive
Content-Type: text/plain
# Robot-exclusion file for chico.
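If we want to test content negotiation more systematically, we can compare the Accept value we send against the Content-Type that comes back; a quick sketch along the same lines:

import http.client

conn = http.client.HTTPConnection("www.rice.edu", 80)
# Ask for HTML even though the resource is plain text.
conn.request("GET", "/robots.txt", headers={"Accept": "text/html"})
resp = conn.getresponse()
# A server honoring Accept might return 406 or an HTML entity;
# here it appears to ignore the header and return text/plain with a 200.
print(resp.status, resp.getheader("Content-Type"))
conn.close()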
Other parts of the protocol seem not to be implemented; for example, a bare OPTIONS request returns an HTML error page:
Connected to www.netfu.rice.edu.
Escape character is '^]'.
OPTIONS
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Server error!</title>
...
Here is an initial list of some things we may want to test (a rough probing sketch follows the lists below):
- HTTP Version
- Content type
- Character set
- Partial and conditional gets
- Accept
- Expect
- TE (transfer codings the client will accept)
- Upgrade
- HTTP GET / HEAD / OPTIONS / TRACE
- Response of servers to various malformed requests (both HTTP/1.0 and HTTP/1.1)
- Behavior when a path-traversal URL is requested, e.g. GET /../etc/passwords
Other things would be interesting to test but are probably impractical, as they would require knowing a path to a resource of the appropriate type (which may not exist on the server):
- Content Coding
- Transfer Coding
- HTTP PUT / POST / DELETE
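As a starting point for the tests above, a rough probing harness might send one raw request per method and protocol version and record the status line plus a few headers of interest. A sketch under those assumptions (the method and header lists here are just placeholders):

import socket

HOST, PORT, PATH = "www.rice.edu", 80, "/robots.txt"
METHODS = ["GET", "HEAD", "OPTIONS", "TRACE"]          # methods from the list above
VERSIONS = ["HTTP/1.0", "HTTP/1.1"]
RECORD = ["Server", "Content-Type", "Allow", "Vary"]   # headers we might want to measure

def probe(method, version):
    # One raw request per (method, version) pair, so malformed variants can be added later.
    # "Connection: close" keeps the read loop from waiting out the keep-alive timeout.
    request = "%s %s %s\r\nHost: %s\r\nConnection: close\r\n\r\n" % (method, PATH, version, HOST)
    sock = socket.create_connection((HOST, PORT), timeout=10)
    sock.sendall(request.encode("ascii"))
    raw = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break
        if not chunk:
            break
        raw += chunk
    sock.close()
    # Keep only the status line and the header fields we care about.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in lines[1:] if ": " in line)
    return lines[0], {name: headers.get(name) for name in RECORD}

for method in METHODS:
    for version in VERSIONS:
        print(method, version, probe(method, version))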