Wednesday, 17 February 2016

A brief history of the referer header


THE REFERER HEADER

The poor referer header. Misspelled and misused since its inception. 

Its typical use is thus: if I click on a link on a website, the referer header tells the landing page which source page I came from.





It's heavily used in marketing to analyse where visitors to a website came from, and also very useful for gathering data and statistics about reading habits and web traffic.

However, it presents a potential security risk if too much information is passed on.

RFC 2616 [1], the HTTP/1.1 specification, lays out that:

"Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol"
That is, if our request goes from https to http, the referer header should not be present.

However, "SHOULD NOT" is a recommendation rather than a hard requirement, and data can still be leaked. Facebook fell foul of this a little while ago, when it turned out that in some cases the user ID of the originating page was being passed in the referer header to advertisers when a user clicked on an advert [2].


Additionally, when traffic goes between two https sites - as is increasingly common in the move towards SSL everywhere - the RFC does NOT require that the referer header is stripped.


THE META-REFERRER TAG

A potential solution to these two issues, and more, looks to be the meta-referrer tag. By adding the following tag to the source web page:
<meta name="referrer" content="origin">
the referer header can be edited to allow sites to see where their traffic has come from, but without leaking potentially sensitive data. 

The options for the content field are [3]:
  • no-referrer: omit the referer header from the request
  • no-referrer-when-downgrade: omit the referer header when moving from https to http
  • origin: set the referer header to be the origin only, that is, stripping any path and parameters from the url
  • origin-when-cross-origin: if the request is to a different website or protocol, set the referer header to the origin only; otherwise send the full url
  • unsafe-url: set the referer header to be the full originating url regardless of target site or protocol, potentially leaking data.

To use a practical example, suppose facebook were to implement this tag as:

  <meta name="referrer" content="origin" id="meta_referrer" />

Mr Bobby Tables is logged into facebook and on his homepage:

  https://www.facebook.com/bobbytables?f=nref

When he clicks on an external link and is taken to a different site, the referer header is reduced to

  Referer: https://www.facebook.com/

thus preserving his privacy. The target site registers that it has had a visitor from facebook, but the name of the user is not passed on.
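
To make the options above a little more concrete, here is a rough Python sketch of how a browser might pick the referer value for each policy. The function names (origin_of, referer_for) and the example.com target are my own inventions for illustration, and the logic is simplified: a real browser follows the full algorithm in the Referrer Policy draft [3], which also handles things like credentials in the url.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Return scheme://host[:port]/ for a URL, dropping path, query and fragment."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def referer_for(policy, source_url, target_url):
    """Return the Referer value to send, or None if the header is omitted.

    Simplified sketch of the five policy values listed in this post.
    """
    source, target = urlsplit(source_url), urlsplit(target_url)
    # The fragment (#...) is never included in a referer header.
    full_url = source._replace(fragment="").geturl()
    downgrade = source.scheme == "https" and target.scheme != "https"
    same_origin = (source.scheme, source.netloc) == (target.scheme, target.netloc)

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full_url
    if policy == "origin":
        return origin_of(source_url)
    if policy == "origin-when-cross-origin":
        return full_url if same_origin else origin_of(source_url)
    if policy == "unsafe-url":
        return full_url
    raise ValueError(f"unknown policy: {policy!r}")


page = "https://www.facebook.com/bobbytables?f=nref"
print(referer_for("origin", page, "http://example.com/landing"))
# prints "https://www.facebook.com/"
```

With the "origin" policy the target site still learns that the visit came from facebook, but the path containing the username never leaves the browser.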


Google were the first to implement such a scheme [4], ostensibly to reduce latency from ssl sites, although one would suspect that being able to prove to clients that your site was the source of their traffic might be closer to the truth.

HANDLE WITH CAUTION

Whether the referer header is implemented with the new meta-referrer tag or not, it is prudent to approach it with a degree of caution.

Referer spam is still an issue [5] - an attacker can target a website using a specific referer header, which is reported by analytics tools to the website owner. Out of curiosity about where their traffic is coming from, the owner will often follow the link back to a malicious web page. 

The referer header also opens up potential for exploits and XSS attacks [6][7]. It is trivially easy to manipulate headers, so relying on the header for authorisation or authentication is heavily discouraged.
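
To show just how little effort the manipulation takes, here is a small sketch using Python's standard urllib. Both urls are made up for illustration, and nothing is actually sent over the network; the point is only that the client picks the value.

```python
import urllib.request

# Hypothetical URLs, used only for illustration.
request = urllib.request.Request(
    "https://example.com/members-only",
    headers={"Referer": "https://trusted-partner.example/"},
)

# The server would see whatever value the client chose to put here.
print(request.get_header("Referer"))
# prints "https://trusted-partner.example/"
```

Any server-side check along the lines of "the referer is a trusted partner, so let them in" would be satisfied by this request.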

MISSING HEADERS

The referer header is omitted if:

  • the user entered the url in the address bar
  • the user visited the site from a bookmark
  • the request moved from https to http
  • the request moved from https to a different https url and the source site strips the header (the RFC does not require this)
  • security software (antivirus, firewall etc) stripped the header
  • a proxy stripped the header
  • a browser plugin stripped the header
  • the site was visited programmatically (e.g. using curl) without setting the header
  • the meta-referrer tag disallows it
  • the meta-referrer tag allows it but the browser does not have meta-referrer support [8]

For websites that rely on the referer header for certain advertising campaigns, the patchy and inconsistent usage of the header can be a real problem. Proxy rules that allow access for users arriving from specific sites run a high risk of not working at all, depending on the user's browser or local setup, and are also vulnerable to abuse if the headers are manipulated.
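
One way to live with this is to treat the header as an optional hint for reporting and nothing more. The sketch below assumes the request headers arrive as a plain dict keyed by "Referer" (real frameworks usually offer a case-insensitive lookup), and the names referring_host, campaign_source and known_partners are hypothetical.

```python
from urllib.parse import urlsplit


def referring_host(headers):
    """Return the lowercased referring host, or None if it is absent or unusable."""
    value = headers.get("Referer", "").strip()
    if not value:
        return None
    try:
        parts = urlsplit(value)
    except ValueError:
        # Malformed values (for example a broken IPv6 literal) are treated as missing.
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    return parts.hostname


def campaign_source(headers, known_partners):
    """Label traffic for reporting only; never use the result for access control."""
    host = referring_host(headers)
    if host is None:
        return "unknown"
    return host if host in known_partners else "other"


partners = {"www.facebook.com"}
print(campaign_source({"Referer": "https://www.facebook.com/"}, partners))
print(campaign_source({}, partners))
```

A missing or mangled header simply lands in the "unknown" bucket, so the report degrades gracefully instead of a visitor being turned away.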


TLDR

To sum up, the referer header was rather flakey, and is now slightly less flakey. It's often omitted either accidentally or deliberately, and easily faked. It can be a very useful tool in gathering data about web traffic, but probably best not to rely on it for anything especially important at this point.



References and further reading

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html

[2] https://www.facebook.com/notes/facebook-engineering/protecting-privacy-with-referrers/392382738919


[3] https://w3c.github.io/webappsec-referrer-policy/

[4] http://googlewebmastercentral.blogspot.co.uk/2012/03/upcoming-changes-in-googles-http.html

[5] https://en.wikipedia.org/wiki/Referer_spam

[6] http://www.gremwell.com/exploiting_xss_in_referer_header

[7] https://hiddencodes.wordpress.com/2015/05/29/angler-exploit-kit-breaks-referer-chain-using-https-to-http-redirection/

[8] http://caniuse.com/#feat=referrer-policy

[9] http://www.schemehostport.com/2011/11/referer-sic.html

[10] https://bugzilla.mozilla.org/show_bug.cgi?id=704320

[11] http://smerity.com/articles/2013/where_did_all_the_http_referrers_go.html

[12] https://moz.com/blog/meta-referrer-tag


[13] https://blog.mozilla.org/security/2015/01/21/meta-referrer/