Tuesday, 12 April 2016

Moving home

I'm moving from Blogger to GitHub Pages for various reasons, but mostly because writing the kind of part-text, part-code articles that I do becomes a Sisyphean task after about a page and a half. I write everything in markdown and put it in git anyway, so GitHub Pages seems almost too easy.

New blog is here: http://annaken.github.io/ and I'll set up a redirect from this site to that in a couple of weeks.

Wednesday, 17 February 2016

A brief history of the referer header


The poor referer header. Misspelled and misused since its inception. 

Its typical use is thus: if I click on a link on a website, the referer header tells the landing page which source page I came from.

It's heavily used in marketing to analyse where visitors to a website came from, and is also very useful for gathering data and statistics about reading habits and web traffic.

However, it presents a potential security risk if too much information is passed on.

In RFC 2616 [1], the HTTP/1.1 specification lays out that:

"Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol"
That is, if our request goes from https to http, the referer header should not be present.

However, RFCs are not mandatory, and data can be leaked. Facebook fell foul of this a little while ago, when it turned out that in some cases the userid of the originating page was being passed in the referer header to advertisers when a user clicked on an advert [2].

Additionally, when traffic goes between two https sites - as is increasingly common in the move towards ssl everywhere - the RFC does NOT require that the referer header be stripped.


A potential solution to these two issues, and more, looks to be the meta-referrer tag. By adding the following tag to the source web page:
<meta name="referrer" content="origin">
the referer header can be controlled so that sites still see where their traffic has come from, but without leaking potentially sensitive data. 

The options for the content field are [3]:
  • no-referrer: omit the referer header from the request
  • no-referrer-when-downgrade: omit the referer header when moving from https to http
  • origin: set the referer header to be the origin only, that is, stripping any path and parameters from the url
  • origin-when-cross-origin: if the request is to a different site or protocol, set the referer header to the origin only; same-origin requests keep the full url
  • unsafe-url: set the referer header to be the full originating url regardless of target site or protocol, potentially leaking data
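As a rough illustration, suppose a link is followed from a page at https://example.com/docs/page?token=123 to a page on another https site (urls here are hypothetical). The referer header arriving at the target under each policy would be:

  no-referrer:                 (header omitted entirely)
  no-referrer-when-downgrade:  Referer: https://example.com/docs/page?token=123
  origin:                      Referer: https://example.com/
  origin-when-cross-origin:    Referer: https://example.com/
  unsafe-url:                  Referer: https://example.com/docs/page?token=123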
To use a practical example, if facebook were to implement this tag as:

  <meta name="referrer" content="origin" id="meta_referrer" />

then when Mr Bobby Tables is logged into facebook and clicks an external link from his homepage, the referer header sent to the target site is cut down from the full url of his homepage to the bare origin, https://www.facebook.com/, thus preserving his privacy. The target site registers that it has had a visitor from facebook, but the name of the user is not passed on.

Google were the first to implement such a scheme [4], ostensibly to reduce latency from ssl sites, although one would suspect that being able to prove to clients that your site was the source of their traffic might be closer to the truth.


Whether the referer header is implemented with the new meta-referrer tag or not, it is prudent to approach it with a degree of caution.

Referer spam is still an issue [5] - an attacker can target a website using a specific referer header, which is reported by analytics tools to the website owner. Out of curiosity about where their traffic is coming from, the owner will often follow the link back to a malicious web page. 

The referer header also opens up potential for exploits and XSS attacks [6][7]. It is trivially easy to manipulate headers, so relying on the header for authorisation or authentication is heavily discouraged.
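To see just how trivial: spoofing the header from the command line takes a single flag (urls here are hypothetical):

  curl --referer https://trusted.example.com/ https://target.example.com/page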


The referer header is omitted if:

  • the user entered the url in the address bar
  • the user visited the site from a bookmark
  • the request moved from https to http
  • the request moved from one https url to a different https url (in some browsers and configurations)
  • security software (antivirus, firewall, etc) stripped the header
  • a proxy stripped the header
  • a browser plugin stripped the header
  • the site was visited programmatically (eg using curl) without setting the header
  • the meta-referrer tag disallows it
  • the meta-referrer tag allows it but the browser does not have meta-referrer support [8]
For websites that rely on the referer header for certain advertising campaigns, this patchy and inconsistent behaviour can be a real problem. Proxy rules that grant access only to users arriving from specific sites run a high risk of not working at all, depending on the user's browser or local setup, and are also vulnerable to abuse if the headers are manipulated.


To sum up, the referer header was rather flaky, and is now slightly less flaky. It's often omitted, either accidentally or deliberately, and is easily faked. It can be a very useful tool for gathering data about web traffic, but it's probably best not to rely on it for anything especially important at this point.

References and further reading

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html

[2] https://www.facebook.com/notes/facebook-engineering/protecting-privacy-with-referrers/392382738919

[3] https://w3c.github.io/webappsec-referrer-policy/

[4] http://googlewebmastercentral.blogspot.co.uk/2012/03/upcoming-changes-in-googles-http.html

[5] https://en.wikipedia.org/wiki/Referer_spam

[6] http://www.gremwell.com/exploiting_xss_in_referer_header

[7] https://hiddencodes.wordpress.com/2015/05/29/angler-exploit-kit-breaks-referer-chain-using-https-to-http-redirection/

[8] http://caniuse.com/#feat=referrer-policy

[9] http://www.schemehostport.com/2011/11/referer-sic.html

[10] https://bugzilla.mozilla.org/show_bug.cgi?id=704320

[11] http://smerity.com/articles/2013/where_did_all_the_http_referrers_go.html

[12] https://moz.com/blog/meta-referrer-tag

[13] https://blog.mozilla.org/security/2015/01/21/meta-referrer/

Tuesday, 8 December 2015

Puppet in the Pipeline

I gave a talk at the recent PuppetConf called "Puppet in the Pipeline" - a round-up of workflow planning, deployment pipelines, and integration points. I start out with a very basic setup and walk through various stages of complexity, talking through technical options and things to consider. I can't seem to get it written down as a satisfactory blog post, so for now I will just link to the video and slides:

Video: https://www.youtube.com/watch?v=4jXGmxkEoeM

Slideshare: http://www.slideshare.net/AnnaKennedy11/puppet-in-the-pipeline-55953094

Thursday, 27 August 2015

Good ticket guide

I wrote this guide at $job-1 (where it seems to be still in circulation) and thought maybe it could be a useful thing to share further.


We love a good ticket. It makes the job easier, it gets done faster, and it keeps everyone a bit happier. Here's a quick guide to what makes good tickets.

Describe the problem - not the solution

  • Whilst ideas about what might be wrong can be helpful, the absolute best thing you can do is describe the problem in its entirety

A descriptive ticket title

  • so that we can find your ticket at a glance

Relevant information and examples

  • If something is broken, then please outline how we can test it out for ourselves. How does it normally work? What happens now? When did it break?
  • If you need something changing, what is currently configured, and what would you prefer?
  • If you need something new, is it like anything that currently exists?

What’s the timescale?

  • Any indication of due dates or potential blockers is useful
  • Set the ticket priority as appropriate
  • If it’s urgent, it’s usually best to log a ticket and then come over in person

What’s the value?

  • If something is broken or not working correctly, what is the impact to the business?
  • If the ticket is for some new infrastructure, what project is it supporting? 
  • Can you justify your request? How does it fit into the organisational priorities?
  • This information helps us do the most important tickets first.

Other helpful things to include:

  • Error messages
  • Screenshots
  • Example URLs

What not to include:

  • A direct set of commands to run without a full explanation of the problem. 
  • You don’t need to say thank you! At least not on the ticket, as this re-opens closed tickets. You’re welcome to come and say thank you in person (or buy us a beer at the pub).

Friday, 31 July 2015

Automated server testing with Serverspec, output for Logstash, results in Kibana. Part 2: Logging

At the end of Part 1 we had a Serverspec installation running tests that were stored alongside our configs, with command-line arguments passing in the name of the VM and the list of modules to be tested.

Next, we want to look carefully at the output generated by Serverspec so that we can track and visualise our tests. We need to track our data carefully so that we can cope with the results of many different VMs.

Serverspec outputs

Serverspec has a number of output options. The 'documentation' style is what we've seen printed to screen so far; there are also json and html reports. It is possible to get all of these formatting options at once by adding the following line to your Rakefile:

 t.rspec_opts = "--format documentation --format html --out /opt/serverspec/reports/#{$host}.html --format json --out /opt/serverspec/reports/#{$host}.json"

So now we have two files in /opt/serverspec/reports: www.example.com.html and www.example.com.json.
The json file is the one we're going to pick up and turn into our log.

Logging format

If we inspect the contents of the www.example.com.json report, we can see that it is of the format:
    "examples": [
            "description": "should be installed",
            "file_path": "/opt/puppetcode/modules/ntp/serverspec/init_spec.rb",
            "full_description": "Package \"ntp\" should be installed",
            "line_number": 4,
            "run_time": 2.525189129,
            "status": "passed"
    "summary": {
        "duration": 2.609159102,
        "example_count": 1,
        "failure_count": 0,
        "pending_count": 0
    "summary_line": "4 examples, 0 failures"
Each test is an element in the 'examples' array, and at the end we have a summary and a summary_line.

We're going to pick up every test as a separate json object, insert some identifying metadata, and output each test as a line in /var/log/serverspec.log

Apart from the host and module identifiers, it might also be helpful to know, for example, what the OS version of the host was, which git branch the config came from, and maybe a UUID unique to a test run (which could encompass multiple VMs).

With this in mind, we re-write our /opt/serverspec/Rakefile as follows:

require 'rake'
require 'rspec/core/rake_task'
require 'json'

# Command line variables
$uuid       = ENV['uuid']
$host       = ENV['host']
$modulelist = File.readlines(ENV['filename']).map(&:chomp)
$branch     = ENV['branch']
$osrel      = ENV['osrel']

task :spec => ["spec:#{$host}", "output"]

# Run the Serverspec tests
namespace :spec do
  desc "Running serverspec on host #{$host}"
  RSpec::Core::RakeTask.new($host) do |t|
    ENV['TARGET_HOST'] = $host
    t.pattern = '/opt/puppetcode/modules/{' + $modulelist.join(",") + '}/serverspec/*_spec.rb'
    t.fail_on_error = false
    t.rspec_opts = "--format documentation --format html --out /opt/serverspec/reports/#{$host}.html --format json --out /opt/serverspec/reports/#{$host}.json"
  end
end

# Edit the serverspec json file to add in useful fields
task :output do
  File.open("/var/log/serverspec.log", "a") do |f|
    # Read in the json file that serverspec wrote
    ss_json = JSON[File.read("/opt/serverspec/reports/#{$host}.json")]
    ss_json["examples"].each do |test|
      # Derive the module name from the path of the spec file
      modulename = test["file_path"].gsub(/\/opt\/puppetcode\/modules\//, "").gsub(/\/serverspec\/.*/, "")
      test["module"] = modulename
      # Tag on the metadata and write each test out as one json line
      f.puts insert_metadata(test).to_json
    end
  end
end

# Add in the rest of our useful data
def insert_metadata(json_hash)
  json_hash["time"]   = Time.now.strftime("%Y-%m-%d-%H:%M")
  json_hash["uuid"]   = $uuid
  json_hash["host"]   = $host
  json_hash["branch"] = $branch
  json_hash["osrel"]  = $osrel
  json_hash
end

Now we can run

rake spec host=www.example.com filename=/opt/serverspec/modulelist branch=dev osrel=7.1 uuid=12345

And see in /var/log/serverspec.log

{"description":"should be installed","full_description":"Package \"ntp\" should be installed","status":"passed","file_path":"/opt/puppetcode/modules/ntp/serverspec/init_spec.rb","line_number":4,"run_time":0.029166597,"module":"ntp","time":"2015-07-31-12:21","uuid":"12345","host":"www.example.com","branch":"dev","osrel":"7.1"}

This log can now be collected by Logstash, indexed by Elasticsearch, and visualised with Kibana.
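
As a sketch, a Logstash file input along these lines would pick up each line as a separate, already-structured event (the paths and codec here are assumptions to adapt to your own pipeline):

  input {
    file {
      path  => "/var/log/serverspec.log"
      codec => "json"
    }
  }

From there, the usual elasticsearch output plus a Kibana dashboard over fields like status, module, host, and branch takes care of the indexing and visualisation.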

Automated server testing with Serverspec, output for Logstash, results in Kibana. Part 1: Serverspec

Whether you're spawning VMs to cope with spikes in traffic, or you want to verify your app works on a range of operating systems, it's incredibly useful to have some automated testing to go with your automated VM creation and configuration.

This is a quick run-down of one way to implement such automated testing with Serverspec and get results back that are ultimately visualisable in Kibana. NB the orchestration of the following steps is beyond the scope of this article - maybe a CI tool like Jenkins, an orchestration tool like vRO, or some custom software.

  • Automagically create VMs (AWS, OpenStack, etc)
  • Configure the VMs with some config management tool (Puppet, Chef, etc)
  • Perform functional testing of VMs with Serverspec 
  • Output logs that are collected by Logstash
  • Visualise output in Kibana

The first two points are essentially prerequisites to this article: create some VMs and install them with whatever cloud and config magic you like. For the purposes of this article, it doesn't really matter. I'm just going to assume that your VMs are 'normal', ie running and contactable.

Functional testing with Serverspec

Serverspec, if you've not used it, is an rspec-based tool for functional testing. It's ruby-based, has quite an easy set-up, and doesn't require anything to be installed on the target servers - just the ability to ssh into them with an ssh key.

Install and set up a la the Serverspec documentation

# gem install serverspec
# mkdir /opt/serverspec
# cd /opt/serverspec
# serverspec-init

This will have created you a basic directory structure with some files to get you started.

Right now we have something like this (serverspec-init drops in a sample host directory and a spec_helper):

# ls /opt/serverspec
Rakefile  spec

# ls /opt/serverspec/spec
spec_helper.rb  www.example.com

The default setup of Serverspec is that you define a set of tests for each and every server and then run the contents of each directory against the matching host. However this doesn't really fit the workflow we're setting up here.

Re-organise Serverspec from host-based to app-based layout

To get started, let's delete the www.example.com directory - we don't want to define a set of tests per host like this, we want to make an app-based layout.

In my opinion, one of the easiest ways to organise your functional tests is to store them alongside your config management code. With this in mind, let's write a simple ntp test.

Writing a Serverspec test

Our ntp Puppet config is found at the following path, and looks like:

# cat /opt/puppetcode/modules/ntp/manifests/init.pp

class ntp {
  package { 'ntp':
    ensure => installed,
  }
}

So alongside this directory we can make a sister Serverspec directory, and put our first test in there:

# cat /opt/puppetcode/modules/ntp/serverspec/init_spec.rb

require 'spec_helper'

describe package('ntp') do
  it { should be_installed }
end
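
Serverspec has matchers for much more than packages; if the module also manages the service, a check like this could sit in the same file (the service name 'ntpd' is an assumption - it varies by platform):

describe service('ntpd') do
  it { should be_enabled }
  it { should be_running }
end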

Making Serverspec run our test

Now we need to edit the Rakefile to reflect this restructuring:

# cat /opt/serverspec/Rakefile

require 'rake'
require 'rspec/core/rake_task'

$host       = 'www.example.com'
$modulelist = %w( ntp )

task :spec => "spec:#{$host}"

namespace :spec do
  desc "Running serverspec on host #{$host}"
  RSpec::Core::RakeTask.new($host) do |t|
    ENV['TARGET_HOST'] = $host
    t.pattern = '/opt/puppetcode/modules/{' + $modulelist.join(",") + '}/serverspec/*_spec.rb'
    t.fail_on_error = false
  end
end

Yes, we did just hard-code the host name and module list to test. Don't worry, we'll switch these out in a bit.
Note that we provide a pattern path with a glob to the directories containing our tests - here it expands to /opt/puppetcode/modules/{ntp}/serverspec/*_spec.rb. When we run this file, every test matching the pattern is picked up and run against the desired host.

Run the test

Now, making sure we are standing in the /opt/serverspec directory, we can run:

# rake spec

Package 'ntp'
  should be installed

Green means that the test ran and the output was successful. So as it stands, we can test our one www.example.com host with our one ntp test. Great! 

Rewrite the Rakefile to take command-line options rather than hard-coding variables

Right now, our host identifier and our list of modules to test are hard-coded in the Rakefile. Let's rewrite so these are passed in on the command line.

# cat /opt/serverspec/Rakefile

require 'rake'
require 'rspec/core/rake_task'

$host       = ENV['host']
# modulelist is the path to a file listing one module per line
$modulelist = File.readlines(ENV['modulelist']).map(&:chomp)

task :spec => "spec:#{$host}"

namespace :spec do
  desc "Running serverspec on host #{$host}"
  RSpec::Core::RakeTask.new($host) do |t|
    ENV['TARGET_HOST'] = $host
    t.pattern = '/opt/puppetcode/modules/{' + $modulelist.join(",") + '}/serverspec/*_spec.rb'
    t.fail_on_error = false
  end
end

Now to run the tests we need to do:

# rake spec host=www.example.com modulelist=/opt/serverspec/modulelist

where the modulelist file simply lists the modules to test, one per line:

# cat /opt/serverspec/modulelist
ntp

The modulelist file can be one you write yourself, or generated from something like a server's /var/lib/puppet/classes.txt - a sketch of that follows below. It's a way to narrow down which tests are run against each server, as not all modules are necessarily implemented everywhere.
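
As a sketch of that generation, assuming classes.txt holds one Puppet class per line (e.g. 'ntp' or 'ntp::config') and the paths match your setup:

# Reduce Puppet class names to unique top-level module names
modules = File.readlines('/var/lib/puppet/classes.txt')
              .map { |klass| klass.chomp.split('::').first }
              .uniq.sort

File.write('/opt/serverspec/modulelist', modules.join("\n") + "\n")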

Part 2: Generate logs that can be collected by Logstash, indexed by Elasticsearch, and visualised in Kibana

Friday, 8 May 2015

Recovering from puppet cert clean --all

If you just did 'puppet cert clean --all' because reasons, and now everything is broken like this:
test-server:~# puppet agent -vt
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: Error 400 on SERVER: Could not retrieve facts for test-server.vm: Failed to find facts from PuppetDB at puppetmaster.example.com:8081: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed: [certificate revoked for /CN=puppetmaster.example.com]
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace facts' command for test-server.vm to PuppetDB at puppetmaster.example.com:8081: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed: [certificate revoked for /CN=puppetmaster.example.com]
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

STOP PANICKING: we can fix this.

If you have a backup of the puppetmaster's /var/lib/puppet directory, do a restore and hopefully all will be well.

If not, let's fix the puppetmaster (NB here I'm using a monolithic installation - if your puppetmaster and puppetdbs are on separate machines you'll have to adapt this a little bit).

Cleaning all the certificates means that the puppetmaster's own certificate is missing too, so re-generate it with

puppetmaster:~# puppet cert generate puppetmaster.example.com

Now the puppetmaster has new ssl bits and bobs but puppetdb has the old ones. Clean out the puppetdb ssl directory:

puppetmaster:~# rm -rf /etc/puppetdb/ssl/*

And use the handy ssl-setup script to copy the new ones to the right places

puppetmaster:~# puppetdb ssl-setup
PEM files in /etc/puppetdb/ssl are missing, we will move them into place for you
Copying files: /var/lib/puppet/ssl/certs/ca.pem, /var/lib/puppet/ssl/private_keys/puppetmaster.example.com.pem and /var/lib/puppet/ssl/certs/puppetmaster.example.com.pem to /etc/puppetdb/ssl
Setting ssl-host in /etc/puppetdb/conf.d/jetty.ini already correct.
Setting ssl-port in /etc/puppetdb/conf.d/jetty.ini already correct.
Setting ssl-key in /etc/puppetdb/conf.d/jetty.ini already correct.
Setting ssl-cert in /etc/puppetdb/conf.d/jetty.ini already correct.
Setting ssl-ca-cert in /etc/puppetdb/conf.d/jetty.ini already correct.

Restart all the things:

puppetmaster:~# service puppetmaster restart
puppetmaster:~# service puppetdb restart

Now, let's fix the nodes.

Start with a test node (preferably not in production), to verify all the steps so far worked as expected.
Remove the existing ssl certs with

test-server:~# rm -rf /var/lib/puppet/ssl/*

Now run puppet manually with

test-server:~# puppet agent -vt
Info: Creating a new SSL key for test-server.vm
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for test-server.vm
Info: Certificate Request fingerprint (SHA256): 92:A9:A6:B1:88:7B:DB:A7:65:00...
Info: Caching certificate for ca
Exiting; no certificate found and waitforcert is disabled

Sign the certificate on the master as usual

puppetmaster:~# puppet cert sign test-server.vm

Now your node should run as usual

test-server:~# puppet agent -vt
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for test-server.vm
Info: Applying configuration version '1431084513'

The final step is to re-generate certificates for all the rest of your nodes.
Option 1: log into every server and repeat the above.
Option 2: automate option 1 - think ssh, clusterssh, etc.
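
A sketch of option 2, assuming key-based root ssh to every node and a hostlist file with one hostname per line (both assumptions - adapt to your environment):

for h in $(cat hostlist); do
  ssh root@$h 'rm -rf /var/lib/puppet/ssl/* && puppet agent -vt'
done

Then sign everything that just checked in, and run the agents once more to pick up the signed certificates:

puppetmaster:~# puppet cert sign --all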

Good luck!

PS I lied - the final, final step is to set up proper backup and restore of your certificate store at /var/lib/puppet/ssl, and to delete the clean --all line from your shell history so you can't accidentally run it again.
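
A minimal version of that backup, assuming the default paths and a local backup directory (adapt as needed):

puppetmaster:~# tar czf /var/backups/puppet-ssl-$(date +%F).tar.gz -C /var/lib/puppet ssl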

References: https://docs.puppetlabs.com/puppetdb/latest/install_from_source.html