
Census 2021: Who gets a letter and who gets a paper form?

This is an extended article, which was originally posted on The Conversation on 18/03/2021.

In England, Wales and Northern Ireland, we should all now have received either a letter with a 16-digit code or a paper form to fill in for the 2021 Census. There are lots of great reasons why we should respond to the census, aside from it being a legal requirement. Among other things, it’s a good way to help provide an accurate snapshot of your community, which means people will get the services they need at a local level. The Conversation and the Royal Geographical Society (with IBG) have both posted more information about what the census is and why it is important.

The census is a fascinating data set that’s vital to many areas of research and government decision making. It provides us with a count of the population, but also a wide range of demographic data like age, gender, family relationships, socio-economic information, ethno-cultural background, health, and some voluntary questions, including religious identity and sexuality.

This is the first census that most people will be asked to complete online. However, some have received paper forms through the post, while others have just received a letter asking them to fill in the census online. Though the mechanics of the census may appear complex, the reasons why are actually quite straightforward.

So who gets a letter, who gets a form and why? The Office for National Statistics (ONS) (which is coordinating the census) has tried to determine who gets what by assessing which households are likely to find it impossible or more difficult to respond to the census online. These households (around 10% of all households) have been sent a paper form. Everyone else has received a letter with a code, asking them to complete the form online (however, it’s important to note that if you received a form, you can still respond online and if you got a letter, you can request a paper form if you want).

Online or by post?

There are a number of good reasons for filling out your form online – it saves the ONS time and money when collating the results and means we can get more accurate data.

The 2021 Census form

You might be thinking: “what about my Aunt Muriel who received a letter? She doesn’t use the internet, why hasn’t she got a form?” This is because the ONS doesn’t know who’s able and willing to submit the form online – they can only model this based on the data they have.

As statistician George Box said: “All models are wrong, but some are useful”. This means that while the ONS has modelled who will (and who will not) respond online, even if they get 95% of people in the right group, there will be some errors.

There’s a term for this in the field of Geographical Information Systems (often shortened to GIS, the systems and tools we use to manage and analyse location data) – an ecological fallacy. This means that there will be cases which contradict the ONS’s model. Among those whom the ONS has deemed unable or unwilling to complete the census form online, there will be some who don’t fit these criteria, and vice versa. This is why the ONS has included a code on the forms. If you know someone who needs a form, but is having problems requesting one, you can request one on their behalf.

The hard to count index

How did the ONS model this information? The ONS created a “hard to count” index to measure who might not respond to the census (also used for the 2001 and 2011 censuses). However, the 2021 census is different, as this is the first time the ONS has tried to run a census “online first”, which means it also had to include the digitally excluded in its index.

The key data used to drive this was internet access data from Ofcom, mobile internet connectivity (also from Ofcom) and information on who has already interacted with government websites (such as via the DVLA and HMRC). This data was used to create an area-based model, with each area assessed as either being able to complete the census online, or needing paper forms. Each area contains about 1,500 people, and the areas are known by the ONS as LSOAs (lower layer super output areas). This was tested and refined together with many other aspects of the census in the ONS’s big rehearsal for the census in 2019. There are lots more details in their report EAP102 Hard to Count index for the 2021 Census.

Internet User Classification

While ONS have not published their Hard to Count index, they have shared it with Local Authorities to help them target their census engagement work. A similar example looking at who is digitally excluded is the Internet User Classification, created by the Consumer Data Research Centre, open and freely available for anyone to use. Here, they looked at a range of factors (including internet connectivity and usage) and created a geodemographic classification identifying who uses the internet (e.g. e-Cultural Creators, e-Professionals) and who does not (e.g. Settled Offline Communities and e-Withdrawn). Geodemographics have some advantages over indices, in that they can help describe who doesn’t have internet access, and can be used to identify specific measures to help address this and/or used to identify individuals or groups with specific characteristics. 

CDRC Mapmaker, showing Internet User Classification data. 

However, we need to remember our ecological fallacy from earlier – not everyone in “e-Cultural Creators” (the group with the highest level of internet access/use) will have access to the internet, and not everyone in “e-Withdrawn” (the group with the lowest level of internet access/use) will lack it. It is a model – a useful model, but a model nonetheless. If you are interested in Geodemographics, CDRC have a training course on Geodemographics, what they are and how you can create them (free to access, but you need to sign up for an account).

One other thing to consider is what if the model is wrong? No model is 100% correct, so there will always be people who are incorrectly allocated to one group or another. When using the model, this needs to be remembered, and the suitable infrastructure needs to be in place to support this (i.e. being able to request a paper form if you want one). How much resource this should be given is a tricky question – and one that varies depending on the impact of getting someone in the wrong group. 

Hopefully this helps explain the why question. There are many more details on the ONS website, particularly in their papers documenting the methods used to run the Census, especially the “hard to count” (EAP102) and “maximising response” (EAP113) papers. Thanks very much to David Martin (University of Southampton) for pointing me to the resources in question, Tom Chadwin for his suggestions improving this article, and Kuba Shand-Baptiste at The Conversation for her comments and input. If you are interested in GIS and the Ecological Fallacy, I can recommend GIS: Research Methods (first chapter free online).


FOSS4G UK 2019: Open Source, Geospatial, Sun and Lego

Edinburgh view from Salisbury Crags, just above Dynamic Earth

I had a wonderful three days in Edinburgh attending the FOSS4G UK 2019 conference, based at Dynamic Earth. Edinburgh has never had better weather, and I was assured by the locals that this was not normal! FOSS4G conferences have a special vibe to them that sets them apart from any other sort of conference. Various people have already written about that vibe, much more eloquently than I can.

There was a great selection of workshops and talks, and I ended up attending primarily workshops, which is a first for me. I have a particular interest in collecting data in the field, and so went to the workshops on QField and Input. Both are mobile phone apps that provide an interface to collect data on your phone, and then synchronise this back with a QGIS project when you get back to the office.

The wonderful Kirsten Reilly from ThinkWhere hosted the workshop on QField, explaining how we could set up a project in QGIS and synchronise it with the app before going out into the field. We had some of the usual technical issues, but nothing unusual for a practical session.

I also attended the Input workshop, run by the skilled Saber Razmjooei of Lutra Consulting. Lutra have developed Input as an alternative to QField, re-creating the app from scratch, and ensuring that Input can be operated on iPhones as well (QField is currently Android only). There are a lot of similarities between the programs, with QField being a bit more developed (i.e. less buggy) but Input having a cleaner interface and slightly more features. We also got to go outside and test the app out, which was great. My phone (a Fairphone 2) was not very happy with either app and my experience wasn’t flawless (but your mileage may vary, as they say).

The key differences are:

  • QField only works on Android, Input works on Android and iOS.
  • QField uses a cable to transfer files from your computer to the phone and back, Input uses the cloud (a website called Mergin, developed by Lutra) to manage the synchronisation process.
  • One key feature that Input has (which QField lacks) is the ability to record tracks (or lines) logging the route you took, whereas QField can only record points.
  • QField is relatively mature whereas Input is very new.

Overall I would say that Input just edged ahead of QField. If you are looking to use these in the field, try out both!

One great talk was from Mike Spencer, discussing the pros and cons of using R or QGIS for cartography. There are so many options out there, and his talk gave some great examples of amazing outputs from R and QGIS. There was a whole slew of talks that I would have liked to have attended, but couldn’t because things clashed. Fortunately all of the talks at FOSS4G UK 2019 were live streamed and recorded, which allows anyone to experience the conference.

I led a workshop on contributing to QGIS documentation, which was very well received with 10 participants. Contributing to documentation is a key element of open source software and is something that often gets neglected. We covered how QGIS documentation is structured, how to work with GitHub to make changes on the web, and how to work with documentation locally. The workshop was only 90 minutes long, so we didn’t have time to actually make any changes to the QGIS Documentation, but we did have great fun experimenting with the example repository I made for it. Thanks to denelius, Nikosvav, mikerspencer, SteveLowman, myquest87, hopkina, cearban and TBreure for attending and getting involved.

At the Community Sprint on Sat 21st, a group of 9 of us had a go at a variety of coding and documentation issues. I led a group of three experimenting with a number of QGIS Documentation issues. We all had a deep dive into GitHub and learnt a lot! We fixed a range of issues from unclear documentation to new features in the QGIS Master that needed to be added into the documentation. These included:

The organising committee put together a great conference and captured the unique feeling of a FOSS4G conference. Many thanks to all of them, and they even created a Lego video to celebrate the amazing conference. FOSS4G conferences happen all across the world, so keep your eyes open for one near you in the future!

Also posted with xyHt.


Spatial R – Moving from SP to SF

I recently ran my ‘Introduction to Spatial Data & Using R as a GIS’ course for the NCRM at the University of Southampton. This was the first time I had run it since updating the material from the SP library to the new SF library. The SF (or Simple Features) library is a big change in how R handles spatial data.

Working with RStudio at University of Southampton

Back in the ‘old days’, we used a package called SP to manage spatial data in R. It was initially developed in 2005, and was a very well-developed package that supported practically all GIS analysis. If you have worked with spatial data in R and used the syntax variable@data to refer to the attribute table of the spatial data, then you have used the SP package. The SP package worked well, but wasn’t 100% compatible with the R data frame, so when joining data (using merge() or match()) you had to be quite careful, and we usually joined the table of data to the variable@data element. For those in the know, it used S4 data types (something I discovered when I generated lots of error messages whilst trying to do some analysis!)

The SF library is relatively new (released Oct 2016) and uses the OGC (Open Geospatial Consortium) defined standard of Simple Features (which is also an ISO standard). This is a standardised way of recording and structuring spatial data, used by nearly every piece of software that handles spatial data. Using SF also allows us to work with the tidyverse series of packages which have become very popular, driven by growth in data science. Previously, tidyverse expected spatial data to be a data frame, which the SP data formats were not, and often created some interesting error messages!
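To make the contrast with the variable@data approach concrete, here is a minimal sketch of joining an attribute table to an SF object. Everything in it is made up for illustration: the two square areas, the make_square() helper, the code column and the residents figures are assumptions, not course data. It relies only on st_polygon(), st_sfc(), st_sf() and the merge() method that the sf package provides.

```r
library(sf)

# Hypothetical helper: a 1 x 1 square with its lower-left corner at (x0, y0)
make_square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two made-up areas, stored as an sf object (a data frame with a geometry column)
areas <- st_sf(
  code = c("A", "B"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0))
)

# A plain attribute table keyed on the same made-up area code
population <- data.frame(code = c("A", "B"), residents = c(1500, 1620))

# Because an sf object is a data frame, merge() joins directly;
# there is no @data slot to worry about
areas_pop <- merge(areas, population, by = "code")

class(areas_pop)      # includes both "sf" and "data.frame"
areas_pop$residents   # the joined attribute sits alongside the geometry
```

The result is still an SF object, so it can be passed straight on to mapping functions.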

The Geospatial Training Solutions ‘Introduction to R’ course is very well established, and I have delivered it 14 times to 219 students! However, it was due for a bit of a re-write, so I took the opportunity of moving from SP to SF to restructure some of the material. I also changed from using the base R plot commands to using the tmap library. As a result, it is now much easier to get a map from R. In fact, one of the participants from my recent NCRM course in Southampton said:

“It was so quick to create a map in R, I thought it would be harder.”

Participant on Introduction to Spatial Data & Using R as a GIS, 27th March 2019, University of Southampton

They were blown away by how easy it was to create a map in R. With SF and tmap, you can get a map out in 2 lines once the libraries are loaded (anything starting with # is a comment):

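library(sf)    #load the SF library for reading spatial data
library(tmap)  #load the tmap library for mapping
LSOA <- st_read("england_lsoa_2011.shp")  #read the shapefile
qtm(LSOA) #plot the map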

You can also get a nice looking finished map with customised colours and classification very easily:

tm_shape(LSOA) +
  tm_polygons("Age00to04", title = "Aged 0 to 4", palette = "Greens", style = "jenks") +  #the + must end the line, not start the next one
  tm_layout(legend.title.size = 0.8)
Count of people aged 0 to 4 in Liverpool, 2011 Census Data.

Unfortunately, not all spatial analysis is yet supported in SF. This will come with time, as the functions develop and more features are added. In the practical I get the participants to do some Point in Polygon analysis, where they overlay some crime points on some LSOA boundaries. I couldn’t find out how to do a working point in polygon analysis* using this data and the SF library, so I kept my existing SP code to do this. This was also a useful pedagogical (teaching) opportunity to explain about SF and SP, as students are likely to come across both types of code!

*I know theoretically it should be possible to do a point-in-polygon with SF (there are many posts) but I failed to get my data to work with this. I need to have more of an experiment to see if I can get it working – if you would like to have a try with my data, please do!
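For anyone who wants to experiment, one way a point-in-polygon count can be done in SF is sketched below. This is a minimal sketch on made-up data, not the course data: the two square zones, the four points and the make_square() helper are all hypothetical, and EPSG:27700 (British National Grid) is assumed for both layers. It uses st_intersects() from the sf package, which returns, for each polygon, the indices of the points that fall inside it.

```r
library(sf)

# Hypothetical helper: a 1 x 1 square with its lower-left corner at (x0, y0)
make_square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two made-up zones standing in for LSOA boundaries (CRS assumed to be EPSG:27700)
zones <- st_sf(
  zone = c("A", "B"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0), crs = 27700)
)

# Four made-up points standing in for crime locations, in the same CRS
pts <- st_as_sf(
  data.frame(x = c(0.2, 0.5, 0.8, 1.5), y = c(0.5, 0.5, 0.2, 0.5)),
  coords = c("x", "y"),
  crs = 27700
)

# For each zone, list the points inside it, then count them
zones$n_points <- lengths(st_intersects(zones, pts))

zones$n_points   # three points fall in zone A and one in zone B
```

With real data, both layers need to share a coordinate reference system before st_intersects() will run; st_transform() can reproject one layer to match the other.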

The next course I am running is in Glasgow on 12th – 14th June where we will cover Introduction to Spatial Data & Using R as a GIS, alongside a range of other material over 3 days. Find out more info or sign up.

The material from this workshop is available under Creative Commons, and if you would like to come on a course, please sign up to the Geospatial Training Solutions mailing list.


ESRC Research Methods Festival 2018

During the amazingly sunny weather a few weeks ago, I managed to spend a couple of days indoors, hiding from the sun at the ESRC Research Methods Festival at the University of Bath. Every 2 years, the National Centre for Research Methods organises this conference to showcase unique and new methods from across the social sciences. The conference covered everything from ‘Multi-scale measures of segregation data’ and ‘Quantitative methods pedagogy’ to ‘Do participatory visual methods give ‘voice’?’ and ‘Comics as a research method’.

It was also fantastic to meet a range of academics and researchers who I would not normally meet. I met a number of people who I had communicated regularly with on Twitter, but never met in person before!

I was presenting in a session on ‘Multiscale measures of segregation data‘, where we were discussing different approaches to how deprivation can be measured across different locations. One of the major characteristics of grouped spatial data is the MAUP (Modifiable Areal Unit Problem), where the method used to group your data will have an impact on the results of any analysis. The session was a great collection of presentations, all of us looking at similar issues but often taking quite different methods to approach them.

I showed how using variograms based on the PopChange data set to look at spatial segregation can help avoid some of the impacts of imposing scales on the data, and instead use the data to tell us at what scales the variations are taking place.

Across the whole conference there was a range of content using scripting languages, and R and Python featured significantly across the board, to the surprise of some of the participants, including me.

Like most conferences, there were so many interesting sessions and it was often difficult to choose which track to attend! The keynotes were all thought provoking. Danny Dorling presented a range of interesting information on current levels of inequality in the UK, and warned us that it is likely to get worse before it gets better. Donna Mertens called on all of us to think about how our research can change things, and if it doesn’t, why not?

It was a great methods conference, and reminded me about how many different methods are out there. If you would like a chat about how using GIS could help with your research or work, please do give me a call on 01209 808910 or send me an email.


GISRUK 2018: A Return to Leicester

Last week I attended an amazingly sunny GISRUK (Geographic Information Science Research UK) conference in Leicester. I have fond memories of Leicester, as I completed my BSc Geography (2003-2006) and MSc GIS (2007 – 2008) there. Much of the university and city has changed, but an amazing amount is still the same – both in the Bennett building lecture theatres and certain well frequented take-aways!

University of Leicester – Attenborough Tower (L) and Charles Wilson Building (R)

I coordinated the Early Career workshops, where those early in their GIS careers (including, but not limited to, PhD and MSc students) came together for two half-day sessions to find out more about GIS as a career in academia and industry, to learn more and compare notes about their respective PhD/MSc experiences, and most importantly, to get to know each other before the main conference! We had a great variety of input from James Norris (Ordnance Survey / Group on Earth Observations / AGI), James Kendall (RGS), Dave Unwin (ex University of Leicester & Birkbeck), May Yuan (Editor-in-Chief IJGIS, University of Texas at Dallas), Addy Pope (ESRI UK) and Katie Hall (ESRI UK).

Early Careers session in full flow

The main conference had a great selection of talks and presentations covering every application of GIS from archaeology, to crime, health, transport, and urban studies! It is always a challenge to work out which of the three parallel sessions to attend, and I can’t attend everything. Particularly of note for me was Alex Singleton’s keynote on ‘Why Open Data are Not Enough’, discussing some of the issues with open spatial data, particularly in terms of data longevity which very much reminds me of this XKCD comic, and still really hasn’t been solved for spatial data. This was rather well illustrated by the CDRC Data Store that has been developed through the Consumer Data Research Centre; there is no mechanism for ensuring this continues after the CDRC funding finishes, and this is the norm with many academic projects.

Alex Singleton: Why Open data are Not Enough

There was also a great presentation by Sam Cockings looking at how we can better model day time populations, from a variety of data sources. Integrating many real time data sources is going to be a key aspect of spatial data management in the future and I can see many projects using the skills and technologies Sam described.

Next year GISRUK 2019 will be at Newcastle University, and I look forward to seeing you there!

If you would like a chat about GIS Research, or GIS Training for small groups, please do email or give me a call on 01209 808910.


FOSS4G UK 2018: A success!

After 6 months or so of collaboration FOSS4G UK 2018 finally happened! I was a small part of the dedicated team who brought the conference together and it was an amazing experience. Thanks to James (@JamesLMilner), Tom (@tomchadwin), Isabel (@IsaUlitzsch), Sam (@SamRFranklin), Max (@GeospatialMax) and Dennis (@goldrydigital) as well as Jo Cook and Steve Feldman who gave us occasional nudges in the right direction with their experience from FOSS4GUK 2016 Southampton. Organising the conference felt a bit like organising a wedding(!) in that once we had picked the date, location, catering and sorted out the guest list, the rest more-or-less fell into place! Not that I intend to do either again in the near future!

FOSS4G UK 2018 Team Photo

Unfortunately I wasn’t around for the team photo on Friday, but I was there in spirit!

The conference itself went amazingly well and it was great to see so many people there who were so enthusiastic about open source geospatial software. Unfortunately I was only able to attend Thursday, but I managed to take part in some great workshops on pgRouting and Satellite Data, learn some new things, make some new contacts and babysit the room-to-room live feed!

MacGyver putting in an appearance at FOSS4GUK 2018 in Mathilde Ørstavik’s Keynote talk on Extracting intelligent information from aerial images using machine learning.

It was a struggle to work out which stream to attend and I’ve seen from Twitter (#FOSS4GUK) that Tom Armitage went to town with the ‘May the FOSS be with you’ Star Wars theme, the highlight being a presentation using a light sabre rather than a laser pointer.


I still hope to have a run through of Tom’s workshop material when I get some time 🙂

FOSS4G UK 2018 Workshop

Everyone hard at work in the pgRouting, PostGIS and QGIS workshop.

We will post links to all the slides and material we can on the website – if yours are not there yet, send them over or submit a PR. I do hope we can do this again, and if people would like to volunteer for the next conference, please make yourself known!

If you’d like a chat about potential for OS Geo training for individuals or groups, please do send me an email or give me a call on 07717745715.