Creating choropleth maps in R with the darkest colour at the top

I’ve just been through the process of contributing to the source code of a package in R (in a very small way) so here’s a short piece on how easy it was, and why anyone can do it! I originally wrote this post in August last year, but waited to post it until the new version of maptools was released. I missed this (we are now at 0.8-39!) and have only just rediscovered this post. It’s all still relevant though!

I have been using the Maptools library extensively in my use of R as a GIS, as well as in my teaching material (hosted at https://github.com/nickbearman/intro-r-spatial-analysis). The default plot order in the legend is to have the darkest colour at the bottom of the legend, and the lightest colour at the top. This was just something I accepted, and to be honest, never really thought about before.

I recently delivered a training course on R to some staff at the ONS (Office for National Statistics, England & Wales) and they said that their best practice guidelines are to have the darkest colour at the top of the legend. They asked me how to do this, which I didn’t know!

After some fiddling about with an R script, I created a version which worked for them. I then thought it might be useful to integrate this into the Maptools library, and emailed the package author, Roger Bivand. He was very helpful, and I added the additional code to the source files for Maptools. These are now available in version 0.8-37 (or later), which has recently been released. Running update.packages(oldPkgs = "maptools") should get you the new version.

To reverse the colours is a simple matter of changing the legend code in two places. Using the example from the helpfile, the original line:

legend(x=c(5.8, 7.1), y=c(13, 14.5), legend=leglabs(brks), fill=colours, bty="n")

The revised line:

legend(x=c(5.8, 7.1), y=c(13, 14.5), legend=leglabs(brks, reverse = TRUE), fill=rev(colours), bty="n")
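The same idea can be tried without any spatial data. This is a minimal sketch in base R: make_labels() is a hypothetical stand-in for leglabs() (its label wording is an assumption), and the breaks and grey shades are made up for illustration. The key point is that the labels and the fill colours are reversed together, so each colour still sits next to its own class.

```r
# Hypothetical stand-in for leglabs(), written in base R so the sketch
# runs without maptools. The exact label wording is an assumption.
make_labels <- function(brks, reverse = FALSE) {
  n <- length(brks)
  stopifnot(n >= 4)
  labs <- c(
    paste("under", brks[2]),
    paste(brks[2:(n - 2)], "-", brks[3:(n - 1)]),
    paste("over", brks[n - 1])
  )
  if (reverse) rev(labs) else labs
}

brks    <- c(0, 10, 20, 30, 40)            # illustrative class breaks
colours <- grey(c(0.9, 0.65, 0.4, 0.15))   # lightest to darkest, one per class

# Reverse labels and colours together so the darkest class is listed first.
labs_top_dark <- make_labels(brks, reverse = TRUE)
cols_top_dark <- rev(colours)

# Draw the legend on an empty plot to check the order.
plot.new()
legend("center", legend = labs_top_dark, fill = cols_top_dark, bty = "n")
```

If only one of the two is reversed, the legend will show the wrong colour against each class, which is why the change has to be made in two places.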

To give you some nice visual examples:

[Figures: Rplot (default legend order), Rplot_reverse (reversed legend order)]

Or for those of you who have attended my R course:

[Figures: normal-order, reverse-order]

The file I updated is at https://r-forge.r-project.org/scm/viewvc.php/pkg/R/colslegs.R?view=markup&root=maptools (this link shows the changes), and I also updated the helpfile. If you’ve done some R scripting, then it is not too difficult to do. Any questions, please post them here. Good luck!

 

Joining Clear Mapping Co

I’ve formally joined Clear Mapping Co, moving from academia to the commercial world. Below is a copy of their most recent blog post.

We are pleased to announce that Dr Nick Bearman has formally joined Clear Mapping Co as Senior GIS Analyst and Course Director. Nick joins us at a key point in the company’s development in our fifth year, as we continue to expand into international markets with our unique offer of cartographic design consultancy. Nick is continuing to work on cartographic projects, whilst also developing our new range of GIS Training Courses, available at different universities around the country.

He brings a wealth of experience having lectured and researched at University of Liverpool for nearly three years. We are really pleased to welcome him on board and hope you’ll get to meet him at future events.

Q1. What brings you to Cornwall?

My wife and her family live in Cornwall, so I was keen to get back to Cornwall after lecturing and researching in Liverpool for nearly 3 years. I’ve always wanted to explore options outside academia and experience a different set of challenges. Clear Mapping Company was a perfect opportunity for me, both in terms of applying and developing my skills, as well as spearheading their GIS training development.

Q2. Where did you study?

I studied at Leicester and UEA (Norwich) Universities and enjoyed everything about it. At school I really enjoyed geography and computer science, and when I discovered Geographic Information Systems (GIS), it was the ideal combination of the two! I still have connections with Leicester and UEA as well as Exeter and Bristol in the south west, but I’ll be pleased to be based in Cornwall.

Q3. What areas of GIS do you work with?

GIS is such a broad area that it is tricky to pin down exactly which areas I work in, and my recent projects have involved UK Census data (going back to 1971), transport routing analysis (using OpenStreetMap data) and web GIS (using the Google Maps API, among others). I use a range of different tools and techniques, depending on which is most appropriate for the project.

Q4. What do you find exciting about working in an international cartographic design consultancy?

I have been working with Caroline & Kirstin at Clear Mapping Co for the past five months and am really enjoying it. I’ve been involved in a wide variety of projects, including campsite maps, 3D battlefield education maps and phase 1 ecology studies. The company is growing and we recently advertised for a Senior Graphic Designer. I have a significant input into the direction the company takes, which is both exciting and scary! We are also based in a beautiful location in Penryn, Cornwall with a lovely view looking out on to the Penryn river! Cornwall has some spectacular scenery and we’re really fortunate to have Cornwall as our base.

Q5. What do you regret about moving away from academia?

Leaving academia after spending many years studying & teaching does make me feel a little sad, but I am also excited to be working in a very different environment with all sorts of different opportunities and challenges. While there are quite a few differences, one thing that has slightly surprised me is the number of similarities.

In academia, particularly as a GIS specialist, I often had to liaise between different members of staff who wanted to perform some GIS analysis on their project, and explain what GIS could do and equally make it clear what GIS could not do. This was particularly important when I was working in interdisciplinary groups, ensuring we were all speaking the same language, particularly when using terms like rate analysis, difference and spatial scale. This is equally important in a commercial setting, as often the client doesn’t know exactly what they want and what can (or cannot) be done.

One of the biggest differences I have found is the time scales – in academia we are used to planning 6 months, 1 or 2 years ahead (REF2020 anyone?!) whereas in commercial work what we do changes on a weekly, daily or even hourly basis. New projects come in and often have to be dealt with immediately, and sometimes can be out the door the same day or the next day. Getting to grips with this has been one of the hardest things for me so far, but I am getting used to it!

Q6. Why does GIS Training give you a buzz?

I really enjoy teaching people about the power of GIS and showing them how it can be useful for their work. That ‘moment’ where people who haven’t used spatial data before see what it can do for them is amazing. Often many people on my courses have come across or used GIS before, but they don’t understand how to get the most from it. It’s really amazing to get them to use GIS on a regular basis, and it can make a real difference to their work.

Q7. Who in the sector would you like to meet? And what would you say if you met them?

I would love to meet the development team of Google Earth (originally Keyhole, Inc.) and whoever in Google decided to buy and make Google Earth available across the planet. The public release of Google Earth completely transformed the public’s perception and understanding of spatial data, and is also probably a big factor in the release of a large number of spatial data sets through the Open Data movement. Aside from the obvious questions: “Did you know it was going to be that successful?” and “Did you get enough money from Google for it?(!)”, I’d like to ask how they would redesign it to make it a more successful educational tool to highlight the power of spatial data and GIS.

It would also be great to talk to Jack and Laura Dangermond, founders of ESRI, the company behind one of the biggest commercial GIS systems, ArcGIS. I’d like to ask them about how the explosion of spatial data has impacted their original plans for ArcGIS and how all sides of the geospatial community could work together to promote the use of teaching and the uptake of GIS.

If you would like any further information about the training courses Nick will be offering or his work as a Senior GIS analyst, then please contact Kirstin or Nick on 01326 337072 or email us at hello@clearmapping.co.uk.

R for Spatial Analysis Courses in Liverpool and London

This week I have run two courses on ‘Introduction to Using R for Spatial Analysis’ which have been very successful. Both courses sold out, with 15 people attending in Liverpool and 20 in London. We had people with a wide range of GIS and R experience, ranging from no experience in either GIS or R, to significant experience in one but little in the other.

We covered the basics of using R through the RStudio interface, which I find makes R easier to understand for newbies! I certainly found it much easier to learn R using RStudio, and still use it every day for my R work (I’ve opened the native R interface maybe twice since I started using it!). We also looked at projections and coordinate systems (which were at the bottom of a GIS problem a colleague had today) and at spatial data representation, particularly how to create a representative, truthful choropleth map, and I made use of a blog post about this very issue, which I recently tweeted.

We also had a number of very interesting discussions about the pros and cons of R vs other GIS software, such as ArcGIS or QGIS, as well as other languages, such as Python. Each has its own pros and cons, and in my work I regularly use a mix of these, depending on what I am trying to achieve.

I am also in the process of developing an intermediate course that will focus more on spatial analysis. If you are interested in finding out more about when either the basic or the intermediate courses will be run again, please send me a message (using the contact form on this site) and I will add you to a list to hear about future courses.

All of the material from this course is freely available, and hosted on GitHub. Head over to http://github.com/nickbearman/intro-r-spatial-analysis and you can view the material yourself and work through it at your own pace. You can even use it to contribute to new teaching material, and if you do, please also make your material available through Creative Commons so others can benefit from it as well.

Cross-posted at http://geographicdatascience.com/blog/training/R-for-Spatial-Analysis-Courses-in-Liverpool-and-London/.

Using Google Docs to write a collaborative article

Update (25/02/2016): Article now published at dx.doi.org/10.1080/03098265.2016.1144729

Original (30/11/2015): Just recently I have had an article accepted (but not yet published) that I wrote using Google Docs. It was a collaborative article from a writing retreat with 5 people contributing. We needed some way of all being able to contribute to the article and I had heard of people using Google Docs for this before, so I suggested we give it a go. We actually started using Google Docs to write notes and outlines during the writing retreat and then developed this into the final article.

Using Google Docs has a number of advantages over sending email attachments back and forth and bookmarking the page allowed me to have easy access to the article whenever I wanted it. It didn’t solve all of the problems of writing a joint article by any means, as we still needed a lead author to coordinate people, set deadlines and remind people to contribute by the deadlines!

One thing I observed was that it wasn’t very easy to tell different contributors apart – all of the text by default was the same colour, so we ended up changing the text colour manually for our contributions. Later on I discovered the “suggestions” option, which highlights changes in different colours. I didn’t find a way to turn this on by default, so I had to ask everyone to make sure they had it set before starting their contributions. Fortunately everyone did remember! We also used the discussion option quite a bit to talk about specific changes. However, you still need someone to “accept” or “reject” the suggestions, which I took on as lead author.

Automatically, I received a notification every time a change was made, which was useful in some ways so I could see when people had been making changes, but I’m not sure I found it that useful. Being not completely trusting of Google, I did take regular backups (through export as Word doc) in case our text just disappeared on us, but we didn’t suffer any of these issues.

Overall, Google Docs was very useful for collaboration, allowing people to write whenever was convenient for them, without having to worry about different file versions. However we still needed someone to lead the paper (me in this case!) to encourage, remind and cajole co-authors to contribute and meet deadlines, like any other writing collaboration.

Discussion in THE GEES network leads to publication in Environment and Planning A!

Cross-posted from https://www.linkedin.com/grp/post/6509105-6011573350550302720

About 6 months ago Sarah Dyer suggested an e-reading group on a recently published paper – Peters K, Turner J, 2014, “Fixed-term and temporary: teaching fellows, tactics, and the negotiation of contingent labour in the UK higher education system” Environment and Planning A 46(10) 2317 – 2331 (http://www.envplan.com/abstract.cgi?id=a46294). The original post is at https://www.linkedin.com/grp/post/6509105-5943727838417948672. THE GEES is a closed group on LinkedIn, but if you would like to join, please just submit a join request. 

Sarah, Helen Walkington, Stephanie Wyse and I met up on Skype to discuss our thoughts on the paper and wrote our discussion up as a Letter to the Editor for Environment and Planning A, which has now been published (http://www.envplan.com/abstract.cgi?id=a4704l)! (The preprint is available on my publications page if you can’t access EPA). 

I really enjoyed the process and it didn’t take too much of our time. If there’s an article you think it would be interesting to discuss, post it up here and see who else is interested.

Thanks very much to Sarah for starting this off for us, and for coordinating THE GEES group!

Cartograms

Cartograms are a great way of representing data that refers to people, as they allow you to give urban areas (which generally cover relatively small areas) much more prominence than rural areas (which usually cover very large areas). The image below shows the usual geographic representation of the output areas, alongside the cartogram version. Note how the rural cluster (representing about 13% of the population) is very dominant in the ‘standard’ representation, but much less so in the cartogram representation.

Cartogram example

For my presentation at GISRUK2015 on TravelOAC (travel geodemographics) I was presenting a series of cluster data by 2011 Census output areas. Output areas are based around a standard population, with the result that many rural output areas are geographically large and many urban output areas are geographically small. When considering the classification data, it makes sense to give each output area equal consideration, so I decided to create a cartogram of the output area boundaries, based on the usual resident population.

I used a piece of software called ScapeToad, which is a quick and easy way to create a cartogram from a custom data set. They have a good set of instructions on their website and the processing of all OAs in England and Wales (181,408 areas, 79 MB shapefile) only took 49 seconds.
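To make the idea behind a population cartogram concrete, here is a small sketch in base R. It is not ScapeToad's algorithm, just the arithmetic of what a population-based cartogram is aiming for: each zone's share of the map area should end up matching its share of the population. The three zones and their values are invented for illustration.

```r
# Toy output areas: all values are invented for illustration only.
oa <- data.frame(
  id         = c("rural_1", "urban_1", "urban_2"),
  area_km2   = c(50, 0.5, 0.25),
  population = c(300, 300, 300),
  stringsAsFactors = FALSE
)

# Share of the total map area versus share of the total population.
oa$area_share <- oa$area_km2 / sum(oa$area_km2)
oa$pop_share  <- oa$population / sum(oa$population)

# Ratio of target share to current share: values above 1 mean the zone
# needs to grow in the cartogram, values below 1 mean it needs to shrink.
oa$scale_factor <- oa$pop_share / oa$area_share

print(oa[, c("id", "area_share", "pop_share", "scale_factor")])
```

With equal populations, the large rural zone has to shrink and the two small urban zones have to grow, which is the effect visible in the cartogram image above.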

I was inspired by the cartograms used on the ONS Census Interactive website showing a range of variables. There are a number of ways of generating cartograms, and the ONS team used an approach based on http://lambert.nico.free.fr/tp/biblio/Dougeniketal1985.pdf where the browser does a lot of the heavy lifting. There is also an ArcScript available for ArcGIS at http://arcscripts.esri.com/details.asp?dbid=15638 which I used a few years ago and worked well then, but I’m not sure if it still does now!

P.S. Unfortunately I didn’t manage to see Chris’s presentation on cartogram methods (http://leeds.gisruk.org/abstracts/GISRUK2015_submission_83.pdf) as it was on at the same time as I was presenting!

GISRUK2015 and TravelOAC

I presented my work on TravelOAC at GISRUK this year, held at Leeds. The conference was a great opportunity to meet an incredible range of people involved in GIS, including engineers, historians, social scientists, spatial information scientists (as they like to be called!), mathematicians and, of course, geographers. We had a great crowd on Twitter as well (#GISRUK2015) who kept everyone up to date on proceedings, and I’d particularly like to mention @adjturner, who has made his conference notes available online. I was also involved in the GIS for Transport Applications workshop, which Robin has written up. Next year, we are at Greenwich, so see you there!

My slides and paper are available, and I have also written a post about how I created the cartograms I used in my work.

Introduction to Using R for Spatial Analysis

On Friday 23rd January 2015, I ran a one day workshop on an Introduction to Using R for Spatial Analysis. We had 18 participants (thanks for squeezing in, everyone!) with a wide variety of backgrounds in R, from never having used R to using R relatively regularly but not having used it as a GIS. The course ran really well, and I was very happy with it, given that it was the first time I had run this course in this format. If you are interested in attending this course in the future, please send me a message (using the contact form on this site) and I will add you to a list to hear about future courses.

I’ve attached the materials I used to this blog post (see below). My material is available under the Creative Commons Attribution-ShareAlike 4.0 International License (see http://creativecommons.org/licenses/by-sa/4.0/deed.en for details), which means that the material I created for this training session is free for anyone to use, as long as you attribute the material to me, and make any material you derive from this available under the same license. I would also ask you to let me know when you use my material, as it’s useful for me to know how many people are using it, and what sort of courses they are using it for.

Introduction to QGIS: Understanding and Presenting Spatial Data

On Thursday 22nd January 2015, I ran a one day workshop on an Introduction to QGIS: Understanding and Presenting Spatial Data. We had 14 participants from a wide variety of backgrounds, academic areas and geographic locations. The course ran very well, and the participants seemed to enjoy taking the course as much as I enjoyed delivering it! If you are interested in attending this course in the future, please send me a message (using the contact form on this site) and I will add you to a list to hear about future courses.

I’ve attached the materials I used to this blog post (see below). My material is available under the Creative Commons Attribution-ShareAlike 4.0 International License (see http://creativecommons.org/licenses/by-sa/4.0/deed.en for details), which means that the material I created for this training session is free for anyone to use, as long as you attribute the material to me, and make any material you derive from this available under the same license. I would also ask you to let me know when you use my material, as it’s useful for me to know how many people are using it, and what sort of courses they are using it for.

Modelling individual level routes and CO2 emissions for home to school

We have recently published a paper in the Journal of Transport and Health where we modelled the impact on CO2 emissions of an increased uptake of active travel for the home to school commute. The paper is freely available to anyone under Gold Open Access, with a CC-BY Attribution license.

One of the challenges in this paper, building upon (Singleton, 2014), was being able to model individual routes from home to school for all ~7.5 million school children in England. In addition to origin and destination locations, we also know what modes of travel are typically used to get to school, thanks to the School Census (also known as the National Pupil Database). While modelling a small number of routes is relatively straightforward to perform within a GIS, the challenge was to complete the routing for all 7.5 million records in the data set.

To calculate the routes, we used a combination of two different pieces of software – Routino and pgRouting. Routino allows us to use OpenStreetMap data to derive a road-based route from given start and end points, using a number of different profiles for either car, walking, cycling or bus. The profile used is important, as it allows the software to take into account one-way streets (i.e. not applicable to walking, but applicable to driving), footpaths (i.e. applicable to walking only), cycle lanes, bus lanes, etc. The screenshot below shows an example route, calculated by Routino.

Screenshot of routing within Routino

Example of the route calculated using Routino for a car travelling from Rosslyn Street (1) to Granby Street (2). © OpenStreetMap contributors, http://www.openstreetmap.org/copyright.
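As a rough illustration of how the choice of profile feeds into a routing request, the sketch below builds the arguments for Routino's command-line router from R. It is not the code used in the paper. The routino_args() helper, the data directory name and the coordinates are all hypothetical, and the option and transport names reflect my understanding of Routino's documentation, so check them against your installed version before relying on them.

```r
# Hypothetical helper: assemble command-line arguments for one routing request.
# Option names follow Routino's documented `router` options as I understand
# them; the data directory and coordinates below are placeholders.
routino_args <- function(lat1, lon1, lat2, lon2, transport, data_dir) {
  c(
    paste0("--dir=", data_dir),          # folder holding the processed OSM data
    paste0("--transport=", transport),   # profile, e.g. "motorcar" or "foot"
    "--shortest",                        # shortest rather than quickest route
    paste0("--lat1=", lat1), paste0("--lon1=", lon1),
    paste0("--lat2=", lat2), paste0("--lon2=", lon2),
    "--output-text"                      # plain-text route description
  )
}

# Same start and end points, two different profiles.
args_car  <- routino_args(53.40, -2.97, 53.39, -2.96, "motorcar", "routino-data")
args_foot <- routino_args(53.40, -2.97, 53.39, -2.96, "foot", "routino-data")

# Not run here, as it needs Routino installed and data prepared:
# system2("router", args_car)
print(args_car)
```

Only the transport argument differs between the two requests, but the routes returned can be quite different because each profile is allowed to use different parts of the network.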

For railway, tram or tube travel, this was implemented using pgRouting from both Ordnance Survey and edited OSM data. The different networks were read into the PostgreSQL database, and routes calculated using the Shortest Path Dijkstra algorithm. This returned a distance for the route, which was stored alongside the original data.
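For readers who have not met Dijkstra's algorithm before, here is a toy version in base R. It is a sketch of the general technique on an invented four-node network, not pgRouting's implementation (which runs inside PostgreSQL); the dijkstra_distance() function and the edge lengths are made up for illustration.

```r
# Toy network: an edge list with lengths in metres (invented values).
edges <- data.frame(
  from = c("A", "A", "B", "C", "B"),
  to   = c("B", "C", "C", "D", "D"),
  len  = c(400, 900, 300, 500, 1200),
  stringsAsFactors = FALSE
)

dijkstra_distance <- function(edges, source, target) {
  # Treat the network as undirected by adding each edge in both directions.
  e <- rbind(
    edges,
    data.frame(from = edges$to, to = edges$from, len = edges$len,
               stringsAsFactors = FALSE)
  )
  nodes <- unique(c(e$from, e$to))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  dist[source] <- 0
  unvisited <- nodes
  while (length(unvisited) > 0) {
    # Pick the unvisited node with the smallest tentative distance.
    current <- unvisited[which.min(dist[unvisited])]
    # Stop if the rest is unreachable or the target's distance is final.
    if (is.infinite(dist[current]) || current == target) break
    unvisited <- setdiff(unvisited, current)
    # Relax every edge leaving the current node.
    out <- e[e$from == current, ]
    for (i in seq_len(nrow(out))) {
      candidate <- dist[current] + out$len[i]
      if (candidate < dist[out$to[i]]) dist[out$to[i]] <- candidate
    }
  }
  unname(dist[target])
}

route_len <- dijkstra_distance(edges, "A", "D")
print(route_len)
```

The function returns only the length of the shortest route, which mirrors what was stored alongside the original data.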

Routino and pgRouting were called using R, which also managed the large amounts of data, subsequently calculated the CO2 emissions model, and created graphical outputs (see below).

Map of CO2 emissions (grouped by residence LSOA) for Norfolk.
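The general shape of the emissions step, once each pupil has a route distance and a travel mode, can be sketched in a few lines of base R. This is not the model from the paper: the emission factors below are placeholder numbers invented for this example, and the column names are hypothetical.

```r
# PLACEHOLDER emission factors in grams of CO2 per km. These numbers are
# invented for this sketch and are NOT the values used in the paper.
factor_g_per_km <- c(walk = 0, cycle = 0, car = 150, bus = 50)

# Toy pupil records: one row per pupil, with modelled route length and mode.
pupils <- data.frame(
  mode     = c("walk", "car", "bus", "car"),
  route_km = c(0.8, 3.2, 5.0, 1.5),
  stringsAsFactors = FALSE
)

# Look up the factor for each pupil's mode and multiply by the distance.
pupils$co2_g <- pupils$route_km * unname(factor_g_per_km[pupils$mode])

# Total emissions by mode; the same pattern works for grouping by area.
co2_by_mode <- tapply(pupils$co2_g, pupils$mode, sum)
print(co2_by_mode)
```

Swapping the grouping variable for a residence LSOA code would give per-area totals of the kind shown in the map above.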

To run the routing for each pupil for four years’ worth of data (we had data from 2007/8-2010/11, although we only used data from academic year 2010-2011 in the paper) took about 14 days on my 27″ iMac. We considered using a cloud solution to shorten the run times, but given we were using sensitive data this was deemed too problematic (see related blog post from Alex on this). This work highlights that it is possible to perform some types of big data analysis using a standard desktop computer, which allows us to perform this type of analysis on sensitive data without needing to make use of cloud or remote processing services, which are often not compatible with restrictions on sensitive data.

*As you would expect, the postcode unit is sensitive data and we had to apply to the Department for Education to use this data. Any postcodes or locations used in this blog post will be examples – e.g. L69 7ZQ is the postcode for my office!

Singleton, A. 2014. “A GIS Approach to Modelling CO2 Emissions Associated with the Pupil-School Commute.” International Journal of Geographical Information Science 28 (2): 256–73. doi:10.1080/13658816.2013.832765.

Cross-posted from http://geographicdatascience.com/r/2014/11/20/Home-School-Routes/