Extracting boundary data from OpenStreetMap

7 January 2024 · C#

OpenStreetMap is an amazing project. It’s a free and open geographic database maintained by a huge community of volunteers. Building projects with geographic data is hard because the data is often expensive or impossible to obtain. OpenStreetMap gives us so much data that we didn’t have before, and without it so many things would not be possible.

I recently built a project looking at house prices across the UK using OpenStreetMap. I wanted to show house prices broken down by each area of the UK. From OpenStreetMap I was able to extract the exact boundaries of the UK broken down to a really fine level. Around 12,000 boundaries were extracted from OpenStreetMap which became the basis of the entire project.

Extracting the boundaries from OpenStreetMap was a bit tricky. It took a while to understand the structure of the data and write the code to extract the boundaries. So I thought I would share what I’ve done. The code examples are written in C#, but adapting them to other languages should be easy.

Data structure

The first thing you will notice when looking at OpenStreetMap data is that it is in XML. While writing the code, I recommend going to https://www.openstreetmap.org/ and using the Export button to get a small piece of the data to work with. Then, when your code is complete, use https://download.geofabrik.de/ to download the OpenStreetMap extract for your target country.

There are three elements we need to extract to get the boundaries: Relations, Ways and Nodes. Relations are where the boundaries are defined and contain things like the name and type of the boundary. Relations also contain links to the Ways that make up the boundary. Ways are basically the lines you see on the map. Each boundary is made up of a series of Ways, and each Way is made up of a list of Nodes, which are the individual points on the map. Every Node has a latitude and longitude.

So to get all the points of a boundary we need to work down the hierarchy of Relations, Ways and Nodes.

Hierarchy of the OpenStreetMap data structure

Relations

Here is an example of what a Relation looks like:

<relation id="11014562" visible="true" version="3" changeset="83843935" timestamp="2020-04-20T22:14:22Z" user="Colin Smale" uid="30525">
  <member type="way" ref="793570882" role="outer"/>
  <member type="way" ref="793545584" role="outer"/>
  <member type="way" ref="793533785" role="outer"/>
  <member type="way" ref="793570883" role="outer"/>
  <member type="way" ref="793570881" role="outer"/>
  <member type="node" ref="737420402" role="admin_centre"/>
  <tag k="admin_level" v="10"/>
  <tag k="boundary" v="administrative"/>
  <tag k="council_name" v="Croesyceiliog and Llanyrafon Community Council"/>
  <tag k="council_name:cy" v="Cyngor Cymuned Croesyceiliog a Llanyrafon"/>
  <tag k="designation" v="community"/>
  <tag k="name" v="Llanyrafon"/>
  <tag k="name:cy" v="Llanyrafon"/>
  <tag k="ref:gss" v="W04000766"/>
  <tag k="type" v="boundary"/>
</relation>

Relations have a lot of metadata that can be used, like the name and type of the boundary. It’s important to remember that Relations define all sorts of things in OpenStreetMap, like waterways, car parks and bicycle routes. So the code needs to extract only the Relations that have a tag with the key “type” and the value “boundary”.

All the member elements with the type “way” are the links to the ways of the boundary. The ref attribute is the id of the linked Way.
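To make the filtering concrete, here is a minimal sketch using LINQ to XML. It is an illustration rather than the code from the repo: the class and method names (RelationParser, IsBoundary, ReadTags, ReadWayMembers) are made up for this example, and it assumes a single Relation has already been loaded as an XElement.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; the names are not from the repo.
public static class RelationParser
{
    // True when the relation carries <tag k="type" v="boundary"/>.
    public static bool IsBoundary(XElement relation) =>
        relation.Elements("tag").Any(tag =>
            tag.Attribute("k")?.Value == "type" &&
            tag.Attribute("v")?.Value == "boundary");

    // Copies every <tag k="..." v="..."/> into a dictionary (name, admin_level, ...).
    public static Dictionary<string, string> ReadTags(XElement element)
    {
        var tags = new Dictionary<string, string>();
        foreach (var tag in element.Elements("tag"))
        {
            var key = tag.Attribute("k")?.Value;
            var value = tag.Attribute("v")?.Value;
            if (key != null && value != null) tags[key] = value;
        }
        return tags;
    }

    // Returns the id and role of every member of type "way", in document order.
    // Members of other types (such as the admin_centre node) are skipped.
    public static List<(long WayId, string Role)> ReadWayMembers(XElement relation)
    {
        var members = new List<(long WayId, string Role)>();
        foreach (var member in relation.Elements("member"))
        {
            if (member.Attribute("type")?.Value != "way") continue;
            var reference = member.Attribute("ref")?.Value;
            if (long.TryParse(reference, NumberStyles.Integer, CultureInfo.InvariantCulture, out var wayId))
                members.Add((wayId, member.Attribute("role")?.Value ?? ""));
        }
        return members;
    }
}
```

For the Llanyrafon example above, IsBoundary would return true, ReadTags would give a dictionary containing “name” and “admin_level”, and ReadWayMembers would return the five outer Ways while ignoring the admin_centre node.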

Ways

This is an example Way:

<way id="793570882" visible="true" version="1" changeset="83782566" timestamp="2020-04-19T18:58:18Z" user="Colin Smale" uid="30525">
  <nd ref="7419971382"/>
  <nd ref="7419971387"/>
  <nd ref="7419971218"/>
  <nd ref="7419971207"/>
  <nd ref="7419971197"/>
  <nd ref="7419971196"/>
  <nd ref="7419971168"/>
  <tag k="admin_level" v="10"/>
  <tag k="boundary" v="administrative"/>
  <tag k="source" v="OS_OpenData_BoundaryLine"/>
</way>

Ways have less useful information. The only things needed are the “nd” elements, which are the links to the Nodes.
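Reading the “nd” references could look something like the sketch below. WayParser and ReadNodeIds are hypothetical names for this example, and the Way is assumed to be already loaded as an XElement. The ids are parsed as long because values such as 7419971382 in the example above do not fit in a 32-bit int.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper for illustration; the names are not from the repo.
public static class WayParser
{
    // Returns the node ids referenced by a way, in the order they appear.
    // The order matters because it is the order the points are drawn in.
    public static List<long> ReadNodeIds(XElement way)
    {
        var nodeIds = new List<long>();
        foreach (var nd in way.Elements("nd"))
        {
            var reference = nd.Attribute("ref")?.Value;
            // Skip malformed references instead of throwing.
            if (long.TryParse(reference, NumberStyles.Integer, CultureInfo.InvariantCulture, out var nodeId))
                nodeIds.Add(nodeId);
        }
        return nodeIds;
    }
}
```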

Nodes

Finally, this is an example of a Node:

<node id="7419971382" visible="true" version="1" changeset="83781279" timestamp="2020-04-19T18:13:08Z" user="Colin Smale" uid="30525" lat="51.6514220" lon="-3.0103913"/>

Nodes are the end of the chain and the smallest element. They finally give the latitude and longitude coordinates needed for the boundary outlines.
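A minimal sketch of reading a Node is shown below, with NodeParser and ReadNode as made-up names for this example. The one detail worth copying is the use of CultureInfo.InvariantCulture: on a machine whose regional settings use a comma as the decimal separator, a culture-sensitive parse can misread a value like 51.6514220.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper for illustration; the names are not from the repo.
public static class NodeParser
{
    // Reads the id, latitude and longitude attributes of a <node> element.
    public static (long Id, double Lat, double Lon) ReadNode(XElement node)
    {
        long id = long.Parse(Required(node, "id"), CultureInfo.InvariantCulture);
        // OSM always writes coordinates with a '.' decimal separator,
        // so parse with the invariant culture rather than the machine's culture.
        double lat = double.Parse(Required(node, "lat"), NumberStyles.Float, CultureInfo.InvariantCulture);
        double lon = double.Parse(Required(node, "lon"), NumberStyles.Float, CultureInfo.InvariantCulture);
        return (id, lat, lon);
    }

    // Returns the attribute value or throws if the attribute is missing.
    private static string Required(XElement element, string name) =>
        element.Attribute(name)?.Value
        ?? throw new FormatException($"<{element.Name}> is missing the '{name}' attribute.");
}
```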

The Code

Parsing XML is not very fun and the code I’ve written to do it isn’t very nice. I’m not going to attempt to explain it step by step in this post. Instead, I’ve uploaded it to a repo if you want to use it: https://github.com/Liam-Hunt/OSMBoundaries (it’s a C# project).

In broad strokes, the program reads through the file three times, once for each of the needed components: Relations, Ways and Nodes. It also saves all of the tags associated with a boundary to a dictionary in case you need any of them.
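To give a feel for the three-pass idea, here is a simplified sketch. It is not the code from the repo: BoundaryExtractor, StreamElements and Extract are hypothetical names, and the sketch assumes the file can be reopened for each pass through a Func<Stream>. It streams elements with XmlReader so the whole file never has to sit in memory, and each pass keeps only what the previous pass said was needed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical sketch for illustration; the names are not from the repo.
public static class BoundaryExtractor
{
    // Streams every element called `name`, one at a time.
    public static IEnumerable<XElement> StreamElements(Func<Stream> open, string name)
    {
        using var stream = open();
        using var reader = XmlReader.Create(stream);
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                yield return (XElement)XNode.ReadFrom(reader); // already advances past the element
            else
                reader.Read();
        }
    }

    private static string Attr(XElement e, string name) =>
        e.Attribute(name)?.Value ?? throw new FormatException($"<{e.Name}> is missing '{name}'.");

    private static long ReadLong(XElement e, string name) =>
        long.Parse(Attr(e, name), CultureInfo.InvariantCulture);

    private static double ReadDouble(XElement e, string name) =>
        double.Parse(Attr(e, name), NumberStyles.Float, CultureInfo.InvariantCulture);

    // Returns relation id -> one list of points per way of that boundary.
    public static Dictionary<long, List<List<(double Lat, double Lon)>>> Extract(Func<Stream> open)
    {
        // Pass 1: boundary relations and the way ids they link to.
        var relationWays = new Dictionary<long, List<long>>();
        foreach (var relation in StreamElements(open, "relation"))
        {
            bool isBoundary = relation.Elements("tag").Any(t =>
                t.Attribute("k")?.Value == "type" && t.Attribute("v")?.Value == "boundary");
            if (!isBoundary) continue;
            relationWays[ReadLong(relation, "id")] = relation.Elements("member")
                .Where(m => m.Attribute("type")?.Value == "way")
                .Select(m => ReadLong(m, "ref"))
                .ToList();
        }
        var neededWays = new HashSet<long>(relationWays.Values.SelectMany(ids => ids));

        // Pass 2: only the ways used by a boundary, and the node ids they link to.
        var wayNodes = new Dictionary<long, List<long>>();
        foreach (var way in StreamElements(open, "way"))
        {
            long wayId = ReadLong(way, "id");
            if (neededWays.Contains(wayId))
                wayNodes[wayId] = way.Elements("nd").Select(nd => ReadLong(nd, "ref")).ToList();
        }
        var neededNodes = new HashSet<long>(wayNodes.Values.SelectMany(ids => ids));

        // Pass 3: only the nodes used by those ways.
        var coordinates = new Dictionary<long, (double Lat, double Lon)>();
        foreach (var node in StreamElements(open, "node"))
        {
            long nodeId = ReadLong(node, "id");
            if (neededNodes.Contains(nodeId))
                coordinates[nodeId] = (ReadDouble(node, "lat"), ReadDouble(node, "lon"));
        }

        // Walk back down the hierarchy, skipping references that are not in the file.
        return relationWays.ToDictionary(
            pair => pair.Key,
            pair => pair.Value
                .Where(wayId => wayNodes.ContainsKey(wayId))
                .Select(wayId => wayNodes[wayId]
                    .Where(nodeId => coordinates.ContainsKey(nodeId))
                    .Select(nodeId => coordinates[nodeId])
                    .ToList())
                .ToList());
    }
}
```

A call might look like BoundaryExtractor.Extract(() => File.OpenRead("extract.osm")), where the file name is just a placeholder. The sketch leaves out tags, member roles and joining the Ways into closed outlines, so treat it as a starting point only.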

Output

This is what an extracted boundary looks like.

{
	"Name": "England",
	// A list of lists is needed because some boundaries have separate islands not connected to the main boundary
	"Boundaries": [{
			"Type": "outer",
			"Points": [{
					"Lat": 51.6344860,
					"Lon": -2.9993130
				}, {
					"Lat": 51.6356431,
					"Lon": -2.9992563
				}
			]
		}
	],
	"Tags": {
		"admin_level": "10",
		"boundary": "administrative",
		"designation": "community"
	}
}
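If you want to produce JSON in this shape yourself, a small set of classes and System.Text.Json is enough. The sketch below is an assumption about how that could be done, not the repo's model: the class names (Point, BoundaryPart, Boundary, BoundaryJson) are made up, and only the property names are taken from the JSON above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model classes; only the property names come from the JSON above.
public class Point
{
    public double Lat { get; set; }
    public double Lon { get; set; }
}

public class BoundaryPart
{
    // "outer" in the example above; taken from the member's role.
    public string Type { get; set; } = "outer";
    public List<Point> Points { get; set; } = new List<Point>();
}

public class Boundary
{
    public string Name { get; set; } = "";
    // One entry per separate outline, so islands get their own list of points.
    public List<BoundaryPart> Boundaries { get; set; } = new List<BoundaryPart>();
    public Dictionary<string, string> Tags { get; set; } = new Dictionary<string, string>();
}

public static class BoundaryJson
{
    // Serialises with the default settings, which keep the property names as written.
    public static string Serialize(Boundary boundary) =>
        JsonSerializer.Serialize(boundary, new JsonSerializerOptions { WriteIndented = true });
}
```

Keeping Boundaries as a list of point lists, rather than one flat list of points, is what lets a boundary with detached islands be stored without inventing lines between the separate pieces.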

I’m not expecting anyone to need this so I’m not going to write any more 😀

But if you need any more details please leave a question below.