Wednesday 13 July 2011

Open Street Vector Map to SQL Server

 

I’ve already written several posts about loading spatial data into SQL Server – using OGR2OGR, or Shape2SQL, for example. In this post, I’m going to demonstrate how you can call OGR2OGR from a PowerShell script in order to loop through and load the entire set of OS VectorMap layers into SQL Server in a few simple lines of script. Note that you’ll need the latest version of OGR2OGR (1.8) in order to play along at home, since I’ll be using the direct MSSQLSpatial driver introduced in that version.

Create the Tables

OS Vectormap data is supplied in a number of different layers, including roads, natural features, settlement areas, railway lines etc. Some of these layers represent point data, some polylines, and some polygons. (Note that there are no multi_xxx features, and a given layer contains only features of one type). Unlike some spatial formats (such as ESRI shapefiles), SQL Server lets you mix ‘n’ match different types of geometry within the same column. You could, if you wanted to, merge all the OS VectorMap features into a single table. However, since there are different additional attributes associated with each type of feature, it generally makes sense to define separate tables for each feature layer.

OGR2OGR can create tables for each feature layer in SQL Server directly, creating geometry or geography columns to hold spatial data itself and additional columns to hold any non-spatial attributes in the source data file. However, I find this to be quite unreliable; sometimes, the field lengths of the columns in the table created by OGR2OGR are set too small (e.g. numeric(5,2)), causing attribute data inserted into the table to be truncated or lead to an overflow error. Sometimes, column field lengths are set too large, which is wasteful (i.e. using nvarchar(max) to store a fixed-length two-byte string). And, sometimes, the datatype chosen for a column is just plain wrong. i.e. creating a float column to store a set of purely integer values.

So, I’d always recommend that you create the destination tables for each layer manually, specifying the correct column datatypes yourself before using OGR2OGR to load data into it. The following script can be used to create individual tables for each of the feature layers in the OS Vector Map dataset – road/railway/settlement layer/heritage/community services etc. – including all associated attribute values. Notice that I’m using the geometry datatype, and I’ve named the spatial column geom27700, indicating that all the data inserted in this column will be projected data using the EPSG:27700 spatial reference system (the National Grid of Great Britain, as used by all Ordnance Survey data):
CREATE TABLE dbo.administrativeboundary(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(255) NULL,
 featcode int NULL,
 CONSTRAINT PK_administrativeboundary PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.road_line(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 name varchar(254) NULL,
 number varchar(30) NULL,
 CONSTRAINT PK_road_line PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.railway_line(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 CONSTRAINT PK_railway_line PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.communityservices(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 name varchar(254) NULL,
 CONSTRAINT PK_communityservices PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.height(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 height varchar(254) NULL,
 CONSTRAINT PK_height PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.heritage(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 name varchar(254) NULL,
 CONSTRAINT PK_heritage PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.naturalfeature_area(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 CONSTRAINT PK_naturalfeature_area PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.naturalfeature_line(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 CONSTRAINT PK_naturalfeature_line PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));
CREATE TABLE dbo.railway_point(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 name varchar(30) NULL,
 CONSTRAINT PK_railway_point PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.settlement_line(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 CONSTRAINT PK_settlement_line PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.settlement_area(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 CONSTRAINT PK_settlement_area PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.text(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 name varchar(254) NULL,
 xml_name varchar(254) NULL,
 fonttype varchar(150) NULL,
 fontcolour varchar(30) NULL,
 fontheight varchar(30) NULL,
 orientatio numeric(30, 1) NULL,
 CONSTRAINT PK_text PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

CREATE TABLE dbo.tidalboundary(
 ogr_fid int IDENTITY(1,1) NOT NULL,
 geom27700 geometry NULL,
 featdesc varchar(254) NULL,
 featcode int NULL,
 CONSTRAINT PK_tidalboundary PRIMARY KEY CLUSTERED
(
 ogr_fid ASC
));

Loading the geometry data with Powershell

The OS VectorMap dataset is cut into grid squares 100km x 100km across, denoted by a two letter combination, e.g. TG. Each of these squares are further subdivided into 100 10km x 10km squares, denoted by a two letter and two number combination e.g. TG20. So the directory structure of the OS VectorMap data looks like this:

image

To load the complete set of data, you need to loop through all subdirectories starting from a given base directory, then call OGR2OGR to append the data in each shapefile found therein into the appropriate feature table based on its filename. Note that not every feature layer will be found in every subdirectory (you won’t, for example, find many tidal features in Derbyshire).

And here’s a powershell script to do that loading for you. Just change the name of the base directory from which to search (c:osdata), the path to ogr2ogr.exe (c:warmerdabldbin), and the connection string to your SQL Server instance as appropriate:
get-childitem c:osdata -include *.shp -recurse | foreach ($_) {
  echo "Now loading: " $_.fullname
  C:warmerdabldbinogr2ogr -progress -append -f "MSSQLSpatial" "MSSQL:server=zangiefSQLEXPRESS;database=OSVectorMap;trusted_connection=yes;" $_.fullname -a_srs "EPSG:27700" -lco "GEOM_TYPE=geometry" -lco "GEOM_NAME=geom27700"
}
The layer creation options (-lco) supplied to ogr2ogr specify that this is geometry data in the EPSG:27700 coordinate reference system, and should be loaded into the geom27700 of each table. The other attribute values of each feature will be copied straight into the corresponding columns. Execute the script and, if all goes to plan, you should see the following as the files start to load:

image

Note that loading the full set of OS Vectormap will take some time so, at this point, I’d go and get a coffee or browse facebook or something. Then, when you return, you should be in possession of one table containing the combined data for each OS VectorMap feature layer, a bit like this:

image

No comments:

Post a Comment

Share This Post