How to load geographic data to BigQuery easily and without errors
A short tutorial for loading Geo-spatial data to Google’s Big Query in common and open source tools.
Modern GIS and Spatial SQL GIS are the present and future of GIS.
I believe the future of GIS is in the cloud.
More and more companies and organizations are moving to the power of the cloud.
Whether it is an organization that is already using spatial abilities with its data or wants to do it. It is only common sense to do it all in one place, where the organization can reach it, with no need for special software.
This short tutorial will explain from my experience how to load geographic data to Google Big Query, easily and without errors.
Google Cloud’s Big Query has the largest amount of spatial functions
(as far as I have tested). Comparing to similar cloud-basd services. This means that one can add any tabular data to the cloud, add geo-spatial geometry, and start solving problems with spatial functions that is unrestricted by compute power, storage space, or licences.
Google, Amazon, and Microsoft have vast amount of open datasets, both spatial/non-spatial, which can be integrated and used in processing. For example, a continuously updated global coverage of “OpenStreetMap”, and much more. Furthermore, you can easily open a web browser and run some spatial queries on vast datasets without having to download and build infrastructure locally.
The combination of: compute power, hardware infrastructure, datasets , spatial SQL functions and web maps are turning the cloud into the present and future (spatial) data platform.
While it sounds promising, when it comes to using your geo-data or data sources that are not already in the cloud, the process of loading your spatial data to Google Big Query (and to other cloud services) is not as easy as it sounds.
In this blog, I will present best practices and tips on how to prepare and load spatial data with common and open source tools, while overcoming some of the frustrating message errors.
The first phase is to open a Google Cloud Project (and a Google Account if you don’t have one). Google requires your credit card for using the API, but will not charge you if you don’t run over their generous free tier. It is one place to use all of their APIs. This blog post assumes that you already have a google account and a Project. There are a lot of tutorials and help available on how to do this.
The recommended data format for Google Big Query to ingest your data is a simple text file. Convert your GIS data to CSV with a WKT column for the geometry. That’s it!
Reproject the data to WGS84 (ESPG:4326). Big Query currently supports the WGS84 datum. If you are not sure whether your data is already in WGS84 or not, simply reproject it to WGS84 anyway. Use the “Reproject tool” in QGIS or better, use GDAL’s OGR2OGR command line (see below)
Verify that the file is encoded in UTF8 (but not BOM encoding). If you want to check it or convert it, you can use Notepad++ (free). Omit all unnecessary columns (such as OBJECTID/area/length) – you don’t need them and it just enlarges your cloud storage usage.
How to convert the GIS(Shapefile, Geopackage, etc.) file to CSV with WKT?
Use your favourite GIS package. I recommend working with GDAL’s ogr2ogr command line,which is: open source, light, fast, reliable and can be automated/batch. Alternatively, use the QGIS interface: just right-click a layer-> export ->to CSV format with “geometry as WKT”.
Or in one simple line in GDAL’s ogr2ogr:
reproject any GIS format from any coordinate system, convert to CSV with WKT column and optionally remove fields and filter data. It’s also pretty simple to create a shell script to automate it on multiple files.
For example, export, reproject, and from a shapefile to CSV ready for Big Query:
ogr2ogr -t_srs EPSG:4326 -f CSV output.csv Input.shp -lco GEOMETRY=AS_XYZ
Loading data manually to Big Query via the web console:
- Open the Google cloud console and select your Project
Expand the Explorer, and click on the side menu (dots) “view actions”.
Click on “create data set”.
The data set is like a folder or a workspace. It is used for arranging tables by category.
Now click on the created data set, and click Create table (from the side menu or the button on the right tab)
Source/”Upload”, then browse and select your CSV file.
The manual process is limited to a 100MB file size, so you will need other advanced methods, which I will discuss later.
Destination/Table: name your table
Schema: Check the “AutoDetect”
I recommend checking the Advanced: ‘“Quoted newlines” for preventing errors in some of the files.
Common problems loading Geo CSV to Google Big Query, and how to solve them:
Error: ‘Missing close double quote (“) character’.
Solution: Verify you had checked Quoted newlines in the advanced lines, usually in Geometry data there are new lines as inside each geometry which make google big query reject it.
Error: While loading with schema auto-detect, Big Query rejects loading the data with errors about fields expected to be an integer or other errors regarding the schema.
Solution: uncheck auto-detect and add the fields one by one with their types (or copy them from the text editor/sheet)
If it still won’t help, change all columns type to string
Problem: CSV loading succeeded but for an unknown reason the field names in Big Query are shown with some default names such as string_field_0, string_field_1 …
Solution: Same as in 2 above. Add field names manually.
- Problem: field with date/time format is not imported or is wrong.
Solution: check the format. DATE/TIME must be in YYYY-MM-DD HH:MM: SS[.SSSSSS] format. You will need to change it in your Database/sheet/GIS software.
Test your Geo-spatial data in a Google Big Query:
After data is loaded to Big Query, verify all columns are there with schema/preview tabs. Especially your WKT column.
The column type is a string, you can use it as is or convert it to geography when you have big data or want to improve performance. (can be done directly while loading but it needs to specify the whole of the schema)
To test the geography just open a query tab for the table. Google will add a default select query in which you can leave only the table name and the limit.
The syntax is pretty simple:
ST_GEOGFROMTEXT(WKT_Column) AS WKT,*
Viewing and analyzing Big Query results (on map)
After successfully running the query, Big Query will open the results in a table. You can preview it and see how long it took to process.
To view results on a map, just open the menu “Explore data” and select “Explore with GeoViz”.
It will open a GeoViz map. You will need to authorize it in the first time.
Now you can play with various map visualizations such as thematic maps and more.
Other options to explore the data in live connections are: “Looker Studio” where you can build dashboards (also with maps) and share insights.
You can also explore the data in Google Sheets which offers automatic options to analyze your data
- Exporting your Big Query data to a file format that can be imported to almost any GIS or other platform for analysis is easy: