10. Output Formats

In order to support further data processing with different applications, Survey2GIS can produce a variety of output formats.

For archival purposes, it is recommended to always store copies of the original raw input data files.

10.1 ESRI Shapefile

Option:

-f shp

The “ESRI Shapefile” format is the default output format for Survey2GIS. Although it is not a real standard in the sense that it has been openly developed and documented by an independent organisation, it is the most widely used vector data format in the GIS world, because its inventor ESRI has released a detailed specification (https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf).

Technically speaking, a “Shapefile” is not a single file but a set of files (three or more) with the same base name and different extensions. The files generated by Survey2GIS will have the extensions “.shp”, “.shx” and “.dbf”. The latter is a file in DBase format that holds the table of attribute data for each geometry stored in the “.shp” file.

The Shapefile format is subject to a number of limitations, many of which arise from the use of DBase as attribute table format:

The maximum file size is limited by the size of the (32 bit integer) index in the “.shx” file. Typical limits are 2 or 4 GB, depending on the GIS’ capabilities.
Field names cannot be longer than 10 characters (DBase limit).
A stored decimal attribute value can have a maximum total length of 18 places. When storing values from very large numerical ranges, the decimal precision may have to be limited (DBase limit).
A stored text attribute value cannot be longer than 254 characters (DBase limit).
The attribute table in DBase format does not store any information about the text encoding. If text fields use special characters that are not part of the ASCII set, then the user must take care that all software used to process the data uses the right encoding.
A single Shapefile dataset can only store either points, lines or polygons. For this reason, Survey2GIS will automatically split the output into separate Shapefiles as required.
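
The DBase-related limits listed above can be checked before export. The following is a minimal Python sketch and not part of Survey2GIS; the function name "check_dbf_limits" and the sample field names are hypothetical and serve for illustration only.

```python
# Hypothetical pre-export check against the DBase limits listed above.
# Not part of Survey2GIS; all names are illustrative assumptions.

MAX_FIELD_NAME_LEN = 10   # DBase limit for field names
MAX_TEXT_LEN = 254        # DBase limit for text attribute values


def check_dbf_limits(field_names, text_values=()):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    for name in field_names:
        if len(name) > MAX_FIELD_NAME_LEN:
            problems.append(f"field name too long: {name!r}")
    for value in text_values:
        if len(value) > MAX_TEXT_LEN:
            problems.append(f"text value exceeds {MAX_TEXT_LEN} characters")
        if not value.isascii():
            # DBF stores no encoding information, so this is only a warning.
            problems.append(f"text value is not plain ASCII: {value!r}")
    return problems


if __name__ == "__main__":
    for problem in check_dbf_limits(["idx", "planum", "excavation_unit"]):
        print(problem)
```

In this sketch, only the field name "excavation_unit" (15 characters) is reported, since it exceeds the 10 character limit.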

10.1.1 ESRI Shapefile Label Layers

When exporting a label layer (see 5) for Shapefile output, an additional points Shapefile will be produced with the following attributes:

labeltext This is a string (text) type field that contains the text to be used as a label at each point.

fonttype A text type field that contains the name of the font to be used for the labels. This is set to “Arial” by Survey2GIS.

fontstyle This field contains the font style as an integer code. It is set to “0” (plain) by Survey2GIS. Commonly used values are: “0” (plain), “1” (bold) and “2” (italics).

fontcolor The font color to use for labels is encoded using an integer value. The coding follows the rules of the Java class java.awt.Color (see https://docs.oracle.com/javase/7/docs/api/java/awt/Color.html#getRGB() for details). The value exported by Survey2GIS is “-16777216” (black).

fontsize This is a double type field that contains the font size for the label text (“10.0” as set by Survey2GIS).

fontrotate This is a double type field that contains the clockwise rotation of the label text (“0.0” as set by Survey2GIS).

geomtype This integer type field encodes the geometry type for which a label has been generated. Possible values are: “0” (point), “1” (line) and “2” (polygon).
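
The integer coding of the “fontcolor” field can be illustrated with a short Python sketch. It assumes the layout documented for java.awt.Color.getRGB(): a signed 32 bit integer with alpha in bits 24-31, red in bits 16-23, green in bits 8-15 and blue in bits 0-7. The function names are hypothetical and not part of Survey2GIS.

```python
# Sketch of the java.awt.Color integer coding used by the "fontcolor" field.
# Function names are illustrative assumptions.

def decode_argb(value):
    """Split a signed 32 bit color integer into (alpha, red, green, blue)."""
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed value as unsigned
    return (
        (unsigned >> 24) & 0xFF,
        (unsigned >> 16) & 0xFF,
        (unsigned >> 8) & 0xFF,
        unsigned & 0xFF,
    )


def encode_argb(alpha, red, green, blue):
    """Combine four 0-255 components into a signed 32 bit color integer."""
    unsigned = (alpha << 24) | (red << 16) | (green << 8) | blue
    # Values with the highest bit set are negative in a signed 32 bit integer.
    return unsigned - 0x100000000 if unsigned >= 0x80000000 else unsigned
```

With this coding, decode_argb(-16777216) yields (255, 0, 0, 0), i. e. fully opaque black, and encode_argb(255, 255, 0, 0) yields -65536 (opaque red).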

Note that this attribute field structure has been chosen for compatibility with the “Annotations” functionality of gvSIG CE (http://gvsigce.org). In addition, a “.gva” label settings file will be produced (this can be ignored by users of other GIS). Label layers produced by Survey2GIS can be loaded as annotation layers into gvSIG CE and modified using that GIS’ special annotation tools.

10.1.2 Null (“no data”) in DBF Attribute Tables

The following is the default behavior if the user has not specified a “no data” value as part of the parser description (see option “no data” in 4.3):

Attribute values that are “null” (no data) will be represented by setting all characters of the corresponding field to spaces in the DBF attribute table. The interpretation of this depends on the GIS. Commonly, text attributes will be read as empty strings, numeric attributes as “0”.
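
The following Python sketch illustrates why a “null” value can become indistinguishable from “0” after import. It is a simplified assumption about how a reader might interpret a numeric DBF field, not the behavior of any particular GIS; the function name is hypothetical.

```python
# Simplified sketch: interpreting the raw bytes of one numeric DBF field.
# The reader behavior shown here is an assumption for illustration.

def read_dbf_numeric(raw, null_as_zero=True):
    """Convert raw field bytes to a number; all spaces means 'no data'."""
    text = raw.decode("ascii").strip()
    if text == "":
        # Field consists of spaces only: this is how "null" is stored.
        return 0 if null_as_zero else None
    return float(text) if "." in text else int(text)
```

A reader that maps empty fields to “0” (null_as_zero=True) returns the same result for a stored “0” and for “no data”; a reader that returns None instead keeps the two cases apart.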

10.2 Drawing Exchange Format (DXF)

Warning 1: Do not rely on DXF as exclusive output format for long-term data storage and archiving! For details, see the discussion below.

Warning 2: Data in DXF files is not suitable for spatial analysis that relies on the topological correctness of data! For details, see the discussion below.

Option:

-f dxf

DXF (Drawing Exchange Format) is a file format that can be processed by most CAD applications and (as opposed to e. g. DWG) is relatively simple in structure, with format documentation being available from AutoDesk (https://www.autodesk.com/techpubs/autocad/acad2000/dxf/). It allows storage of all geometry types as separate layers in a single file. However, it is not suitable for storing attribute data, since it does not support relational data structures. The DXF file itself can be used to store a handle for each drawing object (point, line or polygon) which can act as a “primary key” to join additional attribute data to the geometries in a GIS (see also 10.2.4).

The DXF output of Survey2GIS has the following characteristics:

The file format version used is “AC1015” (AutoCAD Release 15/AutoCAD 2000).
Points, “raw” points, lines and polygons are each stored in a separate layer. Raw points are labeled with their original coordinates.
Each drawing object (except for “raw” points) has an integer type handle that corresponds to the “geometry ID” assigned to each object by Survey2GIS and is also stored in a separate DXF layer.
In addition, each attribute field is written to a separate DXF layer.
Handles and field values are plotted at point coordinates, at the centres of areas/polygons, and at the central vertices of lines (one for each part).
Most of these layers are switched to “invisible” by default. However, not all CAD applications interpret this setting correctly; some display all layers as “visible” regardless.
Polygons are degraded to polylines, unless 2D output (option “-z”) is produced (see 10.2.3).
The topological quality aspects “correct order of boundary vertices” and “holes” are not supported (see 9).
Full attribute data is written to a separate, simple text file (see 10.2.4). The values in the first column (“geom_id”) correspond to the DXF object handles.

The DXF output option in Survey2GIS exists solely for the purpose of allowing CAD-based publishing workflows. The use of DXF for storing and/or processing topographical data is subject to severe limitations.

The DXF file produced by Survey2GIS always uses a simple point (“.”) as decimal separator for coordinate values, ignoring the “--decimal-point=” option, as well as any operating system settings.

10.2.1 DXF and Data Archiving

Regarding long-term storage/archival of data, it must be noted that DXF is (like all proprietary CAD formats) not independently standardized and is in constant flux. DXF is a registered trademark of AutoCAD producer AutoDesk. New versions of the DXF specification are released alongside each new version of AutoCAD (this is necessarily so, as DXF is a file-based, sequential representation of AutoCAD’s internal, hierarchical object database). Most problematically, the format specification explicitly encourages a proprietary use of some of its elements. Programmers of “add-ons” or “plug-ins” are free to invent new classes and decide whether or not to include them into the DXF output, or even document them. All of this means that compatibility of this format with current software cannot be sustained in the long term and that DXF is not suitable for long term data archival. However, Survey2GIS uses only a small subset of the format and produces a relatively simple DXF ASCII file that allows reconstruction of the original data.

For long-term data storage, make sure to archive the original raw data and use a simple, well-documented GIS format, such as Shapefile (see 10.1).

10.2.2 DXF and Topological Data

The import of DXF-stored data into a GIS for subsequent spatial data analysis is a common practice that must nevertheless be strongly discouraged. DXF is ignorant of topological data quality (see 9) and, most importantly, does not have a simple and useful (in a GIS context) representation for areal objects (polygons). There is no reasonable way to represent properties such as holes and multi-parts in DXF-stored polygons. This will inevitably result in difficulties when attempting to link spatial data (geometries) with attribute data.

For GIS-based processing, let Survey2GIS create its output in a GIS format, such as Shapefile (see 10.1).

10.2.3 DXF and Planar Polygons vs. Polylines

Note: Planarization methods are not yet implemented in Survey2GIS!

As mentioned above, AutoCAD (and thus DXF) does not have a simple representation of an areal object, i. e. a polygon. Instead, such objects are modelled as hatch patterns that are set into the boundaries of polyline objects. However, the AutoCAD/DXF hatch pattern is defined to be a planar object. What this means is that all vertices on the boundary of the hatch pattern (i. e. the polyline) have to lie within the same X-Y plane. This restriction cannot be met by real-world surveying data, which will have variation in the Z coordinates, due to the geometry of the underlying natural surface. As a result, the following options are possible for DXF output:

Export all polygons as polylines (default behaviour).
Produce 2D output on export, thus flattening all Z coordinate values to “0.0”.
Planarize the vertices of each polygon so that they lie in the same X-Y plane and the object’s plane is parallel to the X-Y plane of the world coordinate system. The latter restriction is to make sure that AutoCAD will actually draw the hatched area when looking onto the objects from a top-down perspective.

Note that, in theory, the same problem exists for GIS and polygons with Z data (the mathematical definition of a polygon being that all of its vertices lie on the same plane). In practice however, non-planarity of polygons is simply ignored by 2D GIS.
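
The planarity restriction discussed above can be made concrete with a small Python sketch. It only checks whether all vertices of a polygon boundary share the same Z value (i. e. lie in a plane parallel to the X-Y plane) and shows the 2D flattening option; it does not implement planarization. The function names and the tolerance value are assumptions for illustration.

```python
# Sketch: test a polygon boundary for the planarity required by DXF hatches,
# and flatten it to 2D. Vertices are (x, y, z) tuples. Names are illustrative.

def is_parallel_to_xy(vertices, tolerance=1e-9):
    """True if all vertices have (nearly) the same Z coordinate."""
    zs = [z for _, _, z in vertices]
    return max(zs) - min(zs) <= tolerance


def flatten_to_2d(vertices):
    """Set all Z coordinates to 0.0, as done for 2D output."""
    return [(x, y, 0.0) for x, y, _ in vertices]
```

Surveyed boundaries will typically fail the first test, whereas the flattened result always passes it.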

More information on the subject of planarization of polygons can be found in ??.

10.2.4 DXF and Attribute Data

There is no perfect way of associating attribute data records with drawing objects in a DXF file. Survey2GIS offers a simple work-around: An additional text file (with extension “.txt”) is produced that stores one attribute table record per line.

The lines below are an example of what an attribute table text file might look like (excerpt):

geom_id;const1;idx;planum;type;extra;number;coorx;coory;coorz
0;123.450000;67;1;"GR";"W";0;3513037.664000;5279881.392000;399.563000
1;123.450000;1;1;"LO";"W";0;3513041.874000;5279875.482000;399.025000 
2;123.450000;15;1;"LO";"
...

There are only a few formatting conventions that apply:

Fields are separated by “;” (semicolon).
The first line contains the names of the fields.
The contents of text fields are surrounded by double quotes (").

The first field is always the integer type field geom_id. This field contains a primary key that matches the handles for all drawing objects in the exported DXF file and can be used to associate CAD objects and table records.
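
As an illustration of the conventions above, the attribute text file can be read with a few lines of Python and indexed by “geom_id”, so that records can be looked up by DXF object handle. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the sample data consists of the two complete records from the excerpt above.

```python
# Sketch: read a Survey2GIS attribute text file and index it by "geom_id".
# The function name is an illustrative assumption.
import csv
import io

SAMPLE = '''geom_id;const1;idx;planum;type;extra;number;coorx;coory;coorz
0;123.450000;67;1;"GR";"W";0;3513037.664000;5279881.392000;399.563000
1;123.450000;1;1;"LO";"W";0;3513041.874000;5279875.482000;399.025000
'''


def read_attribute_table(stream):
    """Return a dict that maps each geom_id to its attribute record."""
    # The first line holds the (unquoted) field names.
    header = stream.readline().rstrip("\r\n").split(";")
    # QUOTE_NONNUMERIC converts every unquoted field to float and leaves
    # quoted fields as text, which matches the quoting convention above.
    reader = csv.reader(stream, delimiter=";", quotechar='"',
                        quoting=csv.QUOTE_NONNUMERIC)
    table = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        record = dict(zip(header, row))
        table[int(record["geom_id"])] = record
    return table


if __name__ == "__main__":
    table = read_attribute_table(io.StringIO(SAMPLE))
    print(table[1]["type"], table[1]["coorz"])
```

Note that this simple approach reads integer fields (such as “idx”) as floating point numbers; a more careful reader would convert them back where needed.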

No attribute data is written for “raw point” geometries (vertices).

The attribute text file produced by Survey2GIS always uses a point (“.”) as decimal separator for coordinate values, ignoring the “--decimal-point=” option, as well as any operating system settings.

10.2.5 Null (“no data”) in DXF Attributes

The following is the default behavior if the user has not specified a “no data” option value as part of the parser description (see 4.3):

The representation of attribute values that are “null” (no data) depends on the attribute type: text fields will be written as empty strings, numeric fields as “0”.

10.2.6 DXF Label Layers

Labels (see 5) are stored as a separate layer inside the DXF.

10.3 GeoJSON

The GeoJSON output is a single plain text file that contains a JavaScript style object representation of all features produced by Survey2GIS. GeoJSON is not very space-efficient and not a typical end-user format. It is more commonly used for exchanging data between spatial databases and WebGIS applications. The GeoJSON file produced by Survey2GIS follows the official specification at http://geojson.org closely (but mind the note on coordinate references, below!) and uses human-readable formatting.

Note: Similar to KML (see 10.4), GeoJSON was designed for large-area data coverage with geographic (latitude/longitude) coordinates. In fact, since its RFC 7946 revision, the GeoJSON standard no longer supports any coordinate reference system other than lat/lon data with WGS 84 datum (equivalent to EPSG 4326). However, since GeoJSON is useful for direct integration of Survey2GIS into a survey data processing pipeline or database backend, this software will continue to allow output of non-conforming GeoJSON with other coordinate system types, unless running in “strict” mode (see 2.3). Be aware that there is no way of storing SRS information in GeoJSON output and that strictly standard-conforming software might not be able to process such output correctly.

Features are sorted by geometry type (to minimize occlusions when importing the data as a multi-geometry GIS layer): first raw vertices, then points, then lines and finally polygons, and stored in a comma-separated list that contains plain, human-readable entries. Feature geometries are stored in the “geometry” part and attributes in “properties”:

{ "type": "FeatureCollection",
  "features": [
    { "type": "Feature", "id": 0,
      "geometry": {
        "type": "Point",
        "coordinates": [3513040.585000, 5279881.854000, 399.102000]
      },
      "properties": {
        "geom_id": 0,
        "const1": 123.450000,
        "idx": 256,
        "level": 1,
        "type": "Gold",
        "aux": "FZ",
        "_id": 0,
        "coorx": 3513040.585000,
        "coory": 5279881.854000,
        "coorz": 399.102000
      }
    },
...

A “geom_id” property of type integer is always added and represents a key value that is unique within the scope of the GeoJSON object file.

Note that there is no explicit attribute type description. Whether a field is of type “text” (e. g. “type” in the example output above), “double” (e. g. “coorx”) or “integer” (e. g. “level”) is simply determined by the format of the value.
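
The following Python sketch shows how a consumer of the GeoJSON output might derive attribute types from the format of the values alone. It uses the standard “json” module; the function name is hypothetical, and the sample is a shortened version of the example output above, closed so that it forms a complete JSON document.

```python
# Sketch: infer attribute types from value formats in GeoJSON "properties".
# The function name and the shortened sample are illustrative assumptions.
import json

SAMPLE = """
{ "type": "FeatureCollection",
  "features": [
    { "type": "Feature", "id": 0,
      "geometry": {
        "type": "Point",
        "coordinates": [3513040.585000, 5279881.854000, 399.102000]
      },
      "properties": {
        "geom_id": 0,
        "const1": 123.450000,
        "level": 1,
        "type": "Gold",
        "coorx": 3513040.585000
      }
    }
  ]
}
"""

TYPE_NAMES = {str: "text", int: "integer", float: "double"}


def infer_field_types(feature):
    """Map each property name to 'text', 'integer' or 'double'."""
    return {
        name: TYPE_NAMES.get(type(value), "unknown")
        for name, value in feature["properties"].items()
    }


if __name__ == "__main__":
    collection = json.loads(SAMPLE)
    print(infer_field_types(collection["features"][0]))
```

The distinction between “integer” and “double” relies on the decimal point in the written value: “1” is parsed as an integer, “123.450000” as a double.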

The GeoJSON file produced by Survey2GIS always uses a simple point (“.”) as decimal separator for coordinate values and numeric attribute values, ignoring the “--decimal-point=” option, as well as any operating system settings.

10.3.1 GeoJSON Geometry Types

The GeoJSON output produced by Survey2GIS uses strongly typed GeoJSON geometries. What this means is that the most simple type is chosen which is capable of representing the feature produced by Survey2GIS:

Points are stored as GeoJSON type “Point”.
Polygons are stored as GeoJSON type “Polygon” if they consist of only one part; they are stored as type “MultiPolygon” if they consist of more than one part.
Lines are stored as GeoJSON type “LineString” if they consist of only one part; they are stored as type “MultiLineString” if they consist of more than one part.

Note: Since features of Survey2GIS types “point” and “raw point” share the same GeoJSON geometry type (“Point”), it will not be possible to differentiate between the two in the GeoJSON output file! If the latter is an issue for your workflow, then consider using a geometry type selection (see 6) and exporting to separate output files.
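
The type selection rules of this section can be summarized in a short Python sketch. It restates the rules given above and is not taken from the Survey2GIS source code; the function name and the strings used for the Survey2GIS geometry kinds are assumptions.

```python
# Sketch of the "most simple type" rule for GeoJSON geometries.
# Function name and kind strings are illustrative assumptions.

def geojson_geometry_type(kind, part_count=1):
    """Return the GeoJSON geometry type for a feature kind and part count."""
    if part_count < 1:
        raise ValueError("a geometry must have at least one part")
    if kind in ("point", "raw point"):
        # Both kinds map to the same GeoJSON type and cannot be told apart.
        return "Point"
    single, multi = {
        "line": ("LineString", "MultiLineString"),
        "polygon": ("Polygon", "MultiPolygon"),
    }[kind]
    return single if part_count == 1 else multi
```

For example, a line with three parts maps to “MultiLineString”, while “point” and “raw point” both map to “Point”.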

10.3.2 GeoJSON Primary Keys (Field “id”)

The GeoJSON standard reserves a field named “id” for use as primary key. If the input data already contains a field of that name, then it cannot be written to the GeoJSON “properties” member without clashing with the primary key.

In such a case, Survey2GIS will attempt to resolve the problem by renaming the user-defined “id” field to “_id” (underscore plus original field name). In case a user-defined “_id” field also exists, this resolution will fail and the program will abort with an error message.
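
The renaming rule described above can be sketched in Python as follows. This is an illustration of the described behavior only, not the actual Survey2GIS implementation; the function name and the error text are assumptions.

```python
# Sketch of the "id" to "_id" renaming rule for GeoJSON output.
# Function name and error message are illustrative assumptions.

def resolve_id_clash(field_names):
    """Rename a user-defined 'id' field to '_id', or fail if '_id' exists."""
    if "id" not in field_names:
        return list(field_names)  # nothing to resolve
    if "_id" in field_names:
        raise ValueError("cannot rename field 'id': field '_id' already exists")
    return ["_id" if name == "id" else name for name in field_names]
```

With the field list ["idx", "id", "type"], the sketch returns ["idx", "_id", "type"]; with a list that contains both “id” and “_id”, it raises an error.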

Note that, theoretically, the above should not be an issue, since the GeoJSON specification states that “id” should be a member of the feature object itself, not of its “properties” member. However, it seems that some GeoJSON drivers (notably http://www.gdal.org, which is used in most open source GIS) do not make this distinction. Survey2GIS itself does place “id” directly into each feature object, next to its “type” member (see the example output in 10.3).

10.3.3 Null (“no data”) in GeoJSON Objects

The following is the default behavior if the user has not specified a “no data” value as part of the parser description (see 4.2):

The representation of attribute values that are “null” (no data) depends on the attribute type: text fields will be written as empty strings, numeric fields as “0” (integer) or “0.0” (double).

10.3.4 Label Layers in GeoJSON

There is only limited support for labels in GeoJSON output.

Label properties (location, font settings, etc.: see 5) are stored as part of the “properties” (see above) section of the labelled features; labels are not stored as separate point features (as opposed to e. g. Shapefile output: see 10.1).

Since GeoJSON allows only one “properties” member per feature, there will also only be one set of label X and Y coordinates per feature. These coordinates will represent the label position of the first part of multi-part line and polygon geometries.

10.4 Keyhole Markup Language (Google KML)

The Keyhole Markup Language (Google KML) is an XML-based format that was popularized for use with Google Earth (https://www.google.com/earth/). Accordingly, the sole purpose of KML is to provide data for flexible 3D visualization in Google Earth. KML sacrifices accuracy for speed and simplicity and is therefore not a suitable format for subsequent data processing and archival.

Note: The most relevant limitation of KML is that it supports (or rather: assumes) strictly latitude/longitude coordinate data (in decimal notation) with an assumed WGS 84 datum (which is really Web Mercator: see 8.1.4 for details). Therefore, KML is only available for survey data that uses lat/lon coordinates (e. g. GPS data) or data that can be reprojected to lat/lon coordinates, such as UTM data (see 8.1). In the latter case, a suitable output SRS must be set (see 8.1).

The KML produced by Survey2GIS has been designed to keep large survey datasets manageable in Google Earth.

KML uses “Placemarks” to represent features (points, lines and polygons plus attributes) and these can in turn be grouped into “Folders” which can be used to collectively turn Placemarks on or off in Google Earth. The KML produced by Survey2GIS contains the following Folders:

Points
Lines
Polygons
Vertices
Labels

Most of the above should be self-explanatory. The folder “Vertices” contains the raw measurements for each point, line or polygon vertex, and exists only if raw data output is chosen (option “--raw-data”, see 2.3); the contents of this folder will not be visible by default. The folder “Labels” contains user-defined labels (if chosen, see 5, also 10.4.2). The drawing order is determined by Google Earth.

Attribute data is stored using the “Extended data” feature of KML. This allows Survey2GIS to define a data schema and store exact representations of the attribute data. All attribute fields and their types will be visible when clicking on a Placemark in Google Earth.

Note that Google Earth has a reputation for producing display problems with complex polygons. Presumably, the reason for this is that its 3D tessellation algorithms are not sufficiently accurate and over-optimized for display speed.

The KML file produced by Survey2GIS always uses a simple point (“.”) as decimal separator for coordinate values and numeric attribute values, ignoring the “--decimal-point=” option, as well as any operating system settings.

10.4.1 KML Geometry Types and Encoding

Coordinates are written into the KML file in this order: longitude (decimal degrees), latitude (decimal degrees), elevation (always assumed to be in meters).

The geometry types supported by KML Placemarks are:

Polygon
LineString
Point

Multi-part geometries are represented by storing more than one Polygon or LineString per Placemark element.

Google Earth expects the vertices of polygons (outer boundaries as well as holes) to be written in counter-clockwise order (Google Earth uses this to determine the direction of 3D faces for its artificial lighting). This order is automatically enforced by Survey2GIS when exporting the vertex data. Elevation (Z) data can be interpreted in various ways by Google Earth, but Survey2GIS always sets the KML element <altitudeMode> to “absolute”, meaning that elevation data is taken to be in meters above sea level.
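
The coordinate order and the counter-clockwise vertex order described above can be illustrated with the following Python sketch. It is not the Survey2GIS implementation: the orientation test uses the planar shoelace formula on longitude/latitude values (an assumption that is adequate for small survey polygons), and the function names and the number of decimal places are chosen for illustration only.

```python
# Sketch: enforce counter-clockwise vertex order and format KML coordinates.
# Vertices are (longitude, latitude, elevation) tuples. Names are illustrative.

def signed_area(ring):
    """Shoelace formula on (lon, lat); positive means counter-clockwise."""
    area = 0.0
    count = len(ring)
    for i in range(count):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % count][0], ring[(i + 1) % count][1]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def ensure_ccw(ring):
    """Return the ring in counter-clockwise order, reversing it if needed."""
    return list(ring) if signed_area(ring) >= 0 else list(reversed(ring))


def kml_coordinates(ring):
    """Format vertices as 'lon,lat,elevation' tuples separated by spaces."""
    return " ".join(f"{lon:.6f},{lat:.6f},{alt:.3f}" for lon, lat, alt in ring)
```

A ring that is digitized in clockwise order has a negative signed area and is therefore reversed by ensure_ccw() before its coordinates are written.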

Note that there may well be local mismatches in the accuracy between the Google Earth elevation model (DEM) and the measured Z coordinates in the survey data, which can lead to some visual artefacts in Google Earth.

10.4.2 Label Layers in KML

The KML output will contain a dedicated folder “Labels” which contains the labelled Placemarks for all features, with label locations and contents defined by the user (see 5 for details).