GIS data is what makes a GIS map more than a simple reference map. Data expands the richness of a map: giving the user a deeper, more insightful view of an area or project.
As a category, GIS data is quite broad, with considerable variation in terms of:
Remember, as a concept, GIS can be defined as the intersection of data and location.
Just jumping in? Check out part one: What is GIS?
In the last chapter, we covered the location component: mapping. However, GIS mapping requires more than coordinates.
GIS mapping requires data.
In this chapter, we’ll cover the two basic data types (vector and raster), common GIS file formats, and our top five GIS data sources.
For a quick overview of how maps and data come together with GIS, check out our simple guide Intro to GIS: What It Is and How to Get Started.
Vector data is, essentially, a list of coordinates: one that provides instructions on how an image should be rendered.
Vector images are high-fidelity graphical representations of an image or shape.
This graphical property means that vector images are infinitely scalable. They can be enlarged or reduced with no quality loss.
This makes them the preferred file type for web logos and large-scale prints.
Vector images can only be created and manipulated with a computer program like Adobe Illustrator or Sketch. You cannot, for example, use a camera to capture a vector image.
Vector images consist of three basic components: points, lines, and polygons.
Vector points are basically x,y coordinates. They don’t have dimensions and usually represent single data points.
In GIS mapping, vector points illustrate features too small to be drawn at scale.
For example, if you're creating a map of a specific city, you'd use lines to draw the city's boundaries. However, create a map of the entire country and that boundary is no longer visible - so a labeled point is used instead.
This image of Colorado illustrates this concept perfectly: the state capital being represented as a labeled point.
Vector lines are a series of connected vector points.
They have distinct start and end points and, though they can intersect with one another, a single line will not intersect with itself.
Vector lines are used to represent linear features such as rivers, roads, and trails.
Color, thickness, and line type (solid or dashed) are used to denote unique features, or unique attributes of the same feature.
For example, a heavily trafficked highway might be drawn with a thick line, whereas the residential roadway would be much thinner. Moreover, streets could be solid black lines, while the river might be dotted and blue.
Stylistic choices like these are at the map maker's discretion, but can add depth and visual interest to the map.
Polygons are lines in which the first point is also the last: creating a shape.
Polygons represent features with distinct boundaries: states, property lines, lakes, etc.
Though they're most frequently used to represent perimeter, with modern GIS, polygons can also be used to measure a feature’s area.
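To make the three components concrete, here is a minimal Python sketch that stores a point, a line, and a polygon as plain coordinate lists. The coordinates are made up, and the area calculation assumes flat (planar) x,y coordinates; real GIS software also accounts for map projections.

```python
import math

# Illustrative coordinates only; units are arbitrary planar units.
point = (2.0, 3.0)                           # a single x,y pair
line = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0)]  # ordered points, open ended
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # first point repeated last


def is_closed(ring):
    """A polygon ring is a line whose first point is also its last."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def perimeter(ring):
    """Sum of the straight-line lengths between consecutive points."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def polygon_area(ring):
    """Planar area of a simple closed ring, using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(is_closed(polygon), perimeter(polygon), polygon_area(polygon))
```

The same list of coordinates yields both the perimeter and the area, which is why a polygon can serve either purpose.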
Where vector data (coordinates that create an image) is somewhat abstract, raster data is quite literal.
Raster data is grid or pixel based.
Commonly found as aerial surveys, topographic maps, and satellite imagery, raster file extensions include TIFF, PNG, and JPEG.
In GIS mapping, raster data generally represents surfaces.
Unlike vector data, raster data cannot be scaled infinitely. Enlarge it too much and it becomes fuzzy and pixelated. Stretch too much in one direction and the features distort.
Despite these limitations, raster data does have advantages; chiefly, it provides a level of detail not possible with vectors.
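The scaling limitation is easy to see with a tiny grid. The sketch below enlarges a 2x2 raster with nearest-neighbor resampling, one simple method among several; the function name and values are illustrative.

```python
def upscale_nearest(grid, factor):
    """Enlarge a raster by repeating every cell `factor` times in each direction."""
    enlarged = []
    for row in grid:
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


grid = [[1, 2],
        [3, 4]]
big = upscale_nearest(grid, 2)
for row in big:
    print(row)
```

The enlarged grid has four times as many cells but no new information: each original cell simply becomes a larger block, which is what appears on screen as pixelation.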
Take digital photographs as an example.
Photographs provide an immense level of contextual detail and represent the subtleties of light and color quite accurately.
Consider the images shown here. The first depicts vector images of trees, the other is a raster photograph.
Both images depict trees accurately. However, the raster photograph is not only more detailed, but is more visually nuanced.
In terms of GIS mapping, raster data comes in two types: discrete and continuous.
Discrete data can only take specific values, whereas continuous data can take any value within a range. For example:
The number of people in a room is a discrete value. You can have any number of people, but you can’t have half a person. You’re limited to whole numbers: no decimals or percents.
Continuous data is more flexible, including values such as height, weight, and length. A person's height can be any value within the range of human heights. In fact, most people’s height is not exact to the inch or foot.
Continuous and discrete data are complementary, but do have different applications.
The map on top illustrates discrete raster data.
Each value is assigned a different color, while each cell has only one data type and one color: there’s no gradation of either.
In contrast, the map on the bottom represents continuous data.
Here, values change gradually from one cell to the next rather than jumping between fixed categories. Continuous rasters are often used to represent data that experiences gradual change: temperature, population, elevation, etc.
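As a rough illustration of the difference, the Python sketch below takes a small continuous raster of made-up elevation values and converts it into a discrete raster of class codes. The break values and the `classify` helper are assumptions chosen for the example.

```python
# Continuous raster: any value within a range is possible (hypothetical elevations in meters).
elevation = [
    [120.5, 340.2, 815.0],
    [95.1, 410.7, 1204.3],
]

# Assumed class breaks: below 200 m -> 0, 200 m to below 800 m -> 1, 800 m and above -> 2.
breaks = [200.0, 800.0]


def classify(value, class_breaks):
    """Return a class code: the number of break values that `value` meets or exceeds."""
    return sum(value >= b for b in class_breaks)


# Discrete raster: every cell holds exactly one whole-number class code.
classes = [[classify(v, breaks) for v in row] for row in elevation]
print(classes)
```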
Shapefiles are, by far, the most common GIS file type.
Developed by GIS powerhouse ESRI, shapefiles are a simple way to store and share GIS vector data.
Shapefiles combine non-topological data with associated attributes.
To break down what that means, let’s return to our original definition of GIS: the intersection of data and location.
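Non-topological data is the location. It consists of the coordinates that define each feature's shape. 'Non-topological' means the file does not store spatial relationships between features, such as which polygons share a border.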
Examples of non-topological data include street, state, or area maps.
Associated attributes are the data.
Consider an elevation map. The non-topological data (x,y coordinates) illustrate the base terrain, while the associated attributes (z coordinates) represent the elevation profile.
Attribute data isn’t limited to measured values such as elevation.
Drought conditions in the United States are a good example of a map that could be stored and shared as a shapefile. The map of the United States would be the non-topological data, while the drought conditions data would be the attributes.
Though shapefile sounds singular, there is actually a minimum of three file types that must be present in order to render a shapefile correctly.
Main (.SHP) - Contains the shape coordinates: essentially describing all the basic shapes within the file.
Index (.SHX) - The index file, which records where each feature sits within the main SHP file so that GIS software can find features more quickly.
dBase (.DBF) - Contains the attribute data for the features in the main SHP file.
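Because a shapefile only works when all three files travel together, a quick check before sharing one can save trouble. The following is a minimal sketch using only the Python standard library; the function name and the `drought` file name are hypothetical, and the demo creates empty placeholder files in a temporary folder.

```python
import tempfile
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the required extensions that have no file next to the given .shp path."""
    shp = Path(shp_path)
    present = {p.suffix.lower() for p in shp.parent.iterdir() if p.stem == shp.stem}
    return [ext for ext in REQUIRED_EXTENSIONS if ext not in present]


# Demo: a folder holding drought.shp and drought.dbf, but no drought.shx.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf"):
        (Path(folder) / f"drought{ext}").touch()
    missing = missing_components(Path(folder) / "drought.shp")

print(missing)
```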
There are over 60 GIS file types, each with unique characteristics and use cases.
The sheer number of geospatial file formats can be overwhelming. That said, many of these file types are specialized and/or only supported by one GIS system - limiting their everyday use.
Below we cover five of the most common and widely used GIS file types:
GeoJSON is a vector file format that encodes geographical data using JavaScript Object Notation (JSON), a text-based data format.
Compared to other web-based languages, JSON is lightweight and fairly straightforward.
JSON files generally contain two elements:
GeoJSON files contain those elements, as well as a geometry component.
These files store coordinates as text, but render in a visual format.
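As an illustration, the sketch below builds a small GeoJSON document with Python's standard `json` module. The feature describes a single point; the coordinates are approximate and the property names are made up for the example. Note that GeoJSON lists longitude before latitude.

```python
import json

# One feature: a geometry (the location) plus properties (the attribute data).
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-104.9903, 39.7392],  # approximate longitude, latitude of Denver
    },
    "properties": {"name": "Denver", "role": "state capital"},
}

collection = {"type": "FeatureCollection", "features": [feature]}

text = json.dumps(collection, indent=2)  # the coordinates are stored as plain text
parsed = json.loads(text)                # and read back into the same structure
print(text)
```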
TIFF files are raster image files: most closely related to JPEG, PNG, and GIF file types.
Unlike JPEG files, TIFF files are typically stored uncompressed or with lossless compression, so they tend to be large. As such, they're not optimal for use on websites.
That said, they do offer the most flexibility in terms of editing and adding transparency, tags, and layers.
GeoTIFFs are TIFF files that contain location metadata. The metadata acts as instructions on how to locate the file on the map.
Supported by most platforms, GeoTIFF files are the industry standard for satellite imagery and other GIS image files.
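To show how location metadata can place an image on a map, here is a simplified Python sketch that converts a pixel position into map coordinates. It assumes a north-up image with no rotation, a known map coordinate for the top-left corner, and a fixed pixel size; the function name and numbers are hypothetical, and real GeoTIFF metadata can carry more than this.

```python
def pixel_to_map(col, row, origin_x, origin_y, pixel_width, pixel_height):
    """Map coordinates of the top-left corner of the pixel at (col, row).

    Assumes a north-up raster: x grows to the right, y shrinks going down the image.
    """
    x = origin_x + col * pixel_width
    y = origin_y - row * pixel_height
    return x, y


# Hypothetical image: top-left corner at (500000, 4400000) in meters, 30 m pixels.
corner = pixel_to_map(10, 20, 500000.0, 4400000.0, 30.0, 30.0)
print(corner)
```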
File geodatabases allow users to store all thematically related data in a single database. Each database can organize and store vector and raster files, relationship classes, attribute tables, and spatial data.
Users can create multiple thematic databases as needed.
Like Shapefiles, geodatabases are a proprietary format created by ESRI.
Geodatabases and Shapefiles can achieve similar goals. However, geodatabases offer significant advantages:
There are actually two types of geodatabases: file (GDB) and personal (MDB). Personal geodatabases were the precursor to file geodatabases and are stored as Microsoft Access databases.
To learn more about personal geodatabases, as well as the differences between the two database types, check out the article below.
Learning resource: What is a Geodatabase? Personal vs File Geodatabase
KML stands for Keyhole Markup Language. As the default file format for Google Earth, it’s likely the best known GIS file type outside of professional GIS circles.
KMZ is the compressed version of KML, signifying KML-Zipped.
KML files contain both geometry and attribute data.
They also contain a variety of configuration options that, though they add significant value to Google Earth as an application, limit the use of KML files elsewhere.
This format was originally developed by Keyhole Inc, which was later bought by Google.
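The relationship between KML and KMZ can be sketched with Python's standard library. The example below writes a minimal KML placemark (geometry plus attribute data) and zips it into an in-memory KMZ archive. The placemark content is illustrative, and naming the inner file `doc.kml` follows common convention rather than anything stated in this chapter.

```python
import io
import zipfile

# A minimal KML document: one placemark with a name, a description, and a point.
kml = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Denver</name>
    <description>State capital of Colorado</description>
    <Point><coordinates>-104.9903,39.7392</coordinates></Point>
  </Placemark>
</kml>
"""

# A KMZ is a zip archive containing the KML file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
    kmz.writestr("doc.kml", kml)

# Reading the archive back returns the original KML text.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as kmz:
    names = kmz.namelist()
    round_trip = kmz.read("doc.kml").decode("utf-8")

print(names)
```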
CSV stands for comma-separated values.
As the name suggests, CSV files are a list of data points (values) separated by commas.
As text files, they are easily the simplest file format here, making them ideal for transferring data between programs.
Though not technically a mapping format, CSV files are frequently used to create point layers in GIS platforms. For this to be successful, the CSV file must have columns for both x and y coordinates.
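The sketch below shows the idea in Python: it reads a small CSV with x and y columns and turns each row into a point with attached attributes. The sample rows, column names, and `read_points` helper are assumptions for illustration; a GIS platform would do the equivalent when importing the file.

```python
import csv
import io

# Hypothetical CSV content: one header row, then one row per point.
csv_text = """name,x,y
Trailhead,-105.2705,40.0150
Summit,-105.2928,39.9980
"""


def read_points(text, x_field="x", y_field="y"):
    """Turn CSV rows into points; every non-coordinate column becomes an attribute."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        points.append({"x": x, "y": y, "attributes": dict(row)})
    return points


points = read_points(csv_text)
print(points)
```

If either coordinate column is missing, `row.pop` raises a `KeyError`, which mirrors the requirement that both columns be present.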
Originating as the combination of the words 'radar' and 'light,' the term LiDAR is now used as an acronym for 'light detection and ranging.'
LiDAR is a surveying method that employs lasers to measure distance.
Laser light pulses leave the LiDAR system, bounce off the ground or other objects, and return to the sensor. Distance is measured by tracking how long a pulse takes to return.
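The timing calculation can be written out in a few lines of Python. This sketch uses the speed of light in a vacuum (light travels very slightly slower in air), and the function name and sample timing are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum


def pulse_distance(round_trip_seconds):
    """Distance to the reflecting object, in meters.

    The pulse travels out and back, so the one-way distance is half the total path.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2


# A pulse that returns after roughly 6.67 microseconds hit something about 1 km away.
distance_m = pulse_distance(6.67e-6)
print(round(distance_m, 1))
```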
Light moves incredibly fast, so a LiDAR device can send out and record an enormous number of pulses while sweeping across a scene. This means that LiDAR devices can create point clouds: complex scans made of millions of individual points.
Though point cloud is an accurate description, that terminology doesn’t really reflect the awesome reality.
Point clouds are highly detailed 3D maps, illustrating everything from a downtown core to a national forest.
Unlike radar and sonar, LiDAR is not necessarily inhibited by object interference. One LiDAR pulse can produce multiple returns: part of the light reflects off the first object it meets, while the rest continues on and reflects off objects further along its path.
This makes LiDAR particularly useful for mapping vegetated areas. For example, when surveying a national forest, the LiDAR pulses won't stop at the top of the tree canopy: they continue producing returns until they hit the ground.
The two most common LiDAR maps are digital elevation models (DEM) and canopy height models (CHM): both made possible by the multiple returns property.
For DEMs, you would take a full LiDAR scan and then filter for the last return: remembering that the last return generally represents ground points. With these filtered data points you can then create a bare earth map, one that excludes all but the surface of the Earth itself.
For CHMs, the idea is similar. Filter for the first return (in this case, the top of the tree) and then subtract the final return (the ground). This leaves the height of each tree in the area, allowing you to create a full canopy height map.
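Both filters can be sketched in a few lines of Python. The example assumes each pulse is stored with its returns ordered from first to last, with each return recorded as an elevation in meters; the data structure and values are hypothetical, and real LiDAR processing involves additional steps such as ground classification.

```python
# Hypothetical pulses: return elevations in meters, ordered first to last.
pulses = [
    {"x": 0, "y": 0, "returns": [312.4, 305.1, 298.0]},  # canopy, branch, ground
    {"x": 1, "y": 0, "returns": [310.9, 297.6]},         # canopy, ground
    {"x": 2, "y": 0, "returns": [297.2]},                # open ground: single return
]


def ground_elevation(pulse):
    """DEM value: the last return, which generally represents the ground."""
    return pulse["returns"][-1]


def canopy_height(pulse):
    """CHM value: first return (top of canopy) minus last return (ground)."""
    return pulse["returns"][0] - pulse["returns"][-1]


dem = {(p["x"], p["y"]): ground_elevation(p) for p in pulses}
chm = {(p["x"], p["y"]): round(canopy_height(p), 1) for p in pulses}
print(dem)
print(chm)
```

Where a pulse has only one return, the first and last returns are the same, so the canopy height comes out as zero.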
GIS data comes in many forms and from a huge variety of sources.
In an ideal world, you’d either have the tools to collect the data yourself, or access to the appropriate databases.
In reality, you’ll often need to source the data yourself.
Luckily, there is a massive amount of open-source map data online. A few well-placed Google searches can unearth an abundance of valuable resources.
Many counties maintain databases of their own GIS data, the majority of which is available for free download. There are also several open-source databases that are a great starting point for people looking to find a specific data type.
Below you’ll find our top five sources for free GIS data, as well as resources for further research.
Best for: Cultural, physical, and basemap data
Natural Earth Data (NED) is an especially excellent resource for cartographers, topping several lists for best open-source GIS database.
NED offers a combination of both vector and raster data sets, most of which are available in three different size scales.
Supported by the North American Cartographic Information Society, NED hosts data on a global scale and should be your first stop for beautiful GIS map making.
Best for: High spatial resolution vector data
OpenStreetMap (OSM) offers high spatial resolution vector data. What differentiates OSM from other GIS data sources is that all the data on OSM is crowdsourced from cartographers and other GIS map makers.
This means there is a massive amount of detailed information available.
The downside of the crowdsource format is that nothing is vetted beforehand. Verifying data accuracy is difficult and some data sets are incomplete.
That said, most anecdotal evidence points to a high degree of accuracy. As the data is coming primarily from GIS professionals, it’s in everyone’s interest to upload quality data that increases the quality of the database as a whole.
Best for: Remote sensing data
USGS EarthExplorer is easily one of the most comprehensive sources for remote sensing data, i.e., data collected by satellites or high-flying aircraft.
With one of the more user-friendly search functions and the ability to download in bulk, USGS is an invaluable resource for cartographers in need of satellite or aerial data.
Best for: LiDAR data
OpenTopography is one of the few online resources where you can download full LiDAR datasets. It is not globally comprehensive, with around 90% of the resources focusing on the United States, Canada, Australia, Brazil, Haiti, Mexico, and Puerto Rico.
That said, in the world of GIS, LiDAR data is a scarce and precious resource. So despite the limitations, these data sets can be invaluable for people whose projects focus on those countries.
Best for: Global satellite
NASA Earth Observations (NEO) is another resource that focuses on remote sensing data. This resource is unique because the data is climate and environment centered, making atmosphere, land, oceans, energy, and human life data more accessible.
In addition, these resources are updated quite consistently (ensuring greater accuracy) and are available in a multitude of formats: JPEG, PNG, KML, and GeoTIFF.