Commit

episode 2
fnattino committed Jul 25, 2023
1 parent 81d5efe commit e99670e
Showing 1 changed file with 72 additions and 65 deletions.
137 changes: 72 additions & 65 deletions episodes/02-intro-vector-data.md
@@ -2,16 +2,17 @@
title: "Introduction to Vector Data"
teaching: 10
exercises: 5
questions:
- "What are the main attributes of vector data?"
objectives:
- "Describe the strengths and weaknesses of storing data in vector format."
- "Describe the three types of vectors and identify types of data that would be stored in each."
keypoints:
- "Vector data structures represent specific features on the Earth's surface along with attributes of those features."
- "Vector objects are either points, lines, or polygons."
---

:::questions
- What are the main attributes of vector data?
:::

:::objectives
- Describe the strengths and weaknesses of storing data in vector format.
- Describe the three types of vectors and identify types of data that would be stored in each.
:::

## About Vector Data

Vector data structures represent specific features on the Earth's surface, and
@@ -20,10 +21,7 @@ locations (x, y values) known as vertices that define the shape of the spatial
object. The organization of the vertices determines the type of vector that we
are working with: point, line or polygon.

![Types of vector objects](../fig/E02-01-pnt_line_poly.png)

Image Source: National Ecological Observatory Network (NEON)
{: .text-center}
![Types of vector objects (Image Source: National Ecological Observatory Network (NEON))](../fig/E02-01-pnt_line_poly.png)

* **Points:** Each point is defined by a single x, y coordinate. There can be
many points in a vector point file. Examples of point data include: sampling
@@ -38,36 +36,40 @@ vertex that has a defined x, y location.
closed. The outlines of survey plot boundaries, lakes, oceans, and states or
countries are often represented by polygons.
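
The three vector types can be sketched in plain Python as sequences of (x, y) vertices. This is only an illustration under simple assumptions: the coordinates are made up, and the helper names `is_closed` and `ring_area` are hypothetical rather than part of any GIS library. It shows why a closed polygon has a defined area while an open line does not.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float]

# A point is a single (x, y) vertex.
point: Coordinate = (2.0, 3.0)

# A line is an ordered sequence of at least two connected vertices.
line: List[Coordinate] = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

# A polygon is three or more vertices joined into a closed ring:
# the last vertex repeats the first one.
polygon: List[Coordinate] = [
    (0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0),
]


def is_closed(vertices: List[Coordinate]) -> bool:
    """Return True if the vertices form a closed ring."""
    return len(vertices) >= 4 and vertices[0] == vertices[-1]


def ring_area(vertices: List[Coordinate]) -> float:
    """Area enclosed by a closed ring, computed with the shoelace formula."""
    if not is_closed(vertices):
        raise ValueError("an open line has no defined area")
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )
    return abs(twice_area) / 2.0


print(is_closed(line))      # the line is open
print(ring_area(polygon))   # the 4 x 3 rectangle encloses an area of 12.0
```

In practice a geometry library would handle these checks; the sketch only makes the definitions above concrete.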

> ## Data Tip
>
> Sometimes boundary layers, such as states and countries, are stored as lines
> rather than polygons. However, these boundaries, when represented as lines,
> will not create a closed object with a defined area that can be filled.
{: .callout}

> ## Identify Vector Types
>
> The plot below includes examples of two of the three types of vector
> objects. Use the definitions above to identify which features
> are represented by which vector type.
>
> ![Vector Type Examples](../fig/E02-02-vector_types_examples.png)
>
> > ## Solution
> > State boundaries are polygons. The Fisher Tower location is
> > a point. There are no line features shown.
> {: .solution}
{: .challenge}

Vector data has some important advantages:
* The geometry itself contains information about what the dataset creator thought was important
* The geometry structures hold information in themselves - why choose point over polygon, for instance?
* Each geometry feature can carry multiple attributes instead of just one, e.g. a database of cities can have attributes for name, country, population, etc
* Data storage can be very efficient compared to rasters

:::callout
## Data Tip

Sometimes boundary layers, such as states and countries, are stored as lines
rather than polygons. However, these boundaries, when represented as lines,
will not create a closed object with a defined area that can be filled.
:::

:::challenge
## Identify Vector Types

The plot below includes examples of two of the three types of vector
objects. Use the definitions above to identify which features
are represented by which vector type.

![Vector Type Examples](../fig/E02-02-vector_types_examples.png)

::::solution
## Solution

State boundaries are polygons. The Fisher Tower location is
a point. There are no line features shown.
::::
:::

Vector data has some important advantages:
* The geometry itself contains information about what the dataset creator thought was important
* The geometry structures hold information in themselves - why choose point over polygon, for instance?
* Each geometry feature can carry multiple attributes instead of just one, e.g. a database of cities can have attributes for name, country, population, etc
* Data storage can be very efficient compared to rasters
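
To make the point about multiple attributes concrete, here is a minimal sketch of features that pair one geometry with several attributes. The dictionary layout, the `select` helper, and all names and numbers are placeholders invented for illustration, not real data or a specific file format.

```python
# Hypothetical city features: every value below is a placeholder.
cities = [
    {
        "geometry": {"type": "Point", "coordinates": (10.0, 50.0)},
        "attributes": {"name": "City A", "country": "Country X", "population": 120000},
    },
    {
        "geometry": {"type": "Point", "coordinates": (12.5, 48.0)},
        "attributes": {"name": "City B", "country": "Country X", "population": 45000},
    },
    {
        "geometry": {"type": "Point", "coordinates": (-3.0, 40.0)},
        "attributes": {"name": "City C", "country": "Country Y", "population": 310000},
    },
]


def select(features, predicate):
    """Keep the features whose attributes satisfy the predicate."""
    return [f for f in features if predicate(f["attributes"])]


# Filter on one attribute, then read another attribute from the result.
large = select(cities, lambda attrs: attrs["population"] > 100000)
names = [f["attributes"]["name"] for f in large]
print(names)
```

Each feature keeps its geometry and its attribute table row together, so a query on attributes returns features that can still be mapped.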

The downsides of vector data include:
* Potential loss of detail compared to raster
* Potential bias in datasets - what didn't get recorded?
* Calculations involving multiple vector layers need to do math on the
geometry as well as the attributes, so can be slow compared to raster math.

@@ -82,13 +84,13 @@ their features to real-world locations.

Like raster data, vector data can also come in many different formats. For this
workshop, we will use the Shapefile format. A Shapefile consists of multiple
files in the same directory, of which the `.shp`, `.shx`, and `.dbf` files are mandatory. Other non-mandatory but very important files are the `.prj` and `.shp.xml` files.

- The `.shp` file stores the feature geometry itself.
- The `.shx` file is a positional index of the feature geometry that allows quick forward and backward searching through the geographic coordinates of each vertex in the vector.
- The `.dbf` file contains the tabular attributes for each shape.
- The `.prj` file indicates the coordinate reference system (CRS).
- The `.shp.xml` file contains the Shapefile metadata.
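
Because a Shapefile is really a set of sibling files, it can be useful to check that the mandatory parts are present before trying to read it. The following is a minimal sketch using only the Python standard library; the function name `check_shapefile` and the path `data/roads.shp` are hypothetical.

```python
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".shp.xml")


def check_shapefile(shp_path):
    """Report which companion files exist next to a `.shp` file.

    Returns a dict mapping each extension to True/False, plus a list of
    the mandatory extensions that are missing.
    """
    base = Path(shp_path).with_suffix("")  # strip the trailing ".shp"
    found = {
        ext: Path(str(base) + ext).exists()
        for ext in MANDATORY + OPTIONAL
    }
    missing_mandatory = [ext for ext in MANDATORY if not found[ext]]
    return found, missing_mandatory


# Hypothetical path: replace it with the location of a real Shapefile.
found, missing = check_shapefile("data/roads.shp")
if missing:
    print("Missing mandatory files:", missing)
if not found[".prj"]:
    print("No .prj file: the coordinate reference system is unknown")
```

A missing `.prj` file does not stop the geometry from being read, but without it the coordinates cannot be reliably tied to real-world locations.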

Together, the Shapefile includes the following information:

@@ -105,26 +107,31 @@ individual shapefile can only contain one vector type (all points, all lines
or all polygons). You will not find a mixture of point, line and polygon
objects in a single shapefile.
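
One practical consequence of the single-type rule is that a mixed collection of features has to be split by geometry type before it can be written to shapefiles. Below is a small sketch of that idea, assuming each feature is a dictionary with a `"geometry"` entry that names its type. The feature layout and the helper names are illustrative assumptions, not a library API.

```python
def geometry_types(features):
    """Return the set of geometry type names present in the features."""
    return {f["geometry"]["type"] for f in features}


def fits_in_one_shapefile(features):
    """True if a non-empty collection contains exactly one geometry type."""
    return len(geometry_types(features)) == 1


def split_by_geometry_type(features):
    """Group features by geometry type, one group per future shapefile."""
    groups = {}
    for feature in features:
        groups.setdefault(feature["geometry"]["type"], []).append(feature)
    return groups


# Hypothetical mixed collection with placeholder coordinates.
mixed = [
    {"geometry": {"type": "Point", "coordinates": (1.0, 2.0)}},
    {"geometry": {"type": "LineString", "coordinates": [(0.0, 0.0), (1.0, 1.0)]}},
    {"geometry": {"type": "Point", "coordinates": (3.0, 4.0)}},
]

print(fits_in_one_shapefile(mixed))
groups = split_by_geometry_type(mixed)
print(sorted(groups))
```

Each group produced by `split_by_geometry_type` holds a single vector type and could therefore be stored in its own shapefile.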

> ## More Resources on Shapefiles
>
> More about shapefiles can be found on
> [Wikipedia.](https://en.wikipedia.org/wiki/Shapefile) Shapefiles are often publicly
> available from government services, such as [this page from the US Census Bureau][us-cb] or
> [this one from Australia's Data.gov.au website](https://data.gov.au/data/dataset?res_format=SHP).
{: .callout}

> ## Why not both?
>
> Very few formats can contain both raster and vector data - in fact, most are
> even more restrictive than that. Vector datasets are usually locked to one
> geometry type, e.g. points only. Raster datasets can usually only encode one
> data type, for example you can't have a multiband GeoTIFF where one layer is
> integer data and another is floating-point. There are sound reasons for this -
> format standards are easier to define and maintain, and so is metadata. The
> effects of particular data manipulations are more predictable if you are
> confident that all of your input data has the same characteristics.
{: .callout}
:::callout
## More Resources on Shapefiles

More about shapefiles can be found on
[Wikipedia.](https://en.wikipedia.org/wiki/Shapefile) Shapefiles are often publicly
available from government services, such as [this page from the US Census Bureau][us-cb] or
[this one from Australia's Data.gov.au website](https://data.gov.au/data/dataset?res_format=SHP).
:::

:::callout
## Why not both?

Very few formats can contain both raster and vector data - in fact, most are
even more restrictive than that. Vector datasets are usually locked to one
geometry type, e.g. points only. Raster datasets can usually only encode one
data type, for example you can't have a multiband GeoTIFF where one layer is
integer data and another is floating-point. There are sound reasons for this -
format standards are easier to define and maintain, and so is metadata. The
effects of particular data manipulations are more predictable if you are
confident that all of your input data has the same characteristics.
:::

[us-cb]: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

{% include links.md %}
:::keypoints
- Vector data structures represent specific features on the Earth's surface along with attributes of those features.
- Vector objects are either points, lines, or polygons.
:::
