US National Address Database? #7244

jwass · 2024-05-21T20:31:11Z

Opening this issue to explore ingesting the US National Address Database (NAD) - https://www.transportation.gov/gis/national-address-database.

Open Addresses might currently have most (or all?) of the underlying sources that make up the NAD but there are reasons to ingest it and for downstream users to want to use it:

It is offered under a single license, which reduces the burden of checking each individual source.
It normalizes fields into a common schema rather than each source having its own, and it populates some fields that underlying sources may be missing. For example, some towns don't populate their "city" field, so it has to be added on a dataset-by-dataset basis, which the NAD has already done.
If we can align more people on using the NAD, it can hopefully entice more organizations to participate and use it (okay, I admit this one is mostly vibes).

I think it's a 32 GB (or around there) CSV. I'm not sure if that would pose a problem to existing batch infrastructure, but I'm happy to get the data in here. But I figured I'd open this issue to see if Open Addresses folks are aligned before going too far down that path.
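
For what it's worth, a file that size doesn't have to fit in memory to be inspected. A minimal sketch, assuming the NAD download is a plain UTF-8 CSV with a header row; the `State` column name and the `count_rows_by_column` helper are hypothetical, for illustration only, and are not the confirmed NAD schema or anything from the OpenAddresses batch code:

```python
import csv
from collections import Counter


def count_rows_by_column(path, column):
    """Stream a CSV row by row and count rows per value of `column`.

    Only one row is held in memory at a time, so file size is not a
    memory concern. Rows where the column is absent or empty are
    counted under the empty string.
    """
    counts = Counter()
    # newline="" is the mode the csv module recommends for reading files.
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row.get(column) or ""] += 1
    return counts
```

Something like `count_rows_by_column("nad.csv", "State")` (hypothetical file name) would give a rough per-state row count to compare against what the existing sources already cover.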

iandees · 2024-05-23T04:32:32Z

Thanks for bringing this up, @jwass.

In the past, I've resisted including NAD in OpenAddresses for a couple reasons:

OpenAddresses prefers data from "primary sources", not aggregations. We want to get the raw data as much as possible so that any data manipulation that happens (beyond what is documented by the OpenAddresses source file/conform) is done by the consumer of OpenAddresses.
Most of the data included in NAD is already available in OpenAddresses and would add duplication, potentially making life more difficult for data consumers. There is already duplication between local and state-level sources in OA, but it's relatively limited.

Your points about simplified licensing and extra data enrichment are pushing me in the other direction, though. I think we should add it to OA.

I can get started building a source file later this week unless you're interested in trying it yourself.

jwass · 2024-05-23T14:01:49Z

@iandees Thanks. Would be great if I can give it a shot - I'm sure I'll have questions.

andrewharvey · 2024-06-04T04:36:31Z

This sounds similar to the GNAF source we have for au/countrywide, which is aggregated national data from each state/territory in Australia. Indeed, for some regions in Australia we have overlapping data from 3 levels of government loaded into OA.

Overall I think it's much better to include everything in OA, even if it's duplicated.

I'm not a consumer of the OA global build, and not sure if most users are consuming the global build or picking out individual sources to consume, but if there is demand we could consider producing a "best" global build that skips some of the duplicated region sources in parallel to a "full" global build that includes everything and leaves deduplication to data consumers.
