

US National Address Database? #7244

Open
jwass opened this issue May 21, 2024 · 3 comments

Comments

@jwass
Contributor
jwass commented May 21, 2024

Opening this issue to explore ingesting the US National Address Database (NAD) - https://www.transportation.gov/gis/national-address-database.

Open Addresses might currently have most (or all?) of the underlying sources that make up the NAD but there are reasons to ingest it and for downstream users to want to use it:

  • It is offered under a single license which reduces the burden of checking each individual source
  • It normalizes fields into a common schema rather than each source having its own, and it populates some fields that underlying sources may be missing. For example, some towns don't populate their "city" field, so it needs to be added on a dataset-by-dataset basis, which the NAD has already done
  • If we can align more people on using the NAD it can hopefully entice more organizations to participate and use it (okay, I admit this one is mostly vibes)

I think it's a CSV of around 32 GB. I'm not sure whether that would pose a problem for the existing batch infrastructure, but I'm happy to get the data in here. I figured I'd open this issue first to see if Open Addresses folks are aligned before going too far down that path.
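For a sense of what handling a CSV of that size can look like, here is a minimal sketch that streams rows one at a time instead of loading the file into memory. It is only an illustration under stated assumptions: the column names (`number`, `street`, `city`, `state`) and the helper `count_rows_by_field` are hypothetical and are not taken from the NAD schema or from the OpenAddresses batch code.

```python
import csv
import io
from collections import Counter


def count_rows_by_field(lines, field):
    """Tally the values of one column while streaming rows one at a time.

    `lines` is any iterable of text lines (an open file works), so memory
    use stays roughly constant regardless of file size.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        # Treat a missing or empty value as "" so gaps are visible in the tally.
        counts[row.get(field) or ""] += 1
    return counts


# Tiny in-memory stand-in for the real file; column names are hypothetical.
sample = io.StringIO(
    "number,street,city,state\n"
    "1,Main St,Springfield,IL\n"
    "2,Main St,,IL\n"
    "3,Oak Ave,Madison,WI\n"
)
state_counts = count_rows_by_field(sample, "state")

sample.seek(0)
city_counts = count_rows_by_field(sample, "city")
```

With a real file, the same function could be called as `count_rows_by_field(open(path, newline=""), "state")`; the empty-string key in `city_counts` shows how many rows lack a city value.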

@iandees
Member
iandees commented May 23, 2024

Thanks for bringing this up, @jwass.

In the past, I've resisted including NAD in OpenAddresses for a couple reasons:

  1. OpenAddresses prefers data from "primary sources", not aggregations. We want to get the raw data as much as possible so that any data manipulation that happens (beyond what is documented by the OpenAddresses source file/conform) is done by the consumer of OpenAddresses.
  2. Most of the data included in NAD is already available in OpenAddresses and would add duplication, potentially making life more difficult for data consumers. There is already duplication between local and state-level sources in OA, but it's relatively limited.

Your points about simplified licensing and extra data enrichment are pushing me in the other direction, though. I think we should add it to OA.

I can get started building a source file later this week unless you're interested in trying it yourself.

@jwass
Contributor Author
jwass commented May 23, 2024

@iandees Thanks. Would be great if I can give it a shot - I'm sure I'll have questions.

@andrewharvey
Contributor

This sounds similar to the GNAF source we have for au/countrywide, which is aggregated national data from each state/territory in Australia. Indeed, in some regions of Australia we have overlapping data from 3 levels of government loaded into OA.

Overall I think it's much better to include everything in OA, even if it's duplicated.

I'm not a consumer of the OA global build, and I'm not sure whether most users consume the global build or pick out individual sources. If there is demand, though, we could consider producing a "best" global build that skips some of the duplicated region sources, in parallel with a "full" global build that includes everything and leaves deduplication to data consumers.
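The "best" versus "full" split could be sketched roughly as below. This is a hedged illustration, not how OA builds work today: the function `plan_builds`, the hand-curated skip set, and every source name other than `au/countrywide` are hypothetical, and which sources get skipped is a policy choice the thread does not settle.

```python
def plan_builds(sources, skip_in_best):
    """Return (full, best) source lists.

    full: every source, leaving deduplication to data consumers.
    best: the same list minus sources flagged as duplicating another source.
    """
    full = list(sources)
    best = [s for s in sources if s not in skip_in_best]
    return full, best


# Hypothetical source names, apart from au/countrywide.
sources = [
    "au/countrywide",
    "au/nsw/statewide",
    "au/nsw/city_of_sydney",
]

# Hypothetical policy: keep the national aggregate and skip the overlapping
# state and city sources. The opposite policy is equally possible.
skip_in_best = {"au/nsw/statewide", "au/nsw/city_of_sydney"}

full, best = plan_builds(sources, skip_in_best)
```

Keeping the skip list as explicit data rather than inferring overlap automatically makes the choice reviewable source by source.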
