Automated retrieval of consistent geographical metadata for microbiome research!

m-crown/OMEinfo

OMEinfo is an open-source bioinformatics tool designed to automate the retrieval of consistent geographical metadata for microbiome research. It provides an easy-to-use interface for researchers to obtain geographical metadata, including Köppen-Geiger climate classification, degree of rurality, population density, and fossil fuel CO2 emissions from user-provided location data. The tool aims to facilitate cross-study comparisons and promote reproducibility in microbiome research by adhering to the principles of FAIR and Open data.

Publication available now at Bioinformatics Advances: OMEinfo: Global Geographic Metadata for -omics Experiments. (Note: due to issues with the rendering of regular expressions in the article, the expressions printed in the paper do not reflect the intended patterns for latitude and longitude. The correct expressions are: Latitude: (^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?)$) and Longitude: (^[-+]?((1[0-7]\d(\.\d+)?)|([1-9]?\d(\.\d+)?)|180(\.0+)?)$))
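
As a quick illustration of the corrected expressions, the sketch below applies them with Python's `re` module. The patterns are copied from the note above; the helper names `is_valid_latitude` and `is_valid_longitude` are made up for this example and are not part of OMEinfo.

```python
import re

# Patterns copied verbatim from the note above.
LATITUDE_RE = re.compile(r"^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?)$")
LONGITUDE_RE = re.compile(
    r"^[-+]?((1[0-7]\d(\.\d+)?)|([1-9]?\d(\.\d+)?)|180(\.0+)?)$"
)


def is_valid_latitude(value: str) -> bool:
    """Return True if the string is a decimal latitude within [-90, 90]."""
    return LATITUDE_RE.fullmatch(value.strip()) is not None


def is_valid_longitude(value: str) -> bool:
    """Return True if the string is a decimal longitude within [-180, 180]."""
    return LONGITUDE_RE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    # Hypothetical coordinate strings, chosen to show accepted and rejected values.
    for lat, lon in [("54.97", "-1.61"), ("90.5", "10"), ("-33.9", "180.1")]:
        print(lat, lon, is_valid_latitude(lat), is_valid_longitude(lon))
```

Checking coordinates this way before uploading a file may help catch swapped or out-of-range values early.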

See here for a walkthrough of using OMEinfo with test data.

Features

  • Dash web application for user-friendly data upload and visualization
  • Custom Cloud Optimized GeoTIFF file hosted on Figshare for efficient data access
  • Integration with open data sources, such as Global Human Settlement Layer (GHSL)
  • Portable and lightweight Docker container for easy deployment
  • Adheres to FAIR and Open data principles for better reproducibility and collaboration

Installation

OMEinfo is provided as a Docker container and as a command line tool, either of which can be set up in a local environment or on cloud-based platforms. OMEinfo has been tested on Rocky Linux 8.8, Windows 10 22H2 (via WSL) and macOS 13.2.

Pre-built docker image

  1. Install Docker on your machine following the official installation guide. NOTE: If running on Windows, Docker will also require Windows Subsystem for Linux (WSL) to be installed - see the documentation here. You may also need to disable your firewall or configure it to allow WSL access to the internet.
  2. Pull the Docker image from Docker Hub: docker pull mattcrown/omeinfo:latest or docker pull mattcrown/omeinfo:1.1.0
  3. Run the Docker container: docker run -p 8050:8050 mattcrown/omeinfo:latest or docker run -p 8050:8050 mattcrown/omeinfo:1.1.0 (see Usage for more parameters when running the Docker container).

Build image from Source

  1. Install Docker on your machine following the official installation guide. NOTE: If running on Windows, Docker will also require Windows Subsystem for Linux (WSL) to be installed - see the documentation here. You may also need to disable your firewall or configure it to allow WSL access to the internet.
  2. Clone this repository: git clone https://github.com/m-crown/OMEinfo.git
  3. Navigate to the project app directory: cd OMEinfo/OMEinfo
  4. Build the Docker image: docker build -t omeinfo . Note: you may need to prefix this command with sudo.
  5. Run the Docker container: docker run -p 8050:8050 omeinfo (see Usage for more details)

Command Line Tool

  1. Install mamba.
  2. Clone this repository: git clone https://github.com/m-crown/OMEinfo.git
  3. Navigate to the project app directory: cd OMEinfo/OMEinfo
  4. Create a mamba environment from the conda_cli_requirements.yml file: mamba env create --file conda_cli_requirements.yml Note: the file conda_requirements.yml is used in the Docker container and installs into the base environment. It is not recommended to use this file for CLI usage.
  5. Activate the conda environment: mamba activate omeinfo
  6. Copy OMEinfo to the environment bin: cp omeinfo.py $CONDA_PREFIX/bin/
  7. Copy Rurality and Koppen-Geiger legends to bin: cp *.txt $CONDA_PREFIX/bin/

(see Usage for more details)

Usage

Dash app walkthrough with test data

  1. Run the Docker container:
    • For default mode: docker run -p 8050:8050 omeinfo or docker run -p 8050:8050 mattcrown/omeinfo:latest if you pulled the image from Docker Hub.
    • To specify a particular OMEinfo data version: docker run -p 8050:8050 -e OMEINFO_VERSION=<data_version> omeinfo where <data_version> may be 1.0.0 or 2.0.0
  2. Open the OMEinfo web application in your browser at http://localhost:8050.
  3. Upload a CSV or TSV file containing geolocation data (latitude and longitude) using the provided interface. A test addresses file is distributed with the OMEinfo GitHub repo, OMEinfo/test_data/test_addresses.tsv, which provides example locations covering a variety of possible annotations. Download this file or clone the repo to use it within the Docker app (or CLI). NOTE: if downloading the file, use this link for the raw file, and be aware that some browsers may add a .txt suffix to the file. Be sure to upload CSV or TSV files with a .csv or .tsv extension for compatibility.
  4. The application will retrieve the geographical metadata for the uploaded locations and display the results on a map and in a histogram.
  5. You can choose to display metadata features as the colour coding on the map and as the histogram's x-axis.
  6. A table with the processed data is also provided for further analysis.
  7. When finished using the OMEinfo app, stop the Docker container using docker stop <container_id_or_name>, where <container_id_or_name> is the ID or name of your running container. You can list running containers, with their IDs and names, using docker ps; the IMAGE column will show omeinfo if you built the image locally or mattcrown/omeinfo:latest if you pulled it from Docker Hub.
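
Step 3 above notes that some browsers append a .txt suffix to the downloaded test file, while the app expects a .csv or .tsv extension. A minimal sketch of how such a filename could be checked and repaired before upload is shown below; the function names are hypothetical and this is not code from OMEinfo itself.

```python
from pathlib import Path

# Extensions the walkthrough says the app accepts.
ACCEPTED_SUFFIXES = {".csv", ".tsv"}


def normalise_upload_name(filename: str) -> str:
    """Drop a trailing .txt that was appended after a .csv or .tsv extension."""
    path = Path(filename)
    inner_suffix = Path(path.stem).suffix.lower()
    if path.suffix.lower() == ".txt" and inner_suffix in ACCEPTED_SUFFIXES:
        return str(path.with_suffix(""))
    return filename


def is_accepted_upload(filename: str) -> bool:
    """Return True if the filename ends in .csv or .tsv (case-insensitive)."""
    return Path(filename).suffix.lower() in ACCEPTED_SUFFIXES


if __name__ == "__main__":
    name = normalise_upload_name("test_addresses.tsv.txt")
    print(name, is_accepted_upload(name))
```

The function only computes the corrected name; renaming the file on disk (for example with `Path.rename`) is left to the caller.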

[Image: The OMEinfo Dash App]

Command Line Tool walkthrough with test data

Running the command line tool requires only a single command. Assuming you want to analyse the test addresses file from the GitHub repo, and are currently in the directory containing this file, run the following command:

omeinfo.py --location_file test_addresses.tsv

When the command runs, it prints a summary of the samples to be processed and the versions of the CLI tool and data packet in use. On completion, it shows a table of the first 10 samples analysed and saves a file of annotated metadata to the current directory, together with a BibTeX file containing the citations needed to credit the data authors.

The full command line parameters are presented below:

usage: omeinfo.py [-h] [--location_file LOCATION_FILE] [--location LOCATION] [--data_version DATA_VERSION] [--source_data SOURCE_DATA] [--output_file OUTPUT_FILE] [--n_samples N_SAMPLES] [--quiet QUIET]

The OMEinfo command-line tool enables users to annotate geographical metadata, including Koppen climate classification, degree of rurality, population density, and fossil fuel CO2 emissions, from user-provided location data. The tool
offers options for selecting the data version and the data source. Annotations are stored in a specified output file in TSV format.

options:
  -h, --help            show this help message and exit
  --location_file LOCATION_FILE
                        file containing locations
  --location LOCATION   location in latitude,longitude EPSG:4326 format, input string in format 'sample,latitude,longitude'
  --data_version DATA_VERSION
                        version of data to use
  --source_data SOURCE_DATA
                        url to data or filepath to local version
  --output_file OUTPUT_FILE
                        name of output file
  --n_samples N_SAMPLES
                        number of output summary table samples to show in command line
  --quiet QUIET         suppress console output
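
For scripted use, the options above could be assembled programmatically. The sketch below builds an argument list from the flags shown in the usage text and formats a single sample in the 'sample,latitude,longitude' form expected by --location. The helper names are illustrative only, and --quiet is left out because the value it expects is not documented here.

```python
from typing import List

# Flags taken from the usage text above.
KNOWN_OPTIONS = (
    "location_file", "location", "data_version",
    "source_data", "output_file", "n_samples",
)


def format_location(sample: str, latitude: float, longitude: float) -> str:
    """Format one sample as the 'sample,latitude,longitude' string for --location."""
    return f"{sample},{latitude},{longitude}"


def build_omeinfo_command(**options: object) -> List[str]:
    """Return an argv-style list for omeinfo.py from keyword options."""
    unknown = sorted(set(options) - set(KNOWN_OPTIONS))
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(unknown)}")
    command = ["omeinfo.py"]
    for name in KNOWN_OPTIONS:
        value = options.get(name)
        if value is not None:
            command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    # Hypothetical sample name and coordinates.
    print(build_omeinfo_command(location=format_location("sample1", 54.97, -1.61)))
```

The resulting list could be passed to `subprocess.run(command, check=True)` from an environment where omeinfo.py is on the PATH.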

[Image: GIF of OMEinfo CLI processing data]

[Image: OMEinfo CLI on completion]

Running OMEinfo with locally stored geoTIFF files

By default, OMEinfo runs analyses with a version of the data packet stored in the cloud (currently, via Figshare). It is also possible to run OMEinfo using a locally stored version of the data packet, should the remote version become unavailable.

For the Dash app, build the image as normal or pull it from Docker Hub, then change directory to the location where the local copy of the data packet is stored. When running the container, add the following parameters before the image name:

docker run -p 8050:8050 -v $PWD:/data/ -e OMEINFO_URL=/data/[DATA_PACKET_HERE] mattcrown/omeinfo:latest

$PWD can also be replaced with the fully resolved path to the directory in which the data packet is stored on your machine. Replace [DATA_PACKET_HERE] with the filename(s) of the data packet files necessary for analysis. For example, if running the OMEinfo v2 data packet locally (a single file) the command would look like this:

docker run -p 8050:8050 -v $PWD:/data/ -e OMEINFO_URL=/data/omeinfo_v2.tif -e OMEINFO_VERSION=2.0.0 mattcrown/omeinfo:latest

and from the CLI tool:

omeinfo.py --data_version 2.0.0 --source_data omeinfo_v2.tif --location_file test_addresses.tsv

If running the OMEinfo v1 data packet, it would instead look like this:

docker run -p 8050:8050 -v $PWD:/data/ -e OMEINFO_URL=/data/rurpopkop_v1_cog.tif,/data/co2_v1_cog.tif,/data/no2_v1_cog.tif -e OMEINFO_VERSION=1.0.0 mattcrown/omeinfo:latest

and from the CLI tool (assuming you are currently in the directory with the data files and test data file):

omeinfo.py --data_version 1.0.0 --source_data rurpopkop_v1_cog.tif,co2_v1_cog.tif,no2_v1_cog.tif --location_file test_addresses.tsv

With the v1 data packet, it is important to specify files as a single comma-separated string, in the order RurPopKop file, CO2 file, NO2 file.
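
Because the order of the v1 files matters, it can help to build the comma-separated string from named entries rather than by hand. The following is a minimal sketch under that assumption; `V1_LAYER_ORDER` and `v1_source_data` are hypothetical names introduced for this example.

```python
from typing import Mapping

# Required order for the v1 data packet: RurPopKop, CO2, NO2.
V1_LAYER_ORDER = ("rurpopkop", "co2", "no2")


def v1_source_data(files: Mapping[str, str], prefix: str = "") -> str:
    """Join the three v1 files into one comma-separated string in the required order.

    `prefix` is prepended to each filename, e.g. "/data/" for the Docker mount.
    """
    missing = [layer for layer in V1_LAYER_ORDER if layer not in files]
    if missing:
        raise ValueError(f"missing v1 layer file(s): {', '.join(missing)}")
    return ",".join(prefix + files[layer] for layer in V1_LAYER_ORDER)


if __name__ == "__main__":
    files = {
        "no2": "no2_v1_cog.tif",
        "co2": "co2_v1_cog.tif",
        "rurpopkop": "rurpopkop_v1_cog.tif",
    }
    print(v1_source_data(files))                    # value for --source_data
    print(v1_source_data(files, prefix="/data/"))   # value for OMEINFO_URL
```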

Data Sources

Current: OMEinfo V2 dataset

| File Name | File URL | Description |
|---|---|---|
| omeinfo_v2.tif | Figshare | All data sources unified in a single WGS84 COG. Includes relative deprivation in addition to the V1 data sources. |

Past: OMEinfo V1 dataset

| File Name | File URL | Description |
|---|---|---|
| co2_v1_cog.tif | Figshare | Fossil Fuel CO2 Emissions |
| rurpopkop_v1_cog.tif | Figshare | Rurality, Population Density, and Koppen-Geiger Climate Classification |
| no2_v1_cog.tif | Figshare | Tropospheric NO2 Emissions |

For details on the process for the creation of the current data sources, see the explanation here

Spatial Extents

| Data Type | Upper Left | Lower Left | Upper Right | Lower Right |
|---|---|---|---|---|
| Rurality | -179.999, 89.091 | -179.999, -89.094 | 179.997, 89.091 | 179.997, -89.094 |
| Population Density | -179.999, 89.091 | -179.999, -89.094 | 179.997, 89.091 | 179.997, -89.094 |
| Koppen Geiger Climate Classification | -180.00, 90.00 | -180.00, -90.00 | 180.00, 90.00 | 180.00, -90.00 |
| Fossil Fuel CO2 Emissions | -180.00, 90.00 | -180.00, -90.00 | 180.00, 90.00 | 180.00, -90.00 |
| Tropospheric NO2 Emissions | -180.00, 90.00 | -180.00, -90.00 | 180.00, 90.00 | 180.00, -90.00 |
| Relative Deprivation | -180.00, 82.183 | -180.00, -55.983 | 179.816, 82.183 | 179.816, -55.983 |
| OMEinfo v2 Data Packet Combined | -180.00, 90.00 | -180.00, -89.998 | 179.996, 90.00 | 179.996, -89.998 |
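
The extents differ between layers, so a sample near the poles may fall inside some layers and outside others. The sketch below encodes the per-layer corner coordinates listed above as bounding boxes and reports which layers contain a given point. It is only a bounding-box check written for illustration; how OMEinfo itself treats points outside a layer is not described here, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# (min_lon, min_lat, max_lon, max_lat), taken from the spatial extents above.
EXTENTS: Dict[str, Tuple[float, float, float, float]] = {
    "Rurality": (-179.999, -89.094, 179.997, 89.091),
    "Population Density": (-179.999, -89.094, 179.997, 89.091),
    "Koppen Geiger Climate Classification": (-180.0, -90.0, 180.0, 90.0),
    "Fossil Fuel CO2 Emissions": (-180.0, -90.0, 180.0, 90.0),
    "Tropospheric NO2 Emissions": (-180.0, -90.0, 180.0, 90.0),
    "Relative Deprivation": (-180.0, -55.983, 179.816, 82.183),
}


def layers_covering(latitude: float, longitude: float) -> List[str]:
    """Return the names of layers whose bounding box contains the point."""
    return [
        name
        for name, (min_lon, min_lat, max_lon, max_lat) in EXTENTS.items()
        if min_lon <= longitude <= max_lon and min_lat <= latitude <= max_lat
    ]


if __name__ == "__main__":
    # Hypothetical high-latitude sample: north of the Relative Deprivation extent.
    print(layers_covering(85.0, 10.0))
```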

Citations

| Data Source | Citation | DOI / Source |
|---|---|---|
| Fossil Fuel CO2 emissions data | Tomohiro Oda, Shamil Maksyutov (2015), ODIAC Fossil Fuel CO2 Emissions Dataset (Version name: ODIAC2020b), Center for Global Environmental Research, National Institute for Environmental Studies | 10.17595/20170411.001 |
| Köppen-Geiger Climate Classification | Beck, H., Zimmermann, N., McVicar, T. et al. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci Data 5, 180214 (2018) | 10.1038/sdata.2018.214 |
| Population Density | Schiavina, Marcello; Freire, Sergio; MacManus, Kytt (2019): GHS population grid multitemporal (1975, 1990, 2000, 2015) R2019A. European Commission, Joint Research Centre (JRC) | European Commission |
| Rurality | Pesaresi, Martino; Florczyk, Aneta; Schiavina, Marcello; Melchiorri, Michele; Maffenini, Luca (2019): GHS settlement grid, updated and refined REGIO model 2014 in application to GHS-BUILT R2018A and GHS-POP R2019A, multitemporal (1975-1990-2000-2015), R2019A. European Commission, Joint Research Centre (JRC) | European Commission |
| Tropospheric NO2 Emissions data | Romahn, Pedergnana, Loyola, Apituley, Sneep and Veefkind (2022): Sentinel-5 Precursor/TROPOMI Level 2 Product User Manual: Cloud Properties | ESA Sentinel 5P |
| Relative Deprivation Index | NASA Socioeconomic Data and Applications Center (SEDAC) (2022) | SEDAC |

Download the current citations in BibTeX format.

Past citations can be found in BibTeX format in the citations directory of OMEinfo.

License

OMEinfo is released under the MIT License. By using OMEinfo, you agree to the terms and conditions of this license. See the LICENSE file in this repo for more information.

Support

If you encounter any issues or have questions about using OMEinfo, please create an issue on this repo.