Indexing scripts

Ernesto Lowy edited this page May 4, 2021 · 9 revisions

This page shows how to run the scripts that generate two types of indexes: sequence indexes and analysis indexes.

Sequence indexes

The script used to create a sequence index is named create_seq_index.py. It is run as follows:

create_seq_index.py --studies SRP000031 --output 1000genomes_pilot1.sequence.index -s settings.ini --analysis_group malaria_low

Where:

  • --studies: the ENA study accession id to include in the index (to include several studies, pass their ids as a comma-separated list)
  • --output: is the file name given to the new index
  • -s: file with the configuration settings to run this script
  • --analysis_group: analysis_group name (will appear in the last column of the index)
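When several studies go into one index, the comma-separated value for --studies can be assembled programmatically. The following is a minimal sketch, not part of the scripts themselves: the helper name build_seq_index_cmd is hypothetical, and SRP000032 is a made-up second accession used only for illustration. It uses only the options documented above.

```python
import shlex


def build_seq_index_cmd(studies, output, settings, analysis_group):
    """Return the argument list for a create_seq_index.py call.

    `studies` is a list of ENA study accession ids; they are joined
    with commas because --studies accepts a comma-separated list.
    """
    return [
        "create_seq_index.py",
        "--studies", ",".join(studies),
        "--output", output,
        "-s", settings,
        "--analysis_group", analysis_group,
    ]


cmd = build_seq_index_cmd(
    ["SRP000031", "SRP000032"],  # SRP000032 is a hypothetical example id
    "1000genomes_pilot1.sequence.index",
    "settings.ini",
    "malaria_low",
)

# Print the command line for inspection before running it.
print(shlex.join(cmd))

# To actually run it (assumes create_seq_index.py is on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a file name contains spaces.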

Note:

If you see a log message similar to:

INFO:__main__:No population defined for SAMEA2031116. Will be set to 'NA'

It means that this particular sample accession id (SAMEA2031116) does not have a population defined in the ENA and that the population column will be set to NA for this index record. The population information for these samples can be added later using a helper script named add_missing_pop.py.

This script is run as follows:

add_missing_pop.py -i 1000genomes_pilot1.sequence.index --host mysql-igsr-web -u g1kro -P 4641 -d igsr_website_v2 --output 1000genomes_pilot1.sequence.new.index

Where:

  • -i: is the sequence index generated using create_seq_index.py without population information
  • --host, -u, -P and -d: the connection details for the MySQL IGSR website database containing the population information for the relevant samples
  • --output: name of the new index file with population information
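Before running add_missing_pop.py, it can be useful to know which samples were reported without a population. The sketch below assumes the output of create_seq_index.py was captured as text lines and that the message has exactly the format shown in the note above. The function name samples_missing_population is hypothetical and is not part of the scripts.

```python
import re

# Matches the message shown above and captures the sample accession id.
MISSING_POP = re.compile(
    r"No population defined for (\S+)\. Will be set to 'NA'"
)


def samples_missing_population(log_lines):
    """Return the sample accession ids reported without a population.

    Ids are returned in order of first appearance, without duplicates.
    """
    found = []
    for line in log_lines:
        match = MISSING_POP.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


example_log = [
    "INFO:__main__:No population defined for SAMEA2031116. Will be set to 'NA'",
    "INFO:__main__:some unrelated message",  # ignored: does not match
]
print(samples_missing_population(example_log))
```

If the list comes back empty, the index already has a population for every sample and add_missing_pop.py is not needed.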

Analysis index

The script used to create an analysis index is named create_analysis_index.py. It is run as follows:

create_analysis_index.py --studies ERP124807 --output bionano.analysis.index -s settings.ini

Where:

  • --studies: the ENA study accession id to include in the index (to include several studies, pass their ids as a comma-separated list)
  • --output: is the file name given to the new index
  • -s: file with the configuration settings to run this script