Event Timeline
```
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 250, in _run
    raw_ret = runner.run()
  File "/srv/deployment/spicerack/cookbooks/sre/wdqs/data-reload.py", line 268, in run
    self.preparation_step.run()
  File "/srv/deployment/spicerack/cookbooks/sre/wdqs/data-reload.py", line 458, in run
    self._extract_from_hdfs(tmpdir)
  File "/srv/deployment/spicerack/cookbooks/sre/wdqs/data-reload.py", line 415, in _extract_from_hdfs
    size = self._get_dump_size_from_hdfs()
  File "/srv/deployment/spicerack/cookbooks/sre/wdqs/data-reload.py", line 408, in _get_dump_size_from_hdfs
    return int(re.sub(r"^(\d+)\s+.*$", next(lines), r"\1"))
ValueError: invalid literal for int() with base 10: '\\1'
```
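The underlying bug is an argument-order mistake: `re.sub(pattern, repl, string)` takes the replacement second and the subject string third, but the call passes `next(lines)` as the replacement and `r"\1"` as the string. Since the pattern doesn't match `r"\1"`, `re.sub` returns it unchanged and `int()` chokes on the literal `\1`. A minimal sketch (the sample `hdfs dfs -du -s`-style output line is illustrative):

```python
import re

# Illustrative `hdfs dfs -du -s`-style output line: "<bytes>  <path>"
line = "289435916  hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069"

# Buggy call from the traceback: the subject line and the replacement are
# swapped, so re.sub() substitutes inside the string r"\1" (no match, so it
# is returned unchanged) and int() raises ValueError on '\1'.
# int(re.sub(r"^(\d+)\s+.*$", line, r"\1"))

# Correct argument order: re.sub(pattern, repl, string)
size = int(re.sub(r"^(\d+)\s+.*$", r"\1", line))
```

A simpler alternative here would be `int(line.split()[0])`, which avoids the regex entirely.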
@RKemper for testing, I created a smaller folder at hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/. It has only two chunks, which should help us iterate a bit faster on this. The command becomes:
```
cookbook sre.wdqs.data-reload \
  --task-id T349069 \
  --reason "Test wdqs reload based on HDFS" \
  --reload-data wikidata_full \
  --from-hdfs hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ \
  --stat-host stat1009.eqiad.wmnet \
  wdqs2023.codfw.wmnet
```
I checked out your changes in my home directory on cumin2002 so I could test them.
```
brouberol@cumin2002:~$ test-cookbook --change 1038904
INFO:__main__:Change exists in project operations/cookbooks with latest patch set being 24
INFO:__main__:Setting up Cookbooks change 1038904 patch set 24 for testing
INFO:__main__:Checkout of change 1038904 not found, cloning the repo
INFO:__main__:Executing command /usr/bin/git clone --depth 10 https://gerrit.wikimedia.org/r/operations/cookbooks /home/brouberol/cookbooks_testing/cookbooks-1038904
Cloning into '/home/brouberol/cookbooks_testing/cookbooks-1038904'...
remote: Counting objects: 296, done
remote: Finding sources: 100% (296/296)
remote: Getting sizes: 100% (257/257)
remote: Compressing objects: 100% (642787/642787)
remote: Total 296 (delta 54), reused 147 (delta 29)
Receiving objects: 100% (296/296), 310.23 KiB | 2.75 MiB/s, done.
Resolving deltas: 100% (54/54), done.
INFO:__main__:Executing command /usr/bin/git -C /home/brouberol/cookbooks_testing/cookbooks-1038904 status --porcelain
INFO:__main__:No local modification found, fetching change from Gerrit
INFO:__main__:Executing command /usr/bin/git -C /home/brouberol/cookbooks_testing/cookbooks-1038904 fetch https://gerrit.wikimedia.org/r/operations/cookbooks refs/changes/04/1038904/24
remote: Counting objects: 8552, done
remote: Finding sources: 100% (8552/8552)
remote: Getting sizes: 100% (1572/1572)
remote: Compressing objects: 100% (29460/29460)
remote: Total 8552 (delta 5684), reused 8537 (delta 5679)
Receiving objects: 100% (8552/8552), 1.90 MiB | 9.47 MiB/s, done.
Resolving deltas: 100% (5684/5684), done.
From https://gerrit.wikimedia.org/r/operations/cookbooks
 * branch            refs/changes/04/1038904/24 -> FETCH_HEAD
INFO:__main__:Executing command /usr/bin/git -C /home/brouberol/cookbooks_testing/cookbooks-1038904 rev-parse --verify change-1038904-24
fatal: Needed a single revision
INFO:__main__:Checking out the patch set into branch change-1038904-24
INFO:__main__:Executing command /usr/bin/git -C /home/brouberol/cookbooks_testing/cookbooks-1038904 checkout -b change-1038904-24 FETCH_HEAD
Switched to a new branch 'change-1038904-24'
INFO:__main__:==================================================
INFO:__main__:Executing: sudo cookbook -c /home/brouberol/cookbooks_testing/config.yaml
INFO:__main__:==================================================
#--- cookbooks args=[] ---#
[0/137] sre: SRE Cookbooks
q - Quit
h - Help
>>> q
brouberol@cumin2002:~$ sudo cookbook -c /home/brouberol/cookbooks_testing/config.yaml sre.wdqs.data-reload --task-id T349069 --reason "Test wdqs reload based on HDFS" --reload-data wikidata_full --from-hdfs hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ --stat-host stat1009.eqiad.wmnet wdqs2023.codfw.wmnet
Acquired lock for key /spicerack/locks/cookbooks/sre.wdqs.data-reload:wdqs2023.codfw.wmnet: {'concurrency': 1, 'created': '2024-06-12 08:14:58.809865', 'owner': 'brouberol@cumin2002 [3637379]', 'ttl': 2419200}
START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
Creating stat1009.eqiad.wmnet:/srv/analytics-search/wdqs_reload_temp_folder and setting analytics-search as owner
----- OUTPUT of 'mkdir -p /srv/an.../dumps_from_hdfs' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 1.39hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mkdir -p /srv/an.../dumps_from_hdfs'.
----- OUTPUT of 'chown -R analyti...load_temp_folder' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 1.37hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'chown -R analyti...load_temp_folder'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Extracting dumps from hdfs hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ to stat1009.eqiad.wmnet:/srv/analytics-search/wdqs_reload_temp_folder/reload.3637379.1718180098/dumps_from_hdfs
----- OUTPUT of 'sudo -u analytic...k-test-T349069/"' -----
289435916  hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069
================
PASS |██████████████████████████████| 100% (1/1) [00:02<00:00, 2.44s/hosts]
FAIL |                              |   0% (0/1) [00:02<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -u analytic...k-test-T349069/"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of 'set -o pipefail;..._hdfs' | tail -1' -----
19123612299264 3971724787712
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 1.40hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'set -o pipefail;..._hdfs' | tail -1'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
----- OUTPUT of 'sudo -u analytic...dumps_from_hdfs"' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:07<00:00, 7.46s/hosts]
FAIL |                              |   0% (0/1) [00:07<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -u analytic...dumps_from_hdfs"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Cleaning/creating target data wdqs2023.codfw.wmnet:/srv/dump/dumps_from_hdfs
----- OUTPUT of 'rm -rf /srv/dump/dumps_from_hdfs' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 2.98hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'rm -rf /srv/dump/dumps_from_hdfs'.
----- OUTPUT of 'mkdir -p /srv/dump' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 3.77hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mkdir -p /srv/dump'.
----- OUTPUT of 'test -d /srv/dump' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 3.83hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'test -d /srv/dump'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Copying dumps from stat1009.eqiad.wmnet:/srv/analytics-search/wdqs_reload_temp_folder/reload.3637379.1718180098/dumps_from_hdfs to wdqs2023.codfw.wmnet:/srv/dump/dumps_from_hdfs
About to transfer /srv/analytics-search/wdqs_reload_temp_folder/reload.3637379.1718180098/dumps_from_hdfs from stat1009.eqiad.wmnet to ['wdqs2023.codfw.wmnet']:['/srv/dump'] (289440003 bytes)
Cleaning up....
Cleaning up stat1009.eqiad.wmnet:/srv/analytics-search/wdqs_reload_temp_folder/reload.3637379.1718180098/dumps_from_hdfs
----- OUTPUT of 'find /srv/analyt...*.gz' | xargs rm' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 1.28hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'find /srv/analyt...*.gz' | xargs rm'.
----- OUTPUT of 'rmdir /srv/analy.../dumps_from_hdfs' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 1.40hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'rmdir /srv/analy.../dumps_from_hdfs'.
----- OUTPUT of 'rmdir /srv/analy...37379.1718180098' -----
================
PASS |██████████████████████████████| 100% (1/1) [00:00<00:00, 1.40hosts/s]
FAIL |                              |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'rmdir /srv/analy...37379.1718180098'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
Exception raised while executing cookbook sre.wdqs.data-reload:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 250, in _run
    raw_ret = runner.run()
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 270, in run
    self.preparation_step.run()
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 492, in run
    self._transfer_dump(tmpdir)
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 467, in _transfer_dump
    ret = transfer.run()
  File "/usr/lib/python3/dist-packages/transferpy/Transferer.py", line 584, in run
    port = firewall_handler.open(self.source_host, self.options['port'])
KeyError: 'port'
Released lock for key /spicerack/locks/cookbooks/sre.wdqs.data-reload:wdqs2023.codfw.wmnet: {'concurrency': 1, 'created': '2024-06-12 08:14:58.809865', 'owner': 'brouberol@cumin2002 [3637379]', 'ttl': 2419200}
END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
```
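This second failure is transferpy indexing `self.options['port']` directly while the options dict the cookbook builds for `Transferer` has no `port` key, so the run will crash at the transfer step until that key is supplied. A generic sketch of the defensive pattern on the caller side (the `DEFAULT_OPTIONS` contents, and `0` meaning "auto-pick a port", are illustrative assumptions, not transferpy's actual defaults):

```python
# Merge caller-supplied options over explicit defaults before handing the
# dict to a consumer that indexes keys directly (as transferpy does with
# self.options['port']). Defaults below are hypothetical, for illustration.
DEFAULT_OPTIONS = {"port": 0}  # 0 = "auto-pick a port" (assumed semantics)

def with_defaults(options):
    """Return a new options dict in which every default key is present."""
    merged = dict(DEFAULT_OPTIONS)
    merged.update(options or {})
    return merged
```

With this, e.g. `with_defaults({"verbose": True})` is guaranteed to contain a `port` entry, so a downstream `options['port']` lookup cannot raise `KeyError`.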
```
[extract_kafka_timestamp_from_sparql] found null
Exception raised while executing cookbook sre.wdqs.data-reload:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 250, in _run
    raw_ret = runner.run()
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 274, in run
    self._reload_wikibase()
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 317, in _reload_wikibase
    self.postload_step.run()
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 344, in run
    timestamp = self._extract_kafka_timestamp_from_sparql()
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 366, in _extract_kafka_timestamp_from_sparql
    return parse_iso_dt(timestamp)
  File "/home/brouberol/cookbooks_testing/cookbooks/cookbooks/sre/wdqs/data-reload.py", line 612, in parse_iso_dt
    dt = datetime.fromisoformat(timestamp)
ValueError: Invalid isoformat string: 'null'
```
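Here the post-load step queried SPARQL for the kafka timestamp before one had been written, so the query returned the literal string 'null', which `datetime.fromisoformat` rejects. A small sketch of a tolerant variant of the cookbook's `parse_iso_dt` (returning `None` for a missing timestamp is my assumption about the desired behaviour, not what the cookbook currently does):

```python
from datetime import datetime
from typing import Optional

def parse_iso_dt(timestamp: str) -> Optional[datetime]:
    """Parse an ISO-8601 timestamp, tolerating a missing value."""
    # The SPARQL result serializes an absent binding as the literal string
    # "null"; treat it (and empty input) as "no timestamp yet" rather than
    # letting datetime.fromisoformat() raise ValueError.
    if not timestamp or timestamp == "null":
        return None
    # Normalize a trailing "Z" to an explicit UTC offset, which
    # datetime.fromisoformat() only accepts natively from Python 3.11 on.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
```

The caller can then skip the post-load timestamp check when `None` comes back instead of failing the whole run.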
@RKemper I think we should now do a full import to measure how long it takes, so that we have a rough estimate to answer T367409.
For a full run we need to re-enable the updater on wdqs2023 (which I think will be handled by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1042965).
The command to run should be (using the latest dumps):
```
cookbook sre.wdqs.data-reload \
  --task-id T349069 \
  --reason "Test wdqs reload based on HDFS" \
  --reload-data wikidata_full \
  --from-hdfs hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ \
  --stat-host stat1009.eqiad.wmnet \
  wdqs2023.codfw.wmnet
```
We really want to use wdqs2023 because it is currently the only machine where I have deployed a quick backport of a fix for a problem in the loadData.sh script (https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/1042254).