Process all basic blocks to make the sighash checks 'greedy' #267

Open: wants to merge 76 commits into base: develop

Commits (76)
511b60e
Enable message ordering for pubsub exporter
medvedev1088 Dec 6, 2020
48f11fc
Add export to GCS and message ordering in pubsub
medvedev1088 Jan 9, 2021
ecc4484
Enable message ordering if topic name contains sorted
medvedev1088 Jun 22, 2021
513c124
make the contract service greedy for function sighashes
andrewmccall Aug 4, 2021
3a12dc8
Merge branch 'blockchain-etl:develop' into develop
andrewmccall Aug 4, 2021
af740dd
I think the greedy processing means that this limitation is removed
andrewmccall Aug 4, 2021
55a9371
Change log level to debug in eth_token_service.py
medvedev1088 Aug 8, 2021
90afaab
Suppress warning Symbolic Execution not available: No module named 'm…
medvedev1088 Aug 8, 2021
135a475
Merge pull request #268 from blockchain-etl/change_log_level_in_expor…
medvedev1088 Aug 8, 2021
104576d
Bump version
medvedev1088 Aug 8, 2021
cf80415
Add EIP-1559 columns
psych0xpomp Aug 9, 2021
260202b
Adding the actual fix...
andrewmccall Aug 10, 2021
42b96bc
adds support for non-mainnet in etl stream
drewwells Aug 13, 2021
25fc768
Merge pull request #269 from psych0xpomp/eip1559_columns
medvedev1088 Aug 15, 2021
629aed5
Update link to Travis CI
medvedev1088 Sep 26, 2021
e27abcb
make the contract service greedy for function sighashes
andrewmccall Aug 4, 2021
691ea84
I think the greedy processing means that this limitation is removed
andrewmccall Aug 4, 2021
3903119
Adding the actual fix...
andrewmccall Aug 10, 2021
29edffb
tests I wrote ages ago
andrewmccall Oct 11, 2021
1e7c12a
Fixed last broken one
andrewmccall Oct 11, 2021
d1822c5
Merge branch 'develop' of https://github.com/andrewmccall/ethereum-et…
andrewmccall Oct 11, 2021
6a2072d
Removed debug lines
andrewmccall Oct 11, 2021
fedf6e6
Export Contracts: Fix cli args
kunalmodi Nov 12, 2021
c2f24c6
Merge pull request #283 from kunalmodi/export_contracts_param
medvedev1088 Nov 12, 2021
54d9220
Bump version
medvedev1088 Nov 12, 2021
589cb06
Add note about states to docs
medvedev1088 Nov 26, 2021
e0ca8f9
Merge remote-tracking branch 'origin/develop' into develop
medvedev1088 Nov 26, 2021
de4380f
Added exporter for kafka
ayush3298 Dec 17, 2021
8f93376
Merge branch 'develop' into feature/pubsub_message_ordering
medvedev1088 Dec 19, 2021
b040858
Fix output validation in stream command
medvedev1088 Dec 19, 2021
967c1ad
Allow path in GCS item exporter
medvedev1088 Dec 19, 2021
eefffb0
Parameterize pubsub item exporter for batch params
medvedev1088 Dec 19, 2021
289b900
Update docs
medvedev1088 Dec 19, 2021
c4c9207
Merge pull request #290 from blockchain-etl/feature/pubsub_message_or…
medvedev1088 Dec 19, 2021
a2b6781
Bump version
medvedev1088 Dec 19, 2021
8df7d90
Resolved Conflicts
ayush3298 Dec 22, 2021
f593053
Added param helper for kafka
ayush3298 Dec 22, 2021
28acabe
Made kafka generic for output, now it can be in format of kafka/127.0…
ayush3298 Dec 23, 2021
0667b68
Fix tests
medvedev1088 Dec 23, 2021
1b9c078
Remove Python 3.5 support
medvedev1088 Dec 23, 2021
257da16
Fixed file name typo and used exporters
ayush3298 Dec 23, 2021
a582f73
Remove Python 3.5 support
medvedev1088 Dec 23, 2021
38c2c1b
Merge pull request #292 from blockchain-etl/fix_tests
medvedev1088 Dec 23, 2021
6bb0fff
Merge pull request #291 from ayush3298/develop
medvedev1088 Dec 24, 2021
e3b8363
Update docs
medvedev1088 Dec 24, 2021
75847dd
Bump version
medvedev1088 Dec 24, 2021
dba7adf
Add python 3.9 to tests
medvedev1088 Dec 24, 2021
2a17fb6
Merge pull request #294 from blockchain-etl/add_python39_to_tests
medvedev1088 Dec 24, 2021
114cd60
Fix travis ci timeout
medvedev1088 Dec 27, 2021
c6fbd10
Fix travis ci timeout
medvedev1088 Dec 27, 2021
7d47dd3
Merge pull request #295 from blockchain-etl/fix_timeout_travisci
medvedev1088 Dec 27, 2021
167b38b
Merge pull request #271 from numonedad/bugfix/poachain
medvedev1088 Jan 6, 2022
31fb4ef
Add POA support
medvedev1088 Jan 6, 2022
be1892d
Merge pull request #299 from blockchain-etl/poa_support
medvedev1088 Jan 6, 2022
2a9e468
Bump version
medvedev1088 Jan 6, 2022
69bb6f9
Update error message for tracing
medvedev1088 Jan 12, 2022
b772ec7
Update error message for tracing
medvedev1088 Jan 12, 2022
87f5e45
Update docs
medvedev1088 Jan 12, 2022
baa79e7
Add pip install --upgrade pip to travis
medvedev1088 Jan 17, 2022
37d89e9
Fix broken build
medvedev1088 Jan 20, 2022
5dea830
Move kafka dependency to extras
medvedev1088 Jan 20, 2022
0beebb1
Merge pull request #303 from blockchain-etl/fix_tests2
medvedev1088 Jan 20, 2022
a068973
bumpb python-dateutil
emlazzarin Feb 11, 2022
ed31940
Merge pull request #311 from emlazzarin/develop
medvedev1088 Feb 11, 2022
e82a86c
Limit python-dateutil major version in case of breaking changes
medvedev1088 Feb 11, 2022
65feed5
Merge pull request #313 from blockchain-etl/lib_version_upgrade
medvedev1088 Feb 11, 2022
c135afc
Bump version
medvedev1088 Feb 11, 2022
359c8f5
make the contract service greedy for function sighashes
andrewmccall Aug 4, 2021
1f508bd
I think the greedy processing means that this limitation is removed
andrewmccall Aug 4, 2021
1452399
Adding the actual fix...
andrewmccall Aug 10, 2021
dbead50
tests I wrote ages ago
andrewmccall Oct 11, 2021
baf338b
Fixed last broken one
andrewmccall Oct 11, 2021
c698c9f
make the contract service greedy for function sighashes
andrewmccall Aug 4, 2021
d561e32
Adding the actual fix...
andrewmccall Aug 10, 2021
5f49063
Removed debug lines
andrewmccall Oct 11, 2021
89f28bc
Merge branch 'develop' of https://github.com/andrewmccall/ethereum-et…
andrewmccall Feb 12, 2022
3 changes: 3 additions & 0 deletions .gitignore
@@ -47,3 +47,6 @@ coverage.xml
 .venv
 venv/
 ENV/
+
+# etl
+/last_synced_block.txt
6 changes: 3 additions & 3 deletions .travis.yml
@@ -2,15 +2,15 @@ language: python
 dist: xenial
 matrix:
   include:
-  - python: "3.5"
-    env: TOX_POSARGS="-e py35"
   - python: "3.6"
     env: TOX_POSARGS="-e py36"
   - python: "3.7"
     env: TOX_POSARGS="-e py37"
   - python: "3.8"
     env: TOX_POSARGS="-e py38"
+  - python: "3.9"
+    env: TOX_POSARGS="-e py39"
 install:
 - travis_retry pip install tox
 script:
-- tox $TOX_POSARGS
+- travis_wait tox $TOX_POSARGS
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
 # Ethereum ETL

-[![Build Status](https://travis-ci.com/blockchain-etl/ethereum-etl.png)](https://travis-ci.com/blockchain-etl/ethereum-etl)
+[![Build Status](https://app.travis-ci.com/blockchain-etl/ethereum-etl.svg?branch=develop)](https://travis-ci.com/github/blockchain-etl/ethereum-etl)
 [![Join the chat at https://gitter.im/ethereum-eth](https://badges.gitter.im/ethereum-etl.svg)](https://gitter.im/ethereum-etl/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 [![Telegram](https://img.shields.io/badge/telegram-join%20chat-blue.svg)](https://t.me/joinchat/GsMpbA3mv1OJ6YMp3T5ORQ)
 [![Discord](https://img.shields.io/badge/discord-join%20chat-blue.svg)](https://discord.gg/wukrezR)
111 changes: 111 additions & 0 deletions blockchainetl/jobs/exporters/gcs_item_exporter.py
@@ -0,0 +1,111 @@
# MIT License
#
# Copyright (c) 2020 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import json
import logging
from collections import defaultdict

from google.cloud import storage


def build_block_bundles(items):
    blocks = defaultdict(list)
    transactions = defaultdict(list)
    logs = defaultdict(list)
    token_transfers = defaultdict(list)
    traces = defaultdict(list)
    for item in items:
        item_type = item.get('type')
        if item_type == 'block':
            blocks[item.get('number')].append(item)
        elif item_type == 'transaction':
            transactions[item.get('block_number')].append(item)
        elif item_type == 'log':
            logs[item.get('block_number')].append(item)
        elif item_type == 'token_transfer':
            token_transfers[item.get('block_number')].append(item)
        elif item_type == 'trace':
            traces[item.get('block_number')].append(item)
        else:
            logging.info(f'Skipping item with type {item_type}')

    block_bundles = []
    for block_number in sorted(blocks.keys()):
        if len(blocks[block_number]) != 1:
            raise ValueError(f'There must be a single block for a given block number, was {len(blocks[block_number])} for block number {block_number}')
        block_bundles.append({
            'block': blocks[block_number][0],
            'transactions': transactions[block_number],
            'logs': logs[block_number],
            'token_transfers': token_transfers[block_number],
            'traces': traces[block_number],
        })

    return block_bundles


class GcsItemExporter:

    def __init__(
            self,
            bucket,
            path='blocks',
            build_block_bundles_func=build_block_bundles):
        self.bucket = bucket
        self.path = normalize_path(path)
        self.build_block_bundles_func = build_block_bundles_func
        self.storage_client = storage.Client()

    def open(self):
        pass

    def export_items(self, items):
        block_bundles = self.build_block_bundles_func(items)

        for block_bundle in block_bundles:
            block = block_bundle.get('block')
            if block is None:
                raise ValueError('block_bundle must include the block field')
            block_number = block.get('number')
            if block_number is None:
                raise ValueError('block_bundle must include the block.number field')

            destination_blob_name = f'{self.path}/{block_number}.json'

            bucket = self.storage_client.bucket(self.bucket)
            blob = bucket.blob(destination_blob_name)
            blob.upload_from_string(json.dumps(block_bundle))
            logging.info(f'Uploaded file gs://{self.bucket}/{destination_blob_name}')

    def close(self):
        pass


def normalize_path(p):
    if p is None:
        p = ''
    if p.startswith('/'):
        p = p[1:]
    if p.endswith('/'):
        p = p[:len(p) - 1]

    return p
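For review purposes, here is a minimal sketch of the bundling idea in `build_block_bundles`: flat items are grouped by block number, and each bundle becomes one `<path>/<block_number>.json` object. It is a simplified re-implementation that runs without `google-cloud-storage`; the names `bundle_by_block` and `CHILD_KEYS` are illustrative and not part of this PR.

```python
from collections import defaultdict

# Item types that hang off a block, mapped to their key in the bundle.
CHILD_KEYS = {
    'transaction': 'transactions',
    'log': 'logs',
    'token_transfer': 'token_transfers',
    'trace': 'traces',
}


def bundle_by_block(items):
    """Group a flat list of ETL items into one bundle per block number."""
    blocks = {}
    children = defaultdict(lambda: defaultdict(list))
    for item in items:
        item_type = item.get('type')
        if item_type == 'block':
            number = item['number']
            if number in blocks:
                raise ValueError(f'Duplicate block {number}')
            blocks[number] = item
        elif item_type in CHILD_KEYS:
            children[item['block_number']][CHILD_KEYS[item_type]].append(item)
    # Bundles come out in ascending block order; missing child types become [].
    return [
        {'block': blocks[n], **{key: children[n][key] for key in CHILD_KEYS.values()}}
        for n in sorted(blocks)
    ]


items = [
    {'type': 'transaction', 'block_number': 2, 'hash': '0xb'},
    {'type': 'block', 'number': 2},
    {'type': 'block', 'number': 1},
    {'type': 'log', 'block_number': 1, 'log_index': 0},
]
bundles = bundle_by_block(items)
# Object names as the exporter would build them for path='blocks'.
blob_names = [f"blocks/{b['block']['number']}.json" for b in bundles]
print(blob_names)
```

As in the exporter above, child items whose block is not in the batch are dropped, so a batch is expected to contain the block item for every block it touches.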
39 changes: 25 additions & 14 deletions blockchainetl/jobs/exporters/google_pubsub_item_exporter.py
@@ -29,9 +29,19 @@

 class GooglePubSubItemExporter:

-    def __init__(self, item_type_to_topic_mapping, message_attributes=('item_id', 'item_timestamp')):
+    def __init__(self, item_type_to_topic_mapping, message_attributes=(),
+                 batch_max_bytes=1024 * 5, batch_max_latency=1, batch_max_messages=1000,
+                 enable_message_ordering=False):
         self.item_type_to_topic_mapping = item_type_to_topic_mapping
-        self.publisher = create_publisher()
+
+        self.batch_max_bytes = batch_max_bytes
+        self.batch_max_latency = batch_max_latency
+        self.batch_max_messages = batch_max_messages
+
+        self.enable_message_ordering = enable_message_ordering
+
+        self.publisher = self.create_publisher()
+
         self.message_attributes = message_attributes

     def open(self):
@@ -46,7 +56,7 @@ def export_items(self, items):
             # details = "channel is in state TRANSIENT_FAILURE"
             # https://stackoverflow.com/questions/55552606/how-can-one-catch-exceptions-in-python-pubsub-subscriber-that-are-happening-in-i?noredirect=1#comment97849067_55552606
             logging.info('Recreating Pub/Sub publisher.')
-            self.publisher = create_publisher()
+            self.publisher = self.create_publisher()
             raise e

     @timeout_decorator.timeout(300)
@@ -66,7 +76,8 @@ def export_item(self, item):
             topic_path = self.item_type_to_topic_mapping.get(item_type)
             data = json.dumps(item).encode('utf-8')

-            message_future = self.publisher.publish(topic_path, data=data, **self.get_message_attributes(item))
+            ordering_key = 'all' if self.enable_message_ordering else ''
+            message_future = self.publisher.publish(topic_path, data=data, ordering_key=ordering_key, **self.get_message_attributes(item))
             return message_future
         else:
             logging.warning('Topic for item type "{}" is not configured.'.format(item_type))
@@ -80,15 +91,15 @@ def get_message_attributes(self, item):

         return attributes

-    def close(self):
-        pass
-
+    def create_publisher(self):
+        batch_settings = pubsub_v1.types.BatchSettings(
+            max_bytes=self.batch_max_bytes,
+            max_latency=self.batch_max_latency,
+            max_messages=self.batch_max_messages,
+        )

-def create_publisher():
-    batch_settings = pubsub_v1.types.BatchSettings(
-        max_bytes=1024 * 5,  # 5 kilobytes
-        max_latency=1,  # 1 second
-        max_messages=1000,
-    )
+        publisher_options = pubsub_v1.types.PublisherOptions(enable_message_ordering=self.enable_message_ordering)
+        return pubsub_v1.PublisherClient(batch_settings=batch_settings, publisher_options=publisher_options)

-    return pubsub_v1.PublisherClient(batch_settings)
+    def close(self):
+        pass
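The ordering behaviour in `export_item` can be sketched without a Pub/Sub connection. The snippet below is an illustration only: `RecordingPublisher` is a hypothetical stand-in for `pubsub_v1.PublisherClient`, and the free function `export_item` mirrors the method above in simplified form (no message attributes, no futures).

```python
import json


class RecordingPublisher:
    """Hypothetical stand-in for pubsub_v1.PublisherClient that records calls."""

    def __init__(self):
        self.calls = []

    def publish(self, topic, data, ordering_key='', **attrs):
        self.calls.append({'topic': topic, 'data': data, 'ordering_key': ordering_key})


def export_item(publisher, topic_mapping, item, enable_message_ordering=False):
    """Publish one item; returns False when no topic is configured for its type."""
    topic_path = topic_mapping.get(item.get('type'))
    if topic_path is None:
        return False
    # A single shared key orders every message on the topic relative to the others.
    ordering_key = 'all' if enable_message_ordering else ''
    publisher.publish(topic_path, data=json.dumps(item).encode('utf-8'), ordering_key=ordering_key)
    return True


publisher = RecordingPublisher()
mapping = {'block': 'projects/demo/topics/crypto_ethereum.blocks'}
export_item(publisher, mapping, {'type': 'block', 'number': 1}, enable_message_ordering=True)
export_item(publisher, mapping, {'type': 'block', 'number': 2})
skipped = export_item(publisher, mapping, {'type': 'trace'})
print([call['ordering_key'] for call in publisher.calls])
```

Using one constant key (`'all'`) keeps the whole topic in publish order, at the likely cost of publish parallelism; a per-block or per-type key would be an alternative design, though this PR does not propose one.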
54 changes: 54 additions & 0 deletions blockchainetl/jobs/exporters/kafka_exporter.py
@@ -0,0 +1,54 @@
import collections
import json
import logging

from kafka import KafkaProducer

from blockchainetl.jobs.exporters.converters.composite_item_converter import CompositeItemConverter


class KafkaItemExporter:

    def __init__(self, output, item_type_to_topic_mapping, converters=()):
        self.item_type_to_topic_mapping = item_type_to_topic_mapping
        self.converter = CompositeItemConverter(converters)
        self.connection_url = self.get_connection_url(output)
        print(self.connection_url)
        self.producer = KafkaProducer(bootstrap_servers=self.connection_url)

    def get_connection_url(self, output):
        try:
            return output.split('/')[1]
        # str.split returns a list, so a missing '/' raises IndexError, not KeyError.
        except IndexError:
            raise Exception('Invalid kafka output param, It should be in format of "kafka/127.0.0.1:9092"')

    def open(self):
        pass

    def export_items(self, items):
        for item in items:
            self.export_item(item)

    def export_item(self, item):
        item_type = item.get('type')
        if item_type is not None and item_type in self.item_type_to_topic_mapping:
            data = json.dumps(item).encode('utf-8')
            print(data)
            return self.producer.send(self.item_type_to_topic_mapping[item_type], value=data)
        else:
            logging.warning('Topic for item type "{}" is not configured.'.format(item_type))

    def convert_items(self, items):
        for item in items:
            yield self.converter.convert_item(item)

    def close(self):
        pass


def group_by_item_type(items):
    result = collections.defaultdict(list)
    for item in items:
        result[item.get('type')].append(item)

    return result
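A minimal sketch of parsing the `kafka/<host>:<port>` output string with explicit validation, offered as a possible alternative to indexing into `output.split('/')`. The helper name `parse_kafka_output` is hypothetical and not part of this PR.

```python
def parse_kafka_output(output):
    """Return the '<host>:<port>' part of a 'kafka/<host>:<port>' output string."""
    prefix, sep, address = output.partition('/')
    # Reject a wrong prefix, a missing slash, or an empty address up front.
    if prefix != 'kafka' or not sep or not address:
        raise ValueError(
            'Invalid kafka output param, expected the format "kafka/127.0.0.1:9092"')
    return address


address = parse_kafka_output('kafka/127.0.0.1:9092')
print(address)
```

Validating the prefix as well as the slash means a mistyped value such as `kafak/127.0.0.1:9092` fails immediately rather than when the producer tries to connect.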
42 changes: 42 additions & 0 deletions blockchainetl/jobs/exporters/multi_item_exporter.py
@@ -0,0 +1,42 @@
# MIT License
#
# Copyright (c) 2018 Evgeny Medvedev, evge.medvedev@gmail.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.


class MultiItemExporter:
    def __init__(self, item_exporters):
        self.item_exporters = item_exporters

    def open(self):
        for exporter in self.item_exporters:
            exporter.open()

    def export_items(self, items):
        for exporter in self.item_exporters:
            exporter.export_items(items)

    def export_item(self, item):
        for exporter in self.item_exporters:
            exporter.export_item(item)

    def close(self):
        for exporter in self.item_exporters:
            exporter.close()
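To illustrate the fan-out, the sketch below wires two in-memory exporters into a condensed copy of `MultiItemExporter`. `ListExporter` is a hypothetical test double, not something this PR adds.

```python
class ListExporter:
    """Hypothetical in-memory exporter used only for this illustration."""

    def __init__(self):
        self.items = []
        self.is_open = False

    def open(self):
        self.is_open = True

    def export_items(self, items):
        self.items.extend(items)

    def close(self):
        self.is_open = False


class MultiItemExporter:
    """Condensed copy of the class above so the snippet runs on its own."""

    def __init__(self, item_exporters):
        self.item_exporters = item_exporters

    def open(self):
        for exporter in self.item_exporters:
            exporter.open()

    def export_items(self, items):
        # Every exporter receives the same batch, in list order.
        for exporter in self.item_exporters:
            exporter.export_items(items)

    def close(self):
        for exporter in self.item_exporters:
            exporter.close()


first, second = ListExporter(), ListExporter()
exporter = MultiItemExporter([first, second])
exporter.open()
exporter.export_items([{'type': 'block', 'number': 1}])
opened = (first.is_open, second.is_open)
exporter.close()
print(first.items == second.items)
```

Because the loops have no error handling, an exception raised by one exporter stops the batch before later exporters see it; whether that is the desired behaviour for combined outputs may be worth discussing in review.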
2 changes: 2 additions & 0 deletions blockchainetl/logging_utils.py
@@ -7,3 +7,5 @@ def logging_basic_config(filename=None):
         logging.basicConfig(level=logging.INFO, format=format, filename=filename)
     else:
         logging.basicConfig(level=logging.INFO, format=format)
+
+    logging.getLogger('ethereum_dasm.evmdasm').setLevel(logging.ERROR)
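The effect of raising one named logger to `ERROR` can be checked in isolation. This is a small sketch using only the standard library; the `ethereumetl` logger name is used here purely as an example of an application logger.

```python
import logging

# Keep the application's own loggers at INFO.
logging.getLogger().setLevel(logging.INFO)

# Raise only the noisy third-party logger to ERROR, as the change above does.
noisy = logging.getLogger('ethereum_dasm.evmdasm')
noisy.setLevel(logging.ERROR)

warning_suppressed = not noisy.isEnabledFor(logging.WARNING)
errors_still_logged = noisy.isEnabledFor(logging.ERROR)
app_info_logged = logging.getLogger('ethereumetl').isEnabledFor(logging.INFO)
print(warning_suppressed, errors_still_logged, app_info_logged)
```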
8 changes: 6 additions & 2 deletions docs/commands.md
@@ -207,11 +207,15 @@ You can tune `--batch-size`, `--max-workers` for performance.
 - This command outputs blocks, transactions, logs, token_transfers to the console by default.
 - Entity types can be specified with the `-e` option,
 e.g. `-e block,transaction,log,token_transfer,trace,contract,token`.
-- Use `--output` option to specify the Google Pub/Sub topic or Postgres database where to publish blockchain data,
+- Use `--output` option to specify the Google Pub/Sub topic, Postgres database or GCS bucket where to publish blockchain data,
 - For Google PubSub: `--output=projects/<your-project>/topics/crypto_ethereum`.
 Data will be pushed to `projects/<your-project>/topics/crypto_ethereum.blocks`, `projects/<your-project>/topics/crypto_ethereum.transactions` etc. topics.
 - For Postgres: `--output=postgresql+pg8000://<user>:<password>@<host>:<port>/<database_name>`,
-e.g. `--output=postgresql+pg8000://postgres:admin@127.0.0.1:5432/ethereum`.
+e.g. `--output=postgresql+pg8000://postgres:admin@127.0.0.1:5432/ethereum`.
+- For GCS: `--output=gs://<bucket_name>`. Make sure to install and initialize `gcloud` cli.
+- For Kafka: `--output=kafka/<host>:<port>`, e.g. `--output=kafka/127.0.0.1:9092`
+- Those output types can be combined with a comma e.g. `--output=gs://<bucket_name>,projects/<your-project>/topics/crypto_ethereum`
+
 The [schema](https://github.com/blockchain-etl/ethereum-etl-postgres/tree/master/schema)
 and [indexes](https://github.com/blockchain-etl/ethereum-etl-postgres/tree/master/indexes) can be found in this
 repo [ethereum-etl-postgres](https://github.com/blockchain-etl/ethereum-etl-postgres).
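The `--output` formats documented above suggest a prefix-based dispatch. The following is a sketch under that assumption only; `classify_output` and `classify_outputs` are hypothetical names, and the code that actually selects an exporter is not shown in this part of the diff.

```python
def classify_output(output):
    """Map a single --output value to an exporter kind by its prefix."""
    if not output:
        return 'console'
    if output.startswith('projects/'):
        return 'pubsub'
    if output.startswith('postgresql'):
        return 'postgres'
    if output.startswith('gs://'):
        return 'gcs'
    if output.startswith('kafka/'):
        return 'kafka'
    raise ValueError(f'Unrecognized output: {output}')


def classify_outputs(output):
    """Split a comma-combined --output value and classify each part."""
    if not output:
        return ['console']
    return [classify_output(part.strip()) for part in output.split(',')]


kinds = classify_outputs('gs://my-bucket,projects/my-project/topics/crypto_ethereum')
print(kinds)
```

One caveat of comma-splitting: a Postgres URL whose password contains a comma would be split incorrectly, so such values would need escaping or a different separator.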
2 changes: 1 addition & 1 deletion docs/dockerhub.md
@@ -1,7 +1,7 @@
 # Uploading to Docker Hub

 ```bash
-ETHEREUMETL_VERSION=1.7.2
+ETHEREUMETL_VERSION=1.10.1
 docker build -t ethereum-etl:${ETHEREUMETL_VERSION} -f Dockerfile .
 docker tag ethereum-etl:${ETHEREUMETL_VERSION} blockchainetl/ethereum-etl:${ETHEREUMETL_VERSION}
 docker push blockchainetl/ethereum-etl:${ETHEREUMETL_VERSION}
6 changes: 4 additions & 2 deletions docs/exporting-the-blockchain.md
@@ -14,8 +14,10 @@ export the data ~40 times faster, you will need to set up a local Ethereum node:
 Make sure it downloaded the blocks that you need by executing `eth.syncing` in the JS console.
 You can export blocks below `currentBlock`,
 there is no need to wait until the full sync as the state is not needed (unless you also need contracts bytecode
-and token details; for those you need to wait until the full sync).
-
+and token details; for those you need to wait until the full sync). Note that you may need to wait for another day or
+two for the node to download the states. See this issue https://github.com/blockchain-etl/ethereum-etl/issues/265#issuecomment-970451522.
+Make sure to set `--txlookuplimit 0` if you use geth.
+
 1. Install Ethereum ETL: `> pip3 install ethereum-etl`

 1. Export all:
3 changes: 0 additions & 3 deletions docs/limitations.md
@@ -1,8 +1,5 @@
 # Limitation

-- In case the contract is a proxy, which forwards all calls to a delegate, interface detection doesn’t work,
-which means `is_erc20` and `is_erc721` will always be false for proxy contracts and they will be missing in the `tokens`
-table.
 - The metadata methods (`symbol`, `name`, `decimals`, `total_supply`) for ERC20 are optional, so around 10% of the
 contracts are missing this data. Also some contracts (EOS) implement these methods but with wrong return type,
 so the metadata columns are missing in this case as well.
6 changes: 5 additions & 1 deletion ethereumetl/cli/__init__.py
@@ -19,6 +19,10 @@
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
+
+from blockchainetl.logging_utils import logging_basic_config
+logging_basic_config()
+
 import click

 from ethereumetl.cli.export_all import export_all
@@ -44,7 +48,7 @@


 @click.group()
-@click.version_option(version='1.7.2')
+@click.version_option(version='1.10.1')
 @click.pass_context
 def cli(ctx):
     pass