Kafka-based storage for Zipkin.
                    +----------------------------*zipkin*---------------------------------------------------
                    |                            [ dependency-storage ]--->( dependencies )
                    |                                       ^                     +-->( autocomplete-tags )
( collected-spans )-|->[ partitioning ]  [ aggregation ]    |  [ trace-storage ]--+-->( traces )
  via http, kafka,  |        |                |      ^      |       ^      ^      +-->( service-names )
  amq, grpc, etc.   +--------|----------------|------|------|-------|------|----------------------------------
                             |                |      |      |       |      |
-----------------------------|----------------|------|------|-------|------|---------------------------------
                             +-->( spans )----|------+------|-------+      |
                                              |             |              |
        *kafka*                               +->( traces )-|--------------+
        topics                                |             |
                                              +-------------+->( dependencies )
-------------------------------------------------------------------------------------------------------
Spans collected via different transports are partitioned by traceId and stored in a partitioned spans Kafka topic. Partitioned spans are then aggregated into traces, and traces into dependency links; both results are emitted to Kafka topics as well. These 3 topics are used as sources for the local stores (Kafka Streams stores) that back Zipkin's query and search APIs.
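To illustrate the flow, here is a minimal Kafka Streams sketch, not this project's actual topology: the topic names, String serdes, and the string-append "trace" are placeholders. It groups spans that share a traceId key and emits the aggregate downstream.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;

public class TraceAggregationSketch {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("zipkin-spans", Consumed.with(Serdes.String(), Serdes.String()))
        // spans arrive keyed by traceId, so all spans of a trace share a partition
        .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
        // append each incoming span to the trace accumulated so far
        .reduce((trace, span) -> trace + "\n" + span)
        .toStream()
        // emit the aggregated trace; a real topology would also window traces
        // and derive dependency links from completed ones
        .to("zipkin-traces", Produced.with(Serdes.String(), Serdes.String()));
    System.out.println(builder.build().describe()); // print the topology
  }
}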
A limitation of the zipkin-dependencies module is that it must be scheduled to run at a fixed frequency. This batch-oriented execution leaves dependency links out of date until the next run.
Kafka-based storage enables aggregating dependencies as spans are received, allowing a (near-)real-time calculation of dependency metrics.
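The per-trace step is simple enough to sketch: once a trace is complete, each parent/child span pair whose services differ yields a dependency link that can be counted incrementally. The span layout below is illustrative, not this project's model.

import java.util.HashMap;
import java.util.Map;

public class DependencyLinkSketch {
  public static void main(String[] args) {
    // spans as {id, parentId, serviceName}; values are made up for the example
    String[][] trace = {
        {"a", null, "frontend"},
        {"b", "a", "backend"},
        {"c", "b", "db"},
    };

    Map<String, String> serviceById = new HashMap<>();
    for (String[] span : trace) serviceById.put(span[0], span[2]);

    // count caller->callee pairs; applying this as each trace completes is what
    // keeps dependency links (near-)real-time instead of batch-computed
    Map<String, Long> links = new HashMap<>();
    for (String[] span : trace) {
      String parent = span[1] == null ? null : serviceById.get(span[1]);
      if (parent != null && !parent.equals(span[2])) {
        links.merge(parent + " -> " + span[2], 1L, Long::sum);
      }
    }
    System.out.println(links); // prints both links, e.g. {backend -> db=1, frontend -> backend=1}
  }
}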
To run this aggregation on its own, the other components can be disabled. A profile is prepared that enables only the aggregation and search of dependency graphs. This profile can be enabled by adding the Java option: -Dspring.profiles.active=kafka-only-dependencies
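For example, assuming the Zipkin server jar sits in the working directory with the module jar under lib (the layout the loader.path option below expects), the server can be launched with Spring Boot's PropertiesLauncher:

java -Dloader.path=lib -Dspring.profiles.active=kafka-only-dependencies -cp zipkin.jar org.springframework.boot.loader.PropertiesLauncher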
The Docker image includes an environment variable to set the profile:
MODULE_OPTS="-Dloader.path=lib -Dspring.profiles.active=kafka-only-dependencies"
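For instance, using the image published below, and assuming the broker address is taken from a KAFKA_BOOTSTRAP_SERVERS variable (an assumption; check the docker docs for the supported settings):

docker run --rm -p 9411:9411 \
  -e MODULE_OPTS="-Dloader.path=lib -Dspring.profiles.active=kafka-only-dependencies" \
  -e KAFKA_BOOTSTRAP_SERVERS=kafka:9092 \
  beta.zipkin.io/openzipkin-contrib/zipkin-storage-kafka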
To try this out, a Docker Compose configuration is ready for testing.
If an existing Kafka collector is already in place, consuming traces into another storage, zipkin-storage-kafka can use a different Kafka consumer group id to consume the same traces in parallel. Otherwise, if the Kafka transport is not available, you can forward spans from another Zipkin server to zipkin-storage-kafka.
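This works because Kafka broadcasts records across consumer groups: each group receives every span and tracks its offsets independently, so a second group never steals records from the first. A minimal consumer sketch, with broker address, group id, and topic name as placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ParallelGroupSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:19092");
    // distinct from the group id used by the existing Kafka collector
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "zipkin-storage-kafka");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("zipkin")); // spans topic
      // a poll(...) loop would go here
    }
  }
}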
To build the project you will need Java 8+.
make build
And testing:
make test
If you want to build a docker image:
make docker-build
To run locally, first you need to get Zipkin binaries:
make get-zipkin
By default Zipkin expects a Kafka broker to be running on localhost:19092.
Then run Zipkin locally:
make run-local
To validate the storage, make sure the Kafka topics are created so that the Kafka Streams instances can be initialized properly:
make kafka-topics
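make kafka-topics wraps calls to Kafka's topic CLI; if you need to create the topics by hand, the equivalent commands look roughly like this (topic names, partition counts, and replication factor are illustrative; check the Makefile for the actual values):

kafka-topics.sh --create --if-not-exists --bootstrap-server localhost:19092 --topic zipkin-spans --partitions 1 --replication-factor 1
kafka-topics.sh --create --if-not-exists --bootstrap-server localhost:19092 --topic zipkin-trace --partitions 1 --replication-factor 1
kafka-topics.sh --create --if-not-exists --bootstrap-server localhost:19092 --topic zipkin-dependency --partitions 1 --replication-factor 1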
make zipkin-test
This will open a browser and check that a trace has been registered. After a minute (the trace timeout) plus 1 second, it will send another trace to trigger aggregation and visualize the dependency graph.
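A trace can also be registered by hand through Zipkin's HTTP API; the span body below is minimal and made up, with the timestamp set to now in microseconds:

curl -X POST http://localhost:9411/api/v2/spans \
  -H 'Content-Type: application/json' \
  -d '[{"traceId":"80f198ee56343ba864fe8b2a57d3eff7","id":"e457b5a2e4d86bd1",
        "kind":"SERVER","name":"get","timestamp":'$(date +%s)'000000,
        "duration":100000,"localEndpoint":{"serviceName":"frontend"}}]'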
If you have Docker available, run:
make run-docker
This builds the Docker image and starts Docker Compose.
To test it, run:
make zipkin-test-single
# or
make zipkin-test-distributed
- Single-node: span partitioning, aggregation, and storage happen in the same container.
- Distributed-mode: partitioning and aggregation run in a different container than storage.
- Only-dependencies: only the components needed to support aggregation and search of dependency graphs.
This project is inspired by Adrian Cole's VoltDB storage https://github.com/adriancole/zipkin-voltdb
Kafka Streams topology diagrams are created with https://zz85.github.io/kafka-streams-viz/
All artifacts publish to the group ID "io.zipkin.contrib.zipkin-storage-kafka". We use a common release version for all components.
Releases are at Sonatype and Maven Central.
Snapshots are uploaded to Sonatype after commits to master.
Released versions of zipkin-storage-kafka are published to GitHub Container Registry as beta.zipkin.io/openzipkin-contrib/zipkin-storage-kafka. See docker for details.