SharpData/SharpETL

Sharp ETL

Sharp ETL is an ETL framework that simplifies writing and executing ETL jobs: you define them in SQL workflow files. The workflow file format combines your favourite SQL dialect with a small amount of configuration.

Getting started

First, start a MySQL database for Sharp ETL:

docker run --name sharp_etl_db -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root -e MYSQL_DATABASE=sharp_etl mysql:5.7

Build from source, or download the jar from the releases page:

./gradlew buildJars -PscalaVersion=2.12 -PsparkVersion=3.3.0 -PscalaCompt=2.12.15

Take a look at hello_world.sql:

cat spark/src/main/resources/tasks/hello_world.sql

You will see the following contents:

-- workflow=hello_world
--  loadType=incremental
--  logDrivenType=timewindow

-- step=define variable
-- source=temp
-- target=variables

SELECT 'RESULT' AS `OUTPUT_COL`;

-- step=print SUCCESS to console
-- source=temp
-- target=console

SELECT 'SUCCESS' AS `${OUTPUT_COL}`;
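
To make the file format concrete, here is a minimal Python sketch of how a workflow file like hello_world.sql could be read: header comments of the form `-- key=value` become configuration, and `${NAME}` placeholders are filled from variables produced by an earlier step. It is an illustration only, not Sharp ETL's actual parser. The names `parse_workflow` and `substitute` are hypothetical, and the idea that a `target=variables` step exposes its output column as a variable is an assumption inferred from the hello_world example and the console output it produces.

```python
import re

# "-- key=value" header comments (any amount of space after the dashes).
HEADER = re.compile(r"^--\s*(\w+)=(.*)$")
# "${NAME}" placeholders inside SQL text.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def parse_workflow(text):
    """Split a workflow file into (workflow_config, steps).

    Each step is a dict holding its header keys plus the SQL body under "sql".
    """
    workflow, steps, current = {}, [], None
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            key, value = match.group(1), match.group(2).strip()
            if key == "step":
                # A "step" header opens a new step.
                current = {"step": value, "sql": ""}
                steps.append(current)
            elif current is None:
                # Headers before the first step describe the workflow.
                workflow[key] = value
            else:
                current[key] = value
        elif current is not None and stripped:
            current["sql"] += stripped + "\n"
    return workflow, steps


def substitute(sql, variables):
    """Replace ${NAME} with its value; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), sql)


EXAMPLE = """\
-- workflow=hello_world
--  loadType=incremental
--  logDrivenType=timewindow

-- step=define variable
-- source=temp
-- target=variables

SELECT 'RESULT' AS `OUTPUT_COL`;

-- step=print SUCCESS to console
-- source=temp
-- target=console

SELECT 'SUCCESS' AS `${OUTPUT_COL}`;
"""

workflow, steps = parse_workflow(EXAMPLE)
# Assumption: the first step yields one variable, OUTPUT_COL = 'RESULT'.
rendered = substitute(steps[1]["sql"], {"OUTPUT_COL": "RESULT"})
print(workflow["workflow"], [s["target"] for s in steps])
print(rendered.strip())
```

Under these assumptions the second step's query becomes a SELECT of 'SUCCESS' aliased as RESULT, which matches the column name in the console output of the real job.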

Run the job and check the console output:

spark-submit --master local --class com.github.sharpdata.sharpetl.spark.Entrypoint spark/build/libs/sharp-etl-spark-standalone-3.3.0_2.12-0.1.0.jar single-job --name=hello_world --period=1440 --default-start-time="2022-07-01 00:00:00" --once --local

You will see output like:

== Physical Plan ==
*(1) Project [SUCCESS AS RESULT#17167]
+- Scan OneRowRelation[]
root
 |-- RESULT: string (nullable = false)

+-------+
|RESULT |
+-------+
|SUCCESS|
+-------+

Versions and dependencies

The compatible versions of Spark are as follows:

Spark    Scala
2.3.x    2.11
2.4.x    2.11 / 2.12
3.0.x    2.12
3.1.x    2.12
3.2.x    2.12 / 2.13
3.3.x    2.12 / 2.13
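
The table can be checked mechanically before picking build flags. The following Python sketch encodes it as a lookup; the helper names `scala_versions_for` and `is_supported` are hypothetical and not part of Sharp ETL, and the data is copied from the table above rather than read from the build.

```python
# Spark release line -> compatible Scala versions, copied from the table above.
COMPATIBLE_SCALA = {
    "2.3": ("2.11",),
    "2.4": ("2.11", "2.12"),
    "3.0": ("2.12",),
    "3.1": ("2.12",),
    "3.2": ("2.12", "2.13"),
    "3.3": ("2.12", "2.13"),
}


def scala_versions_for(spark_version):
    """Return the Scala versions listed for a Spark version such as '3.3.0'."""
    major_minor = ".".join(spark_version.split(".")[:2])
    return COMPATIBLE_SCALA.get(major_minor, ())


def is_supported(spark_version, scala_version):
    """True if the Spark/Scala pair appears in the compatibility table."""
    scala_minor = ".".join(scala_version.split(".")[:2])
    return scala_minor in scala_versions_for(spark_version)


# The pair used in the build command from Getting started.
print(is_supported("3.3.0", "2.12.15"))
```

A Spark version that is not in the table yields an empty tuple, so `is_supported` returns False instead of raising.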

License

FOSSA Status