-
hadcom.utils Public
Advanced common functionality for Hadoop
-
Taxi360 Public
Simple Example of HBase, SolR, and Kudu for Entity 360 using NY taxi data
-
IngestProcessStoreInNRT Public
This is a demo/training application used to show how easy it is to do operations like ingestion, aggregation, and change data capture, using tools like Kafka, Spark Streaming, Flume, Kudu, SolR, H…
-
SparkUnitTestingExamples Public
This project is a collection of Spark unit test examples to give new Spark users good examples of how to unit test their code for Spark Core, Spark SQL, and Spark Streaming
-
CopybookInputFormat Public
Using JRecord to build a mapred and mapreduce inputformat for HDFS, MAPREDUCE, PIG, HIVE, Spark, ...
-
Spark.TableStatsExample Public
Simple Spark example of generating table stats for use in data quality checks
-
kairosdb Public
Forked from kairosdb/kairosdb
Fast scalable time series database
Java Apache License 2.0 Updated Jan 22, 2017
-
spark.mergesort.example Public
An example of how to do a merge sort
-
CleanUpEmptyFilesTool Public
This tool is designed to look through your HDFS folders to either identify files with no data in them or delete them.
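The identify-or-delete behavior described above can be illustrated with a minimal sketch. It is not the tool's actual code: it assumes a local directory tree stands in for HDFS, and the function names `find_empty_files` and `clean_up_empty_files` are hypothetical.

```python
import os
import tempfile


def find_empty_files(root):
    """Return the sorted paths of zero-byte regular files under root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


def clean_up_empty_files(root, delete=False):
    """Report empty files; remove them only when delete=True."""
    empty = find_empty_files(root)
    if delete:
        for path in empty:
            os.remove(path)
    return empty


if __name__ == "__main__":
    # Demo on a throwaway local directory (assumption: local FS, not HDFS).
    with tempfile.TemporaryDirectory() as demo_root:
        open(os.path.join(demo_root, "empty.dat"), "w").close()
        print(clean_up_empty_files(demo_root, delete=False))
```

Keeping the report-only mode as the default means a dry run is always possible before anything is deleted.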
-
node-scale Public
A tool to figure out when to grow or shrink a cluster
Java Apache License 2.0 Updated Jul 12, 2016
-
SparkOnKudu Public
Based on the design of SparkOnHBase. This repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu.
-
HBase.MCC Public
HBase.MCC (HBase Multi Cluster Client). The goal is to support always-up solutions with HBase through multiple clusters
-
FunHBaseLoaderExamples Public
Just for fun; do not use in the real world. :)
-
This is an example of how to do window analysis with Spark
-
NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase
-
Directed.ReBalancing Public
The ability to rebalance on clusters that have HBase by selecting folders to rebalance
Java Apache License 2.0 Updated Oct 8, 2014
-
SparkStreamingSeqSink Public
Support for writing Seq Files with Spark Streaming, with functionality similar to the Flume HDFS Sink with Seq Files
-
flume-ng-kafka-source Public
Forked from frankyaorenjie/flume-ng-kafka-source
Java Apache License 2.0 Updated Sep 5, 2014
-
spark Public
Forked from apache/spark
Mirror of Apache Spark
Scala Apache License 2.0 Updated Aug 1, 2014
-
FixedLengthInputFormat Public
This is a FixedLengthInputFormat for Hadoop MapReduce.
-
Spark..Unique.Seq.Generator Public
This is an example of how to make Unique Sequences in a distributed way with Spark (No dups, No Skips)
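One common way to get gap-free, duplicate-free sequence numbers across partitions is to count the records in each partition, turn the counts into starting offsets with an exclusive prefix sum, and then number each partition locally from its offset. The sketch below shows that idea in plain Python, with lists standing in for Spark partitions. It is an illustration under those assumptions, not necessarily the approach the repository takes, and `assign_sequence_ids` is a hypothetical name.

```python
def assign_sequence_ids(partitions, start=0):
    """Number records across partitions with no duplicates and no skips.

    partitions: list of lists, each inner list standing in for one partition.
    Returns a list of partitions holding (sequence_id, record) pairs.
    """
    # Pass 1: count the records in each partition (cheap and parallelizable).
    counts = [len(part) for part in partitions]

    # Exclusive prefix sum: the first id each partition is allowed to use.
    offsets = []
    running = start
    for count in counts:
        offsets.append(running)
        running += count

    # Pass 2: each partition numbers its own records independently.
    return [
        [(offset + i, record) for i, record in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


if __name__ == "__main__":
    for part in assign_sequence_ids([["a", "b"], [], ["c"]]):
        print(part)
```

Only the per-partition counts have to be gathered in one place, so the second pass needs no coordination between partitions.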
-
Spark.GraphX.Examples Public
Just some examples of using GraphX
-
Giraph.TreeRooter.Example Public
A simple example of using Giraph to root nodes in a tree
-
FileIngestor Public
A simple program to put files from a directory into HDFS, with the added functionality of defining how that action will happen
-
UnbalancedBucketMergeJoin Public
This will do a merge join of absolutely sorted data, with any number of files on either side.
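A merge join over several sorted files per side can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes each "file" is an in-memory list of `(key, value)` pairs already sorted by key, it performs an inner join, and the names `merge_join` and `_grouped` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _grouped(files):
    """Merge key-sorted files into one stream, yielding (key, [values])."""
    merged = heapq.merge(*files, key=itemgetter(0))
    for key, rows in groupby(merged, key=itemgetter(0)):
        yield key, [value for _key, value in rows]


def merge_join(left_files, right_files):
    """Inner merge join; each side may consist of any number of sorted files."""
    left = _grouped(left_files)
    right = _grouped(right_files)
    l_item = next(left, None)
    r_item = next(right, None)
    while l_item is not None and r_item is not None:
        if l_item[0] < r_item[0]:
            l_item = next(left, None)      # left key has no match yet
        elif l_item[0] > r_item[0]:
            r_item = next(right, None)     # right key has no match yet
        else:
            # Keys match: emit every left/right value combination.
            for l_value in l_item[1]:
                for r_value in r_item[1]:
                    yield l_item[0], l_value, r_value
            l_item = next(left, None)
            r_item = next(right, None)


if __name__ == "__main__":
    left = [[(1, "a"), (3, "c")], [(2, "b")]]
    right = [[(2, "x")], [(3, "y")]]
    print(list(merge_join(left, right)))
```

Because both sides are consumed as sorted streams, only the values for the current key are held in memory at any time.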
-
HBase-FastTableCopy Public
This will contain implementations that copy records from a table with fewer regions than the final table.