Commit

committing streaming
antje committed May 3, 2021
1 parent dcb1c95 commit 0fea3bc
Showing 1,030 changed files with 132,843 additions and 0 deletions.
4 changes: 4 additions & 0 deletions 11_stream/.gitignore
@@ -0,0 +1,4 @@
DeliverKinesisAnalyticsToCloudWatch.zip
InvokeSageMakerEndpointFromKinesis.zip
PushNotificationToSNS.zip
spark-2.4.6-bin-pipelineai/
203 changes: 203 additions & 0 deletions 11_stream/00_Overview.ipynb
@@ -0,0 +1,203 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Continuous Analytics and Machine Learning over Streaming Data\n",
"\n",
"Streaming technologies provide you with the tools to collect, process, and analyze data streams in real time. AWS offers a wide range of streaming technology options, including Amazon Managed Streaming for Apache Kafka (Amazon MSK) and the [Amazon Kinesis](https://aws.amazon.com/kinesis/) family of services.\n",
"\n",
"With Kinesis Data Firehose, you can prepare and load the data continuously to a destination of your choice. With Kinesis Data Analytics, you can process and analyze the data as it arrives. And with Kinesis Data Streams, you can manage the ingestion of data streams for custom applications.\n",
"\n",
"In this section, we move from our customer reviews training dataset to a real-world scenario. Customer feedback about products appears in all of a company's social media channels, on partner websites, in customer support messages, and so on. We need to capture this valuable customer sentiment about our products as quickly as possible to spot trends and react fast.\n",
"\n",
"We will focus on analyzing a continuous stream of product review messages that we collect from all available online channels. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"img/online_reviews_architecture.png\" width=\"100%\" align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a first step, we analyze the sentiment of each customer message, so we can identify which customers might need high-priority attention.\n",
"\n",
"Next, we run continuous streaming analytics over the incoming review messages to capture the average sentiment per product category. We visualize the continuous average sentiment in a metrics dashboard for the line-of-business owners, who can then detect sentiment trends quickly and take action.\n",
"\n",
"We also calculate an anomaly score for the incoming messages to detect anomalies in the data schema or data values. If the anomaly score rises, we can alert the application developers in charge to investigate the root cause.\n",
"\n",
"As a last metric, we calculate a continuous approximate count of the received messages. The digital marketing team could use this count of online messages to measure the effectiveness of social media campaigns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## _Kinesis Data Firehose vs. Kinesis Data Streams_\n",
"\n",
"### Kinesis Data Firehose\n",
"* Amazon Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools. \n",
"* It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with the business intelligence tools and dashboards you already use. \n",
"* It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.\n",
"\n",
"### Kinesis Data Streams\n",
"* Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs. \n",
"* You can continuously add various types of data such as clickstreams, application logs, and social media to an Amazon Kinesis data stream from hundreds of thousands of sources. \n",
"* Within seconds, the data will be available for your Amazon Kinesis Applications to read and process from the stream."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ingest Streaming Data Using Kinesis Data Firehose"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## _Transform Data in a Kinesis Data Firehose Delivery Stream_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"img/kinesis_firehose_transform.png\" width=\"90%\" align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## _Preprocess Streaming Data in Kinesis Data Analytics_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"img/kinesis-analytics-transformed_data.png\" width=\"90%\" align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyze Streaming Data with Kinesis Data Analytics\n",
"\n",
"## _Calculate Average Star Rating_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"img/use_case_1_analytics.png\" width=\"80%\" align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## _Detect Anomalies in Streaming Data_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"img/use_case_2_anomaly.png\" width=\"82%\" align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## _Calculate Approximate Counts of Streaming Data_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"img/use_case_3_count.png\" width=\"80%\" align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Release Resources"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%html\n",
"\n",
"<p><b>Shutting down your kernel for this notebook to release resources.</b></p>\n",
"<button class=\"sm-command-button\" data-commandlinker-command=\"kernelmenu:shutdown\" style=\"display:none;\">Shutdown Kernel</button>\n",
" \n",
"<script>\n",
"try {\n",
" const els = document.getElementsByClassName(\"sm-command-button\");\n",
" els[0].click();\n",
"}\n",
"catch(err) {\n",
" // NoOp\n",
"} \n",
"</script>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%javascript\n",
"\n",
"try {\n",
" Jupyter.notebook.save_checkpoint();\n",
" Jupyter.notebook.session.delete();\n",
"}\n",
"catch(err) {\n",
" // NoOp\n",
"}"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
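
The overview notebook describes a continuous average of sentiment per product category. As a rough illustration of the idea only, the following plain-Python sketch groups review records into fixed, non-overlapping (tumbling) time windows and averages the star rating per category in each window. It is not how Kinesis Data Analytics implements this, and the record fields `timestamp`, `product_category`, and `star_rating`, the function name `tumbling_window_avg`, and the 60-second window are all assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_avg(records, window_seconds=60):
    """Average star_rating per (window_start, product_category).

    Hypothetical helper: field names and window size are assumptions,
    not taken from the notebooks in this commit.
    """
    # (window_start, category) -> [running sum, record count]
    sums = defaultdict(lambda: [0.0, 0])
    for rec in records:
        # Align each record to the start of its fixed-size window.
        window_start = int(rec["timestamp"] // window_seconds) * window_seconds
        key = (window_start, rec["product_category"])
        sums[key][0] += rec["star_rating"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Made-up sample records; timestamps are seconds since an arbitrary start.
reviews = [
    {"timestamp": 5, "product_category": "Books", "star_rating": 5},
    {"timestamp": 30, "product_category": "Books", "star_rating": 3},
    {"timestamp": 45, "product_category": "Toys", "star_rating": 1},
    {"timestamp": 70, "product_category": "Books", "star_rating": 4},
]

averages = tumbling_window_avg(reviews, window_seconds=60)
for (window_start, category), avg in averages.items():
    print(window_start, category, avg)
```

A real stream is unbounded, so a streaming engine would emit each window's result when the window closes instead of collecting all records first. The batch form here only shows the grouping and averaging logic.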