AWS - Analytics Automation

Organizations are experiencing a proliferation of data. This data includes logs, sensor data, social media data, and transactional data, and resides in the cloud, on premises, or as high-volume, real-time data feeds. It is increasingly important to analyze this data: stakeholders want information that is timely, accurate, and reliable. This analysis ranges from simple batch processing to complex real-time event processing. Automating workflows can ensure that necessary activities take place when required to drive the analytic processes.
With Amazon Simple Workflow Service (Amazon SWF), AWS Data Pipeline, and AWS Lambda, you can build analytic solutions that are automated, repeatable, scalable, and reliable. In this post, I show you how to use these services to migrate and scale an on-premises data analytics workload.
Workflow basics
A business process can be represented as a workflow. Applications often incorporate a workflow as steps that must take place in a predefined order, with opportunities to adjust the flow of information based on certain decisions or special cases.
The following is an example of an ETL workflow:

A workflow decouples steps within a complex application. In the workflow above, bubbles represent steps or activities, diamonds represent control decisions, and arrows show the control flow through the process. This post shows you how to use Amazon SWF, AWS Data Pipeline, and AWS Lambda to automate this workflow.
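To make the idea of decoupled steps and control decisions concrete, here is a minimal sketch in plain Python. It does not use any of the three AWS services, and the step names (`extract`, `validate`, `transform`, `load`, `quarantine`) and the validation rule are hypothetical, chosen only for illustration. Each step is an independent function (a bubble), and the single `if` on the validation result plays the role of a diamond.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def extract(ctx: Context) -> Context:
    # Hypothetical extract step: copy the raw input rows into the context.
    ctx["rows"] = list(ctx["source"])
    return ctx


def validate(ctx: Context) -> Context:
    # Hypothetical rule: every order amount must be non-negative.
    ctx["valid"] = all(row["amount"] >= 0 for row in ctx["rows"])
    return ctx


def transform(ctx: Context) -> Context:
    # Aggregate the validated rows into a single total.
    ctx["total"] = sum(row["amount"] for row in ctx["rows"])
    return ctx


def load(ctx: Context) -> Context:
    # Stand-in for writing the result to a reporting store.
    ctx["loaded"] = True
    return ctx


def quarantine(ctx: Context) -> Context:
    # Special-case path: set bad input aside instead of loading it.
    ctx["loaded"] = False
    ctx["quarantined"] = True
    return ctx


def run_workflow(ctx: Context) -> Tuple[Context, List[str]]:
    """Run steps in a predefined order and record which ones executed."""
    history: List[str] = []

    def run(steps: Tuple[Step, ...]) -> None:
        nonlocal ctx
        for step in steps:
            ctx = step(ctx)
            history.append(step.__name__)

    run((extract, validate))
    # Control decision: choose the next branch from the validation result.
    if ctx["valid"]:
        run((transform, load))
    else:
        run((quarantine,))
    return ctx, history


if __name__ == "__main__":
    result, steps_run = run_workflow(
        {"source": [{"order_id": 1, "amount": 20.0}, {"order_id": 2, "amount": 5.0}]}
    )
    print(steps_run, result["total"])
```

Because the steps share only the context dictionary, any one of them can be replaced or moved to another host without touching the others, which is the kind of decoupling a workflow service coordinates at larger scale.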
Overview
SWF, Data Pipeline, and Lambda are designed for highly reliable execution of tasks, which can be event-driven, on-demand, or scheduled. The following table highlights the key characteristics of each service.
| Feature | Amazon SWF | AWS Data Pipeline | AWS Lambda |
|---|---|---|---|
| Runs in response to | Anything | Schedules | Events from AWS services/direct invocation |
| Execution order | Orders execution of application steps | Schedules data movement | Reacts to event triggers / Direct calls |
| Scheduling | On-demand | Periodic | Event-driven / on-demand / periodic |
| Hosting environment | Anywhere | AWS/on-premises | AWS |
| Execution design | Exactly once | Exactly once, configurable retry | At least once |
| Programming language | Any | JSON | Supported languages |
Let’s dive deeper into each of the services. If you are already familiar with the services, skip to the section below titled “Scenario: An ecommerce reporting ETL workflow.”
Technologies used:
- AWS
- Terraform
- Ansible
- Python
- EC2
- VPC
- RDS
- Docker
- Bash
- S3
- KMS
- CloudWatch
- NLB
- ECR