AWS Glue vs Cloud Data Fusion

Our analysts compared AWS Glue vs Cloud Data Fusion based on data from our 400+ point analysis of ETL Tools, user reviews and our own crowdsourced data from our free software selection platform.

Product Basics

AWS Glue is a fully managed, serverless data integration service that extracts, cleanses and organizes data for insights. Automatic code generation lets citizen data scientists and power users create and schedule integration workflows. Its event-driven architecture supports triggers that launch data integration processes.

A common data catalog with automatic schema generation keeps data assets unique and easily accessible. With streaming data integration, it catalogs assets from datastores like Amazon S3, making them available for querying with Amazon Athena and Redshift Spectrum. Developers can use ready-made endpoints to edit and test code.

Pros
  • Serverless & Scalable
  • Easy Visual Workflow
  • Built-in Data Connectors
  • Pay-per-Use Pricing
  • AWS Ecosystem Integration
Cons
  • Complex Transformations
  • Limited On-Premise Data
  • Python & Scala Only
  • Potential Cost Overruns
  • AWS Lock-in Concerns
Cloud Data Fusion streamlines data integration for businesses by building and managing pipelines that extract, transform, and load data from diverse sources. Its visual interface simplifies data movement for both technical and non-technical users, making it ideal for organizations seeking a user-friendly ETL solution. Key benefits include pre-built transformations, reusable components, and real-time data processing capabilities. Compared to other ETL offerings, users praise its intuitive design, scalability, and ease of collaboration, while acknowledging potential limitations in handling highly complex data transformations. Pricing scales with pipeline usage, offering flexibility for growing data needs.

Pros
  • Visual drag-and-drop
  • Pre-built connectors
  • Collaboration features
  • Scalable infrastructure
  • Real-time processing
Cons
  • Potential cost for large data
  • Limited custom coding
  • Learning curve for setup
  • Complex transformation challenges
  • Troubleshooting can be difficult

AWS Glue pricing: $0.44/M-DPU-Hour. Free trial is unavailable.
Cloud Data Fusion pricing: $10 Monthly, Freemium. Free trial is unavailable.

Product Insights

AWS Glue benefits:
  • Effortless Data Integration: Streamline data movement across diverse sources like databases, applications, and cloud storage with pre-built connectors and automated schema discovery.
  • Simplified Data Preparation: Clean, transform, and enrich data with a visual drag-and-drop interface and built-in transformations, eliminating the need for complex coding.
  • Serverless Scalability: Forget infrastructure management! Glue seamlessly scales to handle massive data volumes without upfront provisioning or ongoing maintenance.
  • Cost-Effective Flexibility: Pay-per-use pricing based on actual resource consumption makes Glue ideal for both small and large data pipelines, optimizing your costs.
  • Seamless AWS Integration: Leverage the power of the AWS ecosystem! Glue effortlessly integrates with S3, Redshift, and other AWS services, creating a unified data pipeline within your existing infrastructure.
  • Improved Data Accessibility: Deliver prepared data to data lakes, data warehouses, and analytics platforms, democratizing access for data scientists, analysts, and business users.
  • Enhanced Collaboration: Share data pipelines and workflows with other users and teams, fostering collaboration and streamlining data-driven workflows.
  • Centralized Data Catalog: Maintain a single source of truth for your data assets with Glue Data Catalog, ensuring data consistency and discoverability.
  • Continuous Monitoring and Optimization: Track job performance, identify bottlenecks, and optimize your pipelines for efficiency with built-in monitoring and logging tools.
  • Future-Proof Data Infrastructure: Stay ahead of the curve with Glue's serverless architecture and cloud-native approach, adapting to your evolving data needs with ease.
Cloud Data Fusion benefits:
  • Accelerated Time to Insights: Streamline data preparation and integration processes to quickly access, clean, and deliver actionable data for analysis, leading to faster insights and decision-making.
  • Enhanced Collaboration and Governance: Foster teamwork and maintain data quality standards with a centralized platform that enables shared pipeline creation, version control, and access management.
  • Simplified Data Integration: Empower both technical and non-technical users with a visual, code-free interface that eliminates the need for extensive programming expertise in building and managing ETL pipelines.
  • Reduced Operational Costs: Minimize infrastructure management and maintenance overhead with a fully managed cloud-based solution, allowing resources to focus on strategic data initiatives.
  • Scalability and Flexibility: Handle growing data volumes and evolving business needs seamlessly with a scalable architecture that accommodates both batch and real-time processing.
  • Enhanced Data Quality and Reliability: Implement comprehensive data validation, cleansing, and standardization processes to ensure the accuracy and integrity of data for downstream analysis and decision-making.
  • Improved Data Governance and Compliance: Enforce data security and privacy regulations with built-in controls for access management, data masking, and auditing capabilities.
  • Faster Time to Market: Accelerate the delivery of data-driven products and services by streamlining data preparation and integration, enabling quicker insights and innovation.
  • Empowered Business Users: Democratize data access and analysis by providing a user-friendly platform for self-service data exploration and integration, fostering data-driven decision-making across the organization.
  • Improved Decision Making: Gain a holistic view of business operations and make better-informed decisions based on accurate, integrated, and timely data insights.
AWS Glue features:
  • Console: Discover, transform and make data assets available for querying and analysis. Build complex data integration pipelines that handle dependencies, filter bad data and retry jobs after failures. Monitor jobs and get task status alerts via Amazon CloudWatch.
  • Data Catalog: Gleans and stores metadata in the catalog for workflow authoring, with full version history. Search and discover datasets from the data catalog, irrespective of where they are located. Saves time and money by automatically computing statistics and registering partitions with a central metadata repository.
  • Automatic Schema Discovery: Creates metadata automatically by gleaning schema, quality and data types through built-in datastore crawlers and stores it in the Data Catalog. Keep assets up to date by running crawlers on a schedule, on demand or based on event triggers. Manage streaming data schemas with the Schema Registry.
  • Event-driven Architecture: Move data automatically into data lakes and warehouses by setting triggers based on a schedule or event. Start extract, transform and load jobs from a Lambda function as soon as new data becomes available.
  • Visual Data Prep: Prepare assets for analytics and machine learning through Glue DataBrew. Automate anomaly filtering, convert data to standard formats and rectify invalid values with more than 250 pre-designed transformations, without writing code.
  • Materialized Views: Create a virtual table from multiple different data sources by using SQL. Copies data from each source data store and creates a replica in the target datastore as a materialized view. Keeps data up to date by continuously monitoring source stores and updating target stores in real time.
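
The event-driven item above, where a Lambda function starts an ETL job as soon as new data arrives, might look roughly like the following. This is a minimal sketch, not AWS's reference implementation: the job name `example-etl-job` and the `--source_path` argument are hypothetical, and the handler assumes it is wired to an S3 "object created" notification. `start_job_run` is the boto3 Glue client call used to launch the job.

```python
from urllib.parse import unquote_plus


def build_job_arguments(event):
    """Turn an S3 'object created' event into Glue job arguments."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # S3 event notifications URL-encode object keys (spaces arrive as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    # '--source_path' is a hypothetical argument name the Glue script would read.
    return {"--source_path": f"s3://{bucket}/{key}"}


def handler(event, context, glue_client=None, job_name="example-etl-job"):
    """Lambda entry point: start one Glue job run for the incoming S3 event."""
    if glue_client is None:
        import boto3  # imported lazily so the logic can be exercised without AWS

        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=job_name,  # hypothetical job name
        Arguments=build_job_arguments(event),
    )
    return response["JobRunId"]
```

Passing the client in as a parameter is only a convenience for trying the handler locally with a stand-in object; in Lambda the default path creates a real Glue client.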
Cloud Data Fusion features:
  • Visual Drag-and-Drop Interface: Build and manage data pipelines without coding, using a user-friendly interface with pre-built components for data ingestion, transformation, and loading.
  • Extensive Connector Library: Connect to a wide range of data sources, including databases, cloud storage, SaaS applications, and streaming data sources, using more than 150 pre-built connectors.
  • Pre-built Transformations: Streamline data preparation with common transformations like cleansing, filtering, joining, aggregating, and pivoting, easily applied within the visual interface.
  • Real-Time and Batch Processing: Handle both streaming data and large-scale batch processing, enabling real-time analytics and decision-making.
  • Scalable Infrastructure: Dynamically scale resources to accommodate growing data volumes and processing demands, ensuring performance and cost-effectiveness.
  • Collaboration and Version Control: Foster teamwork and maintain pipeline history with features for shared workspaces, version control, and access management.
  • Data Lineage Tracking: Track data movement and transformations throughout the pipeline, enabling data quality auditing, troubleshooting, and compliance.
  • Data Quality and Validation: Ensure data accuracy and consistency with built-in data profiling, validation, and cleansing capabilities.
  • Integration with Google Cloud: Seamlessly integrate with other Google Cloud services like BigQuery, Dataproc, Dataflow, and Cloud Storage for end-to-end data management and analytics.
  • Security and Compliance: Protect sensitive data with enterprise-grade security measures, including access controls, encryption, and auditing.

Product Ranking

AWS Glue: #9 among all ETL Tools
Cloud Data Fusion: #41 among all ETL Tools

Analyst Rating Summary

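Overall analyst rating: AWS Glue 88, Cloud Data Fusion 93
Data Delivery: AWS Glue 100, Cloud Data Fusion 100
Data Quality: AWS Glue 92, Cloud Data Fusion 84
Data Sources and Targets Connectivity: AWS Glue 62, Cloud Data Fusion 88

Requirements listed for AWS Glue: Data Delivery, Performance and Scalability, Platform Capabilities, Platform Security, Workflow Management
Requirements listed for Cloud Data Fusion: Data Delivery, Performance and Scalability, Platform Security, Data Transformation, Metadata Management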

Analyst Ratings for Functional Requirements

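Scores out of 100, with each product's three-segment percentage breakdown in parentheses:

  • Data Delivery: AWS Glue 100 (100% / 0% / 0%); Cloud Data Fusion 100 (100% / 0% / 0%)
  • Data Quality: AWS Glue 92 (85% / 8% / 7%); Cloud Data Fusion 84 (69% / 23% / 8%)
  • Data Sources and Targets Connectivity: AWS Glue 62 (36% / 0% / 64%); Cloud Data Fusion 88 (79% / 0% / 21%)
  • Data Transformation: AWS Glue 90 (88% / 0% / 12%); Cloud Data Fusion 98 (96% / 4% / 0%)
  • Metadata Management: AWS Glue 96 (90% / 0% / 10%); Cloud Data Fusion 96 (90% / 0% / 10%)
  • Platform Capabilities: AWS Glue 100 (100% / 0% / 0%); Cloud Data Fusion 86 (86% / 0% / 14%)
  • Workflow Management: AWS Glue 100 (100% / 0% / 0%); Cloud Data Fusion 95 (90% / 0% / 10%)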

Analyst Ratings for Technical Requirements

Each of the four bars shown reads 100% / 0% / 0%.

User Sentiment Summary

AWS Glue: Great User Sentiment, 165 reviews
Cloud Data Fusion: Excellent User Sentiment, 139 reviews

85% of users recommend AWS Glue.

AWS Glue has a 'great' User Satisfaction Rating of 85% when considering 165 user reviews from 3 recognized software review sites.

92% of users recommend Cloud Data Fusion.

Cloud Data Fusion has an 'excellent' User Satisfaction Rating of 92% when considering 139 user reviews from 2 recognized software review sites.

Ratings by review site (number of reviews in parentheses):
AWS Glue: 4.0 (46), 4.4 (109), 3.9 (10)
Cloud Data Fusion: 5.0 (2), 4.6 (137), n/a
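
Assuming the per-site ratings above split as 4.0 (46), 4.4 (109), 3.9 (10) for AWS Glue and 5.0 (2), 4.6 (137) for Cloud Data Fusion (the review counts then sum to the stated 165 and 139), a review-weighted average on a 5-point scale lands on the same 85% and 92% figures. The page does not say how its satisfaction rating is computed, so the sketch below is only one reading that happens to agree; the function name and the 5-point scale are assumptions.

```python
def recommend_percent(site_ratings, scale=5.0):
    """Review-weighted average rating, expressed as a whole percentage of the scale.

    site_ratings: list of (average_rating, review_count) pairs, one per review site.
    """
    total_reviews = sum(count for _, count in site_ratings)
    weighted_sum = sum(rating * count for rating, count in site_ratings)
    return round(100 * weighted_sum / (total_reviews * scale))


# Per-site figures as listed on this page (the grouping by product is an assumption).
aws_glue = [(4.0, 46), (4.4, 109), (3.9, 10)]
cloud_data_fusion = [(5.0, 2), (4.6, 137)]

print(recommend_percent(aws_glue))           # 85
print(recommend_percent(cloud_data_fusion))  # 92
```

Weighting by review count matters here: Cloud Data Fusion's 5.0 rating rests on only 2 reviews, so it barely moves the result away from the 4.6 given by 137 reviewers.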

Awards

SelectHub research analysts have evaluated AWS Glue and concluded it earns best-in-class honors for Workflow Management.

Workflow Management Award

Cloud Data Fusion stands above the rest by achieving an ‘Excellent’ rating as a User Favorite.

User Favorite Award

Synopsis of User Ratings and Reviews

AWS Glue pros:
Cost-Effective & Serverless: Pay only for resources used, eliminates server provisioning and maintenance
Simplified ETL workflows: Drag-and-drop UI & auto-generated code for easy job creation, even for non-programmers
Data Catalog: Unified metadata repository for seamless discovery & access across various data sources
Flexible Data Integration: Connects to diverse data sources & destinations (S3, Redshift, RDS, etc.)
Built-in Data Transformations: Apply pre-built & custom transformations within workflows for efficient data cleaning & shaping
Visual Data Cleaning (Glue DataBrew): Code-free data cleansing & normalization for analysts & data scientists
Scalability & Performance: Auto-scaling resources based on job needs, efficient Apache Spark engine for fast data processing
Community & Support: Active user community & helpful AWS support resources for problem-solving & best practices
Cloud Data Fusion pros:
Visual Interface: Build and manage pipelines with a drag-and-drop interface, eliminating the need for extensive coding knowledge.
Scalability: Handle large and complex datasets with ease, seamlessly scaling to meet changing demands.
Integrations: Connect to a wide range of data sources and destinations, both on-premises and in the cloud.
Pre-Built Templates: Streamline development with ready-to-use templates for common ETL tasks, accelerating time to value.
Collaboration: Foster teamwork with features for sharing pipelines and managing access, promoting efficiency and knowledge transfer.
Automation: Schedule and automate pipeline execution, ensuring timely data updates and reducing manual effort.
Security: Leverage Google Cloud's robust security infrastructure for data protection and compliance.
AWS Glue cons:
Limited Customization & Control: Visual interface and pre-built transformations may not be flexible enough for complex ETL needs, requiring manual coding or custom Spark jobs.
Debugging Challenges: Troubleshooting Glue jobs can be complex due to limited visibility into underlying Spark code and distributed execution, making error resolution time-consuming.
Performance Limitations for Certain Workloads: Serverless architecture may not be optimal for latency-sensitive workloads or large-scale data processing, potentially leading to bottlenecks.
Vendor Lock-in & Portability: Migrating ETL workflows from Glue to other platforms can be challenging due to its proprietary nature and lack of open-source compatibility.
Pricing Concerns for Certain Use Cases: Pay-per-use model can be expensive for long-running ETL jobs or processing massive datasets, potentially exceeding budget constraints.
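
The pay-per-use concern in the last item can be made concrete with a little arithmetic. The sketch below uses the $0.44 rate listed on this page and treats it as a flat charge per DPU-hour, which is an assumption: actual rates vary by region and job type, and minimum billing durations are ignored. The function name and the example workloads are hypothetical.

```python
RATE_PER_DPU_HOUR = 0.44  # rate listed on this page; treated as flat (assumption)


def estimate_monthly_cost(dpus, runtime_minutes, runs_per_month, rate=RATE_PER_DPU_HOUR):
    """Rough monthly cost: DPUs x hours per run x runs per month x hourly rate."""
    dpu_hours = dpus * (runtime_minutes / 60) * runs_per_month
    return round(dpu_hours * rate, 2)


# A modest daily job: 10 DPUs for 30 minutes, 30 runs a month.
small_job = estimate_monthly_cost(dpus=10, runtime_minutes=30, runs_per_month=30)

# A long-running daily job: 100 DPUs for 4 hours, 30 runs a month.
large_job = estimate_monthly_cost(dpus=100, runtime_minutes=240, runs_per_month=30)

print(small_job, large_job)
```

Because cost scales with the product of capacity, duration and frequency, the second workload costs 80 times the first under these assumptions, which is the kind of jump reviewers describe as a budget surprise.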
Cloud Data Fusion cons:
Learning Curve: Interface and concepts can be initially challenging for those unfamiliar with data pipelines or CDAP, the open-source framework Cloud Data Fusion is built on.
Cost: Pricing model can become expensive for large-scale or complex deployments, especially with high data volumes.
Debugging: Troubleshooting pipeline errors can be difficult due to limited logging and error messages.
Customization Limits: Reliance on pre-built plugins and connectors can restrict flexibility for highly customized or niche use cases.
Performance: Some users have reported occasional performance slowdowns or job failures, especially with large datasets.
Version Control: Lacks robust built-in version control for pipelines, requiring external tools or manual management.

User reviews of AWS Glue paint a picture of a powerful and user-friendly ETL tool for the cloud, but one with limitations. Praise often centers around its intuitive visual interface, making complex data pipelines accessible even to non-programmers. Pre-built connectors and automated schema discovery further simplify setup, saving users time and effort. Glue's serverless nature and tight integration with the broader AWS ecosystem are also major draws, offering seamless scalability and data flow within a familiar environment.

However, some users find Glue's strength in simplicity a double-edged sword. For complex transformations beyond basic filtering and aggregation, custom scripting in Python or Scala is required, limiting flexibility for those unfamiliar with these languages. On-premise data integration is another pain point, with Glue primarily catering to cloud-based sources. This leaves users seeking hybrid deployments or integration with legacy systems feeling somewhat stranded.

Cost also arises as a concern. Glue's pay-per-use model can lead to unexpected bills for large data volumes or intricate pipelines, unlike some competitors offering fixed monthly subscriptions. Additionally, Glue's deep integration with AWS can create lock-in anxieties for users worried about switching cloud providers in the future.

Overall, user reviews suggest Glue shines in cloud-based ETL for users comfortable with its visual interface and scripting limitations. Its scalability, ease of use, and AWS integration are undeniable strengths. However, for complex transformations, on-premise data needs, or cost-conscious users, alternative tools may offer a better fit.


Cloud Data Fusion emerges from user reviews as a powerful ETL tool, particularly for those prioritizing ease of use and visual workflow. Its drag-and-drop interface and pre-built templates are praised for accelerating development and reducing the need for coding expertise, making it ideal for ETL beginners or teams with mixed skillsets. This accessibility shines compared to competitors like Fivetran or Stitch, which often require more technical knowledge.

However, Cloud Data Fusion's user experience isn't without its bumps. Some find the learning curve steeper than expected, particularly when troubleshooting complex pipelines. Additionally, the pricing structure can become expensive for large-scale deployments, especially compared to open-source options like Airflow. For smaller teams or budget-conscious users, these drawbacks might outweigh the benefits.

Where Cloud Data Fusion truly shines is its focus on collaboration and security. Features like pipeline sharing and access control promote teamwork and knowledge transfer, while leveraging Google Cloud's robust security infrastructure is a major selling point for data-sensitive organizations. This focus on collaboration and security differentiates Cloud Data Fusion from competitors, making it a compelling choice for teams working on critical data pipelines.

Ultimately, Cloud Data Fusion's strengths and weaknesses paint a picture of a versatile ETL tool that excels in ease of use, collaboration, and security. While its learning curve and pricing might deter some, its user-friendly interface and focus on specific needs make it a strong contender for teams seeking a streamlined and secure ETL solution.


Top Alternatives in ETL Tools


Azure Data Factory

Cloud Data Fusion

Dataflow

DataStage

Fivetran

Hevo

IDMC

Informatica PowerCenter

InfoSphere Information Server

Integrate.io

Oracle Data Integrator

Pentaho

Qlik Talend Data Integration

SAP Data Services

SAS Data Management

Skyvia

SQL Server

SQL Server Integration Services

Talend

TIBCO Cloud Integration
