Data Engineering with AWS - Book Review

  • We live in a world where the amount of data being generated is constantly increasing. While a few decades ago, an organization may have had a single database that could store everything they needed to track, today most organizations have tens, hundreds, or even thousands of databases, along with data warehouses, and perhaps a data lake. And these data stores are being fed from an increasing number of data sources (transaction data, web server log files, IoT and other sensors, and social media, to name just a few).

  • It is no surprise that we hear more and more companies talk about being data-driven in their decision making. But in order for an organization to be truly data-driven, they need to be masters of managing and drawing insights from these ever-increasing quantities and types of data. And to enable this, organizations need to employ people with specialized data skills.

  • Doing a search on LinkedIn for jobs related to data returns over 1.5 million results (and that is just for the United States!). The job titles include roles such as data engineers (with 185,000 results), data scientists (120,000 results), and data architects (75,000 results).

  • While this book will not magically make you a data engineer, it has been designed to accelerate your journey toward data engineering on AWS.

  • By the end of this book, you will not only have learned some of the core concepts around data engineering, but you will also have a good understanding of the wide variety of tools available in AWS for working with data. You will also have been through numerous hands-on exercises, gaining practical experience with things such as ingesting streaming data, transforming and optimizing data, building visualizations, and even drawing insights from data using AI.

Who this book is for

  • This book has been designed for two groups of people. The first group is people looking to get started with a career in data engineering who want to learn core data engineering concepts. This book introduces many different aspects of data engineering, providing a comprehensive high-level understanding of, and practical hands-on experience with, different focus areas of data engineering.

  • The second group is people who may already have an established career focused on data, but who are new to the cloud, and to AWS in particular. For these people, this book provides a clear understanding of many of the different AWS services for working with data and gives them hands-on experience with a variety of these AWS services.

  • To start with, we examine why data is so important to organizations today, and introduce foundational concepts of data engineering, including coverage of governance and security topics. We also learn about the AWS services that form part of the data engineer’s toolkit, and get hands-on with creating an AWS account and using services such as Amazon S3, AWS Lambda, and AWS Identity and Access Management (IAM).

The first section of the book comprises the following chapters:

  • Chapter 1, An Introduction to Data Engineering
  • Chapter 2, Data Management Architectures for Analytics
  • Chapter 3, The AWS Data Engineer’s Toolkit
  • Chapter 4, Data Cataloging, Security, and Governance

Section 2: Architecting and Implementing Data Lakes and Data Lake Houses

  • In this section of the book, we examine an approach for architecting a high-level data pipeline and then dive into the specifics of data ingestion and transformation. We also examine different types of data consumers, learn about the important role of data marts and data warehouses, and finally put it all together by orchestrating data pipelines. We get hands-on with various AWS services for data ingestion (Amazon Kinesis and DMS), transformation (AWS Glue Studio), consumption (AWS Glue DataBrew), and pipeline orchestration (Step Functions).

This section comprises the following chapters:

  • Chapter 5, Architecting Data Engineering Pipelines
  • Chapter 6, Ingesting Batch and Streaming Data
  • Chapter 7, Transforming Data to Optimize for Analytics
  • Chapter 8, Identifying and Enabling Data Consumers
  • Chapter 9, Loading Data into a Data Mart
  • Chapter 10, Orchestrating the Data Pipeline

Section 3: The Bigger Picture: Data Analytics, Data Visualization, and Machine Learning

  • In this section, we examine the bigger picture of data analytics in modern organizations. We learn about the tools that data consumers commonly use to work with data transformed by data engineers, and briefly look into how machine learning (ML) and artificial intelligence (AI) can draw rich insights out of data.

  • We also get hands-on with tools for running ad hoc SQL queries on data in the data lake (Amazon Athena), for creating data visualizations (Amazon QuickSight), and for using AI to derive insights from data (Amazon Comprehend). We then conclude by looking at data engineering examples from the real world and explore some emerging trends in data engineering.

This section comprises the following chapters:

  • Chapter 11, Ad Hoc Queries with Amazon Athena
  • Chapter 12, Visualizing Data with Amazon QuickSight
  • Chapter 13, Enabling Artificial Intelligence and Machine Learning
  • Chapter 14, Wrapping Up the First Part of Your Learning Journey

Conclusion

  • I thoroughly enjoyed reading "Data Engineering with AWS" by Gareth Eagar. Thank you to Shifa Ansari of the Packt team for sharing this book and for the opportunity to write an editorial review of it. I strongly recommend it if you want to learn how to design and build cloud-based data transformation pipelines using AWS.

Personally, I loved 👀👇:

👉 The book starts by exploring the data engineering concepts and emerging technologies.

👉 The book does a good job of explaining data engineering concepts through hands-on tasks, including how to ingest streaming data with Amazon Kinesis Data Firehose; optimize, denormalize, and join datasets with AWS Glue Studio; use Amazon S3 events to trigger a Lambda process to transform a file; run complex SQL queries on data lake data using Amazon Athena; load data into a Redshift data warehouse and run queries; create a visualization of your data using Amazon QuickSight; and extract sentiment data from a dataset using Amazon Comprehend.

👉 Finally, it helps us understand how to effectively design and build data transformation pipelines using AWS.

👉 Link of the Book is here
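To give a flavor of one of the exercises mentioned above (using Amazon S3 events to trigger a Lambda process to transform a file), here is a minimal sketch of my own. It is not code from the book. It only shows how a Python Lambda handler might read the bucket and object key out of an S3 event notification and decide where the transformed file should go. The `transformed/` prefix, the CSV-to-Parquet naming rule, and the helper names `extract_s3_objects` and `target_key` are assumptions made purely for illustration, and the actual read, transform, and write steps are left out.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def target_key(source_key, prefix="transformed/"):
    """Hypothetical naming rule: write under `prefix`, swapping .csv for .parquet."""
    filename = source_key.rsplit("/", 1)[-1]
    stem = filename[:-4] if filename.lower().endswith(".csv") else filename
    return f"{prefix}{stem}.parquet"


def lambda_handler(event, context):
    """Entry point that Lambda invokes; this sketch only plans the work."""
    plan = []
    for bucket, key in extract_s3_objects(event):
        # A real function would read the object (for example with boto3),
        # transform it, and write the result to the target key.
        plan.append({"bucket": bucket, "source": key, "target": target_key(key)})
    return plan
```

Called with a sample event for an object named `raw/test+file.csv` in a bucket named `my-landing-zone` (both made up), the handler returns one plan entry whose source key is decoded to `raw/test file.csv`.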

This post is a collaboration with Packt. I recommend following them if you are interested in book releases and growing the community! ❤

Support Adit Modi by becoming a sponsor. Any amount is appreciated!