Overview of the Chapters
Chapter 1 provides an overview of the broad and deep Amazon AI and ML stack, an enormously powerful and diverse set of services, open source libraries, and infrastructure to use for data science projects of any complexity and scale.
Chapter 2 describes how to apply the Amazon AI and ML stack to real-world use cases for recommendations, computer vision, fraud detection, natural language understanding (NLU), conversational devices, cognitive search, customer support, industrial predictive maintenance, home automation, Internet of Things (IoT), healthcare, and quantum computing.
Chapter 3 demonstrates how to use AutoML to implement a specific subset of these use cases with SageMaker Autopilot.
Chapters 4–9 dive deep into the complete model development life cycle (MDLC) for a BERT-based NLP use case, including data ingestion and analysis, feature selection and engineering, model training and tuning, and model deployment with Amazon SageMaker, Amazon Athena, Amazon Redshift, Amazon EMR, TensorFlow, PyTorch, and serverless Apache Spark.
Chapter 10 ties everything together into repeatable pipelines using MLOps with SageMaker Pipelines, Kubeflow Pipelines, Apache Airflow, MLflow, and TFX.
Chapter 11 demonstrates real-time ML, anomaly detection, and streaming analytics on real-time data streams with Amazon Kinesis and Apache Kafka.
Chapter 12 presents a comprehensive set of security best practices for data science projects and workflows, including IAM, authentication, authorization, network isolation, data encryption at rest, post-quantum network encryption in transit, governance, and auditability.
Throughout the book, we provide tips to reduce cost and improve performance for data science projects on AWS.