Support

Davy

(Customer support)

0885996168

Tawin

(Handling complaints)

0885996168

Customer support

Our services How to shop

Check order

Account

>800 million

of products on Amazon

200.000 successful

transaction through Fado168

0885996168 086356168

support@fado168.com

Learning Spark: Lightning-Fast Data Analytics Learning Spark: Lightning-Fast Data Analytics Paperback Kindle

Brand: Jules S. Damji

(310 votes)

Seller: N/A

Sell on:

Amazon US

Learning Spark: Lightning-Fast Data Analytics Learning Spark: Lightning-Fast Data Analytics Paperback Kindle

0885996168

/ Email: Support@fado168.com

Monday to Friday: from 08h - 22h, including Saturday and Sunday

$57.9

Delivery time 15-20 day. View detail

The Product does not have an exact weight, The total order value may be changed after approval

Quantity:

Seller List

The Time commitment

Fado168.com offers flexible shipping packages
3-5 days 7-20 days

+ See the detail

Free customs procedures

Fado168.com represent the customers to handle all the paperworks from the stage of export from ASIA import to Cambodia

+ See the detail

Diversified payment, flexible

Transfer by Internet Banking, via Bank account, by Wing or pay direct at Company

+ See the detail

Free delivery

Free delivery in Phnom Penh

+ See the detail

Safe shopping

With Fado168.com buyers always have insurance the risk when order the international goods

+ See the detail

Loading Product Infomation...

PRODUCT FEATURES

Highlight, take notes, and search in the book

Product Specifications

Publisher: O'Reilly Media; 2nd edition (August 25, 2020)
Language: English
Paperback: 397 pages
ISBN-10: 1492050040
ISBN-13: 978-1492050049
Item Weight: 1.4 pounds
Dimensions: 7.25 x 1 x 9.25 inches
Best Sellers Rank: #133,168 in Books (See Top 100 in Books) #24 in Mathematical Analysis (Books) #48 in Data Processing #149 in Software Development (Books)
Customer Reviews: 4.7 out of 5 stars 290Reviews

Product Information

From the Publisher

From the Preface

Who This Book Is For

Most developers who grapple with big data are data engineers, data scientists, or machine learning engineers. This book is aimed at those professionals who are looking to use Spark to scale their applications to handle massive amounts of data.

In particular, data engineers will learn how to use Spark’s Structured APIs to perform complex data exploration and analysis on both batch and streaming data; use Spark SQL for interactive queries; use Spark’s built-in and external data sources to read, refine, and write data in different file formats as part of their extract, transform, and load (ETL) tasks; and build reliable data lakes with Spark and the open source Delta Lake table format.

For data scientists and machine learning engineers, Spark’s MLlib library offers many common algorithms to build distributed machine learning models. We will cover how to build pipelines with MLlib, best practices for distributed machine learning, how to use Spark to scale single-node models, and how to manage and deploy these models using the open source library MLflow.

While the book is focused on learning Spark as an analytical engine for diverse workloads, we will not cover all of the languages that Spark supports. Most of the examples in the chapters are written in Scala, Python, and SQL. Where necessary, we have infused a bit of Java. For those interested in learning Spark with R, we recommend Javier Luraschi, Kevin Kuo, and Edgar Ruiz’s Mastering Spark with R (O’Reilly).

Finally, because Spark is a distributed engine, building an understanding of Spark application concepts is critical. We will guide you through how your Spark application interacts with Spark’s distributed components and how this is decomposed into parallel tasks on a cluster. We will also cover which deployment modes are supported and in what environments.

While there are many topics we have chosen to cover, there are a few that we have opted to not focus on. These include the older low-level Resilient Distributed Dataset (RDD) APIs and GraphX, Spark’s API for graphs and graph-parallel computation. Nor have we covered advanced topics such as how to extend Spark’s Catalyst optimizer to implement your own operations, how to implement your own catalog, or how to write your own DataSource V2 data sinks and sources. Though part of Spark, these are beyond the scope of your first book on learning Spark.

Instead, we have focused and organized the book around Spark’s Structured APIs, across all its components, and how you can use Spark to process structured data at scale to perform your data engineering or data science tasks.

Contact