Support

Davy

(Customer support)

0885996168

Tawin

(Handling complaints)

0885996168


More than 800 million products on Amazon

200,000 successful transactions through Fado168

0885996168 086356168

support@fado168.com

Spark: The Definitive Guide: Big Data Processing Made Simple (Paperback, Kindle)

Brand: Bill Chambers

(444 votes)

Seller: N/A

Sold on:

Amazon US


Support hours: 08:00 to 22:00 every day, including Saturday and Sunday

$70.17

Delivery time: 15-20 days.

This product does not have an exact listed weight, so the total order value may change after approval.


Time commitment

Fado168.com offers flexible shipping packages: 3-5 days or 7-20 days.

Free customs procedures

Fado168.com handles all paperwork on the customer's behalf, from export in Asia to import into Cambodia.

Flexible, diverse payment options

Pay by Internet banking, bank transfer, or Wing, or pay directly at the company office.

Free delivery

Free delivery in Phnom Penh


Safe shopping

With Fado168.com, buyers are always insured against risk when ordering international goods.


PRODUCT FEATURES

Product Specifications

Best Sellers Rank: #112,890 in Books
#25 in Data Modeling & Design (Books)
#39 in Data Processing
#90 in Python Programming
Customer Reviews: 4.5 out of 5 stars (444 reviews)

Product Information

From the Publisher

Spark: The Definitive Guide: Big Data Processing Made Simple

Spark’s toolkit: illustrates all the components and libraries Spark offers to end-users.

What Is Apache Spark?

Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and to scale up to big data processing at an incredibly large scale.

Although the project has existed for multiple years (first as a research project started at UC Berkeley in 2009, then at the Apache Software Foundation since 2013), the open source community is continuing to build more powerful APIs and high-level libraries over Spark, so there is still a lot to write about the project. We decided to write this book for two reasons. First, we wanted to present the most comprehensive book on Apache Spark, covering all of the fundamental use cases with easy-to-run examples. Second, we especially wanted to explore the higher-level "structured" APIs that were finalized in Apache Spark 2.0 (namely DataFrames, Datasets, Spark SQL, and Structured Streaming), which older books on Spark don’t always include. We hope this book gives you a solid foundation to write modern Apache Spark applications using all the available tools in the project.

Who This Book Is For

We designed this book mainly for data scientists and data engineers looking to use Apache Spark. The two roles have slightly different needs, but in reality, most application development covers a bit of both, so we think the material will be useful in both cases. Specifically, in our minds, the data scientist workload focuses more on interactively querying data to answer questions and build statistical models, while the data engineer job focuses on writing maintainable, repeatable production applications, either to use the data scientist’s models in practice or just to prepare data for further analysis (e.g., building a data ingest pipeline). However, we often see with Spark that these roles blur. For instance, data scientists are able to package production applications without too much hassle, and data engineers use interactive analysis to understand and inspect their data to build and maintain pipelines.

While we tried to provide everything data scientists and engineers need to get started, there are some things we didn’t have space to focus on in this book. First, this book does not include in-depth introductions to some of the analytics techniques you can use in Apache Spark, such as machine learning. Instead, we show you how to invoke these techniques using libraries in Spark, assuming you already have a basic background in machine learning. Many full, standalone books exist to cover these techniques in formal detail, so we recommend starting with those if you want to learn about these areas. Second, this book focuses more on application development than on operations and administration (e.g., how to manage an Apache Spark cluster with dozens of users). Nonetheless, we have tried to include comprehensive material on monitoring, debugging, and configuration in Parts V and VI of the book to help engineers get their application running efficiently and tackle day-to-day maintenance. Finally, this book places less emphasis on the older, lower-level APIs in Spark (specifically RDDs and DStreams) in order to introduce most of the concepts using the newer, higher-level structured APIs. Thus, the book may not be the best fit if you need to maintain an old RDD or DStream application, but should be a great introduction to writing new applications.
