Mastering Apache Spark 2.0
by Jacek Laskowski
Publisher: GitBook 2016
Number of pages: 1621
This collection of notes (what some may rashly call a 'book') serves as my ultimate place to collect all the nuts and bolts of using Apache Spark. The notes aim to help me design and develop better products with Apache Spark.
by Open Knowledge Foundation - School of Data
The Data Wrangling Handbook is a companion text to the School of Data. It functions much like a traditional textbook, providing the detail and background theory to support the School of Data courses and challenges.
by Alan F Gates - O'Reilly Media
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs. The structure of Pig programs is amenable to parallelization, which enables them to handle very large data sets.
by Eric Redmond - GitBook
This is a free little book about Riak, a scalable, high availability NoSQL datastore. Riak is an open-source, distributed key/value database for high availability and near-linear scalability. Riak has remarkably high uptime and grows with you.
by Marc Farley - Microsoft Press
The book describes a storage architecture that some experts are calling a game changer in the infrastructure industry. Called Microsoft hybrid cloud storage, it integrates cloud storage services with traditional enterprise storage.