SQL on big data : technology, architecture, and innovation /

by Pal, Sumit

Published by: Apress ([United States])
Physical details: xvii, 157 p. ; 24 cm.
ISBN: 1484222466
Subject(s): SQL (Computer program language) | Big data
Year: 2016

Online Resources: OhioLINK; SpringerLink; SpringerLink (off-campus)


Item type	Location	Call Number	Status	Notes	Date Due
Book	AUM Main Library	006.312 P153	Available	674

At a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Chapter 1: Why SQL on Big Data?; Why SQL on Big Data?; Why RDBMS Cannot Scale; SQL-on-Big-Data Goals; SQL-on-Big-Data Landscape; Open Source Tools; Apache Drill; Apache Phoenix; Apache Presto; BlinkDB; Impala; Hadapt; Hive; Kylin; Tajo; Spark SQL; Spark SQL with Tachyon; Splice Machine; Trafodion; Commercial Tools; Actian Vector; AtScale; Citus; Greenplum; HAWQ; JethroData; SQLstream; VoltDB; Appliances and Analytic DB Engines; IBM BLU; Microsoft PolyBase; Netezza; Oracle Exadata

Teradata; Vertica; How to Choose an SQL-on-Big-Data Solution; Summary; Chapter 2: SQL-on-Big-Data Challenges & Solutions; Types of SQL; Query Workloads; Types of Data: Structured, Semi-Structured, and Unstructured; Semi-Structured Data; Unstructured Data; How to Implement SQL Engines on Big Data; SQL Engines on Traditional Databases; How an SQL Engine Works in an Analytic Database; Why Is DML Difficult on HDFS?; Challenges to Doing Low-Latency SQL on Big Data; Approaches to Solving SQL on Big Data; Approaches to Reduce Latency on SQL Queries; File Formats; Text/CSV Files; JSON Records

Avro Format; Sequence Files; RC Files; ORC Files; Parquet Files; How to Choose a File Format?; Data Compression; Indexing, Partitioning, and Bucketing; Why Indexing Is Difficult; Partitioning; Advantages; Limitations; Bucketing; Recommendations; Summary; Chapter 3: Batch SQL-Architecture; Hive; Hive Architecture Deep Dive; How Hive Translates SQL into MR; Hive Query Compiler; Analytic Functions in Hive; Common Real-Life Use Cases of Analytic Functions; TopN; Clickstream Sessionization; Grouping Sets, Cube, and Rollup; ACID Support in Hive; Serialization and SerDe in Hive

Performance Improvements in Hive; Optimization by Using a Broadcast Join; Pipelining the Data for Joins; Dynamically Partitioned Joins; Vectorization of Queries; Use of LLAP with Tez; CBO Optimizers; Join Order; Bushy Trees; Table Sizing; Recommendations to Speed Up Hive; Upcoming Features in Hive; Summary; Chapter 4: Interactive SQL-Architecture; Why Is Interactive SQL So Important?; SQL Engines for Interactive Workloads; Spark; Spark Stack; Spark Architecture; Spark SQL; Spark SQL Architecture; Spark SQL Optimization-Catalyst Optimizer; Spark SQL with Tachyon (Alluxio)

Analytic Query Support in Spark SQL; General Architecture Pattern; Impala; Impala Architecture; Impala Optimizations; HDFS Caching; File Format Selection; Recommendations to Make Impala Queries Faster; Code Generation; SQL Enhancements and Impala Shortcomings; Apache Drill; Apache Drill Architecture; Key Features; Query Execution; Vertica; Vertica with Hadoop; Hadoop MapReduce Connector; Vertica Hadoop Connector for HDFS; Jethro Data; Others; MPP vs. Batch-Comparisons; Capabilities and Characteristics to Look for in the SQL Engine; Technical Decisions; Soft Decisions; Summary

Available to OhioLINK libraries


Library of American University of Madaba
