Big data face-off: Spark vs. Impala vs. Hive vs. Presto

Today AtScale released its Q4 benchmark results for the major big data SQL engines: Spark, Impala, Hive/Tez, and Presto.

The findings confirm much of what we already know: Impala is better at finding needles in moderate-size haystacks, even with a lot of concurrent users. Presto also does well here. Hive and Spark do better on long-running analytics queries.

I spoke to Joshua Klar, AtScale’s vice president of product management, and he noted that many of the company’s customers use two engines. Generally, they view Hive as more stable and tend to run their long-running queries on it. All of their Hive customers use Tez, and none use MapReduce any longer.

from InfoWorld Big Data