Today AtScale released its Q4 benchmark results for the major big data SQL engines: Spark, Impala, Hive/Tez, and Presto.
The findings confirm much of what we already knew: Impala is better at finding needles in moderate-size haystacks, even with many concurrent users. Presto also does well here. Hive and Spark do better on long-running analytics queries.
I spoke to Joshua Klar, AtScale’s vice president of product management, and he noted that many of the company’s customers use two engines. Generally, they view Hive as more stable and tend to run their long-running queries on it. All of AtScale’s Hive customers use Tez, and none use MapReduce any longer.