On Tuesday, my company, Mammoth Data, released benchmarks on Google Cloud Dataflow and Apache Spark. The benchmarks were primarily for batch use cases on Google’s cloud infrastructure. Last year, Google contracted us to implement some use cases and gather user-experience data points from practitioners in this field. As a follow-on, we ran a benchmark for Google to see how its technology stacked up.
Benchmarks are often a black art of vendor-driven deception. I’ve never worked with a company more concerned with avoiding that. The benchmarks we released were constructed around the batch processing capabilities of Google Cloud Dataflow and Spark. They don’t address the more rapidly developing part of both engines: streaming.