Google Cloud Dataflow vs. Apache Spark: Benchmarks are in

On Tuesday, my company, Mammoth Data, released benchmarks comparing Google Cloud Dataflow and Apache Spark. The benchmarks focused primarily on batch use cases running on Google’s cloud infrastructure. Last year, Google contracted us to implement some use cases and gather user-experience data from practitioners experienced in this field. As a follow-on, we ran a benchmark for Google to see how its technology stacked up against Spark.

Benchmarks are often a black art of vendor-driven deception. I’ve never worked with a company more concerned with avoiding that. The benchmarks we released were built around the batch processing capabilities of Google Cloud Dataflow and Spark. They don’t address the more rapidly developing part of both engines: streaming.


from InfoWorld Big Data
