Big data problem? Don’t forget search

With every cool new technology, people get overly infatuated and start using it for the wrong things. For example: Looking through a bazillion records for a few million marked with a set of criteria is a rather stupid use of MapReduce or your favorite DAG implementation (see: Spark).

For that and similar tasks, don’t forget the original big data technology: search. With great open source tools like Solr, Lucidworks, and Elasticsearch, you have a powerful way to optimize your I/O and personalize your user experience. It’s much better than holding fancy new tools from the wrong end.

A bad job for Spark

Not long ago a client asked me how to use Spark to search through a bunch of data they’d streamed into a NoSQL database. The trouble was that their pattern was a simple string search and a drill-down. It was beyond the capabilities of the database to do efficiently: They would have to pull all the data out of storage and parse through it in memory. Even with a DAG it was a little slow (not to mention expensive) on AWS.

To read this article in full or to leave a comment, please click here

from InfoWorld Big Data