Prioritize predictable performance in Hadoop

The growth of Apache Hadoop over the past decade has shown that this open source technology's ability to process data at massive scale and give users access to shared resources is not hype. Its downside is a lack of predictability: Hadoop gives enterprises no way to ensure that the most important jobs complete on time, and it does not make effective use of a cluster's full capacity.

YARN can preempt jobs to make room for other jobs that are queued and waiting to be scheduled. Both the Capacity Scheduler and the Fair Scheduler can be statically configured to kill jobs that hold cluster resources needed to schedule higher-priority jobs.
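
The preemption idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not YARN's actual scheduler logic: the `Job` class, the `admit_with_preemption` function, the integer priority scale (larger means more important), and the single pool of interchangeable containers are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int    # assumption of this sketch: larger value = more important
    containers: int  # cluster slots the job occupies or requests


def admit_with_preemption(running, incoming, capacity):
    """Try to admit `incoming`; return (new_running, killed).

    Running jobs with strictly lower priority are killed, least important
    first, until the incoming job fits. If it still cannot fit, nothing is
    killed and the incoming job is not admitted.
    """
    free = capacity - sum(job.containers for job in running)
    victims = []
    candidates = sorted(
        (job for job in running if job.priority < incoming.priority),
        key=lambda job: job.priority,
    )
    for job in candidates:
        if free >= incoming.containers:
            break
        victims.append(job)
        free += job.containers
    if free < incoming.containers:
        # Cannot make enough room; leave the cluster untouched.
        return list(running), []
    victim_ids = {id(job) for job in victims}
    survivors = [job for job in running if id(job) not in victim_ids]
    return survivors + [incoming], victims
```

In this toy model, a 10-container cluster running two 4-container jobs at priorities 1 and 2 would kill only the priority-1 job to admit a 5-container job at priority 5. The work already done by the killed job is lost, which is the cost that preemption by killing imposes.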

from InfoWorld Big Data