Hadoop runs out of gas

Big data remains a big deal, but that fact is somewhat obscured by the recent stumbling of its former poster children: Cloudera, Hortonworks, and MapR. Once the darlings of data, able to raise gargantuan piles of cash—Intel pumped $766 million into Cloudera in just one investment round!—the heavyweights have been forced to skinny down, whether…

Natural language processing explained

From a friend on Facebook: Me: Alexa please remind me my morning yoga sculpt class is at 5:30am. Alexa: I have added Tequila to your shopping list. We talk to our devices, and sometimes they recognize what we are saying correctly. We use free services to translate foreign language phrases encountered online into English, and…

Deep learning explained

What is deep learning? Deep learning is a form of machine learning that models patterns in data as complex, multi-layered networks. Because deep learning is the most general way to model a problem, it has the potential to solve difficult problems—such as computer vision and natural language processing—that outstrip both conventional programming and other machine…

4 reasons big data projects fail—and 4 ways to succeed

Big data projects are, well, big in size and scope, often very ambitious, and all too often, complete failures. In 2016, Gartner estimated that 60 percent of big data projects failed. A year later, Gartner analyst Nick Heudecker said his company was “too conservative” with its 60 percent estimate and put the failure rate at…

Delta Lake gives Apache Spark data sets new powers

Databricks, the main commercial backer for Apache Spark, has released Delta Lake, an open source storage layer for Spark that provides ACID transactions and other data-management functions for machine learning and other big data work. Many kinds of data work need features like ACID transactions or schema enforcement for consistency, metadata management for security, and the…

Pub/sub messaging: Apache Kafka vs. Apache Pulsar

These days, massively scalable pub/sub messaging is virtually synonymous with Apache Kafka. Apache Kafka continues to be the rock-solid, open-source, go-to choice for distributed streaming applications, whether you’re adding something like Apache Storm or Apache Spark for processing or using the processing tools provided by Apache Kafka itself. But Kafka isn’t the only game in town.…