What is a data lake? Flexible big data management explained

If you are tuned in to the latest technology concepts around big data, you’ve likely heard the term “data lake.” The image conjures up a large reservoir of water, and that’s what a data lake is, in concept: a reservoir. Only it’s for data. Data lake defined: A data lake holds a vast amount of raw,…

Matei Zaharia, creator of the Apache Spark project, on the big data framework | True Technologist Ep 2

In this episode of True Technologist, host Eric Knorr talks with Matei Zaharia, chief technologist at Databricks and an assistant professor of computer science at Stanford, about the Apache Spark and Apache Mesos projects. From InfoWorld Big Data: https://ift.tt/2NuSyOo

Why there are no shortcuts to machine learning

Big data remains a game for the 1 percent. Or the 15 percent, as new O’Reilly survey data suggests. According to the survey, most enterprises (85 percent) still haven’t cracked the code on AI and machine learning. Only the 15 percent of enterprises classed as “sophisticated” have been running models in production for more than five years. Importantly,…

IDG Contributor Network: Why we lose out if we leave everything to algorithms

“Bad Romance,” an amazing piece of journalism by Sarah Jeong at The Verge, implicitly answers this question. It’s about the romance genre on Kindle Unlimited, and the royal rumble that’s been happening this year. It’s a story about how “rampant algorithmic tricks” have ripped apart an author community. Deep down, the cause of the controversy is about…

How to build stateful streaming applications with Apache Flink

Fabian Hueske is a committer and PMC member of the Apache Flink project and a co-founder of Data Artisans. Apache Flink is a framework for implementing stateful stream processing applications and running them at scale on a compute cluster. In a previous article we examined what stateful stream processing is, what use cases it addresses,…

Introducing BigQuery ML for building predictive models with SQL

One key to efficient data analysis of big data is to do the computations where the data lives. In some cases, that means running R, Python, Java, or Scala programs in a database such as SQL Server or in a big data environment such as Spark. But that takes some fairly technical programming and data…

IDG Contributor Network: Big data: enabling new approaches to IT infrastructure security

Consider modern enterprise IT infrastructure. Increasingly, it is a complex combination of on-premises computing and storage and off-premises, cloud-based resources. Tying all of this together is a web of data connections. Applications can run either in the cloud or locally, and all of this is subject to penetration by bad actors. Combine this…