Why you should use Presto for ad hoc analytics

Presto! It’s not only an incantation to excite your audience after a magic trick, but also a name that comes up more and more in discussions of how to churn through big data. While there are many deployments of Presto in the wild, the technology — a distributed SQL query engine that supports all kinds of data…

Rakuten frees itself of Hadoop investment in two years

Based in San Mateo, California, Rakuten Rewards is a shopping rewards company that makes money through affiliate marketing links across the web. In return, members earn reward points and cash back every time they make a purchase through a partner retailer. Naturally this drives a lot of user insight data – hundreds of…

Cython tutorial: How to speed up Python

Python is a powerful programming language that is easy to learn and easy to work with, but it is not always the fastest to run—especially when you’re dealing with math or statistics. Third-party libraries like NumPy, which wrap C libraries, can improve the performance of some operations significantly, but sometimes you just need the raw…
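A hypothetical example (not taken from the article) of the kind of tight numeric loop the teaser alludes to — slow in interpreted Python, and a classic candidate for Cython compilation or NumPy vectorization:

```python
# Pure-Python numeric loop: every iteration pays Python object
# overhead that Cython's C type annotations can eliminate.
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print(sum_of_squares(1000))
```

In Cython, giving `n`, `i`, and `total` C integer types (e.g. `cdef long`) typically lets this loop compile down to plain C arithmetic.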

Is your data lake open enough? What to watch out for

A data lake is a system or repository that stores data in its raw format alongside transformed, trusted data sets, and provides both programmatic and SQL-based access to that data for diverse analytics tasks such as data exploration, interactive analytics, and machine learning. The data stored in a data lake can include structured data…
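A toy illustration (not a real data lake, and using made-up file contents and schema) of the two access modes the definition above describes — the same raw records served both programmatically and via SQL:

```python
import csv
import io
import sqlite3

# Raw data as it might land in a lake, kept in its original format.
raw = "user,amount\nalice,10\nbob,25\nalice,5\n"

# Programmatic access: parse the raw format directly.
rows = list(csv.DictReader(io.StringIO(raw)))
total = sum(int(r["amount"]) for r in rows)

# SQL-based access: load the same records into a queryable table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE purchases (user TEXT, amount INTEGER)")
con.executemany("INSERT INTO purchases VALUES (?, ?)",
                [(r["user"], int(r["amount"])) for r in rows])
sql_total = con.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]

print(total, sql_total)  # both access paths see the same data
```

Real lake engines (Presto among them) run the SQL path directly over raw files in object storage rather than loading them into a database first.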

What is Apache Spark? The big data platform that crushed Hadoop

Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. These two qualities are key to the worlds of big data and…

Why data-driven businesses need a data catalog

Relational databases, data lakes, and NoSQL data stores are powerful at inserting, updating, querying, searching, and processing data. But the ironic aspect of working with data management platforms is that they usually don’t provide robust tools or user interfaces to share what’s inside them. They are more like data vaults. You know there’s valuable data inside,…

Qubole review: Self-service big data analytics

Billed as a cloud-native data platform for analytics, AI, and machine learning, Qubole offers solutions for customer engagement, digital transformation, data-driven products, digital marketing, modernization, and security intelligence. It claims fast time to value, multi-cloud support, 10x administrator productivity, a 1:200 operator-to-user ratio, and lower cloud costs.

IDG Contributor Network: Who should be responsible for your data? The knowledge scientist

How can you build a data-driven culture and spur digital transformation without thinking through who should be responsible for your data? Let’s do that together. Data engineers and data scientists each occupy critical roles. Data engineers manage the data infrastructure and are in charge of designing, building, and integrating data workflows, pipelines, and the ETL…