Cython tutorial: How to speed up Python

Python is a powerful programming language that is easy to learn and easy to work with, but it is not always the fastest to run, especially when you’re dealing with math or statistics. Third-party libraries like NumPy, which wrap C libraries, can improve the performance of some operations significantly, but sometimes you just need the raw…

Is your data lake open enough? What to watch out for

A data lake is a system or repository that stores data in its raw format along with transformed, trusted data sets, and provides both programmatic and SQL-based access to this data for diverse analytics tasks such as data exploration, interactive analytics, and machine learning. The data stored in a data lake can include structured data…

What is Apache Spark? The big data platform that crushed Hadoop

Apache Spark defined

Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. These two qualities are key to the worlds of big data and…

Why data-driven businesses need a data catalog

Relational databases, data lakes, and NoSQL data stores are powerful at inserting, updating, querying, searching, and processing data. But the ironic aspect of working with data management platforms is that they usually don’t provide robust tools or user interfaces to share what’s inside them. They are more like data vaults. You know there’s valuable data inside,…

Data catalogs explained

Relational databases, data lakes, and NoSQL data stores are powerful at inserting, updating, querying, searching, and processing data. But the ironic aspect of working with data management platforms is that they usually don’t provide robust tools or user interfaces to share what’s inside them. They are more like data vaults. You know there’s valuable data inside,…

Qubole review: Self-service big data analytics

Billed as a cloud-native data platform for analytics, AI, and machine learning, Qubole offers solutions for customer engagement, digital transformation, data-driven products, digital marketing, modernization, and security intelligence. It claims fast time to value, multi-cloud support, 10x administrator productivity, a 1:200 operator-to-user ratio, and lower cloud costs.

IDG Contributor Network: Who should be responsible for your data? The knowledge scientist

How can you build a data-driven culture and spur digital transformation without thinking through who should be responsible for your data? Let’s do that together. Data engineers and data scientists each occupy critical roles. Data engineers manage the data infrastructure and are in charge of designing, building, and integrating data workflows, pipelines, and the ETL…