One of the challenges that data scientists face when running machine learning workloads is processing information before it’s ready for use. Google unveiled a new cloud service Thursday aimed at easing that pain.
Google Cloud Dataprep will automatically detect data schemas, joins, and anomalies like missing or duplicate values, without requiring coding. After that, it will help users build a set of rules for processing the information. Those rules are then built in Apache Beam format and can be imported into products like Google’s Cloud Dataflow, which processes information as it’s loaded into services like the BigQuery data warehouse.
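To give a rough sense of what detecting anomalies like missing or duplicate values involves, here is a minimal Python sketch. It is not Dataprep's implementation or API; the function name `profile_rows` and the sample data are hypothetical, chosen only for illustration, and a real service would handle far larger inputs and many more anomaly types.

```python
from collections import Counter


def profile_rows(rows, columns):
    """Count missing values per column and collect rows that occur more than once.

    Hypothetical helper for illustration only; not part of any Google API.
    """
    missing = {col: 0 for col in columns}
    seen = Counter()
    for row in rows:
        for col in columns:
            value = row.get(col)
            # Treat None and blank strings as missing.
            if value is None or str(value).strip() == "":
                missing[col] += 1
        # Use the row's values, in column order, as a key for duplicate detection.
        seen[tuple(row.get(col) for col in columns)] += 1
    duplicates = [dict(zip(columns, key)) for key, count in seen.items() if count > 1]
    return missing, duplicates


# Hypothetical sample data: one blank city and one repeated row.
SAMPLE_COLUMNS = ["id", "city"]
SAMPLE_ROWS = [
    {"id": "1", "city": "Oslo"},
    {"id": "2", "city": ""},
    {"id": "1", "city": "Oslo"},
]

missing_counts, duplicate_rows = profile_rows(SAMPLE_ROWS, SAMPLE_COLUMNS)
print(missing_counts)
print(duplicate_rows)
```

A report like this is the kind of input a user could draw on when deciding which cleanup rules to build, such as dropping repeated rows or filling in blank fields.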