StreamSets Partners

How DataOps is Adding Value to Data Lakes

For those of you who joined us on June 6th, you dialed into a forward-thinking conversation between three industry experts. They waxed poetic about topics including big data, DataOps, governance, data science, and more in an effort to help modern data architects and analytics professionals better understand the emerging practices and themes around DataOps. If […]

Creating the OmniSci F1 Demo: Real-Time Data Ingestion With StreamSets

Randy Zwitch is a Senior Director of Developer Advocacy at OmniSci, enabling customers and community users alike to utilize OmniSci to its fullest potential. With broad industry experience in energy, digital analytics, banking, telecommunications and media, Randy brings a wealth of knowledge across verticals as well as an in-depth knowledge of open-source tools for analytics. […]

A Cost Comparison of a Cloudera Hadoop Cluster with StreamSets Ingestion Framework on Oracle Cloud Infrastructure

Introduction: It should come as no surprise that a Hadoop cluster and the public cloud go together like peanut butter and jelly because of scale, agility, and economy. It should come as even less of a surprise that a software provider like Oracle is now providing an enterprise-grade cloud via its bare metal compute offering […]

Ingestion for a Cyber Security Data Lake with Oracle and StreamSets

If you were lucky enough to get the gift of replacing your existing security information and event management (SIEM) system this year, then there is a chance your organization has considered building a cybersecurity data lake. Maybe the current solution is too expensive or doesn’t support complex data or distributed algorithms. Maybe it lacks capabilities in […]

Solving Data Quality in Streaming Data Flows

Vinu Kumar is Chief Technologist at HorizonX, based in Sydney, Australia. Vinu helps businesses in unifying data, focusing on a centralized data architecture. In this guest post, reposted from the original here, he explains how to automate data quality using open source tools such as StreamSets Data Collector, Apache Griffin and Apache Kafka. “Data is the new oil. […]

Scaling Data Collectors on Azure Kubernetes Service

In this blog post, I will present a step-by-step guide on how to scale Data Collector instances on Azure Kubernetes Service (AKS) using provisioning agents, which help automate upgrading and scaling resources on demand, without having to stop execution of pipeline jobs. AKS removes the complexity of implementing, installing, and maintaining Kubernetes in Azure and you only […]

Execute Machine Learning Jobs in Microsoft Azure Databricks from StreamSets

In my previous blog post, I demonstrated how to achieve low-latency inference using Databricks ML models in StreamSets. Now let’s say you have a dataflow pipeline that is ingesting data, enriching it, performing transformations, and, based on certain condition(s), you’d like to (re)train the Databricks ML model. For instance, using a different value for the hyperparameter n_estimators […]

Accelerate Your Journey To The Cloud Data Warehouse: StreamSets For Snowflake

Introduction: Data warehouses are a critical component of modern data architecture in enterprises that leverage massive amounts of data to drive the quality of their products and services. A data warehouse is an OLAP (Online Analytical Processing) database that collects data from transactional databases such as Billing, CRM, ERP, etc. and provides a layer on top […]

Adaptive Data Integration and Operations on Oracle Cloud using StreamSets

StreamSets is pleased to announce a new partnership with Oracle Cloud Infrastructure (OCI). As enterprises move their big data workloads to the cloud, it becomes imperative that their Data Operations are more resilient and adaptive to continue to serve the business’s needs. This is why StreamSets Data Collector™ is now easily deployable on OCI. What […]
