
The DataOps Blog

Where Change Is Welcome

7 Examples of Data Pipelines


The best way to understand something is through concrete examples. I’ve put together seven examples of data pipelines that represent typical patterns we see our customers use. Data engineers also encounter these patterns frequently in production environments, whatever tool they use. Use these patterns as a starting point for your own data integration…

By Brenna Buuck, November 18, 2022

Mainframe Data Is Critical for Cloud Analytics Success—But Getting to It Isn’t Easy


Modern Data Integration Technology Is Helping

With the drive to modernize data infrastructure on cloud technologies, one would be forgiven for thinking mainframe systems are destined to go the way of the dodo. The truth is, far from going extinct, mainframes are making a comeback. Seventy-four percent of respondents in a 2021 Forrester survey said they see the mainframe as a…

By Michael Andrews, November 16, 2022

How to Use Spark for Machine Learning Pipelines (With Examples)


Spark is a widely used platform for businesses today because of its support for a broad range of use cases. Developed in 2009 at U.C. Berkeley, Apache Spark has become a leading distributed processing framework for big data, known for fast, flexible, and developer-friendly large-scale SQL, batch processing, stream processing, and machine learning.

The Basics of Apache Spark

Apache Spark is…

By Brenna Buuck, November 15, 2022

Spark Streaming


One crucial part of Big Data is streaming data. As the name suggests, streaming data is data generated continuously by multiple sources, such as social media, CRM, and ERM platforms. Handling and analyzing streaming data can be complex, as data arrives from numerous sources and in various formats. It requires consideration of data integration points, data types, and…

By Brenna Buuck, November 9, 2022

Use StreamSets Dynamic Engine Deployment to Reduce Public Cloud Infrastructure Costs


One of the most exciting new capabilities of StreamSets DataOps Platform is its ability to dynamically provision public cloud VMs running Data Collector or Transformer for Spark engines. Public cloud VMs running StreamSets engines can be deployed "just in time" to run jobs, and can be automatically torn down once those jobs complete. This technique can significantly reduce public cloud infrastructure…

By November 2, 2022