The DataOps Blog

Where Change Is Welcome

StreamSets Integration with Delta Lake using Standalone


A while ago we talked to a prospect who wanted to stream their IoT data into Delta Lake without a Spark (Databricks) cluster running continuously. While StreamSets supports streaming data into Delta Lake, the second part of this requirement made me raise an eyebrow. The prospect wanted the pipeline to run 24/7, and running a Databricks cluster can…

Roman Bukarev StreamSets By September 15, 2022

Data Warehouse Architecture: Explanation, Examples, Best Practices, and Alternatives


Data warehouses have become essential for organizations dealing with massive amounts of data, and although alternatives keep coming to market, their adoption remains on the rise. This increased adoption accounts for a projected Compound Annual Growth Rate (CAGR) of 10.7% from 2020, reaching over $51B in 2028. The growth rate comes as no surprise, as there is increased demand for…

Brenna Buuck By September 13, 2022

Difference Between Slowly Changing Dimensions and Change Data Capture


While the difference between slowly changing dimensions (SCD) and Change Data Capture (CDC) might seem subtle, there is in fact a technical difference between the two processes. Both processes detect changes in a source database and deliver the changed data to a target database. The difference between the two is almost entirely about what happens in…

Brenna Buuck By September 12, 2022

Data Integration Architecture


An organization rarely has a single data source. Instead, it aggregates data from various sources like websites, applications, social media, and databases to make data easily accessible through data integration. In addition, this data needs to be transformed before being transported to its target location. All these ingestion and transformation processes involve data of various sizes, structures, and types, thereby…

Brenna Buuck By September 6, 2022

Getting to Success with Data Integration and ETL

Engineering, Use Cases

Data is everywhere, in different formats and databases. Being able to integrate multiple, highly varied data sources is essential to running a business today. You have to be able to Extract-Transform-Load (ETL) the data from each source into a database suitable for data analysis, like a data warehouse. In this article, PeerSpot’s real users of StreamSets discuss how the platform…

Jesse Summan By September 1, 2022