2018 | Page 2 of 5 | StreamSets

Hadoop meets Blockchain: Trust your (Big) Data

By Pat Patterson September 28, 2018

Minneapolis-based phData has long been a StreamSets partner, deploying the StreamSets DataOps Platform at customers across the US. It’s not surprising, then, that when phData principal solutions architect Keith Smith wanted to integrate the Ethereum blockchain platform with the Apache Hadoop filesystem and Apache Kudu, he reached for StreamSets Data Collector.

Introducing the StreamSets Test Framework

Data Integration

By Dima Spivak September 24, 2018

When we began investing in test automation several years ago, we were driven by a commitment to the quality of our products and by a need to enable our developers. Since then, our customers have let us know that they're…

Building a Real-Time Bike-Share Data Pipeline with StreamSets, Kafka and MapD

Data Integration

Operational Analytics

Stream Data Processing

By Pat Patterson September 17, 2018

Jowanza Joseph is a principal software engineer at One Click Retail with long experience of building reliable and performant distributed data systems. Recently, Jowanza built a pair of data pipelines with StreamSets Data Collector to read data from Ford GoBike and send…

StreamSets Congratulates Voya Financial And Other Cloudera Data Impact Award Winners

Data Integration

By Sean Anderson September 12, 2018

StreamSets has a rich tradition of partnering with Cloudera to highlight companies that are pushing the possible with data and advanced analytics. The Data Impact Awards is an annual event that recognizes the best organizations with data at the center…

Series C Funding: Right Where We Knew We’d Be

Data Integration

By Girish Pancha September 11, 2018

Today, we announced that StreamSets raised $35 million in a Series C funding round, led by new investor Harmony Partners. I met Mark Lotke, Harmony’s Managing Partner, over 18 months ago, and we immediately hit it off because it was clear that he really got both Data and Operations, exemplified by his investments in AppDynamics, Alation and InfluxDB. Our other new investor is Paul Drews of Tenaya Capital. Paul and StreamSets go back a long way; he was a Board Observer in his past life at Battery Ventures and must have liked what he saw. I’m delighted that our existing investors, Dharmesh Thakker from Battery Ventures and Pete Sonsini at NEA, also participated to their fullest, validating our “say what we’ll do, do what we said” doctrine.

Streamline Data Integration for Hybrid Cloud with DataOps

Data Integration

Cloud Data Migration

By Clarke Patterson September 11, 2018

As cloud adoption grows, so does the complexity of the data architectures that serve as the backbone for modern enterprise applications and the need to enable data integration for hybrid cloud. This complexity, if not planned for, can cripple any cloud initiative. According to research firm Gartner (subscription required), by 2021, at least 75% of large and global organizations will implement a multi-cloud capable hybrid integration platform, up from less than 25% in 2018. Taking a DataOps approach to methods of data integration can help streamline how data is moved around the business and ensure integration initiatives support the cloud-oriented goals of any organization.

DataOps in Healthcare

Data Integration

Operational Analytics

By Sean Anderson August 28, 2018

In healthcare, data is delivering life-saving results with predictive capabilities that can address preventable outcomes. The intelligence guiding these initiatives relies on timely data delivery to applications and reviewers. This may involve complex, high velocity data forms with the expectation…

Easy Splunk Integration with StreamSets Data Collector

Data Integration

Stream Data Processing

By Pat Patterson August 21, 2018

Splunk is the tool-of-choice for many enterprises mining insights from machine-generated data such as server logs, but one problem with the default tools is that there is no way to filter the data as it is fed into Splunk. It’s…

StreamSets Data Collector – the First Four Years

Data Integration

By Pat Patterson August 16, 2018

June 2018 marked the fourth anniversary of StreamSets’ founding; here’s a look back at the past four years of StreamSets and the Data Collector product, from the early days in stealth-startup mode, to the recent release of StreamSets Data Collector 3.4.0.

Girish Pancha and Arvind Prabhakar founded StreamSets on June 27th, 2014. Girish had been at Informatica for many years, rising to the level of Chief Product Officer; Arvind had held technical roles at both Informatica and Cloudera. Between them, they had realized that the needs of enterprises were not being met by existing data integration products – for instance, the best practice for many customers ingesting data into Hadoop was laborious manual coding of data processing logic and orchestration using low-level frameworks like Apache Sqoop. They also realized that this would be true for other proliferating data platforms such as Kafka (Kafka Connect), Elastic (Logstash) and cloud infrastructure (AWS Glue, etc.). The main obstacle they identified was ‘data drift’, the inescapable evolution of data structure, semantics and infrastructure, making data integration difficult and solutions brittle. Data drift exists in the traditional data integration world, but grows and accelerates when dealing with modern data sources and platforms. The founding vision for StreamSets was to “solve the data drift problem for all modern data architectures”.

Create a Microservice Data Pipeline with StreamSets Data Collector Engine (Tutorial)

Data Integration

By Pat Patterson August 8, 2018

How to Create a Microservice Data Pipeline with StreamSets DataOps Platform

A microservice data pipeline is a lightweight component that implements a relatively small part of a larger system – for example, providing access to user data. A microservice architecture comprises a set of independent microservices, often implemented as RESTful web services communicating via JSON over HTTP, that together implement a system’s functionality, rather than a single monolithic application. Think of an e-commerce web site: we might have separate microservices for searching for inventory, managing the shopping cart, and recommending items based on the shopping cart’s content. Compared to monolithic applications, the microservice approach promotes fine-grained modularity, allowing agile implementation of components by independent teams, which may even be using different technologies. Now, one of those technologies can be StreamSets Data Collector Engine. Data Collector 3.4.0, released earlier this week, introduces microservice data pipelines, with a new REST Service origin and Send Response to Origin destination allowing you to implement RESTful web services completely within the Data Collector Engine.
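To make the idea of a small RESTful service "providing access to user data" concrete, here is a minimal sketch in plain Python. It is not a Data Collector pipeline and does not use any StreamSets API; the `USERS` store, the `/users/<id>` path, the port and the function names are all hypothetical, chosen only to illustrate JSON over HTTP.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory user store; a real service would query a database.
USERS = {
    "1": {"id": "1", "name": "Ada"},
    "2": {"id": "2", "name": "Grace"},
}


def lookup_user(path):
    """Map a request path such as /users/1 to an (HTTP status, body) pair."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2 and parts[0] == "users":
        user = USERS.get(parts[1])
        if user is not None:
            return 200, user
        return 404, {"error": "user not found"}
    return 404, {"error": "unknown resource"}


class UserHandler(BaseHTTPRequestHandler):
    """Answers GET requests with a JSON document, as a RESTful service would."""

    def do_GET(self):
        status, body = lookup_user(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Run the service until interrupted (assumed port; change as needed)."""
    HTTPServer(("localhost", port), UserHandler).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/users/1` should return the matching user as JSON. The routing logic lives in `lookup_user` so that it can be exercised without starting a server; in a microservice pipeline, the equivalent request handling and response would be configured in the pipeline rather than hand-coded like this.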
