StreamSets User Community Survey Reveals Insights About Real-World Data in Motion

Prevalence of Streaming Data and Mix of Traditional and Big Data Applications Among the Findings

SAN FRANCISCO – March 15, 2017 – StreamSets Inc., a provider of innovative data in motion middleware, today unveiled results from its first user community survey. The survey, conducted earlier this month, yields insights into real-world data movement as reported by more than 100 enterprises across a range of industries from banking to education. Results revealed the StreamSets community uses StreamSets Data Collector™ primarily for integrating streaming and batch data for immediate use in both big data and traditional applications. StreamSets Data Collector has now been downloaded more than 150,000 times, and 20 percent of the Fortune 500 have been identified among these users.

Full Stream Ahead

According to the survey, use of streaming data is moving ahead quickly, with 72 percent of respondents using StreamSets Data Collector for streaming applications. Two-thirds of these (48 percent of all respondents) are integrating batch and streaming data within their pipelines, while the remaining one-third (24 percent of all respondents) are streaming only. Twenty-eight percent are employing StreamSets solely for movement of batch data.

Consistent with the use of streaming data, most respondents need to consume data very quickly: over one-half (56 percent) require analysis of incoming data within minutes, and 15 percent need analysis performed within seconds of arrival.

For Both Old School and New School Use Cases

In keeping with StreamSets Data Collector’s design as general-purpose data-in-motion middleware, a large majority (84 percent) of responding enterprises use it for big data applications as well as traditional data analysis. Traditional uses include dashboards (88 percent), interactive SQL queries (64 percent) and data warehousing (51 percent). Big data applications include customer insights (50 percent), IoT (23 percent) and cybersecurity (10 percent).

While it’s no surprise that Hadoop was cited as the most popular destination for data pipelines, search-oriented stores, such as Apache Solr and Elasticsearch, were also significantly represented (44 percent). Spark shows strong penetration at 26 percent, closing in on NoSQL data stores (28 percent). Traditional databases are used by 32 percent of respondents. Approximately one-half of respondents are moving data into multiple destination types.

Data in the Cloud Reaching Critical Mass

When it comes to the location of the data, cloud environments are used by two-thirds of the enterprises surveyed. Sixty-six percent use StreamSets Data Collector in a public or private cloud while 58 percent use StreamSets Data Collector on premises. Pointing to a hybrid reality, only 12 percent of all enterprises surveyed were performing data movement solely within a public cloud environment. Interestingly, nearly one-quarter (22 percent) of respondents listed cloud data migration as one of their use cases.

“In the 18 months since our launch, we at StreamSets have been humbled by the breadth and inventiveness of our open source community,” said Pat Patterson, Community Champion at StreamSets. “We built our technology for general use and ensured its flexibility so that it could evolve with the big data landscape. We’re delighted to see that play out for our customer community as they re-platform traditional workloads and bring brand-new big data applications online.”

To gain a bit of insight into the personality of our user community, we asked respondents for their favorite AI movie. “The Matrix,” chosen by 30 percent of respondents, was the runaway winner, followed by “Blade Runner” (14 percent) and “The Terminator” (11 percent). Respondents also gave their reasons behind their choice. Some of the more interesting answers included the following:

  • “‘The Matrix,’ [because] it’s open source.”
  • “‘Westworld,’ [because] I like that it showed the robots being maintained and the ops behind the whole thing.”
  • “‘Her,’ [because] I’m already pretty close to dating my computer.”

Read more from the user community survey.

About StreamSets

StreamSets provides innovative data-in-motion middleware that reinvents how enterprises deliver timely and trustworthy data to their critical applications. StreamSets Data Collector™ is award-winning, open source software for the development of any-to-any dataflows. StreamSets Dataflow Performance Manager (DPM™) provides a comprehensive control panel for managing the day-to-day operation of complex dataflow topologies. Founded by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, a former engineering leader at Cloudera, StreamSets is backed by top-tier Silicon Valley venture capital firms, including Accel Partners, Battery Ventures and New Enterprise Associates (NEA). For more information, visit

Media Contact:

Brittney Timmins

BOCA Communications
