2017 StreamSets Community Survey

[heading text_color=”text-light” header_align=”center” margin_top=”50″ margin_bottom=”0″ scroll_animation=”fadeIn”]StreamSets Community Survey 2017[/heading][bordered_divider divider_color=”#ffffff” divider_height=”3″ divider_width=”60″ scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]

Understanding Modern Data in Motion

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]

In March 2017, we conducted our first community survey to better understand how and why customers are creating data pipelines. Responses came from data engineers, data scientists and developers working in industries ranging from banking to education and at companies of all sizes.

Results revealed the StreamSets community uses StreamSets Data Collector™ primarily for integrating streaming and batch data for immediate use in both big data and traditional applications.

Only 18 months after its launch, StreamSets Data Collector has been downloaded more than 150,000 times, and roughly 20% of the Fortune 500 have been identified among these users.

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Type of Data Movement[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]72% are using StreamSets Data Collector for streaming data[/heading]

According to the survey, real-time use of streaming data is moving ahead quickly. A full 72 percent of respondents are using StreamSets Data Collector for streaming data. Of these, two-thirds (48 percent of all respondents) are integrating batch and streaming data within their pipelines, while the remainder (24 percent) are streaming only. The other 28 percent are employing StreamSets solely for movement of batch data.

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Data Movement Use Case[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]84% use StreamSets Data Collector for both big data applications and traditional data analysis[/heading]

A large majority (84 percent) of respondents are using StreamSets Data Collector for both new big data applications and traditional data analysis. Traditional, horizontal capabilities include dashboards (88 percent), interactive SQL queries (64 percent) and data warehousing (51 percent). Big data applications include customer insights (50 percent), IoT (23 percent) and cybersecurity (10 percent).

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Data Sources[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]Transactional databases were most popular, with newer Log and IoT source types emerging quickly[/heading]

Consistent with the popularity of traditional use cases, transactional data sources were the most commonly used, and analytics database sources came in third. Newer source types comprising interaction data, such as log files and clickstream data, were also popular. IoT sources are emerging quickly and are already in use by roughly one in five respondents.

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Data Destinations[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]50% are moving data into multiple destination types[/heading]

While it’s no surprise that Hadoop was cited as the most popular destination for data pipelines, search-oriented stores, such as Apache Solr and Elasticsearch, were also significantly represented (44 percent). Spark is proving to be quite popular (26 percent), already closing in on NoSQL data stores (28 percent). Traditional databases are used by 32 percent of respondents. Approximately one-half of respondents are moving data into multiple destination types.

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Location of Data Stores[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]66% use StreamSets Data Collector in a private or public cloud[/heading]

When it comes to the location of the data, two-thirds of the enterprises surveyed (66 percent) use StreamSets Data Collector in a public or private cloud, while 58 percent use it on premises. Pointing to a hybrid reality, only 12 percent of all enterprises surveyed were performing data movement solely within a public cloud environment. Interestingly, nearly one-quarter (22 percent) of respondents listed cloud data migration as one of their use cases.

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Required Time to Analysis[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]56% require analysis of the incoming data within minutes[/heading]

Over half (56 percent) of respondents require analysis of the incoming data within minutes, with 15 percent needing analysis performed within seconds of arrival.

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Favorite Programming Language[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]60% prefer programming in Python and Java[/heading]

We asked our community about their favorite programming language. Python and Java took top honors, followed by JavaScript and Scala. Among the “Others” were Go and, honestly, COBOL. Clearly our community can snark with the best of them!

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[divider xs_height=”25″ sm_height=”25″ md_height=”25″ lg_height=”25″][heading header_type=”h3″ header_weight=”bold” extra_classes=”widget-title” margin_top=”0″]Favorite Movie About Artificial Intelligence[/heading][heading header_type=”h4″ header_size=”big” scroll_animation=”fadeIn” scroll_animation_delay=”0.1″]”The Matrix” was a clear winner chosen by 30%, followed by “Blade Runner” and “The Terminator”[/heading]

For fun, we asked respondents for their favorite AI movie. “The Matrix,” chosen by 30 percent of respondents, was the runaway winner, followed by “Blade Runner” (14 percent) and “The Terminator” (11 percent). Respondents also gave the reasons behind their choices. Our favorite comments follow:

[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]

Her

‘Her’ [because] I’m already pretty close to dating my computer […] and my project name is also ‘Her’.

Anonymous Respondent

Bladerunner

‘Bladerunner’ [because] Philip K. Dick > Arthur C. Clarke > The rest

Anonymous Respondent

The Matrix

‘The Matrix’ [because] it’s open source, […] and I had a crush on Keanu Reaves, ok!?

Anonymous Respondent

Westworld

‘Westworld’ [because] I like that it showed the robots being maintained and the ops behind the whole thing.

Anonymous Respondent
[divider xs_height=”30″ sm_height=”30″ md_height=”60″ lg_height=”60″]
[heading text_color=”text-light” header_size=”bigger” header_weight=”light” header_align=”center” margin_top=”60″ margin_bottom=”0″ scroll_animation=”fadeIn”]

Get started today

[/heading]

StreamSets runs on Linux or Mac OS X

[button type=”danger” label=”Download Open Source” link=”/opensource” margin_top=”0″ margin_bottom=”60″ scroll_animation=”fadeIn” scroll_animation_delay=”0.2″]