Angel Alvarado is a senior software engineer at One Degree, a San Francisco-based non-profit, and also helps run the Molanco data engineering community. In his spare time, Angel enjoys playing Minecraft with his 11-year-old cousin. Recently, Angel found a fun way to combine his gaming with data engineering. This blog entry, reposted from the original with Angel's kind permission, picks up the story…
Data engineering can get really complex really quickly, and keeping track of the hundreds of tools and data platforms in the industry can be overwhelming. This project shows how to use three data engineering tools to visualize data in a video game. It aims to solve a common data engineering problem with a twist that makes it fun and entertaining.
I got involved with Minecraft thanks to my 11-year-old cousin, who lives in Mexico City. He taught me how to play Minecraft and inspired me to combine it with data engineering.
After playing for a while, I realized that I actually enjoyed it, and it was a great way for my little cousin and me to connect even though 2,000 miles separated us.
Eventually, I decided to combine data engineering with this video game, using the following data engineering tools: Apache Kafka, StreamSets Data Collector, and Docker.
The plan for this project was to create a map of the world where we could see the location of users visiting a website in real time. Luckily, StreamSets helped us to get ahead of the game and we were able to do this really quickly using StreamSets Data Collector!
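To give a feel for the core idea, here is a minimal sketch of how a visitor's latitude and longitude could be turned into a block position on a flat world map. It is an illustration only, not the code used in the project: the map size, the `BlockPosition` type, and the `to_block_position` function are all hypothetical, and it assumes a simple equirectangular projection with north at the top of the map.

```python
from dataclasses import dataclass

# Hypothetical map size in Minecraft blocks (an assumption for this sketch,
# not a value taken from the project).
MAP_WIDTH = 360
MAP_HEIGHT = 180


@dataclass(frozen=True)
class BlockPosition:
    """Horizontal block coordinates on the map (x runs west to east, z north to south)."""
    x: int
    z: int


def to_block_position(lat: float, lon: float,
                      width: int = MAP_WIDTH,
                      height: int = MAP_HEIGHT) -> BlockPosition:
    """Project a latitude/longitude pair onto a width-by-height block grid.

    Uses an equirectangular projection: longitude maps linearly to x and
    latitude maps linearly to z, with the north pole on the first row.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")

    # Scale to [0, width - 1] and [0, height - 1] so results stay on the grid.
    x = int((lon + 180.0) / 360.0 * (width - 1))
    z = int((90.0 - lat) / 180.0 * (height - 1))
    return BlockPosition(x, z)


if __name__ == "__main__":
    # Approximate coordinates for Mexico City, used only as sample input.
    print(to_block_position(19.43, -99.13))
```

In a real pipeline, each website visit would first need its IP address resolved to coordinates before a step like this could place it on the map.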
Here's what the project looks like when it's running:
Apache Kafka: Here at the Molanco Data Engineering community, this is our preferred tool when it comes to processing events in real time. If you are looking into building publisher/subscriber distributed systems, this is a great piece of software to start with. Lately, some of our members have been moving away from Kinesis and instead using Kafka for their data architectures.
StreamSets Data Collector: If you are a fan of ETL and love developing customized ETL processes, I'd encourage you to look at StreamSets. We've been using this tool for 1.5 years now. It's open source, and seeing how quickly it has matured gives us hope that it's here to stay. StreamSets Data Collector (SDC) provides dozens of connectors out of the box, including connectors to Hadoop, Hive, Elasticsearch, and SQL databases, as well as Jython processors and much more.
Docker: This may not be news to anybody, but we believe microservices and containers are where the industry is heading. If you are in the DevOps and/or data engineering worlds and you are still using VMs, it's about time to explore Docker; it's going to be worth it. Docker was used for this project to allow anyone to replicate it with just one command:
$ docker compose up
This project was presented at:
The easiest way to replicate this project is by using the code on GitHub: Dockerized project on GitHub. Feel free to reach out with any questions.
Have you created something fun with StreamSets? Get in touch via the comments or email firstname.lastname@example.org – we'd love to feature your project here on the blog!