Looking at the Whole File data format, introduced last month in SDC 184.108.40.206, inspired me… Our latest tutorial, Creating a Custom StreamSets Processor, explains how to extract metadata tags from image files as they are ingested, adding them to records as fields.
With the help of Drew Noakes' excellent metadata-extractor, you can access Exif and other metadata in a wide variety of image file types. In the tutorial, I give a simple example of reading and writing record fields, then show you how to integrate the metadata-extractor library, access whole file content, and write the resulting metadata tags as record fields. Having this metadata in the SDC record is incredibly useful. Want to search your photos by their location? As I describe in the tutorial, you can easily write the image's GPS coordinates, as well as the filename, to a database table.
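To make the location search idea a little more concrete: Exif stores GPS latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N, S, E, or W), while a database query by location usually wants signed decimal degrees. Here is a minimal sketch of that conversion in plain Java. It is not the tutorial's code: the class name `GpsFields`, the helper methods, and the column names `filename`, `latitude`, and `longitude` are all hypothetical, and the `Map` simply stands in for whatever record or row type you end up writing to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of SDC or metadata-extractor.
final class GpsFields {

    // Convert degrees/minutes/seconds plus a hemisphere reference ("N", "S", "E", "W")
    // into signed decimal degrees. South and West come out negative.
    static double toDecimalDegrees(double degrees, double minutes, double seconds, String ref) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        boolean negative = "S".equalsIgnoreCase(ref) || "W".equalsIgnoreCase(ref);
        return negative ? -value : value;
    }

    // Build a flat row holding the filename and coordinates.
    // The column names here are assumptions chosen for this sketch.
    static Map<String, Object> toRow(String filename, double latitude, double longitude) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("filename", filename);
        row.put("latitude", latitude);
        row.put("longitude", longitude);
        return row;
    }

    public static void main(String[] args) {
        // Sample values made up for the example: 37° 46' 29.64" N, 122° 25' 9.84" W
        double lat = toDecimalDegrees(37, 46, 29.64, "N");
        double lon = toDecimalDegrees(122, 25, 9.84, "W");
        System.out.println(toRow("photo.jpg", lat, lon));
    }
}
```

With coordinates stored as two numeric columns, a "photos near here" search becomes an ordinary range query on latitude and longitude.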
Do you have a custom processor in mind for SDC? Follow the tutorial to get started, and let us know how it goes in the comments!