Avro Data Format

Pipeline stages can read and write Avro data.

Reading Avro Data

When reading Avro data, file- and object-based origins, such as the Directory and Amazon S3 origins, generate a record for every Avro record within the processed file or object.
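
For example, a minimal sketch of this one-record-per-Avro-record behavior, using the Python fastavro library (the library choice and the users.avro file name are assumptions for illustration):

    import fastavro

    # Each iteration yields one Avro record, mirroring how a file-based
    # origin generates one pipeline record per Avro record in the file.
    with open("users.avro", "rb") as fo:  # hypothetical input file
        for record in fastavro.reader(fo):
            print(record)  # e.g. {'name': 'alice', 'age': 34}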

Processors that read Avro data generate records as described in the processor overview.

Generated records include the Avro schema in the avroSchema record header attribute. They also include precision and scale field attributes for each Decimal field.
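
The sketch below shows where both pieces of information live in Avro itself, again using fastavro (an assumption): the writer's schema, which is what an origin surfaces in the avroSchema header attribute, and the precision and scale that an Avro decimal field declares in its schema.

    import fastavro
    from decimal import Decimal

    # An Avro decimal field declares precision and scale in the schema;
    # these are the values surfaced as field attributes.
    schema = fastavro.parse_schema({
        "type": "record",
        "name": "Invoice",
        "fields": [
            {"name": "total", "type": {
                "type": "bytes", "logicalType": "decimal",
                "precision": 10, "scale": 2,
            }},
        ],
    })

    with open("invoices.avro", "wb") as out:  # hypothetical file
        fastavro.writer(out, schema, [{"total": Decimal("19.99")}])

    with open("invoices.avro", "rb") as fo:
        reader = fastavro.reader(fo)
        # The writer's schema is what would populate the avroSchema
        # record header attribute.
        print(reader.writer_schema)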

You can configure most stages to use Avro schemas stored in one of the following locations:
  • An avroSchema record header attribute
  • A stage configuration property
  • Confluent Schema Registry (see the sketch after this list)
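
Confluent Schema Registry stores schemas behind a REST API. A minimal sketch of fetching the latest schema for a subject, assuming a registry at http://localhost:8081 and a hypothetical subject name users-value:

    import json
    import urllib.request

    # GET /subjects/<subject>/versions/latest returns the newest
    # registered schema for that subject.
    url = "http://localhost:8081/subjects/users-value/versions/latest"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)

    # The "schema" field is the Avro schema as a JSON string.
    avro_schema = json.loads(payload["schema"])
    print(payload["version"], avro_schema)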

Some stages require that the Avro schema be stored in a particular location.

Some stages read data compressed by Avro-supported compression codecs without requiring additional configuration. You can configure some stages to read data compressed by other codecs.
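
This transparency on read comes from the Avro object container format itself: the file header names the codec, so a reader can decompress data blocks without being told which codec was used. A sketch with fastavro (an assumption; Snappy support additionally requires the python-snappy package):

    import fastavro

    # Reading compressed data looks exactly like reading uncompressed
    # data; the codec is recorded in the container file header.
    with open("users_snappy.avro", "rb") as fo:  # hypothetical file
        reader = fastavro.reader(fo)
        print(reader.codec)  # e.g. 'snappy'
        for record in reader:
            print(record)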

For details on how each stage reads Avro data, see "Data Formats" in the stage documentation.

Writing Avro Data

When writing Avro data, destinations and processors write the data based on an Avro schema. The Avro schema can be stored in one of the following locations:
  • An avroSchema record header attribute
  • A stage configuration property
  • Confluent Schema Registry
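
A minimal sketch of schema-driven writing with fastavro (an assumption), supplying the schema inline as the analogue of a stage configuration property:

    import fastavro

    # The schema drives how each record is encoded.
    schema = fastavro.parse_schema({
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "int"},
        ],
    })

    records = [{"name": "alice", "age": 34}, {"name": "bob", "age": 29}]

    with open("users.avro", "wb") as out:  # hypothetical output file
        fastavro.writer(out, schema, records)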

Some stages automatically include the Avro schema in the output; others can be configured to do so. You can compress the output data using an Avro-supported compression codec.
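
For Avro object container files, both behaviors follow from the format itself: the schema is written into the file header, and each data block can be compressed with the codec named there. A sketch, again with fastavro:

    import fastavro

    schema = fastavro.parse_schema({
        "type": "record",
        "name": "User",
        "fields": [{"name": "name", "type": "string"}],
    })

    # codec="deflate" compresses the data blocks; the schema itself is
    # embedded in the file header, which is how the output includes it.
    with open("users_deflate.avro", "wb") as out:  # hypothetical file
        fastavro.writer(out, schema, [{"name": "alice"}], codec="deflate")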

For details on how each stage writes Avro data, see "Data Formats" in the stage documentation.