Summary -
In this topic, we describe the following -
An event serializer is an interface that allows arbitrary serialization of an event; it is the mechanism by which a Flume event is converted into another format for output. The file_roll sink and the hdfs sink both support the EventSerializer interface.
The available event serializers are listed below.
- Body text serializer
- Text with headers serializer
- Avro event serializer
Let us discuss each serializer in detail.
Body text serializer -
The event body is written to the output stream without any modification or transformation. If headers exist on the event, they are ignored and discarded. This is the default serializer if none is specified.
<agent_name>.sinks = <sink-name>
<agent_name>.sinks.<sink-name>.type = file_roll
<agent_name>.sinks.<sink-name>.channel = <channel-name>
<agent_name>.sinks.<sink-name>.sink.directory = /var/log/flume
<agent_name>.sinks.<sink-name>.sink.serializer = text
<agent_name>.sinks.<sink-name>.sink.serializer.appendNewline = false
Example for agent named agt -
agt.sinks = sn1
agt.sinks.sn1.type = file_roll
agt.sinks.sn1.channel = chn1
agt.sinks.sn1.sink.directory = /var/log/flume
agt.sinks.sn1.sink.serializer = text
agt.sinks.sn1.sink.serializer.appendNewline = false
Configuration options are as follows -
Property Name | Default | Description |
---|---|---|
appendNewline | true | Whether a newline will be appended to each event at write time. |
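The effect of appendNewline can be illustrated with a small Python model. This is only a sketch of the behavior described above, not Flume's Java implementation; the `Event` class and the `serialize_body_text` function are hypothetical names introduced for illustration.

```python
import io
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: a byte body plus headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


def serialize_body_text(event, out, append_newline=True):
    """Write only the event body to `out`; headers are ignored."""
    out.write(event.body)
    if append_newline:
        out.write(b"\n")


events = [Event(b"first", {"host": "web01"}), Event(b"second")]

with_newline = io.BytesIO()
without_newline = io.BytesIO()
for e in events:
    # Models appendNewline = true (the default)
    serialize_body_text(e, with_newline)
    # Models appendNewline = false
    serialize_body_text(e, without_newline, append_newline=False)

print(with_newline.getvalue())     # b'first\nsecond\n'
print(without_newline.getvalue())  # b'firstsecond'
```

With appendNewline set to false, consecutive event bodies run together in the output, so that setting is mainly suitable when each body already ends with its own delimiter.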
Text with headers serializer -
The text with headers serializer outputs both the headers and the body, so the headers are saved along with the event data. The headers are written to the output stream first, then the body, and finally a newline.
<agent_name>.sinks = <sink-name>
<agent_name>.sinks.<sink-name>.type = file_roll
<agent_name>.sinks.<sink-name>.channel = <channel-name>
<agent_name>.sinks.<sink-name>.sink.directory = /var/log/flume
<agent_name>.sinks.<sink-name>.sink.serializer = header_and_text
<agent_name>.sinks.<sink-name>.sink.serializer.appendNewline = false
Example for agent named agt -
agt.sinks = sn1
agt.sinks.sn1.type = file_roll
agt.sinks.sn1.channel = chn1
agt.sinks.sn1.sink.directory = /var/log/flume
agt.sinks.sn1.sink.serializer = header_and_text
agt.sinks.sn1.sink.serializer.appendNewline = false
Configuration options are as follows -
Property Name | Default | Description |
---|---|---|
appendNewline | true | Whether a newline will be appended to each event at write time. |
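The write order described for the text with headers serializer (headers, then body, then newline) can be modelled in a few lines of Python. This is a sketch, not Flume's own code: the function name `serialize_header_and_text` is hypothetical, and both the `{key=value, ...}` rendering of the headers and the single space between headers and body are assumptions made for illustration.

```python
import io


def serialize_header_and_text(headers, body, out, append_newline=True):
    """Write the headers, then the body, then an optional newline.

    Assumptions: headers are rendered as {key=value, ...} and separated
    from the body by one space; the exact text Flume emits may differ.
    """
    rendered = "{" + ", ".join(f"{k}={v}" for k, v in headers.items()) + "}"
    out.write(rendered.encode("utf-8"))
    out.write(b" ")
    out.write(body)
    if append_newline:
        out.write(b"\n")


out = io.BytesIO()
serialize_header_and_text(
    {"host": "web01", "type": "access"}, b"GET /index.html", out
)
print(out.getvalue())  # b'{host=web01, type=access} GET /index.html\n'
```

Unlike the body text serializer, nothing from the event is dropped here, which is useful when a downstream consumer needs header values such as the originating host.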
Avro event serializer -
The Avro event serializer writes Flume events into an Avro container file. It works like the "Flume Event" Avro event serializer (avro_event), which creates an Avro representation of the event, except that here the record schema is configurable.
Avro is a compact, schema-based file format that is not tied to a particular platform or programming language, which gives it an advantage over platform- and language-specific serialization formats.
Example for agent named agt -
agt.sinks.sn1.type = hdfs
agt.sinks.sn1.channel = chn
agt.sinks.sn1.hdfs.path = /flume/events/%y-%m-%d/%H%M%S
agt.sinks.sn1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
agt.sinks.sn1.serializer.compressionCodec = snappy
agt.sinks.sn1.serializer.schemaURL = hdfs://namenode/path/to/schema.avsc
Configuration options are as follows -
Property Name | Default | Description |
---|---|---|
syncIntervalBytes | 2048000 | Approximate Avro sync interval, in bytes. |
compressionCodec | null | Avro compression codec. |
schemaURL | null | Avro schema URL. |
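The schemaURL property points at an Avro schema file, which is a JSON document. As a rough illustration of what such a file might contain, the Python sketch below builds a record schema and prints it as JSON. The record name `LogLine`, its namespace, and its fields are all hypothetical; the schema actually required depends on the data carried by your events.

```python
import json

# Hypothetical record schema for illustration only; the record name,
# namespace, and fields are assumptions, not taken from Flume.
schema = {
    "type": "record",
    "name": "LogLine",
    "namespace": "example.flume",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "host", "type": "string"},
        {"name": "message", "type": "string"},
    ],
}

# An .avsc file is simply this JSON text saved to disk.
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

The printed JSON could be saved as a schema file and placed at a location the agent can read, such as the HDFS path used in the example above.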