Apache Avro Serialization Java Example

posted on Nov 20th, 2016

Apache Avro

Apache Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services.
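To give a feel for what "compact binary format" means, here is a minimal sketch, using only the JDK, of how the Avro specification encodes an int (zig-zag, then variable-length bytes) and a string (byte length followed by UTF-8 bytes). The names AvroEncodingSketch, encodeInt and encodeString are made up for this illustration; in a real program the Avro library does this work for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: a hand-rolled version of Avro's primitive binary encoding.
class AvroEncodingSketch {

	// Zig-zag maps small positive and negative ints to small unsigned values.
	// The result is then written 7 bits at a time, low-order group first.
	static byte[] encodeInt(int value) {
		int n = (value << 1) ^ (value >> 31); // zig-zag step
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		while ((n & ~0x7F) != 0) {
			out.write((n & 0x7F) | 0x80); // high bit set: more bytes follow
			n >>>= 7;
		}
		out.write(n); // last group, high bit clear
		return out.toByteArray();
	}

	// A string is its UTF-8 byte length (encoded like an int here) followed by the bytes.
	static byte[] encodeString(String s) {
		byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
		byte[] length = encodeInt(utf8.length);
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		out.write(length, 0, length.length);
		out.write(utf8, 0, utf8.length);
		return out.toByteArray();
	}

	public static void main(String[] args) {
		System.out.println(encodeInt(30000).length + " bytes for the int 30000");
		System.out.println(encodeString("ramu").length + " bytes for the string \"ramu\"");
	}
}
```

With this encoding the int 30000 takes 3 bytes instead of the 4 bytes of a fixed-width int, and the string "ramu" takes 5 bytes (1 length byte plus 4 characters).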

Prerequisites

1) A machine with Ubuntu 14.04 LTS operating system installed.

2) Apache Avro 1.8.1 libraries (Download Here)

Apache Avro Serialization Java Example

This post describes how to read a schema with Avro's parser library and serialize data against it. Here, emp.avsc is the schema file that we pass as input to the Avro parser.

Add these libraries to your Java project build path.

avro-1.8.1.jar
avro-tools-1.8.1.jar
log4j-api-2.0-beta9.jar
log4j-core-2.0-beta9.jar

OR - Edit the $HOME/.bashrc file and add the path of the Avro jar files to CLASSPATH. In my case they are in the /home/hduser/Desktop/AVRO/jars/ folder.

$ gedit $HOME/.bashrc

$HOME/.bashrc file

export CLASSPATH=$CLASSPATH:/home/hduser/Desktop/AVRO/jars/*

Reload your changed $HOME/.bashrc settings

$ source $HOME/.bashrc
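A quick way to check that the CLASSPATH setting took effect is to ask the JVM to load an Avro class by name. The sketch below uses only the JDK; ClasspathCheck and isOnClasspath are hypothetical names chosen for this example.

```java
// Illustration only: checks whether a class can be loaded from the current classpath.
class ClasspathCheck {

	// Returns true if the named class is visible to this JVM, false otherwise.
	static boolean isOnClasspath(String className) {
		try {
			Class.forName(className);
			return true;
		} catch (ClassNotFoundException | LinkageError e) {
			return false;
		}
	}

	public static void main(String[] args) {
		// org.apache.avro.Schema is the class the serialization program imports.
		String avroClass = "org.apache.avro.Schema";
		if (isOnClasspath(avroClass)) {
			System.out.println("Avro found on the classpath");
		} else {
			System.out.println("Avro NOT found, check the CLASSPATH setting");
		}
	}
}
```

Compile and run it from a new terminal with javac ClasspathCheck.java followed by java ClasspathCheck.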

Step 1 - Change the directory to /home/hduser/Desktop/AVRO

$ cd /home/hduser/Desktop/AVRO

Step 2 - Make a new directory schema in /home/hduser/Desktop/AVRO

$ mkdir /home/hduser/Desktop/AVRO/schema

Step 3 - Change the directory to /home/hduser/Desktop/AVRO/schema

$ cd /home/hduser/Desktop/AVRO/schema

Step 4 - Create a new Avro schema file emp.avsc in /home/hduser/Desktop/AVRO/schema. The command below creates emp.avsc if it doesn't exist and opens it for editing.

$ gedit emp.avsc

Step 5 - Add the following lines to the emp.avsc file. Save and close it.

{
"type": "record",
"name": "emp",
"fields": [
{"name": "name", "type": "string"},
{"name": "id", "type": "int"},
{"name": "salary", "type": "int"},
{"name": "age", "type": "int"},
{"name": "address", "type": "string"}
]
}
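To see how Avro interprets this schema, the sketch below parses the same JSON from an inline string and lists each field with its type. It assumes the Avro jars from the prerequisites are on the classpath; InspectSchema, EMP_SCHEMA and parseEmpSchema are names invented for this example.

```java
import org.apache.avro.Schema;

// Illustration only: parses the emp schema from a string and lists its fields.
class InspectSchema {

	// Same content as emp.avsc, inlined so the example does not depend on a file path.
	static final String EMP_SCHEMA = "{\"type\": \"record\", \"name\": \"emp\", \"fields\": ["
			+ "{\"name\": \"name\", \"type\": \"string\"},"
			+ "{\"name\": \"id\", \"type\": \"int\"},"
			+ "{\"name\": \"salary\", \"type\": \"int\"},"
			+ "{\"name\": \"age\", \"type\": \"int\"},"
			+ "{\"name\": \"address\", \"type\": \"string\"}"
			+ "]}";

	static Schema parseEmpSchema() {
		return new Schema.Parser().parse(EMP_SCHEMA);
	}

	public static void main(String[] args) {
		Schema schema = parseEmpSchema();
		System.out.println("record: " + schema.getName());
		// Fields are returned in the order they are declared in the schema.
		for (Schema.Field field : schema.getFields()) {
			System.out.println(field.name() + " : " + field.schema().getType());
		}
	}
}
```

If the JSON is malformed or a type name is misspelled, the parse call throws an exception, which makes this a convenient check before running the serialization program.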

SerializeNew.java

import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;

public class SerializeNew {
	public static void main(String[] args) throws IOException {
		// Parse the Avro schema from the .avsc file.
		Schema schema = new Schema.Parser().parse(new File(
				"/home/hduser/Desktop/AVRO/schema/emp.avsc"));
		// Create the first record and fill in its fields according to the schema.
		GenericRecord e1 = new GenericData.Record(schema);
		e1.put("name", "ramu");
		// Plain decimal literals: a leading zero (001) makes Java read the number as octal.
		e1.put("id", 1);
		e1.put("salary", 30000);
		e1.put("age", 25);
		e1.put("address", "Chennai");
		// Create the second record.
		GenericRecord e2 = new GenericData.Record(schema);
		e2.put("name", "rahman");
		e2.put("id", 2);
		e2.put("salary", 35000);
		e2.put("age", 30);
		e2.put("address", "Delhi");
		// The datum writer turns records into Avro binary data.
		DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(
				schema);
		// The file writer stores the schema and the records in one file.
		// try-with-resources closes the file even if an append fails.
		try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(
				datumWriter)) {
			dataFileWriter.create(schema, new File(
					"/home/hduser/Desktop/AVRO/mydata.txt"));
			dataFileWriter.append(e1);
			dataFileWriter.append(e2);
		}
		System.out.println("data successfully serialized");
	}
}

I created the emp.avsc schema in the /home/hduser/Desktop/AVRO/schema/ folder. Change the emp.avsc file path if you created it in some other folder.

Schema schema = new Schema.Parser().parse(new File("/home/hduser/Desktop/AVRO/schema/emp.avsc"));

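I'm storing the serialized data in mydata.txt in the /home/hduser/Desktop/AVRO/ folder. Change the mydata.txt file path if you want to store it in some other folder. Despite the .txt extension, the output is a binary Avro data file, so it will not display as readable text in an editor.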

dataFileWriter.create(schema, new File("/home/hduser/Desktop/AVRO/mydata.txt"));
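Instead of editing the source each time a path changes, the paths can be taken from the command line. The following is a small sketch of that idea, using only the JDK; PathArgs and pathOrDefault are hypothetical names, and the defaults are the paths used in this post.

```java
import java.io.File;

// Illustration only: picks file paths from command-line arguments with fallbacks.
class PathArgs {
	static final String DEFAULT_SCHEMA = "/home/hduser/Desktop/AVRO/schema/emp.avsc";
	static final String DEFAULT_OUTPUT = "/home/hduser/Desktop/AVRO/mydata.txt";

	// Returns args[index] as a File if that argument was given, otherwise the default path.
	static File pathOrDefault(String[] args, int index, String defaultPath) {
		if (args != null && index < args.length && !args[index].isEmpty()) {
			return new File(args[index]);
		}
		return new File(defaultPath);
	}

	public static void main(String[] args) {
		File schemaFile = pathOrDefault(args, 0, DEFAULT_SCHEMA);
		File outputFile = pathOrDefault(args, 1, DEFAULT_OUTPUT);
		System.out.println("schema: " + schemaFile.getPath());
		System.out.println("output: " + outputFile.getPath());
	}
}
```

In SerializeNew, the two new File(...) expressions could then be replaced by schemaFile and outputFile.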

Step 6 - Compile and execute the SerializeNew.java program.

$ javac SerializeNew.java
$ java SerializeNew
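To check that serialization worked, the file can be read back with Avro's DataFileReader, which picks up the schema stored in the file itself. The sketch below is one possible check and not part of the original program: it writes a single record to a temporary file and reads it back, so it runs without emp.avsc or mydata.txt. The class name RoundTripCheck, its method names, and the reduced two-field schema are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Illustration only: writes one record to an Avro data file and reads it back.
class RoundTripCheck {
	// Reduced version of the emp schema, inlined to keep the example self-contained.
	static final String SCHEMA_JSON = "{\"type\": \"record\", \"name\": \"emp\", \"fields\": ["
			+ "{\"name\": \"name\", \"type\": \"string\"},"
			+ "{\"name\": \"id\", \"type\": \"int\"}]}";

	static void writeSample(File file) throws IOException {
		Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
		GenericRecord record = new GenericData.Record(schema);
		record.put("name", "ramu");
		record.put("id", 1);
		try (DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
				new GenericDatumWriter<GenericRecord>(schema))) {
			writer.create(schema, file);
			writer.append(record);
		}
	}

	// Reads every record in the file; the reader takes the schema from the file itself.
	static List<GenericRecord> readAll(File file) throws IOException {
		List<GenericRecord> records = new ArrayList<GenericRecord>();
		try (DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
				file, new GenericDatumReader<GenericRecord>())) {
			for (GenericRecord record : reader) {
				records.add(record);
			}
		}
		return records;
	}

	public static void main(String[] args) throws IOException {
		File file = File.createTempFile("emp", ".avro");
		file.deleteOnExit();
		writeSample(file);
		for (GenericRecord record : readAll(file)) {
			System.out.println(record);
		}
	}
}
```

To inspect the file produced by SerializeNew instead, pass new File("/home/hduser/Desktop/AVRO/mydata.txt") to readAll.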

