Apache Avro Deserialization Java Example

posted on Nov 20th, 2016

Apache Avro

Apache Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services.
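To make "compact binary format" a little more concrete, here is a small sketch of the variable-length zig-zag coding that the Avro specification describes for int and long values. The class name ZigZagVarint and the method encodeLong are my own names for illustration and are not part of the Avro library; Avro's own encoders do this work for you.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: hypothetical helper, not an Avro API.
class ZigZagVarint {
	// Zig-zag maps small negative and positive values to small unsigned values,
	// then the result is written 7 bits at a time, least significant group first.
	static byte[] encodeLong(long n) {
		long z = (n << 1) ^ (n >> 63);
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		while ((z & ~0x7FL) != 0) {
			out.write((int) ((z & 0x7F) | 0x80)); // continuation bit set: more bytes follow
			z >>>= 7;
		}
		out.write((int) z); // last byte, continuation bit clear
		return out.toByteArray();
	}

	public static void main(String[] args) {
		for (long n : new long[] {0, -1, 1, 64}) {
			StringBuilder hex = new StringBuilder();
			for (byte b : encodeLong(n)) {
				hex.append(String.format("%02x ", b));
			}
			System.out.println(n + " -> " + hex.toString().trim());
		}
	}
}
```

Running it prints 0 -> 00, -1 -> 01, 1 -> 02 and 64 -> 80 01, so small values take a single byte on disk.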

Prerequisites

1) A machine with Ubuntu 14.04 LTS operating system installed.

2) Apache Avro 1.8.1 libraries (Download Here)

3) Serialized data file (How to serialize data using Avro)

Apache Avro Deserialization Java Example

This post shows how to parse an Avro schema with the Schema.Parser class and deserialize the data with Avro's generic reader classes.

Add these libraries to your java project build path.

avro-1.8.1.jar
avro-tools-1.8.1.jar
log4j-api-2.0-beta9.jar
log4j-core-2.0-beta9.jar

Alternatively, add the Avro jar files to the CLASSPATH in your $HOME/.bashrc file. In my case the jars are in the /home/hduser/Desktop/AVRO/jars/ folder. The file belongs to your own user, so sudo is not needed to edit it.

$ gedit $HOME/.bashrc

$HOME/.bashrc file

export CLASSPATH=$CLASSPATH:/home/hduser/Desktop/AVRO/jars/*

Reload your changed $HOME/.bashrc settings

source $HOME/.bashrc
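If you want to confirm that the jars are really visible before going further, a quick check like the one below may help. It is only a sketch: the class name AvroClasspathCheck and the method isOnClasspath are hypothetical names of mine, and it uses nothing but the standard Class.forName call to see whether the Avro classes used in this post can be loaded.

```java
// Illustrative only: hypothetical helper, not part of Avro.
class AvroClasspathCheck {
	// Returns true if the named class can be loaded from the current classpath.
	static boolean isOnClasspath(String className) {
		try {
			// 'false' means: locate the class but do not run its static initializers.
			Class.forName(className, false, AvroClasspathCheck.class.getClassLoader());
			return true;
		} catch (ClassNotFoundException | LinkageError e) {
			return false;
		}
	}

	public static void main(String[] args) {
		String[] required = {
			"org.apache.avro.Schema",
			"org.apache.avro.file.DataFileReader"
		};
		for (String name : required) {
			System.out.println(name + (isOnClasspath(name) ? " : found" : " : MISSING"));
		}
	}
}
```

If either class is reported as MISSING, recheck the jar folder in the CLASSPATH line above before compiling the main program.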

DeserializeNew.java

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class DeserializeNew {
	public static void main(String[] args) throws Exception {
		// Parse the Avro schema file (emp.avsc) with Schema.Parser.
		Schema schema = new Schema.Parser().parse(new File(
				"/home/hduser/Desktop/AVRO/schema/emp.avsc"));
		// The datum reader decodes each record into a GenericRecord using the schema.
		DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(
				schema);
		// try-with-resources closes the data file even if reading fails.
		try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(
				new File("/home/hduser/Desktop/AVRO/mydata.txt"),
				datumReader)) {
			GenericRecord emp = null;
			while (dataFileReader.hasNext()) {
				// Passing emp back in lets Avro reuse the record object between iterations.
				emp = dataFileReader.next(emp);
				System.out.println(emp);
			}
		}
		System.out.println("Data deserialized successfully.");
	}
}

I have created the emp.avsc schema in the /home/hduser/Desktop/AVRO/schema/ folder. Change the emp.avsc file path if you created it in some other folder.

Schema schema = new Schema.Parser().parse(new File("/home/hduser/Desktop/AVRO/schema/emp.avsc"));
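Instead of editing the source each time the location changes, you could take the path from the command line and fall back to the default. The sketch below uses only java.io.File; the class name SchemaPathArg and the method resolve are hypothetical names I made up for this example.

```java
import java.io.File;

// Illustrative only: hypothetical helper for choosing the schema path.
class SchemaPathArg {
	static final String DEFAULT_SCHEMA = "/home/hduser/Desktop/AVRO/schema/emp.avsc";

	// Uses args[index] when it is present, otherwise the fallback path.
	static File resolve(String[] args, int index, String fallback) {
		String path = (args != null && args.length > index) ? args[index] : fallback;
		return new File(path);
	}

	public static void main(String[] args) {
		File schemaFile = resolve(args, 0, DEFAULT_SCHEMA);
		if (!schemaFile.isFile()) {
			// Fail early with a readable message instead of a stack trace from the parser.
			System.err.println("Schema file not found: " + schemaFile.getPath());
			return;
		}
		System.out.println("Using schema: " + schemaFile.getPath());
	}
}
```

The resulting File can then be passed to new Schema.Parser().parse(...) exactly as in the program above.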

I've stored the serialized data in mydata.txt in the /home/hduser/Desktop/AVRO/ folder. Change the mydata.txt file path if you stored it in some other folder.

new DataFileReader<GenericRecord>(new File("/home/hduser/Desktop/AVRO/mydata.txt"), datumReader);
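Despite the .txt extension, the data file is a binary Avro container file, and such files begin with the four bytes 'O', 'b', 'j', 1. If the program fails on your file, a small check like this can tell you whether it is an Avro data file at all. This is a sketch with names of my own (AvroMagicCheck, looksLikeAvroFile); it uses only the JDK and does not validate anything beyond the header.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

// Illustrative only: hypothetical helper, not part of Avro.
class AvroMagicCheck {
	// Avro object container files start with the bytes 'O', 'b', 'j', 1.
	static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', (byte) 1};

	static boolean looksLikeAvroFile(File file) throws IOException {
		// A missing or too-short file cannot carry the header.
		if (file.length() < MAGIC.length) {
			return false;
		}
		byte[] header = new byte[MAGIC.length];
		try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
			in.readFully(header);
		}
		return Arrays.equals(header, MAGIC);
	}

	public static void main(String[] args) throws IOException {
		File data = new File(args.length > 0 ? args[0] : "/home/hduser/Desktop/AVRO/mydata.txt");
		System.out.println(data + (looksLikeAvroFile(data) ? " has" : " does not have")
				+ " the Avro container header");
	}
}
```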

Compile and run the DeserializeNew.java program.

javac DeserializeNew.java
java DeserializeNew  
