What is the Avro format?

Avro provides data serialization and data exchange services for Apache Hadoop. It facilitates the transfer of big data between programs and is not tied to any particular programming language. The serialization service lets programs write data to files that are compact and efficient.

Avro stores the data definition (the schema) in JSON format, while the data itself is stored in a binary format, which keeps storage compact and efficient. Avro files include sync markers between blocks of data. These markers make it possible to split large data sets into subsets suitable for Apache MapReduce processing. Avro does not require code generation to interpret the data definition; therefore, it works well with scripting languages.
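As a rough illustration of how a JSON data definition and binary data fit together, the sketch below hand-encodes one record using only the Python standard library. It is not the Avro library API: the User schema, the sample record, and the helper names (encode_long, encode_string, encode_record) are assumptions made for this example, and only the string and long types are handled. It follows the encoding rules in the Avro specification, where a long is written as a zig-zag variable-length integer, a string as its byte length followed by UTF-8 bytes, and a record as its fields in schema order.

```python
import json

# Hypothetical schema for illustration; the data definition is plain JSON.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"}
  ]
}
"""


def encode_long(n: int) -> bytes:
    """Encode a 64-bit integer as a zig-zag, base-128 variable-length value."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small negatives become small positives
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits left: emit 7 bits with continuation bit
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length (a long) followed by the bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate the encoded fields in schema order; no field names are written."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            out += encode_string(value)
        elif field["type"] == "long":
            out += encode_long(value)
        else:
            raise NotImplementedError(f"type not covered by this sketch: {field['type']}")
    return bytes(out)


schema = json.loads(SCHEMA_JSON)
record = {"name": "Ada", "age": 36}
encoded = encode_record(schema, record)

print(encoded.hex())                   # 0641646148
print(len(encoded), "bytes in binary")
print(len(json.dumps(record)), "bytes as JSON text")
```

Because field names live only in the schema, the binary record carries just the values, which is where most of the size saving over JSON text comes from. A reader needs the same schema to interpret the bytes.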

Avro actively supports schema evolution, that is, data schemas that change over time. It handles schema changes such as missing, added, or changed fields. Therefore, old programs can read new data and vice versa.
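The idea behind schema evolution can be sketched in a few lines of plain Python. This is a simplified model, not the Avro library's resolution code: the function name resolve_record, the two User schemas, and the "email" field are hypothetical, and the sketch works on already-decoded dictionaries rather than on binary data. It assumes the common rule that a reader ignores fields it does not know about and fills in a declared default for fields the writer did not supply.

```python
def resolve_record(writer_record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader's schema.

    Fields unknown to the reader are dropped; fields missing from the
    written data are filled from the reader schema's default, if any.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_record:
            resolved[name] = writer_record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# Hypothetical schemas: version 2 adds an "email" field with a default.
schema_v1 = {"type": "record", "name": "User",
             "fields": [{"name": "name", "type": "string"}]}
schema_v2 = {"type": "record", "name": "User",
             "fields": [{"name": "name", "type": "string"},
                        {"name": "email", "type": "string", "default": "unknown"}]}

old_data = {"name": "Ada"}                            # written with schema_v1
new_data = {"name": "Ada", "email": "ada@example.org"}  # written with schema_v2

# Old program reading new data: the extra field is ignored.
old_reader_view = resolve_record(new_data, schema_v1)
# New program reading old data: the missing field takes its default.
new_reader_view = resolve_record(old_data, schema_v2)

print(old_reader_view)  # {'name': 'Ada'}
print(new_reader_view)  # {'name': 'Ada', 'email': 'unknown'}
```

The default on the added field is what keeps both directions working in this sketch; without it, the new reader would have no value to use for old data and the function raises an error.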

Avro includes APIs for Java, Python, Ruby, C, C++, and other languages. Data stored using Avro can be passed between programs written in different programming languages, including from programs written in compiled languages to programs written in scripting languages.