#------------------------------------------------------------------------------
# $File: apache,v 1.1 2017/04/11 14:52:15 christos Exp $
# apache: file(1) magic for Apache Big Data formats

# Avro files
0       string          Obj             Apache Avro
>3      byte            x               version %d

# ORC files
# Important information is in file footer, which we can't index to :(
0       string          ORC             Apache ORC

# Parquet files
0       string          PAR1            Apache Parquet

# Hive RC files
0       string          RCF             Apache Hive RC file
>3      byte            x               version %d

# Sequence files (and the careless first version of RC file)

0       string          SEQ
>3      byte            <6              Apache Hadoop Sequence file version %d
>3      byte            >6              Apache Hadoop Sequence file version %d
>3      byte            =6
>>5     string          org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer  Apache Hive RC file version 0
>>3     default         x               Apache Hadoop Sequence file version 6