#------------------------------------------------------------------------------
# $File: msooxml,v 1.5 2014/08/05 07:38:45 christos Exp $
# msooxml:  file(1) magic for Microsoft Office XML
# From: Ralf Brown <ralf.brown@gmail.com>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
#   archive.  The first member file is normally "[Content_Types].xml",
#   but some LibreOffice-generated files put it later.  Perhaps skip
#   the "[Content_Types].xml" test?
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
#   file of ePub or OpenDocument, we'll have to scan for a filename
#   which can distinguish between the three types.

# start by checking for ZIP local file header signature
0		string		PK\003\004
!:strength +10
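# A sketch of the offset arithmetic used by the tests below, assuming the
# standard ZIP local file header layout (all fields little-endian):
#   offset  0  (4 bytes)  signature "PK\003\004"
#   offset 18  (4 bytes)  compressed size of the member
#   offset 26  (2 bytes)  file name length
#   offset 28  (2 bytes)  extra field length
#   offset 30  (0x1E)     file name, followed by the extra field and data
# "[Content_Types].xml" is 19 bytes long, so with no extra field the second
# header would start at 30 + 19 + compressed size, which is (18.l+49).
# After a search matches the 4-byte signature, &26 lands 4 + 26 = 30 bytes
# into that header, i.e. on its file name.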
# make sure the first file is correct
>0x1E		regex		\\[Content_Types\\]\\.xml|_rels/\\.rels
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
>>(18.l+49)	search/2000	PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>&26		search/1000	PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have.  Correct the mimetype with the registered ones:
# http://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>&26		string		word/		Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
>>>>&26		string		ppt/		Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
>>>>&26		string		xl/		Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>&26		default		x		Microsoft OOXML