#------------------------------------------------------------------------------
# $File: msooxml,v 1.18 2022/08/16 11:16:39 christos Exp $
# msooxml:  file(1) magic for Microsoft Office XML
# From: Ralf Brown <ralf.brown@gmail.com>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
#   archive.  The first member file is normally "[Content_Types].xml",
#   but some LibreOffice-generated files put this later. Perhaps skip
#   the "[Content_Types].xml" test?
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
#   file of ePub or OpenDocument, we'll have to scan for a filename
#   which can distinguish between the three types

0	name		msooxml
>0	string		word/		Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
!:ext	docx
>0	string		ppt/		Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
!:ext	pptx
>0	string		xl/		Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
!:ext	xlsx
>0	string		visio/		Microsoft Visio 2013+
!:mime application/vnd.ms-visio.drawing.main+xml
>0	string		AppManifest.xaml	Microsoft Silverlight Application
!:mime application/x-silverlight-app

# start by checking for ZIP local file header signature
0		string		PK\003\004
!:strength +10
# make sure the first file is correct
>0x1E		use		msooxml
>0x1E		default		x
>>0x1E		regex		\\[Content_Types\\]\\.xml|_rels/\\.rels|docProps|customXml
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
>>>(18.l+49)	search/6000	PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>>&26		search/6000	PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have.  Correct the mimetype with the registered ones:
# https://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>>&26	use		msooxml
>>>>>&26	default		x
# OpenOffice/LibreOffice order ZIP entries differently, so check the 4th file
>>>>>>&26	search/6000	PK\003\004
>>>>>>>&26	use		msooxml
# Some OOXML generators add an extra customXml directory. Check another file.
>>>>>>>&26	default		x
>>>>>>>>&26	search/6000	PK\003\004
>>>>>>>>>&26	use		msooxml
>>>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>>&26	default		x		Microsoft OOXML
>>>>>&26	default		x		Microsoft OOXML
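
# Offset notes: an explanatory sketch of the arithmetic used by the rules
# in this file. It assumes the standard ZIP local file header layout and
# that a relative offset (&) resumes at the end of the preceding search
# match. This is an annotation, not an additional test.
#
#   offset  size  field
#   0       4     signature "PK\003\004"
#   18      4     compressed size (little-endian)
#   26      2     file name length
#   28      2     extra field length
#   30      n     file name (30 = 0x1E)
#
# 0x1E is therefore where the first member's name begins.
# (18.l+49) reads the compressed size at offset 18 and adds
# 49 = 30 (fixed header) + 19 (length of "[Content_Types].xml"), which
# lands just past the first member's data when that name is used and no
# extra field is present. In other cases the computed offset is only
# approximate, which is why the rule scans with search/6000 instead of
# testing a fixed position.
# After a search matches the 4-byte signature, &26 skips the remaining
# 26 bytes of the fixed header and reaches that member's file name.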