.\"	$NetBSD: magic.5,v 1.7 2012/02/22 17:53:50 christos Exp $
.\"
.\" $File: magic.man,v 1.71 2011/12/07 11:58:24 rrt Exp $
.Dd April 20, 2011
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file 1
command, version 5.11.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa /usr/share/misc/magic
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtb]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as an
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv bestring16
A two-byte Unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv lestring16
A two-byte Unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The type specification can be optionally followed by
.Dv /[c][s] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The regular expression is tested against line
.Dv N + 1
onwards, where
.Dv N
is the given offset.
Line endings are assumed to be in the machine's native format.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The modifier flags (if any) must be followed by
.Dv /number ,
the range, that is, the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The offset works as for regex.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and a message that is to be used if there are
no other matches.
.El
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
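.Pp
As a minimal sketch of the type field, the following entries are
hypothetical (the offsets, values and messages describe no real format)
and only illustrate the syntax described above:
.Bd -literal -offset indent
# four-byte big-endian signature at offset 0
0   belong        0x12345678    hypothetical container
# lower case in the magic matches either case in the file
0   string/c      demo-text     hypothetical demo text
# literal tried at each of the first 256 positions
0   search/256    DEMO-MARK     hypothetical marked file
# only the low four bits of the byte at offset 4 are compared
4   byte\*[Am]0x0f     3             hypothetical format, revision 3
# the ordered comparison treats the value as unsigned
8   ubelong       \*[Gt]0x7fffffff   hypothetical large field
.Ed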
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear at least one of the bits
that are set in the specified value,
.Dv ~ ,
to specify that the value specified after it is negated before being tested, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
The special test
.Em x
always evaluates to true.
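.Pp
The operators can be sketched with a few hypothetical entries; the
offsets, values and messages are assumptions made for illustration:
.Bd -literal -offset indent
# all bits of 0x80 are set in the byte at offset 6
6   byte     \*[Am]0x80   flag bit set
# the 0x80 bit is clear in the byte at offset 6
6   byte     ^0x80   flag bit clear
# the little-endian short at offset 8 is anything but 0
8   leshort  !0      non-zero count
# numbers are in C form, so 010 is octal for 8
8   leshort  \*[Lt]010    count below 8
# any non-empty string at offset 16; the message prints it
16  string   \*[Gt]\e0     title "%s"
.Ed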
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
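.Pp
A minimal sketch of how messages combine, using a hypothetical format
whose signature, offsets and messages are assumptions for illustration
(a leading \*[Gt] marks a continuation test, described later in this page):
.Bd -literal -offset indent
0    string  DEMO  hypothetical demo file
# the printf(3) conversion receives the byte read at offset 4
\*[Gt]4   byte    x     version %d
# the leading \eb suppresses the separating space
\*[Gt]5   byte    1     \eb, compressed
.Ed
.Pp
For a file starting with DEMO whose bytes at offsets 4 and 5 are 2 and 1,
these lines would be expected to produce
.Dq "hypothetical demo file version 2, compressed" .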
.Pp
An Apple 4+4 character creator and type can be specified as:
.Bd -literal -offset indent
!:apple	CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
line that is neither blank nor a comment after the magic line that identifies the
file type, and has the following format:
.Bd -literal -offset indent
!:mime	MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
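.Pp
Put together, an entry with these annotations might look as follows;
the signature, the MIME type
.Dq application/x-demo
and the strength adjustment are hypothetical:
.Bd -literal -offset indent
0   string  DEMO  hypothetical demo file
!:mime  application/x-demo
!:strength + 10
.Ed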
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0      string   MZ
\*[Gt]0x18  leshort  \*[Lt]0x40   MS-DOS executable
\*[Gt]0x18  leshort  \*[Gt]0x3f   extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em \&( x [.[bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0           string  MZ
\*[Gt]0x18       leshort \*[Lt]0x40   MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18       leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)  string  PE\e0\e0  PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l)  string  LX\e0\e0  LX executable (OS/2)
.Ed
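.Pp
The type specifiers and the optional displacement can be sketched with
hypothetical continuation lines (the offsets and strings are assumptions
and describe no real format):
.Bd -literal -offset indent
0         string  DEMO  hypothetical demo file
# the byte at offset 4, plus 8, gives the offset to test
\*[Gt](4.b+8)  string  HDR   \eb, with header
# the big-endian short at offset 6 is the offset to test
\*[Gt](6.S)    string  IDX   \eb, with index
# no specifier: a long is read at offset 8
\*[Gt](8)      string  END   \eb, with trailer
.Ed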
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (for example, when
there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0           string  MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18       leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
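.Pp
As a worked instance of the calculation above, assume a hypothetical
file in which the bytes at offsets 4 and 5 are 0x03 and 0x00:
.Bd -literal -offset indent
value read by 4.s (little-endian short)  = 0x0003 = 3
offset used by (4.s*512)                 = 3 * 512 = 1536 = 0x600
.Ed
.Pp
Both continuation lines would then test the leshort at offset 1536.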
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0           string  MZ
\*[Gt]0x18       leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)  string  PE\e0\e0    PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort 0x14c     for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort 0x184     for DEC Alpha
.Ed
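.Pp
To make the relative offset concrete, assume a hypothetical file in
which the little-endian long at offset 0x3c holds 0x80:
.Bd -literal -offset indent
(0x3c.l)  = 0x80              PE\e0\e0 occupies 0x80 through 0x83
\*[Am]0        = 0x83 + 1 = 0x84   the leshort is tested at 0x84
.Ed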
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0             string  MZ
\*[Gt]0x18         leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)   leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string  LE      LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0                 string  MZ
\*[Gt]0x18             leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)        string  LE\e0\e0  LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string  UPX     \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0                string  MZ
\*[Gt]0x18            leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)       string  LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)  string  UNACE  \eb, ACE self-extracting archive
.Ed
.Pp
Finally, if you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0                 string       MZ
\*[Gt]0x18             leshort      \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)        string       PE\e0\e0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4          search/0x140 .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string       PK\e3\e4 \eb, ZIP self-extracting archive
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
.Dv leshort ,
.Dv date ,
.Dv bedate ,
.Dv medate ,
.Dv ledate ,
.Dv beldate ,
.Dv leldate ,
and
.Dv meldate
are system-dependent; perhaps they should be specified as a number
of bytes (2B, 4B, etc.),
since the files being recognized typically come from
a system on which the lengths are invariant.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.