.\"	$NetBSD: magic.5,v 1.7 2012/02/22 17:53:50 christos Exp $
.\"
.\" $File: magic.man,v 1.71 2011/12/07 11:58:24 rrt Exp $
.Dd April 20, 2011
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file 1
command, version 5.11.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa /usr/share/misc/magic
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtb]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as an
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a Unix date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a Unix date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv bestring16
A two-byte unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv lestring16
A two-byte unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The type specification can be optionally followed by
.Dv /[c][s] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The regular expression is tested against line
.Dv N + 1
onwards, where
.Dv N
is the given offset.
Line endings are assumed to be in the machine's native format.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The modifier flags (if any) must be followed by
.Dv /number ,
the range, that is, the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The offset works as for regex.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and a message that is to be used if there are
no other matches.
.El
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear at least one of
the bits that are set in the specified value,
.Dv ~ ,
to specify that the value specified after it is negated before it is tested,
or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
A 4+4 character Apple creator and type can be specified as:
.Bd -literal -offset indent
!:apple	CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
line that is neither blank nor a comment after the magic line that
identifies the file type, and has the following format:
.Bd -literal -offset indent
!:mime	MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
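.Pp
As a minimal sketch of this hierarchy, assume a hypothetical format
(invented here purely for illustration) whose files begin with the string
.Dq ABCD ,
followed by a version byte at offset 4, a compression flag at offset 5
and a method byte at offset 6:
.Bd -literal -offset indent
# hypothetical format, not a real magic entry
0 string ABCD Hypothetical data file
\*[Gt]4 byte x \eb, version %d
\*[Gt]5 byte 1 \eb, compressed
\*[Gt]\*[Gt]6 byte x (method %d)
.Ed
.Pp
The two level 1 tests are only tried once the level 0 test has matched,
and the level 2 test is only tried when the byte at offset 5 equals 1.
For a file whose bytes at offsets 4, 5 and 6 are 2, 1 and 3, the output
would be expected to read something like
.Dq Hypothetical data file, version 2, compressed (method 3) .
.Pp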
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable
\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em (( x [.[bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
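.Pp
As a small worked sketch of this arithmetic, assume a hypothetical format
(invented for illustration) that starts with the string
.Dq HDR1
and stores, as a little-endian short at offset 4, the position of a
one-byte flags field.
If that short holds 0x20, the offset
.Em (4.s)
resolves to 0x20 and
.Em (4.s+2)
resolves to 0x22:
.Bd -literal -offset indent
# hypothetical format, not a real magic entry
0 string HDR1 Hypothetical container
# read the byte whose position is stored at offset 4
\*[Gt](4.s) byte x \eb, flags 0x%x
# read the byte two positions further on
\*[Gt](4.s+2) byte 1 \eb, extended
.Ed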
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (for
example, when there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
.Ed
.Pp
Finally, if you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4 search/0x140 .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
.Dv leshort ,
.Dv date ,
.Dv bedate ,
.Dv medate ,
.Dv ledate ,
.Dv beldate ,
.Dv leldate ,
and
.Dv meldate
are system-dependent; perhaps they should be specified as a number
of bytes (2B, 4B, etc),
since the files being recognized typically come from
a system on which the lengths are invariant.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.