.\" $NetBSD: magic.5,v 1.15 2017/02/10 17:53:24 christos Exp $
.\"
.\" $File: magic.man,v 1.90 2017/02/08 21:52:03 christos Exp $
.Dd February 8, 2017
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of magic files as
used by the
.Xr file 1
command, version 5.30.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The database of these
.Dq "magic patterns"
is usually located in a binary file in
.Pa /usr/share/misc/magic.mgc
or a directory of source text magic pattern fragment files in
.Pa /usr/share/misc/magic .
The database specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
The format of the source fragment files that are used to build this database
is as follows:
Each line of a fragment file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtbT]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
The
.Dq T
flag causes the string to be trimmed, i.e. leading and trailing whitespace
is deleted before the string is printed.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as the
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qwdate
An eight-byte value interpreted as a Windows-style date.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqwdate
An eight-byte value in big-endian byte order,
interpreted as a Windows-style date.
.It Dv bestring16
A two-byte Unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqwdate
An eight-byte value in little-endian byte order,
interpreted as a Windows-style date.
.It Dv lestring16
A two-byte unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
The offset of the
.Dv indirect
magic is by default absolute in the file, but one can specify
.Dv /r
to indicate that the offset is relative from the beginning of the entry.
.It Dv name
Define a
.Dq named
magic instance that can be called from another
.Dv use
magic entry, like a subroutine call.
Named instance direct magic offsets are relative to the offset of the
previous matched entry, but indirect offsets are relative to the beginning
of the file as usual.
Named magic entries always match.
.It Dv use
Recursively call the named magic starting from the current offset.
If the name of the referenced magic begins with a
.Dv ^
then the endianness of the magic is switched; if the magic mentioned
.Dv leshort
for example,
it is treated as
.Dv beshort
and vice versa.
This is useful to avoid duplicating the rules for different endianness.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The size of the string to search should also be limited by specifying
.Dv /<length> ,
to avoid performance issues scanning long files.
The type specification can also be optionally followed by
.Dv /[c][s][l] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The
.Dq l
modifier changes the limit of length to mean number of lines instead of a
byte count.
Lines are delimited by the platform's native line delimiter.
When a line count is specified, an implicit byte count is also computed assuming
each line is 80 characters long.
If neither a byte nor a line count is specified, the search is limited automatically
to 8KiB.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The search expression must contain the range in the form
.Dv /number,
that is the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The order of modifier and number is not relevant.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and it has no type.
It matches when no other test at that continuation level has matched before.
Clearing the matched tests for a continuation level can be done using the
.Dv clear
test.
.It Dv clear
This test is always true and clears the match flag for that continuation level.
It is intended to be used with the
.Dv default
test.
.El
.Pp
For compatibility with the Single
.Ux
Standard, the type specifiers
.Dv dC
and
.Dv d1
are equivalent to
.Dv byte ,
the type specifiers
.Dv uC
and
.Dv u1
are equivalent to
.Dv ubyte ,
the type specifiers
.Dv dS
and
.Dv d2
are equivalent to
.Dv short ,
the type specifiers
.Dv uS
and
.Dv u2
are equivalent to
.Dv ushort ,
the type specifiers
.Dv dI ,
.Dv dL ,
and
.Dv d4
are equivalent to
.Dv long ,
the type specifiers
.Dv uI ,
.Dv uL ,
and
.Dv u4
are equivalent to
.Dv ulong ,
the type specifier
.Dv d8
is equivalent to
.Dv quad ,
the type specifier
.Dv u8
is equivalent to
.Dv uquad ,
and the type specifier
.Dv s
is equivalent to
.Dv string .
In addition, the type specifier
.Dv dQ
is equivalent to
.Dv quad
and the type specifier
.Dv uQ
is equivalent to
.Dv uquad .
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear any of the bits
that are set in the specified value,
.Dv ~ ,
to specify that the value given after it is negated before being tested, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
Numeric operations are not performed on date types; instead, the numeric
value is interpreted as an offset.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
Dates are treated as numerical values in the respective internal
representation.
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
An Apple 4+4 character creator and type can be specified as:
.Bd -literal -offset indent
!:apple CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
line that is neither blank nor a comment after the magic line that
identifies the file type, and has the following format:
.Bd -literal -offset indent
!:mime MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
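.Pp
As a minimal sketch of how such an annotation line attaches to an entry,
the fragment below follows a single top-level test with a MIME type line.
The PNG signature and the message text are assumptions chosen for
illustration; they are not quoted from the shipped database:
.Bd -literal -offset indent
# eight-byte PNG signature at the start of the file
0 string \ex89PNG\ex0d\ex0a\ex1a\ex0a PNG image data
!:mime image/png
.Ed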
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable
\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em (( x [[.,][bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The value is treated as signed if
.Dq ,
is specified or unsigned if
.Dq .
is specified.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that you
eventually print something, or users may get empty output (such as when
there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
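.Pp
One way to avoid such empty output, sketched here on the assumption that a
generic message is acceptable for unrecognized variants, is to print a
message on the guarding line itself and let the deeper tests append to it
with
.Dq \eb .
The message texts below are illustrative only:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
# the guard prints a message of its own, so the output is never empty
\*[Gt]0x18 leshort \*[Gt]0x3f extended executable
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 \eb, PE (MS-Windows)
\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 \eb, LX (OS/2)
.Ed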
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match)
# inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
.Ed
.Pp
If you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4 search/0x140 .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
.Ed
.Pp
If you have a list of known values at a particular continuation level,
and you want to provide a switch-like default case:
.Bd -literal -offset indent
# clear that continuation level match
\*[Gt]18 clear
\*[Gt]18 lelong 1 one
\*[Gt]18 lelong 2 two
\*[Gt]18 default x
# print default match
\*[Gt]\*[Gt]18 lelong x unmatched 0x%x
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
and
.Dv leshort
do not depend on the length of the C data types
.Dv short
and
.Dv long
on the platform, even though the Single
.Ux
Specification implies that they do.
However, as OS X Mountain Lion has
passed the Single
.Ux
Specification validation suite, and supplies a version of
.Xr file 1
in which they do not depend on the sizes of the C data types and that is
built for a 64-bit environment in which
.Dv long
is 8 bytes rather than 4 bytes, presumably the validation suite does not
test whether, for example,
.Dv long
refers to an item with the same size as the C data type
.Dv long .
There should probably be
.Dv type
names
.Dv int8 ,
.Dv uint8 ,
.Dv int16 ,
.Dv uint16 ,
.Dv int32 ,
.Dv uint32 ,
.Dv int64 ,
and
.Dv uint64 ,
and specified-byte-order variants of them,
to make it clearer that those types have specified widths.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.