.\" $NetBSD: file.1,v 1.18 2017/02/10 17:53:24 christos Exp $
.\"
.\" $File: file.man,v 1.125 2017/01/03 11:24:46 christos Exp $
.Dd October 19, 2016
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bcdEhiklLNnprsvzZ0
.Op Fl Fl apple
.Op Fl Fl extension
.Op Fl Fl mime-encoding
.Op Fl Fl mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Op Fl P Ar name=value
.Ar
.Ek
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl Fl help
.Sh DESCRIPTION
This manual page documents version 5.30 of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Tn UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Dq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
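.Pp
As an illustration only, these filesystem tests can be sketched in Python
with the standard
.Sq os
and
.Sq stat
modules.
The function name
.Sq filesystem_test
and its result strings are hypothetical and only approximate what
.Nm
prints; the real program is written in C and reports more detail,
such as device numbers.
.Bd -literal -offset indent
```python
import os
import stat


def filesystem_test(path):
    """Classify path from lstat(2) data alone (hypothetical helper).

    Returns a short type string, or None when path is a non-empty
    regular file that should go on to the magic tests.  Symbolic links
    are reported rather than followed.
    """
    try:
        st = os.lstat(path)
    except OSError as err:
        return "cannot open (" + (err.strerror or "error") + ")"
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    # Only a regular file is left here; an empty one needs no further tests.
    if st.st_size == 0:
        return "empty"
    return None
```
.Ed
The special-file checks come before the size check, so only regular
files can be reported as empty; a
.Dv None
result means the file continues to the magic tests.
.Pp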
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Dq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Tn UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Dq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa /usr/share/misc/magic.mgc ,
or the files in the directory
.Pa /usr/share/misc/magic
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
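.Pp
A much simplified Python sketch of such range tests follows.
It handles only ASCII, UTF-8 and ISO-8859-x; the set of accepted control
bytes, the function names and the result strings are assumptions made for
illustration and are not taken from the
.Nm
sources.
.Bd -literal -offset indent
```python
# Control bytes this sketch accepts inside text: BEL, BS, HT, LF, FF, CR
# and ESC.  This set is an assumption, not a copy of the real table.
TEXT_CONTROLS = frozenset({7, 8, 9, 10, 12, 13, 27})


def looks_ascii(data):
    """Printable 7-bit ASCII plus the accepted control bytes."""
    return all(32 <= b < 127 or b in TEXT_CONTROLS for b in data)


def looks_utf8(data):
    """Well-formed UTF-8 with no C0 controls other than the accepted ones."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(ord(ch) >= 32 or ord(ch) in TEXT_CONTROLS for ch in text)


def looks_iso8859(data):
    """ASCII text plus bytes 0xA0-0xFF, the printable ISO-8859-x range."""
    return all(32 <= b < 127 or b >= 160 or b in TEXT_CONTROLS for b in data)


def guess_charset(data):
    # Narrowest set first: every ASCII file is also valid UTF-8 and
    # ISO-8859-x.  UTF-8 goes before ISO-8859-x because many UTF-8
    # byte sequences also fall in the 0xA0-0xFF range.
    if looks_ascii(data):
        return "ASCII text"
    if looks_utf8(data):
        return "UTF-8 Unicode text"
    if looks_iso8859(data):
        return "ISO-8859 text"
    return "data"
```
.Ed
.Pp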
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl Fl apple
Cause
.Nm
to output the file type and creator code as
used by older Mac OS versions.
The code consists of eight letters: the first four describe the file type,
the last four the creator.
.It Fl b , Fl Fl brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , Fl Fl compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , Fl Fl checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl d
Print internal debugging information to stderr.
.It Fl E
On filesystem errors (file not found, etc.), issue an error message and
exit, instead of reporting the error as regular output and continuing,
as POSIX mandates.
.It Fl e , Fl Fl exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It ascii
Various types of text files (this test will try to guess the text
encoding, irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Ignored for backwards compatibility.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details, provided soft magic tests are enabled and the
elf magic is found.
.It soft
Consults magic files.
.It tar
Examines tar files.
.It text
A synonym for
.Sq ascii .
.El
.It Fl Fl extension
Print a slash-separated list of valid extensions for the file type found.
.It Fl F , Fl Fl separator Ar separator
Use the specified string as the separator between the filename and the
result.
Defaults to
.Sq \&: .
.It Fl f , Fl Fl files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
Please note that
.Ar namefile
is read, and the filenames it contains are processed, as soon as this option
is encountered and before any further options are processed.
This allows one to process multiple lists of files with different command line
arguments in the same
.Nm
invocation.
Thus if you want to set the separator, you need to do it before you specify
the list of files, for example
.Dq Fl F Ar @ Fl f Ar namefile ,
instead of
.Dq Fl f Ar namefile Fl F Ar @ .
.It Fl h , Fl Fl no-dereference
Do not follow symlinks
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is not defined.
.It Fl i , Fl Fl mime
Cause
.Nm
to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
.It Fl Fl mime-type , Fl Fl mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , Fl Fl keep-going
Don't stop at the first match; keep going.
Subsequent matches will have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
The magic pattern with the highest strength (see the
.Fl l
option) comes first.
.It Fl l , Fl Fl list
Show a list of patterns and their strengths, sorted in descending order of
.Xr magic 5
strength, which is used for the matching (see also the
.Fl k
option).
.It Fl L , Fl Fl dereference
Follow symlinks, like the option of the same name in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is defined.
.It Fl m , Fl Fl magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , Fl Fl no-pad
Don't pad filenames to make the output line up.
.It Fl n , Fl Fl no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , Fl Fl preserve-date
On systems that support
.Xr utime 3
or
.Xr utimes 2 ,
attempt to preserve the access time of analyzed files, to pretend that
.Nm
never read them.
.It Fl P , Fl Fl parameter Ar name=value
Set various parameter limits.
.Bl -column "elf_phnum" "Default" "XXXXXXXXXXXXXXXXXXXXXXXXXXX" -offset indent
.It Sy "Name" Ta Sy "Default" Ta Sy "Explanation"
.It Li indir Ta 15 Ta recursion limit for indirect magic
.It Li name Ta 30 Ta use count limit for name/use magic
.It Li elf_notes Ta 256 Ta max ELF notes processed
.It Li elf_phnum Ta 128 Ta max ELF program sections processed
.It Li elf_shnum Ta 32768 Ta max ELF sections processed
.It Li regex Ta 8192 Ta length limit for regex searches
.It Li bytes Ta 1048576 Ta max number of bytes to read from file
.El
.It Fl r , Fl Fl raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , Fl Fl special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2 ,
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , Fl Fl version
Print the version of the program and exit.
.It Fl z , Fl Fl uncompress
Try to look inside compressed files.
.It Fl Z , Fl Fl uncompress-noreport
Try to look inside compressed files, but report information about the contents
only, not the compression.
.It Fl 0 , Fl Fl print0
Output a NUL character
.Sq \e0
after the end of the filename.
This makes it easy to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.Pp
If this option is repeated, then
.Nm
prints just the filename followed by a NUL followed by the description
(or ERROR: text) followed by a second NUL for each entry.
.It Fl Fl help
Print a help message and exit.
.El
.Sh FILES
.Bl -tag -width /usr/share/misc/magic.mgc -compact
.It Pa /usr/share/misc/magic.mgc
Default compiled list of magic.
.It Pa /usr/share/misc/magic
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq Pa .mgc
to the value of this variable as appropriate.
However,
.Pa file
has to exist in order for
.Pa file.mime
to be considered.
.Pp
The environment variable
.Ev POSIXLY_CORRECT
controls, on systems that support symbolic links, whether
.Nm
will attempt to follow symlinks.
If it is set,
.Nm
follows symlinks; otherwise it does not.
The
.Fl L
and
.Fl h
options override this default.
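.Pp
The precedence rules for magic files and symbolic links can be summarized
in a short Python sketch.
It assumes that
.Ev MAGIC
holds a single file name, reads
.Dq in preference to the system magic files
as
.Dq searched before the system file ,
and models
.Dq as appropriate
as
.Dq when a compiled file exists alongside ;
all function names are hypothetical and this is not how
.Nm
itself is structured.
.Bd -literal -offset indent
```python
import os

SYSTEM_MGC = "/usr/share/misc/magic.mgc"
SYSTEM_DIR = "/usr/share/misc/magic"


def prefer_compiled(path, exists=os.path.exists):
    """Use path.mgc instead of path when such a compiled file exists."""
    if not path.endswith(".mgc") and exists(path + ".mgc"):
        return path + ".mgc"
    return path


def magic_search_list(env, m_option=None, exists=os.path.exists):
    """Return the magic files this sketch would load, in search order."""
    if m_option is not None:
        # -m: a single item or a colon-separated list.
        names = m_option.split(":")
    elif "MAGIC" in env:
        # MAGIC replaces the defaults; $HOME/.magic is not opened.
        names = [env["MAGIC"]]
    else:
        names = []
        home = env.get("HOME")
        if home:
            for user_file in (home + "/.magic.mgc", home + "/.magic"):
                if exists(user_file):
                    names.append(user_file)  # searched before the system file
                    break
        names.append(SYSTEM_MGC if exists(SYSTEM_MGC) else SYSTEM_DIR)
    return [prefer_compiled(name, exists) for name in names]


def follow_symlinks(env, option=None):
    """option is "L", "h" or None; an explicit option beats the environment."""
    if option == "L":
        return True
    if option == "h":
        return False
    return "POSIXLY_CORRECT" in env
```
.Ed
For example, if only
.Pa $HOME/.magic
and
.Pa /usr/share/misc/magic.mgc
exist and neither
.Fl m
nor
.Ev MAGIC
is given, the sketch returns both, the user file first.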
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The most significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
\*[Gt]10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
\*[Gt]10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
This version differs from Sun's only in minor ways.
It includes the extension of the
.Sq \*[Am]
operator, used as,
for example,
.Bd -literal -offset indent
\*[Gt]16 long\*[Am]0x7fffffff \*[Gt]0 not stripped
.Ed
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /usr/share/misc/magic.orig ) .
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
    dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c: text/x-c
file: application/x-executable
/dev/wd0a: application/x-not-regular-file
/dev/hda: application/x-not-regular-file

.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Tn UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
The
.Sq \*[Am]
operator was contributed by Rob McMahon,
.Aq cudcv@warwick.ac.uk ,
in 1989.
.Pp
Guy Harris,
.Aq guy@netapp.com ,
made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth
.Aq chris@lowth.com ,
2000: handle the
.Fl i
option to output MIME type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com ,
July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007-2011, to improve MIME support, merge MIME and non-MIME magic,
support directories as well as files of magic, apply many bug fixes,
update and fix a lot of magic, improve the build system, improve the
documentation, and rewrite the Python bindings in pure Python.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
COPYING in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh BUGS
Please report bugs and send patches to the bug tracker at
.Pa http://bugs.gw.com/
or the mailing list at
.Aq file@mx.gw.com
(visit
.Pa http://mx.gw.com/mailman/listinfo/file
first to subscribe).
.Sh TODO
Fix output so that tests for MIME and APPLE flags are not needed all
over the place, and actual output is only done in one place.
This needs a design.
Suggestion: push possible outputs on to a list, then pick the
last-pushed (most specific, one hopes) value at the end, or
use a default if the list is empty.
This should not slow down evaluation.
.Pp
The handling of
.Dv MAGIC_CONTINUE
and printing \e012- between entries is clumsy and complicated; refactor
and centralize.
.Pp
Some of the encoding logic is hard-coded in
.Pa encoding.c
and could be moved to the magic files if we had a !:charset annotation.
.Pp
Continue to squash all magic bugs.
The Debian BTS is a good source.
.Pp
Store arbitrarily long strings, for example for %s patterns, so that
they can be printed out.
This would fix Debian bug #271672.
This can be done by allocating strings in a string pool, storing the
string pool at the end of the magic file and converting all the string
pointers to relative offsets from the string pool.
.Pp
Add syntax for relative offsets after current level (Debian bug #466037).
.Pp
Make
.Nm Fl ki
work, i.e. give multiple MIME types.
.Pp
Add a zip library so we can peek inside Office 2007 documents to
print more details about their contents.
.Pp
Add an option to print URLs for the sources of the file descriptions.
.Pp
Combine script searches and add a way to map executable names to MIME
types (e.g. have a magic value for !:mime which causes the resulting
string to be looked up in a table).
This would avoid adding the same magic repeatedly for each new
hash-bang interpreter.
.Pp
When a file descriptor is available, we can skip and adjust the buffer
instead of the hacky buffer management we do now.
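.Pp
The output-list suggestion in the first TODO item could take roughly the
following shape, sketched in Python for brevity (the real program is C).
The class and method names are hypothetical, and falling back to an
earlier candidate when the last one lacks the requested form (for example,
no MIME type) goes beyond the suggestion as written.
.Bd -literal -offset indent
```python
class OutputCollector:
    """Hypothetical single output point for the first TODO item.

    Each test pushes every form of its result that it knows (for
    example text=..., mime=..., apple=...) instead of checking the
    output flags itself; one form is chosen once, at the end.
    """

    def __init__(self, defaults=None):
        # Fallbacks used when no test produced the requested form.
        self.defaults = defaults or {"text": "data",
                                     "mime": "application/octet-stream"}
        self.candidates = []

    def push(self, **forms):
        self.candidates.append(forms)

    def result(self, form="text"):
        # Most recently pushed (most specific, one hopes) candidate wins.
        for forms in reversed(self.candidates):
            if form in forms:
                return forms[form]
        return self.defaults.get(form)
```
.Ed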
.Pp
Fix
.Dq name
and
.Dq use
to check for consistency at compile time (duplicate
.Dq name ,
.Dq use
pointing to undefined
.Dq name ) .
Make
.Dq name Ns / Ns Dq use
more efficient by keeping a sorted list of names.
Special-case
.Sq ^
to flip endianness in the parser so that it does not
have to be escaped, and document it.
.Pp
If the offsets specified internally in the file exceed the buffer size
(the
.Dv HOWMANY
constant in
.Pa file.h ) ,
then we do not seek to that offset; we give up instead.
It would be better if buffer management were done when the file descriptor
is available, so that we could move around the file.
One must be careful, though, because this has performance (and thus security)
considerations.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Pa ftp.astron.com
as
.Pa /pub/file/file-X.YZ.tar.gz ,
where X.YZ is the version number.