.\" $NetBSD: file.1,v 1.9 2012/02/22 17:53:50 christos Exp $
.\"
.\" $File: file.man,v 1.98 2011/12/08 12:12:46 rrt Exp $
.Dd October 17, 2011
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bchiklLNnprsvz0
.Op Fl Fl apple
.Op Fl Fl mime-encoding
.Op Fl Fl mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar
.Ek
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl Fl help
.Sh DESCRIPTION
This manual page documents version 5.11 of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Tn UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Dq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Dq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Tn UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Dq "magic"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa /usr/share/misc/magic.mgc ,
or the files in the directory
.Pa /usr/share/misc/magic
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , Fl Fl brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , Fl Fl compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , Fl Fl checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl e , Fl Fl exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It ascii
Various types of text files (this test will try to guess the text
encoding, irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Ignored for backwards compatibility.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details.
.It soft
Consults magic files.
.It tar
Examines tar files.
.El
.It Fl F , Fl Fl separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , Fl Fl files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
Please note that
.Ar namefile
is unwrapped and the enclosed filenames are processed when this option is
encountered and before any further options processing is done.
This allows one to process multiple lists of files with different command line
arguments on the same
.Nm
invocation.
Thus if you want to set the delimiter, you need to do it before you specify
the list of files, like:
.Dq Fl F Ar @ Fl f Ar namefile ,
instead of:
.Dq Fl f Ar namefile Fl F Ar @ .
.It Fl h , Fl Fl no-dereference
Cause symlinks not to be followed
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is not defined.
.It Fl i , Fl Fl mime
Causes the
.Nm
command to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
.It Fl Fl mime-type , Fl Fl mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , Fl Fl keep-going
Don't stop at the first match; keep going.
Subsequent matches will have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
.It Fl l , Fl Fl list
Print the sorted list of magic patterns, in the order in which they are
used for matching, together with the strength of each pattern.
.It Fl L , Fl Fl dereference
Cause symlinks to be followed, as the like-named option in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is defined.
.It Fl m , Fl Fl magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , Fl Fl no-pad
Don't pad filenames so that they align in the output.
.It Fl n , Fl Fl no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , Fl Fl preserve-date
On systems that support
.Xr utime 3
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , Fl Fl raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , Fl Fl special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , Fl Fl version
Print the version of the program and exit.
.It Fl z , Fl Fl uncompress
Try to look inside compressed files.
.It Fl 0 , Fl Fl print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easy to split the output with
.Xr cut 1 .
This does not affect the separator, which is still printed.
.It Fl Fl help
Print a help message and exit.
.El
.Sh FILES
.Bl -tag -width /usr/share/misc/magic.mgc -compact
.It Pa /usr/share/misc/magic.mgc
Default compiled list of magic.
.It Pa /usr/share/misc/magic
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq Pa .mgc
to the value of this variable as appropriate.
However,
.Pa file
has to exist in order for
.Pa file.mime
to be considered.
The environment variable
.Ev POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks or not.
If set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
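.Pp
The symlink rules described in this section can be summarised in a short
Python sketch.
It is an illustration only, not the code used by
.Nm :
the function name
.Li follows_symlinks
is hypothetical, and the sketch assumes that a variable counts as set
whenever it is present in the environment (even if empty) and that, when both
.Fl L
and
.Fl h
are given, the last one wins.
.Bd -literal -offset indent
```python
# Sketch of the symlink policy described in ENVIRONMENT.
# Illustration only; this is not taken from the file(1) sources.
import os


def follows_symlinks(environ, options=()):
    # Default: follow symlinks only when POSIXLY_CORRECT is present.
    # Assumption: an empty value still counts as "set".
    follow = "POSIXLY_CORRECT" in environ
    # Assumption: options are applied left to right, so the last wins.
    for opt in options:
        if opt == "-L":
            follow = True
        elif opt == "-h":
            follow = False
    return follow


if __name__ == "__main__":
    # Report the default for the current environment, with no options.
    print(follows_symlinks(os.environ))
```
.Ed
.Pp
Under these assumptions, an empty environment with no options yields a
false result (symlinks are not followed), and adding
.Fl L
to the option list turns it into a true one.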
.Sh SEE ALSO
.Xr magic 5 ,
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
\*[Gt]10	string	language impress\ 	(imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
\*[Gt]10	string	language\e impress	(imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0	string		\ebegindata	Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0	string		\e\ebegindata	Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
This version differs from Sun's only in minor ways.
It includes the extension of the
.Sq \*[Am]
operator, used as,
for example,
.Bd -literal -offset indent
\*[Gt]16	long\*[Am]0x7fffffff	\*[Gt]0		not stripped
.Ed
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /usr/share/misc/magic.orig ) .
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:   C program text
file:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
          dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:      text/x-c
file:        application/x-executable
/dev/wd0a:   application/x-not-regular-file
/dev/hda:    application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Tn UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the
.Sq \*[Am]
operator by Rob McMahon,
.Aq cudcv@warwick.ac.uk ,
1989.
.Pp
Guy Harris,
.Aq guy@netapp.com ,
made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth
.Aq chris@lowth.com ,
2000, to handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com ,
July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007-2011, to improve MIME support, merge MIME and non-MIME magic,
support directories as well as files of magic, apply many bug fixes,
update and fix a lot of magic, improve the build system, improve the
documentation, and rewrite the Python bindings in pure Python.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
COPYING in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh BUGS
Please report bugs and send patches to the bug tracker at
.Pa http://bugs.gw.com/
or the mailing list at
.Aq file@mx.gw.com .
.Sh TODO
Fix output so that tests for MIME and APPLE flags are not needed all
over the place, and actual output is only done in one place.
This needs a design.
Suggestion: push possible outputs on to a list, then
pick the last-pushed (most specific, one hopes) value at the end, or
use a default if the list is empty.
This should not slow down evaluation.
.Pp
Continue to squash all magic bugs.
See Debian BTS for a good source.
.Pp
Store arbitrarily long strings, for example for %s patterns, so that
they can be printed out.
Fixes Debian bug #271672.
Would require more complex store/load code in apprentice.
.Pp
Add syntax for relative offsets after current level (Debian bug #466037).
.Pp
Make file -ki work, i.e. give multiple MIME types.
.Pp
Add a zip library so we can peek inside Office2007 documents to
figure out what they are.
.Pp
Add an option to print URLs for the sources of the file descriptions.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Pa ftp.astron.com
in the directory
.Pa /pub/file/file-X.YZ.tar.gz .