.\" $File: file.man,v 1.66 2007/10/23 19:58:59 christos Exp $
.Dd January 8, 2007
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Op Fl bchikLnNprsvz
.Op Fl -mime-type
.Op Fl -mime-encoding
.Op Fl f Ar namefile
.Op Fl F Ar separator
.Op Fl m Ar magicfiles
.Ar file
.Nm
.Fl C
.Op Fl m Ar magicfile
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.Pa __MAGIC__
or the program itself, make sure to
.Em "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
Note that the file
.Pa __MAGIC__
is built mechanically from a large number of small files in
the subdirectory
.Pa Magdir
in the source distribution of this program.
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.In sys/stat.h .
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or
.Pa __MAGIC__
if the compiled file does not exist.
In addition,
.Nm
will look in
.Pa $HOME/.magic.mgc ,
or
.Pa $HOME/.magic
for magic entries.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be ``data''.
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
Check for
.Dv EMX
application type (only on EMX).
.It ascii
Check for various types of ascii files.
.It compress
Don't look for, or inside, compressed files.
.It elf
Don't print elf details.
.It fortran
Don't look for fortran sequences inside ascii files.
.It soft
Don't consult magic files.
.It tar
Don't examine tar files.
.It token
Don't look for known tokens inside ascii files.
.It troff
Don't look for troff sequences inside ascii files.
.El
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl h , -no-dereference
Causes symlinks not to be followed
(on systems that support symbolic links).
This is the default if the
environment variable
.Dv POSIXLY_CORRECT
is not defined.
.It Fl i , -mime
Causes the file command to output mime type strings rather than the more
traditional human readable ones.
Thus it may say
.Dq text/plain charset=us-ascii
rather than
.Dq ASCII text .
In order for this option to work, file changes the way
it handles files recognized by the command itself (such as many of the
text file types, directories etc), and makes use of an alternative
.Dq magic
file.
(See the
.Sx FILES
section, below.)
.It Fl -mime-type , -mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , -keep-going
Don't stop at the first match, keep going.
.It Fl L , -dereference
Causes symlinks to be followed, as the like-named option in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is defined.
.It Fl m , -magic-file Ar list
Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
If a compiled magic file is found alongside, it will be used instead.
With the
.Fl i
or
.Fl -mime
option, the program adds
.Dq .mime
to each file name.
.It Fl n , -no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easy to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
.Sh FILES
.Bl -tag -width __MAGIC__.mime.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic numbers
.It Pa __MAGIC__
Default list of magic numbers
.It Pa __MAGIC__.mime.mgc
Default compiled list of magic numbers, used to output mime types when
the
.Fl i
option is specified.
.It Pa __MAGIC__.mime
Default list of magic numbers, used to output mime types when the
.Fl i
option is specified.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic number file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq .mime
and/or
.Dq .mgc
to the value of this variable as appropriate.
The environment variable
.Dv POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks.
If set, then
.Nm
follows symlinks, otherwise it does not.
This is also controlled
by the
.Fl L
and
.Fl h
options.
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used as,
for example,
.Bd -literal -offset indent
>16 long&0x7fffffff >0 not stripped
.Ed
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ).
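.Pp
As a rough illustration of why entry order matters, the Python sketch
below models a magic file as an ordered list and reports the first entry
that matches, which is how
.Nm
behaves without
.Fl k .
The entry tuples, the classify function, and the second description are
hypothetical simplifications made up for this sketch; they are not the
real magic format or the program's own code.
.Bd -literal -offset indent
```python
# Simplified, hypothetical model of a magic file: each entry is
# (offset, expected bytes, description). Real magic entries are richer.
GENERAL_FIRST = [
    (0, b"#!", "script text"),
    (0, b"#!/bin/sh", "shell commands text"),
]
SPECIFIC_FIRST = list(reversed(GENERAL_FIRST))


def classify(data, entries):
    """Return the description of the first entry that matches, in list order."""
    for offset, expected, description in entries:
        if data[offset:offset + len(expected)] == expected:
            return description
    return "data"  # nothing matched


# bytes([10]) is a newline; written this way to avoid backslashes.
sample = b"#!/bin/sh" + bytes([10]) + b"echo hello"
print(classify(sample, GENERAL_FIRST))   # the general entry shadows the specific one
print(classify(sample, SPECIFIC_FIRST))  # the specific entry is reached first
```
.Ed
.Pp
Swapping the two entries changes the answer for the same input, which is
why a magic file assembled in a different order can report different types.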
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:    C program text
file:      ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
           dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda:  block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:    text/x-c
file:      application/x-executable
/dev/hda:  application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.Pp
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.Pp
Altered by Chris Lowth, chris@lowth.com, 2000:
Handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
The list of contributors to the "Magdir" directory (source for the
.Pa __MAGIC__
file) is too long to include here.
You know who you are; thank you.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.Pp
The files
.Dv tar.h
and
.Dv is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.\" Compilation support has been done
.\" Better yet, the magic file should be compiled into binary (say,
.\" .Xr ndbm 3
.\" or, better yet, fixed-length
.\" .Dv ASCII
.\" strings for use in heterogenous network environments) for faster startup.
.\" Then the program would run as fast as the Version 7 program of the same
.\" name, with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents of
text
files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.\" Else support has been done
.\" There should be an
.\" .Dv else
.\" clause to follow a series of continuation lines.
.\" .Pp
.\" Regular expression support has been done
.\" The magic file and keywords should have regular expression support.
.Pp
The magic file's use of
.Dv ASCII TAB
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords
for, e.g.,
.Xr troff 1
commands vs man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok
.Dv FORTRAN .
It should be able to figure out
.Dv FORTRAN
by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
.\" Sorting has been done.
.\" Another optimization would be to sort
.\" the magic file so that we can just run down all the
.\" tests for the first byte, first word, first long, etc, once we
.\" have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (e.g.
.Dq "From "
as first 5 chars of file) because
they are not as good as other guesses (e.g.
.Dq Newsgroups:
versus
.Dq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This program is slower than some vendors' file commands.
The new support for multiple character codes makes it even slower.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Dv ftp.astron.com
in the directory
.Dv /pub/file/file-X.YZ.tar.gz