.\" $File: file.man,v 1.66 2007/10/23 19:58:59 christos Exp $
.Dd January 8, 2007
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Op Fl bchikLnNprsvz0
.Op Fl -mime-type
.Op Fl -mime-encoding
.Op Fl e Ar testname
.Op Fl f Ar namefile
.Op Fl F Ar separator
.Op Fl m Ar magicfiles
.Ar file
.Nm
.Fl C
.Op Fl m Ar magicfile
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.Pa __MAGIC__
or the program itself, make sure to
.Em "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
Note that the file
.Pa __MAGIC__
is built mechanically from a large number of small files in
the subdirectory
.Pa Magdir
in the source distribution of this program.
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.In sys/stat.h .
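.Pp
As an illustration only, the C fragment below sketches how a file can be
classified from
.Xr stat 2
data alone, without reading its contents.
It is not taken from the
.Nm
sources; it assumes a POSIX system, and the names
.Vt enum fs_kind
and
.Fn classify_fs
are hypothetical.
The
.Fa follow
flag mimics
.Fl L
(use
.Xr stat 2 )
versus
.Fl h
(use
.Xr lstat 2 ) .
.Bd -literal -offset indent
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical result codes for the filesystem tests. */
enum fs_kind {
    FS_ERROR, FS_EMPTY, FS_REGULAR, FS_DIRECTORY, FS_SYMLINK,
    FS_FIFO, FS_SOCKET, FS_CHAR_SPECIAL, FS_BLOCK_SPECIAL, FS_OTHER
};

/* Classify path using only the metadata from stat(2) or lstat(2). */
enum fs_kind classify_fs(const char *path, int follow)
{
    struct stat sb;
    int rc = follow ? stat(path, &sb) : lstat(path, &sb);

    if (rc == -1)
        return FS_ERROR;                 /* missing or inaccessible */
    if (S_ISDIR(sb.st_mode))
        return FS_DIRECTORY;
    if (S_ISLNK(sb.st_mode))
        return FS_SYMLINK;               /* only seen when follow == 0 */
    if (S_ISFIFO(sb.st_mode))
        return FS_FIFO;
    if (S_ISSOCK(sb.st_mode))
        return FS_SOCKET;
    if (S_ISCHR(sb.st_mode))
        return FS_CHAR_SPECIAL;
    if (S_ISBLK(sb.st_mode))
        return FS_BLOCK_SPECIAL;
    if (S_ISREG(sb.st_mode))
        return sb.st_size == 0 ? FS_EMPTY : FS_REGULAR;
    return FS_OTHER;
}
.Ed
Only a
.Dv FS_REGULAR
result would normally go on to the magic number and language tests;
special files are read only when
.Fl s
is given.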
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or from
.Pa __MAGIC__
if the compiled file does not exist.
In addition,
.Nm
will look in
.Pa $HOME/.magic.mgc ,
or
.Pa $HOME/.magic
for magic entries.
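.Pp
To make the idea concrete, the following C sketch hard-codes one such
test: the ELF identification bytes (0x7f followed by the letters
.Sq ELF )
at offset 0.
In
.Nm
this kind of test is expressed as data in the magic file rather than
as code; the function
.Fn has_elf_magic
is a hypothetical name used only for illustration.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/* ELF identification: byte 0x7f followed by the characters E, L, F. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/* Return 1 if buf (the first len bytes of a file) starts with the
 * ELF magic number, else 0.  On a match, *class_out (if not NULL)
 * receives byte 4 (EI_CLASS: 1 for 32-bit, 2 for 64-bit objects),
 * the kind of detail a follow-up magic entry would test. */
int has_elf_magic(const unsigned char *buf, size_t len, int *class_out)
{
    if (len < 5 || memcmp(buf, elf_magic, sizeof elf_magic) != 0)
        return 0;
    if (class_out != NULL)
        *class_out = buf[4];
    return 1;
}
.Ed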
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
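.Pp
The following C sketch is a much simplified, hypothetical version of
two of these checks.
It accepts printable ASCII plus a few common control characters
(tab, backspace as used for overstriking, form feed, escape, CR and LF),
and it counts LF, CR, and CRLF line terminators.
It ignores the other character sets and NEL; the names
.Fn looks_ascii
and
.Fn count_eols
are illustrative, not part of
.Nm .
.Bd -literal -offset indent
#include <stddef.h>

/* Control characters, written numerically. */
enum { BS = 0x08, TAB = 0x09, LF = 0x0a, FF = 0x0c, CR = 0x0d, ESC = 0x1b };

/* Return 1 if every byte is printable ASCII or an allowed control byte. */
int looks_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c >= 0x20 && c <= 0x7e)
            continue;                    /* printable ASCII */
        if (c == BS || c == TAB || c == LF || c == FF || c == CR || c == ESC)
            continue;                    /* common control characters */
        return 0;
    }
    return 1;
}

/* Count line terminators, treating a CR immediately followed by LF
 * as one CRLF rather than as one CR and one LF. */
void count_eols(const unsigned char *buf, size_t len,
                size_t *n_lf, size_t *n_cr, size_t *n_crlf)
{
    size_t i;

    *n_lf = *n_cr = *n_crlf = 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == CR && i + 1 < len && buf[i + 1] == LF) {
            (*n_crlf)++;
            i++;                         /* skip the LF of the pair */
        } else if (buf[i] == CR) {
            (*n_cr)++;
        } else if (buf[i] == LF) {
            (*n_lf)++;
        }
    }
}
.Ed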
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (see
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
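.Pp
A hypothetical C sketch of such a keyword test follows.
It splits the first
.Fa len
bytes of a buffer into whitespace-separated words and returns the type
attached to the first word found in a small table.
The two-entry table only repeats the examples above; the real list in
.In names.h
is much longer, and
.Fn guess_language
is an illustrative name.
.Bd -literal -offset indent
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct keyword {
    const char *word;                    /* token to look for */
    const char *type;                    /* type to report on a match */
};

/* Hypothetical two-entry table; the real list is in names.h. */
static const struct keyword keywords[] = {
    { ".br",    "troff input" },
    { "struct", "C program" },
};

const char *guess_language(const char *buf, size_t len)
{
    size_t i = 0, k;

    while (i < len) {
        size_t start, wlen;

        while (i < len && isspace((unsigned char)buf[i]))
            i++;                         /* skip leading white space */
        start = i;
        while (i < len && !isspace((unsigned char)buf[i]))
            i++;                         /* find the end of the word */
        wlen = i - start;
        for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
            if (strlen(keywords[k].word) == wlen &&
                memcmp(buf + start, keywords[k].word, wlen) == 0)
                return keywords[k].type;
    }
    return NULL;                         /* no keyword found */
}
.Ed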
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be ``data''.
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
Check for
.Dv EMX
application type (only on EMX).
.It ascii
Check for various types of ASCII files.
.It compress
Don't look for, or inside, compressed files.
.It elf
Don't print ELF details.
.It fortran
Don't look for FORTRAN sequences inside ASCII files.
.It soft
Don't consult magic files.
.It tar
Don't examine tar files.
.It token
Don't look for known tokens inside ASCII files.
.It troff
Don't look for troff sequences inside ASCII files.
.El
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl h , -no-dereference
This option causes symlinks not to be followed
(on systems that support symbolic links).
This is the default if the
environment variable
.Dv POSIXLY_CORRECT
is not defined.
.It Fl i , -mime
Causes the
.Nm
command to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Dq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
In order for this option to work,
.Nm
changes the way
it handles files recognized by the command itself (such as many of the
text file types, directories, etc.), and makes use of an alternative
.Dq magic
file.
(See the
.Sx FILES
section below.)
.It Fl -mime-type , -mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , -keep-going
Don't stop at the first match; keep going.
.It Fl L , -dereference
This option causes symlinks to be followed, as the like-named option in
.Xr ls 1
does
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is defined.
.It Fl m , -magic-file Ar list
Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
If a compiled magic file is found alongside, it will be used instead.
With the
.Fl i
or
.Fl -mime
option, the program adds
.Dq .mime
to each file name.
.It Fl n , -no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easier to post-process the output, for example with
.Xr cut 1 .
This does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
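.Pp
The octal translation that
.Fl r
turns off can be sketched in C as follows.
This is an illustration under stated assumptions, not the
.Nm
implementation: it treats only bytes 0x20 through 0x7e as printable
(ignoring the locale), writes each other byte as a backslash followed by
three octal digits, and uses the hypothetical names
.Fn escaped_size
and
.Fn escape_unprintable .
.Bd -literal -offset indent
#include <stdio.h>
#include <string.h>

#define BACKSLASH 0x5c                   /* ASCII backslash */

/* Worst-case buffer size: every byte may grow to 4 bytes, plus a NUL. */
size_t escaped_size(const char *src)
{
    return 4 * strlen(src) + 1;
}

/* Copy src to dst, replacing each unprintable byte with a backslash
 * and three octal digits.  dst must hold escaped_size(src) bytes. */
void escape_unprintable(char *dst, const char *src)
{
    const unsigned char *p;

    for (p = (const unsigned char *)src; *p != 0; p++) {
        if (*p >= 0x20 && *p <= 0x7e) {
            *dst++ = (char)*p;           /* printable: copy unchanged */
        } else {
            *dst++ = (char)BACKSLASH;
            sprintf(dst, "%03o", (unsigned)*p);  /* e.g. 001 for 0x01 */
            dst += 3;
        }
    }
    *dst = 0;
}
.Ed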
.Sh FILES
.Bl -tag -width __MAGIC__.mime.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic numbers.
.It Pa __MAGIC__
Default list of magic numbers.
.It Pa __MAGIC__.mime.mgc
Default compiled list of magic numbers, used to output MIME types when
the
.Fl i
option is specified.
.It Pa __MAGIC__.mime
Default list of magic numbers, used to output MIME types when the
.Fl i
option is specified.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic number file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq .mime
and/or
.Dq .mgc
to the value of this variable as appropriate.
The environment variable
.Dv POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks.
If it is set,
.Nm
follows symlinks; otherwise it does not.
This is also controlled
by the
.Fl L
and
.Fl h
options.
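.Pp
The following C sketch shows one way to derive a candidate file name
from a colon-separated magic file list, such as the argument to
.Fl m ,
by adding the
.Dq .mime
and
.Dq .mgc
suffixes described above.
It is hypothetical: the function
.Fn magic_candidate
is not part of
.Nm ,
and the order in which compiled and source files are actually tried
may differ.
.Bd -literal -offset indent
#include <stdio.h>
#include <string.h>

/* Store in out the idx-th (0-based) entry of a colon-separated list,
 * followed by ".mime" if mime is set and ".mgc" if compiled is set.
 * Returns 0 on success, -1 if there is no such entry or out is too
 * small for the result. */
int magic_candidate(char *out, size_t outlen, const char *list,
                    size_t idx, int mime, int compiled)
{
    const char *start = list, *end;
    int n;

    while (idx > 0) {                    /* skip the entries before idx */
        start = strchr(start, ':');
        if (start == NULL)
            return -1;
        start++;
        idx--;
    }
    end = strchr(start, ':');
    if (end == NULL)
        end = start + strlen(start);     /* last entry runs to the end */
    n = snprintf(out, outlen, "%.*s%s%s", (int)(end - start), start,
                 mime ? ".mime" : "", compiled ? ".mgc" : "");
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
.Ed
For example, entry 1 of the hypothetical list
.Dq /etc/magic:/tmp/mymagic
with both flags set yields
.Pa /tmp/mymagic.mime.mgc .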
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10	string	language impress\ 	(imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10	string	language\e impress	(imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Bd -literal -offset indent
0	string		\ebegindata	Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0	string		\e\ebegindata	Andrew Toolkit document
.Ed
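.Pp
The field splitting described above can be sketched in C as follows.
The sketch is hypothetical and handles only the two cases shown here:
a backslash makes the next byte literal, so an escaped space stays
inside the field and a doubled backslash yields one backslash.
The real magic file parser also understands other escape sequences, and
.Fn read_field
is an illustrative name.
.Bd -literal -offset indent
#include <stddef.h>

#define BACKSLASH 0x5c                   /* ASCII backslash */
#define IS_BLANK(c) ((c) == ' ' || (c) == 0x09)   /* space or tab */

/* Copy the next blank-delimited field of *pp into out (at most
 * outlen - 1 bytes plus a terminating NUL) and advance *pp past it.
 * Returns the number of bytes stored. */
size_t read_field(const char **pp, char *out, size_t outlen)
{
    const char *p = *pp;
    size_t n = 0;

    while (IS_BLANK(*p))
        p++;                             /* skip separating blanks */
    while (*p != 0 && !IS_BLANK(*p)) {
        if (*p == BACKSLASH && p[1] != 0)
            p++;                         /* keep the escaped byte */
        if (n + 1 < outlen)
            out[n++] = *p;
        p++;
    }
    if (outlen > 0)
        out[n] = 0;
    *pp = p;
    return n;
}
.Ed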
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used as,
for example,
.Bd -literal -offset indent
>16	long&0x7fffffff	>0		not stripped
.Ed
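.Pp
As an illustration, the entry above can be read as: take the 4-byte
.Dq long
at offset 16 in the host's byte order, AND it with 0x7fffffff, and
report
.Dq not stripped
if the result is greater than 0.
The hypothetical C function
.Fn not_stripped
below spells out that one test; it is a sketch, not how
.Nm
evaluates magic entries in general.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Evaluate ">16 long&0x7fffffff >0" against the first len bytes of a
 * file.  Returns 1 if the test matches, else 0. */
int not_stripped(const unsigned char *buf, size_t len)
{
    uint32_t v;

    if (len < 16 + sizeof v)
        return 0;                        /* too short to hold the value */
    memcpy(&v, buf + 16, sizeof v);      /* host byte order, any alignment */
    return (v & 0x7fffffffUL) > 0;       /* mask first, then compare */
}
.Ed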
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ).
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:   C program text
file:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
          dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:      text/x-c
file:        application/x-executable
/dev/wd0a:   application/x-not-regular-file
/dev/hda:    application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contributions of the
.Sq &
operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.Pp
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.Pp
Altered by Chris Lowth, chris@lowth.com, 2000:
handle the
.Fl i
option to output MIME type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
The list of contributors to the
.Pa Magdir
directory (source for the
.Pa __MAGIC__
file) is too long to include here.
You know who you are; thank you.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.\" Compilation support has been done
.\" Better yet, the magic file should be compiled into binary (say,
.\" .Xr ndbm 3
.\" or, better yet, fixed-length
.\" .Dv ASCII
.\" strings for use in heterogeneous network environments) for faster startup.
.\" Then the program would run as fast as the Version 7 program of the same
.\" name, with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy,
so it can be misled about the contents of
text
files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.\" Else support has been done
.\" There should be an
.\" .Dv else
.\" clause to follow a series of continuation lines.
.\" .Pp
.\" Regular expression support has been done
.\" The magic file and keywords should have regular expression support.
The magic file's use of
.Dv ASCII TAB
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords,
for example to tell
.Xr troff 1
commands apart from man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok
.Dv FORTRAN .
It should be able to figure out
.Dv FORTRAN
by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
.\" Sorting has been done.
.\" Another optimization would be to sort
.\" the magic file so that we can just run down all the
.\" tests for the first byte, first word, first long, etc, once we
.\" have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (e.g.,
.Dq "From "
as the first 5 characters of a file) because
they are not as good as other guesses (e.g.,
.Dq Newsgroups:
versus
.Dq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This program is slower than some vendors' file commands.
The new support for multiple character codes makes it even slower.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Dv ftp.astron.com
in the file
.Pa /pub/file/file-X.YZ.tar.gz .