.\"	$NetBSD: file.1,v 1.18 2017/02/10 17:53:24 christos Exp $
.\"
.\" $File: file.man,v 1.125 2017/01/03 11:24:46 christos Exp $
.Dd October 19, 2016
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bcdEhiklLNnprsvzZ0
.Op Fl Fl apple
.Op Fl Fl extension
.Op Fl Fl mime-encoding
.Op Fl Fl mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Op Fl P Ar name=value
.Ar
.Ek
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl Fl help
.Sh DESCRIPTION
This manual page documents version 5.30 of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Tn UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Dq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks whether the file is empty
or is some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are recognized if they are defined in the system header file
.In sys/stat.h .
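.Pp
The following Python sketch approximates the filesystem tests using os.lstat() and the predicates of the standard stat module.
It is an illustration only, not the code
.Nm
uses: the function name classify_stat and the returned description strings are hypothetical and need not match the exact wording
.Nm
prints.
A result of None stands for a non-empty regular file, which would go on to the magic tests.

```python
import os
import stat


def classify_stat(path, follow_symlinks=False):
    """Hypothetical sketch of the filesystem tests; not file's own code."""
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # A non-empty regular file: fall through to the magic tests.
    return None
```

Passing follow_symlinks=True corresponds roughly to the
.Fl L
option.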
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h ,
and possibly
.In exec.h
in the standard include directory.
These files have a
.Dq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Tn UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Dq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa /usr/share/misc/magic.mgc ,
or from the files in the directory
.Pa /usr/share/misc/magic
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
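.Pp
As a minimal sketch, assuming a tiny hard-coded table in place of a real magic file, a magic test amounts to comparing bytes at a fixed offset.
The table entries below (ELF and PDF signatures at offset 0, the POSIX tar ustar signature at offset 257) are illustrative; real magic files support many more data types, comparison operators, and continuation levels.

```python
# Hypothetical, hard-coded stand-in for magic file entries:
# (offset, expected bytes, description).
MAGIC_TABLE = [
    (0, bytes([0x7F]) + b"ELF", "ELF"),
    (257, b"ustar", "POSIX tar archive"),
    (0, b"%PDF-", "PDF document"),
]


def magic_test(buf):
    """Return the description of the first entry whose bytes match."""
    for offset, expected, description in MAGIC_TABLE:
        if buf[offset:offset + len(expected)] == expected:
            return description
    return None  # no magic matched; the text tests come next
```

Because the table is scanned in order and the first matching entry wins, the order of entries matters.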
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
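.Pp
A much simplified Python sketch of the character-set and line-terminator checks follows.
The set of tolerated control characters (backspace, tab, LF, form feed, CR, and ESC) is an assumption, ISO-8859 is approximated as printable ASCII plus bytes 0xA0 to 0xFF, and extended-ASCII, UTF-16, EBCDIC, and NEL detection are left out.

```python
# Assumed set of "common control characters" tolerated in text:
# backspace (overstriking), tab, LF, form feed, CR, and ESC.
TEXT_CONTROLS = {8, 9, 10, 12, 13, 27}
CR, LF = 13, 10


def is_text_ascii(code):
    """True for printable ASCII or one of the tolerated controls."""
    return 0x20 <= code < 0x7F or code in TEXT_CONTROLS


def guess_charset(buf):
    """Simplified character-set guess; None means not text (data)."""
    if all(is_text_ascii(b) for b in buf):
        return "ASCII"
    try:
        text = buf.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and all(
        is_text_ascii(ord(c)) or ord(c) >= 0xA0 for c in text
    ):
        return "UTF-8 Unicode"
    if all(is_text_ascii(b) or b >= 0xA0 for b in buf):
        return "ISO-8859"
    return None


def line_terminators(buf):
    """Return the set of line terminators seen: LF, CRLF and/or CR."""
    found = set()
    i = 0
    while i < len(buf):
        if buf[i] == CR and i + 1 < len(buf) and buf[i + 1] == LF:
            found.add("CRLF")
            i += 2
            continue
        if buf[i] == CR:
            found.add("CR")
        elif buf[i] == LF:
            found.add("LF")
        i += 1
    return found
```

The checks run from the most restrictive set (ASCII) to the least restrictive, so a pure ASCII file is never reported as UTF-8 or ISO-8859.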
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
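.Pp
The keyword idea can be sketched in Python as follows, matching whole whitespace-separated words.
The keyword table holds only the two examples given above, and the 2048-byte scan window standing in for the first few blocks is an assumption; the real keyword list is in names.h.

```python
# Hypothetical keyword table: only the two examples from the text above.
LANGUAGE_KEYWORDS = [
    (b".br", "troff input"),
    (b"struct", "C program"),
]

# Assumed stand-in for "the first few blocks" of the file.
SCAN_LIMIT = 4 * 512


def guess_language(buf):
    """Return a language guess if a keyword appears as a whole word."""
    words = set(buf[:SCAN_LIMIT].split())
    for keyword, language in LANGUAGE_KEYWORDS:
        if keyword in words:
            return language
    return None
```

Matching whole words rather than substrings keeps a word such as destructive from being mistaken for the C keyword struct.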
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl Fl apple
Causes the file command to output the file type and creator code as
used by older Mac OS versions.
The code consists of eight letters:
the first four describe the file type and the last four the creator.
.It Fl b , Fl Fl brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , Fl Fl compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , Fl Fl checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl d
Print internal debugging information to stderr.
.It Fl E
On filesystem errors (file not found, etc.), issue an error message
and exit, instead of reporting the error as regular output and continuing,
as POSIX mandates.
.It Fl e , Fl Fl exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It ascii
Various types of text files (this test will try to guess the text
encoding, irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Ignored for backwards compatibility.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details, provided soft magic tests are enabled and the
elf magic is found.
.It soft
Consults magic files.
.It tar
Examines tar files.
.It text
A synonym for
.Sq ascii .
.El
.It Fl Fl extension
Print a slash-separated list of valid extensions for the file type found.
.It Fl F , Fl Fl separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , Fl Fl files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
Note that
.Ar namefile
is read and the filenames it contains are processed as soon as this
option is encountered, before any further options are processed.
This allows one to process multiple lists of files with different
command-line arguments in the same
.Nm
invocation.
Thus, to set the separator, specify it before the list of files, as in
.Dq Fl F Ar @ Fl f Ar namefile ,
instead of
.Dq Fl f Ar namefile Fl F Ar @ .
.It Fl h , Fl Fl no-dereference
This option causes symlinks not to be followed
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is not defined.
.It Fl i , Fl Fl mime
Causes the file command to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
.It Fl Fl mime-type , Fl Fl mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , Fl Fl keep-going
Don't stop at the first match; keep going.
Subsequent matches will have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
The magic pattern with the highest strength (see the
.Fl l
option) comes first.
.It Fl l , Fl Fl list
Show a list of patterns and their strengths, sorted in descending order of
.Xr magic 5
strength, which is used for the matching (see also the
.Fl k
option).
.It Fl L , Fl Fl dereference
This option causes symlinks to be followed, as the like-named option in
.Xr ls 1
does (on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is defined.
.It Fl m , Fl Fl magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , Fl Fl no-pad
Don't pad filenames so that they align in the output.
.It Fl n , Fl Fl no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , Fl Fl preserve-date
On systems that support
.Xr utime 3
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl P , Fl Fl parameter Ar name=value
Set various parameter limits.
.Bl -column "elf_phnum" "Default" "XXXXXXXXXXXXXXXXXXXXXXXXXXX" -offset indent
.It Sy "Name" Ta Sy "Default" Ta Sy "Explanation"
.It Li indir Ta 15 Ta recursion limit for indirect magic
.It Li name Ta 30 Ta use count limit for name/use magic
.It Li elf_notes Ta 256 Ta max ELF notes processed
.It Li elf_phnum Ta 128 Ta max ELF program sections processed
.It Li elf_shnum Ta 32768 Ta max ELF sections processed
.It Li regex Ta 8192 Ta length limit for regex searches
.It Li bytes Ta 1048576 Ta max number of bytes to read from file
.El
.It Fl r , Fl Fl raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , Fl Fl special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , Fl Fl version
Print the version of the program and exit.
.It Fl z , Fl Fl uncompress
Try to look inside compressed files.
.It Fl Z , Fl Fl uncompress-noreport
Try to look inside compressed files, but report information about the contents
only, not the compression.
.It Fl 0 , Fl Fl print0
Output a null character
.Sq \e0
after the end of the filename.
This is convenient for processing the output with
.Xr cut 1 .
It does not affect the separator, which is still printed.
.Pp
If this option is repeated,
.Nm
prints just the filename followed by a NUL followed by the description
(or ERROR: text) followed by a second NUL for each entry.
.It Fl Fl help
Print a help message and exit.
.El
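.Pp
When
.Fl 0
is repeated, each entry is the filename, a NUL, the description, and a second NUL, so scripts can split the output safely even when filenames contain the separator or spaces.
The following Python sketch of such a parser is an illustration only: parse_print0_twice is a hypothetical helper, and it assumes the output has already been captured as bytes, for example with subprocess.run(["file", "-00", path], capture_output=True).stdout.

```python
NUL = bytes([0])


def parse_print0_twice(output):
    """Split doubled -0 output into (filename, description) byte pairs."""
    fields = output.split(NUL)
    # Every entry ends in a NUL, so split() leaves one empty trailing field.
    if fields and fields[-1] == b"":
        fields.pop()
    if len(fields) % 2 != 0:
        raise ValueError("truncated output: unpaired field")
    return list(zip(fields[0::2], fields[1::2]))
```

The pairs are kept as bytes because filenames need not be valid in any particular encoding.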
.Sh FILES
.Bl -tag -width /usr/share/misc/magic.mgc -compact
.It Pa /usr/share/misc/magic.mgc
Default compiled list of magic.
.It Pa /usr/share/misc/magic
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq Pa .mgc
to the value of this variable as appropriate.
However,
.Pa file
has to exist in order for
.Pa file.mime
to be considered.
The environment variable
.Ev POSIXLY_CORRECT
controls, on systems that support symbolic links, whether
.Nm
will attempt to follow symlinks.
If it is set,
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
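.Pp
The selection of magic files can be sketched in Python, but this is an assumption-laden illustration rather than the logic libmagic actually uses.
The sketch treats
.Ev MAGIC
as a colon-separated list (as for
.Fl m ) ,
lists
.Pa $HOME/.magic
ahead of the system magic so that it takes precedence, and prefers a compiled .mgc file next to each candidate.
The function names are hypothetical; the env and exists parameters make the sketch testable without touching the real environment.

```python
import os

SYSTEM_MAGIC = "/usr/share/misc/magic"


def magic_search_path(env, exists=os.path.exists):
    """Sketch of picking magic files; not libmagic's actual algorithm."""
    if "MAGIC" in env:
        # MAGIC replaces the defaults, so $HOME/.magic is not consulted.
        # Treating it as colon-separated is an assumption of this sketch.
        candidates = env["MAGIC"].split(":")
    else:
        candidates = []
        if env.get("HOME"):
            # The user's file is listed first so that it takes precedence.
            candidates.append(os.path.join(env["HOME"], ".magic"))
        candidates.append(SYSTEM_MAGIC)
    chosen = []
    for path in candidates:
        compiled = path + ".mgc"
        if exists(compiled):  # a compiled file alongside wins
            chosen.append(compiled)
        elif exists(path):
            chosen.append(path)
    return chosen


def follow_symlinks_by_default(env):
    """Symlinks are followed by default only if POSIXLY_CORRECT is set."""
    return "POSIXLY_CORRECT" in env
```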
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
\*[Gt]10	string	language impress\ 	(imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
\*[Gt]10	string	language\e impress	(imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Bd -literal -offset indent
0	string		\ebegindata	Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0	string		\e\ebegindata	Andrew Toolkit document
.Ed
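.Pp
A simplified Python tokenizer illustrates these two rules: white space ends the offset, type, and test fields, a backslash makes the next character literal, and the rest of the line is the message.
It is a sketch with a hypothetical function name; it ignores the other escape sequences that real magic parsing handles, and it spells the backslash as chr(92).

```python
BACKSLASH = chr(92)


def split_magic_line(line):
    """Split offset, type and test fields, honouring backslash escapes.

    Simplified sketch: an escaped character is kept literally, and the
    rest of the line after the third field is returned as the message.
    """
    fields = []
    i, n = 0, len(line)
    while len(fields) < 3:
        while i < n and line[i].isspace():
            i += 1
        if i >= n:
            break
        chars = []
        while i < n and not line[i].isspace():
            if line[i] == BACKSLASH and i + 1 < n:
                chars.append(line[i + 1])  # escaped: keep next char as-is
                i += 2
            else:
                chars.append(line[i])
                i += 1
        fields.append("".join(chars))
    return fields, line[i:].strip()
```

With an unescaped space, the test field stops at language and impress leaks into the message, which is exactly why the space must be escaped.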
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
This version differs from Sun's only in minor ways.
It includes the extension of the
.Sq \*[Am]
operator, used as,
for example,
.Bd -literal -offset indent
\*[Gt]16	long\*[Am]0x7fffffff	\*[Gt]0		not stripped
.Ed
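.Pp
In that entry, the four-byte long at offset 16 is ANDed with 0x7fffffff before the comparison, and the entry matches, printing not stripped, when the result is greater than zero.
A Python sketch of that single test follows; masked_long_test is a hypothetical name, and reading the long in the host byte order (sys.byteorder) follows the native-order meaning of the long type.

```python
import sys


def masked_long_test(buf, offset=16, mask=0x7FFFFFFF, byteorder=sys.byteorder):
    """Evaluate the entry '>16 long&0x7fffffff >0' against buf.

    The magic long type is read in the host's byte order here.
    """
    raw = buf[offset:offset + 4]
    if len(raw) < 4:
        return False  # offset beyond the data: the test cannot match
    value = int.from_bytes(raw, byteorder) & mask
    return value > 0
```

The mask discards the top bit, so a value of 0x80000000 does not match even though it is non-zero.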
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /usr/share/misc/magic.orig ) .
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:   C program text
file:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
          dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:      text/x-c
file:        application/x-executable
/dev/wd0a:   application/x-not-regular-file
/dev/hda:    application/x-not-regular-file

.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Tn UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one major change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
The
.Sq \*[Am]
operator was contributed by Rob McMahon,
.Aq cudcv@warwick.ac.uk ,
in 1989.
.Pp
Guy Harris,
.Aq guy@netapp.com ,
made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth
.Aq chris@lowth.com ,
2000: handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com ,
July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007-2011, to improve MIME support, merge MIME and non-MIME magic,
support directories as well as files of magic, apply many bug fixes,
update and fix a lot of magic, improve the build system, improve the
documentation, and rewrite the Python bindings in pure Python.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
COPYING in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh BUGS
Please report bugs and send patches to the bug tracker at
.Pa http://bugs.gw.com/
or the mailing list at
.Aq file@mx.gw.com
(visit
.Pa http://mx.gw.com/mailman/listinfo/file
first to subscribe).
.Sh TODO
Fix output so that tests for MIME and APPLE flags are not needed all
over the place, and actual output is only done in one place.
This needs a design.
Suggestion: push possible outputs on to a list, then pick the
last-pushed (most specific, one hopes) value at the end, or
use a default if the list is empty.
This should not slow down evaluation.
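.Pp
The suggested design can be sketched in a few lines of Python.
This only illustrates the proposal: OutputCollector is a hypothetical name, and falling back to data when nothing was pushed is an assumption.

```python
class OutputCollector:
    """Sketch of the proposal: collect candidates, print one at the end."""

    def __init__(self, default="data"):
        self.default = default  # assumed fallback when nothing matched
        self.candidates = []

    def push(self, description):
        self.candidates.append(description)

    def result(self):
        # The last-pushed description is taken to be the most specific one.
        return self.candidates[-1] if self.candidates else self.default
```

Pushing is a constant-time append, which is in line with the requirement that the change should not slow down evaluation.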
.Pp
The handling of
.Dv MAGIC_CONTINUE
and printing \e012- between entries is clumsy and complicated; refactor
and centralize.
.Pp
Some of the encoding logic is hard-coded in encoding.c and could be moved
to the magic files if we had a !:charset annotation.
.Pp
Continue to squash all magic bugs.
See Debian BTS for a good source.
.Pp
Store arbitrarily long strings, for example for %s patterns, so that
they can be printed out.
This would fix Debian bug #271672.
This can be done by allocating strings in a string pool, storing the
string pool at the end of the magic file and converting all the string
pointers to relative offsets from the string pool.
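.Pp
A Python sketch of the string-pool idea follows: strings are packed NUL-terminated into one buffer and referenced by offsets relative to its start, which stay valid wherever the pool is loaded.
The helper names are hypothetical, and an actual implementation in
.Nm
would be C code that writes the pool into the compiled magic file.

```python
NUL = bytes([0])


def build_string_pool(strings):
    """Pack strings NUL-terminated into one pool; return (pool, offsets)."""
    pool = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(pool))  # offset relative to the pool start
        pool += s.encode("utf-8") + NUL
    return bytes(pool), offsets


def pool_lookup(pool, offset):
    """Recover the string stored at a relative offset in the pool."""
    end = pool.index(NUL, offset)
    return pool[offset:end].decode("utf-8")
```

Because no entry stores an absolute pointer, strings of any length can be kept without a fixed-size field per magic entry.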
.Pp
Add syntax for relative offsets after current level (Debian bug #466037).
.Pp
Make file -ki work, i.e., give multiple MIME types.
.Pp
Add a zip library so we can peek inside Office2007 documents to
print more details about their contents.
.Pp
Add an option to print URLs for the sources of the file descriptions.
.Pp
Combine script searches and add a way to map executable names to MIME
types (e.g., have a magic value for !:mime which causes the resulting
string to be looked up in a table).
This would avoid adding the same magic repeatedly for each new
hash-bang interpreter.
.Pp
When a file descriptor is available, we can skip and adjust the buffer
instead of the hacky buffer management we do now.
.Pp
Fix
.Dq name
and
.Dq use
to check for consistency at compile time (duplicate
.Dq name ,
.Dq use
pointing to an undefined
.Dq name ) .
Make
.Dq name Ns / Ns Dq use
more efficient by keeping a sorted list of names.
Special-case ^ to flip endianness in the parser so that it does not
have to be escaped, and document it.
.Pp
If the offsets specified internally in the file exceed the buffer size
.Pf ( Dv HOWMANY
variable in file.h), we give up instead of seeking to that offset.
It would be better if buffer management were done when the file descriptor
is available, so that we can move around the file.
One must be careful, though, because this has performance (and thus security)
considerations.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Pa ftp.astron.com
as
.Pa /pub/file/file-X.YZ.tar.gz .