doc2ps [ file.doc ]
wdoc2txt [ file.doc ]
xls2txt [ file.xls ]
aux/olefs [ -m mtpt ] file.doc
aux/mswordstrings mtpt/WordDocument
aux/msexceltables [ -qaDnt ] [ -d delim ] [ -c column-range ] [ -w worksheet-range ] mtpt/Workbook
Microsoft Office documents are stored in OLE (Object Linking and Embedding) format, a scaled-down version of Microsoft's FAT file system. Olefs presents the contents of an MS Office document as a file system on mtpt, which defaults to /mnt/doc. Mswordstrings and msexceltables can then parse the files inside (WordDocument and Workbook respectively) and extract a text stream. Msexceltables accepts options that control the formatting of its output:
-a Attempt conversion of non-tabular sheets (charts) in the workbook.
-d delim Set the inter-field delimiter to the string delim; the default is a single space.
-D Enable debugging output.
-c range Range is a comma-separated list of column numbers and ranges; a range is two numbers separated by a dash. Limit processing to just the columns named; by default all columns are output.
-n Disable padding of fields to the column width.
-q Disable quoting of textual fields (see quote(2)).
-t Truncate fields to the column width.
-w range Range is a comma-separated list of worksheet numbers and ranges, in the same syntax as the -c option above; only the sheets named are output. Suppressed chart pages are always included in the sheet count.
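The range syntax shared by -c and -w can be illustrated with a small C function. This is a hypothetical sketch, not code taken from msexceltables: the name inrange, its treatment of malformed input, and its convention that a null spec selects everything (mirroring the default of outputting all columns) are assumptions. It accepts items such as 1, 7, or 9-14 separated by commas and reports whether a given column or sheet number is selected.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of msexceltables): report whether
 * number n is selected by a range list such as "1,7,9-14".
 * Items are comma-separated; each is a single number or an
 * inclusive lo-hi pair separated by a dash.
 * Returns 1 if n is selected, 0 otherwise.
 * A NULL spec selects everything; malformed input selects nothing.
 */
int
inrange(const char *spec, int n)
{
	const char *p;
	char *end;
	long lo, hi;

	if (spec == NULL)
		return 1;	/* no range given: select all */
	p = spec;
	while (*p != '\0') {
		lo = strtol(p, &end, 10);
		if (end == p)
			return 0;	/* expected a number */
		hi = lo;	/* a single number is the range lo-lo */
		p = end;
		if (*p == '-') {	/* dash introduces the upper bound */
			p++;
			hi = strtol(p, &end, 10);
			if (end == p)
				return 0;	/* dash without upper bound */
			p = end;
		}
		if (n >= lo && n <= hi)
			return 1;
		if (*p == ',')
			p++;	/* move on to the next item */
		else if (*p != '\0')
			return 0;	/* unexpected character */
	}
	return 0;
}
```

With the worksheet list from the example below, inrange("1,7,9-14", 9) returns 1 and inrange("1,7,9-14", 8) returns 0.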
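Extract columns 3 to 5 of worksheets 1, 7, and 9 to 14 of report.xls into rpt.txt, unquoted, unpadded, and separated by @:
    aux/olefs report.xls
    aux/msexceltables -q -w 1,7,9-14 -c 3-5 -n -d '@' /mnt/doc/Workbook > rpt.txt
    unmount /mnt/doc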
/rc/bin: doc2txt, doc2ps, wdoc2txt, and xls2txt
"Microsoft Word 97 Binary File Format", at Microsoft's developer (MSDN) home page.
"LAOLA Binary Structures", http://user.cs.tu-berlin.de/~schwartz/pmh
"OpenOffice.org's Excel Documentation",