<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>File Based Streams</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="io.html" title="Chapter 13.  Input and Output" /><link rel="prev" href="stringstreams.html" title="Memory Based Streams" /><link rel="next" href="io_and_c.html" title="Interacting with C" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">File Based Streams</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a> </td><th width="60%" align="center">Chapter 13. 
  Input and Output

</th><td width="20%" align="right"> <a accesskey="n" href="io_and_c.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.io.filestreams"></a>File Based Streams</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.copying_a_file"></a>Copying a File</h3></div></div></div><p>
  </p><p>So you want to copy a file quickly and easily, and most important,
      completely portably.  And since this is C++, you have an open
      ifstream (call it IN) and an open ofstream (call it OUT):
   </p><pre class="programlisting">
   #include &lt;fstream&gt;

   std::ifstream  IN ("input_file");
   std::ofstream  OUT ("output_file"); </pre><p>Here's the easiest way to get it completely wrong:
   </p><pre class="programlisting">
   OUT &lt;&lt; IN;</pre><p>For those of you who don't already know why this doesn't work
      (probably from having done it before), I invite you to quickly
      create a simple text file called "input_file" containing
      the sentence
   </p><pre class="programlisting">
      The quick brown fox jumped over the lazy dog.</pre><p>surrounded by blank lines.  Code it up and try it.  The contents
      of "output_file" may surprise you.
   </p><p>Seriously, go do it.  Get surprised, then come back.  It's worth it.
   </p><p>The thing to remember is that the <code class="code">basic_[io]stream</code> classes
      handle formatting, nothing else.  In particular, they break their
      input up on whitespace.  The actual reading, writing, and storing of data is
      handled by the <code class="code">basic_streambuf</code> family.  Fortunately, the
      <code class="code">operator&lt;&lt;</code> is overloaded to take an ostream and
      a pointer-to-streambuf, in order to help with just this kind of
      "dump the data verbatim" situation.
   </p><p>Why a <span class="emphasis"><em>pointer</em></span> to streambuf and not just a streambuf?  Well,
      the [io]streams hold pointers (or references, depending on the
      implementation) to their buffers, not the actual
      buffers.  This allows polymorphic behavior on the part of the buffers
      as well as the streams themselves.  The pointer is easily retrieved
      using the <code class="code">rdbuf()</code> member function.  Therefore, the easiest
      way to copy the file is:
   </p><pre class="programlisting">
   OUT &lt;&lt; IN.rdbuf();</pre><p>So what <span class="emphasis"><em>was</em></span> happening with OUT&lt;&lt;IN?  Nothing
      you can rely on, since that particular &lt;&lt; isn't defined by the Standard.
      I have seen instances where it is implemented, but the character
      extraction process removes all the whitespace, leaving you with no
      blank lines and only "Thequickbrownfox...".  With
      libraries that do not define that operator, IN (or one of IN's
      member pointers) sometimes gets converted to a void*, and the output
      file then contains a perfect text representation of a hexadecimal
      address (quite a big surprise).  Others don't compile at all; since
      C++11 made the stream's conversion to bool explicit, that is what a
      conforming library should do.
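   </p><p>If you would rather not take my word for it, here is a rough sketch
      that reproduces the whitespace-eating variant without touching the
      file system.  The helper names <code class="code">copy_formatted</code> and
      <code class="code">copy_verbatim</code> are made up for this illustration; only
      <code class="code">operator&gt;&gt;</code>, <code class="code">operator&lt;&lt;</code> and
      <code class="code">rdbuf()</code> come from the library.
   </p><pre class="programlisting">
   #include &lt;istream&gt;
   #include &lt;sstream&gt;
   #include &lt;string&gt;

   // Hypothetical helper: copies IN one char at a time with the formatted
   // extractor.  With the default skipws flag, operator&gt;&gt; discards any
   // whitespace in front of each character it reads.
   std::string copy_formatted(std::istream&amp; in)
   {
       std::string result;
       char c;
       while (in &gt;&gt; c)
           result += c;
       return result;
   }

   // Hypothetical helper: hands the streambuf pointer to operator&lt;&lt;,
   // which copies every character verbatim.
   std::string copy_verbatim(std::istream&amp; in)
   {
       std::ostringstream out;
       out &lt;&lt; in.rdbuf();
       return out.str();
   }</pre><p>Given the fox sentence surrounded by blank lines, the first helper
      should hand back the words run together with no spaces or newlines,
      while the second returns the text untouched.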
   </p><p>Also note that none of this is specific to o<span class="emphasis"><em>f</em></span>streams.
      The operators shown above are all defined in the parent
      basic_ostream class and are therefore available with all possible
      descendants.
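   </p><p>Pulling the pieces together, a file copy might be sketched as follows.
      The function names are hypothetical, and opening both files with
      <code class="code">ios::binary</code> is an assumption made here so that the bytes
      pass through untranslated.  The second helper uses the very same
      <code class="code">&lt;&lt;</code> on an ostringstream, since the operator lives in
      basic_ostream.
   </p><pre class="programlisting">
   #include &lt;fstream&gt;
   #include &lt;sstream&gt;
   #include &lt;string&gt;

   // Hypothetical helper, not part of the library.  Returns true if the
   // copy appears to have succeeded.
   bool copy_file(const std::string&amp; from, const std::string&amp; to)
   {
       std::ifstream in(from.c_str(), std::ios::in | std::ios::binary);
       std::ofstream out(to.c_str(),
                         std::ios::out | std::ios::binary | std::ios::trunc);
       if (!in || !out)
           return false;

       // operator&lt;&lt; sets failbit on OUT when it inserts no characters, so
       // treat an empty source as a successful (empty) copy up front.
       if (in.peek() == std::ifstream::traits_type::eof())
           return true;

       out &lt;&lt; in.rdbuf();
       out.flush();
       return out.good();
   }

   // Hypothetical helper: the same operator, aimed at a string stream.
   std::string read_whole_file(const std::string&amp; path)
   {
       std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
       std::ostringstream contents;
       contents &lt;&lt; in.rdbuf();
       return contents.str();
   }</pre><p>The check for an empty source is a design choice of this sketch:
      without it, copying a zero-length file would be reported as a failure
      even though the output file is exactly what you asked for.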
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.binary"></a>Binary Input and Output</h3></div></div></div><p>
    </p><p>The first and most important thing to remember about binary I/O is
      that opening a file with <code class="code">ios::binary</code> is not, repeat
      <span class="emphasis"><em>not</em></span>, the only thing you have to do.  It is not a silver
      bullet, and will not allow you to use the <code class="code">&lt;&lt;/&gt;&gt;</code>
      operators of the normal fstreams to do binary I/O.
   </p><p>Sorry.  Them's the breaks.
   </p><p>This section does not try to be a complete tutorial on reading and
      writing binary files (because "binary"
      covers a lot of ground), but it will try to clear
      up a couple of misconceptions and common errors.
   </p><p>First, <code class="code">ios::binary</code> has exactly one defined effect, no more
      and no less.  Normal text mode has to be concerned with the newline
      characters, and the runtime system will translate between (for
      example) '\n' and the appropriate end-of-line sequence (LF on Unix,
      CRLF on DOS, CR on classic Mac OS, etc).  (There are other things that
      normal mode does, but that's the most obvious.)  Opening a file in
      binary mode disables this conversion, so reading a CRLF sequence
      under Windows won't accidentally get mapped to a '\n' character, etc.
      Binary mode is not supposed to suddenly give you a bitstream, and
      if it is doing so in your program then you've discovered a bug in
      your vendor's compiler (or some other part of the C++ implementation,
      possibly the runtime system).
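   </p><p>One way to watch that single effect in action is to write the same
      short text in both modes and compare the resulting file sizes.  The
      following is only a sketch; <code class="code">written_size</code> is a made-up
      helper, and the file name you pass it is up to you.
   </p><pre class="programlisting">
   #include &lt;fstream&gt;
   #include &lt;ios&gt;
   #include &lt;string&gt;

   // Hypothetical helper: writes TEXT to PATH using the given extra open
   // flags, then reports how many bytes ended up in the file.
   std::streamoff written_size(const std::string&amp; path,
                               const std::string&amp; text,
                               std::ios::openmode extra = std::ios::openmode())
   {
       {
           std::ofstream out(path.c_str(),
                             std::ios::out | std::ios::trunc | extra);
           out &lt;&lt; text;
       }   // the file is flushed and closed here

       // Reopen in binary mode, positioned at the end, to measure raw bytes.
       std::ifstream in(path.c_str(),
                        std::ios::in | std::ios::binary | std::ios::ate);
       return static_cast&lt;std::streamoff&gt;(in.tellg());
   }</pre><p>Writing <code class="code">"a\nb"</code> with <code class="code">std::ios::binary</code> should
      give three bytes everywhere.  Without the flag the answer depends on
      the platform's end-of-line convention, which is the whole point.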
   </p><p>Second, using <code class="code">&lt;&lt;</code> to write and <code class="code">&gt;&gt;</code> to
      read isn't going to work with the standard file stream classes, even
      if you turn off whitespace skipping with <code class="code">noskipws</code> during
      reading.  Why not?  Because
      ifstream and ofstream exist for the purpose of <span class="emphasis"><em>formatting</em></span>,
      not reading and writing.  Their job is to convert between data and its
      text representation, and that's exactly what you don't want to happen
      during binary I/O.
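   </p><p>To make the difference concrete, here is a small sketch that pushes
      the same <code class="code">int</code> through the formatted and the unformatted
      route.  Both helper names are invented for the example, and string
      streams stand in for files so that nothing touches the disk.
   </p><pre class="programlisting">
   #include &lt;ios&gt;
   #include &lt;sstream&gt;
   #include &lt;string&gt;

   // Hypothetical helper: formatted output, i.e. the decimal digits.
   std::string formatted_bytes(int value)
   {
       std::ostringstream out(std::ios::out | std::ios::binary);
       out &lt;&lt; value;
       return out.str();
   }

   // Hypothetical helper: unformatted output of the object representation.
   // The order of these bytes depends on the machine.
   std::string raw_bytes(int value)
   {
       std::ostringstream out(std::ios::out | std::ios::binary);
       out.write(reinterpret_cast&lt;const char*&gt;(&amp;value), sizeof value);
       return out.str();
   }</pre><p>For the value 258, the first helper yields the three characters
      <code class="code">"258"</code> even though the stream was opened with
      <code class="code">ios::binary</code>, while the second yields
      <code class="code">sizeof(int)</code> bytes.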
   </p><p>Third, using the <code class="code">get()</code> and <code class="code">put()/write()</code> member
      functions still isn't guaranteed to help you.  These are
      "unformatted" I/O functions, but still character-based.
      (This may or may not be what you want, see below.)
   </p><p>Notice how all the problems here are due to the inappropriate use
      of <span class="emphasis"><em>formatting</em></span> functions and classes to perform something
      which <span class="emphasis"><em>requires</em></span> that formatting not be done?  There are a
      seemingly infinite number of solutions, and a few are listed here:
   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="quote">“<span class="quote">Derive your own fstream-type classes and write your own
	  &lt;&lt;/&gt;&gt; operators to do binary I/O on whatever data
	  types you're using.</span>”</span>
	</p><p>
	  This is a Bad Thing, because while
	  the compiler would probably be just fine with it, other humans
	  are going to be confused.  The overloaded bitshift operators
	  have a well-defined meaning (formatting), and this breaks it.
	</p></li><li class="listitem"><p>
	  <span class="quote">“<span class="quote">Build the file structure in memory, then
	  <code class="code">mmap()</code> the file and copy the
	  structure.
	</span>”</span>
	</p><p>
	  Well, this is easy to make work, and easy to break, and is
	  pretty equivalent to using <code class="code">::read()</code> and
	  <code class="code">::write()</code> directly, and makes no use of the
	  iostream library at all...
	  </p></li><li class="listitem"><p>
	  <span class="quote">“<span class="quote">Use streambufs, that's what they're there for.</span>”</span>
	</p><p>
	  While not trivial for the beginner, this is the best of all
	  solutions.  The streambuf/filebuf layer is the layer that is
	  responsible for actual I/O.  If you want to use the C++
	  library for binary I/O, this is where you start.
	</p></li></ul></div><p>How to go about using streambufs is a bit beyond the scope of this
      document (at least for now), but while streambufs go a long way,
      they still leave a couple of things up to you, the programmer.
      As an example, byte ordering is completely between you and the
      platform, and you have to handle it yourself.
   </p><p>Deriving a streambuf or filebuf
      class from the standard ones, one that is specific to your data
      types (or an abstraction thereof), is probably a good idea, and
      lots of examples exist in journals and on Usenet.  Using the
      standard filebufs directly (either by declaring your own or by
      using the pointer returned from an fstream's <code class="code">rdbuf()</code>)
      is certainly feasible as well.
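   </p><p>As a minimal sketch of the "declare your own filebuf" route, the two
      helpers below move raw bytes with <code class="code">sputn()</code> and
      <code class="code">sgetn()</code> and never construct a stream object.  The helper
      names and the <code class="code">max</code> parameter are assumptions made for
      the example.
   </p><pre class="programlisting">
   #include &lt;cstddef&gt;
   #include &lt;fstream&gt;
   #include &lt;ios&gt;
   #include &lt;string&gt;

   // Hypothetical helper: writes BYTES to PATH through a bare filebuf.
   bool write_bytes(const std::string&amp; path, const std::string&amp; bytes)
   {
       std::filebuf buf;
       if (!buf.open(path.c_str(),
                     std::ios::out | std::ios::binary | std::ios::trunc))
           return false;
       const std::streamsize wanted =
           static_cast&lt;std::streamsize&gt;(bytes.size());
       const std::streamsize written = buf.sputn(bytes.data(), wanted);
       const bool closed = buf.close() != 0;   // close() flushes the buffer
       return closed &amp;&amp; written == wanted;
   }

   // Hypothetical helper: reads at most MAX bytes from PATH.
   std::string read_bytes(const std::string&amp; path, std::size_t max)
   {
       std::filebuf buf;
       if (max == 0 ||
           !buf.open(path.c_str(), std::ios::in | std::ios::binary))
           return std::string();
       std::string bytes(max, '\0');
       const std::streamsize got =
           buf.sgetn(&amp;bytes[0], static_cast&lt;std::streamsize&gt;(max));
       bytes.resize(static_cast&lt;std::size_t&gt;(got));
       return bytes;
   }</pre><p>Because no formatting layer is involved, embedded zero bytes and
      CR/LF pairs should come back exactly as they were written.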
   </p><p>One area that causes problems is trying to do bit-by-bit operations
      with filebufs.  C++ is no different from C in this respect:  I/O
      must be done at the byte level.  If you're trying to read or write
      a few bits at a time, you're going about it the wrong way.  You
      must read/write an integral number of bytes and then process the
      bytes.  (For example, the streambuf functions take and return
      variables of type <code class="code">int_type</code>.)
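   </p><p>If bits are what you are after, the usual pattern is to fetch a whole
      byte and then hand out its bits yourself.  The class below is a
      hypothetical sketch of that idea, not a library facility; it assumes
      8-bit bytes and delivers the most significant bit first.
   </p><pre class="programlisting">
   #include &lt;streambuf&gt;

   // Hypothetical class: doles out single bits from a byte-oriented
   // streambuf.  Assumes 8-bit bytes, most significant bit first.
   class bit_reader
   {
   public:
       explicit bit_reader(std::streambuf&amp; source)
           : source_(source), current_(0), bits_left_(0) { }

       // Returns 0 or 1, or -1 once the underlying buffer is exhausted.
       int next_bit()
       {
           typedef std::streambuf::traits_type traits;
           if (bits_left_ == 0) {
               // The I/O itself happens a whole byte at a time.
               const std::streambuf::int_type c = source_.sbumpc();
               if (traits::eq_int_type(c, traits::eof()))
                   return -1;
               current_ = static_cast&lt;unsigned char&gt;(traits::to_char_type(c));
               bits_left_ = 8;
           }
           --bits_left_;
           return (current_ &gt;&gt; bits_left_) &amp; 1;
       }

   private:
       std::streambuf&amp; source_;
       unsigned char   current_;
       int             bits_left_;
   };</pre><p>Any streambuf will do as the source, for instance the one returned
      by an ifstream's <code class="code">rdbuf()</code>.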
   </p><p>Another area of problems is opening text files in binary mode.
      Generally, binary mode is intended for binary files, and opening
      text files in binary mode means that you now have to deal with all of
      those end-of-line and end-of-file problems that we mentioned before.
   </p><p>
      An instructive thread from comp.lang.c++.moderated delved into
      this topic starting more or less at
      <a class="link" href="https://groups.google.com/forum/#!topic/comp.std.c++/D4e0q9eVSoc" target="_top">this post</a>
      and continuing to the end of the thread. (The subject heading is "binary iostreams" on both comp.std.c++
      and comp.lang.c++.moderated.) Take special note of the replies by James Kanze and Dietmar Kühl.
   </p><p>Briefly, the problems of byte ordering and type sizes mean that
      the unformatted functions like <code class="code">ostream::put()</code> and
      <code class="code">istream::get()</code> cannot safely be used to communicate
      between arbitrary programs, or across a network, or from one
      invocation of a program to another invocation of the same program
      on a different platform, etc.
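   </p><p>One common way around this is to pick a byte order and a width for
      the external format and to convert explicitly.  The sketch below
      assumes a 32-bit little-endian field; the function names are invented
      for the example, and the format is not something the library or this
      document prescribes.
   </p><pre class="programlisting">
   #include &lt;cstdint&gt;
   #include &lt;streambuf&gt;

   // Hypothetical helper: writes VALUE as four bytes, least significant
   // byte first, regardless of the host's own byte order.
   bool put_u32_le(std::streambuf&amp; sink, std::uint32_t value)
   {
       typedef std::streambuf::traits_type traits;
       for (int i = 0; i &lt; 4; ++i) {
           const unsigned char byte =
               static_cast&lt;unsigned char&gt;((value &gt;&gt; (8 * i)) &amp; 0xFFu);
           if (traits::eq_int_type(sink.sputc(static_cast&lt;char&gt;(byte)),
                                   traits::eof()))
               return false;
       }
       return true;
   }

   // Hypothetical helper: reads four bytes written by put_u32_le().
   bool get_u32_le(std::streambuf&amp; source, std::uint32_t&amp; value)
   {
       typedef std::streambuf::traits_type traits;
       std::uint32_t result = 0;
       for (int i = 0; i &lt; 4; ++i) {
           const std::streambuf::int_type c = source.sbumpc();
           if (traits::eq_int_type(c, traits::eof()))
               return false;
           const unsigned char byte =
               static_cast&lt;unsigned char&gt;(traits::to_char_type(c));
           result |= static_cast&lt;std::uint32_t&gt;(byte) &lt;&lt; (8 * i);
       }
       value = result;
       return true;
   }</pre><p>With helpers along these lines, the value 0x01020304 should always
      land in the file as the bytes 04 03 02 01, whichever platform wrote
      it.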
   </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="io.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="io_and_c.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Memory Based Streams </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Interacting with C</td></tr></table></div></body></html>