<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>File Based Streams</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="io.html" title="Chapter 13.  Input and Output" /><link rel="prev" href="stringstreams.html" title="Memory Based Streams" /><link rel="next" href="io_and_c.html" title="Interacting with C" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">File Based Streams</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a> </td><th width="60%" align="center">Chapter 13. 
  Input and Output

</th><td width="20%" align="right"> <a accesskey="n" href="io_and_c.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.io.filestreams"></a>File Based Streams</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.copying_a_file"></a>Copying a File</h3></div></div></div><p>
  </p><p>So you want to copy a file quickly and easily and, most importantly,
      completely portably.  And since this is C++, you have an open
      ifstream (call it IN) and an open ofstream (call it OUT):
   </p><pre class="programlisting">
   #include &lt;fstream&gt;

   std::ifstream  IN ("input_file");
   std::ofstream  OUT ("output_file"); </pre><p>Here's the easiest way to get it completely wrong:
   </p><pre class="programlisting">
   OUT &lt;&lt; IN;</pre><p>For those of you who don't already know why this doesn't work
      (probably from having done it before), I invite you to quickly
      create a simple text file called "input_file" containing
      the sentence
   </p><pre class="programlisting">
      The quick brown fox jumped over the lazy dog.</pre><p>surrounded by blank lines.  Code it up and try it.  The contents
      of "output_file" may surprise you.
   </p><p>Seriously, go do it.  Get surprised, then come back.  It's worth it.
   </p><p>The thing to remember is that the <code class="code">basic_[io]stream</code> classes
      handle formatting, nothing else.  In particular, formatted extraction
      breaks its input up on whitespace.  The actual reading, writing, and
      storing of data is handled by the <code class="code">basic_streambuf</code> family.
      Fortunately, <code class="code">operator&lt;&lt;</code> is overloaded to take an ostream and
      a pointer-to-streambuf, in order to help with just this kind of
      "dump the data verbatim" situation.
   </p><p>Why a <span class="emphasis"><em>pointer</em></span> to streambuf and not just a streambuf?  Well,
      the [io]streams hold pointers (or references, depending on the
      implementation) to their buffers, not the actual
      buffers.  This allows polymorphic behavior on the part of the buffers
      as well as the streams themselves.  The pointer is easily retrieved
      using the <code class="code">rdbuf()</code> member function.  Therefore, the easiest
      way to copy the file is:
   </p><pre class="programlisting">
   OUT &lt;&lt; IN.rdbuf();</pre><p>So what <span class="emphasis"><em>was</em></span> happening with OUT&lt;&lt;IN?  Nothing
      you can rely on, since that particular &lt;&lt; isn't defined by the Standard.
      I have seen instances where it is implemented, but the character
      extraction process removes all the whitespace, leaving you with no
      blank lines and only "Thequickbrownfox...".  With
      libraries that do not define that operator, IN (or one of IN's
      member pointers) sometimes gets converted to a void*, and the output
      file then contains a perfect text representation of a hexadecimal
      address (quite a big surprise).  Others don't compile at all; since
      C++11 the stream's conversion to bool is explicit, so a C++11 or
      later compiler should reject the statement.
   </p><p>Also note that none of this is specific to o<span class="emphasis"><em>*f*</em></span>streams.
      The operators shown above are all defined in the parent
      basic_ostream class and are therefore available with all possible
      descendants.
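   </p><p>As a minimal sketch of the technique above, the two helpers below wrap
      the <code class="code">rdbuf()</code> idiom in functions.  The names
      <code class="code">copy_file</code> and <code class="code">slurp</code> are made up for this
      illustration, and opening both files with <code class="code">ios::binary</code> is an
      assumption made here so that the copy is byte for byte; the idiom itself
      works in text mode too.
   </p>

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: copy the file `from` to the file `to`.
// Returns false if either file cannot be opened or the transfer fails.
bool copy_file(const std::string& from, const std::string& to)
{
    std::ifstream in(from.c_str(), std::ios::in | std::ios::binary);
    std::ofstream out(to.c_str(), std::ios::out | std::ios::binary);
    if (!in || !out)
        return false;

    // operator<<(streambuf*) sets failbit when it inserts no characters,
    // so an empty input file is treated as a successful (empty) copy here.
    if (in.peek() == std::ifstream::traits_type::eof())
        return true;

    out << in.rdbuf();   // unformatted transfer, whitespace included
    out.flush();
    return !out.fail();
}

// Hypothetical helper: read a whole file into a string.  The same operator
// is used with an ostringstream, since it lives in basic_ostream.
std::string slurp(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}
```

<p>Calling <code class="code">copy_file("input_file", "output_file")</code> and then
      <code class="code">slurp("output_file")</code> should give back the sentence together
      with its surrounding blank lines.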
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.binary"></a>Binary Input and Output</h3></div></div></div><p>
    </p><p>The first and most important thing to remember about binary I/O is
      that opening a file with <code class="code">ios::binary</code> is not, repeat
      <span class="emphasis"><em>not</em></span>, the only thing you have to do.  It is not a silver
      bullet, and will not allow you to use the <code class="code">&lt;&lt;/&gt;&gt;</code>
      operators of the normal fstreams to do binary I/O.
   </p><p>Sorry.  Them's the breaks.
   </p><p>This isn't going to try to be a complete tutorial on reading and
      writing binary files (because "binary"
      covers a lot of ground), but we will try to clear
      up a couple of misconceptions and common errors.
   </p><p>First, <code class="code">ios::binary</code> has exactly one defined effect, no more
      and no less.  Normal text mode has to be concerned with the newline
      characters, and the runtime system will translate between (for
      example) '\n' and the appropriate end-of-line sequence (LF on Unix,
      CRLF on DOS, CR on classic Mac OS, etc).  (There are other things that
      normal mode does, but that's the most obvious.)  Opening a file in
      binary mode disables this conversion, so reading a CRLF sequence
      under Windows won't accidentally get mapped to a '\n' character, etc.
      Binary mode is not supposed to suddenly give you a bitstream, and
      if it is doing so in your program then you've discovered a bug in
      your vendor's compiler (or some other part of the C++ implementation,
      possibly the runtime system).
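   </p><p>To make that one effect concrete, here is a small sketch that writes a
      string containing CR, LF and NUL characters and reads it back, both in
      binary mode.  The function name <code class="code">binary_round_trip</code> is
      invented for this example.  On systems whose text mode performs no
      translation (Unix-like systems, for instance) the flag makes no visible
      difference; the point is only that with it, the bytes should come back
      unchanged everywhere.
   </p>

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write `data` to `path` and read it back, both times
// in binary mode, so that no end-of-line translation is applied.
std::string binary_round_trip(const std::string& path, const std::string& data)
{
    {
        std::ofstream out(path.c_str(),
                          std::ios::out | std::ios::binary | std::ios::trunc);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    }   // leaving the scope closes (and flushes) the output file

    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    // istreambuf_iterator reads straight from the stream buffer, unformatted.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

<p>Note that the data still travels as plain characters; the flag changes
      nothing about how values are represented.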
   </p><p>Second, using <code class="code">&lt;&lt;</code> to write and <code class="code">&gt;&gt;</code> to
      read isn't going to work with the standard file stream classes, even
      if you use <code class="code">noskipws</code> during reading.  Why not?  Because
      ifstream and ofstream exist for the purpose of <span class="emphasis"><em>formatting</em></span>,
      not reading and writing.  Their job is to interpret the data as
      text characters, and that's exactly what you don't want to happen
      during binary I/O.
   </p><p>Third, using the <code class="code">get()</code> and <code class="code">put()/write()</code> member
      functions still isn't guaranteed to help you.  These are
      "unformatted" I/O functions, but they are still character-based.
      (This may or may not be what you want; see below.)
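   </p><p>The difference between the second and third points can be seen in a
      short sketch.  An <code class="code">ostringstream</code> stands in for a file stream
      here, which is an assumption made to keep the example free of files; the
      function names are invented for illustration.
   </p>

```cpp
#include <sstream>
#include <string>

// Formatted output: operator<< converts the value to its decimal text form.
std::string formatted_bytes(int value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// Unformatted output: write() copies the object representation as it sits
// in memory.  Its length and byte order depend on the platform.
std::string raw_bytes(int value)
{
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return out.str();
}
```

<p>For the value 258, <code class="code">formatted_bytes</code> yields the three
      characters "258", while <code class="code">raw_bytes</code> yields
      <code class="code">sizeof(int)</code> characters whose order is whatever the host
      uses.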
   </p><p>Notice how all the problems here are due to the inappropriate use
      of <span class="emphasis"><em>formatting</em></span> functions and classes to perform something
      which <span class="emphasis"><em>requires</em></span> that formatting not be done?  There are a
      seemingly infinite number of solutions, and a few are listed here:
   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="quote">“<span class="quote">Derive your own fstream-type classes and write your own
	  &lt;&lt;/&gt;&gt; operators to do binary I/O on whatever data
	  types you're using.</span>”</span>
	</p><p>
	  This is a Bad Thing, because while
	  the compiler would probably be just fine with it, other humans
	  are going to be confused.  The overloaded bitshift operators
	  have a well-defined meaning (formatting), and this breaks it.
	</p></li><li class="listitem"><p>
	  <span class="quote">“<span class="quote">Build the file structure in memory, then
	  <code class="code">mmap()</code> the file and copy the
	  structure.
	</span>”</span>
	</p><p>
	  Well, this is easy to make work and easy to break.  It is
	  pretty much equivalent to using <code class="code">::read()</code> and
	  <code class="code">::write()</code> directly, and makes no use of the
	  iostream library at all...
	  </p></li><li class="listitem"><p>
	  <span class="quote">“<span class="quote">Use streambufs, that's what they're there for.</span>”</span>
	</p><p>
	  While not trivial for the beginner, this is the best of all
	  solutions.  The streambuf/filebuf layer is the layer that is
	  responsible for actual I/O.  If you want to use the C++
	  library for binary I/O, this is where you start.
	</p></li></ul></div><p>How to go about using streambufs is a bit beyond the scope of this
      document (at least for now), but while streambufs go a long way,
      they still leave a couple of things up to you, the programmer.
      As an example, byte ordering is completely between you and the
      operating system, and you have to handle it yourself.
   </p><p>Deriving a streambuf or filebuf
      class from the standard ones, one that is specific to your data
      types (or an abstraction thereof), is probably a good idea, and
      lots of examples exist in journals and on Usenet.  Using the
      standard filebufs directly (either by declaring your own or by
      using the pointer returned from an fstream's <code class="code">rdbuf()</code>)
      is certainly feasible as well.
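   </p><p>As a rough sketch of the second option, the functions below declare
      their own <code class="code">std::filebuf</code> and move characters with
      <code class="code">sputn()</code> and <code class="code">sgetn()</code>, with no stream object
      involved.  The function names and the 256-character chunk size are
      arbitrary choices made for this example.
   </p>

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write `data` to `path` through a filebuf.
bool write_with_filebuf(const std::string& path, const std::string& data)
{
    std::filebuf buf;
    if (!buf.open(path.c_str(),
                  std::ios::out | std::ios::binary | std::ios::trunc))
        return false;

    const std::streamsize wanted = static_cast<std::streamsize>(data.size());
    const std::streamsize written = buf.sputn(data.data(), wanted);

    // close() flushes the buffer and returns a null pointer on failure.
    return buf.close() != 0 && written == wanted;
}

// Hypothetical helper: read the whole of `path` through a filebuf.
// An unreadable file simply yields an empty string in this sketch.
std::string read_with_filebuf(const std::string& path)
{
    std::filebuf buf;
    std::string result;
    if (!buf.open(path.c_str(), std::ios::in | std::ios::binary))
        return result;

    char chunk[256];
    std::streamsize n;
    while ((n = buf.sgetn(chunk, sizeof chunk)) > 0)
        result.append(chunk, static_cast<std::string::size_type>(n));
    return result;   // the filebuf destructor closes the file
}
```

<p>Everything passing through these functions is still a sequence of
      characters; deciding what those characters mean remains your job.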
   </p><p>One area that causes problems is trying to do bit-by-bit operations
      with filebufs.  C++ is no different from C in this respect:  I/O
      must be done at the byte level.  If you're trying to read or write
      a few bits at a time, you're going about it the wrong way.  You
      must read/write an integral number of bytes and then process the
      bytes.  (For example, the streambuf functions take and return
      variables of type <code class="code">int_type</code>.)
   </p><p>Another area of problems is opening text files in binary mode.
      Generally, binary mode is intended for binary files, and opening
      text files in binary mode means that you now have to deal with all of
      those end-of-line and end-of-file problems that we mentioned before.
   </p><p>
      An instructive thread from comp.lang.c++.moderated delved into
      this topic starting more or less at
      <a class="link" href="https://groups.google.com/forum/#!topic/comp.std.c++/D4e0q9eVSoc" target="_top">this post</a>
      and continuing to the end of the thread. (The subject heading is "binary iostreams" on both comp.std.c++
      and comp.lang.c++.moderated.) Take special note of the replies by James Kanze and Dietmar Kühl.
   </p><p>Briefly, the problems of byte ordering and type sizes mean that
      the unformatted functions like <code class="code">ostream::put()</code> and
      <code class="code">istream::get()</code> cannot safely be used to communicate
      between arbitrary programs, or across a network, or from one
      invocation of a program to another invocation of the same program
      on a different platform, etc.
   </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="io.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="io_and_c.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Memory Based Streams </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Interacting with C</td></tr></table></div></body></html>