<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>File Based Streams</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="io.html" title="Chapter 13. Input and Output" /><link rel="prev" href="stringstreams.html" title="Memory Based Streams" /><link rel="next" href="io_and_c.html" title="Interacting with C" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">File Based Streams</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a> </td><th width="60%" align="center">Chapter 13. Input and Output</th><td width="20%" align="right"> <a accesskey="n" href="io_and_c.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.io.filestreams"></a>File Based Streams</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.copying_a_file"></a>Copying a File</h3></div></div></div><p>So you want to copy a file quickly and easily, and most important,
   completely portably.
And since this is C++, you have an open
   ifstream (call it IN) and an open ofstream (call it OUT):
   </p><pre class="programlisting">
   #include &lt;fstream&gt;

   std::ifstream IN ("input_file");
   std::ofstream OUT ("output_file"); </pre><p>Here's the easiest way to get it completely wrong:
   </p><pre class="programlisting">
   OUT &lt;&lt; IN;</pre><p>For those of you who don't already know why this doesn't work
   (probably from having done it before), I invite you to quickly
   create a simple text file called "input_file" containing
   the sentence
   </p><pre class="programlisting">
      The quick brown fox jumped over the lazy dog.</pre><p>surrounded by blank lines.  Code it up and try it.  The contents
   of "output_file" may surprise you.
   </p><p>Seriously, go do it.  Get surprised, then come back.  It's worth it.
   </p><p>The thing to remember is that the <code class="code">basic_[io]stream</code> classes
   handle formatting, nothing else.  In particular, they break up on
   whitespace.  The actual reading, writing, and storing of data is
   handled by the <code class="code">basic_streambuf</code> family.  Fortunately, the
   <code class="code">operator&lt;&lt;</code> is overloaded to take an ostream and
   a pointer-to-streambuf, in order to help with just this kind of
   "dump the data verbatim" situation.
   </p><p>Why a <span class="emphasis"><em>pointer</em></span> to streambuf and not just a streambuf?  Well,
   the [io]streams hold pointers (or references, depending on the
   implementation) to their buffers, not the actual
   buffers.
This allows polymorphic behavior on the part of the buffers
   as well as the streams themselves.  The pointer is easily retrieved
   using the <code class="code">rdbuf()</code> member function.  Therefore, the easiest
   way to copy the file is:
   </p><pre class="programlisting">
   OUT &lt;&lt; IN.rdbuf();</pre><p>So what <span class="emphasis"><em>was</em></span> happening with OUT&lt;&lt;IN?  Nothing you can
   rely on: the Standard defines no <code class="code">operator&lt;&lt;</code> that takes an input
   stream as its right-hand operand, so the result depends on the implementation.
   I have seen instances where it is implemented, but the character
   extraction process removes all the whitespace, leaving you with no
   blank lines and only "Thequickbrownfox...".  With
   libraries that do not define that operator, IN (or one of IN's
   member pointers) sometimes gets converted to a void*, and the output
   file then contains a perfect text representation of a hexadecimal
   address (quite a big surprise).  Others don't compile at all.  (Since
   C++11, a stream's conversion to bool is <code class="code">explicit</code> and the old
   conversion to void* is gone, so a conforming modern library rejects OUT&lt;&lt;IN.)
   </p><p>Also note that none of this is specific to o<span class="emphasis"><em>f</em></span>streams.
   The operators shown above are all defined in the parent
   basic_ostream class and are therefore available with all possible
   descendants.
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.binary"></a>Binary Input and Output</h3></div></div></div><p>The first and most important thing to remember about binary I/O is
   that opening a file with <code class="code">ios::binary</code> is not, repeat
   <span class="emphasis"><em>not</em></span>, the only thing you have to do.
It is not a silver
   bullet, and will not allow you to use the <code class="code">&lt;&lt;/&gt;&gt;</code>
   operators of the normal fstreams to do binary I/O.
   </p><p>Sorry.  Them's the breaks.
   </p><p>This isn't going to try to be a complete tutorial on reading and
   writing binary files (because "binary"
   covers a lot of ground), but we will try to clear
   up a couple of misconceptions and common errors.
   </p><p>First, <code class="code">ios::binary</code> has exactly one defined effect, no more
   and no less.  Normal text mode has to be concerned with the newline
   characters, and the runtime system will translate between (for
   example) '\n' and the appropriate end-of-line sequence (LF on Unix,
   CRLF on DOS, CR on the classic Macintosh, etc.).  (There are other things that
   normal mode does, but that's the most obvious.)  Opening a file in
   binary mode disables this conversion, so reading a CRLF sequence
   under Windows won't accidentally get mapped to a '\n' character, etc.
   Binary mode is not supposed to suddenly give you a bitstream, and
   if it is doing so in your program then you've discovered a bug in
   your vendor's compiler (or some other part of the C++ implementation,
   possibly the runtime system).
   </p><p>Second, using <code class="code">&lt;&lt;</code> to write and <code class="code">&gt;&gt;</code> to
   read isn't going to work with the standard file stream classes, even
   if you turn off whitespace skipping with <code class="code">noskipws</code> during reading.  Why not?  Because
   ifstream and ofstream exist for the purpose of <span class="emphasis"><em>formatting</em></span>,
   not reading and writing.
Their job is to interpret the data into
   text characters, and that's exactly what you don't want to happen
   during binary I/O.
   </p><p>Third, using the <code class="code">get()</code> and <code class="code">put()/write()</code> member
   functions still isn't guaranteed to help you.  These are
   "unformatted" I/O functions, but still character-based.
   (This may or may not be what you want; see below.)
   </p><p>Notice how all the problems here are due to the inappropriate use
   of <span class="emphasis"><em>formatting</em></span> functions and classes to perform something
   which <span class="emphasis"><em>requires</em></span> that formatting not be done?  There are a
   seemingly infinite number of solutions, and a few are listed here:
   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="quote">“<span class="quote">Derive your own fstream-type classes and write your own
   &lt;&lt;/&gt;&gt; operators to do binary I/O on whatever data
   types you're using.</span>”</span>
   </p><p>
      This is a Bad Thing, because while
      the compiler would probably be just fine with it, other humans
      are going to be confused.  The overloaded bitshift operators
      have a well-defined meaning (formatting), and this breaks it.
   </p></li><li class="listitem"><p>
      <span class="quote">“<span class="quote">Build the file structure in memory, then
      <code class="code">mmap()</code> the file and copy the
      structure.
      </span>”</span>
   </p><p>
      Well, this is easy to make work, and easy to break, and is
      roughly equivalent to using <code class="code">::read()</code> and
      <code class="code">::write()</code> directly, and makes no use of the
      iostream library at all...
   </p></li><li class="listitem"><p>
      <span class="quote">“<span class="quote">Use streambufs, that's what they're there for.</span>”</span>
   </p><p>
      While not trivial for the beginner, this is the best of all
      solutions.  The streambuf/filebuf layer is the layer that is
      responsible for actual I/O.  If you want to use the C++
      library for binary I/O, this is where you start.
   </p></li></ul></div><p>How to go about using streambufs is a bit beyond the scope of this
   document (at least for now), but while streambufs go a long way,
   they still leave a couple of things up to you, the programmer.
   As an example, byte ordering is completely between you and the
   operating system, and you have to handle it yourself.
   </p><p>Deriving a streambuf or filebuf
   class from the standard ones, one that is specific to your data
   types (or an abstraction thereof), is probably a good idea, and
   lots of examples exist in journals and on Usenet.  Using the
   standard filebufs directly (either by declaring your own or by
   using the pointer returned from an fstream's <code class="code">rdbuf()</code>)
   is certainly feasible as well.
   </p><p>One area that causes problems is trying to do bit-by-bit operations
   with filebufs.  C++ is no different from C in this respect:  I/O
   must be done at the byte level.
If you're trying to read or write
   a few bits at a time, you're going about it the wrong way.  You
   must read/write an integral number of bytes and then process the
   bytes.  (For example, the streambuf functions take and return
   variables of type <code class="code">int_type</code>.)
   </p><p>Another area of problems is opening text files in binary mode.
   Generally, binary mode is intended for binary files, and opening
   text files in binary mode means that you now have to deal yourself with
   the end-of-line (and end-of-file) handling that text mode would otherwise do for you.
   </p><p>
      An instructive thread from comp.lang.c++.moderated delved into
      this topic starting more or less at
      <a class="link" href="https://groups.google.com/forum/#!topic/comp.std.c++/D4e0q9eVSoc" target="_top">this post</a>
      and continuing to the end of the thread. (The subject heading is "binary iostreams" on both comp.std.c++
      and comp.lang.c++.moderated.) Take special note of the replies by James Kanze and Dietmar Kühl.
   </p><p>Briefly, the problems of byte ordering and type sizes mean that
   the unformatted functions like <code class="code">ostream::put()</code> and
   <code class="code">istream::get()</code> cannot safely be used to communicate
   between arbitrary programs, or across a network, or from one
   invocation of a program to another invocation of the same program
   on a different platform, etc.
</p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="io.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="io_and_c.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Memory Based Streams </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Interacting with C</td></tr></table></div></body></html>