==============
File Time Type
==============

.. contents::
   :local:

.. _file-time-type-motivation:

Motivation
==========

The filesystem library provides interfaces for getting and setting the last
write time of a file or directory. The interfaces use the ``file_time_type``
type, which is a specialization of ``chrono::time_point`` for the
"filesystem clock". According to [fs.filesystem.syn]:

  trivial-clock is an implementation-defined type that satisfies the
  Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
  representing and measuring file time values. Implementations should ensure
  that the resolution and range of file_time_type reflect the operating
  system dependent resolution and range of file time values.


On POSIX systems, file times are represented using the ``timespec`` struct,
which is defined as follows:

.. code-block:: cpp

  struct timespec {
    time_t tv_sec;
    long   tv_nsec;
  };

To represent the range and resolution of ``timespec``, we need to (A) have
nanosecond resolution, and (B) use more than 64 bits (assuming a 64 bit ``time_t``).

As the standard requires us to use the ``chrono`` interface, we have to define
our own filesystem clock which specifies the period and representation of
the time points and durations it provides. It will look like this:

.. code-block:: cpp

  struct _FilesystemClock {
    using period = nano;
    using rep = TBD; // What is this?

    using duration = chrono::duration<rep, period>;
    using time_point = chrono::time_point<_FilesystemClock>;

    // ... //
  };

  using file_time_type = _FilesystemClock::time_point;


To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
But what type can we use as the arithmetic representation that is capable
of representing the range of the ``timespec`` struct?

Problems To Consider
====================

Before considering solutions, let's look at the problems they should solve,
and how important it is to solve each of them:


Having a Smaller Range than ``timespec``
----------------------------------------

One solution to the range problem is to simply reduce the resolution of
``file_time_type`` to be less than that of nanoseconds. This is what libc++'s
initial implementation of ``file_time_type`` did; it's also what
``std::system_clock`` does. As a result, it can represent time points about
292 thousand years on either side of the epoch, as opposed to only 292 years
at nanosecond resolution.

``timespec`` can represent time points +/- 292 billion years from the epoch
(just in case you needed a time point 200 billion years before the big bang,
and with nanosecond resolution).

To come close to having the same range, we would need to drop our resolution
to that of seconds.
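
As a rough check of these figures, the following sketch computes how many
years a signed 64 bit tick count covers on either side of the epoch at each
resolution. The helper ``years_of_range`` is a hypothetical name introduced
here for illustration, and the average Gregorian year of 31,556,952 seconds
is an assumption used to convert seconds to years.

.. code-block:: cpp

  #include <cstdint>
  #include <limits>

  // Approximate number of whole years representable on either side of the
  // epoch by a signed 64 bit tick count, given the number of ticks per second.
  // Assumes an average Gregorian year of 31,556,952 seconds.
  constexpr std::int64_t years_of_range(std::int64_t ticks_per_second) {
    constexpr std::int64_t seconds_per_year = 31556952;
    return std::numeric_limits<std::int64_t>::max() / ticks_per_second /
           seconds_per_year;
  }

  // Nanosecond ticks: about 292 years.
  static_assert(years_of_range(1000000000) == 292, "");
  // Microsecond ticks: about 292 thousand years.
  static_assert(years_of_range(1000000) == 292277, "");
  // Second ticks: about 292 billion years.
  static_assert(years_of_range(1) / 1000000000 == 292, "");

Each factor of one thousand given up in resolution buys a factor of one
thousand in range, which is why only a resolution of seconds gets a 64 bit
representation close to the range of ``timespec``.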

This raises the question: is the range problem "really a problem"? Sane usages
of file time stamps shouldn't exceed +/- 300 years, so should we care about
supporting a larger range?

I believe the answer is yes. We're not designing the filesystem time API, we're
providing glorified C++ wrappers for it. If the underlying API supports
a value, then we should too. Our wrappers should not place artificial restrictions
on users that are not present in the underlying filesystem.

Having a smaller range than the underlying filesystem forces the
implementation to report ``value_too_large`` errors when it encounters a time
point that it can't represent. This can cause the call to ``last_write_time``
to throw in cases where the user was confident the call should succeed. (See below.)


.. code-block:: cpp

  #include <cassert>
  #include <ctime>
  #include <filesystem>
  #include <limits>
  #include <fcntl.h>    // AT_FDCWD
  #include <sys/stat.h> // utimensat
  using namespace std::filesystem;

  // Set the times using the system interface.
  void set_file_times(const char* path, struct timespec ts) {
    timespec both_times[2];
    both_times[0] = ts;
    both_times[1] = ts;
    int result = ::utimensat(AT_FDCWD, path, both_times, 0);
    assert(result != -1);
  }

  // Called elsewhere to set the file time to something insane, and way
  // out of the 300 year range we might expect.
  void some_bad_persons_code() {
    struct timespec new_times;
    new_times.tv_sec = std::numeric_limits<time_t>::max();
    new_times.tv_nsec = 0;
    set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
  }

  int main(int, char**) {
    path p = "/tmp/foo";
    file_status st = status(p);
    if (!exists(st) || !is_regular_file(st))
      return 1;
    if ((st.permissions() & perms::others_read) == perms::none)
      return 1;
    // It seems reasonable to assume this call should succeed.
    file_time_type tp = last_write_time(p); // BAD! Throws value_too_large.
    return 0;
  }


Having a Smaller Resolution than ``timespec``
---------------------------------------------

As mentioned in the previous section, one way to solve the range problem
is by reducing the resolution. But matching the range of ``timespec`` using a
64 bit representation requires limiting the resolution to seconds.

So we might ask: Do users "need" nanosecond precision? Are seconds not good enough?
I limit my consideration of the point to this: Why was it not good enough for
the underlying system interfaces? If it wasn't good enough for them, then it
isn't good enough for us. Our job is to match the filesystem's range and
representation, not to design them.


Having a Larger Range than ``timespec``
----------------------------------------

We should also consider the opposite problem of having a ``file_time_type``
that is able to represent a larger range than ``timespec``. At least in
this case ``last_write_time`` can be used to get and set all possible values
supported by the underlying filesystem, meaning ``last_write_time(p)`` will
never throw an overflow error when retrieving a value.

However, this introduces a new problem, where users are allowed to attempt to
create a time point beyond what the filesystem can represent. Two particular
values which cause this are ``file_time_type::min()`` and
``file_time_type::max()``. As a result, the following code would throw:

.. code-block:: cpp

  void test() {
    last_write_time("/tmp/foo", file_time_type::max()); // Throws.
    last_write_time("/tmp/foo", file_time_type::min()); // Throws.
  }

Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
a valid time point, adding a couple hundred billion years in error,
and then trying to update a file's write time to that value very often.
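
Code that wants to avoid the exception could test a value before handing it
to ``last_write_time``. The following is a minimal sketch of such a test,
assuming a compiler that provides ``__int128_t`` (GCC and Clang do on 64 bit
targets) and a time point expressed as a nanosecond count since the epoch.
The helper ``fits_in_timespec`` is hypothetical and not part of libc++.

.. code-block:: cpp

  #include <limits>
  #include <time.h>

  // Hypothetical helper: reports whether a nanosecond count held in a
  // 128 bit integer can be stored in a timespec without loss.
  inline bool fits_in_timespec(__int128_t nsec_since_epoch) {
    const __int128_t nsec_per_sec = 1000000000;
    // Use floor division so tv_nsec would stay in [0, 1e9) for negative counts.
    __int128_t secs = nsec_since_epoch / nsec_per_sec;
    if (nsec_since_epoch % nsec_per_sec < 0)
      --secs;
    return secs >= std::numeric_limits<time_t>::min() &&
           secs <= std::numeric_limits<time_t>::max();
  }

With a 128 bit representation, both ``file_time_type::min()`` and
``file_time_type::max()`` would fail this test, which matches the two calls
that throw in the example above.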

Compared to having a smaller range, this problem seems preferable. At least
now we can represent any time point the filesystem can, so users won't be forced
to fall back to system interfaces to avoid limitations in the C++ STL.

I posit that we should only consider this concern *after* we have something
with at least the same range and resolution as the underlying filesystem.
Range and resolution are much more important problems to solve.

Potential Solutions And Their Complications
===========================================

Source Code Portability Across Implementations
-----------------------------------------------

As we've discussed, ``file_time_type`` needs a representation that uses more
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
128 bit integer using a class, or potentially defining a ``timespec``-like
arithmetic type. All three will allow us to, at minimum, match the range
and resolution, and the last one might even allow us to match them exactly.

But when considering these potential solutions we need to consider more than
just the values they can represent. We need to consider the effects they will
have on users and their code. For example, each of them breaks the following
code in some way:

.. code-block:: cpp

  // Bug caused by an unexpected 'rep' type returned by count.
  void print_time(path p) {
    // __int128_t doesn't have streaming operators, and neither would our
    // custom arithmetic types.
    cout << last_write_time(p).time_since_epoch().count() << endl;
  }

  // Overflow during creation bug.
  file_time_type timespec_to_file_time_type(struct timespec ts) {
    // Oops! chrono::seconds and chrono::nanoseconds use a 64 bit representation;
    // this may overflow before it's converted to a file_time_type.
    auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
    // This is the correct version of the above example, where we
    // avoid using the chrono typedefs as they're not sufficient.
    // Can we expect users to avoid this bug?
    using fs_seconds = chrono::duration<file_time_type::rep>;
    using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
    auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  // Implicit truncation during conversion bug.
  intmax_t get_time_in_seconds(path p) {
    using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
    auto tp = last_write_time(p);

    // This works with truncation for __int128_t, but what does it do for
    // our custom arithmetic types?
    return duration_cast<fs_seconds>(tp.time_since_epoch()).count();
  }


Each of the above examples would require a user to adjust their filesystem code
to the particular eccentricities of the representation, hopefully only in such
a way that the code is still portable across implementations.
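
To illustrate what such an adjustment could look like with a 128 bit
representation, here is a minimal sketch. It assumes a compiler that provides
``__int128_t`` and a standard library whose ``chrono::duration`` accepts it as
a representation. The aliases ``wide_rep``, ``wide_seconds`` and
``wide_nanoseconds`` and the two helper functions are hypothetical stand-ins,
not libc++ declarations.

.. code-block:: cpp

  #include <chrono>
  #include <ratio>
  #include <string>
  #include <time.h>

  // Hypothetical stand-ins for a 128 bit file_time_type::rep.
  using wide_rep = __int128_t;
  using wide_seconds = std::chrono::duration<wide_rep>;
  using wide_nanoseconds = std::chrono::duration<wide_rep, std::nano>;

  // Widen each field before the arithmetic, so the sum is computed in
  // 128 bits and cannot overflow for any valid timespec.
  inline wide_nanoseconds to_wide_nanoseconds(timespec ts) {
    return wide_seconds(ts.tv_sec) + wide_nanoseconds(ts.tv_nsec);
  }

  // The standard streams have no operator<< for __int128_t, so format the
  // count by hand, one decimal digit at a time.
  inline std::string to_decimal(wide_rep v) {
    if (v == 0)
      return "0";
    const bool negative = v < 0;
    std::string digits;
    while (v != 0) {
      int d = static_cast<int>(v % 10); // negative when v is negative
      digits.insert(digits.begin(), static_cast<char>('0' + (d < 0 ? -d : d)));
      v /= 10;
    }
    return negative ? "-" + digits : digits;
  }

A caller would then print ``to_decimal(dur.count())`` instead of streaming
the count directly. Whether code like this stays portable depends on every
implementation accepting the same spelling, which is exactly the concern
raised above.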

At least some of the above issues are unavoidable, no matter what
representation we choose. But some representations may be quirkier than others,
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
provides the least aberrant behavior.


Chrono and ``timespec`` Emulation
----------------------------------

One of the options we've considered is using something akin to ``timespec``
to represent the ``file_time_type``. It only seems natural seeing as that's
what the underlying system uses, and because it might allow us to match
the range and resolution exactly. But would it work with chrono? And could
it still act at all like a ``timespec`` struct?

To make this concrete, let's sketch what the implementation might
look like.

.. code-block:: cpp

  struct fs_timespec_rep {
    fs_timespec_rep(long long v)
      : tv_sec(v / nano::den), tv_nsec(v % nano::den)
    { }
  private:
    time_t tv_sec;
    long tv_nsec;
  };
  bool operator==(fs_timespec_rep, fs_timespec_rep);
  fs_timespec_rep operator+(fs_timespec_rep, fs_timespec_rep);
  // ... arithmetic operators ... //

The first thing to notice is that we can't construct ``fs_timespec_rep`` like
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
constructing it from a single 64 bit integer.

We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
directly. A ``chrono::duration`` represents its value as a tick period and a
number of ticks stored using ``rep``. The representation is unaware of the
tick period it is being used to represent, but ``timespec`` is set up to assume
a nanosecond tick period, which is the only case where the names ``tv_sec``
and ``tv_nsec`` match the values they store.

When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
use ``tv_sec`` to represent the number of gigaseconds, and ``tv_nsec`` the
remaining seconds. Let's consider how this might cause a bug were users allowed
to manipulate the fields directly.

.. code-block:: cpp

  template <class Period>
  timespec convert_to_timespec(duration<fs_timespec_rep, Period> dur) {
    fs_timespec_rep rep = dur.count();
    return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
  }

  template <class Duration>
  Duration convert_to_duration(timespec ts) {
    Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
    return dur;
  }

  time_t extract_seconds(file_time_type tp) {
    // Converting to seconds is a silly bug, but I could see it happening.
    using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
    auto secs = duration_cast<SecsT>(tp.time_since_epoch());
    // tv_sec is now representing gigaseconds.
    return secs.count().tv_sec; // Oops!
  }

Despite ``fs_timespec_rep`` not being usable in any manner resembling
``timespec``, it still might buy us our goal of matching its range exactly,
right?

Sort of. Chrono provides a specialization point which specifies the minimum
and maximum values for a custom representation. It looks like this:

.. code-block:: cpp

  template <>
  struct duration_values<fs_timespec_rep> {
    static fs_timespec_rep zero();
    static fs_timespec_rep min();
    static fs_timespec_rep max() { // assume friendship.
      fs_timespec_rep val(0);
      val.tv_sec = numeric_limits<time_t>::max();
      val.tv_nsec = nano::den - 1;
      return val;
    }
  };

Notice that ``duration_values`` doesn't tell the representation what tick
period it's actually representing. This would indeed correctly limit the range
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
nanoseconds isn't the only tick period it will be used to represent. For
example:

.. code-block:: cpp

  void test() {
    using rep = file_time_type::rep;
    using fs_nsec = duration<rep, nano>;
    using fs_sec = duration<rep>;
    fs_nsec nsecs(fs_sec::max()); // Truncates
  }

Though the above example may appear silly, I think it follows from the incorrect
notion that using a ``timespec`` rep in chrono actually makes it act as if it
were an actual ``timespec``.

Interactions with 32 bit ``time_t``
-----------------------------------

Up until now we've only been considering cases where ``time_t`` is 64 bits, but what
about 32 bit systems/builds where ``time_t`` is 32 bits? (This is the common case
for 32 bit builds.)

When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
involved. Nor should we, as it would suffer from the numerous complications
described in this document.

Obviously our implementation for 32-bit builds should act as similarly to the
64-bit build as possible. Code which compiles in one should compile in the other.
This consideration is important when choosing between ``__int128_t`` and
emulating ``timespec``. The solution which provides the most uniformity with
the least eccentricity is the preferable one.

Summary
=======

The ``file_time_type`` time point is used to represent the write times for files.
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
underlying filesystem uses the ``timespec`` struct for the same purpose.

However, the initial implementation of ``file_time_type`` could not represent
either the range or resolution of ``timespec``, making it unsuitable. Fixing
this requires an implementation which uses more than 64 bits to store the
time point.

We primarily considered two solutions: using ``__int128_t`` and using an
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
come with more than one complication.

The Potential Solutions
-----------------------

``long long`` - The Status Quo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* As a type ``long long`` plays the nicest with others:

  * It works with streaming operators and other library entities which support
    builtin integer types, but don't support ``__int128_t``.
  * It's the representation used by chrono's ``nanoseconds`` and ``seconds`` typedefs.

Cons:

* It cannot provide the same resolution as ``timespec`` unless we limit it
  to a range of +/- 300 years from the epoch.
* It cannot provide the same range as ``timespec`` unless we limit its resolution
  to seconds.
* ``last_write_time`` has to report an error when the time reported by the filesystem
  is unrepresentable.

__int128_t
~~~~~~~~~~~

Pros:

* It is an integer type.
* It makes the implementation simple and efficient.
* It acts exactly like other arithmetic types.
* It can be implicitly converted to a builtin integer type by the user.

  * This is important for doing things like:

    .. code-block:: cpp

      void c_interface_using_time_t(const char* p, time_t);

      void foo(path p) {
        file_time_type tp = last_write_time(p);
        time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
        c_interface_using_time_t(p.c_str(), secs);
      }

Cons:

* It isn't always available (but on 64 bit machines, it normally is).
* It causes ``file_time_type`` to have a larger range than ``timespec``.
* It doesn't always act the same as other builtin integer types, for example
  with ``cout`` or ``to_string``.
* It can be implicitly converted to a builtin integer type by the user,
  silently truncating its value to 64 bits.

Arithmetic ``timespec`` Emulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* It has the exact same range and resolution as ``timespec`` when representing
  a nanosecond tick period.
* It's always available, unlike ``__int128_t``.

Cons:

* It has a larger range when representing any period longer than a nanosecond.
* It doesn't actually allow users to use it like a ``timespec``.
* The required representation of using ``tv_sec`` to store the giga tick count
  and ``tv_nsec`` to store the remainder adds nothing over a 128 bit integer,
  but complicates a lot.
* It isn't a builtin integer type, and can't be used anything like one.
* Chrono can be made to work with it, but not nicely.
* Emulating arithmetic classes comes with its own host of problems regarding
  overload resolution (each operator needs three SFINAE constrained versions of
  it in order to act like builtin integer types).
* It offers little over simply using ``__int128_t``.
* It differs the most from implementations using an actual integer type,
  which has a high chance of breaking source compatibility.


Selected Solution - Using ``__int128_t``
=========================================

The solution I selected for libc++ is using ``__int128_t`` when available,
and otherwise falling back to using ``long long`` with nanosecond precision.

When ``__int128_t`` is available, or when ``time_t`` is 32 bits, the implementation
provides the same resolution and a greater range than ``timespec``. Otherwise
it still provides the same resolution, but is limited to a range of +/- 300
years. This final case should be rather rare, as ``__int128_t``
is normally available in 64-bit builds, and ``time_t`` is normally 32 bits
during 32-bit builds.
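
The selection described above can be sketched as follows. This is an
illustration under stated assumptions, not libc++'s actual code: the names
``fs_time_rep`` and ``covers_timespec_range`` are hypothetical, and the
availability test relies on the ``__SIZEOF_INT128__`` macro that GCC and Clang
predefine when ``__int128_t`` exists.

.. code-block:: cpp

  #include <climits>
  #include <time.h>

  // Hypothetical alias; libc++'s real declarations may differ.
  #if defined(__SIZEOF_INT128__)
  using fs_time_rep = __int128_t; // 128 bits: covers any 64 bit time_t
  #else
  using fs_time_rep = long long;  // 64 bits: full range only for a 32 bit time_t
  #endif

  // True when every timespec value fits in fs_time_rep at nanosecond
  // resolution. Scaling seconds to nanoseconds costs at most 30 bits
  // (10^9 < 2^30), so the rep needs at least 30 bits more than time_t.
  constexpr bool covers_timespec_range =
      sizeof(fs_time_rep) * CHAR_BIT >= sizeof(time_t) * CHAR_BIT + 30;

The constant is false only in the rare configuration mentioned above, a
64 bit ``time_t`` without ``__int128_t``, where the range is limited to
roughly 292 years on either side of the epoch.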

Although falling back to ``long long`` and nanosecond precision is less than
ideal, it also happens to be the implementation provided by both libstdc++
and MSVC. (So that makes it better, right?)

Although the ``timespec`` emulation solution is feasible and would largely
do what we want, it comes with too many complications, potential problems
and discrepancies when compared to "normal" chrono time points and durations.

An emulation of a builtin arithmetic type using a class is never going to act
exactly the same, and the difference will be felt by users. It's not reasonable
to expect them to tolerate and work around these differences. And once
we commit to an ABI it will be too late to change. Committing to this seems
risky.

Therefore, ``__int128_t`` seems like the better solution.
496