=============================================
Enable std::unique_ptr [[clang::trivial_abi]]
=============================================

Background
==========

Consider the following snippets:

.. code-block:: cpp

    void raw_func(Foo* raw_arg) { ... }
    void smart_func(std::unique_ptr<Foo> smart_arg) { ... }

    Foo* raw_ptr_retval() { ... }
    std::unique_ptr<Foo> smart_ptr_retval() { ... }

The argument ``raw_arg`` could be passed in a register but ``smart_arg`` could not, due to the current
implementation.

Specifically, in the ``smart_arg`` case, the caller secretly constructs a temporary ``std::unique_ptr``
in its stack frame, and then passes a pointer to it to the callee in a hidden parameter.
Similarly, the return value from ``smart_ptr_retval`` is secretly allocated in the caller and
passed as a hidden reference to the callee.

Goal
====

``std::unique_ptr`` is passed directly in a register.

Design
======

* Annotate the two definitions of ``std::unique_ptr`` with the ``clang::trivial_abi`` attribute.
* Put the attribute behind a flag because this change can cause compilation and runtime breakages.

This comes with some side effects:

* ``std::unique_ptr`` parameters will now be destroyed by callees, rather than callers.
  It is worth noting that destruction by the callee is not unique to the use of the ``trivial_abi`` attribute.
  In most of Microsoft's ABIs, arguments are always destroyed by the callee.

  Consequently, this may change the destruction order for function parameters to an order that does not conform to the standard.
  For example:

  .. code-block:: cpp

    struct A { ~A(); };
    struct B { ~B(); };
    struct C { C(A, unique_ptr<B>, A) {} };
    C c{{}, make_unique<B>(), {}};

  In a conforming implementation, the destruction order for ``C::C``'s parameters is required to be ``~A(), ~B(), ~A()``, but with this mode enabled, we'll instead see ``~B(), ~A(), ~A()``.

* Reduced code size.

Performance impact
------------------

Google has measured performance improvements of up to 1.6% on some large server macrobenchmarks, and a small reduction in binary sizes.

This also affects null pointer optimization.

Clang's optimizer can now figure out when a ``std::unique_ptr`` is known to be *non*-null.
(Actually, this has been a *missed* optimization all along.)

.. code-block:: cpp

    struct Foo {
      ~Foo();
    };
    std::unique_ptr<Foo> make_foo();
    void do_nothing(const Foo&);

    void bar() {
      auto x = make_foo();
      do_nothing(*x);
    }

With this change, ``~Foo()`` will be called even if ``make_foo`` returns ``unique_ptr<Foo>(nullptr)``.
The compiler can now assume that ``x.get()`` cannot be null by the end of ``bar()``, because
the dereference of ``x`` would be UB if it were ``nullptr``. (This dereference would not have caused
a segfault, because no load is generated when a pointer is dereferenced only to bind a reference. It can be detected with ``-fsanitize=null``.)

Potential breakages
-------------------

The following breakages were discovered by enabling this change and fixing the resulting issues in a large code base.

- Compilation failures

  - Function definitions now require a complete type ``T`` for parameters with type ``std::unique_ptr<T>``. The following code will no longer compile.

    .. code-block:: cpp

        class Foo;
        void func(std::unique_ptr<Foo> arg) { /* never use `arg` directly */ }

  - Fix: Remove the forward declaration of ``Foo`` and include its proper header.

- Runtime failures

  - The lifetime of ``std::unique_ptr<>`` arguments ends earlier (at the end of the callee's body, rather than at the end of the full expression containing the call).

    .. code-block:: cpp

        util::Status run_worker(std::unique_ptr<Foo>);
        void func() {
          std::unique_ptr<Foo> smart_foo = ...;
          Foo* owned_foo = smart_foo.get();
          // Currently, the following would "work" because the argument to run_worker()
          // is destroyed at the end of the full expression, after Bar() has been called.
          // With the new calling convention, it will be destroyed at the end of run_worker(),
          // making the call to Bar() an access to freed memory.
          owned_foo->Bar(run_worker(std::move(smart_foo)));
          // ^ Crash expected here: owned_foo points to freed memory.
        }

  - The lifetime of a local *returned* ``std::unique_ptr<>`` ends earlier.

    Spot the bug:

    .. code-block:: cpp

        std::unique_ptr<Foo> create_and_subscribe(Bar* subscriber) {
          auto foo = std::make_unique<Foo>();
          subscriber->sub([&foo] { foo->do_thing(); });
          return foo;
        }

    One could point out that this is an obvious stack-use-after-return bug.
    With the current calling convention, however, running this code with ASAN enabled would not report any issue.
    So is this a bug in ASAN? (Spoiler: No.)

    This currently "works" only because the storage for ``foo`` is in the caller's stack frame.
    In other words, ``&foo`` in the callee and ``&foo`` in the caller are the same address.

ASAN can be used to detect both of these.
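The destruction-order side effect described under Design can be observed without touching ``std::unique_ptr`` itself. The following is a minimal sketch, not the libc++ implementation. ``OwnerOfB`` is a hypothetical stand-in for ``std::unique_ptr<B>``, and ``SKETCH_TRIVIAL_ABI``, ``destruction_log`` and ``run_sketch`` are names invented for this illustration. The attribute is applied only when compiling with Clang. Other compilers build the same code with the ordinary calling convention.

```cpp
#include <cassert>
#include <string>

// Assumption: only Clang understands [[clang::trivial_abi]], so the macro
// expands to nothing elsewhere. The macro name is hypothetical.
#if defined(__clang__)
#define SKETCH_TRIVIAL_ABI [[clang::trivial_abi]]
#else
#define SKETCH_TRIVIAL_ABI
#endif

// Records the order in which destructors run, one character per call.
std::string& destruction_log() {
  static std::string log;
  return log;
}

struct A {
  ~A() { destruction_log() += 'A'; }
};

struct B {
  ~B() { destruction_log() += 'B'; }
};

// Hypothetical owning wrapper standing in for std::unique_ptr<B>.
class SKETCH_TRIVIAL_ABI OwnerOfB {
 public:
  explicit OwnerOfB(B* p) : ptr_(p) {}
  OwnerOfB(OwnerOfB&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
  OwnerOfB(const OwnerOfB&) = delete;
  OwnerOfB& operator=(const OwnerOfB&) = delete;
  ~OwnerOfB() { delete ptr_; }  // deleting a null pointer is a no-op

 private:
  B* ptr_;
};

struct C {
  C(A, OwnerOfB, A) {}
};

// Builds a C and returns the recorded destruction order of its
// constructor parameters.
std::string run_sketch() {
  destruction_log().clear();
  {
    C c{A{}, OwnerOfB(new B), A{}};
    (void)c;
  }
  // Whatever the order, each of the three parameters is destroyed once.
  assert(destruction_log().size() == 3);
  return destruction_log();
}
```

Printing the result of ``run_sketch()`` shows the order for a given toolchain. Following the Design section, one would expect ``B`` to move to the front of the string when the attribute is in effect, but the exact string depends on the compiler and target ABI, so the sketch itself only checks that every parameter is destroyed exactly once.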