===================================
Expected Differences vs DXC and FXC
===================================

.. contents::
   :local:

Introduction
============

HLSL currently has two reference compilers, the `DirectX Shader Compiler (DXC)
<https://github.com/microsoft/DirectXShaderCompiler/>`_ and the
`Effect-Compiler (FXC) <https://learn.microsoft.com/en-us/windows/win32/direct3dtools/fxc>`_.
The two reference compilers do not fully agree. Some known disagreements in the
references are tracked on
`DXC's GitHub
<https://github.com/microsoft/DirectXShaderCompiler/issues?q=is%3Aopen+is%3Aissue+label%3Afxc-disagrees>`_,
but many more are known to exist.

HLSL as implemented by Clang will also not fully match either of the reference
implementations; it is instead being written to match the `draft language
specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_.

This document is a non-exhaustive collection of the known differences between
Clang's implementation of HLSL and the existing reference compilers.

General Principles
------------------

Most of the intended differences between Clang and the earlier reference
compilers are focused on increased consistency and correctness. Neither
reference compiler applies language rules the same way in every context.

Clang also deviates from the reference compilers by providing different
diagnostics, both in terms of the textual messages and the contexts in which
diagnostics are produced. While striving for a high level of source
compatibility with conforming HLSL code, Clang may produce earlier and more
robust diagnostics for incorrect code or reject code that a reference compiler
incorrectly accepted.

Language Version
================

Clang targets language compatibility for HLSL 2021 as implemented by DXC.
Language features that were removed in earlier versions of HLSL may be added on
a case-by-case basis, but are not planned for the initial implementation.

Overload Resolution
===================

Clang's HLSL implementation adopts C++ overload resolution rules as proposed for
HLSL 202x based on proposals
`0007 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0007-const-instance-methods.md>`_
and
`0008 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_.

The largest difference between Clang and DXC's overload resolution is the
algorithm used for identifying best-match overloads. There are more details
about the algorithmic differences in the :ref:`multi_argument_overloads` section
below. There are three high-level differences that should be highlighted:

* **There should be no cases** where DXC and Clang both successfully
  resolve an overload where the resolved overload is different between the two.
* There are cases where Clang will successfully resolve an overload that DXC
  would not, because the overload set in Clang has been trimmed to remove
  ambiguity.
* There are cases where DXC will successfully resolve an overload that Clang
  will not for two reasons: (1) DXC only generates partial overload sets for
  builtin functions and (2) DXC resolves cases that probably should be ambiguous.

Clang's implementation extends standard overload resolution rules to HLSL
library functionality. This causes subtle changes in overload resolution
behavior between Clang and DXC. Some examples include:

.. code-block:: c++

  void halfOrInt16(half H);
  void halfOrInt16(uint16_t U);
  void halfOrInt16(int16_t I);

  void takesDoubles(double, double, double);

  cbuffer CB {
    bool B;
    uint U;
    int I;
    float X, Y, Z;
    double3 R, G;
  }

  void takesSingleDouble(double);
  void takesSingleDouble(vector<double, 1>);

  void scalarOrVector(double);
  void scalarOrVector(vector<double, 2>);

  export void call() {
    half H;
    halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t).

  #ifndef IGNORE_ERRORS
    halfOrInt16(U); // All: Fails with call ambiguous between int16_t and uint16_t
                    // overloads

    // asfloat16 is a builtin with overloads for half, int16_t, and uint16_t.
    H = asfloat16(I); // DXC: Fails to resolve overload for int.
                      // Clang: Resolves to asfloat16(int16_t).
    H = asfloat16(U); // DXC: Fails to resolve overload for int.
                      // Clang: Resolves to asfloat16(uint16_t).
  #endif
    H = asfloat16(0x01); // DXC: Resolves to asfloat16(half).
                         // Clang: Resolves to asfloat16(uint16_t).

    takesDoubles(X, Y, Z); // Works on all compilers
  #ifndef IGNORE_ERRORS
    fma(X, Y, Z); // DXC: Fails to resolve no known conversion from float to
                  //   double.
                  // Clang: Resolves to fma(double,double,double).

    // Named DotResult so it does not collide with D declared further down.
    double DotResult = dot(R, G); // DXC: Resolves to dot(double3, double3), fails DXIL Validation.
                                  // FXC: Expands to compute double dot product with fmul/fadd
                                  // Clang: Fails to resolve as ambiguous against
                                  // dot(half, half) or dot(float, float)
  #endif

  #ifndef IGNORE_ERRORS
    tan(B); // DXC: resolves to tan(float).
            // Clang: Fails to resolve, ambiguous between integer types.
  #endif

    double D;
    takesSingleDouble(D); // All: Fails to resolve ambiguous conversions.
    takesSingleDouble(R); // All: Fails to resolve ambiguous conversions.

    scalarOrVector(D); // All: Resolves to scalarOrVector(double).
    scalarOrVector(R); // All: Fails to resolve ambiguous conversions.
  }

.. note::

  In Clang, a conscious decision was made to exclude the ``dot(vector<double,N>, vector<double,N>)``
  overload and allow overload resolution to resolve the
  ``vector<float,N>`` overload. This approach provides a ``-Wconversion``
  diagnostic notifying the user of the conversion rather than silently altering
  precision relative to the other overloads (as FXC does) or generating code
  that will fail validation (as DXC does).

.. _multi_argument_overloads:

Multi-Argument Overloads
------------------------

In addition to the differences in single-element conversions, Clang and DXC
differ dramatically in multi-argument overload resolution. C++ multi-argument
overload resolution behavior (or something very similar) is required to
implement
`non-member operator overloading <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_.

Clang adopts the C++-inspired language from the
`draft HLSL specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_,
where an overload ``f1`` is a better candidate than ``f2`` if for all arguments
the conversion sequence of ``f1`` is not worse than the corresponding conversion
sequence of ``f2``, and for at least one argument it is better.

.. code-block:: c++

  cbuffer CB {
    int I;
    float X;
    float4 V;
  }

  void twoParams(int, int);
  void twoParams(float, float);
  void threeParams(float, float, float);
  void threeParams(float4, float4, float4);

  export void call() {
    twoParams(I, X); // DXC: resolves twoParams(int, int).
                     // Clang: Fails to resolve ambiguous conversions.

    threeParams(X, V, V); // DXC: resolves threeParams(float4, float4, float4).
                          // Clang: Fails to resolve ambiguous conversions.
  }

In the examples above, ``twoParams`` called with mixed arguments produces the
implicit conversion sequences { ExactMatch, FloatingIntegral } and {
FloatingIntegral, ExactMatch }. In both cases an argument has a worse conversion
in the other sequence, so the overload is ambiguous.

In the ``threeParams`` example the sequences are { ExactMatch, VectorTruncation,
VectorTruncation } and { VectorSplat, ExactMatch, ExactMatch }. Again, in both
cases at least one parameter has a worse conversion in the other sequence, so
the overload is ambiguous.

.. note::

  The DXC behavior described below is not officially documented; it is gleaned
  from observation and a bit of reading the source.

DXC's approach for determining the best overload produces an integer score value
for each implicit conversion sequence for each argument expression. Scores for
casts are based on a bitmask construction that is complicated to reverse
engineer. It seems that:

* Exact match is 0
* Dimension increase is 1
* Promotion is 2
* Integral -> Float conversion is 4
* Float -> Integral conversion is 8
* Cast is 16

The masks are bitwise OR'd together to produce a score for the cast.

The scores of each conversion sequence are then summed to generate a score for
the overload candidate. The overload candidate with the lowest score is the best
candidate. If more than one overload ties for the lowest score, the call is
ambiguous.
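
The scoring scheme described above can be sketched in C++ as follows. This is
only an illustration of the observed behavior, not DXC's actual implementation:
the enumerator names and the helpers ``candidateScore`` and ``pickBest`` are
hypothetical, the weights are simply the values listed above, and the candidates
in the ``static_assert`` lines are made up for the example.

.. code-block:: c++

  #include <initializer_list>

  // Weights as listed above. They are inferred from observing DXC, so treat
  // them as assumptions rather than a specification.
  enum ConversionBits : unsigned {
    ExactMatch = 0,
    DimensionIncrease = 1,
    Promotion = 2,
    IntToFloat = 4,
    FloatToInt = 8,
    Cast = 16,
  };

  // Hypothetical helper: sums the per-argument scores of one candidate. Each
  // element is the OR of the bits that apply to that argument's conversion.
  constexpr unsigned candidateScore(std::initializer_list<unsigned> ArgScores) {
    unsigned Total = 0;
    for (unsigned Score : ArgScores)
      Total += Score;
    return Total;
  }

  // Hypothetical helper: returns the index of the candidate with the lowest
  // score, or -1 if the lowest score is shared (the call is ambiguous).
  constexpr int pickBest(std::initializer_list<unsigned> CandidateScores) {
    int Best = -1;
    unsigned BestScore = 0;
    bool Tie = false;
    int Index = 0;
    for (unsigned Score : CandidateScores) {
      if (Best < 0 || Score < BestScore) {
        Best = Index;
        BestScore = Score;
        Tie = false;
      } else if (Score == BestScore) {
        Tie = true;
      }
      ++Index;
    }
    return Tie ? -1 : Best;
  }

  // Made-up candidate 0 needs { ExactMatch, Promotion }; made-up candidate 1
  // needs { IntToFloat | DimensionIncrease, ExactMatch }.
  static_assert(candidateScore({ExactMatch, Promotion}) == 2, "0 + 2");
  static_assert(candidateScore({IntToFloat | DimensionIncrease, ExactMatch}) == 5,
                "(4 | 1) + 0");
  static_assert(pickBest({2u, 5u}) == 0, "lowest score wins");
  static_assert(pickBest({4u, 4u}) == -1, "tied lowest score is ambiguous");

Because the sum collapses each candidate to a single number, a poor conversion
on one argument can be offset by good conversions on the others. The pairwise
rule Clang uses does not allow that trade-off, which is why it reports
ambiguity in more cases.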