=pod

=encoding utf8

=head1 NAME

passphrase-encoding
- How diverse parts of OpenSSL treat pass phrase character encoding

=head1 DESCRIPTION

In a modern world with all sorts of character encodings, the treatment of pass
phrases has become increasingly complex.
This manual page attempts to give an overview of how this problem is
currently addressed in different parts of the OpenSSL library.

=head2 The general case

As a general rule, the OpenSSL library doesn't treat pass phrases in any
special way, and trusts the application or user to choose a suitable character
set and stick to that throughout the lifetime of affected objects.
This means that for an object that was encrypted using a pass phrase encoded in
ISO-8859-1, that object needs to be decrypted using a pass phrase encoded in
ISO-8859-1.
Using the wrong encoding is expected to cause a decryption failure.

=head2 PKCS#12

PKCS#12 is a bit different regarding pass phrase encoding.
The standard stipulates that the pass phrase shall be encoded as an ASN.1
BMPString, which consists of the code points of the basic multilingual plane,
encoded in big endian (UCS-2 BE).

OpenSSL tries to adapt to this requirement in one of the following manners:

=over 4

=item 1.

Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to
UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000
to U+FFFF, but becomes an expansion for any other character), or failing that,
proceeds with step 2.

=item 2.

Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and
opportunistically prepends each byte with a zero byte to obtain the UCS-2
encoding of the characters, which it stores as a BMPString.

Note that since there is no check of your locale, this may produce UCS-2 /
UTF-16 characters that do not correspond to the original pass phrase characters
for other character sets, such as any ISO-8859-X encoding other than
ISO-8859-1 (or for Windows, CP 1252 with the exception of the extra "graphical"
characters in the 0x80-0x9F range).

=back

OpenSSL versions older than 1.1.0 do variant 2 only, which is why OpenSSL
still supports it: to be able to read files produced with older versions.

It should be noted that this approach isn't entirely fault free.

A pass phrase encoded in ISO-8859-2 could very well have a sequence such as
0xC3 0xAF (which are the two characters "LATIN CAPITAL LETTER A WITH BREVE"
and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but would
be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF (LATIN
SMALL LETTER I WITH DIAERESIS) I<if the pass phrase doesn't contain anything that
would be invalid UTF-8>.
A pass phrase that contains this kind of byte sequence will give a different
outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.

    0x00 0xC3 0x00 0xAF                    # OpenSSL older than 1.1.0
    0x00 0xEF                              # OpenSSL 1.1.0 and newer

By the same token, anything encoded in UTF-8 that was given to OpenSSL older
than 1.1.0 was misinterpreted as ISO-8859-1 sequences.

=head2 OSSL_STORE

L<ossl_store(7)> acts as a general interface to access all kinds of objects,
potentially protected with a pass phrase, a PIN or something else.
This API stipulates that pass phrases should be UTF-8 encoded, and that any
other pass phrase encoding may give undefined results.
This API relies on the application to ensure UTF-8 encoding, and doesn't check
that this is the case, so what it gets, it will also pass to the underlying
loader.

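
As an illustration of the two PKCS#12 conversion variants described earlier,
the following Python sketch mimics their observable behaviour on raw pass
phrase bytes.
It is not OpenSSL code: the function names C<bmp_legacy> and C<bmp_current>
are hypothetical, Python's strict UTF-8 decoder is assumed to be a reasonable
stand-in for OpenSSL's own validity check, and any terminator that the file
format appends to the BMPString is left out.

```python
def bmp_legacy(raw: bytes) -> bytes:
    """Variant 2 (hypothetical name): treat each byte as ISO-8859-1,
    which amounts to prepending a zero byte to every input byte."""
    return raw.decode("latin-1").encode("utf-16-be")


def bmp_current(raw: bytes) -> bytes:
    """Variant 1 with fallback (hypothetical name): try UTF-8 first and
    re-encode to UTF-16 BE; if the bytes are not valid UTF-8, fall back
    to the byte-wise ISO-8859-1 treatment."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return bmp_legacy(raw)
    return text.encode("utf-16-be")


if __name__ == "__main__":
    # The ambiguous sequence discussed above: two characters in ISO-8859-2,
    # but also a valid UTF-8 encoding of U+00EF.
    ambiguous = bytes([0xC3, 0xAF])
    print(bmp_legacy(ambiguous).hex(" "))   # 00 c3 00 af
    print(bmp_current(ambiguous).hex(" "))  # 00 ef
```

Feeding the ISO-8859-1 bytes of "naïve" (0x6E 0x61 0xEF 0x76 0x65) to
C<bmp_current> takes the fallback path, because 0xEF followed by 0x76 is not
valid UTF-8, so both functions agree on that input.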

=head1 RECOMMENDATIONS

This section assumes that you know what pass phrase was used for encryption,
but that it may have been encoded in a different character encoding than the
one used by your current input method.
For example, the pass phrase may have been used at a time when your default
encoding was ISO-8859-1 (so that "naïve" results in the byte sequence 0x6E 0x61
0xEF 0x76 0x65), and you're now in an environment where your default encoding
is UTF-8 (so that "naïve" results in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76
0x65).
Whenever it's mentioned that you should use a certain character encoding, it
should be understood that you either change the input method to use the
mentioned encoding when you type in your pass phrase, or use some suitable tool
to convert your pass phrase from your default encoding to the target encoding.

Also note that the sub-sections below discuss human readable pass phrases.
This is particularly relevant for PKCS#12 objects, where human readable pass
phrases are assumed.
For other objects, it's just as legitimate to use any byte sequence (such as a
sequence of bytes from F</dev/urandom> that's been saved away), which makes any
character encoding discussion irrelevant; in such cases, simply use the same
byte sequence as it is.

=head2 Creating new objects

For creating new pass phrase protected objects, make sure the pass phrase is
encoded using UTF-8.
This is the default on most modern Unixes, but may take some effort on other
platforms.
Specifically for Windows, setting the environment variable
C<OPENSSL_WIN32_UTF8> will have anything entered at the Windows console prompt
converted to UTF-8 (command line and separately prompted pass phrases alike).

=head2 Opening existing objects

For opening pass phrase protected objects where you know what character
encoding was used for the encryption pass phrase, make sure to use the same
encoding again.

For opening pass phrase protected objects where the character encoding that was
used is unknown, or where the producing application is unknown, try one of the
following:

=over 4

=item 1.

Try the pass phrase that you have as it is in the character encoding of your
environment.
It's possible that its byte sequence is exactly right.

=item 2.

Convert the pass phrase to UTF-8 and try with the result.
Specifically with PKCS#12, this should open up any object that was created
according to the specification.

=item 3.

Do a naïve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and try
with the result.
This differs from the previous attempt because ISO-8859-1 maps directly to
U+0000 to U+00FF, which other non-UTF-8 character sets do not.

This also takes care of the case when a UTF-8 encoded string was used with
OpenSSL older than 1.1.0.
For example, C<ï>, which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3
0x83 0xC2 0xAF when re-encoded in the naïve manner.
The conversion to BMPString would then yield 0x00 0xC3 0x00 0xAF (followed by
the terminating 0x00 0x00), the erroneous/non-compliant encoding used by
OpenSSL older than 1.1.0.

=back

=head1 SEE ALSO

L<evp(7)>,
L<ossl_store(7)>,
L<EVP_BytesToKey(3)>, L<EVP_DecryptInit(3)>,
L<PEM_do_header(3)>,
L<PKCS12_parse(3)>, L<PKCS12_newpass(3)>,
L<d2i_PKCS8PrivateKey_bio(3)>

=head1 COPYRIGHT

Copyright 2018-2020 The OpenSSL Project Authors. All Rights Reserved.

Licensed under the OpenSSL license (the "License").  You may not use
this file except in compliance with the License.  You can obtain a copy
in the file LICENSE in the source distribution or at
L<https://www.openssl.org/source/license.html>.

=cut