OpenSSL tries to adapt to this requirements in one of the following manners:
- Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000 to U+FFFF, but becomes an expansion for any other character), or failing that, proceeds with step 2.
- Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and
opportunistically prepends each byte with a zero byte to obtain the UCS-2
encoding of the characters, which it stores as a BMPString.
Note that since there is no check of your locale, this may produce UCS-2 / UTF-16 characters that do not correspond to the original pass phrase characters for other character sets, such as any ISO-8859-X encoding other than ISO-8859-1 (or for Windows, CP 1252 with exception for the extra "graphical" characters in the 0x80-0x9F range).
OpenSSL versions older than 1.1.0 do variant 2 only, and that is the reason why OpenSSL still does this, to be able to read files produced with older versions.
It should be noted that this approach isn't entirely fault free.
A pass phrase encoded in ISO-8859-2 could very well have a sequence such as 0xC3 0xAF (which is the two characters "LATIN CAPITAL LETTER A WITH BREVE" and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but would be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF (LATIN SMALL LETTER I WITH DIAERESIS) if the pass phrase doesn't contain anything that would be invalid UTF-8. A pass phrase that contains this kind of byte sequence will give a different outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.
0x00 0xC3 0x00 0xAF # OpenSSL older than 1.1.0 0x00 0xEF # OpenSSL 1.1.0 and newer
On the same accord, anything encoded in UTF-8 that was given to OpenSSL older than 1.1.0 was misinterpreted as ISO-8859-1 sequences.
Also note that the sub-sections below discuss human readable pass phrases. This is particularly relevant for PKCS#12 objects, where human readable pass phrases are assumed. For other objects, it's as legitimate to use any byte sequence (such as a sequence of bytes from `/dev/urandom` that's been saved away), which makes any character encoding discussion irrelevant; in such cases, simply use the same byte sequence as it is.
For opening pass phrase protected objects where the character encoding that was used is unknown, or where the producing application is unknown, try one of the following:
- Try the pass phrase that you have as it is in the character encoding of your environment. It's possible that its byte sequence is exactly right.
- Convert the pass phrase to UTF-8 and try with the result. Specifically with PKCS#12, this should open up any object that was created according to the specification.
- Do a naieve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and
try with the result. This differs from the previous attempt because
ISO-8859-1 maps directly to U+0000 to U+00FF, which other non-UTF-8
character sets do not.
This also takes care of the case when a UTF-8 encoded string was used with OpenSSL older than 1.1.0. (for example, "ie", which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3 0x83 0xC2 0xAF when re-encoded in the naieve manner. The conversion to BMPString would then yield 0x00 0xC3 0x00 0xA4 0x00 0x00, the erroneous/non-compliant encoding used by OpenSSL older than 1.1.0)
Licensed under the OpenSSL license (the "License"). You may not use this file except in compliance with the License. You can obtain a copy in the file LICENSE in the source distribution or at https://www.openssl.org/source/license.html.