Doc. no.: N4449
Date: 2015-04-09
Project: Programming Language C++, Library Evolution Working Group
Reply-to: Zhihao Yuan <zhihao.yuan at rackspace dot com>

Message Digest Library for C++

Motivation

Cryptographic hash functions hold irreplaceable roles in a large variety of applications, since security and data integrity are topics that cannot be dismissed to the applications involving data exchanging. Being compared to the ordinary hash functions being used with the unordered containers, the cryptographic functions are significantly different in purposes and design, which result in different properties on interface, implementation, and performance. In order to satisfy such demands in C++, these functions should be standardized as a part of the C++ standard library.

Scope

This paper focuses on exposing the core functionalities of the cryptographic hash functions in an easy-to-use and efficient way; some use cases that can be provided in a higher level abstraction are not covered, including but not limited to Base64 digest, fingerprinting user-defined types, and key derivation.

This paper does not discuss std::hash, because to co-work with the unordered containers is not a main purpose of cryptographic hash functions.

This paper has no direct relationship with N3980, but some functionalities described here can serve as the underlying implementation of a HashAlgorithm. More details can be found in Design Decisions.

Examples

namespace hashlib = std::hashlib;

hashlib::sha1 h;

assert(h.hexdigest() == "da39a3ee5e6b4b0d3255bfef95601890afd80709");

h.update("C-string");
h.update(std::string());

auto h2 = h;

assert(h2 == h);                    // hasher is equationally complete
assert(h2.digest() == h.digest());  // semantics same as above

h.update(buf, sizeof(buf));         // sized input

assert(h.digest() == h.digest());   // can get result multiple times

using rmd160 = hashlib::hasher<rmd160_provider>;  // user-defined

Design Decisions

Interface

The design is unashamedly stolen from Python’s hashlib module[1]. The details are as follows:

[](auto&&... ts) { return hashlib::sha1(ts...).digest(); }
[&](auto&&... ts) { h.update(ts...); }
log(h);                   // user can insert a log line without making
v.push_back(h.digest());  // the rest of the code UB
                          // and the log line does not need to be log(sha1(h))

Or, if one has a hash value of a file block with a known starting point in a file, and we want to know the size of the block, we can retrieve and compare the digest for each byte we consumed.

This goal is achieved by finalizing the hash on a copy of the underlying hash context. The state sizes of the cryptographic hash functions are constrained[3] (actual context sizes: 96 for SHA1, 208 or 216 for SHA512) – even hashing 1 byte is more costly than copying the context. Considering the aforementioned benefits, I claim that the “overhead” is worthy.

Extensibility

The rich interface introduced above is featured by the class template hashlib::hasher<HashProvider>. A user can add a new hasher by instantiating the template with a struct satisfying the HashProvider requirements by describing the implementation of the hash function in terms of a context_type and some core operations involving such a type.

The details are discussed in Technical Specifications.

Choice of algorithms

I propose to require the following message digest algorithms in the standard:

Standard library implementations need to make all the required algorithms available. If an implementation can achieve this by shipping a header and rely on the base system libraries, we can save an enormous effort to implement and to maintain those algorithms. The design of the message digest library in this proposal made this approach easier, and the choice of algorithms also took the algorithm availability on major platforms into consideration (the last 3 are for reference):

 GNU(*)FreeBSDAppleWindowsAndroidJava.NetPython
 libclibmdCommonCryptoCryptoAPIOpenSSL 
MD5
SHA1
SHA224
SHA256
SHA384
SHA512
RMD160

* These functions are private to crypt(3).

MD5 is included for supporting existing services; any algorithms weaker than that are excluded.

Based on the research done so far, I believe that the aforementioned 4 algorithms are the most important ones which should be required to support, and possible to support (without duplicated work) on the major platforms.

Technical Specifications

This section gives the layout of a wording.

Message digests

This subclause defines a class template hasher as a common interface to the cryptographic hash and message digest algorithms, and the typedefs sha1, sha256, sha512, and md5 for the unspecified specializations of hasher to implement, respectively, the FIPS secure hash algorithms SHA1, SHA256, and SHA512 [FIPS 180-2] as well as RSA’s MD5 algorithm [RFC 1321].

Through out this subclause, to specialize a template with a template type parameter named HashProvider, the corresponding template argument shall meet the HashProvider requirements.

HashProvider requirements

A class H meets the HashProvider requirements if the expressions shown in the Table below are valid and have the indicated semantics. In that Table and throughout this section:

a) C is a trivially copyable type to hold the state of the message digest algorithm;

b) c is an lvalue of C;

c) p is a value of type const char*;

d) n is a value of size_t;

e) md is a value of type unsigned char*.

Expression Return type Assertion/note
Pre/post-condition
H::context_type C C is CopyConstructible and Destructible.
H::digest_size size_t
H::block_size size_t
H::init(&c) post: c is ready to accept data input.
H::update(&c, p, n) Requires: p points to at least n contiguous bytes of input.
Effects: Hashes the n bytes whose the first byte is designated by p.
pre: c is ready to accept data input.
H::final(md, &c) Requires: md points to an object having contiguous space for digest_size bytes of output.
Effects: Places the message digest in md.
pre: c is ready to accept data input.
post: c is not ready to accept data input.

Header <hashlib> synopsis

namespace std::hashlib {  // N4026

  template <typename HashProvider>
  struct hasher
  {
    static constexpr auto digest_size = HashProvider::digest_size;
    static constexpr auto block_size  = HashProvider::block_size;

    typedef typename HashProvider::context_type      context_type;
    typedef std::array<unsigned char, digest_size>   digest_type;

    hasher() noexcept;

    explicit hasher(char const* s);
    explicit hasher(char const* s, size_t n);

    template <typename StringLike>
      explicit hasher(StringLike const& bytes) noexcept;

    void update(char const* s);
    void update(char const* s, size_t n);

    template <typename StringLike>
      void update(StringLike const& bytes) noexcept;

    digest_type digest() const noexcept;

    template <typename CharT = char,
              typename Traits = char_traits<CharT>,
              typename Allocator = allocator<CharT>>
      basic_string<CharT, Traits, Allocator>
        hexdigest() const;

  private:
    context_type ctx_;  // exposition only
  };

  template <typename HashProvider>
  bool operator==(hasher<HashProvider> const& a,
                  hasher<HashProvider> const& b) noexcept;

  template <typename HashProvider>
  bool operator!=(hasher<HashProvider> const& a,
                  hasher<HashProvider> const& b) noexcept;

  template <typename CharT, typename Traits,
            typename HashProvider>
    basic_ostream<CharT, Traits>&
      operator<<(basic_ostream<CharT, Traits>& os,
                 hasher<HashProvider> const& h);

  typedef hasher<unspecified>   md5;
  typedef hasher<unspecified>   sha1;
  typedef hasher<unspecified>   sha256;
  typedef hasher<unspecified>   sha512;

}

Class template hasher

hasher() noexcept;

Effects: Constructs an object of hasher by calling HashProvider::init(&ctx_).

explicit hasher(char const* s);

Effects: Equivalent to calling update(s) on a default initialized *this.

explicit hasher(char const* s, size_t n);

Effects: Equivalent to calling update(s, n) on a default initialized *this.

template <typename StringLike>
  explicit hasher(StringLike const& bytes) noexcept;

Effects: Equivalent to calling update(bytes) on a default initialized *this.

void update(char const* s);

Effects: Equivalent to update(s, strlen(s)).

void update(char const* s, size_t n);

Effects: Equivalent to HashProvider::update(&ctx_, s, n).

template <typename StringLike>
  void update(StringLike const& bytes) noexcept;

Effects: Equivalent to update(bytes.data(), bytes.size()).

digest_type digest() const noexcept;

Let md be a default constructed object of digest_type.

Effects: Equivalent to

    auto tmp_ctx = ctx_;
    HashProvider::final(md.data(), &tmp_ctx);

Returns: md.

template <typename CharT = char,
          typename Traits = char_traits<CharT>,
          typename Allocator = allocator<CharT>>
  basic_string<CharT, Traits, Allocator>
    hexdigest() const;

TBD

template <typename HashProvider>
bool operator==(hasher<HashProvider> const& a,
                hasher<HashProvider> const& b) noexcept;

Returns: a.digest() == b.digest().

template <typename HashProvider>
bool operator!=(hasher<HashProvider> const& a,
                hasher<HashProvider> const& b) noexcept;

Returns: !(a == b).

template <typename CharT, typename Traits,
          typename HashProvider>
  basic_ostream<CharT, Traits>&
    operator<<(basic_ostream<CharT, Traits>& os,
               hasher<HashProvider> const& h);

Effects: Equivalent to os << h.template hexdigest<CharT, Traits>().

Implementation

A prototype is available at https://github.com/lichray/cpp-deuceclient/blob/master/include/deuceclient/hashlib.h.

References

[1]hashlib – Secure hashes and message digests.” The Python Standard Library. https://docs.python.org/2/library/hashlib.html

[2] Hinnant, Howard E., Vinnie Falco, and John Bytheway. “Proposed Wording.” Types Don’t Know #. http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n3980.html#wording

[3] “SHA-1.” Wikipedia: The Free Encyclopedia. http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions