Formatted output

Published Proposal,

ISO/IEC JTC1/SC22/WG21 14882: Programming Language — C++

"Привет, κόσμος!"
1. Introduction

A new I/O-agnostic text formatting library was introduced in C++20 ([FORMAT]). This paper proposes integrating it with standard I/O facilities via a simple and intuitive API achieving the following goals:

9. Motivating examples

Consider a common task of printing formatted text to stdout:

C++20:     std::cout << std::format("Hello, {}!", name);
Proposed:  std::print("Hello, {}!", name);

The proposed std::print function improves usability, avoids allocating a temporary std::string object and calling operator<< which performs formatted I/O on text that is already formatted. The number of function calls is reduced to one which, together with std::vformat-like type erasure, results in much smaller binary code (see § 13 Binary code).

Existing alternatives in C++20:

Code, followed by comments:

std::cout << "Hello, " << name << "!";
  Requires even more formatted I/O function calls; message is interleaved with parameters; can result in interleaved output.

std::printf("Hello, %s!", name);
  Only works if name is a null-terminated character string.

auto msg = std::format("Hello, {}!", name);
std::fputs(msg.c_str(), stdout);
  Constructs a temporary string; requires a call to c_str() and a separate I/O function call, although potentially cheaper than operator<<.

Another problem is formatting of Unicode text:

std::cout << "Привет, κόσμος!";
If both the source and execution encodings are UTF-8, this will produce the expected output on most GNU/Linux and macOS systems. Unfortunately, on Windows it is almost guaranteed to produce mojibake despite the fact that the system is fully capable of printing Unicode, for example
╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!
even when compiled with /utf-8 using Visual C++ ([MSVC-UTF8]). This happens because the terminal assumes code page 437 in this case independently of the execution encoding.

With this proposal,

std::print("Привет, κόσμος!");
will print "Привет, κόσμος!" as expected allowing programmers to write Unicode text portably using standard facilities. This will bring C++ on par with other languages where such functionality has been available for a long time. For comparison this just works in Python 3.8 on Windows with the same active code page and console settings:
>>> print("Привет, κόσμος!")
Привет, κόσμος!

This problem is independent of formatting char8_t strings but the same solution applies there. Adding charN_t and wchar_t overloads will be explored in a separate paper in a more general context.

10. API and naming

Many programming languages provide functions for printing text to standard output, often combined with formatting:

Language     Function(s)
C            printf [N2176]
C#/.NET      Console.Write [DOTNET-WRITE]
COBOL        DISPLAY statement [N0147]
Fortran      print and write statements [N2162]
Go           Printf [GO-FMT]
Java         PrintStream.format, PrintStream.print, PrintStream.printf [JAVA-PRINT]
JavaScript   console.log [WHATWG-CONSOLE]
Perl         printf [PERL-PRINTF]
Python       print statement or function [PY-FUNC]
R            print [R-PRINT]
Ruby         print and printf [RUBY-PRINT]
Rust         print! [RUST-PRINT]
Swift        print [SWIFT-PRINT]

Variations of print[f] appear to be the most popular naming choice for this functionality. It is either provided as a free function (most common) or a member function (less common) together with a global object representing standard output stream. Notable exceptions are COBOL, Fortran, and Python 2 which have dedicated language statements and Rust where print! is a function-like macro.

We propose adding a free function called print with overloads for writing to the standard output (the default) and an explicitly passed output stream object. The default output stream can be either stdout or std::cout. We propose using stdout for the following reasons:

In some languages like Python print only provides the default formatting although some formatting control may be achieved by other means such as named arguments and/or doing formatting manually via str.format or string interpolation. The current paper doesn’t propose such a default formatting facility. A search in a large Python codebase revealed that over 70% of print calls either take a string literal or use interpolation and this doesn’t even account for string variables. These use cases are covered by the current proposal with better usability and potentially better performance because there are no separate formatting function calls.

Since stdout doesn’t have an associated locale we propose using the current global locale for locale-specific formatting which is consistent with format. With cout or another explicitly passed stream, the stream’s locale will be used. In all cases the default formatting is locale-independent.

Another option is to make print a member function of basic_ostream. This would make usage somewhat more awkward:

std::cout.print("Hello, {}!", name);
A free function can also be overloaded to take FILE* to simplify migration (possibly automated) of code from printf to the new facility.

There are multiple approaches to appending a trailing newline:

We propose not appending a newline automatically for consistency with printf and iostreams:

std::print("Hello, {}!", name);    // doesn’t print a newline
std::print("Hello, {}!\n", name);  // prints a newline

Additionally we can provide a function that appends a newline:

std::println("Hello, {}!", name);  // prints a newline

Although println doesn’t provide much usability improvement compared to print with explicit '\n', it has been an occasionally requested feature in the fmt library ([FMT]).

Another question is which header non-ostream overloads of formatted output functions should go to. Possible options:

Earlier versions of the paper proposed <io> analogous to <cstdio> so that the future I/O facilities that don’t depend on ostream could be added there. This was changed to a more narrow-focused <print> but <io> can be added in the future once a symmetric input facility becomes available. Using <ostream> is undesirable because this header and its transitive dependencies are very big (42 thousand lines preprocessed on libc++):

% echo '#include <ostream>' | clang++ -E -x c++ - | wc -l

It also pulls in a lot of unrelated symbols such as ostream insertion operators, global cout, cerr, clog variables and their wchar_t counterparts.

ostream overloads are added to the <ostream> header.

11. Unicode

We can prevent mojibake in the Unicode example by detecting if the string literal encoding is UTF-8 and dispatching to a different function that correctly handles Unicode, for example:

constexpr bool is_utf8() {
  // U+00B5 MICRO SIGN is encoded as the two bytes C2 B5 in UTF-8.
  const unsigned char micro[] = "\u00B5";
  return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
}

template <typename... Args>
void print(string_view fmt, const Args&... args) {
  if (is_utf8())
    vprint_unicode(fmt, make_format_args(args...));
  else
    vprint_nonunicode(fmt, make_format_args(args...));
}
where the vprint_unicode function formats and prints text in UTF-8 using the native system API that supports Unicode and vprint_nonunicode does the same for other encodings. The latter ensures that interoperability with code using legacy encodings is preserved even though print is a new API and it is not strictly necessary. If calling the system API requires transcoding we propose substituting invalid code units with U+FFFD � REPLACEMENT CHARACTER which is consistent with the treatment of malformed UTF-8 in UTF-8-native terminals. For example
#include <stdio.h>

int main() {
  puts("\xc3\x28"); // Invalid 2 Octet Sequence
}
prints "�(" in iTerm2 and "?(" in macOS Terminal. So whether or not transcoding is done in the UTF-8 case, the observed behavior will normally be similar.
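The substitution described above can be sketched as a small UTF-8 to UTF-16 transcoder. This is only an illustration under stated assumptions: the function name utf8_to_utf16_lossy is hypothetical and not part of the proposal, and for brevity it emits one U+FFFD per rejected byte, which may yield more replacement characters for some truncated sequences than the practice recommended by the Unicode Standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not proposed API): transcodes UTF-8 to UTF-16 and
// emits U+FFFD for every byte that does not start a well-formed sequence.
std::u16string utf8_to_utf16_lossy(std::string_view in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    std::size_t len = 0;  // expected sequence length, 0 if lead is invalid
    char32_t cp = 0;      // decoded code point
    char32_t min = 0;     // smallest code point allowed for this length
    if (lead < 0x80)                { len = 1; cp = lead;        min = 0; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }

    bool ok = len != 0 && i + len <= in.size();
    for (std::size_t k = 1; ok && k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
      else cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
      ok = false;

    if (!ok) {
      out.push_back(static_cast<char16_t>(0xFFFD));
      ++i;  // resynchronize on the next byte
      continue;
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // encode as a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  assert(out.size() <= in.size());  // UTF-16 never needs more units than UTF-8 bytes
  return out;
}
```

For the invalid two-octet sequence from the example, the sketch produces U+FFFD followed by "(", which matches the iTerm2 output.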

In Visual C++ is_utf8 will return true if the literal (execution) encoding is UTF-8, which is enabled by the /execution-charset:utf-8 compiler flag or other means, and false otherwise. Literal encoding detection can be implemented in a more elegant way using [P1885].

Note that ANSI escape codes for specifying coding systems ([ISO2022]) are not considered a native system API that supports Unicode for the purposes of this proposal.

We propose using the literal encoding for the following reasons:

  1. Consistency with the design of std::format which is locale-independent by default ([P0645]) and disallows implicitly mixing encodings e.g. passing a narrow string into a wide std::format is ill-formed.

  2. Consistency with the encoding used for width estimation ([P1868]). The standard wording doesn’t mention the literal encoding explicitly but the fact that the format strings are either literals or other compile-time strings ([P2216]) makes it the only conformant option.

  3. Safety: the result of formatted_size does not depend on the global locale by default and a buffer allocated with this size can be passed safely to format_to even if the locale has been changed in the meantime, possibly from another thread.

  4. Implementation and usage experience.

  5. In the vast majority of cases format strings are literals. For example, analyzing a sample of 100 printf calls from [CODESEARCH] showed that 98 of them are string literals and 2 are string literals wrapped in the _ gettext macro.

  6. The active code page and the terminal encoding are unrelated on popular Windows localizations such as Russian, where the former is CP1251 while the latter is CP866. Instead of assuming one encoding regardless of the string origin, which would often result in mojibake, an explicit encoding indication can be given via the standard extension API, e.g. (exposition only)

    print("Привет, {}!", locale_enc(string_in_locale_encoding));
    This is already possible to implement by providing appropriate std::formatter specializations.

This approach has been implemented in the fmt library ([FMT]), successfully tested and used on a variety of platforms.

Users can sometimes restrict the set of used characters to the common subset among multiple encodings (often ASCII) in which case encoding becomes mostly irrelevant. Such "polyglot" strings are fully supported for legacy encodings and partially supported for UTF-8 by the current proposal even though mixing encodings in such a way is a clearly bad practice from a general software engineering point of view.
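As an illustration of the "polyglot" case, a program can check that a string stays within ASCII before treating it as encoding-neutral. The helper name is_ascii is hypothetical, and the sketch assumes the encodings involved are ASCII-compatible (which, for example, EBCDIC is not):

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper (not proposed API): returns true if every code unit
// is in the 7-bit ASCII range, so the bytes mean the same thing in UTF-8
// and in any ASCII-compatible legacy encoding.
bool is_ascii(std::string_view s) {
  for (unsigned char c : s) {
    if (c >= 0x80) return false;
  }
  return true;
}
```

A string such as "Hello, world!" passes the check, while the UTF-8 encoding of "Привет" does not because every Cyrillic letter is encoded with bytes above 0x7F.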

Here’s an example output on Windows:

At the same time interoperability with legacy code is preserved when literal encoding is not UTF-8. In particular, in case of EBCDIC, Shift JIS or a non-Unicode Windows code page, print will perform no transcoding and the text will be printed as is.

The following table summarizes the behavior of formatted output facilities in different programming languages:

            Linux                macOS                Windows
Language    Terminal  Redirect   Terminal  Redirect   Terminal  Redirect
C           Correct   UTF-8      Correct   UTF-8      Wrong     UTF-8
Go          Correct   UTF-8      Correct   UTF-8      Correct   UTF-8
Java        Correct   UTF-8*     Correct   UTF-8*     Wrong     CP1251 (lossy)
JavaScript  Correct   UTF-8*     Correct   UTF-8*     Correct   UTF-8*
Python      Correct   UTF-8*     Correct   UTF-8*     Correct   Error
Rust        Correct   UTF-8      Correct   UTF-8      Correct   UTF-8

* - the output is transcoded from a different UTF representation.

Correct means that the test message "Привет, κόσμος!" was fully readable in the terminal output. None of the tested language facilities were able to produce readable output when piped through the standard findstr command on Windows. Java gave the worst results producing both mojibake and replacement characters in this case: "╧ЁштхЄ, ??????!". Most other languages produced valid UTF-8 when the output of findstr was redirected to a file.

The current paper proposes following C, Go, JavaScript and Rust and preserving the original encoding (modulo UTF conversion). The only difference compared to printf is that we fix the console output on Windows. Java’s approach is problematic for the following reasons:

The full listings of test programs are given in Appendix A: Unicode tests.

12. Performance

All the performance benefits of std::format ([FORMAT]) automatically carry over to this proposal. In particular, locale-independence by default reduces global state and makes formatting more efficient compared to stdio and iostreams. There are fewer function calls (see § 13 Binary code) and no shared formatting state compared to iostreams.

The following benchmark compares the reference implementation of print with printf and ostream. This benchmark formats a simple message and prints it to the output stream redirected to /dev/null. It uses the Google Benchmark library [GOOGLE-BENCH] to measure timings:

#include <cstdio>
#include <iostream>

#include <benchmark/benchmark.h>
#include <fmt/ostream.h>

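void printf(benchmark::State& s) {
  while (s.KeepRunning())
    std::printf("The answer is %d.\n", 42);
}
BENCHMARK(printf);

void ostream(benchmark::State& s) {
  std::ios::sync_with_stdio(false);  // synchronization with C streams off
  while (s.KeepRunning())
    std::cout << "The answer is " << 42 << ".\n";
}
BENCHMARK(ostream);

void print(benchmark::State& s) {
  while (s.KeepRunning())
    fmt::print("The answer is {}.\n", 42);
}
BENCHMARK(print);

void print_cout(benchmark::State& s) {
  std::ios::sync_with_stdio(false);  // synchronization with C streams off
  while (s.KeepRunning())
    fmt::print(std::cout, "The answer is {}.\n", 42);
}
BENCHMARK(print_cout);

void print_cout_sync(benchmark::State& s) {
  std::ios::sync_with_stdio(true);  // synchronization with C streams on
  while (s.KeepRunning())
    fmt::print(std::cout, "The answer is {}.\n", 42);
}
BENCHMARK(print_cout_sync);

BENCHMARK_MAIN();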


The benchmark was compiled with Apple clang version 11.0.0 (clang-1100.0.33.17) with -O3 -DNDEBUG and run on macOS 10.15.4. Below are the results:

Run on (8 X 2800 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
Load Average: 1.83, 1.88, 1.82
Benchmark                Time             CPU   Iterations
printf                87.0 ns         86.9 ns      7834009
ostream                255 ns          255 ns      2746434
print                 78.4 ns         78.3 ns      9095989
print_cout            89.4 ns         89.4 ns      7702973
print_cout_sync       91.5 ns         91.4 ns      7903889

Both print and printf are ~3 times faster than cout even with synchronization to the standard C streams turned off. print is 14% faster when printing to stdout than to cout. For this reason and because print doesn’t use formatting facilities of ostream we propose using stdout as the default output stream and providing an overload for writing to ostream.

On Windows 10 with Visual C++ 2019 the results are similar although the difference between print writing to stdout and cout is smaller with stdout being 7% faster:

Run on (1 X 2808 MHz CPU )
CPU Caches:
  L1 Data 32K (x1)
  L1 Instruction 32K (x1)
  L2 Unified 262K (x1)
  L3 Unified 8388K (x1)
Benchmark                Time             CPU   Iterations
printf                 835 ns          816 ns       746667
ostream               2410 ns         2400 ns       280000
print                  580 ns          572 ns      1120000
print_cout             623 ns          614 ns      1120000
print_cout_sync        615 ns          614 ns      1120000

13. Binary code

We propose minimizing per-call binary code size by applying the type erasure mechanism from [P0645]. In this approach all the formatting and printing logic is implemented in a non-variadic function vprint. The inline variadic print function only constructs a format_args object, representing an array of type-erased argument references, and passes it to vprint*. Here is a simplified example:

void vprint(string_view fmt, format_args args);

template<class... Args>
  inline void print(string_view fmt, const Args&... args) {
    return vprint(fmt, make_format_args(args...));
  }

We provide vprint* overloads so that users can apply the same technique to their own code. For example:

void vlog(log_level level, string_view fmt, format_args args) {
  // Print the log level and use vprint* overloads to format and print the
  // message.
}

template<class... Args>
  inline void log(log_level level, string_view fmt, const Args&... args) {
    return vlog(level, fmt, make_format_args(args...));
  }

Here vlog that implements the logging logic is not parameterized on formatting argument types resulting in less code bloat compared to a naive templated version. As a real-world example, this technique has been applied in the Folly Logger ([FOLLY]) bringing ~5x binary size reduction per logging function call.
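To make the mechanism more concrete, below is a minimal sketch of type erasure that does not depend on <format>. All names in it (erased_arg, erase, vformat_sketch, format_sketch) are hypothetical stand-ins for format_args, make_format_args, vformat and format; the sketch only handles "{}" placeholders and is not how a standard library implements these facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for one element of format_args: a pointer to the
// argument plus a function that knows how to append it to a string.
struct erased_arg {
  const void* value = nullptr;
  void (*append)(std::string&, const void*) = nullptr;
};

// Captures the static type of v in a function pointer; only this small
// function is instantiated per argument type.
template <class T>
erased_arg erase(const T& v) {
  return {&v, [](std::string& out, const void* p) {
            const T& x = *static_cast<const T*>(p);
            if constexpr (std::is_arithmetic_v<T>)
              out += std::to_string(x);   // numbers
            else
              out += std::string_view(x); // string-like arguments
          }};
}

// Non-template: the formatting logic is compiled once, whatever the
// argument types are. Replaces each "{}" with the next argument.
std::string vformat_sketch(std::string_view fmt, const erased_arg* args,
                           std::size_t count) {
  assert(args != nullptr || count == 0);
  std::string out;
  std::size_t next = 0;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    const bool placeholder =
        fmt[i] == '{' && i + 1 < fmt.size() && fmt[i + 1] == '}';
    if (placeholder && next < count) {
      args[next].append(out, args[next].value);
      ++next;
      ++i;  // skip the closing brace
    } else {
      out += fmt[i];  // literal text (or a placeholder with no argument left)
    }
  }
  return out;
}

// Thin inline wrapper: only builds the array of erased arguments.
template <class... Args>
inline std::string format_sketch(std::string_view fmt, const Args&... args) {
  const erased_arg erased[sizeof...(Args) + 1] = {erase(args)...};
  return vformat_sketch(fmt, erased, sizeof...(Args));
}
```

With these definitions, format_sketch("Hello, {}!", "world") yields "Hello, world!" while the per-call-site code is limited to filling the erased array and making a single call, which is the property the proposal relies on.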

Below we compare the reference implementation of print to standard formatting facilities. All the code snippets are compiled with clang (Apple clang version 11.0.0 clang-1100.0.33.17) with -O3 -DNDEBUG -c -std=c++17 and the resulting binaries are disassembled with objdump -S:

void printf_test(const char* name) {
  printf("Hello, %s!", name);
       0:       55      pushq   %rbp
       1:       48 89 e5        movq    %rsp, %rbp
       4:       48 89 fe        movq    %rdi, %rsi
       7:       48 8d 3d 08 00 00 00    leaq    8(%rip), %rdi
       e:       31 c0   xorl    %eax, %eax
      10:       5d      popq    %rbp
      11:       e9 00 00 00 00  jmp     0 <__Z11printf_testPKc+0x16>
void ostream_test(const char* name) {
  std::cout << "Hello, " << name << "!";
       0:       55      pushq   %rbp
       1:       48 89 e5        movq    %rsp, %rbp
       4:       41 56   pushq   %r14
       6:       53      pushq   %rbx
       7:       48 89 fb        movq    %rdi, %rbx
       a:       48 8b 3d 00 00 00 00    movq    (%rip), %rdi
      11:       48 8d 35 6c 03 00 00    leaq    876(%rip), %rsi
      18:       ba 07 00 00 00  movl    $7, %edx
      1d:       e8 00 00 00 00  callq   0 <__Z12ostream_testPKc+0x22>
      22:       49 89 c6        movq    %rax, %r14
      25:       48 89 df        movq    %rbx, %rdi
      28:       e8 00 00 00 00  callq   0 <__Z12ostream_testPKc+0x2d>
      2d:       4c 89 f7        movq    %r14, %rdi
      30:       48 89 de        movq    %rbx, %rsi
      33:       48 89 c2        movq    %rax, %rdx
      36:       e8 00 00 00 00  callq   0 <__Z12ostream_testPKc+0x3b>
      3b:       48 8d 35 4a 03 00 00    leaq    842(%rip), %rsi
      42:       ba 01 00 00 00  movl    $1, %edx
      47:       48 89 c7        movq    %rax, %rdi
      4a:       5b      popq    %rbx
      4b:       41 5e   popq    %r14
      4d:       5d      popq    %rbp
      4e:       e9 00 00 00 00  jmp     0 <__Z12ostream_testPKc+0x53>
      53:       66 2e 0f 1f 84 00 00 00 00 00   nopw    %cs:(%rax,%rax)
      5d:       0f 1f 00        nopl    (%rax)
void print_test(const char* name) {
  print("Hello, {}!", name);
       0:	55 	pushq	%rbp
       1:	48 89 e5 	movq	%rsp, %rbp
       4:	48 83 ec 10 	subq	$16, %rsp
       8:	48 89 7d f0 	movq	%rdi, -16(%rbp)
       c:	48 8d 3d 19 00 00 00 	leaq	25(%rip), %rdi
      13:	48 8d 4d f0 	leaq	-16(%rbp), %rcx
      17:	be 0a 00 00 00 	movl	$10, %esi
      1c:	ba 0d 00 00 00 	movl	$13, %edx
      21:	e8 00 00 00 00 	callq	0 <__Z10print_testPKc+0x26>
      26:	48 83 c4 10 	addq	$16, %rsp
      2a:	5d 	popq	%rbp
      2b:	c3 	retq

The code generated for the print_test function that uses the reference implementation of print described in this proposal is more than 2x smaller than the ostream code and has one function call instead of three. The printf code is further 2x smaller but doesn’t have any error handling. Adding error handling would make its code size closer to that of print.

The following factors contribute to the difference in binary code size between print and printf:

14. Impact on existing code

The current proposal adds new functions to the headers <print> and <ostream> and should have no impact on existing code.

15. Implementation

The proposed print function has been implemented in the open-source fmt library [FMT] and has been in use for about 6 years.

Rust’s standard output facility uses essentially the same approach for preventing mojibake when printing to console on Windows ([RUST-STDIO]). The main difference is that invalid code units are reported as errors in Rust.

LLVM’s raw_ostream [LLVM-OSTREAM] also implements this approach when writing to console on Windows. The main difference is that in case of invalid UTF-8 it falls back on writing raw (not transcoded) data.
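All of these implementations first need to know whether a stream refers to a terminal. A minimal sketch of that check, using the calls named in the wording note (isatty with fileno on POSIX, GetConsoleMode with _get_osfhandle and _fileno on Windows), could look as follows. The function name is_terminal is hypothetical and not part of the proposal.

```cpp
#include <cassert>
#include <stdio.h>

#ifdef _WIN32
#  include <io.h>
#  include <windows.h>
#else
#  include <unistd.h>
#endif

// Hypothetical helper (not proposed API): reports whether stream refers to
// a terminal, so that a Unicode-aware console API could be used for it.
bool is_terminal(FILE* stream) {
  assert(stream != nullptr);
#ifdef _WIN32
  HANDLE handle = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
  DWORD mode = 0;
  // GetConsoleMode fails (returns zero) for files and pipes.
  return GetConsoleMode(handle, &mode) != 0;
#else
  return isatty(fileno(stream)) != 0;
#endif
}
```

When the check fails, for instance because the output is redirected to a file or a pipe, the text would be written to the stream unchanged.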

16. Wording

Add an entry for __cpp_lib_print to section "Header <version> synopsis [version.syn]", in a place that respects the table’s current alphabetic order:

#define __cpp_lib_print **placeholder**  // also in <print> and <ostream>

Add the header <print> to the "C++ library headers" table in [headers], in a place that respects the table’s current alphabetic order.

Add after subsection "Header synopsis [iomanip.syn]":

?.?.? Header <print> synopsis [print.syn]
namespace std {
  template<class... Args>
    void print(format-string<Args...> fmt, Args&&... args);
  template<class... Args>
    void print(FILE* stream, format-string<Args...> fmt, Args&&... args);

  template<class... Args>
    void println(format-string<Args...> fmt, Args&&... args);
  template<class... Args>
    void println(FILE* stream, format-string<Args...> fmt, Args&&... args);

  void vprint_unicode(string_view fmt, format_args args);
  void vprint_unicode(FILE* stream, string_view fmt, format_args args);

  void vprint_nonunicode(string_view fmt, format_args args);
  void vprint_nonunicode(FILE* stream, string_view fmt, format_args args);
}

Modify subsection "Header <ostream> synopsis [ostream.syn]":

template<class charT, class traits, class T>
basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&& os, const T& x);

template<class... Args>
  void print(ostream& os, format-string<Args...> fmt, Args&&... args);
template<class... Args>
  void println(ostream& os, format-string<Args...> fmt, Args&&... args);

void vprint_unicode(ostream& os, string_view fmt, format_args args);
void vprint_nonunicode(ostream& os, string_view fmt, format_args args);

Add a new subsection to "Formatting and manipulators [iostream.format]":

?.?.? Print functions [print.fun]
template<class... Args>
  void print(format-string<Args...> fmt, Args&&... args);

1 Effects: Equivalent to:

  print(stdout, fmt, std::forward<Args>(args)...);
template<class... Args>
  void print(FILE* stream, format-string<Args...> fmt, Args&&... args);

2 Effects: If the ordinary literal encoding ([lex.charset]) is UTF-8, equivalent to:

  vprint_unicode(stream, fmt.str, make_format_args(std::forward<Args>(args)...));
Otherwise, equivalent to:
  vprint_nonunicode(stream, fmt.str, make_format_args(std::forward<Args>(args)...));
template<class... Args>
  void println(format-string<Args...> fmt, Args&&... args);

3 Effects: Equivalent to:

  println(stdout, fmt, std::forward<Args>(args)...);
template<class... Args>
  void println(FILE* stream, format-string<Args...> fmt, Args&&... args);

4 Effects: Equivalent to:

  print(stream, "{}\n", format(fmt, std::forward<Args>(args)...));
void vprint_unicode(string_view fmt, format_args args);

5 Effects: Equivalent to:

  vprint_unicode(stdout, fmt, args);
void vprint_unicode(FILE* stream, string_view fmt, format_args args);
6 Preconditions: stream is a valid pointer to an output C stream.

7 Effects: The function initializes an automatic variable via
string out = vformat(fmt, args);
If stream refers to a terminal capable of displaying Unicode, writes out to the terminal using the native Unicode API; if out contains invalid code units, the behavior is undefined and implementations are encouraged to diagnose it. Otherwise writes out to stream unchanged.

[ Note: On POSIX and Windows, stream referring to a terminal means that, respectively, isatty(fileno(stream)) and GetConsoleMode(_get_osfhandle(_fileno(stream)), ...) return nonzero. — end note ]

[ Note: On Windows, the native Unicode API is WriteConsoleW. — end note ]

8 Throws: Any exception thrown by the call to vformat ([format.err.report]). system_error if writing to the terminal or stream fails. May throw bad_alloc.

9 Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per The Unicode Standard Version 14.0 – Core Specification, Chapter 3.9.

void vprint_nonunicode(string_view fmt, format_args args);

10 Effects: Equivalent to:

  vprint_nonunicode(stdout, fmt, args);
void vprint_nonunicode(FILE* stream, string_view fmt, format_args args);
11 Preconditions: stream is a valid pointer to an output C stream.

12 Effects: Writes the result of vformat(fmt, args) to stream.

13 Throws: Any exception thrown by the call to vformat ([format.err.report]). system_error if writing to stream fails. May throw bad_alloc.

Add subsection "Print [ostream.formatted.print]" to "Formatted output functions [ostream.formatted]":

template<class... Args>
  void print(ostream& os, format-string<Args...> fmt, Args&&... args);

1 Effects: If the ordinary literal encoding ([lex.charset]) is UTF-8, equivalent to:

  vprint_unicode(os, fmt.str, make_format_args(std::forward<Args>(args)...));
Otherwise, equivalent to:
  vprint_nonunicode(os, fmt.str, make_format_args(std::forward<Args>(args)...));
template<class... Args>
  void println(ostream& os, format-string<Args...> fmt, Args&&... args);

2 Effects: Equivalent to:

  print(os, "{}\n", format(fmt, std::forward<Args>(args)...));
void vprint_unicode(ostream& os, string_view fmt, format_args args);
void vprint_nonunicode(ostream& os, string_view fmt, format_args args);
3 Effects: Behaves as a formatted output function ([ostream.formatted.reqmts]) of os, except that: After constructing a sentry object, the function initializes an automatic variable via
string out = vformat(os.getloc(), fmt, args);
If the function is vprint_unicode and os is a stream that refers to a terminal capable of displaying Unicode which is determined in an implementation-defined manner, writes out to the terminal using the native Unicode API; if out contains invalid code units, the behavior is undefined and implementations are encouraged to diagnose it.

Otherwise (if os is not such a stream or the function is vprint_nonunicode), inserts the character sequence [out.begin(), out.end()) into os.

If writing to the terminal or inserting into os fails, calls os.setstate(ios_base::badbit) (which may throw ios_base::failure).

4 Recommended practice: For vprint_unicode, if invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER per The Unicode Standard Version 14.0 – Core Specification, Chapter 3.9.

Add to Bibliography:

– The Unicode® Standard Version 14.0 – Core Specification

Appendix A: Unicode tests

This appendix gives full listings of programs for testing Unicode handling in various formatting facilities as well as test commands and their output on different platforms. The code contains additional sanity checks to ensure that the strings are encoded in some form of UTF as opposed to a legacy encoding.
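The sanity checks in the listings below rely on the first character of the test message, "П" (U+041F), being encoded as the bytes 0xD0 0x9F in UTF-8. The following sketch shows where these values come from. The function name encode_utf8 is hypothetical and the code is not part of the test programs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode scalar value as UTF-8.
std::string encode_utf8(char32_t cp) {
  // Surrogates and values above U+10FFFF are not scalar values.
  assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  std::string out;
  if (cp < 0x80) {            // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

For U+041F the two-byte branch gives 0xC0 | 0x10 = 0xD0 and 0x80 | 0x1F = 0x9F, which are the byte values tested by the C and Go programs, while the Java, JavaScript, Python and Rust programs compare against the code point 0x41F directly.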

C (test.c):

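#include <stdio.h>
#include <stdlib.h>

int main() {
  const char* message = "Привет, κόσμος!\n";
  // Sanity check: the first character must be U+041F encoded as UTF-8.
  if ((unsigned char)message[0] != 0xD0 || (unsigned char)message[1] != 0x9F)
    abort();
  printf("%s", message);
}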

Go (test.go):

package main

import "fmt"
import "log"

func main() {
  var message = "Привет, κόσμος!"
  // Sanity check: the first character must be U+041F encoded as UTF-8.
  if message[0] != 0xD0 || message[1] != 0x9F {
    log.Fatal("wrong encoding")
  }
  fmt.Println(message)
}

Java (Test.java):

class Test {
  public static void main(String[] args) {
    String message = "Привет, κόσμος!\n";
    if (message.charAt(0) != 0x41F) throw new RuntimeException();
    System.out.print(message);
  }
}

JavaScript / Node.js (test.js):

message = "Привет, κόσμος!";
if (message.charCodeAt(0) != 0x41F) throw "wrong encoding";
console.log(message);

Python (test.py):

message = "Привет, κόσμος!"
if ord(message[0]) != 0x41F:
    raise Exception()
print(message)

Rust (test.rs):

fn main() {
  if "Привет, κόσμος!".chars().nth(0).unwrap() as u32 != 0x41F {
    panic!("wrong encoding");
  }
  println!("Привет, κόσμος!");
}


$ cc test.c -o c-test
$ ./c-test
Привет, κόσμος!
$ ./c-test > out-c-linux.txt

$ go build -o go-test test.go
$ ./go-test
Привет, κόσμος!
$ ./go-test > out-go-linux.txt

$ java Test
Привет, κόσμος!
$ java Test > out-java-linux.txt

$ node test.js
Привет, κόσμος!
$ node test.js > out-js-linux.txt

$ python3 test.py
Привет, κόσμος!
$ python3 test.py > out-py-linux.txt

$ rustc test.rs -o rust-test
$ ./rust-test
Привет, κόσμος!
$ ./rust-test > out-rust-linux.txt

All output files are in UTF-8:

Linux configuration:


% cc test.c -o c-test
% ./c-test
Привет, κόσμος!
% ./c-test > out-c-macos.txt

% go build -o test-go test.go
% ./test-go
Привет, κόσμος!
% ./test-go > out-go-macos.txt

% java Test
Привет, κόσμος!
% java Test > out-java-macos.txt

% node test.js
Привет, κόσμος!
% node test.js > out-js-macos.txt

% python3 test.py
Привет, κόσμος!
% python3 test.py > out-py-macos.txt

% rustc test.rs -o rust-test
% ./rust-test
Привет, κόσμος!
% ./rust-test > out-rust-macos.txt

All output files are in UTF-8:

macOS configuration:


>cl /Fe:c-test.exe test.c
>c-test
╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!
>c-test > out-c-windows.txt
>c-test | findstr ,
╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!

>go build -o go-test.exe test.go
>go-test
Привет, κόσμος!
>go-test > out-go-windows.txt
>go-test | findstr ,
╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!

>java Test
Привет, ??????!
>java Test > out-java-windows.txt
>java Test | findstr ,
╧ЁштхЄ, ??????!

>node test.js
Привет, κόσμος!
>node test.js > out-js-windows.txt
>node test.js | findstr ,
╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!

>python test.py
Привет, κόσμος!
>python test.py > out-py-windows.txt
Traceback (most recent call last):
  File "...\test.py", line 4, in <module>
  File "...\Python39\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 8-13: character maps to <undefined>
>python test.py | findstr ,
Traceback (most recent call last):
  File "...\test.py", line 4, in <module>
  File "...\Python39\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 8-13: character maps to <undefined>

>rustc test.rs -o rust-test.exe
>rust-test
Привет, κόσμος!
>rust-test > out-rust-windows.txt
>rust-test | findstr ,
╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!

C, JavaScript (node.js), Rust and Go produced valid UTF-8 when the output was redirected to files. Java produced a file in the legacy CP1251 encoding with ? for non-representable code points. Python failed on transcoding to CP1251. Output files:

Windows configuration:

17. Acknowledgements

Thanks to Corentin Jabot for his work on text encodings in C++ and in particular [P1885] that will simplify implementation of the current proposal.

Thanks to Roger Orr, Peter Brett, Hubert Tong, the BSI C++ panel and Tom Honermann for their feedback, support, constructive criticism and contributions to the proposal.

Thanks to Tim Song for substantial improvements to the wording.


