{"id":32848,"date":"2023-09-26T16:00:25","date_gmt":"2023-09-26T16:00:25","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=32848"},"modified":"2023-09-25T13:27:23","modified_gmt":"2023-09-25T13:27:23","slug":"c11-threads-in-visual-studio-2022-version-17-8-preview-2","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/c11-threads-in-visual-studio-2022-version-17-8-preview-2\/","title":{"rendered":"C11 Threads in Visual Studio 2022 version 17.8 Preview 2"},"content":{"rendered":"<p>Back in <a href=\"https:\/\/devblogs.microsoft.com\/cppblog\/c11-atomics-in-visual-studio-2022-version-17-5-preview-2\/\">Visual Studio 2022 version 17.5<\/a> Microsoft Visual C gained preliminary support for C11 atomics. We are happy to announce that support for the other major concurrency feature of C11, threads, is available in Visual Studio version 17.8 Preview 2. This should make it easier to port cross-platform C applications to Windows, without having to drag along a threading compatibility layer.<\/p>\n<p>Unlike C11 atomics, C11 threads do not share an ABI with C++\u2019s <code>&lt;thread&gt;<\/code> facilities, but C++ programs can include the C11 threads header and call the functions just like any C program. Both are implemented in terms of the primitives provided by Windows, so their usage can be mixed in the same program and on the same thread. The implementations are distinct, however; for example, you can\u2019t use C11 mutexes with C++ condition variables.<\/p>\n<p>C11 contains support for threads and a variety of related concurrency primitives including mutexes, condition variables, and thread specific storage. 
All of these are implemented in Visual Studio version 17.8 Preview 2.<\/p>\n<h2><a id=\"post-32848-threads\"><\/a>Threads<\/h2>\n<p>Threads are created with <code>thrd_create<\/code>, to which you pass a pointer to the desired entry point and a user data pointer (which may be null), along with a pointer to a <code>thrd_t<\/code> structure to fill in. Once you have a <code>thrd_t<\/code> created with <code>thrd_create<\/code> you can call functions to compare it to another <code>thrd_t<\/code>, join it, or detach it. Functions are also provided to sleep or yield the current thread.<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">#include &lt;threads.h&gt;\r\n#include &lt;stddef.h&gt; \/\/ for NULL\r\n\r\nint thread_entry(void* data) {\r\n    return 0;\r\n}\r\n\r\nint main(void) {\r\n    thrd_t thread;\r\n    int result = thrd_create(&amp;thread, thread_entry, NULL);\r\n    if(result != thrd_success) {\r\n        \/\/ handle error\r\n    }\r\n    result = thrd_join(thread, NULL);\r\n    if(result != thrd_success) {\r\n        \/\/ handle error\r\n    }\r\n    return 0;\r\n}<\/code><\/pre>\n<p>&nbsp;<\/p>\n<p>A key difference between our implementation and C11 threads implementations based on pthreads is that threads cannot detach <em>themselves<\/em> using <code>thrd_current()<\/code> and <code>thrd_detach()<\/code>. This is because of a fundamental difference in how threads work on Windows versus Unix descendants: we would need a shared data structure that tracks thread handles to implement the typical behavior.<\/p>\n<p>On Unix derivatives the integer thread ID <em>is<\/em> the handle to the thread and detaching just sets a flag causing the thread to be cleaned up immediately when it finishes. This makes detached threads somewhat dangerous to use on Unix derivatives, since after a detached thread exits any other references to that thread ID will be dangling and could later refer to a different thread altogether. On Windows the handle to a thread is a Win32 <code>HANDLE<\/code> and is reference counted. 
The thread is cleaned up when the last handle is closed. There is no way to close all handles to a thread except by keeping track of them and closing each one.<\/p>\n<p>We could implement the Unix\/pthreads behavior by keeping a shared mapping of thread-id to handle, populated by <code>thrd_create<\/code>. If you need this functionality then you can implement something like this yourself, but we don\u2019t provide it by default because it would incur a cost even if it\u2019s not used. Better workarounds may also be available, such as passing a pointer to the <code>thrd_t<\/code> populated by <code>thrd_create<\/code> via the user data pointer to the created thread.<\/p>\n<h2><a id=\"post-32848-mutexes\"><\/a>Mutexes<\/h2>\n<p>Mutexes are provided through the <code>mtx_t<\/code> structure and associated functions. Mutexes can be plain, recursive, timed, or a combination of these properties. All kinds of mutexes are manipulated with the same functions (the type is dynamic).<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">#include &lt;threads.h&gt; \r\n \r\nstatic mtx_t mtx; \/\/ see below \r\n \r\nint main(void) { \r\n    if(mtx_lock(&amp;mtx) != thrd_success) { \r\n        return 1; \r\n    } \r\n    \/\/ do some stuff protected by the mutex \r\n \r\n    \/\/ no need to check the result of a valid unlock call \r\n    mtx_unlock(&amp;mtx); \r\n \r\n    \/\/ no need to call mtx_destroy \r\n} <\/code><\/pre>\n<p>&nbsp;<\/p>\n<p>Our mutexes are always implemented on top of <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/sync\/slim-reader-writer--srw--locks\">Slim Reader Writer Locks<\/a> and are 32 bytes each on x64 (our C++ <code>std::mutex<\/code> is 80 bytes). They consist of an 8-byte tag (this is much more than needed, but provides some room for future expansion), an <code>SRWLock<\/code>, a Win32 <code>CONDITION_VARIABLE<\/code>, and a 32-bit owner and lock count. 
The owner and lock count are always maintained, even when the mutex is not recursive. If you attempt to recursively lock a non-recursive mutex, or unlock a mutex you do not own, then <code>abort()<\/code> is called. Structurally valid calls to <code>mtx_unlock<\/code> always succeed, and it is safe to ignore the return value of <code>mtx_unlock<\/code> in our implementation.<\/p>\n<p>In our implementation you need not call <code>mtx_init<\/code>; a zeroed <code>mtx_t<\/code> is a valid plain mutex. Mutexes also don\u2019t require any cleanup and calls to <code>mtx_destroy<\/code> are optional. This means you can safely use mutexes as static variables and similar.<\/p>\n<h2><a id=\"post-32848-condition-variables\"><\/a>Condition Variables<\/h2>\n<p>Condition variables are provided through the <code>cnd_t<\/code> structure and associated functions. This structure is 8 bytes and stores just a Win32 <code>CONDITION_VARIABLE<\/code>. You can wait on the condition variable with <code>cnd_wait<\/code> or <code>cnd_timedwait<\/code>, and you can wake one waiting thread with <code>cnd_signal<\/code> or all waiting threads with <code>cnd_broadcast<\/code>. 
Spurious wakeups are allowed.<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">#include &lt;threads.h&gt; \r\n \r\nstatic mtx_t mtx; \r\nstatic cnd_t cnd; \r\nstatic int condition; \r\n \r\nint main(void) { \r\n    if(mtx_lock(&amp;mtx) != thrd_success) { \r\n        return 1; \r\n    } \r\n    while(condition == 0) { \r\n        if(cnd_wait(&amp;cnd, &amp;mtx) != thrd_success) { \r\n            return 1; \r\n        } \r\n    } \r\n    mtx_unlock(&amp;mtx); \r\n    return 0; \r\n} <\/code><\/pre>\n<p>Similarly to mutexes, zeroed condition variables are valid and you can omit calls to <code>cnd_init<\/code> and <code>cnd_destroy<\/code>.<\/p>\n<h2><a id=\"post-32848-thread-specific-storage\"><\/a>Thread Specific Storage<\/h2>\n<p>Thread specific storage is provided via the <code>_Thread_local<\/code> (<code>thread_local<\/code> in C23) keyword, or via the <code>tss_<\/code> family of functions. <code>_Thread_local<\/code> works just like <code>__declspec(thread)<\/code> (see <a href=\"https:\/\/learn.microsoft.com\/en-us\/cpp\/cpp\/thread?view=msvc-170\">docs<\/a>) and the <code>tss_<\/code> functions work similarly, but not identically, to the <code>Fls*<\/code> or <code>Tls*<\/code> family of functions.<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">#include &lt;threads.h&gt; \r\n#include &lt;stdlib.h&gt; \r\nvoid dtor(void* dat) { \r\n    \/\/ not called in this program \r\n    abort(); \r\n} \r\n \r\nstatic tss_t t; \r\n \r\nint main(void) { \r\n    if(tss_create(&amp;t, dtor) != thrd_success) { \r\n        return 1; \r\n    } \r\n    if(tss_set(t, (void*)42) != thrd_success) { \r\n        return 1; \r\n    } \r\n    if(tss_get(t) != (void*)42) { \r\n        return 1; \r\n    } \r\n    return 0; \r\n} <\/code><\/pre>\n<p>The C11 TSS facilities support <em>destructors<\/em> which are run when threads exit and are passed the value of the associated TSS key, if it is non-null. 
The macro <code>TSS_DTOR_ITERATIONS<\/code> specifies how many times we\u2019ll check for more destructors to run in the case that a destructor calls <code>tss_set<\/code>. It is currently set to 1; if this is a problem for you, let us know. Destructors are run from either <code>DllMain<\/code>, or from a TLS callback (if you use the static runtime), and are not run on process teardown. This is an important difference from FLS destructors, which are run on process teardown and run before any <code>DllMain<\/code> routines or TLS callbacks.<\/p>\n<h3><a id=\"post-32848-Xfd8bf69ca5cd09bc517393a90eb8a3dbe850e6d\"><\/a>TSS limits and performance characteristics<\/h3>\n<p>When using the explicit <code>tss_<\/code> functions there is a limit of 1024 TSS indices per process. These are <em>not<\/em> the same indices used for the <code>Fls*<\/code> functions, the <code>Tls*<\/code> functions, or <code>_Thread_local<\/code> \u201cimplicit\u201d TLS variables. If you use any <code>&lt;threads.h&gt;<\/code> functions (not just the TSS functions) <em>and<\/em> you use the static runtime, then you will use at least one implicit TLS index (the ones used for <code>_Thread_local<\/code>), even if you don\u2019t otherwise use implicit TLS. This is because we need to enable TLS callbacks, which causes the loader to allocate such an index. If this is a problem (for example because of the loader gymnastics that are required to dynamically load such modules) let us know, or just use the dynamic runtime. If you use the <code>tss_<\/code> functions, you will additionally use one dynamic TLS index (the same kind used by <code>TlsAlloc<\/code>); you will only use <em>one<\/em>, no matter how many <code>tss_t<\/code>s you create. Threads will only spend time processing TSS destructors at thread exit if a TSS index with an associated destructor was ever set on that thread. 
When you create the first <code>tss_t<\/code> a table of destructors is allocated, and when you use <code>tss_set<\/code> for the first time on a particular thread a per-thread table is allocated. Memory usage scales with the number of threads that use the C11 TSS functionality, not the total number of threads in the process. The destructor table is 8KiB (4KiB on 32-bit platforms) and each per-thread table is 8209 bytes (4105 bytes on 32-bit platforms). These performance and memory characteristics may change in the future.<\/p>\n<h2><a id=\"post-32848-new-runtime-components\"><\/a>New Runtime Components<\/h2>\n<p>Because <code>&lt;threads.h&gt;<\/code> is a new feature and we want the implementation to be able to change and improve over time, it\u2019s shipped as a new satellite DLL of vcruntime: <code>vcruntime140_threads.dll<\/code> and <code>vcruntime140_threadsd.dll<\/code>. If you use the dynamic version of the Visual C++ runtime (<code>\/MD<\/code> or <code>\/MDd<\/code>), and you use the new threads facilities, then you need to either redistribute this file with your app, or redistribute a Visual C++ runtime redist that is new enough to contain these files. If you don\u2019t touch the C11 threads functionality then your app won\u2019t depend on anything in this DLL and it will not be loaded at all.<\/p>\n<h2><a id=\"post-32848-send-us-your-feedback\"><\/a>Send us your feedback!<\/h2>\n<p>Try out C11 threads in the <a href=\"https:\/\/visualstudio.microsoft.com\/vs\/preview\/\">latest Visual Studio preview<\/a> and share your thoughts with us in the comments below, on <a href=\"https:\/\/developercommunity.visualstudio.com\/cpp\">Developer Community<\/a>, on Twitter (<a href=\"https:\/\/twitter.com\/visualc\">@VisualC<\/a>) or via email at visualcpp@microsoft.com.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Back in Visual Studio 2022 version 17.5 Microsoft Visual C gained preliminary support for C11 atomics. 
We are happy to announce that support for the other major concurrency feature of C11, threads, is available in Visual Studio version 17.8 Preview 2. This should make it easier to port cross-platform C applications to Windows, without having [&hellip;]<\/p>\n","protected":false},"author":4807,"featured_media":35994,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[270,1],"tags":[],"class_list":["post-32848","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-announcement","category-cplusplus"],"acf":[],"blog_post_summary":"<p>Back in Visual Studio 2022 version 17.5 Microsoft Visual C gained preliminary support for C11 atomics. We are happy to announce that support for the other major concurrency feature of C11, threads, is available in Visual Studio version 17.8 Preview 2. This should make it easier to port cross-platform C applications to Windows, without having 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/32848","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/4807"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=32848"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/32848\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/35994"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=32848"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=32848"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=32848"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}