{"id":25462,"date":"2020-02-04T18:10:26","date_gmt":"2020-02-04T18:10:26","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cppblog\/?p=25462"},"modified":"2020-02-04T18:10:26","modified_gmt":"2020-02-04T18:10:26","slug":"jcc-erratum-mitigation-in-msvc","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cppblog\/jcc-erratum-mitigation-in-msvc\/","title":{"rendered":"JCC Erratum Mitigation in MSVC"},"content":{"rendered":"<p><i>The content of this blog was provided by Gautham Beeraka from Intel Corporation.<\/i><\/p>\n<p style=\"text-align: justify;\">Intel recently announced the <a href=\"https:\/\/www.intel.com\/content\/dam\/support\/us\/en\/documents\/processors\/mitigations-jump-conditional-code-erratum.pdf\">Jump Conditional Code (JCC) Erratum<\/a>, which can occur in some of its processors. The MSVC team has been working with Intel to provide a software fix in the compiler to mitigate the performance impact of the microcode update that prevents the erratum.<\/p>\n<h2 style=\"text-align: justify;\">Introduction<\/h2>\n<p style=\"text-align: justify;\">There are three things one should know about the JCC erratum:<\/p>\n<ol style=\"text-align: justify;\">\n<li>What the erratum is, and whether and how it affects you.<\/li>\n<li>The microcode update that prevents the erratum, whether you have it, and its side effects.<\/li>\n<li style=\"text-align: justify;\">MSVC compiler support to mitigate the side effects of the microcode update.<\/li>\n<\/ol>\n<p style=\"text-align: justify;\">Each of these topics is explained below.<\/p>\n<h3>JCC Erratum<\/h3>\n<p style=\"text-align: justify;\">The processors listed in Intel\u2019s white paper referenced above have an erratum which can occur under certain conditions that involve jump instructions overlaying a cache-line boundary. This erratum can result in unpredictable behavior for the software running on these processors. 
If your software runs on these processors, you are affected by this erratum.<\/p>\n<h3>Microcode Update<\/h3>\n<p style=\"text-align: justify;\">Applying a microcode update (MCU) can prevent the JCC erratum. The MCU works by preventing jump instructions that overlay or end on a 32-byte boundary, as shown in the figure below, from being cached in the decoded uop cache. The MCU affects conditional jumps, macro-fused conditional jumps, direct unconditional jumps, indirect jumps, direct\/indirect calls, and returns.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-25463\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/a-screenshot-of-a-computer-description-automatica.png\" alt=\"Examples of instructions which straddle a 32-byte boundary\" width=\"509\" height=\"467\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/a-screenshot-of-a-computer-description-automatica.png 605w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/a-screenshot-of-a-computer-description-automatica-300x276.png 300w\" sizes=\"(max-width: 509px) 100vw, 509px\" \/><\/p>\n<p style=\"text-align: justify;\">The MCU will be distributed through Windows Update. We will update this blog once we have more information on the Windows Update. Note that the MCU is not specific to Windows and also applies to other operating systems.<\/p>\n<p style=\"text-align: justify;\">Applying the MCU can regress performance of software running on the patched machines. Based on our measurements, we see an impact between 0% and 3%. The impact was higher on a few outlier microbenchmarks.<\/p>\n<h3>Software Mitigation in MSVC compiler<\/h3>\n<p>To mitigate the performance impact, developers can build their code with the software fix enabled by the <strong><em>\/QIntel-jcc-erratum<\/em><\/strong> switch in the MSVC compiler. We observed that the performance regressions become negligible after rebuilding with this fix. 
The switch can increase code size; the increase was about 3% in our measurements.<\/p>\n<h2>How to enable the software mitigation?<\/h2>\n<p style=\"text-align: justify;\">Starting from Visual Studio 2019 version 16.5 Preview 2, developers can apply the software mitigation for the performance impact of the MCU. To enable the software mitigation for the JCC erratum in your code, set \u201cEnable Intel JCC Erratum Mitigation\u201d to \u201cYes\u201d under the \u201cCode Generation\u201d section of the project Property Pages:<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-25464\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image.png\" alt=\"Screenshot of the Enable Intel JCC Erratum Mitigation in the property pages\" width=\"680\" height=\"471\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image.png 789w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image-300x208.png 300w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image-768x532.png 768w\" sizes=\"(max-width: 680px) 100vw, 680px\" \/><\/p>\n<p style=\"text-align: justify;\">A few undocumented compiler flags are also available to restrict the scope of the software mitigation, as shown below. 
These flags can be useful to experiment with, but we do not commit to servicing them in future releases.<\/p>\n<ol style=\"text-align: justify;\">\n<li><strong><em>\/d2QIntel-jcc-erratum-partial<\/em><\/strong> &#8211; This applies the mitigation only inside loops in a function.<\/li>\n<li><strong><em>\/d2QIntel-jcc-erratum:&lt;file.txt&gt;<\/em><\/strong> &#8211; This applies the mitigation only to functions specified within file.txt.<\/li>\n<li><strong><em>\/d2QIntel-jcc-erratum-partial:&lt;file.txt&gt;<\/em><\/strong> &#8211; This applies the mitigation only to loops in the functions specified within file.txt.<\/li>\n<\/ol>\n<p style=\"text-align: justify;\">The function names given in &lt;file.txt&gt; are the decorated function names as used by the compiler.<\/p>\n<p style=\"text-align: justify;\">To enable these flags, add them to the \u201cAdditional Options\u201d under the \u201cCommand Line\u201d section of the project Property Pages:<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-25465\" src=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image-1.png\" alt=\"Screenshot of adding \/d2QIntel-jcc-erratum-partial to the additional compiler flags\" width=\"668\" height=\"462\" srcset=\"https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image-1.png 792w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image-1-300x208.png 300w, https:\/\/devblogs.microsoft.com\/cppblog\/wp-content\/uploads\/sites\/9\/2020\/01\/word-image-1-768x531.png 768w\" sizes=\"(max-width: 668px) 100vw, 668px\" \/><\/p>\n<p style=\"text-align: justify;\">All of these switches work only in release builds and are incompatible with <a href=\"https:\/\/docs.microsoft.com\/en-us\/cpp\/build\/reference\/clr-common-language-runtime-compilation?view=vs-2019\">\/clr<\/a> switches. 
If multiple <em>\/d2QIntel-jcc-erratum*<\/em> switches are given, full processing (all branches) is favored over partial (loop branches only) processing. If any of the switches specifies a functions file, the processing is limited to just those functions.<\/p>\n<h2>What does the software mitigation do?<\/h2>\n<p style=\"text-align: justify;\">The software mitigation in the compiler detects all affected jumps in the code (the jumps that overlay or end at a 32-byte boundary) and aligns them to start at this boundary. This is done by adding benign segment override prefixes to the instructions before the jump. The size of the resultant instructions increases but is less than 15 bytes. In situations where prefixes cannot be added, NOPs are used. The example below shows how the compiler generates code when the mitigation is on and off.<\/p>\n<p>Sample C++ code:<\/p>\n<pre class=\"prettyprint\">for (int i = 0; i &lt; length; i++) {\r\n\tsum += arr[i] + c;\r\n}\r\n<\/pre>\n<table style=\"border-style: solid; width: 100%; height: 380px; border-collapse: collapse;\">\n<tbody>\n<tr style=\"height: 86px;\">\n<td style=\"border-style: solid; width: 50%; height: 86px;\">\n<p style=\"text-align: center;\">Code without \/QIntel-jcc-erratum<\/p>\n<p style=\"text-align: center;\">(\/O2 \/FAsc)<\/p>\n<\/td>\n<td style=\"border-style: solid; width: 50%; height: 86px;\">\n<p style=\"text-align: center;\">Code with \/QIntel-jcc-erratum<\/p>\n<p style=\"text-align: center;\">(\/O2 \/FAsc \/QIntel-jcc-erratum)<\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 294px;\">\n<td style=\"border-style: solid; width: 50%; height: 294px;\"><span style=\"font-size: 10pt;\">$LL8@test1:<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">00010 44 8b 0c 91 \u00a0\u00a0 mov r9d, DWORD PTR [rcx+rdx*4]<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">00014 48 ff c2 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 inc rdx<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">00017 45 03 c8 \u00a0 \u00a0 \u00a0 
\u00a0 add r9d, r8d<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">0001a 41 03 c1 \u00a0 \u00a0 \u00a0 \u00a0 add eax, r9d<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"><span style=\"color: #008000;\">0001d<\/span> 49 3b d2 \u00a0 \u00a0 \u00a0 \u00a0 cmp rdx, r10<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"><span style=\"color: #008000;\">00020<\/span> 7c ee \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 jl SHORT $LL8@test1<\/span><\/td>\n<td style=\"border-style: solid; width: 50%;\" width=\"366\"><span style=\"font-size: 10pt;\">$LL8@test1:<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"> 00010 <span style=\"color: #008000;\">3e 3e 3e<\/span> 44 8b 0c 91 \u00a0\u00a0 mov\u00a0 r9d, DWORD PTR [rcx+rdx*4]<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"> 00017 48 ff c2 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 inc\u00a0 rdx<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"> 0001a 45 03 c8 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 add\u00a0 r9d, r8d<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"> 0001d 41 03 c1 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 add\u00a0 eax, r9d<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"><span style=\"color: #008000;\"> 00020<\/span> 49 3b d2 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 cmp\u00a0 rdx, r10<\/span><\/p>\n<p><span style=\"font-size: 10pt;\"><span style=\"color: #008000;\"> 00023<\/span> 7c eb \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 jl\u00a0\u00a0 SHORT $LL8@test1<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p style=\"text-align: justify;\">In the example above, the CMP and JL instructions are macro-fused and overlay a 32-byte boundary. 
The mitigation pads the first instruction in the block, the MOV instruction, with three 0x3E prefixes so that the CMP instruction begins on a 32-byte boundary.<\/p>\n<h2>What is the performance story?<\/h2>\n<p style=\"text-align: justify;\">We evaluated the performance impact of the MCU and of the fix in the MSVC compiler. The numbers stated below use the following test PC configuration.<\/p>\n<p style=\"text-align: justify;\"><strong>Processor<\/strong> \u2013 Intel\u00ae Core\u2122 i9 9900K @ 3.60GHz<\/p>\n<p style=\"text-align: justify;\"><strong>Operating System<\/strong> \u2013 Private build of Windows with the MCU applicable to this processor.<\/p>\n<p style=\"text-align: justify;\"><strong>Benchmark suite<\/strong> \u2013 SPEC CPU\u00ae 2017<\/p>\n<p style=\"text-align: justify;\">Based on our measurements, we see regressions ranging from 0% to 3% after applying the MCU. We also saw regressions going up to 10% on some outlier microbenchmarks.<\/p>\n<p style=\"text-align: justify;\">Applying the software mitigation through the <em>\/QIntel-jcc-erratum<\/em> switch in the MSVC compiler makes the regressions negligible. This switch applies the mitigation globally to all modules built with it and increases code size. We measured an average code size increase of 3%.<\/p>\n<p style=\"text-align: justify;\">We measured that applying the mitigation only in loops through the <em>\/d2QIntel-jcc-erratum-partial <\/em>switch also makes the performance regressions negligible but with a smaller code size increase. We measured an average code size increase of 1.5% with the partial mitigation. 
You can further reduce the code size impact and get most of the performance back by applying the mitigations only to hot functions through the <em>\/d2QIntel-jcc-erratum:&lt;file.txt&gt; <\/em>and <em>\/d2QIntel-jcc-erratum-partial:&lt;file.txt&gt; <\/em>switches.<\/p>\n<p style=\"text-align: justify;\">We also measured that the performance impact of the <em>\/QIntel-jcc-erratum<\/em> switch on processors that are not affected by the erratum is negligible. However, as codebases vary greatly, we advise developers to evaluate the impact of <em>\/QIntel-jcc-erratum<\/em> in the context of their applications and workloads.<\/p>\n<h2>Closing Notes<\/h2>\n<p style=\"text-align: justify;\">If your software can run on machines with processors affected by the JCC erratum and versions of Windows with the MCU, we encourage you to profile your code and check for performance regressions. You can use <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-hardware\/test\/wpt\/\">Windows Performance Toolkit<\/a> or <a href=\"https:\/\/software.intel.com\/en-us\/vtune\">Intel\u00ae VTune\u2122 Profiler<\/a> to profile your code. You can detect if the MCU is affecting performance by following the steps in Intel\u2019s white paper. If you are affected, recompile with <em>\/QIntel-jcc-erratum <\/em>or the other switches listed above to mitigate the effects.<\/p>\n<p style=\"text-align: justify;\">Your feedback is key to delivering the best experience. If you have any questions, please feel free to ask us below. You can also send us your comments through <a href=\"mailto:visualcpp@microsoft.com\">e-mail<\/a>. 
If you encounter problems with the experience or have suggestions for improvement, please\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/visualstudio\/ide\/how-to-report-a-problem-with-visual-studio?view=vs-2019\" target=\"_blank\" rel=\"noopener noreferrer\">Report A Problem<\/a>\u00a0or reach out via\u00a0<a href=\"https:\/\/developercommunity.visualstudio.com\/spaces\/62\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">Developer Community<\/a>. You can also find us on Twitter\u00a0<a href=\"https:\/\/twitter.com\/visualc\" target=\"_blank\" rel=\"noopener noreferrer\">@VisualC<\/a>.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The content of this blog was provided by Gautham Beeraka from Intel Corporation. Intel recently announced Jump Conditional Code (JCC) Erratum which can occur in some of its processors. The MSVC team has been working with Intel to provide a software fix in the compiler to mitigate the performance impact of the microcode update that [&hellip;]<\/p>\n","protected":false},"author":14432,"featured_media":25463,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[270,218],"tags":[],"class_list":["post-25462","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-announcement","category-performance"],"acf":[],"blog_post_summary":"<p>The content of this blog was provided by Gautham Beeraka from Intel Corporation. Intel recently announced Jump Conditional Code (JCC) Erratum which can occur in some of its processors. 
The MSVC team has been working with Intel to provide a software fix in the compiler to mitigate the performance impact of the microcode update that [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/25462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/users\/14432"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/comments?post=25462"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/posts\/25462\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media\/25463"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/media?parent=25462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/categories?post=25462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cppblog\/wp-json\/wp\/v2\/tags?post=25462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}