{"id":103512,"date":"2025-07-22T18:18:56","date_gmt":"2025-07-23T01:18:56","guid":{"rendered":"https:\/\/developer.nvidia.com\/blog\/?p=103512"},"modified":"2025-12-27T13:52:24","modified_gmt":"2025-12-27T21:52:24","slug":"train-a-reasoning-capable-llm-in-one-weekend-with-nvidia-nemo","status":"publish","type":"post","link":"https:\/\/developer.nvidia.com\/blog\/train-a-reasoning-capable-llm-in-one-weekend-with-nvidia-nemo\/","title":{"rendered":"Train a Reasoning-Capable LLM in One Weekend with NVIDIA NeMo"},"content":{"rendered":"\n<p>Have you ever wanted to build your own <a href=\"https:\/\/www.nvidia.com\/en-us\/glossary\/ai-reasoning\/\" target=\"_blank\" rel=\"noreferrer noopener\">reasoning<\/a> models such as the open <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/foundation-models\/nemotron\/\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA Nemotron<\/a>, but thought it was too complicated or required massive resources? Think again. With NVIDIA\u2019s powerful tools and datasets, you can train a small, effective reasoning model in <strong>about 48 hours<\/strong>, all <strong>on a single GPU<\/strong>. 
Even better, we\u2019ve made all the code available to you to get started right away.\u00a0<\/p>\n\n\n\n<p>Let\u2019s dive in.<\/p>\n\n\n\n<h2 id=\"quick_links_to_dataset_and_codes\"  class=\"wp-block-heading\">Quick links to dataset and code<a href=\"#quick_links_to_dataset_and_codes\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hugging Face: <a href=\"https:\/\/huggingface.co\/datasets\/nvidia\/Llama-Nemotron-Post-Training-Dataset\" target=\"_blank\" rel=\"noreferrer noopener\">Llama Nemotron Post-Training Dataset<\/a><\/li>\n\n\n\n<li>GitHub: <a href=\"https:\/\/github.com\/NVIDIA\/NeMo-Curator\/tree\/main\/tutorials\/llama-nemotron-data-curation\" target=\"_blank\" rel=\"noreferrer noopener\">Data Curation Code<\/a> with NVIDIA <a href=\"https:\/\/github.com\/NVIDIA\/NeMo-Curator\" target=\"_blank\" rel=\"noreferrer noopener\">NeMo Curator<\/a><\/li>\n\n\n\n<li>GitHub: <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\/tree\/main\/tutorials\/llm\/reasoning\" target=\"_blank\" rel=\"noreferrer noopener\">Training and Evaluation Code<\/a> with NVIDIA <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\" target=\"_blank\" rel=\"noreferrer noopener\">NeMo Framework<\/a><\/li>\n<\/ul>\n\n\n\n<h2 id=\"prerequisites\"  class=\"wp-block-heading\">Prerequisites<a href=\"#prerequisites\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>NVIDIA Ampere GPU or newer with at least 80GB of memory (with&nbsp;<a href=\"https:\/\/developer.nvidia.com\/cuda-gpus\" target=\"_blank\" rel=\"noreferrer noopener\">Compute Capability<\/a>&nbsp;&gt;= 8.0)\n<ul class=\"wp-block-list\">\n<li>This tutorial has been tested on 1xA100 (80GB) and 1xH100 (80GB)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>250 GB of disk space (for dataset download, Docker images, and training checkpoints).<\/li>\n\n\n\n<li>A valid Hugging Face API token with access to&nbsp;<a 
href=\"https:\/\/huggingface.co\/meta-llama\/Llama-3.1-8B-Instruct\" target=\"_blank\" rel=\"noreferrer noopener\">Meta Llama 3.1 8B Instruct<\/a><\/li>\n<\/ul>\n\n\n\n<h2 id=\"video_walkthrough\"  class=\"wp-block-heading\">Video walkthrough<a href=\"https:\/\/developer.nvidia.com\/blog\/train-a-reasoning-capable-llm-in-one-weekend-with-nvidia-nemo\/#video_walkthrough\"><\/a><a href=\"#video_walkthrough\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>Click <a href=\"https:\/\/www.youtube.com\/watch?v=hMGikmMFLAU\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a> to watch the video tutorial on YouTube.<\/p>\n\n\n\n<h2 id=\"reasoning_models_and_test-time_computation\"  class=\"wp-block-heading\">Reasoning models and test-time computation<a href=\"#reasoning_models_and_test-time_computation\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>The advent of reasoning (or thinking) language models is transformative. By leveraging test-time computation scaling laws, more time is spent on generating tokens and internally reasoning about various aspects of the problem before producing the final answer. This makes them exceptionally skilled at tasks demanding deep critical thinking and reasoning, such as math and coding. 
This advancement signifies a paradigm shift in how language models are trained and used in various settings.&nbsp;<\/p>\n\n\n\n<p>NVIDIA stands at the forefront of this advancement with its introduction of <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/foundation-models\/llama-nemotron\/\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA Nemotron<\/a>, a family of open and efficient models that think fast and deliver the highest accuracy for agentic AI.<\/p>\n\n\n\n<p>Nemotron models are trained with open training data and AI techniques, giving full visibility, enabling better compliance, and ensuring trustworthy AI deployment.<\/p>\n\n\n\n<p>To learn more about Nemotron models, check out <a href=\"https:\/\/developer.nvidia.com\/blog\/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models\/\" target=\"_blank\" rel=\"noreferrer noopener\">this blog post<\/a>. The principles discussed in this blog also apply to other leading models such as ServiceNow\u2019s <a href=\"https:\/\/huggingface.co\/ServiceNow-AI\/Apriel-Nemotron-15b-Thinker\">Apriel Nemotron 15B<\/a>, highlighting the broader relevance of reasoning models in enterprise problem domains. 
To learn more about this model, check out <a href=\"https:\/\/blogs.nvidia.com\/blog\/servicenow-apriel-nemotron\/\" target=\"_blank\" rel=\"noreferrer noopener\">this blog post<\/a>.<\/p>\n\n\n\n<h3 id=\"from_\u201creasoning_off\u201d_to_\u201creasoning_on\u201d_controllable_reasoning_modes\"  class=\"wp-block-heading\">From \u201creasoning off\u201d to \u201creasoning on\u201d: controllable reasoning modes<a href=\"#from_\u201creasoning_off\u201d_to_\u201creasoning_on\u201d_controllable_reasoning_modes\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>A key innovation of the open <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/foundation-models\/llama-nemotron\/\" target=\"_blank\" rel=\"noreferrer noopener\">Llama Nemotron<\/a> models is their dynamic reasoning toggle, which allows users to switch between standard chat (\u201creasoning off\u201d) and advanced reasoning (\u201creasoning on\u201d) modes during inference via a simple instruction in the system prompt. This flexibility allows for optimized resource utilization: engaging deep reasoning capabilities for complex tasks like scientific analysis or coding, while reverting to a lightweight mode for simpler interactions for reduced latency and computational costs.\u00a0<\/p>\n\n\n\n<h3 id=\"our_open_post-training_dataset_for_reasoning\"  class=\"wp-block-heading\">Our open post-training dataset for reasoning<a href=\"#our_open_post-training_dataset_for_reasoning\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>To empower the developer community, NVIDIA has open-sourced a substantial portion of the data that was used in the post-training pipeline of the Llama Nemotron models. 
The <a href=\"https:\/\/huggingface.co\/datasets\/nvidia\/Llama-Nemotron-Post-Training-Dataset\" target=\"_blank\" rel=\"noreferrer noopener\">Llama Nemotron Post-Training Dataset<\/a>, containing over <strong>32 million samples<\/strong> across areas such as math, coding, chat, and sciences, provides a foundation for practitioners to train their own reasoning models. This dataset is key to teaching your model how to control its reasoning mode, mirroring Llama Nemotron capabilities.<\/p>\n\n\n\n<p>In this blog post, we&#8217;ll explore how you can leverage the <a href=\"https:\/\/huggingface.co\/datasets\/nvidia\/Llama-Nemotron-Post-Training-Dataset\" target=\"_blank\" rel=\"noreferrer noopener\">Llama Nemotron Post-Training Dataset<\/a>, <a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Curator\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA NeMo Curator<\/a>, and <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA NeMo Framework<\/a> to train your own reasoning language model over a weekend.&nbsp;<\/p>\n\n\n\n<h2 id=\"the_anatomy_of_the_llama_nemotron_post-training_dataset\"  class=\"wp-block-heading\">The anatomy of the Llama Nemotron Post-Training dataset<a href=\"#the_anatomy_of_the_llama_nemotron_post-training_dataset\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>The <a href=\"https:\/\/huggingface.co\/datasets\/nvidia\/Llama-Nemotron-Post-Training-Dataset\" target=\"_blank\" rel=\"noreferrer noopener\">Llama Nemotron Post-Training Dataset<\/a> is meticulously synthesized to enhance the reasoning capabilities of LLMs. Organized into distinct subsets for supervised fine-tuning (SFT) or reinforcement learning (RL), it encompasses samples from various problem domains. 
The following is a breakdown of the samples across different domains at the time of this writing.<\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Category<\/strong><\/td><td><strong>Sample Count<\/strong><\/td><\/tr><tr><td>Math<\/td><td>22,066,397<\/td><\/tr><tr><td>Coding<\/td><td>10,108,883<\/td><\/tr><tr><td>Science<\/td><td>708,920<\/td><\/tr><tr><td>Instruction Following<\/td><td>56,339<\/td><\/tr><tr><td>Chat<\/td><td>39,792<\/td><\/tr><tr><td>Safety<\/td><td>31,426<\/td><\/tr><tr><td><strong>Total Samples<\/strong><\/td><td><strong>32,011,757<\/strong><\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><em>Table 1. Sample domain and distribution of the Llama Nemotron post-training dataset<\/em><\/figcaption><\/figure>\n\n\n\n<p>All samples in this dataset are in JSON lines (JSONL) format and contain metadata such as the license type, the source model, and the Llama Nemotron model(s) trained with that sample. Each sample consists of a prompt and an expected response; the response either contains a detailed chain-of-thought (CoT) reasoning trace followed by the final answer (i.e., \u201creasoning on\u201d) or is a direct answer (i.e., \u201creasoning off\u201d). More concretely, each sample has the following attributes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>input<\/code>: the prompt(s) to the model in the multi-turn chat completions message format. 
It always starts with a message with the role <code>user<\/code>, followed by zero or more turns, and ending with a message with the role <code>assistant<\/code>, such as:<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n&#x5B;\n  {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Can you help me understand the Pythagorean theorem?&quot;},\n  {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;The Pythagorean theorem states that... Does that make sense?&quot;},\n  {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Yes, but I have a follow up question...&quot;},\n\n  #\n  # ... (zero or more messages),\n  #\n\n  {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;Sure, happy to help!&quot;},\n]\n<\/pre><\/div>\n\n\n<ul class=\"wp-block-list\">\n<li><code>output<\/code>: the expected response from the model (ground truth), such as:<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nThe Pythagorean theorem states that in a right triangle, the square of the hypotenuse equals the sum of the squares of the other two sides: a\u00b2 + b\u00b2 = c\u00b2.\n<\/pre><\/div>\n\n\n<ul>\n  <li><code>reasoning<\/code>: whether the sample is for reasoning \u201con\u201d mode or not.\n    <ul>\n      <li>If the value is \u201c<code>on<\/code>\u201d, then the output contains a detailed CoT trace encoded inside\n          <code>&lt;think&gt;&lt;\/think&gt;<\/code> followed by the output, such as:<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\n&lt;think&gt;\nHmm so the user is asking about the Pythagorean theorem. 
If I remember correctly...\n&lt;\/think&gt;\n\nThe Pythagorean theorem states that in a right triangle, the square of the hypotenuse equals the sum of the squares of the other two sides: a\u00b2 + b\u00b2 = c\u00b2.\n<\/pre><\/div>\n\n\n<ul>\n    <ul>\n      <li>If the value is \u201c<code>off<\/code>\u201d, then the output doesn\u2019t contain any reasoning traces and instead contains a direct response.<\/li>\n    <\/ul>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>system_prompt<\/code>: the (suggested) system prompt to control the reasoning mode of the model. For Llama Nemotron training, the system prompt is always either \u201c<code>detailed thinking on<\/code>\u201d or \u201c<code>detailed thinking off<\/code>\u201d. This field always agrees with the value of the \u201c<code>reasoning<\/code>\u201d field (and vice versa).<\/li>\n\n\n\n<li><code>category<\/code>: the sample category, such as math, coding, science, instruction following, chat, or safety.&nbsp;<\/li>\n\n\n\n<li><code>license<\/code>: the license associated with that sample.<\/li>\n\n\n\n<li><code>generator<\/code>: the generator model used to synthesize the sample, such as DeepSeek-R1.<\/li>\n\n\n\n<li><code>used_in_training<\/code>: the list of Llama Nemotron models that used this sample for training. For instance, a value of <code>[\u201cUltra\u201d, \u201cNano\u201d]<\/code> indicates that this sample was used for training Llama Nemotron Nano and Ultra, but not Super.<\/li>\n\n\n\n<li><code>version<\/code>: a version tag associated with each sample. 
Since new samples are added to this dataset over time, this version tag helps identify when a particular sample was added.<\/li>\n<\/ul>\n\n\n\n<h2 id=\"from_zero_to_reasoning_in_3_easy_steps\"  class=\"wp-block-heading\">From zero to reasoning in 3 easy steps<a href=\"#from_zero_to_reasoning_in_3_easy_steps\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>Let\u2019s go over a training and data curation recipe that we used to train a small reasoning model. We leverage the Llama Nemotron Post-Training dataset, enabling your model to learn controllable reasoning similar to what we described above.&nbsp;<\/p>\n\n\n\n<p>Training your own reasoning model typically involves data curation, fine-tuning, and evaluation. In this section, we cover a proven recipe that lets you train a model on a single GPU in just 48 hours. Note that our recipe uses supervised fine-tuning (SFT) to instill reasoning capabilities. While reinforcement learning (RL) is also an option, recent work suggests that a multi-pass approach (i.e. SFT followed by RL) yields the best results.<\/p>\n\n\n\n<h3 id=\"things_to_consider\"  class=\"wp-block-heading\">Things to consider<a href=\"#things_to_consider\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dataset composition<\/strong>: The Llama Nemotron Post-Training dataset is large, so you\u2019ll need to curate a focused subset emphasizing reasoning. For real-world use, prioritize samples that align closely with your domain-specific tasks, and consider augmenting with your own domain-specific samples.<\/li>\n\n\n\n<li><strong>Base model selection:<\/strong> Given the time and computational constraints, teaching small models to reason is challenging, so the base model choice is critical. We recommend starting with models of at least 8B parameters. 
We used Llama 3.1 8B Instruct, which worked well.&nbsp;<\/li>\n\n\n\n<li><strong>Fine-tuning technique<\/strong>: Fully fine-tuning all the weights of an 8-billion parameter model requires at least 8 GPUs, aggressive memory optimization techniques, and a lot of time! However, we\u2019ve observed comparable results with parameter efficient fine-tuning (PEFT) using LoRA adapters. In fact, you can fine-tune a LoRA adapter for an 8-billion parameter model on a single NVIDIA H100 GPU in 48 hours.<\/li>\n\n\n\n<li><strong>Evaluation:<\/strong> Post fine-tuning, evaluate your model using standard benchmarks, and compare its performance to the original base model to assess improvement.<\/li>\n<\/ul>\n\n\n\n<h3 id=\"step_1_processing_data_with_nvidia_nemo_curator\"  class=\"wp-block-heading\">Step 1: Processing data with NVIDIA NeMo Curator<a href=\"#step_1_processing_data_with_nvidia_nemo_curator\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>High-quality data is the bedrock of a powerful reasoning model. There are many ways to subset the Llama Nemotron Post-Training dataset, but we recommend starting with the math and chat subsets because they contain strong examples of domain-agnostic reasoning.<\/p>\n\n\n\n<p>To get good results, we recommend a data processing pipeline with at least 500,000 samples and a balanced mix of \u201creasoning on\u201d and \u201creasoning off\u201d examples. 
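A quick way to verify that balance is to count samples per reasoning mode in your candidate subset. Here is a minimal sketch (assuming JSONL records that carry the `reasoning` field from the dataset schema described earlier; this is an illustration, not part of the NeMo Curator pipeline):

```python
import json
from collections import Counter

def reasoning_mode_counts(jsonl_lines):
    """Count "reasoning on" vs "reasoning off" samples in a JSONL subset."""
    counts = Counter()
    for line in jsonl_lines:
        record = json.loads(line)
        # Samples without an explicit flag are treated as "off" here.
        counts[record.get("reasoning", "off")] += 1
    return counts

lines = [
    '{"reasoning": "on", "output": "<think>...</think> answer"}',
    '{"reasoning": "off", "output": "answer"}',
    '{"reasoning": "on", "output": "<think>...</think> answer"}',
]
print(reasoning_mode_counts(lines))  # Counter({'on': 2, 'off': 1})
```

If one mode heavily dominates, rebalance by subsampling the larger bucket before training.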
Here\u2019s a recommended filtering and processing approach:&nbsp;<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Select the appropriate small subset<\/strong>\n<ol class=\"wp-block-list\">\n<li><strong>Use Llama Nemotron Nano samples<\/strong>: Start with these high-quality, pre-vetted samples used in Llama Nemotron Nano training.<\/li>\n\n\n\n<li><strong>Select key subsets<\/strong>: Select only the <code>math_v1.1<\/code> and <code>chat<\/code> subsets for strong, domain agnostic reasoning.<\/li>\n\n\n\n<li><strong>Filter by language<\/strong>: Remove all non-English samples by language identification to ensure dataset consistency.<br><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li><strong>Filter samples<\/strong>\n<ol class=\"wp-block-list\">\n<li><strong>Enforce answer format<\/strong>: Discard math samples that don&#8217;t have final answers in the LaTeX <code>\\boxed{}<\/code> format.<\/li>\n\n\n\n<li><strong>Exclude refusal samples<\/strong>: Exclude samples with thinking mode enabled but empty <code>&lt;think&gt;&lt;\/think&gt;<\/code> tags. These are often refusal samples which are necessary for additional safety training, but we can discard them for simplicity.<\/li>\n\n\n\n<li><strong>Restrict sample length<\/strong>:<strong> <\/strong>Filter out samples longer than a fixed token limit (e.g. 8k or 16k, <strong>after<\/strong> applying the tokenizer\u2019s chat template).<br><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li><strong>Apply a chat template<\/strong>: Format all training samples using a consistent chat-style template (e.g., system\/user\/assistant roles). This is required for instruction-following models that were trained with chat templates, and helps the model generalize better to downstream chat interfaces.<br><\/li>\n\n\n\n<li><strong>Reasoning mode via system prompt<\/strong>: Add control statements to the system prompt to signal whether reasoning should be enabled. 
Llama Nemotron models use phrases like \u201c<code>detailed thinking on<\/code>\u201d or \u201c<code>detailed thinking off<\/code>\u201d to control this behavior.<\/li>\n<\/ol>\n\n\n\n<ol start=\"5\" class=\"wp-block-list\">\n<li><strong>Utilize curriculum learning:<\/strong> Sort samples in increasing order of difficulty. You can use the completion token count as a measure of sample difficulty. Feel free to experiment with different schemes.\n<ol class=\"wp-block-list\">\n<li>Split your data into \u201creasoning on\u201d and \u201creasoning off\u201d buckets.<\/li>\n\n\n\n<li>Sort each bucket by increasing completion length (as a proxy for difficulty).<\/li>\n\n\n\n<li>Interleave samples from each bucket to gradually introduce complexity.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n\n\n\n<p>You can use the NVIDIA <a href=\"https:\/\/developer.nvidia.com\/nemo-curator?sortBy=developer_learning_library%2Fsort%2Ffeatured_in.nemo_curator%3Adesc%2Ctitle%3Aasc&amp;hitsPerPage=6\" target=\"_blank\" rel=\"noreferrer noopener\">NeMo Curator<\/a>, part of the <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/products\/nemo\/\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA NeMo<\/a> software suite for managing the AI agent lifecycle, to implement this pipeline efficiently. We&#8217;ve released a simple, easy-to-understand pipeline on GitHub to help you get started. It runs locally on modest hardware, even without a GPU. Check out <a href=\"https:\/\/github.com\/NVIDIA\/NeMo-Curator\/tree\/main\/tutorials\/llama-nemotron-data-curation\" target=\"_blank\" rel=\"noreferrer noopener\">the code here<\/a>.<\/p>\n\n\n\n<p>The NeMo Curator pipeline demonstrates various facilities available in the framework (such as language identification and distributed processing) to quickly process a subset of the Llama Nemotron Post-Training dataset for fine-tuning. 
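The curriculum step above (split by reasoning mode, sort by completion length, interleave) can be sketched in plain Python. This is an illustrative stand-in for the real pipeline, not the NeMo Curator implementation; it uses the character length of the `output` field as a cheap proxy for completion token count:

```python
from itertools import chain, zip_longest

def curriculum_order(samples):
    """Arrange samples easy-to-hard: bucket by reasoning mode, sort each
    bucket by completion length (a proxy for difficulty), then interleave."""
    on = sorted((s for s in samples if s["reasoning"] == "on"),
                key=lambda s: len(s["output"]))
    off = sorted((s for s in samples if s["reasoning"] == "off"),
                 key=lambda s: len(s["output"]))
    # zip_longest pads the shorter bucket with None; drop the padding.
    interleaved = chain.from_iterable(zip_longest(on, off))
    return [s for s in interleaved if s is not None]

samples = [
    {"reasoning": "on",  "output": "<think>long trace...</think> answer"},
    {"reasoning": "off", "output": "short answer"},
    {"reasoning": "on",  "output": "<think>...</think> a"},
    {"reasoning": "off", "output": "a much longer direct answer than the other one"},
]
ordered = curriculum_order(samples)
```

In a real run you would count tokens with the model's tokenizer rather than characters, but the bucketing and interleaving logic stays the same.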
You can easily modify this pipeline as you see fit and adapt it to your domain- or business-specific needs.<\/p>\n\n\n\n<p>Here are some commands to get you started, based on our recommendations provided above.<\/p>\n\n\n\n<p><strong>Obtain the dataset from Hugging Face (requires ~130GB of disk space):<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n$ git lfs install\n$ git clone https:\/\/huggingface.co\/datasets\/nvidia\/Llama-Nemotron-Post-Training-Dataset\n<\/pre><\/div>\n\n\n<p><strong>Obtain the <\/strong><a href=\"https:\/\/fasttext.cc\/docs\/en\/language-identification.html\"><strong>FastText<\/strong><\/a><strong> language identification model:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n$ wget https:\/\/dl.fbaipublicfiles.com\/fasttext\/supervised-models\/lid.176.ftz -P .\/\n<\/pre><\/div>\n\n\n<p><strong>Launch the data curation pipeline with 8 workers:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n$ python main.py \\\n    --input-dir &quot;\/path\/to\/Llama-Nemotron-Post-Training-Dataset\/SFT&quot; \\\n    --remove-columns &quot;version&quot; &quot;license&quot; &quot;generator&quot; &quot;category&quot; &quot;used_in_training&quot; &quot;system_prompt&quot; &quot;reasoning&quot; \\\n    --filename-filter &quot;chat&quot; &quot;math_v1.1&quot; \\\n    --tokenizer &quot;meta-llama\/Llama-3.1-8B-Instruct&quot; \\\n    --lang-id-model-path &quot;\/path\/to\/lid.176.ftz&quot; \\\n    --max-token-count 8192 \\\n    --max-completion-token-count 16384 \\\n    --output-dir &quot;\/path\/to\/curated-data&quot; \\\n    --json-blocksize &quot;100mb&quot; \\\n    --n-workers 8 \\\n    --device &quot;cpu&quot;\n<\/pre><\/div>\n\n\n<p>Once the above pipeline finishes execution, the curated dataset will be written to 
the specified output path. The output will be written in the form of multiple JSONL files. This is because the large input dataset is divided into smaller partitions so that each partition can be processed in parallel.&nbsp;<\/p>\n\n\n\n<p>Each JSONL file is in the <code>input\/output<\/code> format. For every record in each file, the <code>input<\/code> field contains the model inputs, i.e., system prompt and user messages (after undergoing chat template transformation with the specified tokenizer), and the <code>output<\/code> field contains the expected response from the model, including any special tokens added by the tokenizer\u2019s chat template (e.g., end of turn or end of sentence tokens).<\/p>\n\n\n\n<p>To combine all the different partitions into a single JSONL file, run the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n$ find \/path\/to\/curated-data -type f -name &quot;*.jsonl&quot; -size +0c -print0 | xargs -0 cat | awk &#039;NF&#039; &gt; training.jsonl\n<\/pre><\/div>\n\n\n<p>This will create a single JSONL file called <code>training.jsonl<\/code> with around 1.7 million samples. You can use the resulting file directly with <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA NeMo Framework<\/a> training scripts, without modification. NeMo Framework is also part of the <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/products\/nemo\/\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA NeMo<\/a> software suite for managing the AI agent lifecycle.<\/p>\n\n\n\n<h3 id=\"step_2_training\"  class=\"wp-block-heading\">Step 2: Training<a href=\"#step_2_training\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>We experimented with base models ranging from 3B to 8B parameters and LoRA ranks from 16 to 128. 
The smallest model that consistently produced strong reasoning performance was Llama 3.1 8B Instruct, with LoRA rank 64 as the sweet spot.<\/p>\n\n\n\n<p>A few key factors contributed to successful training:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>High learning rate to accelerate convergence.<\/li>\n\n\n\n<li>Curriculum learning, using progressively harder samples, significantly improved stability and final performance.<\/li>\n\n\n\n<li>Batch size of at least 256.<\/li>\n<\/ul>\n\n\n\n<p>Full training hyperparameters are listed in the table below:<\/p>\n\n\n\n<figure class=\"wp-block-table aligncenter\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Hyperparameter<\/strong><\/td><td><strong>Value<\/strong><\/td><\/tr><tr><td>LoRA<\/td><td><\/td><\/tr><tr><td>&nbsp; &nbsp; Rank<\/td><td>64<\/td><\/tr><tr><td>&nbsp; &nbsp; Alpha<\/td><td>128<\/td><\/tr><tr><td>Learning Rate<\/td><td>0.0001<\/td><\/tr><tr><td>&nbsp; &nbsp; Scheduler<\/td><td>Cosine<\/td><\/tr><tr><td>&nbsp; &nbsp; Warmup steps<\/td><td>5% of total training steps<\/td><\/tr><tr><td>&nbsp; &nbsp; Weight decay<\/td><td>0.001<\/td><\/tr><tr><td>Batch Size<\/td><td>256 <em>(w\/ gradient accumulation)<\/em><\/td><\/tr><tr><td>Steps to train for<\/td><td>At least 2,000 steps<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><em>Table 2. Training hyper-parameters<\/em><\/figcaption><\/figure>\n\n\n\n<p>We trained the model on a single NVIDIA H100 80GB GPU for around 30 hours. 
Notably, consistent reasoning behavior emerged after just ~13 hours of training (after stepping through ~100,000 to 130,000 samples).<\/p>\n\n\n\n<p>If you have a GPU with lower than 80GB memory, you can reduce the (on device) batch size and increase the gradient accumulation steps to maintain a larger effective batch size while still being able to train with lower memory capacity.<\/p>\n\n\n\n<p>We\u2019ve prepared a <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\/blob\/main\/tutorials\/llm\/reasoning\/Reasoning-SFT.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">Jupyter notebook for you on GitHub<\/a> that sets up the aforementioned training pipeline with appropriate hyperparameters using NVIDIA NeMo Framework. This notebook walks you through various settings that are available to you for fine-tuning your own model. Moreover, this notebook provides an option for you to perform full model fine-tuning instead of PEFT, should you choose to do so.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efe87a9ea0b&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1507\" height=\"612\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training.png\" alt=\"benchmark showing the loss and learning rate scheduler for training.\" class=\"wp-image-103600\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training.png 1507w, 
https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-300x122.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-625x254.png 625w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-179x73.png 179w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-768x312.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-645x262.png 645w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-500x203.png 500w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-160x65.png 160w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-362x147.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-271x110.png 271w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-1024x416.png 1024w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-1.-The-loss-and-learning-rate-scheduler-plots-for-training-960x390.png 960w\" sizes=\"auto, (max-width: 1507px) 100vw, 1507px\" 
\/><figcaption class=\"wp-element-caption\"><em>Figure 1. The loss and learning rate scheduler plots for training.<\/em><\/figcaption><\/figure><\/div>\n\n\n<p>For your reference, here are the loss plots from our own experiments fine-tuning a LoRA adapter of rank 64, using the first 500,000 training samples from the filtering and data curation pipeline, with a batch size of 256 and 2,000 training steps.<\/p>\n\n\n\n<p>You might be wondering about the sudden loss drop at the end. This is expected. Recall that our curated dataset is arranged in increasing order of sample difficulty for curriculum learning. With 500,000 training samples, a batch size of 256, and 2,000 steps, that\u2019s just slightly over 1 epoch of training. Towards the end of that epoch, when the model sees the first few (easier) samples again, it can easily predict the correct tokens for them, so the loss value ends up being much lower.<\/p>\n\n\n\n<h3 id=\"step_3_evaluation\"  class=\"wp-block-heading\">Step 3: Evaluation<a href=\"#step_3_evaluation\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>After training, it&#8217;s essential to evaluate the model to confirm that reasoning capabilities have been learned. 
We recommend:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Benchmarking against the base model<\/strong>: Run side-by-side comparisons on reasoning-heavy tasks to assess improvement.&nbsp;<\/li>\n\n\n\n<li><strong>Standard and domain-specific benchmarks<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Evaluate on datasets like MMLU, GPQA Diamond, GPQA Main, or OpenBookQA to get a sense of the model\u2019s overall capabilities.<\/li>\n\n\n\n<li>Evaluate on domain-specific data to get clear insight into the model\u2019s behavior in production.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Manual inspection<\/strong>: Sample outputs for both \u201creasoning on\u201d and \u201creasoning off\u201d modes to verify controllability and consistency. Just ensure that the chat templates and the system prompts are set up correctly.<\/li>\n<\/ul>\n\n\n\n<p>Let\u2019s dive deeper into these three recommendations and see how our trained model performs.<\/p>\n\n\n\n<p>We\u2019ve prepared a set of scripts to benchmark your trained model against the base model on the GPQA Diamond, GPQA Main, and MMLU datasets. <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\/tree\/main\/tutorials\/llm\/reasoning\/evaluation\" target=\"_blank\" rel=\"noreferrer noopener\">Check out these scripts here<\/a>, which can be expanded to incorporate other benchmarking datasets. These scripts demonstrate dataset download and preparation, model deployment, and running the relevant benchmarks.<\/p>\n\n\n\n<p>The first step is to prepare the datasets for evaluation. We download the MMLU, GPQA Diamond, and GPQA Main datasets from Hugging Face and preprocess them so each record contains the question, the answer choices, and the correct answer as one of the multiple-choice options (\u201cA\u201d, \u201cB\u201d, \u201cC\u201d, \u201cD\u201d).<\/p>\n\n\n\n<p>Next, we will deploy and evaluate our trained adapter as well as the base model. 
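<\/p>

<p>The final scoring step described below, comparing ground-truth answers with model responses, boils down to pulling the last multiple-choice letter out of each generated response and computing accuracy. Here is a minimal sketch; the regex heuristic and function names are assumptions, not the tutorial\u2019s exact code.<\/p>

```python
import re

def extract_choice(response):
    """Return the last standalone A-D letter in a response, or None.

    A simple heuristic stand-in for the tutorial's answer-extraction step.
    """
    matches = re.findall(r"\b([ABCD])\b", response)
    return matches[-1] if matches else None

def accuracy(responses, answers):
    """Fraction of responses whose extracted letter matches the ground truth."""
    correct = sum(extract_choice(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)

# Example: two graded responses, one correct.
print(accuracy(["The final answer is B.", "So the answer is C."], ["B", "D"]))  # 0.5
```

<p>Taking the <em>last<\/em> matching letter matters for reasoning models, since the chain of thought may mention several candidate options before settling on one.<\/p>

<p>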
In this step, we start a server and deploy the model using <a href=\"https:\/\/docs.nvidia.com\/deeplearning\/triton-inference-server\/user-guide\/docs\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">Triton Inference Server<\/a>, which provides OpenAI-compatible API endpoints. The <code>\/v1\/chat\/completions\/<\/code> endpoint allows for multi-turn conversational interactions with the model. This endpoint accepts a structured list of messages with different roles (system, user, assistant) to maintain context and generate chat-like responses. Under the hood, a chat template is applied to turn the conversation into a single input string.<\/p>\n\n\n\n<p>To deploy the trained model with \u201cdetailed thinking on\u201d, we can use the following request payload:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nchat_payload = {\n        &quot;messages&quot;: &#x5B;{&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: &quot;detailed thinking on&quot;}, {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: prompt}],\n        &quot;model&quot;: model_name,\n        &quot;max_tokens&quot;: 20000,\n    }\n<\/pre><\/div>\n\n\n<p>Similarly, for \u201cdetailed thinking off\u201d mode, you can use the following payload:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nchat_payload = {\n        &quot;messages&quot;: &#x5B;{&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: &quot;detailed thinking off&quot;}, {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: prompt}],\n        &quot;model&quot;: model_name,\n        &quot;max_tokens&quot;: 20000,\n    }\n<\/pre><\/div>\n\n\n<p>The <code>max_tokens<\/code> value accounts for the tokens needed for the input, the system prompt, and the response from the model.<\/p>\n\n\n\n<p>Lastly, we compare the ground truth responses and model responses generated in 
the previous step, extracting the final answer from each response to calculate accuracy.<\/p>\n\n\n\n<p>Following this process, we observed the evaluation results below when comparing the base model with the trained adapter:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efe87a9fba0&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1425\" height=\"841\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks.png\" alt=\"benchmark of evaluation results comparing the base model with the trained adapter\" class=\"wp-image-103599\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks.png 1425w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-300x177.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-625x369.png 625w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-179x106.png 179w, 
https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-768x453.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-645x381.png 645w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-500x295.png 500w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-152x90.png 152w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-362x214.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-186x110.png 186w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-1024x604.png 1024w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/07\/Figure-4.-Evaluation-of-the-trained-LoRA-adapter-and-the-base-instruct-model-on-GPQA-and-MMLU-benchmarks-915x540.png 915w\" sizes=\"auto, (max-width: 1425px) 100vw, 1425px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on-async--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" 
width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><em>Figure 2. Evaluation of the trained LoRA adapter and the base instruct model on GPQA and MMLU benchmarks<\/em><\/figcaption><\/figure><\/div>\n\n\n<p>These results show that our trained LoRA adapter outperforms the base instruct model on various benchmarks, sometimes by as much as 10 points. These results are significant because our model was only trained for 48 hours on a relatively small number of training samples using a single GPU. LLM scaling laws predict that by increasing the number of training samples and the allotted training time, we could train even stronger reasoning models. <\/p>\n\n\n\n<p>If you prefer using a microservice instead of a framework to streamline your evaluation, check out the <a href=\"https:\/\/developer.nvidia.com\/nemo-evaluator?sortBy=developer_learning_library%2Fsort%2Ffeatured_in.nemo_evaluator%3Adesc%2Ctitle%3Aasc&amp;hitsPerPage=6\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA NeMo Evaluator<\/a> microservice, part of the <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/products\/nemo\/\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA NeMo<\/a> software suite for managing the AI agent lifecycle. 
NeMo Evaluator simplifies the end-to-end evaluation of generative AI applications and provides LLM-as-a-judge capabilities, along with a comprehensive suite of benchmarks and metrics for a wide range of custom tasks and domains, including reasoning, coding, and instruction-following.<\/p>\n\n\n\n<h2 id=\"conclusion_and_next_steps\"  class=\"wp-block-heading\">Conclusion and next steps<a href=\"#conclusion_and_next_steps\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>In this blog, we described a simple and computationally efficient recipe for training reasoning models with small amounts of training data curated from the Llama Nemotron Post-Training Dataset. We discussed a strategy based on LoRA adapter training and highlighted the key considerations and hyperparameters for successfully teaching a small language model to reason in 48 hours. Through evaluations, we demonstrated that our trained LoRA adapter significantly outperforms the base instruct model on the GPQA and MMLU datasets.<\/p>\n\n\n\n<p>Since our model was only trained on math and chat data, its reasoning abilities will remain generic. 
By introducing domain-specific data, you can train models that are proficient in the problem domain relevant to your application or business needs.&nbsp;<\/p>\n\n\n\n<p>To train your own reasoning models or replicate this tutorial, use the following links:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hugging Face: <a href=\"https:\/\/huggingface.co\/datasets\/nvidia\/Llama-Nemotron-Post-Training-Dataset\" target=\"_blank\" rel=\"noreferrer noopener\">Llama Nemotron Post-Training Dataset<\/a><\/li>\n\n\n\n<li>GitHub: <a href=\"https:\/\/github.com\/NVIDIA\/NeMo-Curator\/tree\/main\/tutorials\/llama-nemotron-data-curation\" target=\"_blank\" rel=\"noreferrer noopener\">Data Curation Code<\/a> with NVIDIA <a href=\"https:\/\/github.com\/NVIDIA\/NeMo-Curator\" target=\"_blank\" rel=\"noreferrer noopener\">NeMo Curator<\/a><\/li>\n\n\n\n<li>GitHub: <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\/tree\/main\/tutorials\/llm\/reasoning\" target=\"_blank\" rel=\"noreferrer noopener\">Training and Evaluation Code<\/a> with NVIDIA <a href=\"https:\/\/github.com\/NVIDIA\/NeMo\" target=\"_blank\" rel=\"noreferrer noopener\">NeMo Framework<\/a><\/li>\n<\/ul>\n\n\n\n<p>From here, you can scale this weekend recipe to any model of your choice. 
Visit <a href=\"https:\/\/github.com\/NVIDIA-NeMo\/Nemotron\/tree\/main\/use-case-examples\">GitHub<\/a> for our Nemotron cookbook recipes and tutorials.<\/p>\n\n\n\n<h2 id=\"learn_more_about_nvidia_nemotron\"  class=\"wp-block-heading\">Learn more about NVIDIA Nemotron<a href=\"#learn_more_about_nvidia_nemotron\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>Stay up-to-date on<a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/foundation-models\/nemotron\/\"> NVIDIA Nemotron<\/a> by subscribing to <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-data-science\/generative-ai\/news\/\">NVIDIA news<\/a> and following NVIDIA AI on <a href=\"https:\/\/www.linkedin.com\/showcase\/nvidia-ai\/posts\/?feedView=all\">LinkedIn<\/a>, <a href=\"https:\/\/x.com\/NVIDIAAIDev\">X<\/a>, <a href=\"https:\/\/discord.com\/invite\/nvidiadeveloper\">Discord<\/a>, and <a href=\"https:\/\/www.youtube.com\/@NVIDIADeveloper\">YouTube<\/a>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Visit our <a href=\"https:\/\/developer.nvidia.com\/nemotron\">Nemotron developer page<\/a> for all the essentials you need to get started with the most open, smartest-per-compute reasoning model.&nbsp;<\/li>\n\n\n\n<li>Explore new open Nemotron models and datasets on <a href=\"https:\/\/huggingface.co\/collections\/nvidia\/nvidia-nemotron-689f6d6e6ead8e77dd641615\">Hugging Face<\/a> and <a href=\"https:\/\/build.nvidia.com\/models?filters=publisher%3Anvidia&amp;q=Nemotron\">NIM microservices<\/a> and <a href=\"https:\/\/build.nvidia.com\/blueprints\">Blueprints<\/a> on <a href=\"http:\/\/build.nvidia.com\">build.nvidia.com<\/a>.&nbsp;<\/li>\n\n\n\n<li><a href=\"http:\/\/nemotron.ideas.nvidia.com\/?nvid=nv-int-tblg-905992\">Share your ideas<\/a> and vote on features to help shape the future of Nemotron.&nbsp;<\/li>\n\n\n\n<li>Tune into upcoming <a href=\"https:\/\/www.addevent.com\/calendar\/Og917781\">Nemotron livestreams<\/a> and connect with the NVIDIA Developer 
community through <a href=\"https:\/\/forums.developer.nvidia.com\/c\/ai-data-science\/nvidia-nemotron\/669\">the Nemotron developer forum<\/a> and the <a href=\"https:\/\/discord.com\/channels\/1019361803752456192\/1407781691698708682\">Nemotron channel<\/a> on <a href=\"https:\/\/discord.com\/invite\/nvidiadeveloper\">Discord.<\/a><\/li>\n\n\n\n<li>Browse <a href=\"https:\/\/www.youtube.com\/playlist?list=PL5B692fm6--vdRKB14FImVi7MTJ77zjn4\">video tutorials and livestreams<\/a> to get the most out of NVIDIA Nemotron.<\/li>\n<\/ul>\n\n\n\n<h2 id=\"acknowledgement\"  class=\"wp-block-heading\">Acknowledgement<a href=\"#acknowledgement\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>We would like to acknowledge <a href=\"https:\/\/developer.nvidia.com\/blog\/author\/cmunley\/\" target=\"_blank\" rel=\"noreferrer noopener\">Christian Munley<\/a> for his valuable assistance in this work.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Have you ever wanted to build your own reasoning models such as the open NVIDIA Nemotron, but thought it was too complicated or required massive resources? Think again. With NVIDIA\u2019s powerful tools and datasets, you can train a small, effective reasoning model in about 48 hours, all on a single GPU. 
Even better, we\u2019ve made &hellip; <a href=\"https:\/\/developer.nvidia.com\/blog\/train-a-reasoning-capable-llm-in-one-weekend-with-nvidia-nemo\/\">Continued<\/a><\/p>\n","protected":false},"author":2039,"featured_media":101380,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"publish_to_discourse":"","publish_post_category":"318","wpdc_auto_publish_overridden":"1","wpdc_topic_tags":"","wpdc_pin_topic":"","wpdc_pin_until":"","discourse_post_id":"1653675","discourse_permalink":"https:\/\/forums.developer.nvidia.com\/t\/train-a-reasoning-capable-llm-in-one-weekend-with-nvidia-nemo\/339932","wpdc_publishing_response":"success","wpdc_publishing_error":"","nv_subtitle":"","ai_post_summary":"<ul><li>NVIDIA&#039;s tools and datasets enable training a small reasoning model in about 48 hours on a single GPU.<\/li><li>The Llama Nemotron Post-Training Dataset, containing over 32 million samples, is used to teach a model controllable reasoning similar to Llama Nemotron capabilities.<\/li><li>To train a reasoning model, curate a focused subset of the dataset, select a base model with at least 8B parameters, and fine-tune using parameter-efficient fine-tuning (PEFT) with LoRA adapters.<\/li><li>Evaluation of the trained model involves benchmarking against the base model, using standard and domain-specific benchmarks, and manual inspection to verify controllability and 
consistency.<\/li><\/ul>","footnotes":"","_links_to":"","_links_to_target":""},"categories":[3110,1903],"tags":[3965,4817,453,3933,2932],"coauthors":[3748,4554,4534,2946],"class_list":["post-103512","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-generative-ai","category-features","tag-ai-agent","tag-build-ai-agent","tag-featured","tag-llama","tag-large-language-models","tagify_workload-generative-ai","tagify_workload-data-science"],"acf":{"post_industry":["General"],"post_products":["NeMo","NeMo Curator","NeMo Evaluator","Nemotron"],"post_learning_levels":["Intermediate Technical"],"post_content_types":["Tutorial"],"post_collections":""},"jetpack_featured_media_url":"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/06\/nemotron-featured.png","primary_category":{"category":"Agentic AI \/ Generative AI","link":"https:\/\/developer.nvidia.com\/blog\/category\/generative-ai\/","id":3110,"data_source":""},"nv_translations":[{"language":"zh_CN","title":"\u4f7f\u7528 NVIDIA NeMo \u5728\u4e00\u4e2a\u5468\u672b\u5185\u8bad\u7ec3\u4e00\u4e2a\u5177\u5907\u63a8\u7406\u80fd\u529b\u7684 
LLM","post_id":14687}],"jetpack_shortlink":"https:\/\/wp.me\/pcCQAL-qVy","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/103512","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/users\/2039"}],"replies":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/comments?post=103512"}],"version-history":[{"count":36,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/103512\/revisions"}],"predecessor-version":[{"id":110916,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/103512\/revisions\/110916"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media\/101380"}],"wp:attachment":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media?parent=103512"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/categories?post=103512"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/tags?post=103512"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/coauthors?post=103512"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}