Chuck Darwin

#BigCode is an open scientific collaboration working on responsible training of large language models for coding applications.

In this organization you can find the artefacts of this collaboration:
👉 #StarCoder, a state-of-the-art language model for code,
👉 The #Stack, the largest available pretraining dataset of permissively licensed code, and
👉 #SantaCoder, a 1.1B parameter model for code.

#StarCoder is a 15.5B parameter language model for code, trained on 1T tokens covering 80+ programming languages.
It uses multi-query attention (MQA) for efficient generation, has an 8,192-token context window, and can do fill-in-the-middle.

Chat with StarCoder here: https://huggingface.co/chat/?model=bigcode/starcoder

https://huggingface.co/bigcode
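
For a sense of how fill-in-the-middle works in practice, here is a minimal sketch using the Hugging Face transformers library. It assumes access to the gated bigcode/starcoder checkpoint on the Hub; the <fim_prefix>/<fim_suffix>/<fim_middle> special tokens are part of the StarCoder vocabulary, and the snippet being completed is just an illustrative example:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumes you have accepted the model license and are logged in to the Hub;
    # device_map="auto" additionally requires the accelerate package.
    checkpoint = "bigcode/starcoder"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

    # Fill-in-the-middle: the model generates the span between a given
    # prefix and suffix, delimited by StarCoder's FIM special tokens.
    prefix = "def print_hello():\n    "
    suffix = "\n\nprint_hello()"
    prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0]))

The decoded output contains the prompt followed by the generated middle span (the function body, in this example), so a real application would cut the completion off at the end-of-text token and splice it back between prefix and suffix.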