Prime Highlights:
DeepSeek-V3 is now available under the widely used MIT open-source license, offering greater flexibility for developers and commercial use.
The updated model is more capable and hardware-efficient than its predecessor, with improved programming abilities, particularly in Python and Bash code generation.
Key Background:
DeepSeek has released an improved version of its DeepSeek-V3 large language model (LLM) under the widely used MIT open-source license. This new release builds on the initial version, offering increased capabilities and hardware efficiency, marking a significant update in the open-source AI landscape.
Originally launched in December, DeepSeek-V3 serves as a general-purpose model, capable of solving basic math problems and generating code, though it was not optimized for reasoning. That role belongs to DeepSeek-R1, the company's reasoning-focused model, which gained attention earlier this year. While DeepSeek-V3 is not tailored for reasoning tasks, it has earned recognition for its general utility across a range of applications.
Previously distributed under a custom open-source license, DeepSeek-V3 is now available under the MIT License, providing developers the freedom to incorporate the model into commercial projects and modify it without significant restrictions. This shift allows broader access to the model, particularly for developers looking for flexibility in deployment.
One of the most significant improvements in the latest release is its enhanced hardware efficiency. Awni Hannun, a researcher at Apple Inc., demonstrated that the model can run on a high-end Mac Studio, generating roughly 20 tokens per second. This was achieved with four-bit quantization, which reduces memory usage and latency at a slight cost to output accuracy. The updated model also performs better on programming tasks.
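The idea behind four-bit quantization is to store each weight as one of 16 integer levels plus a shared per-group scale, shrinking memory roughly 8x versus float32. The sketch below (plain numpy, illustrative group size; real implementations such as MLX's pack values and differ in detail) shows group-wise affine quantization and the small reconstruction error it introduces:

```python
import numpy as np

def quantize_4bit(weights, group_size=32):
    """Group-wise affine quantization to 4-bit levels (0..15).

    Simplified sketch: real 4-bit schemes pack two values per byte
    and choose scales differently; this only shows the principle.
    """
    flat = weights.reshape(-1, group_size)
    w_min = flat.min(axis=1, keepdims=True)
    w_max = flat.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 16 representable levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((flat - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, w_min, shape):
    # Map integer levels back to approximate float weights.
    return (q.astype(np.float32) * scale + w_min).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s, m = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, m, w.shape)
# Rounding bounds the per-weight error by half a quantization step.
err = np.abs(w - w_hat).max()
```

The trade-off the article mentions is visible here: storage drops to 4 bits per weight (plus small per-group overhead), while each weight is reconstructed only to within half a quantization step.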
Benchmark tests revealed that the new DeepSeek-V3 outperforms the original in generating Python and Bash code, scoring around 60%. However, it still lags behind DeepSeek-R1 and other reasoning-optimized models such as Qwen-32B. Although the latest DeepSeek-V3 has 671 billion parameters in total, its mixture-of-experts design activates only about 37 billion per token, making it far cheaper to run than a dense model of comparable size. Fine-tuning the model on responses from DeepSeek-R1 has further improved its performance, allowing it to operate with less infrastructure and lowering inference costs.
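The 671B-total / 37B-active split comes from sparse mixture-of-experts routing: a small router scores all experts per token, but only the top few actually run. A toy numpy sketch of that pattern (illustrative sizes and a softmax-over-top-k router; DeepSeek's actual router, expert counts, and load balancing differ):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    Most experts stay idle for any given token, which is why total
    parameter count can far exceed the parameters active per token.
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    top = np.argsort(logits, axis=1)[:, -top_k:] # ids of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                 # softmax over top_k only
        for w_i, e_i in zip(weights, top[t]):
            out[t] += w_i * experts[e_i](x[t])   # run only chosen experts
    return out, top

# Toy setup: 8 experts, each a small linear map; only 2 run per token.
rng = np.random.default_rng(0)
d, n_experts, tokens = 16, 8, 4
expert_mats = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
experts = [(lambda m: (lambda v: m @ v))(m) for m in expert_mats]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal((tokens, d))
y, chosen_ids = moe_forward(x, gate_w, experts)
```

With top_k=2 of 8 experts, only a quarter of the expert parameters are touched per token; scaled up, the same routing idea lets a 671B-parameter model activate only ~37B per token.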