The DeepSeek-R1 Technical Report [For Non-Tech People]

PLUS: Top 5 Innovative Use Cases Across the World

What Makes DeepSeek-R1 Special?

DeepSeek-R1 is a Mixture-of-Experts (MoE) AI model. Instead of using its entire neural network for every task, it activates only the most relevant parts, making it efficient and scalable.

💡 Think of it like a university with 671 billion professors. Instead of making all of them work on every problem, DeepSeek-R1 picks just the best 37 billion for each task.
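The routing idea can be sketched in a few lines of Python. This is a minimal illustration, not DeepSeek's implementation: the experts, the `moe_forward` function, and the scoring rule are all made up, and 8 toy experts stand in for the real model's hundreds of billions of parameters.

```python
def moe_forward(token, experts, k=2):
    """Toy Mixture-of-Experts layer: a router scores all experts,
    then only the top-k highest-scoring experts actually run."""
    n = len(experts)
    # Hypothetical router: favour experts whose index is "near" the token value.
    scores = [-abs(token % n - i) for i in range(n)]
    top_k = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    outputs = [experts[i](token) for i in top_k]  # only k experts do any work
    return sum(outputs) / k

activated = []  # records which experts actually fired

def make_expert(i):
    def expert(x):
        activated.append(i)
        return (i + 1) * x
    return expert

experts = [make_expert(i) for i in range(8)]  # 8 toy experts
out = moe_forward(10.0, experts, k=2)         # only 2 of the 8 run for this token
```

The key property is visible in `activated`: only 2 of the 8 experts ever execute, which is exactly why activating 37B of 671B parameters saves so much compute.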

DeepSeek-R1 matches or outperforms leading LLMs on several reasoning benchmarks

Key Numbers That Matter

  • 671 billion total parameters (brain-like connections), but only 37 billion are used per task. It’s among the largest models in terms of parameters.

  • 14.8 trillion tokens of diverse, high-quality training data (the AI version of reading almost everything on the internet).

  • $5.57 million in total training cost (far cheaper than GPT-4o, which likely cost hundreds of millions).

  • 2.788 million GPU hours, trained on NVIDIA H800 GPUs (powerful AI chips).

DeepSeek-R1 is massive, but it’s designed in a way that saves time, energy, and money.
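The headline cost actually follows from the GPU-hours figure. A quick back-of-the-envelope check, assuming the roughly $2-per-H800-GPU-hour rental rate used in the report (this is arithmetic on the reported numbers, not an official cost breakdown):

```python
# Reported figure: 2.788 million H800 GPU-hours.
gpu_hours = 2_788_000
price_per_gpu_hour = 2.0  # USD, assumed rental rate per H800 GPU-hour

total_cost_usd = gpu_hours * price_per_gpu_hour
headline = f"${total_cost_usd / 1e6:.3f} million"  # → "$5.576 million"
```

Multiplying the two reported numbers recovers the ~$5.57 million headline almost exactly.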

How Did They Build It? (And Why It’s So Cheap to Train)

DeepSeek-R1 isn't just powerful—it’s extremely cost-efficient. This was possible due to several smart design choices.

DeepSeek-R1 cost less to train, and costs less for end users like you and me, than GPT

💡 1. Smarter Attention System (Multi-Head Latent Attention, MLA)

  • Technical Term: Multi-Head Latent Attention (MLA) optimizes how the model focuses on different words in a sentence.

  • Simple Explanation: Imagine you’re at a noisy party trying to hear one conversation. A normal AI listens to everything (wasting energy), while DeepSeek-R1 focuses only on the important parts.

  • Why It Matters: This reduces memory usage, speeds up responses, and makes training more efficient.
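The core trick behind MLA is low-rank compression of the attention cache: instead of storing each token's full keys and values, the model caches one small latent vector per token and rebuilds the full vectors on demand. A toy sketch of that idea; the dimensions and projection weights below are invented for illustration, not real model weights:

```python
HIDDEN, LATENT = 8, 2  # toy sizes; the real model uses far larger dimensions

def matvec(matrix, vec):
    """Multiply a (rows x len(vec)) matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

# Hypothetical learned projections (fixed toy values).
W_down = [[0.1] * HIDDEN for _ in range(LATENT)]  # hidden -> small latent
W_up = [[0.5] * LATENT for _ in range(HIDDEN)]    # latent -> full-size key

hidden_state = [float(i) for i in range(HIDDEN)]  # one token's hidden vector

latent = matvec(W_down, hidden_state)  # ONLY this small vector is cached
key = matvec(W_up, latent)             # full key rebuilt when needed

cache_saving = HIDDEN / LATENT         # 4x less cache memory per token here
```

Caching 2 numbers per token instead of 8 is where the memory saving comes from; the trade-off is a little extra compute to rebuild keys and values at attention time.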

💡 2. More Efficient Training (FP8 Mixed Precision Training & DualPipe Algorithm)

Training an AI model is like teaching a student—it takes time and resources. DeepSeek-R1 optimizes this in two major ways:

FP8 Mixed Precision Training

  • Technical Term: Uses lower-precision numbers (FP8 instead of FP16 or FP32) to store and process data.

  • Simple Explanation: Imagine writing notes in shorthand instead of full sentences—you save space but still understand everything.

  • Why It Matters: This allows DeepSeek-R1 to use less memory and train faster without losing accuracy.
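The shorthand analogy can be made concrete. FP8 keeps far fewer mantissa (precision) bits than FP16 or FP32, so every number is stored more coarsely but in a quarter of the memory of FP32. Here is a toy simulation of rounding to 3 mantissa bits, as in the E4M3 FP8 format; real FP8 also clamps the exponent range, which this sketch ignores:

```python
import math

def fake_fp8(x, mantissa_bits=3):
    """Toy E4M3-style rounding: keep only 3 bits of mantissa precision."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

x = 3.14159
q = fake_fp8(x)                       # 3.0 -- coarse, but the right magnitude
rel_error = abs(x - q) / x            # about 4.5% on this value
```

Each "FP8" value here would occupy 1 byte instead of FP32's 4, and in practice the small rounding error washes out during training, which is why mixed-precision schemes keep accuracy while cutting memory.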

DualPipe Algorithm

  • Technical Term: A pipeline parallelism strategy that overlaps different steps of training to avoid delays.

  • Simple Explanation: Imagine cooking a meal—instead of waiting for the pasta to finish before starting the sauce, you do both at the same time.

  • Why It Matters: This dramatically cuts training time and makes DeepSeek-R1 much cheaper to develop.
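The cooking analogy in a few lines of Python: a toy schedule comparison where communication either waits for compute (naive) or hides behind it (overlapped). The unit costs are made up, and this shows only the general compute/communication-overlap principle, not the actual DualPipe scheduler:

```python
def sequential_time(batches, compute=1, communicate=1):
    """Naive schedule: finish computing a batch, then wait for its
    results to be communicated before starting the next batch."""
    return batches * (compute + communicate)

def overlapped_time(batches, compute=1, communicate=1):
    """Overlapped schedule: while batch i's results are being sent,
    batch i+1 is already computing, so communication hides behind compute."""
    return batches * max(compute, communicate) + min(compute, communicate)

naive = sequential_time(10)       # 20 time units
overlapped = overlapped_time(10)  # 11 time units
```

With equal compute and communication costs, overlapping nearly halves the total time; the real gains depend on how well the two phases actually line up on the hardware.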

Why DeepSeek-R1 is Shaking Up the AI and Chip Industry

DeepSeek-R1 isn’t just another AI model—it’s a potential game-changer for both AI companies and the chip industry.

1. AI at a Fraction of the Cost

AI training is usually incredibly expensive, but DeepSeek-R1 shows that a top-tier model can be trained for roughly $5.57 million in GPU time, compared to the hundreds of millions likely spent on models like GPT-4o.

NVIDIA H100 GPUs, widely used to train LLMs, cost around $25,000 each

This could democratize AI, making it accessible to startups and businesses that couldn’t afford it before.

2. NVIDIA’s AI Chip Monopoly at Risk?

AI models typically rely on NVIDIA’s expensive GPUs. But DeepSeek-R1’s efficient training means fewer GPUs are needed, potentially reducing demand for NVIDIA chips.

NVIDIA stock fell almost 17% after the DeepSeek-R1 launch



If others adopt these methods, NVIDIA could lose dominance, and China may accelerate the development of its own AI chips.

3. Open-Source Disrupts the AI Market

Unlike GPT-4o, DeepSeek-R1 is released openly: its weights are free to download under a permissive MIT license, so anyone can use, study, and build on it. This could accelerate AI innovation globally, forcing big tech companies to rethink their closed strategies.

Top 5 Use Cases of DeepSeek-R1 Around the World

  1. Best model to use in VS Code

  2. Someone built Perplexi-Tube, a Perplexity for YouTube, with DeepSeek R1 in 20 minutes

  3. Someone built an FPS shooter game with DeepSeek R1

  4. A browser agent like ChatGPT Operator without paying $200

  5. AGI at home with 7 M4 Mac Minis

The DeepSeek-R1 Technical Report PDF:
