A Retired Microsoft Engineer's Authoritative Breakdown of the DeepSeek R1 Model
Hey, I'm Dave. Welcome to my shop. I'm Dave Plummer, a retired software engineer from
Microsoft going back to the MS-DOS and Windows 95 days, and today we're tackling a seismic
shift in the world of technology. The release of China's open source AI model DeepSeek R1.
This development has been described as nothing less than a Sputnik moment by Marc Andreessen
and for good reason. Just as the launch of Sputnik challenged assumptions about American
technological dominance in the 20th century, DeepSeek R1 is forcing a reckoning in the
21st. For years, many believed that the race for AI supremacy was firmly in the hands of
the established players like OpenAI and Anthropic. But with this breakthrough, a new competitor
has not just entered the field, they've also seriously outpaced expectations. If you care
about the future of AI innovation and global technological competition, you'll want to
understand what DeepSeek R1 is, why it matters, whether it's just a giant psyop, and what
it means for the world at large. Let's dive in.
To set the stage, here's the part that really upset the industry and sent the stocks of
companies like Nvidia and Microsoft reeling. Not only does DeepSeek R1 meet or exceed the
performance of the best American AI models like OpenAI's O1, they did it on the cheap,
reportedly for under $6 million. And when you compare that to the tens of billions, if not more,
already invested here in the US to achieve similar results, not to mention the $500 billion discussion
around Stargate, it's cause for alarm. Because not only does China claim to have done it
cheaply, but they reportedly did it without access to the latest of Nvidia's chips. If
true, it's akin to building a Ferrari in your garage out of spare Chevy parts. And if you
can throw together a Ferrari in your shop on your own and it's really just as good as
a regular Ferrari, what do you think that does to Ferrari prices? So it's a little bit
like that.
And just what is DeepSeek R1? It's a new language model designed to offer performance that punches
above its weight. Trained on a smaller scale, but still capable of answering questions,
generating text and understanding context. And what sets it apart isn't just the capabilities,
but the way that it's been built. DeepSeek is designed to be cheap, efficient and surprisingly
resourceful, leveraging larger foundational AIs like OpenAI's GPT-4 or Meta's Llama as
scaffolding to create something far leaner.
Let's unpack that. Because at its core, DeepSeek R1 is a distilled language model. When you
train a large AI model, you end up with something massive, hundreds of billions if not a trillion
parameters, consuming terabytes of data and requiring a data center's worth of GPUs just
to function. But what if you don't need all that power for most tasks?
And that's where the idea of distillation comes in. You take a larger model like a GPT-4
or the 671 billion parameter behemoth R1 and you use it to train the smaller ones. It's
like a master craftsman teaching an apprentice. You don't need the apprentice to know everything,
just enough to do the actual job really well.
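To make the master-and-apprentice analogy concrete, here's a minimal sketch of classic knowledge distillation in PyTorch. It's a generic illustration under my own assumptions about temperature and loss weighting, not DeepSeek's published training recipe.

```python
# Minimal sketch of knowledge distillation (a generic illustration, not
# DeepSeek's actual training code): the student learns to match the
# teacher's softened output distribution rather than just the hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    # A temperature above 1 exposes how the teacher ranks the wrong answers,
    # not just which answer it picks; that is most of the transferable signal.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Inside a training loop, the teacher runs in inference mode and only the
# student's weights are updated:
#   with torch.no_grad():
#       teacher_logits = teacher(batch).logits
#   loss = distillation_loss(student(batch).logits, teacher_logits)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```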
DeepSeek R1 takes this approach to an extreme. By using larger models to guide its training,
DeepSeek's creators have managed to compress the knowledge and reasoning capabilities of
much bigger systems into something far smaller and more lightweight. The result? A model
that doesn't need massive data centers to operate. You can run these smaller variants
on a decent consumer-grade CPU or even a beefy laptop, and that's a game changer.
But how does this work? Well, it's a bit like teaching by example. Let's say you have a
large model that knows everything about astrophysics, Shakespeare, and Python coding. And instead
of trying to replicate that raw computational power, DeepSeek R1 is trying to mimic the
outputs of the larger model for a wide range of questions and scenarios. By carefully selecting
examples and iterating over the training process, you can teach the smaller model to produce
similar answers without needing to store all that raw information itself. It's kind of
like copying the answers without the entire library.
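In practice, that "copying the answers" step often looks like plain supervised fine-tuning on teacher-generated text. Here's a rough sketch; generate_with_teacher and finetune are hypothetical placeholders for whatever inference and training stack you'd actually use, not real DeepSeek APIs.

```python
# Sketch of sequence-level distillation: the smaller model is fine-tuned on
# text the larger model actually produced. generate_with_teacher() and
# finetune() are hypothetical placeholders, not real library calls.

prompts = [
    "Explain why the sky is blue.",
    "Write a Python function that reverses a string.",
    # ...thousands to millions of carefully chosen prompts
]

def build_distillation_set(prompts, generate_with_teacher):
    dataset = []
    for prompt in prompts:
        answer = generate_with_teacher(prompt)  # the big model writes the answer
        dataset.append({"prompt": prompt, "completion": answer})
    return dataset

# The student never sees the teacher's weights or its training corpus,
# only its behavior on these examples:
#   finetune(student_model, build_distillation_set(prompts, generate_with_teacher))
```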
And here's where it gets even more interesting. Because DeepSeek didn't just rely on a single
large model for the process, it used multiple AIs, including some open source ones like
Meta's Llama, to provide diverse perspectives and solutions during the training. Think
of it as assembling a panel of experts to train one exceptionally bright student. By
combining insights from different architectures and datasets, DeepSeek R1 achieves a level
of robustness and adaptability that's rare in such a small model.
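Extending the same sketch to the panel-of-experts idea is mostly bookkeeping: pool completions from several teachers and record who said what. Again, the teachers dict and its generation callables are hypothetical stand-ins, not anyone's actual pipeline.

```python
# Sketch of multi-teacher distillation data: pool completions from several
# teacher models into one training set, keeping provenance for each example.

def build_multi_teacher_set(prompts, teachers):
    """teachers maps a model name to a callable that turns a prompt into a completion."""
    dataset = []
    for prompt in prompts:
        for name, generate in teachers.items():
            dataset.append({
                "prompt": prompt,
                "completion": generate(prompt),
                "teacher": name,  # provenance, so answers can be weighted or filtered later
            })
    return dataset
```

Keeping the teacher label around means weak or conflicting answers can be filtered or down-weighted before the student ever sees them.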
It's too early to draw very many conclusions, but the open source nature of the model means
that any biases or filters built into the model should be discoverable in the publicly
available weights. Which is a fancy way of saying that it's hard to hide that stuff when
the model is open source.
In fact, one of my first tests was to ask DeepSeek what famous photo depicts a man standing
in front of a line of tanks. It correctly answered the Tiananmen Square protests, the
circumstances of the photo, who took it, and even the censorship issues surrounding it.
Of course, the online version of DeepSeek may be completely different because I'm running
it offline locally, and who knows what version they get within China, but the public version
that you can download seems solid and reliable.
So why does all this matter? For one, it dramatically lowers the barrier to entry for AI. Instead
of requiring massive infrastructure and your own nuclear power plant to deploy a large
language model, you could potentially get by with a much smaller setup. That's good
news for smaller companies, research labs, or even hobbyists looking to experiment with
AI without breaking the bank.
In fact, I'm running it on an AMD Threadripper that's equipped with an Nvidia RTX 6000 Ada
GPU that has 48GB of VRAM, and I can run the very largest 671 billion parameter model and
it still generates more than 4 tokens per second. And even the 32 billion version runs
nicely on my MacBook Pro, and the smaller ones run down to the Orin Nano for $249.
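If you want to poke at one of the smaller distilled variants yourself, a minimal local-inference sketch with the Hugging Face transformers library looks something like this; the exact repository name is my assumption, so check the model hub for a checkpoint size that fits your VRAM.

```python
# Trying a distilled variant locally with Hugging Face transformers.
# A sketch only: the repo id below is an assumption; pick whichever
# checkpoint size fits your hardware. (pip install transformers accelerate)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread layers across GPU/CPU as memory allows
)

prompt = "Explain knowledge distillation in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```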
But there's a catch. Building something on the cheap has some risks. For starters, smaller
models often struggle with the breadth and depth of knowledge that the larger ones have.
They're more prone to hallucinations, sometimes generating confident but incorrect responses,
and they might not be as good at handling highly specialized or nuanced queries. Additionally,
because these smaller models rely on training data from the larger ones, they're only as
good as their teachers. So if there are errors or biases in the large models that they train
on, those issues can trickle down into the smaller ones.
And then there's the issue of scaling. DeepSeek's efficiency is impressive, but it also highlights
the tradeoffs involved. By focusing on cost and accessibility, DeepSeek R1 might not compete
directly with the biggest players in terms of cutting edge capabilities. Instead, it
carves out an important niche for itself as a practical, cost-effective alternative.
In some ways, this approach reminds me a bit of the early days of personal computing. Back
then you had massive mainframes dominating the industry, and then along came these scrappy
little PCs that couldn't quite do everything but were good enough for a lot of the
world. Fast forward a few decades and the PC revolutionized computing. DeepSeek might not
be GPT-5, but it could pave the way for a more democratized AI landscape where advanced
tools aren't confined to a handful of tech giants.
The implications here are huge. Imagine AI models tailored to specific industries, running
on local hardware for privacy and control, or even embedded in devices like smartphones
and smart home hubs. The idea of having your own personal AI assistant, one that doesn't
rely on a massive cloud backend, suddenly feels a lot more attainable. Of course, the
road ahead isn't without its challenges. DeepSeek and models like it must prove that they can
handle real-world tasks reliably, scale effectively, and continue to innovate in a space dominated
so far by much larger competitors.
But if there's one thing we've learned from the history of technology, it's that innovation
doesn't always come from the biggest players. Sometimes all it takes is a fresh perspective
and a willingness, or sometimes a necessity, to do things differently. DeepSeek R1 signals
that China is not just a participant in the global AI race, but a formidable competitor
capable of producing cutting-edge open-source models. For American AI companies like OpenAI,
Google, DeepMind, and Anthropic, this creates a dual challenge - maintaining technological
leadership and justifying the price premium in the face of increasingly capable, cost-effective
alternatives.
So what are the implications for American AI? Well, open-source models like DeepSeek
R1 allow developers worldwide to innovate at lower cost. This could undermine the competitive
advantage of proprietary models, particularly in areas like research and small to medium
enterprise adoption. US companies that rely heavily on subscription or API-based revenue
could feel the squeeze, potentially dampening investor enthusiasm.
The release of DeepSeek R1 as open-source software also democratizes access to powerful AI capabilities.
Companies and governments around the world can build upon its foundation without the
licensing fears or the restrictions imposed by US firms. This could accelerate AI adoption
globally but reduce demand for US-developed models, impacting revenue streams for firms
like OpenAI and Google Cloud. In the stock market, companies heavily reliant on AI licensing,
cloud infrastructure, NVIDIA's chips, or API integrations could face downward pressure
as investors factor in lower projected growth or increased competition.
In the intro, I made a side reference to the potential of this psyop angle, and while I'm
not much of a conspiracy theorist myself, some have argued that perhaps we should not
take the Chinese at their word when it comes to how the model was produced. If it really
was produced on second-tier hardware for just a few million dollars, it's major. But some
argue that perhaps China invested heavily at the state level to assist, hoping to upset
the status quo in America by making what is supposed to be very hard look cheap
and easy. But only time will tell.
So that's DeepSeek R1 in a nutshell. A scrappy little AI, punching above its weight, built
using clever techniques and designed to make advanced AI accessible to more people than
ever before. It's not perfect, it's not trying to be, but it's a fascinating glimpse into
what the future of AI might look like - lightweight, efficient, and a little rough around the edges,
but full of potential.
Now if you found this little explainer on DeepSeek to be any combination of informative
or entertaining, remember that I'm mostly in this for the subs and likes, so I'd be
honoured if you'd consider subscribing to my channel to get more like it.
There's also a share button down in the bottom here, so somewhere in your toolbar there'll
be a forward icon which you can click on to send this to somebody else that you
think probably wants to be educated and just doesn't know about this channel. So if you
want to tell them about DeepSeek R1, send them a link to this video.
If you have any interest in matters related to the autism spectrum, check out the free
sample of my book on Amazon. It's everything I know now about living your best life on
the spectrum that I wish I'd known long ago.
In the meantime and in between time, hope to see you next time, right here in Dave's
Garage.