Note that $\frac{\partial L}{\partial \theta}$ is different from $\frac{\partial \theta}{\partial L}$. What you are describing appears to be $\frac{\partial L}{\partial \theta}$, where $\theta$ is the variable being optimized. If $\theta$ is high-dimensional, we often just use the $\nabla$ notation and write $\nabla L(\theta)$.

Gradient descent is the update rule below, where $\gamma > 0$ is the step size (learning rate):

$$\theta_{n+1}=\theta_n-\gamma \nabla L(\theta_n)$$
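As a minimal sketch of this update in code: the quadratic loss, its minimum at $(3, -1)$, the step size `gamma = 0.1`, and the function names are all assumptions chosen for illustration, not anything from the question.

```python
import numpy as np

# Hypothetical target; the loss below has its minimum here.
TARGET = np.array([3.0, -1.0])

def loss(theta):
    # Illustrative quadratic loss L(theta) = ||theta - TARGET||^2.
    return np.sum((theta - TARGET) ** 2)

def grad_loss(theta):
    # Analytic gradient of the quadratic loss above.
    return 2.0 * (theta - TARGET)

def gradient_descent(theta0, gamma=0.1, steps=100):
    # Repeatedly apply theta_{n+1} = theta_n - gamma * grad L(theta_n).
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - gamma * grad_loss(theta)
    return theta

theta_star = gradient_descent([0.0, 0.0])
print(theta_star, loss(theta_star))
```

With this loss, each step shrinks the distance to the minimum by a constant factor, so the iterates approach `TARGET`. A step size that is too large for a given loss would make them diverge instead.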

<ul>
<li>Not everything is differentiable, and the gradient might not be well defined for some optimization problems.</li>
<li>If there are constraints, $L$ might need to be the Lagrangian rather than the objective function itself.</li>
<li>Gradient descent is just one means of finding the parameters of a model. Gradient-based approaches are the norm for now, but that can change.</li>
</ul>
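
To make the constrained case in the second bullet concrete, here is the standard form, assuming an objective $f(\theta)$, equality constraints $g_i(\theta) = 0$, and multipliers $\lambda_i$ (these symbols are introduced here for illustration only):

$$L(\theta, \lambda) = f(\theta) + \sum_i \lambda_i \, g_i(\theta)$$

A constrained optimum is then sought among the stationary points of $L$, where $\nabla_\theta L(\theta, \lambda) = 0$ and $g_i(\theta) = 0$ for every $i$, rather than among the points where $\nabla f(\theta) = 0$.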

What you proposed to tattoo is just the "gradient" or "slope". That is not an objection; I just want you to know what you would be getting.

Apart from your tattoo: in gradient descent, the loss function is what needs to be minimised, so it is our objective function in this case.

The gradient descent update rule states that

$\Large \theta_{ij} \leftarrow \theta_{ij} - \gamma \frac{\partial L}{\partial \theta_{ij}}$

where $\theta$ is the parameter to be optimised and $\gamma$ is the learning rate. This is the fundamental equation of gradient descent, which is used as the optimization algorithm in nearly all AI/ML tasks.
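
The per-entry rule can be sketched as follows. This is only an illustration under stated assumptions: the linear model, the squared-error loss, the finite-difference estimate of each partial derivative, and the step size `gamma = 0.1` are all hypothetical choices. Real frameworks compute these partials with automatic differentiation rather than finite differences.

```python
import numpy as np

def loss(theta, x, y):
    # Hypothetical squared-error loss for a linear model y ~ theta @ x.
    return 0.5 * np.sum((theta @ x - y) ** 2)

def partial(theta, x, y, i, j, eps=1e-6):
    # Central finite-difference estimate of dL/dtheta_ij.
    bump = np.zeros_like(theta)
    bump[i, j] = eps
    return (loss(theta + bump, x, y) - loss(theta - bump, x, y)) / (2 * eps)

def step(theta, x, y, gamma=0.1):
    # Update every entry theta_ij by moving against its own partial derivative.
    new_theta = theta.copy()
    for i in range(theta.shape[0]):
        for j in range(theta.shape[1]):
            new_theta[i, j] = theta[i, j] - gamma * partial(theta, x, y, i, j)
    return new_theta

x = np.array([1.0, 2.0])
y = np.array([1.0, 0.0])
theta = np.zeros((2, 2))
before = loss(theta, x, y)
theta = step(theta, x, y)
after = loss(theta, x, y)
print(before, after)  # the loss decreases after one step
```

Each weight is adjusted using only its own partial derivative, and collecting all of those partials into one array gives the gradient $\nabla L(\theta)$.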

When I look at the following partial derivative, I see it as the key element of any optimization algorithm out there. Correct me if I'm wrong, but it gives us the slope of the loss function, so we can move opposite to that slope and thereby minimize the loss.

$$\frac{\partial \theta}{\partial \mathcal{L}}$$

where $\theta$ is the weights and $\mathcal{L}$ is the loss.

<hr>

Does that make sense? Is there any other calculation step that is arguably more fundamental to the optimization of neural networks than this derivative?

This topic is especially important to me right now because I am thinking of tattooing this derivative as a cool A.I. tattoo, and I want it to be fundamental and simple.

Neural networks, optimization math intuition
