
This is an open research question. The choice of activation is also closely intertwined with the architecture of the model and the computation / resources available, so it's not something that can be answered in isolation. The paper <a href="http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf" rel="nofollow noreferrer">Efficient Backprop, Yann LeCun et al.</a> has a lot of good insights into what makes a good activation function.
That being said, here are some toy examples that may help build intuition for activation functions. Consider a simple MLP with one hidden layer and a simple classification task:
<a href="https://i.stack.imgur.com/stp7y.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/stp7y.png" alt="enter image description here" /></a>
In the last layer we can use <code>sigmoid</code> together with the <code>binary_crossentropy</code> loss, which lets us borrow intuition from logistic regression: the last layer is simply doing logistic regression on the learned features that the hidden layer passes to it.
What types of features are learned depends on the activation function used in that hidden layer and the number of neurons in that hidden layer.
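To make that setup concrete, here is a minimal pure-Python sketch of the forward pass: a hidden layer with a swappable activation, followed by logistic regression on the hidden features. The weights, the input point, and the helper names (<code>mlp_forward</code>, <code>binary_crossentropy</code> as a plain function) are made up for illustration and are not taken from the network shown in the figures.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W1, b1, w2, b2, hidden_activation):
    """One hidden layer, then logistic regression on the hidden features."""
    hidden = [
        hidden_activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, sigmoid(logit)

def binary_crossentropy(y_true, p, eps=1e-12):
    """Loss for a single example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Hypothetical weights: two hidden neurons on a 2-D input.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [2.0, 2.0]
b2 = -1.0

x = [0.5, -0.5]
features_relu, p_relu = mlp_forward(x, W1, b1, w2, b2, relu)
features_tanh, p_tanh = mlp_forward(x, W1, b1, w2, b2, math.tanh)
loss_relu = binary_crossentropy(1, p_relu)

print("ReLU features:", features_relu, "p =", p_relu)
print("tanh features:", features_tanh, "p =", p_tanh)
```

With identical weights, only the hidden activation differs, yet the last layer sees different features and so produces a different probability. That is the sense in which the activation decides what the logistic regression has to work with.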
Here is what ReLU learns when using two hidden neurons:
<a href="https://miro.medium.com/max/2000/1*5nK725uTBUeoIA0XjEyA_A.gif" rel="nofollow noreferrer">https://miro.medium.com/max/2000/1*5nK725uTBUeoIA0XjEyA_A.gif</a>
(on the left is what the decision boundary looks like in the feature space)
As you add more neurons you get more pieces with which to approximate the decision boundary. Here it is with 3 hidden neurons:
<a href="https://i.stack.imgur.com/Y7O6O.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/Y7O6O.png" alt="enter image description here" /></a>
And 10 hidden neurons:
<a href="https://i.stack.imgur.com/NXMdp.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/NXMdp.png" alt="enter image description here" /></a>
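The "more neurons, more pieces" effect can be checked in a simplified 1-D analogue, where each hidden ReLU contributes one kink at the input where its pre-activation crosses zero. The sketch below assumes a hypothetical three-neuron network with made-up weights; it is not the 2-D network from the figures.

```python
def relu(z):
    return max(0.0, z)

# Hypothetical 1-D network with three hidden ReLU neurons.
weights = [1.0, 1.0, -1.0]      # input -> hidden
biases = [1.0, -1.0, 2.0]       # hidden biases
out_weights = [1.0, -2.0, 0.5]  # hidden -> output (pre-sigmoid)

def f(x):
    """Pre-sigmoid output of the network for a scalar input x."""
    return sum(v * relu(w * x + b)
               for w, b, v in zip(weights, biases, out_weights))

def breakpoints(ws, bs):
    """Inputs where a hidden ReLU switches on or off: w * x + b = 0."""
    return sorted(-b / w for w, b in zip(ws, bs) if w != 0)

def is_linear_on(func, a, b, tol=1e-9):
    """Midpoint test: a function linear on [a, b] passes; a kink inside usually fails."""
    mid = 0.5 * (a + b)
    return abs(func(mid) - 0.5 * (func(a) + func(b))) < tol

kinks = breakpoints(weights, biases)
edges = [-3.0] + kinks + [4.0]
segments = list(zip(edges[:-1], edges[1:]))

print("kinks:", kinks)
print("number of linear pieces:", len(segments))
print("linear on each piece:", all(is_linear_on(f, a, b) for a, b in segments))
print("linear across a kink:", is_linear_on(f, -3.0, 1.0))
```

Three neurons give three kinks and therefore four linear pieces here. In the 2-D case each neuron contributes a fold along a line instead of a point, which is what shows up as straight segments in the decision boundary.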
<code>Sigmoid</code> and <code>Tanh</code> produce similar decision boundaries (this is <code>tanh</code>: <a href="https://miro.medium.com/max/2000/1*jynT0RkGsZFqt3WSFcez4w.gif" rel="nofollow noreferrer">https://miro.medium.com/max/2000/1*jynT0RkGsZFqt3WSFcez4w.gif</a>; <code>sigmoid</code> is similar), which are smoother, with gentle sinusoidal-looking curves rather than straight segments.
The main difference is that <code>sigmoid</code> is not zero-centered, which makes it a poor choice for a hidden layer, especially in deep networks.
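A quick numerical check of the zero-centering point, as a sketch over an arbitrary symmetric grid of inputs (the grid is an assumption chosen for illustration): <code>sigmoid</code> outputs are all positive and average 0.5, while <code>tanh</code> outputs average 0. The two functions are otherwise the same shape, since tanh(z) = 2 * sigmoid(2z) - 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Symmetric grid of pre-activations around zero (arbitrary choice).
inputs = [i / 10.0 for i in range(-50, 51)]

sigmoid_out = [sigmoid(z) for z in inputs]
tanh_out = [math.tanh(z) for z in inputs]

mean_sigmoid = sum(sigmoid_out) / len(sigmoid_out)
mean_tanh = sum(tanh_out) / len(tanh_out)

# tanh is a shifted and rescaled sigmoid.
max_identity_error = max(
    abs(math.tanh(z) - (2.0 * sigmoid(2.0 * z) - 1.0)) for z in inputs
)

print("mean sigmoid output:", mean_sigmoid)
print("mean tanh output:", mean_tanh)
print("all sigmoid outputs positive:", all(s > 0 for s in sigmoid_out))
print("max |tanh(z) - (2*sigmoid(2z) - 1)|:", max_identity_error)
```

Because every sigmoid output is positive, the next layer always receives inputs of the same sign, which is the situation the Efficient Backprop paper argues slows down learning. With tanh the hidden outputs are balanced around zero.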

Currently there are a lot of activation functions, such as sigmoid, tanh, and ReLU (the preferred choice), but my question is what should be considered when selecting a particular activation function.
For example: when we want to upsample a network in GANs, we prefer using LeakyReLU.
I am a newbie in this subject, and have not found a concrete answer as to which activation function to use in different situations.
My knowledge so far:
Sigmoid : When you have a binary class to identify 
Tanh : ? 
ReLU : ? 
LeakyReLU : When you want to upsample
Any help or article will be appreciated.

How to decide on activation function?
