I'd like to share my opinion.
If your training set has 700 samples, your whole dataset has fewer than 1k samples.
<ol>
<li>I'm actually surprised that <code>svm</code> reaches 83% accuracy.</li>
</ol>
<ul>
<li>Even the MNIST dataset is considered small (60,000 training and 10,000 test images). Your data is much, much smaller.
</li>
<li>You then use <code>pca</code> to shrink your already small data even further. So what is left for <code>svm</code> to learn, if the discriminating information has been projected away?
</li>
<li>If I were you, I would also try a <code>random-forest</code> classifier; it might even perform better.
</li>
</ul>
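To make the PCA point concrete: PCA keeps the directions with the largest variance, and nothing guarantees that those are the directions that separate your classes. Below is a minimal pure-Python sketch on hypothetical 2-D toy data (not your data). The helpers <code>pca_2d</code>, <code>n_components_for</code> and <code>nearest_centroid_accuracy</code> are my own illustrative names, and a simple nearest-centroid classifier stands in for the SVM to keep it short. With a 95% explained-variance threshold only the high-variance but uninformative feature survives, and training accuracy drops from 100% to 50% (chance level for two classes).

```python
import math

# Hypothetical toy data (not from the question): feature 0 has a large spread
# but is unrelated to the class; feature 1 has a tiny spread but fully
# determines the class.
points = [(float(v), float(c)) for c in (0, 1) for v in (-10, -5, 0, 5, 10)]
labels = [c for c in (0, 1) for _ in range(5)]


def pca_2d(data):
    """Closed-form PCA for 2-D data: returns the mean, the two eigenvalues of
    the (population) covariance matrix in descending order, and the unit
    eigenvector belonging to the largest eigenvalue."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    a = sum((p[0] - mx) ** 2 for p in data) / n           # var(feature 0)
    d = sum((p[1] - my) ** 2 for p in data) / n           # var(feature 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in data) / n  # covariance
    half_trace = (a + d) / 2
    disc = math.sqrt(max(half_trace ** 2 - (a * d - b * b), 0.0))
    l1, l2 = half_trace + disc, half_trace - disc
    if abs(b) > 1e-12:
        vec = (l1 - d, b)  # solves (C - l1*I) v = 0
    else:  # diagonal covariance: the eigenvectors are the coordinate axes
        vec = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(*vec)
    return (mx, my), (l1, l2), (vec[0] / norm, vec[1] / norm)


def n_components_for(eigenvalues, threshold=0.95):
    """Smallest number of components whose cumulative explained variance
    ratio reaches `threshold`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


def nearest_centroid_accuracy(features, labels):
    """Training accuracy of a nearest-centroid classifier (ties go to the
    lowest class label)."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        rows = [f for f, y in zip(features, labels) if y == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for f, y in zip(features, labels):
        pred = min(classes, key=lambda c: sum((fi - ci) ** 2 for fi, ci in zip(f, centroids[c])))
        correct += pred == y
    return correct / len(labels)


mean, eigenvalues, pc1 = pca_2d(points)
k = n_components_for(eigenvalues, 0.95)
# Project onto the first principal component only (k == 1 for this data).
projected = [((x - mean[0]) * pc1[0] + (y - mean[1]) * pc1[1],) for x, y in points]

print("eigenvalues:", eigenvalues)
print("components kept for 95% variance:", k)
print("accuracy on original features:", nearest_centroid_accuracy(points, labels))
print("accuracy after PCA:", nearest_centroid_accuracy(projected, labels))
```

Standardizing each feature to unit variance before PCA would give both toy features equal variance, so the 95% threshold would keep both components here; with real data, though, there is still no guarantee that the high-variance directions are the discriminative ones.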
<ol start="2">
<li>Even if you balance your data, it is still a small dataset.</li>
</ol>
<ul>
<li>I don't believe <code>SMOTE</code> will improve the result. If your data consists of images, you could use the Keras <code>ImageDataGenerator</code> to augment your data, although I'm not sure whether <code>matlab</code> has an equivalent of <code>ImageDataGenerator</code>.</li>
</ul>
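Why I doubt SMOTE helps here: SMOTE creates synthetic minority samples by interpolating between an existing minority sample and one of its nearest minority neighbours, so every new point lies on a segment between points you already have and adds little genuinely new information. The function below is a simplified plain-Python illustration of that interpolation idea, not the imbalanced-learn implementation; <code>smote_like</code> and the toy points are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples following the core SMOTE idea: pick a
    minority sample, pick one of its k nearest minority neighbours, and place
    a new point at a random position on the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        # Sort the remaining minority samples by squared Euclidean distance to x.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical minority class with three 2-D samples.
minority = [(1.0, 2.0), (2.0, 3.0), (1.5, 4.0)]
new_points = smote_like(minority, n_new=5)
for p in new_points:
    print(p)
```

Every synthetic point stays inside the region already spanned by the original minority samples, which is why oversampling this way cannot make up for having few samples in the first place.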
<ol start="3">
<li>PCA is useful when you have a lot of features. Not every feature directly affects the accuracy, even though each one is a component of the data.</li>
</ol>
<ul>
<li>For instance, consider handwritten digit classification data.</li>
</ul>
<a href="https://i.stack.imgur.com/s1IIn.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/s1IIn.png" alt="Handwritten digit images" /></a>
Looking at the images above, can we say that every pixel directly affects the accuracy?
The answer is no. The black background pixels are not important for the accuracy, so we use <code>pca</code> to discard them.
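The same idea in code: a pixel that is black in every image has zero variance, so its row and column of the covariance matrix are all zero, and it gets a weight of exactly 0 in every principal component with non-zero variance. Strictly speaking, PCA does not delete those pixels; it builds components in which they play no role. This is a minimal pure-Python sketch on hypothetical 3x3 images; <code>pixel_variances</code> is an illustrative helper, not a library function.

```python
def pixel_variances(images):
    """Population variance of every pixel position across a set of
    flattened images."""
    n = len(images)
    variances = []
    for values in zip(*images):  # one tuple of values per pixel position
        mean = sum(values) / n
        variances.append(sum((v - mean) ** 2 for v in values) / n)
    return variances


# Hypothetical 3x3 "digits", flattened row by row; the top and bottom rows
# are always black (0), only the middle row ever lights up.
images = [
    [0, 0, 0,
     1, 9, 0,
     0, 0, 0],
    [0, 0, 0,
     0, 8, 2,
     0, 0, 0],
    [0, 0, 0,
     3, 7, 1,
     0, 0, 0],
]

variances = pixel_variances(images)
constant = [i for i, v in enumerate(variances) if v == 0.0]
print("per-pixel variance:", [round(v, 3) for v in variances])
print("pixels with zero variance:", constant)
```

Here six of the nine pixels carry no variance at all, so all of the information PCA can keep lives in the three middle-row pixels.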
If you want a detailed explanation with a Python example, check out my other <a href="https://stackoverflow.com/questions/62851445/optimal-feature-selection-technique-after-pca">answer</a>.

I am using the Matlab Classification Learner app to test different classifiers on a training set (size = 700). My response variable is a categorical label with 5 possible values. I have 7 numerical features and 2 categorical ones. I found that a Cubic SVM has the highest accuracy, 83%. But the performance drops considerably when I enable PCA with 95% explained variance (accuracy = 40.5%). I am a student and this is the first time I am using PCA.
<ol>
<li>Why do I see such a result?</li>
<li>Could it be because of a small / unbalanced data set?</li>
<li>When is it useful to apply PCA? When we say &quot;reduce dimensionality&quot;, is there a minimum number of features (dimensionality) in the original set?</li>
</ol>
Any help is appreciated. Thanks in advance!

When to use PCA for dimensionality reduction?