
This is by no means an exhaustive answer, but it should give you a starting point in <code>Python</code>:

Data Exploration

Start with <a href="https://github.com/pandas-profiling/pandas-profiling" rel="nofollow noreferrer"><code>Pandas Profiling</code></a>. It generates an HTML report of your variables that gives a quick read on data quality: the fill rate of each variable and, depending on the variable type, a set of summary statistics for each one.
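
If you want a quick look before generating the full report, the same two ideas (fill rate and per-variable statistics) can be sketched with plain pandas. This is a minimal sketch on synthetic data: the frame <code>df</code>, the column names <code>x0</code>..<code>x49</code>, and the injected missing values are all assumptions made for illustration, not part of the original problem.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 50 predictors x0..x49 and a target y.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 50)), columns=[f"x{i}" for i in range(50)])
df["y"] = 3 * df["x0"] - 2 * df["x1"] + rng.normal(size=200)

# Inject missing values into one column (row labels 0..19, inclusive).
df.loc[:19, "x5"] = np.nan

# Fill rate: share of non-missing values in each column.
fill_rate = df.notna().mean()

# Per-variable summary statistics (count, mean, std, min, quartiles, max).
summary = df.describe().T

print(fill_rate.sort_values().head())
print(summary.head())
```

Columns with a low fill rate or an implausible range are the first ones to investigate before any modelling.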

Correlational matrix

The pandas profiling report includes the correlation matrix. If you want to compute it yourself, use <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.corr.html" rel="nofollow noreferrer"><code>DataFrame.corr()</code></a>. Its <code>method</code> parameter selects the correlation metric: <code>'pearson'</code>, <code>'kendall'</code>, or <code>'spearman'</code>.
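
As a rough illustration, the sketch below computes the matrix with two of the metrics and lists the strongly correlated pairs. The data is synthetic, and the column names and the 0.8 threshold are arbitrary assumptions; pick a threshold that suits your data.

```python
import numpy as np
import pandas as pd

# Synthetic data: five columns, with "b" built to track "a" closely.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
df["b"] = 0.9 * df["a"] + rng.normal(scale=0.1, size=200)

# The `method` argument selects the metric. method="kendall" is passed
# the same way (pandas relies on SciPy for that one).
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Keep only the upper triangle so each pair appears once, then list
# the pairs whose absolute Pearson correlation exceeds the threshold.
threshold = 0.8  # illustrative cut-off, not a rule
upper = pearson.where(np.triu(np.ones(pearson.shape, dtype=bool), k=1))
high_pairs = upper.stack()
high_pairs = high_pairs[high_pairs.abs() > threshold]

print(high_pairs)
```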

Dimension Reduction -> PCA (Dimension reduction)

There are many ways to do this. Keep in mind that if you only care about predictive accuracy and not about how <code>X</code> influences <code>y</code>, step (1) is optional (and so is step (2)).

<ol>
<li>Analyse the correlation matrix and use the variance inflation factor (<code>VIF</code>) to drop highly correlated variables.</li>
<li>Use Factor Analysis or <a href="http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html" rel="nofollow noreferrer">PCA</a> for dimensionality reduction.</li>
<li>Fit a <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html" rel="nofollow noreferrer">LASSO</a> model and check the coefficients: variables whose coefficients are <code>0</code> or shrink toward <code>0</code> can be treated as weak predictors and eliminated.</li>
<li>Keep all 50 variables, use <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html" rel="nofollow noreferrer">Ridge Regression</a>, and vary the <code>alpha</code> parameter to fine-tune accuracy (or whatever metric you are trying to optimize).</li>
<li>If the model still doesn't seem stable, try engineering non-linear features with sklearn's <a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html" rel="nofollow noreferrer">Polynomial Features</a>, regularize, and repeat.</li>
<li>Probably the most important step in the real world: ask a domain expert which variables they think are important.</li>
</ol>
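
Steps 1, 3 and 4 above can be sketched roughly as follows. Everything here is an assumption made for illustration: the data is synthetic (only the first three columns drive <code>y</code>, and column 3 is nearly a copy of column 0), the <code>vif</code> helper is a hypothetical hand-rolled function rather than a library call, and the VIF cut-off of 10 is a common rule of thumb, not a fixed standard.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 50-variable problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
X[:, 3] = X[:, 0] + rng.normal(scale=0.05, size=300)  # near-duplicate column
y = 4 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 2] + rng.normal(size=300)

# Regularized models are scale-sensitive, so standardize first.
X_std = StandardScaler().fit_transform(X)


def vif(features):
    """Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
    from regressing column j on all the other columns."""
    scores = []
    for j in range(features.shape[1]):
        others = np.delete(features, j, axis=1)
        target = features[:, j]
        r2 = LinearRegression().fit(others, target).score(others, target)
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)


# Step 1: flag collinear columns.
vifs = vif(X_std)
print("columns with VIF > 10:", np.where(vifs > 10)[0])

# Step 3: LASSO with a cross-validated penalty; zero coefficients are
# candidates for removal.
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
kept = np.flatnonzero(lasso.coef_)
print("columns kept by LASSO:", kept)

# Step 4: keep everything and let cross-validation choose the ridge alpha.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X_std, y)
print("chosen ridge alpha:", ridge.alpha_)
```

Note that when two columns are nearly identical, LASSO may keep either one of them, so the surviving set is not a statement about which of the two is the "true" driver.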

Basic Linear Regression technique

<ol>
<li>Tuning hyperparameters to get a good cross-validation/test score is the key here for a basic <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html" rel="nofollow noreferrer">Linear Regression</a> model.</li>
<li>Try as many techniques as you can from the scikit-learn guides on <a href="http://scikit-learn.org/stable/modules/linear_model.html" rel="nofollow noreferrer">linear models</a> and <a href="http://scikit-learn.org/stable/supervised_learning.html" rel="nofollow noreferrer">supervised learning</a>.</li>
</ol>
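
One way to compare several of these techniques on equal footing is to score each with the same cross-validation split. The sketch below is only illustrative: the data is synthetic, and the candidate list, the <code>alpha</code> values, and the 95% explained-variance setting for PCA are arbitrary assumptions that you would tune for a real problem.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 50 predictors, only the first three matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = 4 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 2] + rng.normal(size=300)

# Each candidate is a pipeline so scaling (and PCA) is refit inside
# every fold, which avoids leaking test-fold information.
candidates = {
    "ols": make_pipeline(StandardScaler(), LinearRegression()),
    "pca+ols": make_pipeline(
        StandardScaler(), PCA(n_components=0.95), LinearRegression()
    ),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
}

# Mean 5-fold cross-validated R^2 for every candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} mean CV R^2 = {score:.3f}")
```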

I am new to data science and statistics. I came across a problem that has 50 independent variables and one dependent variable, and I am trying to identify a good regression technique to start with. This is the workflow I followed:

Data Exploration -> Correlational matrix -> dimension reduction -> PCA (Dimension reduction) -> Basic Linear Regression technique.

Can someone tell me whether there is a better technique or procedure?

What is a good general regression technique for a problem with 50 independent variables?
