
Pgmpy: Expectation Maximization for Bayesian Network Parameter Learning with Missing Data

Stack Overflow user
Asked 2022-03-18 13:30:12
1 answer, 270 views, 0 followers, 0 votes

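I'm trying to learn the parameters of a Bayesian network using the pgmpy package for Python. If I understand expectation maximization correctly, it should be able to handle missing values. I'm currently experimenting with a 3-variable BN in which the first 500 data points have one missing value. There are no latent variables. Although the description in pgmpy suggests that it should handle missing values, I get an error. The error only occurs when the function is called with data points that contain missing values. Am I doing something wrong? Or should I look at another package for EM with missing values?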
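For readers less familiar with the technique, the claim that EM can cope with missing values can be illustrated on a toy network. The sketch below is not pgmpy's implementation. It assumes a hypothetical two-node network A -> B with binary variables, where NaN marks a missing value of A; the function name `em_two_node`, the starting parameters, and the synthetic data are all made up for illustration.

```python
import numpy as np

def em_two_node(a, b, n_iter=50):
    """EM for a binary network A -> B; NaN in `a` marks a missing value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    missing = np.isnan(a)
    p_a = 0.5                    # initial guess for P(A=1)
    p_b = np.array([0.4, 0.6])   # initial guess for P(B=1 | A=0), P(B=1 | A=1)
    for _ in range(n_iter):
        # E-step: posterior P(A=1 | B=b) for every row under current parameters
        lik1 = p_a * np.where(b == 1, p_b[1], 1 - p_b[1])
        lik0 = (1 - p_a) * np.where(b == 1, p_b[0], 1 - p_b[0])
        post = lik1 / (lik1 + lik0)
        # Observed rows keep their value; missing rows get a soft assignment
        w = np.where(missing, post, a)
        # M-step: re-estimate parameters from the expected counts
        p_a = w.mean()
        p_b = np.array([((1 - w) * b).sum() / (1 - w).sum(),
                        (w * b).sum() / w.sum()])
    return p_a, p_b

# Synthetic data: P(A=1)=0.3, P(B=1|A=0)=0.2, P(B=1|A=1)=0.9
rng = np.random.default_rng(0)
a_true = (rng.random(5000) < 0.3).astype(float)
b_obs = (rng.random(5000) < np.where(a_true == 1, 0.9, 0.2)).astype(float)
a_obs = a_true.copy()
a_obs[:500] = np.nan             # hide the first 500 values of A

est_p_a, est_p_b = em_two_node(a_obs, b_obs)
print(est_p_a, est_p_b)
```

Rows with a missing A contribute fractional counts weighted by their posterior, which is why no row has to be dropped.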

#import
import numpy as np
import pandas as pd
from pgmpy.estimators import BicScore, ExpectationMaximization
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import HillClimbSearch

# Read data that does not contain any missing values
data = pd.read_csv("asia10K.csv")
data = pd.DataFrame(data, columns=["Smoker", "LungCancer", "X-ray"])
test_data = data[:2000]
new_data = data[2000:]

# Learn structure of initial model from data
bic = BicScore(test_data)
hc = HillClimbSearch(test_data)
model = hc.estimate(scoring_method=bic)

# create some missing values
new_data["Smoker"][:500] = np.NaN

# learn parameterization of BN
bn = BayesianNetwork(model)
bn.fit(new_data, estimator=ExpectationMaximization, complete_samples_only=False)

The error I get is an IndexError:

  File "main.py", line 100, in <module>
    bn.fit(new_data, estimator=ExpectationMaximization, complete_samples_only=False)
  File "C:\Python38\lib\site-packages\pgmpy\models\BayesianNetwork.py", line 585, in fit
    cpds_list = _estimator.get_parameters(n_jobs=n_jobs, **kwargs)
  File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 213, in get_parameters
    weighted_data = self._compute_weights(latent_card)
  File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 100, in _compute_weights
    weights = df.apply(lambda t: self._get_likelihood(dict(t)), axis=1)
  File "C:\Python38\lib\site-packages\pandas\core\frame.py", line 8833, in apply
    return op.apply().__finalize__(self, method="apply")
  File "C:\Python38\lib\site-packages\pandas\core\apply.py", line 727, in apply
    return self.apply_standard()
  File "C:\Python38\lib\site-packages\pandas\core\apply.py", line 851, in apply_standard
    results, res_index = self.apply_series_generator()
  File "C:\Python38\lib\site-packages\pandas\core\apply.py", line 867, in apply_series_generator
    results[i] = self.f(v)
  File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 100, in <lambda>
    weights = df.apply(lambda t: self._get_likelihood(dict(t)), axis=1)
  File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 76, in _get_likelihood
    likelihood *= cpd.get_value(
  File "C:\Python38\lib\site-packages\pgmpy\factors\discrete\DiscreteFactor.py", line 195, in get_value
    return self.values[tuple(index)]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Thanks!
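Editor's note: the last frame of the traceback indexes a NumPy array with `tuple(index)`. One plausible reading, which is an assumption and not confirmed anywhere in the thread, is that the NaN cell reaches that lookup unchanged. The sketch below uses only NumPy, with a hypothetical `lookup` helper and a made-up table standing in for a CPD, to show that a NaN index produces the same kind of IndexError.

```python
import numpy as np

# Made-up 2x2 table standing in for a CPD's value array
values = np.array([[0.8, 0.2],
                   [0.3, 0.7]])

def lookup(table, index):
    """Hypothetical helper mirroring the `values[tuple(index)]` call in the traceback."""
    return table[tuple(index)]

# Integer state indices work as expected
print(lookup(values, [0, 1]))

# A NaN (a float) is not a valid array index and raises IndexError
try:
    lookup(values, [np.nan, 1])
except IndexError as exc:
    print("IndexError:", exc)
```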


1 Answer

Stack Overflow user

Answered 2022-03-29 21:58:12

Since your specific question is still unanswered, let me propose a solution using another module:

#import 
import pandas as pd
import numpy as np
import pyAgrum as gum

# Read data that does not contain any missing values
data = pd.read_csv("asia10K.csv")
# not exactly the same names
data = pd.DataFrame(data, columns=["smoking", "lung_cancer", "positive_XraY"]) 
test_data = data[:2000]
new_data = data[2000:].copy() 

# Learn structure of initial model from data
learner=gum.BNLearner(test_data)
learner.useScoreBIC()
learner.useGreedyHillClimbing()
model=learner.learnBN()

# create some missing values (single .loc assignment avoids chained indexing)
new_data.loc[new_data.index[:500], "smoking"] = "?" # instead of NaN

# learn parameterization of BN
bn = gum.BayesNet(model)
learner2=gum.BNLearner(new_data,model)
learner2.useEM(1e-10)
learner2.fitParameters(bn)
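The code above marks missing cells with "?" instead of NaN. If a DataFrame already contains NaN values, for example after reading a CSV with empty fields, one way to convert them is sketched below. The helper name `mark_missing` and the small example frame are hypothetical, and the sketch assumes that "?" is the symbol the learner should treat as missing, as in the answer's code.

```python
import numpy as np
import pandas as pd

def mark_missing(df, symbol="?"):
    """Return a copy of `df` with every NaN cell replaced by `symbol`."""
    # Cast to object first so a string marker can sit in any column
    return df.astype(object).where(df.notna(), symbol)

# Made-up example frame with two missing cells
frame = pd.DataFrame({
    "smoking": ["yes", np.nan, "no"],
    "lung_cancer": ["no", "no", np.nan],
})

marked = mark_missing(frame)
print(marked)
```

The original frame is left unchanged, so the NaN version remains available for libraries that expect NaN.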

In a notebook (the screenshot from the original answer is not reproduced here):

Votes: 1
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/71527787
