I've been writing a Java library that I want to use to build Bayesian belief networks. I have classes for building a directed graph:
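import java.util.ArrayList;
import java.util.List;

public class Node {
    private String label;
    // Nodes linked to this one (DirectedGraph.addEdge stores an edge on its target node).
    private List<Node> adjacencyList = new ArrayList<Node>();
    // Empirical distribution of the values observed for this node's variable.
    private Frequency<String> distribution = new Frequency<String>();

    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
    }

    public List<Node> getAdjacencyList() {
        return adjacencyList;
    }

    public void addNeighbour(Node neighbour) {
        adjacencyList.add(neighbour);
    }

    // Counts every value in data into this node's (unconditional) distribution.
    public void setDistribution(List<String> data) {
        for (String s : data) {
            distribution.addValue(s);
        }
    }

    // Fraction of the recorded values that are equal to value.
    public double getDistributionValue(String value) {
        return distribution.getPct(value);
    }
}
The graph: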
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirectedGraph {
    Map<String, Node> graph = new HashMap<String, Node>();

    public void addVertex(String label) {
        Node vertex = new Node();
        vertex.setLabel(label);
        graph.put(label, vertex);
    }

    // Adds the edge here -> there. It is stored on the target node:
    // "there" records "here" in its adjacency list (i.e. as a parent).
    public void addEdge(String here, String there) {
        Node nHere = graph.get(here);
        Node nThere = graph.get(there);
        nThere.addNeighbour(nHere); // nThere is already in the map, so no re-put is needed
    }

    public List<Node> getNeighbors(String vertex) {
        return graph.get(vertex).getAdjacencyList();
    }

    public int degree(String vertex) {
        return graph.get(vertex).getAdjacencyList().size();
    }

    public boolean hasVertex(String vertex) {
        return graph.containsKey(vertex);
    }

    // True if the edge here -> there exists. The adjacency list holds Node objects,
    // so look up the Node for "here"; checking for the String itself was always false.
    public boolean hasEdge(String here, String there) {
        return graph.get(there).getAdjacencyList().contains(graph.get(here));
    }
}
I also have a class that keeps track of the probability distribution of a dataset:
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import java.util.LinkedHashSet;
import java.util.Set;

public class Frequency<T extends Comparable<T>> {
    // Occurrence count of every value added so far.
    private Multiset<T> event = HashMultiset.create();
    // Distinct values in first-seen order (used by getKeys()).
    private Set<T> distinctValues = new LinkedHashSet<T>();

    public void addValue(T data) {
        distinctValues.add(data); // no-op if the value is already present
        event.add(data);
    }

    public void clear() {
        event.clear();
        distinctValues.clear();
    }

    // Relative frequency of data; 0.0 (instead of NaN) when nothing has been counted.
    public double getPct(T data) {
        int totalNumOfElements = event.size();
        if (totalNumOfElements == 0) {
            return 0.0;
        }
        return (double) event.count(data) / totalNumOfElements;
    }

    public int getNum(T data) {
        return event.count(data);
    }

    public int getSumFreq() {
        return event.size();
    }

    public int getUniqueCount() {
        return event.elementSet().size();
    }

    // Distinct values as strings, in first-seen order.
    public String[] getKeys() {
        String[] keysAsStrings = new String[distinctValues.size()];
        int i = 0;
        for (T value : distinctValues) {
            keysAsStrings[i++] = String.valueOf(value);
        }
        return keysAsStrings;
    }
}
And another function I can use to compute conditional probabilities:
// Estimates P(interested = interestedClass | reducing = reducingClass) as
// count(both match) / count(reducing matches), treating the two lists as
// aligned columns (row i of one belongs with row i of the other).
public double conditionalProbability(List<String> interestedSet,
                                     List<String> reducingSet,
                                     String interestedClass,
                                     String reducingClass) {
    int numerator = 0;
    int denominator = 0;
    for (int i = 0; i < reducingSet.size(); i++) {
        // Count the denominator with the same case-insensitive match as the
        // numerator; mixing equalsIgnoreCase with an exact count could give
        // a "probability" above 1.
        if (reducingSet.get(i).equalsIgnoreCase(reducingClass)) {
            denominator++;
            if (interestedSet.get(i).equalsIgnoreCase(interestedClass)) {
                numerator++;
            }
        }
    }
    return denominator == 0 ? 0.0 : (double) numerator / denominator;
}
However, I still don't know how to connect everything together to perform classification.
I'm reading the paper "Comparing Bayesian Network Classifiers" to try to get a handle on this.
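Say I'm trying to predict a person's gender from their height, weight and shoe size. My understanding is that gender would be my parent/class node, and height, weight and shoe size would be my child nodes.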
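For the gender example, making gender the only parent of the three measurement nodes gives the naive Bayes structure. Assuming height, weight and shoe size have been discretized into a finite set of values (so each child can hold a table P(child | gender)), the joint distribution factorizes over the edges. Because P(h, w, s) does not depend on g, the most probable gender is simply the value that maximizes the numerator:

```latex
P(G, H, W, S) = P(G)\,P(H \mid G)\,P(W \mid G)\,P(S \mid G)

P(G = g \mid h, w, s) = \frac{P(G = g)\,P(h \mid g)\,P(w \mid g)\,P(s \mid g)}{P(h, w, s)}

\hat{g} = \arg\max_{g}\; P(G = g)\,P(h \mid g)\,P(w \mid g)\,P(s \mid g)
```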
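This is where I get confused. Each node only tracks the probability distribution of its own attribute, but I need conditional probabilities to perform classification.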
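What the child nodes are missing is a table conditioned on their parent. A minimal sketch, assuming discretized values held as two aligned String columns (row i of one belongs with row i of the other): the hypothetical `ConditionalTable` class below counts each (parent, child) pair once per row and divides by the parent's count, building the whole P(child | parent) table in one pass instead of calling `conditionalProbability` once per pair of values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: P(child = v | parent = p) estimated by counting aligned columns.
class ConditionalTable {
    // counts.get(p).get(v) = number of rows with parent == p and child == v
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // parentTotals.get(p) = number of rows with parent == p
    private final Map<String, Integer> parentTotals = new HashMap<>();

    ConditionalTable(List<String> parentColumn, List<String> childColumn) {
        if (parentColumn.size() != childColumn.size()) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        for (int i = 0; i < parentColumn.size(); i++) {
            String p = parentColumn.get(i);
            String v = childColumn.get(i);
            counts.computeIfAbsent(p, k -> new HashMap<>()).merge(v, 1, Integer::sum);
            parentTotals.merge(p, 1, Integer::sum);
        }
    }

    // Maximum-likelihood estimate of P(child = v | parent = p); 0.0 if p never occurred.
    double probability(String v, String p) {
        int total = parentTotals.getOrDefault(p, 0);
        if (total == 0) {
            return 0.0;
        }
        return (double) counts.get(p).getOrDefault(v, 0) / total;
    }

    public static void main(String[] args) {
        // Toy, made-up columns: gender is the parent, shoe size the child.
        List<String> gender = List.of("M", "M", "F", "F", "F");
        List<String> shoe = List.of("large", "large", "small", "small", "large");
        ConditionalTable table = new ConditionalTable(gender, shoe);
        System.out.println(table.probability("large", "M")); // 1.0
        System.out.println(table.probability("large", "F")); // 0.3333333333333333
    }
}
```

In the gender example each child node would hold one such table with the gender column as its parent; the gender node itself only needs the plain frequencies that `Frequency` already provides.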
I have an older version of naive Bayes that I wrote:
public void naiveBayes(Data data, List<String> targetClass, BayesOption bayesOption, boolean headers) {
    // initialize variables
    int numOfClasses = data.getNumOfKeys(); // number of attribute columns
    String[] keyNames = data.getKeys();
    iFrequency.clear();
    rFrequency.clear();
    if (bayesOption == BayesOption.TRAIN) {
        // Priors P(class = c), read off the target column.
        this.setInterestedFrequency(targetClass);
        this.targetClassKeys = Util.convertToStringArray(iFrequency.getKeys());
        for (int i = 0; i < this.targetClassKeys.length; i++) {
            priors.put(this.targetClassKeys[i], iFrequency.getPct(this.targetClassKeys[i]));
        }
    }
    // for each classification c in the target class
    for (int i = 0; i < this.targetClassKeys.length; i++) {
        // Reset the running product for every class; declaring it once outside
        // this loop carried the previous class's score into the next one.
        double prob = 1.0;
        // for each attribute column
        for (int j = 0; j < numOfClasses; j++) {
            String reducingKey = Util.convertToString(keyNames[j]);
            List<String> reducingClass = data.dataColumn(reducingKey, DataOption.GET, true);
            this.setReducingFrequency(reducingClass);
            String[] rClass = Util.convertToStringArray(rFrequency.getKeys());
            // for each value v that occurs in that column
            for (int k = 0; k < rClass.length; k++) {
                // Naive Bayes needs the likelihood P(attribute = v | class = c), not
                // P(class = c | attribute = v): the attribute column is the
                // "interested" set and the target column is the conditioning set.
                String priorName = rClass[k] + "|" + this.targetClassKeys[i];
                if (bayesOption == BayesOption.TRAIN) {
                    double conditionalProb = conditionalProbability(reducingClass, targetClass,
                            rClass[k], this.targetClassKeys[i]);
                    priors.put(priorName, conditionalProb);
                }
                if (bayesOption == BayesOption.PREDICT) {
                    // Only meaningful when data holds the single instance being
                    // classified, so each column has exactly one value v.
                    // priors.get returns null (and unboxing throws) for a value
                    // never seen in training.
                    prob = prob * priors.get(priorName);
                }
            }
            rFrequency.clear();
        }
        if (bayesOption == BayesOption.PREDICT) {
            // Unnormalized posterior score: P(c) * product of P(v | c).
            prob = prob * priors.get(this.targetClassKeys[i]);
            Pair<String, Double> pred = new Pair<String, Double>(this.targetClassKeys[i], prob);
            this.predictions.add(pred);
        }
    }
    this.iFrequency.clear();
    this.rFrequency.clear();
}
So I understand in general how the math works, but I'm not sure how to make it work with this particular architecture.
How should I compute the conditional probabilities? Can someone explain the disconnect to me?
Posted on 2015-11-28 16:04:50
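After reading more papers, I realized I had misunderstood how the graph works. The graph should hold, for each node, the conditional probabilities of that node given its parent(s).
That cleared up my earlier confusion.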
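One way to express this in code, as a minimal sketch assuming fully observed, discretized training rows stored as `Map<String, String>` from variable name to value. The hypothetical `BayesNode` and `BayesNet` classes (not part of the library in the question) give each node the counts for P(node | parents), so an edge parent -> child means the child's table is conditioned on the parent. The joint probability of a complete assignment is the chain-rule product of those conditionals, and classification tries each value of the class variable and keeps the largest joint. Probabilities are plain maximum-likelihood counts with no smoothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node: stores counts for P(this variable | its parents).
class BayesNode {
    final String name;
    final List<BayesNode> parents = new ArrayList<>();
    // "parentValues|ownValue" -> count, e.g. "M|tall"
    private final Map<String, Integer> jointCounts = new HashMap<>();
    // "parentValues" -> count, e.g. "M" ("" for a node without parents)
    private final Map<String, Integer> parentCounts = new HashMap<>();

    BayesNode(String name) {
        this.name = name;
    }
    // Values of this node's parents in the given row, joined in parent order.
    private String parentKey(Map<String, String> row) {
        List<String> values = new ArrayList<>();
        for (BayesNode p : parents) {
            values.add(row.get(p.name));
        }
        return String.join(",", values);
    }
    // Counts one fully observed training row.
    void observe(Map<String, String> row) {
        String pk = parentKey(row);
        parentCounts.merge(pk, 1, Integer::sum);
        jointCounts.merge(pk + "|" + row.get(name), 1, Integer::sum);
    }
    // Maximum-likelihood estimate of P(this = row value | parents = row values).
    double conditional(Map<String, String> row) {
        String pk = parentKey(row);
        int denominator = parentCounts.getOrDefault(pk, 0);
        if (denominator == 0) {
            return 0.0; // parent configuration never seen in training
        }
        return (double) jointCounts.getOrDefault(pk + "|" + row.get(name), 0) / denominator;
    }
}

// Hypothetical network: the edge parent -> child makes the child condition on the parent.
class BayesNet {
    final Map<String, BayesNode> nodes = new HashMap<>();

    BayesNode node(String name) {
        return nodes.computeIfAbsent(name, BayesNode::new);
    }
    void addEdge(String parent, String child) {
        node(child).parents.add(node(parent));
    }
    void train(List<Map<String, String>> rows) {
        for (Map<String, String> row : rows) {
            for (BayesNode n : nodes.values()) {
                n.observe(row);
            }
        }
    }
    // Chain rule: P(x_1, ..., x_n) = product of P(x_i | parents(x_i)).
    double joint(Map<String, String> row) {
        double p = 1.0;
        for (BayesNode n : nodes.values()) {
            p *= n.conditional(row);
        }
        return p;
    }
    // Fills in each candidate class value and keeps the one with the largest joint.
    String classify(String classVar, List<String> classValues, Map<String, String> evidence) {
        String best = null;
        double bestScore = -1.0;
        for (String c : classValues) {
            Map<String, String> row = new HashMap<>(evidence);
            row.put(classVar, c);
            double score = joint(row);
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
    public static void main(String[] args) {
        BayesNet net = new BayesNet();
        net.addEdge("gender", "height");
        net.addEdge("gender", "shoe");
        // Toy, made-up rows for illustration only.
        net.train(List.of(
                Map.of("gender", "M", "height", "tall", "shoe", "large"),
                Map.of("gender", "M", "height", "tall", "shoe", "large"),
                Map.of("gender", "F", "height", "short", "shoe", "small"),
                Map.of("gender", "F", "height", "tall", "shoe", "small")));
        System.out.println(net.classify("gender", List.of("M", "F"),
                Map.of("height", "tall", "shoe", "large"))); // M
    }
}
```

With every non-class variable observed, scoring by the joint is enough; if some nodes were unobserved you would have to sum them out (for example with variable elimination), which this sketch does not do. The naive Bayes network from the question is the special case where gender is the only parent of every other node.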
For more information, see this book chapter.
Posted on 2015-11-28 19:43:43
I think that if you aren't set on using Java (which isn't a go-to language for machine learning), you can find plenty of references in R and Python:
https://pymc-devs.github.io/pymc/
http://www.bayespy.org/index.html
http://www.bnlearn.com/examples/
https://datascience.stackexchange.com/questions/9025