Q: How to look up the value of each word in a table, calculate with it, and generate a vector space model (VSM) from it

Stack Overflow user

Asked 2016-06-20 13:53:22

2 answers, 80 views, 0 followers, 0 votes

Suppose I have a table that holds, for each word taken from another table, its probability. The table has two classes: actual and non_actual. I named it master_table.

actual = [0.5;0.4;0.6;0.75;0.23;0.96;0.532];     % P(actual) for each word
non_actual = [0.5;0.6;0.4;0.25;0.77;0.04;0.468]; % for each word, actual + non_actual = 1
words = {'finn';'jake';'iceking';'marceline';'shelby';'bmo';'naptr'};
master_table = table(actual,non_actual,...
'RowNames',words)

Then I have a table containing sentences. I named it T2.

sentence = {'finn marceline naptr';'jake finn simon marceline haha';'jake finn finn jake iceking';'bmo shelby shelby finn naptr';'naptr naptr jake finn bmo shelby'}
T2 = table('RowNames',sentence)

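How can I produce the following? (Words that are not in master_table, such as "simon" and "haha", get the value 1, so they do not affect the probability calculation that determines the class):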

sentence                            actual (product of each word's probability)       non_actual                    class
finn marceline naptr                0.5 * 0.75 * 0.532                                0.5 * 0.25 * 0.468            compare the two values: if actual > non_actual, the class should be "actual"
jake finn simon marceline haha      0.4 * 0.5 * 1 * 0.75 * 1                          0.6 * 0.5 * 1 * 0.25 * 1
jake finn finn jake iceking
bmo shelby shelby finn naptr
naptr naptr jake finn bmo shelby
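
As a worked check of the first row, the two products can be computed directly with prod. The probabilities are copied from master_table above; the names p_actual, p_non_actual and class_label are only illustrative.

```matlab
% Probabilities of 'finn', 'marceline', 'naptr' copied from master_table
p_actual     = prod([0.5 0.75 0.532]);   % 0.5 * 0.75 * 0.532
p_non_actual = prod([0.5 0.25 0.468]);   % 0.5 * 0.25 * 0.468

% The larger product decides the class
if p_actual > p_non_actual
    class_label = 'actual';
else
    class_label = 'non_actual';
end
fprintf('%.4f vs %.4f -> %s\n', p_actual, p_non_actual, class_label);
```

This prints 0.1995 vs 0.0585 -> actual, so the first sentence falls into the "actual" class.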

And how can I build a VSM (vector space model) from the sentences above:

                                                           WORDS
                                    | bmo | finn | haha | iceking | jake | marceline | naptr | shelby | simon |     %words sorted alphabetically
finn marceline naptr                   0     1       0        0       0        1         1       0       0
jake finn simon marceline haha         0     1       1        0       1        1         0       0       1
jake finn finn jake iceking            0     2       0        1       2        0         0       0       0
bmo shelby shelby finn naptr           1     1       0        0       0        0         1       2       0
naptr naptr jake finn bmo shelby       1     1       0        0       1        0         2       1       0
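
A minimal sketch of one way to build such a matrix in MATLAB, assuming whitespace tokenization and raw term counts (a repeated word is counted each time it occurs). The names tokens, vocab, vsm and vsm_table are illustrative and do not come from the question; the columns follow the alphabetical order returned by unique.

```matlab
sentence = {'finn marceline naptr'; ...
            'jake finn simon marceline haha'; ...
            'jake finn finn jake iceking'; ...
            'bmo shelby shelby finn naptr'; ...
            'naptr naptr jake finn bmo shelby'};

% Tokenize each sentence on whitespace
tokens = cellfun(@strsplit, sentence, 'UniformOutput', false);

% Vocabulary: every distinct word; unique returns them sorted
vocab = unique([tokens{:}]);

% Count how often each vocabulary word occurs in each sentence
vsm = zeros(numel(sentence), numel(vocab));
for i = 1:numel(sentence)
    [~, loc] = ismember(tokens{i}, vocab);   % column index of each token
    vsm(i, :) = accumarray(loc(:), 1, [numel(vocab) 1])';
end

% Optional: wrap the counts in a table with named rows and columns
vsm_table = array2table(vsm, 'RowNames', sentence, 'VariableNames', vocab);
```

With this input, vsm(3,:) is [0 2 0 1 2 0 0 0 0], because 'finn' and 'jake' each occur twice in the third sentence.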

matlab

text-mining

2 Answers

Stack Overflow user

Answered 2016-06-20 14:31:09

As a quick solution, the following code should solve your problem:

% Split the first sentence into single words
s = strsplit(sentence{1});

% Loop over the words of the sentence
for i=1:length(s)
    % Search for the word in every entry of the words cell array
    % (strfind also matches substrings of longer entries)
    c = strfind(words,s{i});
    % Entries with a non-empty result contain the word; ind stays
    % empty when the word is not in the words cell array
    ind = find(~cellfun('isempty', c));

    if actual(ind) > non_actual(ind)
        % do something with actual...
    end
end

You should read the help pages of each function used in the code (strsplit, strfind, cellfun) for more information on how they work.
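
Because strfind matches substrings, a sentence word such as 'fin' would also hit 'finn'. A minimal sketch of an exact-match lookup with ismember, which also leaves unknown words at 1 as the question asks. The data is copied from the question; the names found, loc, p_actual, p_non_actual, score_actual and score_non_actual are illustrative and not part of the answer above.

```matlab
actual     = [0.5;0.4;0.6;0.75;0.23;0.96;0.532];
non_actual = [0.5;0.6;0.4;0.25;0.77;0.04;0.468];
words = {'finn';'jake';'iceking';'marceline';'shelby';'bmo';'naptr'};

s = strsplit('jake finn simon marceline haha');

% found(k) is true when s{k} is in words; loc(k) is its row (0 if absent)
[found, loc] = ismember(s, words);

% Start from 1 so unknown words ('simon', 'haha') leave the product unchanged
p_actual = ones(size(s));
p_non_actual = ones(size(s));
p_actual(found) = actual(loc(found));
p_non_actual(found) = non_actual(loc(found));

score_actual = prod(p_actual);          % 0.4 * 0.5 * 1 * 0.75 * 1
score_non_actual = prod(p_non_actual);  % 0.6 * 0.5 * 1 * 0.25 * 1
```

For this sentence score_actual is larger than score_non_actual, so it would be labelled "actual".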

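0 votes

Stack Overflow user

Answered 2016-06-20 16:52:22

This is also a bit hacky, but I assume performance is not an issue. I first create a larger table and then change its values in a loop: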

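T2 = table(ones(height(T2),1),ones(height(T2),1),repmat({''},height(T2),1),'RowNames',sentence,'VariableNames',{'actual' 'non_actual' 'outcome'});

for i=1:height(T2)
    % split the row name (the sentence) into words
    A=strsplit(T2.Properties.RowNames{i});
    % running products; 1 is neutral for multiplication
    % (named p_* so the actual/non_actual vectors are not overwritten)
    p_actual=1;
    p_non_actual=1;
    for j=1:length(A)
        % words missing from master_table (e.g. 'simon') are skipped,
        % which is the same as multiplying by 1
        if ismember(A{j}, master_table.Properties.RowNames)
            p_actual = p_actual * master_table{A(j),1};
            p_non_actual = p_non_actual * master_table{A(j),2};
        end
    end
    % store the products, if you need them
    T2.actual(i)=p_actual;
    T2.non_actual(i)=p_non_actual;

    if p_actual > p_non_actual
        T2.outcome(i)={'actual'};
    else
        T2.outcome(i)={'non_actual'};
    end
end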

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/37924343
