文章/答案/技术大牛

发布

社区首页 >问答首页 >如何在Matlab中对搜索函数进行矢量化？

问如何在Matlab中对搜索函数进行矢量化？
EN

Stack Overflow用户

提问于 2015-04-07 14:48:27

回答 3查看 124关注 0票数 1

下面是一个Matlab编码问题(一个与intersect here略有不同的版本

一个等级矩阵A有3个cols，第一个是可能重复的用户is，第二个是可能重复的项目‘is，第三个是从用户到项目的等级，范围从1到5。

现在，我有一个用户ID( smallUserIDList )子集和一个项ID( item ID)的子集( smallItemIDList，)，然后我希望在A中找到在smallUserIDList，中被用户分级的行，并收集用户分级的项，并进行一些计算，比如使用smallItemIDList进行setdiff并计算结果，如下代码所示：

userStat = zeros(length(smallUserIDList), 1);
for i = 1:length(smallUserIDList)
    A2= A(A(:,1) == smallUserIDList(i), :);
    itemIDList_each = unique(A2(:,2));

    setDiff = setdiff(itemIDList_each , smallItemIDList);
    userStat(i) = length(setDiff);
end
userStat

最后，我发现配置文件查看器显示上面的循环效率很低，问题是如何用矢量化的方法来改进这段代码，但却需要for循环的帮助？

例如，：

输入

A = [
1 11 1
2 22 2
2 66 4
4 44 5
6 66 5
7 11 5
7 77 5
8 11 2
8 22 3
8 44 3
8 66 4
8 77 5    
]

smallUserIDList = [1 2 7 8]
smallItemIDList = [11 22 33 55 77]

输出

userStat =

 0
 1
 0
 2

matlab

optimization

matrix

vectorization

回答 3

Stack Overflow用户

回答已采纳

发布于 2015-04-07 19:02:34

香草MATLAB：

据我所知，您的代码相当于：

%// Create matrix such that: user_item_rating(user,item)==rating
user_item_rating = sparse(A(:,1),A(:,2),A(:,3));

%// Keep all BUT the items in smallItemIDList
user_item_rating(:,smallItemIDList) = [];

%// Keep only those users in `smallUserIDList` and use order of this list
user_item_rating = user_item_rating(smallUserIDList,:);

%// Count the number of ratings
userStat = sum(user_item_rating~=0, 2);

如果每个(user,item)-combination至多有一个评级，这将是可行的。而且，它应该是相当有效的。

清洁的方法，而不重新发明车轮：

查看统计工具箱中的grpstats！实现可能与此类似：

%// Create ratings table
ratings = array2table(A, 'VariableNames', {'user','item','rating'});

%// Remove items we don't care about (smallItemIDList)
ratings = ratings(~ismember(ratings.item, smallItemIDList),:);

%// Keep only users we care about (smallUserIDList) 
ratings = ratings(ismember(ratings.user, smallUserIDList),:);

%// Compute the statistics grouped by 'user'. 
userStat = grpstats(ratings, 'user');

票数 3

Stack Overflow用户

发布于 2015-04-07 15:47:49

这可能是一种vectorized方法-

%// Take care of equality between first column of A and smallUserIDList to 
%// find the matching row and column indices.
%// NOTE: This corresponds to "A(:,1) == smallUserIDList(i)" from OP.
[R,C] = find(bsxfun(@eq,A(:,1),smallUserIDList.')); %//'

%// Take care of non-equality between second column of A and smallItemIDList. 
%// NOTE: This corresponds to SETDIFF in the original loopy code from OP.
mask1 = ~ismember(A(R,2),smallItemIDList);

AR2 = A(R,2); %// Elements from 2nd col of A that has matches from first step

%// Get only those elements from C and AR2 that has ONES in mask1
C1 = C(mask1);
AR2 = AR2(mask1);

%// Initialized output array
userStat = zeros(numel(smallUserIDList),1);

if ~isempty(C1)%//There is at least one element in C, so do further processing
    
    %// Find the count of duplicate elements for each ID in C1 indexed into AR2.
    %// NOTE: This corresponds to "unique(A2(:,2))" from OP.
    dup_counts = accumarray(C1,AR2,[],@(x) numel(x)-numel(unique(x)));
    
    %// Get the count of matches for each ID in C in the mask1.
    %// NOTE: This corresponds to:
    %//       "length(setdiff(itemIDList_each , smallItemIDList))" from OP.
    accums = accumarray(C,mask1);
    
    %// Store the counts in output array and also subtract the dup counts
    userStat(1:numel(accums)) = accums;
    userStat(1:numel(dup_counts)) = userStat(1:numel(dup_counts)) - dup_counts;
end

基准测试

下面列出的代码将建议的方法的运行时与原始的模糊代码进行比较-

%// Size parameters and random inputs with them
A_nrows    = 5000;
IDlist_len = 5000;
max_userID = 1000;
max_itemID = 1000;
A = [randi(max_userID,A_nrows,1) randi(max_itemID,A_nrows,1) randi(5,A_nrows,2)];
smallUserIDList = randi(max_userID,IDlist_len,1);
smallItemIDList = randi(max_itemID,IDlist_len,1);

disp('---------------------------- With Original Approach')
tic
%//   Original posted code
toc

disp('---------------------------- With Proposed Approach'))
tic
%//   Proposed approach code
toc

通过三组数据获得的运行时如下：

案例1：

A_nrows    = 500;
IDlist_len = 500;
max_userID = 100;
max_itemID = 100;
---------------------------- With Original Approach
Elapsed time is 0.136630 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.004163 seconds.

案例2：

A_nrows    = 5000;
IDlist_len = 5000;
max_userID = 100;
max_itemID = 100;
---------------------------- With Original Approach
Elapsed time is 1.579468 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.050498 seconds.

案例#3:

A_nrows    = 5000;
IDlist_len = 5000;
max_userID = 1000;
max_itemID = 1000;
---------------------------- With Original Approach
Elapsed time is 1.252294 seconds.
---------------------------- With Proposed Approach
Elapsed time is 0.044198 seconds.

结论：与建议的方法相比原来的混乱代码的加速，因此似乎是巨大的！！

票数 2

Stack Overflow用户

发布于 2015-04-07 15:09:26

我认为您正在尝试为某个用户子集删除一组固定的评等，并计算剩余的评等数：

做了以下工作：

Asub = A(ismember(A(:,1), smallUserIDList),1:2);
Bremove = allcomb(smallUserIDList, smallItemIDList);
Akeep = setdiff(Asub, Bremove, 'rows');
T = varfun(@sum, array2table(Akeep), 'InputVariables', 'Akeep2', 'GroupingVariables', 'Akeep1');
% userStat = T.GroupCount;

您需要从matlab中央的文件交换的所有梳函数，它给出了两个向量的笛卡儿乘积，而且无论如何都很容易实现。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/29494489

复制

相似问题

问如何在Matlab中对搜索函数进行矢量化？
EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在Matlab中对搜索函数进行矢量化？EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在Matlab中对搜索函数进行矢量化？
EN