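I am trying to merge two matrices that share the same values in several columns.
The matrices below illustrate my problem and serve as an MWE. My real data is much longer, roughly size(500000, 4), so I am looking for an efficient way to merge them. The data consists of option data, where c holds the call data and p the put data; columns 1:4 are date, strike, expiry, bid price. In the end I want a matrix with columns 1:5: date, strike, expiry, call bid price, put bid price. As the MWE shows, the two matrices do not have the same length, but within each matrix every combination of columns 1:3 (date, strike, expiry) occurs only once.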
c = [7356011 300 7356081 1.15; 7356011 400 7356081 1.56; 7356011 500 7356081 1.79; 7356011 300 7356088 1.25; 7356011 400 7356088 1.67; 7356011 500 7356088 1.89; 7356011 600 7356088 1.92; 7356012 300 7356081 0.79; 7356012 400 7356081 0.99; 7356012 500 7356081 1.08; 7356012 300 7356088 0.81; 7356012 400 7356088 0.90; 7356012 500 7356088 1.07]
p = [7356011 300 7356081 1.35; 7356011 400 7356081 1.15; 7356011 500 7356081 1.03; 7356011 300 7356088 1.56; 7356011 400 7356088 1.15; 7356011 500 7356088 1.03; 7356012 300 7356081 1.25; 7356012 400 7356081 1.19; 7356012 500 7356081 1.02; 7356012 300 7356088 1.14; 7356012 400 7356088 0.98; 7356012 500 7356088 0.76; 7356012 600 7356088 0.56; 7356012 700 7356088 0.44]
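I tried to build an ID for each row with strcat and num2str, which gives e.g. 'ID(1) = 73560113007356081', but this takes a very long time on a large amount of data. I also tried to find a solution with unique and ismember, but ran into problems with the multiple columns.
The desired output is: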
7356011 300 7356081 1.15 1.35
7356011 400 7356081 1.56 1.15
7356011 500 7356081 1.79 1.03
7356011 300 7356088 1.25 1.56
7356011 400 7356088 1.67 1.15
7356011 500 7356088 1.89 1.03
7356011 600 7356088 1.92 NaN
7356012 300 7356081 0.79 1.25
7356012 400 7356081 0.99 1.19
7356012 500 7356081 1.08 1.02
7356012 300 7356088 0.81 1.14
7356012 400 7356088 0.90 0.98
7356012 500 7356088 1.07 0.76
7356012 600 7356088 NaN 0.56
7356012 700 7356088 NaN 0.44
Thanks for your help.
Posted on 2015-06-16 15:49:01
You don't need a loop; use intersect instead.
[~,ic,ip] = intersect(c(:, 1:3),p(:, 1:3),'rows');
m = [c(ic, :), p(ip,end)];
Edit: if you also want to include the rows that do not intersect, filled with NaN as in the other answer:
function m = merge(c, p, nc, np)
    % Check for input argument errors
    if nargin == 3
        np = nc;
    elseif nargin ~= 4
        disp('Please enter either 3 or 4 arguments')
        m = {};
        return
    end
    % Make sure the key column lists are shaped the same (row vectors)
    nc = reshape(nc, 1, []);
    np = reshape(np, 1, []);
    % ...and have the same number of elements
    if numel(nc) ~= numel(np)
        disp('Please ensure arguments 3 and 4 have the same number of elements')
        m = {};
        return
    end
    % The columns that are not being compared
    NotNC = find(~ismember(1:size(c,2), nc));
    NotNP = find(~ismember(1:size(p,2), np));
    % Find the matching rows
    [matches,ic,ip] = intersect(c(:, nc),p(:, np),'rows');
    % Put together matching rows with the other data not included in the match
    m1 = [matches, c(ic, NotNC), p(ip, NotNP)];
    % Find rows that did not match
    NotIC = find(~ismember(1:size(c,1), ic));
    NotIP = find(~ismember(1:size(p,1), ip));
    % Put together data not in the matched set, padding the missing side with NaN
    m2 = [c(NotIC, nc), c(NotIC, NotNC), nan(length(NotIC), size(NotNP,2))];
    m3 = [p(NotIP, np), nan(length(NotIP), size(NotNC,2)), p(NotIP, NotNP)];
    % Merge all three lists
    m = [m1; m2; m3];
end
Posted on 2015-06-16 11:25:14
OK, I wasn't sure whether p is always the longer one, so I wrote both cases with an if.
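if size(c, 1) > size(p, 1)
    % c is longer: keep every row of c and append the matching put price
    newm = [c NaN(size(c, 1), 1)];
    % Compare on the key columns only; loc gives the matching row in p
    [row, loc] = ismember(c(:, 1:3), p(:, 1:3), 'rows');
    newm(row, end) = p(loc(row), end);
else
    % p is at least as long: keep every row of p and insert the matching call price
    newm = [p(:, 1:3) NaN(size(p, 1), 1) p(:, end)];
    [row, loc] = ismember(p(:, 1:3), c(:, 1:3), 'rows');
    newm(row, 4) = c(loc(row), end);
end
Update: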
This code works for the current example.
% For each row of p, find the row of c with the same key (0 if none)
[row_p, row_c] = ismember(p(:,1:3), c(:,1:3), 'rows');
newm = [];
for ii = 1:length(row_p)
    if row_p(ii) == 1
        newm = [newm; p(ii, 1:3) c(row_c(ii), end) p(ii, end)];
    else
        newm = [newm; p(ii, 1:3) NaN p(ii, end)];
    end
end
% Same pass over the rows of c, to pick up keys that exist only in c
[row_c, row_p] = ismember(c(:,1:3), p(:,1:3), 'rows');
for ii = 1:length(row_c)
    if row_c(ii) == 1
        newm = [newm; c(ii, 1:3) c(ii, end) p(row_p(ii), end)];
    else
        newm = [newm; c(ii, 1:3) c(ii, end) NaN];
    end
end
% Matched keys were added twice; drop the duplicates
newm = unique(newm, 'rows');
https://stackoverflow.com/questions/30865657
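For comparison with the loop-based update above, the same merge can be sketched without growing newm inside a loop, which may matter at around 500000 rows. This is only a minimal sketch and not the answerer's method: it assumes each (date, strike, expiry) key occurs at most once per matrix, as the question states, and the names keys, inC, idxC, inP and idxP are illustrative. The inputs are a three-row subset of the question's data.

```matlab
% Small subset of the question's data (call and put matrices).
c = [7356011 300 7356081 1.15;
     7356011 600 7356088 1.92;
     7356012 300 7356081 0.79];
p = [7356011 300 7356081 1.35;
     7356012 300 7356081 1.25;
     7356012 700 7356088 0.44];

% All distinct (date, strike, expiry) keys from both matrices, sorted by row.
keys = union(c(:, 1:3), p(:, 1:3), 'rows');

% Start with NaN in both price columns, then fill in what is available.
m = [keys, NaN(size(keys, 1), 2)];

% For each key, locate its row in c (0 if absent) and copy the call price.
[inC, idxC] = ismember(keys, c(:, 1:3), 'rows');
m(inC, 4) = c(idxC(inC), 4);

% Same for the put price from p.
[inP, idxP] = ismember(keys, p(:, 1:3), 'rows');
m(inP, 5) = p(idxP(inP), 4);
```

union with 'rows' returns the keys sorted by column 1, then 2, then 3. If the ordering of the desired output is wanted (date, then expiry, then strike), sortrows(m, [1 3 2]) should give it.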