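Preferably using System.Linq.
In my program I have a list of purchases, each of which contains a list of products. I am trying to determine which two products are most often bought together in a single sale. For example:
Products - banana, apple, teddy bear, beer, diaper, video game, car
Purchases1 - banana, teddy bear, diaper, beer
Purchases2 - diaper, car, beer
Purchases3 - banana, video game, car
The two products most often bought together = diaper and beer.
Does anyone know the best way to do this? In practice, my purchases dictionary has about 2.4 million elements, and the products dictionary has 8013 unique products.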
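Posted on 2022-05-23 23:25:49
I believe you need to create a Dictionary to keep track of the pairs that were purchased together. You could sort each purchase's product list so the entries are always in alphabetical order, and then simply iterate over the lists you already have.
Below is a sample project I made that does what you ask, although the products are just chars. It may not be the most efficient, but it will at least give you one way to do it.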
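The sorting idea can be sketched as follows. This is only a minimal illustration, assuming each purchase is a sequence of product names; the `PairCounter` class and `CountPairs` method are hypothetical names, not part of the original answer. Sorting (and deduplicating) each purchase first means a pair such as beer/diaper always produces the same dictionary key, whatever order the products were recorded in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairCounter
{
    // Counts how often each unordered pair of products appears in the same purchase.
    public static Dictionary<(string, string), int> CountPairs(
        IEnumerable<IEnumerable<string>> purchases)
    {
        var counts = new Dictionary<(string, string), int>();
        foreach (var purchase in purchases)
        {
            // Deduplicate and sort so that (a, b) and (b, a) map to the same key.
            var items = purchase.Distinct()
                                .OrderBy(p => p, StringComparer.Ordinal)
                                .ToList();
            for (int i = 0; i < items.Count; i++)
            {
                for (int j = i + 1; j < items.Count; j++)
                {
                    var key = (items[i], items[j]);
                    // current stays 0 when the key has not been seen yet
                    counts.TryGetValue(key, out int current);
                    counts[key] = current + 1;
                }
            }
        }
        return counts;
    }
}
```

With the three purchases from the question, the only pair that is counted more than once is ("beer", "diaper").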
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    class Purchase
    {
        public List<char> Products = new List<char>();
    }

    static void Main(string[] args)
    {
        do
        {
            // here is our list of products
            char[] products = new char[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };
            // randomly populate a list of purchases
            Random random = new Random();
            List<Purchase> purchases = new List<Purchase>();
            for (int i = 0; i < 10000; i++)
            {
                var currPurchase = new Purchase();
                // products are added in array order, so every purchase is already sorted
                foreach (var p in products)
                    if (random.Next(0, 2) > 0)
                        currPurchase.Products.Add(p);
                purchases.Add(currPurchase);
            }
            // dictionary that maps each pair of products to the number of purchases containing both
            Dictionary<(char, char), int> count = new Dictionary<(char, char), int>();
            // now for each purchase, we go through and see all the pairs that happened, and update those dictionary entries
            foreach (var purchase in purchases)
            {
                // get the pairs, update the pairs
                for (int i = 0; i < purchase.Products.Count; i++)
                {
                    for (int j = i + 1; j < purchase.Products.Count; j++)
                    {
                        var key = (purchase.Products[i], purchase.Products[j]);
                        if (count.ContainsKey(key))
                            count[key]++;
                        else
                            count[key] = 1;
                    }
                }
            }
            // then get the highest frequency of any pair
            var highest = count.Max(kvp => kvp.Value);
            // and then get all the keys that occurred that many times (assuming we could have two pairs of equal frequency)
            var mostFrequent = count.Where(kvp => kvp.Value == highest).Select(x => x.Key);
            Console.WriteLine($"Most Frequent Pairs (Occurred {highest} times): ");
            foreach (var pair in mostFrequent)
                Console.Write(pair + "; ");
            // type quit and hit enter to quit
        } while (Console.ReadLine() != "quit");
    }
}
Posted on 2022-05-24 00:12:55
Here is my attempt using LINQ:
var purchases = new List<List<string>>
{
    new List<string>() { "banana", "teddy bear", "beer", "diaper" },
    new List<string>() { "beer", "diaper", "car" },
    new List<string>() { "banana", "video game", "car" }
};
var itemsPairsOccurences = purchases
    // cross join every purchase with itself and produce item pairs, i.e. correlate every item in a purchase with every item from the same purchase
    .SelectMany(purchase1 => purchase1.SelectMany(_ => purchase1, (item1, item2) => Tuple.Create(item1, item2)))
    // filter out pairs containing the same item (e.g. banana-banana) and duplicate pairs (banana-car remains, car-banana is skipped)
    .Where(itemsPair => string.CompareOrdinal(itemsPair.Item1, itemsPair.Item2) < 0)
    // group all pairs from all purchases by unique pair and count occurrences
    .GroupBy(itemsPair => itemsPair, (itemsPair, allItemsPairs) => KeyValuePair.Create(itemsPair, allItemsPairs.Count()));
// choose a pair with the most occurrences
var mostFrequent1 = itemsPairsOccurences.Aggregate((itemsPair1, itemsPair2) => itemsPair1.Value > itemsPair2.Value ? itemsPair1 : itemsPair2);
// OR order pairs by occurrences and select the first one
var mostFrequent2 = itemsPairsOccurences.OrderByDescending(itemsPair => itemsPair.Value).FirstOrDefault();
Posted on 2022-05-24 15:03:56
Doing all of this in RAM is likely to be quite resource-intensive, IMHO.
Although it may be premature optimization, I would use the following logic:
mention)
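of the products.
This also works for combinations of 3, 4, etc. products, although you will need to switch to longs or risk overflowing the integers.
I'm not sure you can do this easily in LINQ, which isn't really my cup of tea; in SQL, apart from computing the primes, I think it would be a piece of cake... (though you could probably precompute (or download) those into a static table, especially if you plan to run this more often).
PS: you could split the list of purchases across n independent threads and merge the resulting dictionaries at the end; if the hardware supports it, that might make things somewhat faster.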
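The PS about splitting the work across threads and merging the dictionaries could look roughly like this. It is a sketch under stated assumptions, not the answerer's implementation: products are assumed to be encoded as distinct, non-negative `int` ids within each purchase, and `ParallelPairCounter`, `MakeKey` and `SplitKey` are hypothetical names. Each worker fills a private dictionary through the thread-local overload of `Parallel.ForEach`, and the private dictionaries are merged under a lock once a worker finishes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelPairCounter
{
    // Assumption: each purchase is an array of distinct, non-negative product ids.
    public static Dictionary<long, int> CountPairs(IReadOnlyList<int[]> purchases)
    {
        var merged = new Dictionary<long, int>();
        var mergeLock = new object();

        Parallel.ForEach(
            purchases,
            () => new Dictionary<long, int>(),   // one private dictionary per worker
            (purchase, loopState, local) =>
            {
                for (int i = 0; i < purchase.Length; i++)
                {
                    for (int j = i + 1; j < purchase.Length; j++)
                    {
                        long key = MakeKey(purchase[i], purchase[j]);
                        local.TryGetValue(key, out int current);
                        local[key] = current + 1;
                    }
                }
                return local;
            },
            local =>
            {
                // merge this worker's counts into the shared result
                lock (mergeLock)
                {
                    foreach (var kvp in local)
                    {
                        merged.TryGetValue(kvp.Key, out int current);
                        merged[kvp.Key] = current + kvp.Value;
                    }
                }
            });

        return merged;
    }

    // Packs an unordered pair of ids into one long: smaller id in the high 32 bits.
    public static long MakeKey(int a, int b)
    {
        int lo = Math.Min(a, b);
        int hi = Math.Max(a, b);
        return ((long)lo << 32) + hi;
    }

    // Recovers the two ids from a key produced by MakeKey.
    public static (int, int) SplitKey(long key) =>
        ((int)(key >> 32), (int)(key & 0xFFFFFFFFL));
}
```

Packing the two ids into a single `long` keeps the keys small, which matters with millions of purchases; whether the parallel version is actually faster depends on the hardware and on how large each purchase is, so it is worth measuring before adopting it.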
https://stackoverflow.com/questions/72353772