Question: Deleting consecutive, identical, duplicate files

Stack Overflow user

Asked 2011-04-07 00:10:26

Answers: 2 | Views: 146 | Followers: 0 | Votes: 0

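I have a server running Windows Server 2003 R2 Enterprise Edition where each directory contains 50,000 to 250,000 text files of 1 KB each. The file names are sequential (e.g., MLLP000001.rcv, MLLP000002.rcv, and so on). Identical files are always consecutive: once a subsequent file differs, I can expect not to receive another copy of the earlier one.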

I need a script that does the following, but I don't know where to start.

for each file in the target directory index 'i'
{
  for each file in the target directory index 'j' = i+1
  {
    compare the hash values of files i and j

    if the hashes are identical
      delete file j
    if the hashes differ
      set i = j // to skip past the files that are now deleted
      break
  }
}

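I tried a DOS batch script, but it was really cumbersome: I can't break out of the inner loop, and it trips over itself because the outer loop holds a list of the files in the directory while that list keeps changing. As far as I know, VBScript has no hash function.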

windows-scripting

language-agnostic

scripting

file

duplicate-removal

2 Answers

Stack Overflow user

Answered 2011-04-07 17:12:44

Since the files are only 1 KB each, why not compare them bit for bit and avoid hashing altogether?
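A direct comparison along these lines could be sketched as follows. This is only an illustration under stated assumptions: Python is not mentioned in the question and is assumed to be available on the machine, the function name `delete_consecutive_duplicates`, the `*.rcv` pattern, and the `dry_run` flag are hypothetical, and the zero-padded file names are assumed to sort in arrival order. The listing is snapshotted before anything is deleted, which sidesteps the changing-list problem described in the question.

```python
from pathlib import Path


def delete_consecutive_duplicates(directory, pattern="*.rcv", dry_run=True):
    """Delete each file whose bytes equal those of the last kept file before it.

    Returns the paths that were deleted (or would be deleted in a dry run).
    """
    # Snapshot and sort the listing up front so deletions cannot disturb the loop.
    # Assumption: zero-padded names (MLLP000001.rcv, ...) sort in arrival order.
    files = sorted(Path(directory).glob(pattern))

    deleted = []
    previous = None  # contents of the most recent file that was kept
    for path in files:
        data = path.read_bytes()  # files are about 1 KB, so a full read is cheap
        if previous is not None and data == previous:
            deleted.append(path)
            if not dry_run:
                path.unlink()
        else:
            previous = data  # a new run of content starts here
    return deleted
```

Calling it with the default `dry_run=True` first only reports which files would go, which seems prudent before running it against directories of this size with `dry_run=False`.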

Votes: 1

Stack Overflow user

Answered 2011-04-07 04:36:08

It sounds like you could do something like this:

Set Files to an array of files in a given directory.
Set PreviousHash to hash of the first file in the Files.

For each CurrentFile file after the first in Files,
    Set CurrentHash to hash of the CurrentFile.
    If CurrentHash is equal to PreviousHash, then delete CurrentFile.
    Else, set PreviousHash to CurrentHash.
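The steps above might translate into Python roughly as shown here. Treat it as a sketch, not the answerer's own code: Python itself, SHA-256 as the hash, and the names `file_digest` and `delete_duplicates_by_hash` are all assumptions introduced for illustration, as is the idea that sorting the zero-padded file names reproduces their arrival order.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents (hash choice is an assumption)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def delete_duplicates_by_hash(directory, pattern="*.rcv", dry_run=True):
    """Single pass: delete a file when its hash matches the previous kept file's hash.

    Returns the paths that were deleted (or would be deleted in a dry run).
    """
    # "Set Files to an array of files": a sorted snapshot taken before any deletion.
    files = sorted(Path(directory).glob(pattern))

    deleted = []
    previous_hash = None  # never equals a real digest, so the first file is kept
    for current_file in files:
        current_hash = file_digest(current_file)
        if current_hash == previous_hash:
            deleted.append(current_file)
            if not dry_run:
                current_file.unlink()
        else:
            previous_hash = current_hash
    return deleted
```

Because each file is hashed exactly once and compared only with its predecessor, the work grows linearly with the number of files, which matters at 50,000 to 250,000 files per directory.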

Votes: 0

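The original page content is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei engine for the IT domain.

Original link:

https://stackoverflow.com/questions/5569611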
