
Find and execute line by line in bash, before the final result

Stack Overflow user
Asked 2022-09-06 07:46:48
3 answers · 70 views · 0 follows · 1 vote

I'm trying to get rid of old asset versions. The files are named strictly as follows:

<timestamp>_<constant><version><assetID>.zip<.extra>

For example: 202201012359_FOOBAR0101234567.zip.done.

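<timestamp> is the date the file was added to the folder.

<constant> does not change within the folder being processed.

<version> is a two-digit number, starting from 00, that gives the version of the asset identified by <assetID>.

<extra> is optional, so the extension can be .zip, .zip.done or .zip.somethingelse.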
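To make the naming scheme above concrete, here is a minimal sketch that splits one filename into its fields with a bash regular expression. It assumes, as in the example, that the constant consists of letters only, the version has exactly two digits and the asset ID exactly eight; `parse_asset_name` is a hypothetical helper name, not part of the original script.

```bash
#!/bin/bash
# Sketch: split <timestamp>_<constant><version><assetID>.zip<.extra> into fields.
# Assumptions: letters-only constant, 2-digit version, 8-digit asset ID.
parse_asset_name() {
    local name=${1##*/}   # drop any leading directory
    local re='^([0-9]+)_([A-Za-z]+)([0-9]{2})([0-9]{8})\.zip(\..*)?$'
    if [[ $name =~ $re ]]; then
        printf 'timestamp=%s constant=%s version=%s assetID=%s extra=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
            "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
    else
        return 1          # name does not follow the scheme
    fi
}

parse_asset_name 202201012359_FOOBAR0101234567.zip.done
```

For the example name this prints `timestamp=202201012359 constant=FOOBAR version=01 assetID=01234567 extra=.done`; for a plain `.zip` file the `extra` field is empty.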

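However, an asset may have all three extensions, and it may exist several times with different timestamps. This means there can be several files with the same ID and version number that differ only in their timestamp.

The goal is to find the latest version of each asset ID and delete the older versions. What matters is the version number, not the timestamp.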
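Since the version number decides what to keep, it has to be compared as a number. One pitfall worth showing in a short sketch: bash arithmetic reads a leading `0` as octal, so two-digit versions such as `08` and `09` make `(( 09 > 08 ))` fail with "value too great for base". The `10#` prefix forces base 10; `is_newer_version` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: compare two zero-padded version strings numerically.
# 10# forces base 10, so "08" and "09" are not parsed as (invalid) octal.
is_newer_version() {
    (( 10#$1 > 10#$2 ))
}

is_newer_version 02 01 && echo "02 is newer than 01"
is_newer_version 09 08 && echo "09 is newer than 08"
```

Because every version has exactly two digits, a plain string comparison would order them correctly too; the arithmetic form simply avoids relying on the padding.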

All assets are in a single folder with no subfolders.

Current solution

So far, this is done as follows:

Code language: bash
#!/bin/bash
location="/home/user/FOOBAR"
echo "Deleting older files..."

# Declare variable to print the outcome of removed asset ID's
declare -A assetsRemoved

# The main loop which finds all the files in the folder.
# Reading from process substitution instead of a pipe keeps the loop in the
# current shell, so assetsRemoved is still filled in after the loop ends.
while IFS= read -r line; do
    # <timestamp>_FOOBAR<iterator><assetId><file-extensions>
    # 20201229104919_FOOBAR0300040682.zip.done

    # Separate assetId
    rest=${line#*'.zip'}
    # .done
    pos=$(( ${#line} - ${#rest} - 4 ))
    # 20201229104919_FOOBAR0300040682<^>.zip.done
    assetId=${line:pos-8:8}
    # 20201229104919_FOOBAR03<00040682>.zip.done

    # Find all files with same assetId (search $location only, not the current directory)
    assets="$(find "$location" -maxdepth 1 -type f -name "*$assetId.zip*" -a -name "*FOOBAR*")"
    # Init loop variables
    max=-1
    mostRecent=""
    cleanedOld=0

    # Loop all files with same assetId (the names contain no spaces)
    for file in $assets
    do
        # Separate basename without extension
        basenameNoExt="${file%%.zip*}"
        # <20201229104919_FOOBAR0300040682>.zip.done
        # Separate iterator, 2 numbers; 10# avoids octal errors for 08 and 09
        iter=$(( 10#${basenameNoExt: -10:2} ))
        # 20201229104919_FOOBAR<03>00040682.zip.done
        if (( iter > max ))
        then
            max=$iter
            if [[ -n $mostRecent ]]
            then
                rm -- "$mostRecent"*
                cleanedOld=1
            fi
            mostRecent=$basenameNoExt
        elif (( iter < max ))
        then
            [[ -f $file ]] && rm -- "$basenameNoExt"*
            cleanedOld=1
        fi
        # iter == max -> same asset with different file extension, leave to be
    done

    if (( max > 0 && cleanedOld > 0 ))
    then
        assetsRemoved[$assetId]=$max
    fi

done < <(find "$location" -maxdepth 1 -type f -name "*.zip*" -a -name "*FOOBAR*")

for a in "${!assetsRemoved[@]}"; do
    echo "Cleaned asset $a from versions lower than ${assetsRemoved[$a]}"
done

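The problem

This solution has a serious problem: it is slow. It first finds all the files, then takes one file and works out the maximum version while deleting the older versions. As a result, the next iteration of the outermost find loop tries to run the find/rm commands for assets that may already have been processed or deleted.

The question

Is there a way to run a command on each result of find before all the results have been collected? Or is there another, more efficient way to loop over the results? There are more than 100k files to process, and I assume the wildcard rm also goes through all of them when looking for the related files to delete. Since every file triggers another scan of the whole folder, that adds up to more than 100,000² iterations. Is there any way to avoid this?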
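A partial mitigation for the repeated work described above, sketched under the assumption that the per-asset find/rm logic stays as in the question: remember which asset IDs have already been handled and skip files that were deleted by an earlier iteration. `handle_asset` and `process_folder` are hypothetical names. This still runs one inner scan per distinct asset ID, so it does not remove the quadratic cost; a single pass that groups files by ID avoids the inner scan altogether.

```bash
#!/bin/bash
declare -A seen   # asset IDs already handled in this run

# Hypothetical placeholder for the per-asset find/rm work from the question.
handle_asset() {
    echo "handling asset $1"
}

process_folder() {
    local location=$1 line base tmp assetId
    seen=()
    while IFS= read -r line; do
        [[ -e $line ]] || continue                # removed by an earlier iteration
        base=${line##*/}
        tmp=${base%%.zip*}                        # drop ".zip" and any extra suffix
        assetId=${tmp: -8}                        # eight characters before ".zip"
        [[ -n ${seen[$assetId]} ]] && continue    # asset already handled
        seen[$assetId]=1
        handle_asset "$assetId"
    done < <(find "$location" -maxdepth 1 -type f -name '*FOOBAR*.zip*')
}
```

Usage: `process_folder /home/user/FOOBAR`. With the guard, `handle_asset` is called once per asset ID instead of once per file.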

Example

Consider a folder containing the following files:

Code language: text
20191229104919_FOOBAR0001234567.zip
20191229104919_FOOBAR0001234567.zip.done
20191229104919_FOOBAR0001234567.zip.somethingelse
20191229104919_FOOBAR0087654321.zip
20191129104919_FOOBAR0087654321.zip.done
20191129104919_FOOBAR0087654321.zip.somethingelse
20191129100000_FOOBAR0187654321.zip
20191229100000_FOOBAR0187654321.zip.done
20191229100000_FOOBAR0187654321.zip.somethingelse
20201229104919_FOOBAR0101234567.zip
20201229104919_FOOBAR0101234567.zip.done
20201229104919_FOOBAR0101234567.zip.somethingelse
20211229104919_FOOBAR0201234567.zip
20211229104919_FOOBAR0201234567.zip.done
20211229104919_FOOBAR0201234567.zip.somethingelse
20221229104919_FOOBAR0201234567.zip
20221229104919_FOOBAR0201234567.zip.done
20221229104919_FOOBAR0201234567.zip.somethingelse

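To try the scripts on this page without touching real data, the example folder above can be recreated in a scratch directory. This is a minimal sketch; `make_example_folder` is a hypothetical helper, and apart from the file names nothing in it comes from the question.

```bash
#!/bin/bash
# Sketch: recreate the example folder in a fresh temporary directory
# (created with mktemp) and print that directory's path.
make_example_folder() {
    local dir name
    dir=$(mktemp -d) || return 1
    local names=(
        20191229104919_FOOBAR0001234567.zip
        20191229104919_FOOBAR0001234567.zip.done
        20191229104919_FOOBAR0001234567.zip.somethingelse
        20191229104919_FOOBAR0087654321.zip
        20191129104919_FOOBAR0087654321.zip.done
        20191129104919_FOOBAR0087654321.zip.somethingelse
        20191129100000_FOOBAR0187654321.zip
        20191229100000_FOOBAR0187654321.zip.done
        20191229100000_FOOBAR0187654321.zip.somethingelse
        20201229104919_FOOBAR0101234567.zip
        20201229104919_FOOBAR0101234567.zip.done
        20201229104919_FOOBAR0101234567.zip.somethingelse
        20211229104919_FOOBAR0201234567.zip
        20211229104919_FOOBAR0201234567.zip.done
        20211229104919_FOOBAR0201234567.zip.somethingelse
        20221229104919_FOOBAR0201234567.zip
        20221229104919_FOOBAR0201234567.zip.done
        20221229104919_FOOBAR0201234567.zip.somethingelse
    )
    for name in "${names[@]}"; do
        touch "$dir/$name"    # empty placeholder files are enough for testing
    done
    printf '%s\n' "$dir"
}
```

Usage: `dir=$(make_example_folder) && cd "$dir"`, then run one of the answers in dry-run mode and compare its result with the list of remaining files that follows.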
After the cleanup, the remaining files should be:

Code language: text
20191129100000_FOOBAR0187654321.zip
20191229100000_FOOBAR0187654321.zip.done
20191229100000_FOOBAR0187654321.zip.somethingelse
20211229104919_FOOBAR0201234567.zip
20211229104919_FOOBAR0201234567.zip.done
20211229104919_FOOBAR0201234567.zip.somethingelse
20221229104919_FOOBAR0201234567.zip
20221229104919_FOOBAR0201234567.zip.done
20221229104919_FOOBAR0201234567.zip.somethingelse

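Note:

Only the latest version matters. Every file with the latest version and the same ID must be kept, even if the timestamps and extensions differ.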

3 answers

Stack Overflow user

Accepted answer

Posted 2022-09-09 15:22:38

Thanks @ogus for the working and clean solution!

For the sake of documentation, I'm adding the solution I ended up using, and clarifying how xargs is used in this case.

Code language: bash
#!/bin/bash

# Takes optional argument to delete found assets while running.
removeFound=${1:-n}
location="/home/user/bashtest"

if [[ "$removeFound" =~ ^(y|Y|yes|Yes|YES)$ ]]
then
    echo "Deleting older assets from $location"
else
    echo "Searching old assets from $location"
fi


# Find all .zip and .zip.somethingelse -files, pipe lines to awk, save to variable
assetsToDelete=`printf '%s\n' "$location"/*.zip* | awk '{
    # <timestamp>_FOOBAR<iterator><assetId><file-extensions>
    # 20191229104919_FOOBAR0387654321.zip.done

    # Extension position
    extPos = index($0, ".zip")
    # 20191229104919_FOOBAR0387654321<^>.zip.done
    
    # Separate asset ID
    assetId = substr($0, extPos - 8, 8)
    # 20191229104919_FOOBAR03<87654321>.zip.done
    
    # Separate iterator, 2 numbers
    assetVer = substr($0, extPos - 10, 2)
    # 20191229104919_FOOBAR<03>87654321.zip.done

    # List variables used below:
    # assetList -> [assetId][asset file(s)] -> keys: list of asset IDs encountered, values: one or more asset file paths, absolute, separated by ORS (newline)
    # maxAssetV -> [assetId][assetMaxVersion] -> keys: list of asset IDs encountered, values: maximum version of the corresponding asset encountered

    # Everything printed out with <print> is the output of the awk-command, thus to be deleted

    # Check whether the ID has not been recorded yet, or this version is greater than the recorded one
    if (!(assetId in assetList) || assetVer > maxAssetV[assetId]) {
        # Asset already recorded with a smaller version: remove the old files by printing their paths
        if (assetId in assetList)
            print assetList[assetId]

        # Record new or newer asset
        assetList[assetId] = $0
        maxAssetV[assetId] = assetVer
    }
    # Find if asset is the same version as current max version
    else if (assetVer == maxAssetV[assetId]) {
        # Record the asset by stacking it on the list, separated with ORS (newline)
        assetList[assetId] = assetList[assetId] ORS $0
    }
    # Asset recorded and with smaller version -> print thus delete
    else {
        print
    }
}' `

if [ -z "$assetsToDelete" ]; then
    echo "Zero older assets found in the $location"
else
    if [[ "$removeFound" =~ ^(y|Y|yes|Yes|YES)$ ]]
    then
        # One path per line; each path reaches sh as "$1" instead of being spliced into the command string
        printf '%s\n' "$assetsToDelete" | xargs -I {} sh -c 'echo "$1"; rm -- "$1"' sh {}
    else
        echo "Moving files to ./remove folder, delete manually from there."
        echo "To delete on the go, run script with parameter <yes>"

        mkdir -p "$location/remove"
        printf '%s\n' "$assetsToDelete" | xargs -I {} sh -c 'echo "$1"; mv -- "$1" "$(dirname "$1")/remove/"' sh {}
    fi
fi

exit
Votes: 0

Stack Overflow user

Posted 2022-09-06 12:36:26

Run this in the directory containing the files; it lists the files to delete:

Code language: bash
printf '%s\n' *.zip *.zip.* | awk '{
  i = index($0, ".zip")
  # the 8-digit ID and the 2-digit version sit immediately before ".zip"
  id = substr($0, i - 8, 8)
  ver = substr($0, i - 10, 2)

  if (!(id in keep) || ver > keep_ver[id]) {
    if (id in keep)
      print keep[id]

    keep[id] = $0
    keep_ver[id] = ver
  }
  else if (ver == keep_ver[id]) {
    keep[id] = keep[id] ORS $0
  }
  else {
    print
  }
}'

If the output looks good, pipe it to xargs rm to actually delete the files.
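A minimal sketch of that final deletion step, assuming the list arrives on stdin one path per line, as the awk command above prints it. `delete_listed` is a hypothetical helper name. `tr` turns the newlines into NUL bytes so `xargs -0` hands each name to `rm` intact, and the emptiness check avoids running `rm` with no operands when nothing needs deleting.

```bash
#!/bin/bash
# Sketch: delete the files named on stdin, one path per line.
delete_listed() {
    local list
    list=$(cat)                    # read the whole list first
    [[ -n $list ]] || return 0     # nothing to delete
    printf '%s\n' "$list" | tr '\n' '\0' | xargs -0 rm --
}
```

It would replace the plain `xargs rm` at the end of the pipeline; running the awk command on its own first remains the dry run.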

Votes: 2

Stack Overflow user

Posted 2022-09-06 11:40:27

Instead of a double loop, how about a two-pass approach:

Code language: bash
#!/bin/bash

location="/home/user/FOOBAR"
declare -A latestver            # associates the latest version number with assetID

# pass 1: extract the latest version number for the assetID
for f in "$location"/*FOOBAR*.zip*; do
    tmp=${f%.zip*}              # remove the suffix ".zip*"
    ver=${tmp: -10:2}           # extract the version number
    id=${tmp: -8:8}             # extract the assetID
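    # record the version if this assetID is new or the version is newer
    # (the -z test also keeps assets whose only version is 00)
    if [[ -z ${latestver[$id]} ]] || (( 10#$ver > 10#${latestver[$id]} )); then
        latestver[$id]=$ver
    fi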
done

# pass 2: if the associated version number of the assetID does not match, remove the file
for f in "$location"/*FOOBAR*.zip*; do
    tmp=${f%.zip*}              # remove the suffix ".zip*"
    id=${tmp: -8:8}             # extract the assetID
    ver=${latestver[$id]}       # expected latest version number
    if [[ $f != *FOOBAR$ver$id.zip* ]]; then
                                # filename does not match, meaning the version number is older
        echo rm -- "$f"         # then remove the file
    fi
done

If the output looks good, remove the "echo".

Votes: 1
Page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73618466
