Is there a way to speed up the shell script below? It takes me about 40 minutes every day to update roughly 150,000 files. Granted, given the number of files being created and updated, that may simply be acceptable; I don't deny that. But if there is a more efficient way to write this, or a way to rewrite the logic entirely, I'm open to it. Please, I'm looking for help.
```bash
#!/bin/bash
DATA_FILE_SOURCE="<path_to_source_data>/${1}"
DATA_FILE_DEST="<path_to_dest>"

for fname in $(ls -1 "${DATA_FILE_SOURCE}")
do
    for line in $(cat "${DATA_FILE_SOURCE}/${fname}")
    do
        FILE_TO_WRITE_TO=$(echo "${line}" | awk -F',' '{print $1"."$2".daily.csv"}')
        CONTENT_TO_WRITE=$(echo "${line}" | cut -d, -f3-)
        if [[ ! -f "${DATA_FILE_DEST}/${FILE_TO_WRITE_TO}" ]]
        then
            echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}/${FILE_TO_WRITE_TO}"
        else
            if ! grep -Fxq "${CONTENT_TO_WRITE}" "${DATA_FILE_DEST}/${FILE_TO_WRITE_TO}"
            then
                sed -i "/${1}/d" "${DATA_FILE_DEST}/${FILE_TO_WRITE_TO}"
                echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}/${FILE_TO_WRITE_TO}"
            fi
        fi
    done
done
```

Posted on 2021-09-16 14:03:59
There are still some unclear parts in the script you posted, such as the `sed` command. Nevertheless, I rewrote it with a more sensible approach and fewer external calls, which should indeed speed it up.
```sh
#!/usr/bin/env sh
DATA_FILE_SOURCE="<path_to_source_data>/$1"
DATA_FILE_DEST="<path_to_dest>"

for fname in "$DATA_FILE_SOURCE/"*; do
    while IFS=, read -r a b content || [ "$a" ]; do
        destfile="$DATA_FILE_DEST/$a.$b.daily.csv"
        if grep -Fxq "$content" "$destfile"; then
            sed -i "/$1/d" "$destfile"
        fi
        printf '%s\n' "$content" >>"$destfile"
    done < "$fname"
done
```

Posted on 2021-09-16 13:59:31
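A quick aside on the rewrite above (my illustration, with a made-up sample line): `IFS=, read -r a b content` works because `read` assigns all leftover fields to the last variable, so `content` keeps its embedded commas and replaces the per-line `awk` + `cut` pair in one builtin call:

```sh
#!/bin/sh
# read splits on IFS and dumps every remaining field into the last
# variable, so "content" keeps its embedded commas.
line='AAA,XX,2021-09-16,1,2'
IFS=, read -r a b content <<EOF
$line
EOF
printf '%s\n' "$a" "$b" "$content"
# → AAA
#   XX
#   2021-09-16,1,2
```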
```bash
#!/bin/bash
set -e -o pipefail

declare -ir MAX_PARALLELISM=20  # pick a limit
declare -i pid
declare -a pids

# ...

for fname in "${DATA_FILE_SOURCE}/"*; do
    if ((${#pids[@]} >= MAX_PARALLELISM)); then
        wait -p pid -n \
            || echo "${pids[pid]} failed with ${?}" 1>&2
        unset 'pids[pid]'
    fi

    while IFS= read -r line; do
        FILE_TO_WRITE_TO="..."
        # ...
    done < "${fname}" &  # fork here

    pids[$!]="${fname}"
done

for pid in "${!pids[@]}"; do
    wait -n "$((pid))" \
        || echo "${pids[pid]} failed with ${?}" 1>&2
done
```
Here is a directly runnable skeleton showing how the scaffolding above works (it processes 36 items with at most 20 parallel processes):
```bash
#!/bin/bash
set -e -o pipefail

declare -ir MAX_PARALLELISM=20  # pick a limit
declare -i pid
declare -a pids

do_something_and_maybe_fail() {
    sleep $((RANDOM % 10))
    return $((RANDOM % 2 * 5))
}

for fname in some_name_{a..f}{0..5}.txt; do  # 36 items
    if ((${#pids[@]} >= MAX_PARALLELISM)); then
        wait -p pid -n \
            || echo "${pids[pid]} failed with ${?}" 1>&2
        unset 'pids[pid]'
    fi

    do_something_and_maybe_fail &  # forking
    pids[$!]="${fname}"
    echo "${#pids[@]} running" 1>&2
done

for pid in "${!pids[@]}"; do
    wait -n "$((pid))" \
        || echo "${pids[pid]} failed with ${?}" 1>&2
done
```
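One caveat worth adding (my note, not part of the answer): `wait -p`, used above to learn which PID finished, requires bash 5.1 or newer. On older bash a cruder fallback is plain `wait -n` (bash 4.3+) with a simple counter; a minimal sketch:

```bash
#!/bin/bash
# Hedged sketch (my illustration): bounded parallelism with plain
# `wait -n`, without the `wait -p` PID bookkeeping from the answer.
MAX=3
running=0
for i in 1 2 3 4 5 6; do
    if (( running >= MAX )); then
        wait -n                      # block until any one background job exits
        running=$((running - 1))
    fi
    sleep 0.1 &                      # stand-in for the real per-file work
    running=$((running + 1))
done
wait                                 # drain the remaining jobs
echo "all done"
```

The trade-off: without `-p` you cannot tell *which* job exited, so per-job error reporting is lost, but the concurrency cap still holds.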
Your inner loop also forks several external processes for every single input line (`awk`, `grep` and `cut`). `fork()`ing is extremely inefficient compared to:
- Running one single `awk` / `grep` / `cut` process on an entire input file (to preprocess all lines at once for easier processing in `bash`) and feeding the whole output into (e.g.) a `bash` loop.
- Using Bash expansions instead, where feasible, e.g. `"${line/,/.}"` and other tricks from the `EXPANSION` section of the `man bash` page, without `fork()`ing any further processes.
- `ls -1` is unnecessary. First, `ls` won’t write multiple columns unless the output is a terminal, so a plain `ls` would do. Second, `bash` expansions are usually a cleaner and more efficient choice. (You can use `nullglob` to correctly handle empty directories / “no match” cases.)
- Looping over the output from `cat` is a (less common) [useless use of `cat`](https://blog.sanctum.geek.nz/useless-use-of-cat/) case. Feed the file into a loop in `bash` instead and read it line by line. (This also gives you more line format flexibility.)

https://stackoverflow.com/questions/69208364
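As a hedged sketch of the first point above (file names and sample data are mine, assuming the question's `key1,key2,rest...` line format), a single `awk` process can split a whole source file into the per-destination `$1.$2.daily.csv` files in one pass, instead of forking `awk` and `cut` for every line:

```sh
#!/bin/sh
# Sketch only: demo data below is made up; real input would come from
# the question's source directory.
set -e

tmp=$(mktemp -d)
printf '%s\n' 'AAA,XX,2021-09-16,1,2' 'BBB,YY,2021-09-16,3,4' 'AAA,XX,2021-09-16,5,6' \
    > "$tmp/input.csv"

# One awk process handles the whole file: fields 1 and 2 pick the
# destination, fields 3..NF are re-joined as the content to append
# (the cut -d, -f3- part of the original script).
awk -F',' -v dest="$tmp" '
{
    out = dest "/" $1 "." $2 ".daily.csv"
    content = $3
    for (i = 4; i <= NF; i++) content = content "," $i
    print content >> out
    close(out)   # stay below the open-file-descriptor limit
}' "$tmp/input.csv"

cat "$tmp/AAA.XX.daily.csv"
# → 2021-09-16,1,2
#   2021-09-16,5,6
```

The dedup/replace logic from the question is deliberately left out here; the point is only the process-count difference (one `fork()` total versus several per line).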