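I am currently debugging a shell script that acts as the master script of a data pipeline. To run the pipeline, you pass a set of arguments to the shell script. From there, the shell script calls six different scripts in sequence (four R scripts and two Python scripts), writes output to log files, and so on. Basically, the idea is to use this script to automate a data pipeline that takes a long time to run.
Right now, if any individual R or Python script breaks inside the shell script, it simply moves on to the next script it is supposed to call. However, script 03.py requires the data fed into 01.R and 02.R to have been fully run and processed; otherwise 03 produces erroneous output data, which is then written out and processed further by later scripts.
What I would like to do is:
1. break the entire shell script if there is an error in any of the R scripts
2. output a message telling me where the error occurred
Below is an excerpt of the master.sh shell script that calls the individual scripts.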
#############
# STEP 2 : RUNNING SCRIPTS
#############
# A - 01.R
#################################################################
# log_file - this needs to be reassigned for every individual script
log_file=01.log
current_time=$(date)
echo "Current time: $current_time"
echo "Now running script 01. Log file output being written to $log_file_dir$log_file."
Rscript 01.R -f $input_file -s $sql_db > $log_file_dir$log_file
# current time/date
current_time=$(date)
echo "Current time: $current_time"
# B - 02.R
#################################################################
log_file=02.log
current_time=$(date)
echo "Current time: $current_time"
echo "Now running script 02. Log file output being written to $log_file_dir$log_file"
Rscript 02.R -f $input_file -s $sql_db > $log_file_dir$log_file
# PRINT OUT TIMINGS
current_time=$(date)
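echo "Current time: $current_time"

This process repeats throughout master.sh up to script 06.R, after which it collates some data retrieved from the output files and log files and prints it to stdout.
Here is some sample output from my current master.sh; it shows that the script keeps running even though 01.R produced an error.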
file: test-data/minisample.txt
There are a total of 101 elements in file.
Using the main database.
Writing log-files to this directory: log_files/minisample/.
Writing output-csv with classifications to output/minisample.csv.
Current time: Wed Nov 14 18:19:53 UTC 2018
Now running script 01. Log file output being written to log_files/minisample/01.log.
Loading required package: stringi
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: RMySQL
Loading required package: DBI
Loading required package: methods
Loading required package: hms
Error: The following 2 arguments need to be provided:
-f <input file>.csv
-s <MySQL db name>
Execution halted
Current time: Wed Nov 14 18:19:54 UTC 2018
./master.sh: line 95: -1: substring expression < 0
./master.sh: line 100: -1: substring expression < 0
./master.sh: line 104: -1: substring expression < 0
Total time taken to run script 01.R:
Average time taken per user to run script 01.R:
Total time taken to run pipeline so far [01/06]:
Average time taken per user to run pipeline so far [01/06]:
Current time: Wed Nov 14 18:19:54 UTC 2018
Now running script 02. Log file output being written to log_files/minisample/02.log

Since the R script 01.R produced an error, I want master.sh to stop. But how? Any help is greatly appreciated, thanks in advance!
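Posted on 2018-11-15 05:19:04
As another user mentioned, simply running set -e will make your script terminate at the first error. However, if you want more control, you can also check the exit status with ${?} or simply $?, assuming your program returns exit code 0 on success and a non-zero value otherwise.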
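As a minimal sketch of the set -e option, here is a cut-down stand-in for master.sh. The function name run_pipeline is hypothetical, and false stands in for an Rscript call, on the assumption that Rscript exits with a non-zero status when a script halts with an error. The body is wrapped in a subshell only so that the demo can exit without closing the shell that calls it.

```shell
#!/bin/bash
# Sketch only: run_pipeline is a hypothetical stand-in for master.sh, and
# "false" stands in for an Rscript call that exits with an error.
# The body runs in a subshell ( ... ) so that set -e ends only the demo.
run_pipeline() (
    set -e                           # stop at the first failing command
    echo "Now running script 01."
    false                            # fails, so the subshell exits here
    echo "Now running script 02."    # never reached
)
```

One caveat: bash ignores set -e for commands that run as the condition of an if or while, or on the left-hand side of && or ||, so calling the function as run_pipeline || something would let it run to the end. The explicit exit-status check is illustrated by the wget example that follows.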
#!/bin/bash
url=https://nosuchaddress1234.com/nosuchpage.html
error_file=errorFile.txt
wget ${url} 2> ${error_file}
exit_status=${?}
if [ ${exit_status} -ne 0 ]; then
echo -n "wget ${url} "
if [ ${exit_status} -eq 4 ]; then
echo "- Network failure."
elif [ ${exit_status} -eq 8 ]; then
echo "- Server issued an error response."
else
echo "- Other error"
fi
echo "See ${error_file} for more details"
exit ${exit_status};
fi

Posted on 2018-11-15 06:29:15
I like to put some boilerplate at the top of most scripts, like this:
trap 'echo >&2 "ERROR in $0 at line $LINENO, Aborting"; exit $LINENO;' ERR
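set -u

While writing and debugging the code, I usually add

set -x

along with plenty of colon-prefixed trace "comments":

: this will parse its args but only show under set -x

The trick then is to make sure that any errors you know are acceptable get handled. Conditionals consume the error status, so those are safe.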
if grep foo nonexistantfile
then : do the success stuff
else : if you *want* a failout here, just call false
false here will abort # args don't matter :)
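fi

Likewise, if you just want to catch and ignore a known possible error:

ls $mightNotExist ||: # || says "do on fail"; : is a no-op builtin that behaves like "true"

Just always check for your likely errors. Then the only thing that will crash your script is an unexpected failure.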
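To tie these pieces together, here is a rough sketch of how the trap boilerplate might behave in a pipeline like the one in the question. The names run_pipeline and /nonexistent/path are made up for illustration, and true and false stand in for Rscript or python calls that succeed or fail. The sketch uses exit 1 rather than exit $LINENO because exit statuses are limited to 0-255, so a line number used as a status can wrap around on long scripts.

```shell
#!/bin/bash
# Sketch only. run_pipeline is a hypothetical stand-in for master.sh, and
# "true"/"false" stand in for Rscript/python calls that succeed or fail.
# The body runs in a subshell ( ... ) so that "exit" ends only the demo.
run_pipeline() (
    # Report where the unhandled failure happened, then abort.
    trap 'echo >&2 "ERROR in $0 at line $LINENO, aborting"; exit 1' ERR
    set -u

    echo "Now running script 01."
    true                                   # step succeeds, pipeline continues

    ls /nonexistent/path 2>/dev/null || :  # known failure, deliberately ignored

    if grep -q foo /dev/null; then         # conditionals consume the failure
        echo "found foo"
    fi

    echo "Now running script 02."
    false                                  # unhandled failure: the trap fires
    echo "Now running script 03."          # never reached
)
```

Calling run_pipeline on its own prints the two "Now running" lines for scripts 01 and 02, writes the trap's error message to stderr, and returns status 1. Like set -e, the ERR trap does not fire for commands tested by if or joined with || or &&, so the function should be called as a plain command.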
https://stackoverflow.com/questions/53307140