How to speed up reading a txt file in a bash script

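Stack Overflow user

Asked 2015-08-06 14:45:29

Answers 1 · Views 590 · Followers 0 · Votes 1

I am writing a script that reads 24 hours of temperature data for each day and extracts a latitude-longitude region for a smaller domain. Each data file has three columns (temperature, longitude, latitude) and 188426 rows.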

> ==> 20120810234500.txt <==
> 0.0362,-12.5000,33.5000
> -0.0188,-12.5000,33.5400
> -0.0732,-12.5000,33.5800
> -0.1263,-12.5000,33.6200
> -0.1778,-12.5000,33.6600
> -0.2278,-12.5000,33.7000
> -0.2761,-12.5000,33.7400
> -0.3226,-12.5000,33.7800
> -0.3677,-12.5000,33.8200
> -0.4115,-12.5000,33.8600

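I have used for and while loops together with awk commands to read the data, but reading, extracting and writing the new, smaller file takes too long (at least for me). Here is the relevant part of the script: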

 # Start 24 hours loop
   # Region bounds; named lon1/lon2/lat1/lat2 because the bc tests below read those names
   lon1=-3
   lon2=3
   lat1=35
   lat2=42

   nhoras=24
   n=1
   while [ $n -le $nhoras ]
   do

    # File name (nom_file) and length (nstation=188426)
     nom_file=`awk -v i=$n 'BEGIN { FS = ","} NR==i { print $1 }' lista_datos.txt`
     nstation=`awk 'END{print NR}' $nom_file`

    # Original data came from windows system and has carriage returns
     dos2unix -q $nom_file

     # Date, time values from file name
     year=`echo $nom_file | cut -c 1-4`
     month=`echo $nom_file | cut -c 5-6`
     day=`echo $nom_file | cut -c 7-8`
     hour=`echo $nom_file | cut -c 9-14`

     # Part of the string to write in the new smaller file
     var1=`echo $nom_file | awk '{print substr($0,1,4) " " substr($0,5,2) " " substr($0,7,2) " " substr($0,9,6)}'`

     # Read rows 65000 to 125000 to gain processing time
     m=65000
     #while [ $m -le $nstation ]  # Data extraction loop
     while [ $m -le 125000 ]  # Data extraction loop
     do

        station_id=$m
        elevation=1.5   
        lat=`awk -v i=$m 'BEGIN { FS = ","} NR==i { print $3 }' $nom_file`
        lon=`awk -v i=$m 'BEGIN { FS = ","} NR==i { print $2 }' $nom_file`

    # As lon/lat are floating point I use this workaround to get a smaller region
    lom1=`echo $lon'>'$lon1 | bc -l`
    lom2=`echo $lon'<'$lon2 | bc -l`
    lam1=`echo $lat'>'$lat1 | bc -l`
    lam2=`echo $lat'<'$lat2 | bc -l`

    if [ $lom1 -eq 1 ] && [ $lom2 -eq 1 ];
    then
      if [ $lam1 -eq 1 ] && [ $lam2 -eq 1 ];
      then

       # Second part of the string to write in the new smaller file
         var2=`awk -v i=$m -v e=$elevation 'BEGIN { FS = ","} NR==i { print "'${station_id}' " $3 " " $2 " '${elevation}' 000 " $1 " 000" }' $nom_file`

       # Paste
         paste <(echo "$var1") <(echo "$var2") -d ' ' >> out.txt

       fi # end of lat condition
     fi # end of lon condition

        m=$(( $m + 1 ))

     done # End of extracting loop

     # Save results
     cat cabecera-dp-s.txt out.txt > dp-s$year-$month-$day-$hour

     rm out.txt
     n=$(( $n + 1 ))

   done  # End 24 hours loop

So far, processing a single input file takes two hours. Is there any way to speed this up?

Thanks in advance.

bash

scripting

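1 Answer

Stack Overflow user

Accepted answer

Posted 2015-08-07 13:26:48

Thanks for all the comments, and especially to @fedorqui.

With awk used properly, processing speed improved dramatically. My first attempt took 2 hours to process a single file; now 24 files are processed in 93 minutes. There is probably still room for improvement, but this is good enough for me for now. Thanks again.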
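The gain comes from letting one awk process scan each file once, rather than starting a new awk for every row. A minimal sketch of that filter follows. The sample rows, the function name `filter_region` and the variable names `lon1`, `lon2`, `lat1`, `lat2` are assumptions made for illustration; only the column layout and the bounds come from the question.

```shell
#!/bin/bash
# Hypothetical sample rows in the temperature,longitude,latitude layout.
sample=$(mktemp)
printf '%s\n' \
  '0.0362,-12.5000,33.5000' \
  '1.2500,-2.0000,36.0000' \
  '0.7300,2.9600,41.9800' \
  '0.9100,3.5000,40.0000' > "$sample"

# A single awk process reads the input once and prints only the rows whose
# longitude ($2) and latitude ($3) fall inside the box. The bounds are passed
# in with -v, so no bc call is needed for the floating-point comparison.
filter_region() {
  awk -F, -v lon1=-3 -v lon2=3 -v lat1=35 -v lat2=42 \
    '$2 >= lon1 && $2 <= lon2 && $3 >= lat1 && $3 <= lat2' "$@"
}

# Prints the second and third sample rows; the other two lie outside the box.
filter_region "$sample"
rm -f "$sample"
```

Because awk treats numeric-looking fields as numbers in comparisons, the decimal longitude and latitude values can be tested directly.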

I am attaching the script in case it is useful to someone.

#!/bin/bash

# PATHS
base=/home/meteo/PROJECTES/TERMED
dades=$base/DADES
files=$base/FILES
msg_data=$dades/MSG/Agosto
treball=$base/TREBALL

# START OF SCRIPT

# Abort if the working directory is missing, so that rm * never runs elsewhere
cd $treball || exit 1

rm *

# Header for final output
cp $files/cabecera-dp-s.txt ./


# Start of day loop
for dia in 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
do

   cp $msg_data/$dia/* ./

   ls 2*.txt > lista_datos.txt

   awk '{print substr($0,9,6)}' lista_datos.txt > lista_horas.txt

   nhoras=`awk 'END{print NR}' lista_horas.txt`

   # Start of hour loop
   n=1
   while [ $n -le $nhoras ]
   do

    # File name and size
     nom_file=`awk -v i=$n 'BEGIN { FS = ","} NR==i { print $1 }' lista_datos.txt`
     nstation=`awk 'END{print NR}' $nom_file`

    # avoid carriage returns
     dos2unix -q $nom_file

     # Date values
     year=`echo $nom_file | cut -c 1-4`
     month=`echo $nom_file | cut -c 5-6`
     day=`echo $nom_file | cut -c 7-8`
     hour=`echo $nom_file | cut -c 9-14`

     # Extract region, thanks fedorqui       
     awk -F, '$2>=-3 && $2<=3 && $3>=35 && $3<=42' $nom_file > output-$year$month$day$hour.txt

     # Part 1 of the RAMS data line
     var1=`echo $nom_file | awk '{print substr($0,1,4) " " substr($0,5,2) " " substr($0,7,2) " " substr($0,9,6)}'`

     # station_id, latitude, longitude, elevation and temperature for each point
     m=1
     nstation=`awk 'END{print NR}' output-$year$month$day$hour.txt`

     while [ $m -le $nstation ]  # Data extraction loop
     do
        station_id=$m
        elevation=1.5   

       # Part 2 of the RAMS data line
         var2=`awk -v i=$m -v e=$elevation 'BEGIN { FS = ","} NR==i { print "'${station_id}' " $3 " " $2 " '${elevation}' 000 " $1 " 000" }' output-$year$month$day$hour.txt`

       # Join the two parts to build the data line
        paste <(echo "$var1") <(echo "$var2") -d ' ' >> out.txt

        m=$(( $m + 1 ))

     done # End of data extraction loop

     # Save the output with the RAMS format and name
     cat cabecera-dp-s.txt out.txt > dp-s$year-$month-$day-$hour

     n=$(( $n + 1 ))

     rm out.txt
   done  # End of hour loop

   # Delete data to avoid conflicts with lista_horas, lista_datos
   rm *txt

done            # End of day loop
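The inner `while` loop in this script still starts one awk and one paste per output row, which may account for much of the remaining run time. One possible next step, sketched here under assumptions and not taken from the original answer, is to let the awk pass that filters the region also number and format the rows. The helper name `format_region` is hypothetical; the output field order copies the `print` statement of the script, and the trailing carriage return is stripped inside awk in place of `dos2unix`.

```shell
#!/bin/bash
# Sketch: filter, number and format the rows of one data file in a single awk pass.
# format_region is a hypothetical helper, not part of the original script.
format_region() {
  local nom_file=$1
  local name=${nom_file##*/}     # file name without any leading directory
  # Date parts via bash substring expansion instead of echo | cut
  local year=${name:0:4} month=${name:4:2} day=${name:6:2} hour=${name:8:6}

  awk -F, -v stamp="$year $month $day $hour" -v elevation=1.5 '
    { sub(/\r$/, "") }           # drop a trailing carriage return, if any
    $2 >= -3 && $2 <= 3 && $3 >= 35 && $3 <= 42 {
      id++                       # station_id counts only the rows that match
      print stamp, id, $3, $2, elevation, "000", $1, "000"
    }' "$nom_file"
}

# Possible use inside the hour loop, assuming the header file is present:
#   { cat cabecera-dp-s.txt; format_region "$nom_file"; } > dp-s$year-$month-$day-$hour
```

With this approach each input file is read once, and no intermediate `output-*.txt` or `out.txt` file is needed.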

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/31858859
