This is version 2, with some improvements made, as time allowed, from the suggestions on version 1.
Question 1: In this script, on Kubuntu 22.04.1, how do you set the Terminal scroll-back from 1000 lines to 999000 lines?
Question 2: What feature would you like to see added to this script, count_words.sh, to analyze words?
A script to count word occurrences in file.txt.
About line 202, state the location of file.txt. About line 210, state your keywords.
The script shows the ratio of man to person, and of other words, in government documents, legal documents, and the Bible. Put the data into file.txt and run count_words.sh.
Suppose that person is a tricky word.
See Black's dictionary and the Bible.
Sample text to paste into file.txt:
Consider the RULES OF CIVIL PROCEDURE (RoCP) for a first test ( https://www.ontario.ca/laws/regulation/900194/v87 ).
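The kind of keyword comparison the script automates can be sketched in a few lines of shell. This is a minimal illustration only: the sample sentence and counts below are made up, not taken from the RoCP.

```shell
# Split a sample text into one word per line, then count exact matches.
text='the man and the person and the man'
man=$(printf '%s\n' "$text" | tr ' ' '\n' | grep -cx 'man')       # lines exactly equal to "man"
person=$(printf '%s\n' "$text" | tr ' ' '\n' | grep -cx 'person') # lines exactly equal to "person"
echo "$man man : $person person"   # → 2 man : 1 person
```

grep -x matches a whole line, so "man" does not also count "many" or "demand".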
样本RoCP输出:
fuzzy match *man*
58 manner
40 management
15 mandatory
12 claimant
6 claimants
6 performance
4 demand
3 claimants,
3 (demandeur)
3 (mandatory
3 manner,
2 claimant.
1 claimant,
1 claimants.
1 claimants:
1 demand)
1 demand,
1 demands
1 management,
1 manager
1 manager.
1 managerial
1 managing
1 (mandat
1 manner.
1 manner;
1 many
1 many,
1 non-performance
1 nonperformance
1 performance.
1 performance;
32 versions, fuzzy match *man*
fuzzy match *person*
668 person
122 persons
108 personal
40 person,
33 personally
13 person.
11 person;
4 (person
3 in-person
3 persons,
3 persons.
2 personally,
2 persons:
1 (personne)
1 persons;
15 versions, fuzzy match *person*
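The gap between a fuzzy total like the one above and an exact count comes down to grep flags. A minimal sketch on a made-up sentence (not RoCP data):

```shell
sample='A person. Many persons act personally.'
# -o prints every substring match, so derived words are counted too
fuzzy=$(printf '%s\n' "$sample" | grep -io 'person' | wc -l)
# -w restricts matches to whole words; punctuation still counts as a boundary
exact=$(printf '%s\n' "$sample" | grep -iow 'person' | wc -l)
echo "fuzzy: $fuzzy, exact: $exact"   # → fuzzy: 3, exact: 1
```

Here the fuzzy count includes persons and personally; the exact count matches only person.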
Navigation hints: Ctrl-Shift-Home = Top Ctrl-Shift-UParrow Ctrl-Shift-F = Find
Source content. : Government of Ontario SEARCH SEARCH LAWS Search contact us français Topics + ...
Compare keywords, exact match, not a fuzzy match :
_______ spirit
69 being
_______ soul
_______ woman
_______ man
668 person

Script:
#!/bin/sh
# time bash /home/x/Music/count_words.sh # copy and paste into Terminal
# Script to count word occurrences in file.txt
# script compares keywords
# example 6 keywords below were:
# spirit being soul woman man person
# Why use this script?
# Use this script to show the ratio, proportion, among keywords.
# Why use this script? example 1
# Use script to show the ratio of keyword
# man to person
# in Government documents
# in Legal documents.
#
# For example in the
# Legal RULES OF CIVIL PROCEDURE at
# https://www.ontario.ca/laws/regulation/900194/v87
# As of 3-September-2022 the ratio was
# _______ man
# 668 person
# the word man is not used in the RULES OF CIVIL PROCEDURE, but,
# the word person is used 668 times.
# With this script we now have the fact, a ratio, of
# 0 man to
# 668 person.
#
# Going forward, Black's dictionary may be useful to
# research what these words mean.
# to clarify person RoCP fuzzy search output:
# 668 person
# 122 persons
# 108 personal
# 40 person,
# 33 personally
# 13 person.
# 11 person;
# 4 (person
# 3 in-person
# 3 persons,
# 3 persons.
# 2 personally,
# 2 persons:
# 1 (personne)
# 1 persons;
#----------------------------------------------------
# 1014 Total, RoCP fuzzy search
#
# 1014 can be confirmed, double-checked, with a one-line script:
#
# grep -io 'person' /tmp/file.txt | wc -l
#
# aa = bookmark
# Why use this script? example 2
# With Bible as source text, compare words used, like:
# spirit being soul women man person
#
# Discover how Bible frowns upon the word person .
# Download Bible
# https://www.gutenberg.org/ebooks/10
# Script shows:
# Compare keywords, exact match, not a fuzzy match :
# 342 spirit
# 287 being
# 290 soul
# 239 woman
# 1978 man
# 35 person
#
# In addition, try doing Bible manual searches for phrases like:
# living soul King James Version compared to a
# quickening spirit King James Version at:
# https://www.biblegateway.com/quicksearch/?quicksearch=living+soul&version=KJV
#
# Genesis 2:7 KJV
# And the Lord God formed man of the dust of the ground, and
# breathed into his nostrils the breath of life; and
# man became a living soul.
#
# 1 Corinthians 15:45 KJV
# And so it is written,
# The first man Adam was made a living soul;
# the last Adam was made a quickening spirit.
#
# or manual search for phrases like:
# living being New International Version
# life-giving spirit New International Version
# One example, How to run script?
# 1. set Terminal scrollback to greater than 1000 Lines, say 999000 Lines.
# 2. Put your text into /home/x/Music/file.txt
# 3. Run script /home/x/Music/count_words.sh
# x = whoami ~ Bob ~ user etc...
#
# time bash /home/x/Music/count_words.sh # copy and paste into Terminal
# Put text into a file called file.txt example, Go to web page:
# RULES OF CIVIL PROCEDURE
# https://www.ontario.ca/laws/regulation/900194/v87
#
# or
#
# Courts of Justice Act
# https://www.ontario.ca/laws/statute/90c43
#
# or
#
# any Act / document in your Jurisdiction
#
# and do:
# Ctrl-a = highlight all of web page
# Ctrl-v = paste all of web page into a file called file.txt
# file Location = /home/x/Music/file.txt
# source file = src1 = /home/x/Music/file.txt
# run
# time bash /home/x/Music/count_words.sh # copy and paste into Terminal
# aa = bookmark
# How to _?
# How to remove all comment lines starting with hash # ?
# grep -v '^#' count_words.sh > count_words2.sh
# but the shebang #!/bin/sh is also removed
# How to keep the shebang #!/bin/sh and
# delete lines starting with comment # ?
# sed -i '/^\s*\(#[^!].*\|#$\)/d' count_words.sh
# How to remove empty lines and comments, including the shebang?
# sed -E '/(^$|^#)/d' count_words.sh > count_words3.sh
# How to show only comments ?
# grep '^\#.*' count_words.sh
# How to show only comments and translate # to space ?
# grep '^\#.*' count_words.sh |tr '#' ' ' > count_words_comments1.sh # Get comments
# almost ready for spell-check comments
# How to make this script spell-check ready?
# Show only comments, tr # to space, remove http Lines plus more.
# grep '^\#.*' count_words.sh |tr '#' ' ' |sed '/http/d' |sed '/grep/d' |sed '/sed -i/d' |sed '/sed -E/d' |sed '/time bash/d' |tr '_______' ' ' > count_words_comments2.sh # No http sed -i sed -E time bash _______
# aa = bookmark
# Question 1
# In this script for Kubuntu 22.04.1,
# how to set the Terminal scroll-back from
# 1000 Lines to
# 999000 Lines?
# Question 2
# What feature would you like to see added to this script?
# added to script count_words.sh to analyze words
# Declare source file,
# change path and filename to fit your needs.
src1='/home/x/Music/file.txt'
# Declare keywords
# Change keywords to fit your needs.
# After displaying List then compare frequency of
# keywords below # 1 2 3 4 5 6 :
keyword1='spirit' #
keyword2='being' #
keyword3='soul' #
keyword4='woman' #
keyword5='man' # only man not demand not many ...
keyword6='person' # only person not personal not in-person ...
# Example with respect to RULES OF CIVIL PROCEDURE RoCP :
# _______ spirit
# 69 being
# _______ soul
# _______ woman
# _______ man
# 668 person
# set -e # future Scripting
# set -u # future Scripting
# clear screen because displaying over 7000 lines from last time this script ran.
clear
# copy file.txt to /tmp
cp "$src1" /tmp/file.txt || exit
cp "$src1" /tmp/file1.txt || exit
echo
blue=$(tput setaf 4)
normal=$(tput sgr0)
printf "%80s\n" "${blue}Script to count word occurrences in file.txt .${normal}"
echo
echo "${blue}About Line 202 state location of file.txt .${normal}"
echo "${blue}About Line 210 state your keywords.${normal}"
echo
echo "${blue}Script shows ratio of man and person and other words${normal}"
echo "${blue}in Government documents and ${normal}"
echo "${blue}in Legal documents and ${normal}"
echo "${blue}in Bible. ${normal}"
echo "${blue}Put data into file.txt . ${normal}"
echo "${blue}Run count_words.sh ${normal}"
echo
echo "Check IF files Exist:"
# test, is file.txt a file in correct location?
if test ! -f "$src1"
then
echo "Error: $src1 not found" >&2
exit 1
else
echo "Ok found: $src1 "
fi
# test for /tmp/file.txt using /tmp as a RAM scratchpad,
# because files cleared on Restart and less drive wear.
if test ! -f "/tmp/file.txt"
then
echo "Error: /tmp/file.txt not found" >&2
exit 1
else
echo "Ok found: /tmp/file.txt using /tmp as a RAM scratchpad"
fi
# test for /tmp/file1.txt using /tmp as a RAM scratchpad,
# because files cleared on Restart and less drive wear.
if test ! -f "/tmp/file1.txt"
then
echo "Error: /tmp/file1.txt not found" >&2
exit 1
else
echo "Ok found: /tmp/file1.txt using /tmp as a RAM scratchpad"
fi
echo
echo
# Print the first "n" characters of /tmp/file.txt .
# Why? Show file contents being used for this word analysis.
# Why? To remind us as to what document is being analyzed.
# Capture the first 2222 characters of file with sed
# translate squeeze spaces, in case start pages are full of spaces
# translate newline to space
# translate tab to space remove Junk characters for display
# Print the first 397 characters as a reminder
# make file to be used later in script /tmp/print_first_n_characters.txt
# https://www.baeldung.com/linux/display-first-n-characters-of-file
echo "Source content. :" |tr -s '\n' ' '
sed -z 's/^\(.\{2222\}\).*/\1/' /tmp/file.txt \
|tr -s ' ' '\n' \
|tr '\n' ' ' \
|tr '\t' ' ' \
|sed -z 's/^\(.\{397\}\).*/\1/' \
|tee /tmp/print_first_n_characters.txt
echo " ..."
echo
echo
# align output of a basic count
# printf "%6s" "" ; ShellCheck said if 1 variable string then 1 argument to pass
echo "Count:"
printf "%6s" ""
wc /tmp/file1.txt
# basic count plus verbiage for /tmp/file1.txt
wc /tmp/file1.txt \
|awk '{print "Lines: " $1 "\tWords: " $2 "\tCharacters: " $3 }' \
|tr -s '\n' ' '
echo ". Longest Line:" \
|tr -s '\n' ' '
wc -L /tmp/file1.txt
# clean up #1, remove unseen characters
sed "s/\r.*\r/ /g" /tmp/file1.txt \
|tr -cd '\11\12\40-\176' > /tmp/file2.txt
# basic count plus verbiage for /tmp/file2.txt
wc /tmp/file2.txt \
|awk '{print "Lines: " $1 "\tWords: " $2 "\tCharacters: " $3 }' \
|tr -s '\n' ' '
echo ". Longest Line:" \
|tr -s '\n' ' '
wc -L /tmp/file2.txt
# clean up #2
# squeeze space
# convert space to new line
# convert AZ to az : all Lower case...
cat < /tmp/file2.txt \
|tr -s " " \
|tr '[:space:]' '\n' \
|tr '[:upper:]' '[:lower:]' \
|sort \
|uniq -c \
|sort -k1,1nr > /tmp/file3.txt
# basic count plus verbiage for /tmp/file3.txt, show progress made
wc /tmp/file3.txt \
|awk '{print "Lines: " $1 "\tWords: " $2 "\tCharacters: " $3 }' \
|tr -s '\n' ' '
echo ". Longest Line:" \
|tr -s '\n' ' '
wc -L /tmp/file3.txt
echo
echo
# aa = bookmark
echo "Show filtering characters from file ..."
echo "Count characters, Show character set used in file, Show filename:"
echo
# character counts are with respect to RULES OF CIVIL PROCEDURE RoCP in file.txt
# 133 characters used in RoCP /tmp/file1.txt
echo "$(od -c /tmp/file1.txt \
|grep -oP "^\d+ +\K.*" \
|tr -s ' ' '\n' \
|LC_ALL=C sort -u \
|tr -d '\n')" \
|wc -c \
|tr -s '\n' ' ' \
|tr '\n' ' '
echo "$(od -c /tmp/file1.txt \
|grep -oP "^\d+ +\K.*" \
|tr -s ' ' '\n' \
|LC_ALL=C sort -u \
|tr -d '\n')" \
|tr -s '\n' ' '
echo "/tmp/file1.txt"
# 79 characters used in RoCP /tmp/file2.txt
echo "$(od -c /tmp/file2.txt \
|grep -oP "^\d+ +\K.*" \
|tr -s ' ' '\n' \
|LC_ALL=C sort -u \
|tr -d '\n')" \
|wc -c \
|tr -s '\n' ' ' \
|tr '\n' ' '
echo "$(od -c /tmp/file2.txt \
|grep -oP "^\d+ +\K.*" \
|tr -s ' ' '\n' \
|LC_ALL=C sort -u \
|tr -d '\n')" \
|tr -s '\n' ' '
echo "/tmp/file2.txt"
# 51 characters used in RoCP /tmp/file3.txt, shows progress of filters
echo "$(od -c /tmp/file3.txt \
|grep -oP "^\d+ +\K.*" \
|tr -s ' ' '\n' \
|LC_ALL=C sort -u \
|tr -d '\n')" \
|wc -c \
|tr -s '\n' ' ' \
|tr '\n' ' '
echo "$(od -c /tmp/file3.txt \
|grep -oP "^\d+ +\K.*" \
|tr -s ' ' '\n' \
|LC_ALL=C sort -u \
|tr -d '\n')" \
|tr -s '\n' ' '
echo "/tmp/file3.txt"
echo
echo
# aa = bookmark
# List title
printf "%40s\n" "${blue}Sort and count number of word occurrences${normal}"
printf "%40s\n" "${blue}List /tmp/file3.txt :${normal}"
# output example: person personal persons in-person personally ...
# result #1, List, count word occurrences
# cat -A /tmp/file3.txt # show all during testing
cat /tmp/file3.txt
wc -l /tmp/file3.txt |tr -s '\n' ' '
echo "Lines for count word occurrences"
echo
echo "${blue}-------------------------------------------------------------------------${normal}"
echo
# result #2, compare frequency of keywords 1 2 3 4 5 6 fuzzy match
printf "%40s\n" "${blue}compare keywords, fuzzy match${normal}"
echo
echo
echo " fuzzy match *$keyword1*"
grep -i "$keyword1" /tmp/file3.txt
grep -c "$keyword1" /tmp/file3.txt |tr -s '\n' ' '
echo "versions, fuzzy match *$keyword1*"
echo
echo
echo
echo " fuzzy match *$keyword2*"
grep -i "$keyword2" /tmp/file3.txt
grep -c "$keyword2" /tmp/file3.txt |tr -s '\n' ' '
echo "versions, fuzzy match *$keyword2*"
echo
echo
echo
echo " fuzzy match *$keyword3*"
grep -i "$keyword3" /tmp/file3.txt
grep -c "$keyword3" /tmp/file3.txt |tr -s '\n' ' '
echo "versions, fuzzy match *$keyword3*"
echo
echo
echo
echo " fuzzy match *$keyword4*"
grep -i "$keyword4" /tmp/file3.txt
grep -c "$keyword4" /tmp/file3.txt |tr -s '\n' ' '
echo "versions, fuzzy match *$keyword4*"
echo
echo
echo
echo " fuzzy match *$keyword5*"
grep -i "$keyword5" /tmp/file3.txt
grep -c "$keyword5" /tmp/file3.txt |tr -s '\n' ' '
echo "versions, fuzzy match *$keyword5*"
echo
echo
echo
echo " fuzzy match *$keyword6*"
grep -i "$keyword6" /tmp/file3.txt
grep -c "$keyword6" /tmp/file3.txt |tr -s '\n' ' '
echo "versions, fuzzy match *$keyword6*"
echo
echo
# test file had output of over 7000 Lines,
# here is a reminder to navigate those 7000 + Lines
echo "Navigation hints: Ctrl-Shift-Home = Top Ctrl-Shift-UParrow Ctrl-Shift-F = Find"
# Print the first "n" characters of file.txt .
# Print the first 77 characters.
# Why? Show file contents being used for this word analysis.
# Why? To remind us as to what document is being analyzed.
# https://www.baeldung.com/linux/display-first-n-characters-of-file
echo "Source content. :" |tr -s '\n' ' '
sed -z 's/^\(.\{77\}\).*/\1/' /tmp/print_first_n_characters.txt \
|tr -s ' ' '\n' \
|tr '\n' ' '
echo " ..."
echo
echo
# result #3, compare frequency of keywords 1 2 3 4 5 6, exact matches
printf "%9s\n" "${blue}Compare keywords, exact match, not a fuzzy match :${normal}"
# grep < file3 > /dev/null && echo "grep result exist" || echo "grep result absent, not exist"
grep -E "(^| )$keyword1( |$)" < /tmp/file3.txt > /dev/null \
&& grep -E "(^| )$keyword1( |$)" /tmp/file3.txt \
|| echo "_______ $keyword1"
grep -E "(^| )$keyword2( |$)" < /tmp/file3.txt > /dev/null \
&& grep -E "(^| )$keyword2( |$)" /tmp/file3.txt \
|| echo "_______ $keyword2"
grep -E "(^| )$keyword3( |$)" < /tmp/file3.txt > /dev/null \
&& grep -E "(^| )$keyword3( |$)" /tmp/file3.txt \
|| echo "_______ $keyword3"
grep -E "(^| )$keyword4( |$)" < /tmp/file3.txt > /dev/null \
&& grep -E "(^| )$keyword4( |$)" /tmp/file3.txt \
|| echo "_______ $keyword4"
grep -E "(^| )$keyword5( |$)" < /tmp/file3.txt > /dev/null \
&& grep -E "(^| )$keyword5( |$)" /tmp/file3.txt \
|| echo "_______ $keyword5"
grep -E "(^| )$keyword6( |$)" < /tmp/file3.txt > /dev/null \
&& grep -E "(^| )$keyword6( |$)" /tmp/file3.txt \
|| echo "_______ $keyword6"
echo
exit
#
#
#
#
#
#
#
# Copyright September 2022
# count_words.sh version 2a
#
# This script was tested on:
# 1. Kubuntu 22.04.1
#
# This script was tested on shellcheck:
# 2. https://www.shellcheck.net/
# found errors of SC2005 (style): Useless echo?
# unsure how to fix above errors
#
# This script was tested with:
# 3. various Acts and Bibles
#
# script posted to code review
# https://codereview.stackexchange.com/
#
# This program is free software:
# you can redistribute it and/or modify
# it under the terms of the GNU General Public License as
# published by the Free Software Foundation,
# either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY;
# without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the GNU General Public License for more details.
#
# You should have received a copy of the
# GNU General Public License along with this program.
# If not, see
# http://www.gnu.org/licenses/
# Software Disclaimer
# There are inherent dangers in the use of any software available for download on the Internet, and we caution you to make sure that you completely understand the potential risks before downloading any of the software.
#
# The Software and code samples available on this website are provided "as is" without warranty of any kind, either express or implied. Use at your own risk.
#
# The use of the software and scripts downloaded on this site is done at your own discretion and risk and with agreement that you will be solely responsible for any damage to your computer system or loss of data that results from such activities. You are solely responsible for adequate protection and backup of the data and equipment used in connection with any of the software, and we will not be liable for any damages that you may suffer in connection with using, modifying or distributing any of this software. No advice or information, whether oral or written, obtained by you from us or from this website shall create any warranty for the software.
#
#
# We make no warranty that the software will meet your requirements.
#
# We make no warranty that the software will be uninterrupted, timely, secure or error-free.
#
# We make no warranty that the results that may be obtained from the use of the software will be effective, accurate or reliable.
#
# We make no warranty that the quality of the software will meet your expectations.
#
# We make no warranty that any errors in the software obtained from us will be corrected.
#
# The software, code sample and their documentation made available on this website:
# could include technical or other mistakes, inaccuracies or typographical errors. We may make changes to the software or documentation made available on its web site at any time without prior-notice.
# may be out of date, and we make no commitment to update such materials.
# We assume no responsibility for errors or omissions in the software or documentation available from its web site.
#
# In no event shall we be liable to you or any third parties for any special, punitive, incidental, indirect or consequential damages of any kind, or any damages whatsoever, including, without limitation, those resulting from loss of use, data or profits, and on any theory of liability, arising out of or in connection with the use of this software.
#
#
# Use this script at your own risk.
# This script might cause your computer to melt-down plus 200 mega bytes.
# i have no understanding of what is really happening behind the scenes.
# We have no understanding of what is really happening behind the scenes.
# i have no understanding of the actual process behind each command.
# We have no understanding of the actual process behind each command.
# i have no understanding of the Kernel.
# We have no understanding of the Kernel.
# Yes, there are observed effects.
# But, the paper map is not the territory.
# Use this script at your own risk.
#
#
#
# version 1a
# count_words.sh
# posted 1-September-2022
# https://codereview.stackexchange.com/questions/279347/bash-script-to-count-word-occurrences-in-file-txt
# Thank you janos
# Thank you Toby Speight
#
#
# version 2a
# count_words.sh
# posted 3-September-2022
#
#
Posted on 2022-09-05 19:49:17
Much of the feedback on the previous iteration of this code also applies here:
This coding pattern appears several times:
if test ! -f "$src1"
then
echo "Error: $src1 not found" >&2
exit 1
else
echo "Ok found: $src1 "
fi
Use a function to avoid the repeated work, for example:
fatal() {
echo "Error: $*" >&2
exit 1
}
validate_file_exists() {
local path
path=$1
if test ! -f "$path"; then
fatal "file does not exist: $path"
else
echo "Ok, file exists: $path"
fi
}
validate_file_exists "$src1"
validate_file_exists "/tmp/file.txt"
validate_file_exists "/tmp/file1.txt"

This is especially important for the complex, repeated code handling /tmp/file1.txt, /tmp/file2.txt, and /tmp/file3.txt. The code I gave you in my answer to your previous question is ready to use, and I strongly encourage you to use it.
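The same advice applies to the wc/awk report that the script repeats three times. A sketch of one way to factor it out (the name report_counts is my own choice, and wc -L is a GNU extension):

```shell
report_counts() {
    local path
    path=$1
    # same pipeline the script repeats for file1/file2/file3
    wc "$path" \
        | awk '{print "Lines: " $1 "\tWords: " $2 "\tCharacters: " $3}' \
        | tr -s '\n' ' '
    echo ". Longest Line:" | tr -s '\n' ' '
    wc -L "$path"
}
# demo on a throwaway file
printf 'one two\nthree\n' > /tmp/report_demo.txt
report_counts /tmp/report_demo.txt
rm -f /tmp/report_demo.txt
```

In the script itself you would call report_counts /tmp/file1.txt and so on, once per file.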
One last example:
print_fuzzy_match() {
local keyword
keyword=$1
echo " fuzzy match *$keyword*"
grep -i "$keyword" /tmp/file3.txt
grep -c "$keyword" /tmp/file3.txt |tr -s '\n' ' '
echo "versions, fuzzy match *$keyword*"
echo
echo
echo
}
print_fuzzy_match "$keyword1"
print_fuzzy_match "$keyword2"
# ...

instead of many lines like this:
echo "${blue}Script shows ratio of man and person and other words${normal}"
echo "${blue}in Government documents and ${normal}"
echo "${blue}in Legal documents and ${normal}"
echo "${blue}in Bible. ${normal}"
echo "${blue}Put data into file.txt . ${normal}"
echo "${blue}Run count_words.sh ${normal}"
A here-document is usually easier to work with:
cat << EOF
${blue}Script shows ratio of man and person and other words${normal}
${blue}in Government documents and ${normal}
${blue}in Legal documents and ${normal}
${blue}in Bible. ${normal}
${blue}Put data into file.txt . ${normal}
${blue}Run count_words.sh ${normal}
EOF

https://codereview.stackexchange.com/questions/279432
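One caveat worth knowing about here-documents: the delimiter controls whether variables expand. The example above uses an unquoted EOF precisely so that ${blue} and ${normal} expand. A small sketch of the difference:

```shell
greeting='hello'
# Unquoted delimiter: variables inside the body expand.
cat << EOF
expanded: $greeting
EOF
# Quoted delimiter: the body is taken literally.
cat << 'EOF'
literal: $greeting
EOF
```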