How to restrict the grep command in UNIX to specific files and a specific directory in a shell script

2024-06-03 • Q&A

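I am trying to use the grep command to find a unique string in each line of a file that contains thousands of lines and sits in a specific directory, and to save the output as a single row in a new file (for example, output.txt). I then repeat this process for many more files, storing all of the output in the same file.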

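However, the grep command outputs far more lines than each file contains, and the output includes results for only one file rather than for hundreds of files. In addition, to avoid changing any files in the directory, I copied the directory to another location to protect the original. Below is the code I wrote (with comments) for your review and help: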

#!/bin/bash
IFS=$'\n' read -d '' -r -a genomes < list.txt
j=0
m=0
grep -o Eco.* ./${genomes[j]}_ani_output.txt | cut -c 59- | cut -d '/' -f 1 >> list1
IFS=$'\n' read -d '' -r -a genome_names < list1 
rm list1

while [ $m -eq 0 ] 
do
    for((i=0; i<=16547; i++))
    do
        grep -w ${genome_names[i]} ./${genomes[j]}_ani_output.txt >> list2 # grep command to find unique genome names in a specific order. This grep command in the for loop is the problem: it generates wrong output
    done

    if ((i=16547))
    then      
        cut -f 3 list2 >> list3
        rm list2                    

        IFS=$'\n' read -d '' -r -a values < list3 
        echo "${genomes[j]} ${values[*]}" >> table
        i=$((i-i))            
        j=$((j+1))              
        rm list3
        unset name 
    fi

    if ((j=1000))
    then
        m=$((m+1))
    fi
done
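
One detail in the script above may explain why only one file ends up in the output: inside (( )), a single = assigns rather than compares. So ((j=1000)) sets j to 1000 and evaluates as true, which makes m nonzero and ends the while loop after the first pass. The same applies to ((i=16547)). A minimal illustration follows; the variable names assign_result and compare_result are throwaway names that are not part of the original script.

```shell
#!/bin/bash
# Illustration only: (( j = 1000 )) assigns, (( j == 1000 )) compares.

j=1
if (( j = 1000 )); then
    # The assignment yields 1000, which is nonzero, so this branch always runs.
    assign_result="true, j is now $j"
fi

j=1
if (( j == 1000 )); then
    compare_result="true"
else
    # The comparison leaves j untouched and is false here.
    compare_result="false, j is still $j"
fi

echo "assignment: $assign_result"
echo "comparison: $compare_result"
```

If this is indeed the cause, replacing = with == (or -eq inside [ ]) in both if tests should let the outer loop continue past the first file.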

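The first cut command is used to get the genome names from the second column, stored in a file named list1, as shown in the attached image. The second cut is used to get the value for each genome. The number 16547 is the total number of lines in each file.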
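
The excess output lines could come from the inner grep itself. The pattern is unquoted and treated as a regular expression, so a name may match more than one line. The loop also runs from 0 to 16547, which is 16548 indices; any index past the end of the array expands to nothing, and grep then takes the file name as its pattern. The following is a rough sketch of one way to restructure the loop, not a definitive fix. It assumes tab-separated files with the value in the third column and the genome name as a path component of the second column. The sample files, the /db/ path layout, and the NA placeholder are invented for illustration. mapfile needs bash 4 or later, and -m is a GNU/BSD grep option rather than POSIX.

```shell
#!/bin/bash
# Sketch only: the sample data and path layout below are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tab-separated input: query, reference path, value.
printf 'q\t/db/Eco_1/g.fna\t99.1\nq\t/db/Eco_10/g.fna\t97.5\n' > A_ani_output.txt
printf 'q\t/db/Eco_10/g.fna\t96.0\nq\t/db/Eco_1/g.fna\t98.2\n' > B_ani_output.txt
printf 'A\nB\n' > list.txt

# Read one genome per line into an array (bash 4+).
mapfile -t genomes < list.txt

# Take the name order from the first file; here the name is the third
# '/'-separated component of column 2 (an assumption about the layout).
mapfile -t genome_names < <(cut -f 2 "${genomes[0]}_ani_output.txt" | cut -d '/' -f 3)

: > table
for genome in "${genomes[@]}"; do
    values=()
    # Iterate over the actual array elements instead of a fixed index range.
    for name in "${genome_names[@]}"; do
        # -F: literal string, -w: whole word, -m 1: stop at the first match.
        value=$(grep -F -w -m 1 -- "$name" "${genome}_ani_output.txt" | cut -f 3)
        # Keep columns aligned when a name is missing from a file.
        values+=("${value:-NA}")
    done
    echo "$genome ${values[*]}" >> table
done

cat table
```

With the sample data, table holds one row per genome with the values in the name order taken from the first file. Because every file is opened by its explicit path and nothing is written to the input files, the same loop can run against the original directory without copying it first.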

[Image: content of the file with unique genome names in the second column, retrieved using the first cut command]


