Merge two files based on a partial match

I have two files.

FileA.txt

ID
479432_Sros_4274
330214_NIDE2792
517722_CJLT1_010100003977
257310_BB0482
...

FileB.txt (the ** markers are only there to help you spot the matches)

members   category
6085.XP_002168109,**479432_Sros_4274**,4956.XP_002495993.1,457425.SSHG_03214,51511.ENSCSAVP000  P
7159.AAEL006372-PA,**257310_BB0482** J
**517722_CJLT1_010100003977**,701176.VIBRN418_17773,9785.ENSLAFP00000010769,28377.ENSACAP00000014901,4081.Solyc03g120250.2.1,3847.GLYMA18G02240.1 U
500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,**330214_NIDE2792**  S
...

Expected output

Output.txt

ID  category
479432_Sros_4274  P
330214_NIDE2792  S
517722_CJLT1_010100003977  U
257310_BB0482  J
...

Based on answers to other questions, I have tried some code in awk and R, but I could not get the desired output.

qwe541171815 answered: Merge two files based on a partial match

Here is one way to do it:

$ awk '
NR==FNR {                  # process file1
    if(FNR==1)             # print header, no newline
        printf "%s", $1    # explicit format, so the data is never used as a format string
    a[$1]                  # hash data
    next
}
{                          # process file2
    if(FNR==1)             # print the other half of the header
        print OFS $2
    for(i in a)            # loop over all items in the hash
        if($1 ~ i)         # check for a partial match (i is used as a regex)
            print i,$2     # if found, output
}' file1 file2             # mind the order

Output (in the order of file2; note the partial match on the last line of the output, left here as a warning):

ID category
479432_Sros_4274 P
257310_BB0482 J
517722_CJLT1_010100003977 U
330214_NIDE2792 S
ID S
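
The stray "ID S" line appears because the header ID from file1 is stored as a key like any other entry, and it then matches inside 330214_NIDE2792. Below is a minimal sketch of one way around that, assuming the first line of each file is a header and the members column contains no whitespace. It skips the header when filling the hash and uses index() for a literal substring test instead of ~, so that characters such as "." in a key are not read as regex wildcards. The function name merge_literal is made up for this example.

```shell
# Hypothetical helper: $1 is the ID list, $2 is the members/category table.
merge_literal() {
    awk '
    NR == FNR {                          # first file: the ID list
        if (FNR > 1 && $1 != "")         # skip the "ID" header and empty lines
            ids[$1]                      # remember each ID as a hash key
        next
    }
    FNR == 1 {                           # second file, header row
        print "ID", $NF                  # combined header: ID plus the category column name
        next
    }
    {
        for (id in ids)                  # try every stored ID against this row
            if (index($1, id))           # literal substring test on the members column
                print id, $NF            # matched: print the ID and its category
    }' "$1" "$2"
}
```

Called as merge_literal FileA.txt FileB.txt, this still prints rows in the order of the second file, but the header is no longer treated as an ID. It remains a substring test, so an ID that is a prefix of a longer ID would still match both.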

Could you please try the following.

awk '
BEGIN{
  print "ID  category"
}
FNR==NR{
  a[$0]
  next
}
{
  for(i in a){
    if(match($0,i)){
      print i,$NF
    }
  }
}
'  Input_filea   Input_fileb

Explanation: the same code as above, with comments added.

awk '                               ## Start the awk program.
BEGIN{                              ## BEGIN block, runs before any input is read.
  print "ID  category"              ## Print the header line.
}                                   ## End of the BEGIN block.
FNR==NR{                            ## True only while the first file (Input_filea) is being read.
  a[$0]                             ## Store the whole line ($0) as a key of array a.
  next                              ## Skip the remaining rules for this line.
}
{                                   ## Runs for every line of the second file (Input_fileb).
  for(i in a){                      ## Loop over every key stored in array a.
    if(match($0,i)){                ## match() treats i as a regex; true if it occurs anywhere in the line.
      print i,$NF                   ## Print the key and the last field (the category).
    }
  }
}
'  Input_filea   Input_fileb        ## Input files, in this order.
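
Both answers above print rows in the order of the second file, while the expected output in the question follows FileA. Here is a sketch of a different approach, not the one either answer uses. It assumes each member ID appears whole between commas, that the ** markers are not part of the real data, and that each file starts with a header line. It first builds a member-to-category lookup from the table, then walks the ID list and prints exact hits. The function name merge_in_id_order is invented for illustration.

```shell
# Hypothetical helper: $1 is the ID list, $2 is the members/category table.
merge_in_id_order() {
    awk '
    NR == FNR {                              # first file read: the members/category table
        if (FNR > 1) {                       # skip its header row
            n = split($1, member, ",")       # break the members column into single IDs
            for (k = 1; k <= n; k++)
                category[member[k]] = $NF    # map each member to the category of its row
        }
        next
    }
    FNR == 1 {                               # second file read: the ID list, header row
        print "ID", "category"
        next
    }
    $1 in category {                         # exact lookup, no scan over all keys
        print $1, category[$1]
    }' "$2" "$1"                             # the table is read first, then the ID list
}
```

Called as merge_in_id_order FileA.txt FileB.txt, this keeps the FileA order and needs one hash lookup per ID instead of a loop over every key for every row. IDs that do not appear in the table are dropped silently, and if a member occurs in several rows the last row wins.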
