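I want to edit a GTF file by deleting every line that matches the pattern 'FAT1' except the first one, and by modifying the coordinates (columns 3 and 4).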
f = lambda x: {k:v for k,v in x.items() if x.most_common(1)[0][1] == v}
df['Max'] = df['Col3'].apply(f)
print (df)
Col1 Col2 Col3 Max
0 123 [A,A,B,C] {'A': 2,'B': 2,'C': 1} {'A': 2,'B': 2}
1 456 [A,C,C] {'A': 1,'B': 1,'C': 2} {'C': 2}
2 789 [A,D,D] {'A': 3,'D': 2} {'A': 3}
Expected output
#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1 havana exon 137682 137965 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.16"; gene_source "havana";
1 havana gene 139790 140339 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.14"; gene_source "havana";
1 havana exon 140001 140101 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";
1 havana gene 143401 145401 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";
I tried something like this.
#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1 havana exon 137682 137965 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.16"; gene_source "havana";
1 havana gene 139790 140339 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.14"; gene_source "havana";
1 havana exon 147653 148000 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";
But I'm sure there must be a more sensible solution.
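For illustration, here is a minimal Python sketch of the task as described. It rests on several assumptions that the post does not confirm: the function name `keep_first_match` and its parameters are hypothetical, the new coordinates are supplied by the caller, header lines start with `#`, and the start and end coordinates are the fourth and fifth whitespace-separated fields, as in the sample records shown.

```python
def keep_first_match(lines, pattern, new_start, new_end):
    """Keep only the first record containing `pattern`, with new coordinates.

    Hypothetical helper: header lines and non-matching records pass through
    unchanged; later records containing `pattern` are dropped.
    """
    seen = False
    out = []
    for line in lines:
        # Header lines ("#!...") and non-matching records are kept as they are.
        if line.startswith("#") or pattern not in line:
            out.append(line)
            continue
        if seen:
            # Every matching record after the first one is dropped.
            continue
        seen = True
        # Reuse the separator found in the line (tab in a real GTF, else space).
        sep = "\t" if "\t" in line else " "
        newline = "\n" if line.endswith("\n") else ""
        # Split off the first five fields only, so the attribute string
        # (which contains spaces) stays intact as the sixth element.
        fields = line.rstrip("\n").split(maxsplit=5)
        # Assumption: start and end are the 4th and 5th fields (indices 3, 4).
        fields[3], fields[4] = str(new_start), str(new_end)
        out.append(sep.join(fields) + newline)
    return out


if __name__ == "__main__":
    sample = [
        "#!genome-build GRCh38.p7",
        '1 havana gene 139790 140339 gene_id "ENSG00000239906"; gene_name "RP11-34P13.14";',
        '1 havana exon 140001 140101 gene_id "ENSG00000269981"; gene_name "FAT1";',
        '1 havana gene 143401 145401 gene_id "ENSG00000269981"; gene_name "FAT1";',
    ]
    for record in keep_first_match(sample, "FAT1", 147653, 148000):
        print(record)
```

To process a file, the same function could be fed `open(path)` and its result written back with `writelines`. Note that the plain substring test would also match a name such as "FAT10"; matching on `gene_name "FAT1";` instead would be stricter.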