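I have a large laboratory database in which some IDs have several results. I also created a key variable built from initials + age + sex, to use for matching against other hospital records. However, I noticed that the same hospital ID sometimes appears with different initials. I would like to write a function to detect these inconsistencies.
Example data: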
# Every column has 7 values, matching the 7 rows printed below
df <- data.frame(
  ID = c("5606", "5606", "5728", "5824", "5824", "5824", "5824"),
  key2 = c("TN35M", "TN35M", "JJ26M", "CD47F", "CD47F", "DG44M", "DG44M"),
  date_sample = c("12/03/2012", "12/03/2012", "19/04/2012", "21/05/2012", "21/05/2012", "19/10/2012", "19/10/2012"),
  service = c("ORTHO", "ORTHO", "BLOC", "VISC", "VISC", "BLOC", "BLOC"),
  germe = c("Acinetobacter sp", "Burkholderia pseudomallei", "Stenotrophomonas maltophilia", "Staphylococcus haemolyticus", "Enterobacter cloacae", "Escherichia coli", "Pseudomonas aeruginosa")
)
ID key2 date_sample service germe
5606 TN35M 12/03/2012 ORTHO Acinetobacter sp
5606 TN35M 12/03/2012 ORTHO Burkholderia pseudomallei
5728 JJ26M 19/04/2012 BLOC Stenotrophomonas maltophilia
5824 CD47F 21/05/2012 VISC Staphylococcus haemolyticus
5824 CD47F 21/05/2012 VISC Enterobacter cloacae
5824 DG44M 19/10/2012 BLOC Escherichia coli
5824 DG44M 19/10/2012 BLOC Pseudomonas aeruginosa
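Each ID should have exactly one key2 value. How can I compare the key2 values across all rows that share the same ID, and create an output variable that flags every inconsistent row, so that I can make sure each ID is assigned to one patient only and not to more than one?
Something like this: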
ID key2 date_sample service germe incoherence
5606 TN35M 12/03/2012 ORTHO Acinetobacter sp N
5606 TN35M 12/03/2012 ORTHO Burkholderia pseudomallei N
5728 JJ26M 19/04/2012 BLOC Stenotrophomonas maltophilia N
5824 CD47F 21/05/2012 VISC Staphylococcus haemolyticus Y
5824 CD47F 21/05/2012 VISC Enterobacter cloacae Y
5824 DG44M 19/10/2012 BLOC Escherichia coli Y
5824 DG44M 19/10/2012 BLOC Pseudomonas aeruginosa Y
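For reference, here is a minimal base R sketch of one way the flag above might be computed. It assumes that "incoherent" simply means an ID is associated with more than one distinct key2 value; the helper name n_keys is made up for illustration, and the data frame is cut down to the two columns that matter.

```r
# Reduced copy of the example data: only ID and key2 are needed for the check
df <- data.frame(
  ID   = c("5606", "5606", "5728", "5824", "5824", "5824", "5824"),
  key2 = c("TN35M", "TN35M", "JJ26M", "CD47F", "CD47F", "DG44M", "DG44M")
)

# Turn key2 into integer codes so that ave() returns numbers, then count
# the distinct codes within each ID (n_keys is a hypothetical helper name)
key_codes <- as.integer(factor(df$key2))
n_keys <- ave(key_codes, df$ID, FUN = function(k) length(unique(k)))

# An ID with more than one distinct key2 is flagged on all of its rows
df$incoherence <- ifelse(n_keys > 1, "Y", "N")
print(df)
```

Because ave() returns one value per row, the flag lines up with the original rows without any merge step. This only catches IDs that map to several keys; the reverse case (one key2 shared by several IDs) would need the same count done the other way round.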