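I am new to NLP and am trying to solve a part-of-speech tagging problem. I have sentences with their POS tags, and I want to write each sentence as a single list of ('word', 'pos_tag') tuples.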
1. aImIroawi/ADJ TIOIna/N ::/PUN
2. qIdImi/PRE bIzuHI/ADJ oametatI/N "/PUN aImIroawi/ADJ sInIkIlIna/N bIganEnI/N
3. weyI/CON IkeyI/ADJ menafIsIti/N iyu/V_AUX zImexII/V_REL "/PUN zIbIlI/V_REL
4. gIguyI/ADJ ameleKaKIta/N neyIru/V_GER ::/PUN
This is the data I have. Here is the code I tried:
    # accumulators used by the loops below
    sentence, words, tags = [], [], []

    for line in corpus:
        lene = line.split()
        sentence.append(lene)
        #print(sentence)
        #print(lene)
        for word in lene:
            w, tag = word.split('/')
            words.append(w)
            tags.append(tag)
    #print(len(lene))

    for line in corpus:
        onesentence = line.split()
        print(len(onesentence))
        for elem in onesentence:
            for i in range(len(onesentence)):
                elem = words[i], tags[i]
                print(elem)
('qIdImi','PRE')
('bIzuHI','ADJ')
('oametatI','N')
('"','PUN')
('aImIroawi','ADJ')
('sInIkIlIna','N')
('bIganEnI','N')
('weyI','CON')
('IkeyI','ADJ')
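Above is the result I get, but I want all the words of each sentence, with their corresponding POS tags, grouped into one list per sentence inside an outer list, like this: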
[ [('aImIroawi','ADJ'),('TIOIna','N'),('::','PUN')],[('qIdImi','PRE'),('bIzuHI',('oametatI',.....]]
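One possible way to get that nested structure, as a minimal sketch rather than a definitive solution: build a fresh list of tuples for every line and append it to an outer list, instead of collecting all words and tags into two flat lists. The helper name `parse_tagged_lines` is hypothetical, and the sketch assumes `corpus` is an iterable of strings in which every token has the form `word/TAG` and the leading `1.`, `2.` list numbers are not part of the data.

```python
def parse_tagged_lines(lines):
    """Return one list of (word, tag) tuples per non-empty input line."""
    tagged_sentences = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Split on the LAST '/' only, so a word that itself contains '/'
        # still yields exactly one (word, tag) pair.
        tagged_sentences.append([tuple(token.rsplit('/', 1)) for token in tokens])
    return tagged_sentences


# Two sentences copied from the data above (assumed to be stored without numbering).
corpus = [
    'aImIroawi/ADJ TIOIna/N ::/PUN',
    'qIdImi/PRE bIzuHI/ADJ oametatI/N "/PUN aImIroawi/ADJ sInIkIlIna/N bIganEnI/N',
]

result = parse_tagged_lines(corpus)
print(result)
```

Because each sentence gets its own inner list, `result[0]` holds the tuples of the first sentence and `result[1]` those of the second. A token without any `/` would produce a one-element tuple here, so real data may need an extra check.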