Grouping values in a list of strings using a counter in Python

I have this code:

paragraphs = ['The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets,ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.','PVC/PVDC blister pack','Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.','\n']

final_ref = [
    ['Blister','Foil','Aluminium'],
    ['Blister','Base Web','PVC/PVDC'],
    ['Bottle','Cylindrically shaped Bottles','Polyethylene'],
    ['Bottle','Screw Type Cap','Polypropylene'],
    # the remaining entries (ending in 'PVC', 'PVD/PVDC' and
    # 'Square Shaped Bottle','Polyethylene') are cut off in the source
]

colours = ['White','Yellow','Blue','Red','Green','Black','Brown','Silver','Purple','Navy blue','Gray','Orange','Maroon','pink','colourless','blue']

TEXT_WITHOUT_COLOUR = 'Stage {counter} : Package Description: {sen} Values: {values}'

TEXT_WITH_COLOUR = TEXT_WITHOUT_COLOUR + ' Colour: {colour}'

counter = 1
result = []


def is_missing(words,sen):
    for w in words:
        if w.lower() not in sen.lower():
            return True
    return False


for words in final_ref:
    for sen in paragraphs:
        if is_missing(words,sen):
            continue

        kwargs = {
            'counter': counter,'sen': sen,'values': str(words)
        }

        if words[0] == 'Bottle':
            for wd in colours:
                if wd.lower() in sen.lower():
                    kwargs['colour'] = wd
                    break
            text_const = TEXT_WITH_COLOUR
        else:
            text_const = TEXT_WITHOUT_COLOUR

        result.append(text_const.format(**kwargs).replace('\n','').replace('\t',''))
        counter += 1

print(result)

The output returned is:

["Stage 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister','Foil','Aluminium']","Stage 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Cylindrically shaped Bottles','Polyethylene'] Colour: White","Stage 3 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Screw Type Cap','Polypropylene'] Colour: White"]

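What I want to do is check the "Package Description" content and, where it is identical, put all the different "Values" under the same group number.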

So I would like the output in the following format:

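["Group 1: Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister','Foil','Aluminium']","Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Cylindrically shaped Bottles','Polyethylene']Colour: white","Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Screw Type Cap','Polypropylene']Colour: white"]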

Can anyone help me?

qqq691608414 answered: Grouping values in a list of strings using a counter in Python

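This solution relies mainly on regular expressions and a loop. The first regular expression finds the text between "Package Description:" and "Values:"; the second replaces "Stage" and its number with "Group" and the corresponding group number.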
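To see the two regular expressions in isolation, here is a minimal sketch on a shortened, made-up string (the `sample` text and the group number 2 are illustrative assumptions, not taken from the real data):

```python
import re

# Shortened, made-up entry in the same shape as the strings in `result`
sample = "Stage 3 : Package Description: Some text. Values: ['Bottle'] Colour: White"

# Non-greedy capture of everything between the two labels
description = re.search(r'Package Description:(.*?)Values:', sample).group(1)
print(repr(description))  # ' Some text. '

# Replace the "Stage <number>" prefix with a group label
relabelled = re.sub(r'Stage \d+', 'Group 2', sample)
print(relabelled)
```

The non-greedy `(.*?)` stops at the first "Values:" it meets, which is what keeps the captured description from running into the rest of the string.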

import re

unique_desc = []  # every unique description is stored here
output = []

for desc in result:

    # text between the two labels, with spaces removed before comparing
    compare = re.search(r'Package Description:(.*?)Values:',desc).group(1).replace(' ','')

    if compare in unique_desc:

        group = str(unique_desc.index(compare)+1)  # index starts at 0, groups at 1
        desc = re.sub(r'Stage \d+','Group '+group,desc)
        output.append(desc)

    else:

        unique_desc.append(compare)
        group = str(len(unique_desc))  # new group

        desc = re.sub(r'Stage \d+','Group '+group,desc)
        output.append(desc)

And the result, in the new list named output:

print(output)
["Group 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister','Foil','Aluminium']","Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Cylindrically shaped Bottles','Polyethylene'] Colour: White","Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Screw Type Cap','Polypropylene'] Colour: White"]
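As an alternative to post-processing the finished strings with regular expressions, the group number could be assigned while the strings are being built. The following is only a sketch of that idea, not the answerer's method: the `assign_groups` helper, the tuple layout, and the shortened descriptions are all assumptions made for illustration.

```python
def assign_groups(entries):
    """entries: iterable of (description, values, colour_or_None) tuples."""
    group_of = {}  # description -> group number, in first-seen order
    lines = []
    for description, values, colour in entries:
        # setdefault returns the stored number, or stores the next free one
        group = group_of.setdefault(description, len(group_of) + 1)
        line = f"Group {group} : Package Description: {description} Values: {values}"
        if colour is not None:
            line += f" Colour: {colour}"
        lines.append(line)
    return lines


# Made-up, shortened data for illustration only
entries = [
    ("Blisters are made.", ['Blister', 'Foil', 'Aluminium'], None),
    ("The tablets are filled.", ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], 'White'),
    ("The tablets are filled.", ['Bottle', 'Screw Type Cap', 'Polypropylene'], 'White'),
]

for line in assign_groups(entries):
    print(line)
```

Using a dictionary keyed by the description avoids both the regex search and the repeated `list.index` lookup, at the cost of changing the loop that produces `result`.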

To sort: since all three texts start with the group number, you can use sorted(). I put the number 5 in the second element of the output list to show you how it behaves:

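output = ["Group 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister','Foil','Aluminium']","Group 5 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Cylindrically shaped Bottles','Polyethylene'] Colour: White","Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Screw Type Cap','Polypropylene'] Colour: White"]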

output = sorted(output)

print(output)

It prints:

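["Group 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister','Foil','Aluminium']","Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Screw Type Cap','Polypropylene'] Colour: White","Group 5 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets,ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle','Cylindrically shaped Bottles','Polyethylene'] Colour: White"]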

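This works as long as you have no more than 9 groups. Keep in mind that when numbers are sorted as strings, they are compared character by character, so the result looks like this: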

string_numbers = ['1','2','3','4','5','10','12','21']

sort = sorted(string_numbers)

print(sort)
['1','10','12','2','21','3','4','5']

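If you think that may be your case, you should ask a new question about how to sort the list. It needs some extra work, because it is not as simple as calling sorted() on its own.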
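For more than 9 groups, one possible approach (a sketch, not part of the original answer) is to give sorted() a key function that pulls the group number out of each string and compares it as an integer. The `group_number` helper and the short `labels` list below are hypothetical names and data used only for illustration.

```python
import re


def group_number(text):
    """Return the integer after 'Group' at the start of text (0 if absent)."""
    match = re.match(r'Group (\d+)', text)
    return int(match.group(1)) if match else 0


# Made-up labels; the real strings would carry the full description
labels = ['Group 10 : c', 'Group 2 : b', 'Group 1 : a', 'Group 21 : d']

numeric_order = sorted(labels, key=group_number)
print(numeric_order)
# ['Group 1 : a', 'Group 2 : b', 'Group 10 : c', 'Group 21 : d']
```

Because sorted() is stable, entries that share a group number keep their original relative order.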
