How do I extract sentences from a DataFrame in Python while keeping the paragraph key?

2024-05-16 • Q&A

I have a DataFrame that contains 1604 paragraphs.

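I want to extract all the sentences (even naively, by splitting on periods) and build a new DataFrame that has one sentence per row together with the values of the original columns, in particular the paragraph key (essentially the index shown in the first column on the left).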
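For illustration, the naive period-based split described above could look roughly like the following. This is only a sketch: the sample rows, the name `df_paragraphs`, and the helper `naive_split` are hypothetical, and the column names `chapter`, `text`, and `sentences` are assumed from the code in this question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real 1604-paragraph DataFrame.
df_paragraphs = pd.DataFrame({
    "chapter": [1, 1, 2],
    "text": [
        "First sentence. Second sentence.",
        "Only one here.",
        "Alpha. Beta. Gamma.",
    ],
})

def naive_split(paragraph):
    """Split on periods, strip whitespace, drop empty pieces,
    and re-attach a period to every piece that is kept."""
    return [part.strip() + "." for part in paragraph.split(".") if part.strip()]

# One list of sentences per paragraph row.
df_paragraphs["sentences"] = df_paragraphs["text"].apply(naive_split)
print(df_paragraphs[["chapter", "sentences"]])
```

Splitting on periods breaks on abbreviations and decimal numbers, so a proper sentence tokenizer would be a better fit if accuracy matters.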

I have made an attempt that gives me the chapter column for each sentence:

import pandas as pd

# Create lists to fill with values
l_col1 = []  # one entry per sentence
l_col2 = []  # chapter of the row the sentence came from

# Iterate over each row and fill the lists
for ix, row in dfAstroNova.iterrows():
    for value in row['sentences']:
        l_col1.append(value)
        l_col2.append(row['chapter'])

# Create a new DataFrame from the two lists
df = pd.DataFrame({'sentences': l_col1, 'chapter': l_col2})
df = df.rename(columns={'sentences': 'sents'})

This gives me a new DataFrame with one sentence per row (dfAstroNova is the name of the original DataFrame).

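As you can see, I have the chapter key. My question is how to add the paragraph key, that is, how to carry the text number of each row in the main DataFrame over to the new DataFrame.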

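So, how can I get another column that shows which paragraph of the original DataFrame each sentence belongs to or, better still, another column that contains the corresponding paragraph for each sentence?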

import pandas as pd

# Create lists to fill with values
l_col1 = []  # one entry per sentence
l_col2 = []  # chapter of the row the sentence came from
l_col3 = []  # full paragraph text of that row

# Iterate over each row and fill the lists
for ix, row in dfAstroNova.iterrows():
    for value in row['sentences']:
        l_col1.append(value)
        l_col2.append(row['chapter'])
        l_col3.append(row['text'])

# Create a new DataFrame from the three lists
df = pd.DataFrame({'sentences': l_col1, 'chapter': l_col2, 'paragraph': l_col3})
df = df.rename(columns={'sentences': 'sents'})
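The loop above carries the paragraph text along, but not the paragraph key itself. One possible way to keep both is `DataFrame.explode`, which repeats the original index for every sentence. The following is a minimal sketch, assuming pandas 0.25 or later and that the `sentences` column holds one list per row; the sample data and the names `df_astro`, `df_sents`, and `paragraph_id` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for dfAstroNova: one paragraph per row,
# with the sentences already split into a list.
df_astro = pd.DataFrame({
    "chapter": [1, 1, 2],
    "text": ["A. B.", "C.", "D. E."],
    "sentences": [["A.", "B."], ["C."], ["D.", "E."]],
})

df_sents = (
    df_astro
    .rename_axis("paragraph_id")   # name the index so it survives as a column
    .explode("sentences")          # one row per sentence, index is repeated
    .reset_index()                 # turn the repeated index into paragraph_id
    .rename(columns={"sentences": "sents", "text": "paragraph"})
    [["sents", "chapter", "paragraph_id", "paragraph"]]
)
print(df_sents)
```

With the original loop, the same key is available as `ix` from `iterrows()`, so appending `ix` to a further list alongside `row['text']` should give an equivalent column.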

