How to iterate over a DataFrame using a publish date column to build a daily mapping table

2024-04-28 • Q&A

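I am using Python 3.7 for this task. I have a DataFrame that stores a blog ID, blog name, and publish date. I need to convert it into a new DataFrame that maps each day and URL to an ID. The mapping needs to run through the previous day's date (20191103 at the time of writing). Assumptions: an article cannot be republished twice within a single day, and no dates are needed before a blog's first publish date.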

Example:

import pandas as pd

data = [[1234, 'Blog1', 20191030], [1235, 'Blog1', 20191101], [1237, 'Blog1', 20191102], [1236, 'Blog2', 20191101]]
df = pd.DataFrame(data, columns=['ID', 'Blog Name', 'Publish Date'])
df.head()

Starting DataFrame:

     ID  Blog Name  Publish Date
0   1234    Blog1   20191030
1   1235    Blog1   20191101
2   1237    Blog1   20191102
3   1236    Blog2   20191101

Goal (final DataFrame):

   Blog Name  Date    ID
0   Blog1   20191030 1234
1   Blog1   20191031 1234
2   Blog1   20191101 1235
3   Blog1   20191102 1237
4   Blog1   20191103 1237
5   Blog2   20191101 1236
6   Blog2   20191102 1236
7   Blog2   20191103 1236

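I am mainly unsure how best to iterate over the DataFrame. Should I create a "next publish date" column in the original DataFrame, and then in the new DataFrame create one row for each date between "publish date" and "next publish date"?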

Solution (provided by Code Different):

# Publish Date is stored as an integer such as 20191030; convert it to Timestamp
df['Publish Date'] = pd.to_datetime(df['Publish Date'], format='%Y%m%d')

def summarize(g):
    # A daily date range from the blog's first Publish Date to the current day.
    # To stop at the previous day, subtract pd.Timedelta(days=1) from the end.
    d = pd.date_range(g['Publish Date'].min(), pd.Timestamp.now(), name='Publish Date').to_frame(index=False)

    # merge_asof matches each day to the most recent post on or before it,
    # which fills any gap between publish dates
    return pd.merge_asof(d, g, on='Publish Date')


df.sort_values(['Blog Name', 'Publish Date']) \
    .groupby('Blog Name').apply(summarize) \
    .reset_index(drop=True)


Output:

  Publish Date    ID Blog Name
0   2019-10-30  1234     Blog1
1   2019-10-31  1234     Blog1
2   2019-11-01  1235     Blog1
3   2019-11-02  1237     Blog1
4   2019-11-03  1237     Blog1
5   2019-11-04  1237     Blog1
6   2019-11-01  1236     Blog2
7   2019-11-02  1236     Blog2
8   2019-11-03  1236     Blog2
9   2019-11-04  1236     Blog2
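
For comparison, the "next publish date" idea from the question can also be made to work. The following is only a rough sketch, not part of the answer above: the function name daily_mapping and its end_date parameter are made up for illustration, and it assumes pandas 0.25 or later (for DataFrame.explode). It takes an explicit last day so the table can stop at the previous day, and it returns the dates as YYYYMMDD integers in the column order of the goal table.

```python
import pandas as pd

data = [[1234, 'Blog1', 20191030], [1235, 'Blog1', 20191101],
        [1237, 'Blog1', 20191102], [1236, 'Blog2', 20191101]]
df = pd.DataFrame(data, columns=['ID', 'Blog Name', 'Publish Date'])


def daily_mapping(df, end_date):
    """Hypothetical helper: one row per blog per day, up to and including end_date."""
    end = pd.Timestamp(end_date)
    out = df.copy()
    out['Publish Date'] = pd.to_datetime(out['Publish Date'].astype(str), format='%Y%m%d')

    # Ignore posts published after the last requested day
    out = out[out['Publish Date'] <= end].sort_values(['Blog Name', 'Publish Date'])

    # Next publish date within the same blog (NaT for the blog's latest post)
    next_pub = out.groupby('Blog Name')['Publish Date'].shift(-1)

    # Each ID stays current until the day before the next post, or until end_date
    last_day = (next_pub - pd.Timedelta(days=1)).fillna(end)

    # One list of days per post, then one row per day
    out['Date'] = [list(pd.date_range(start, stop))
                   for start, stop in zip(out['Publish Date'], last_day)]
    out = out.explode('Date')

    # Back to YYYYMMDD integers, as in the goal table
    out['Date'] = pd.to_datetime(out['Date']).dt.strftime('%Y%m%d').astype(int)
    return out[['Blog Name', 'Date', 'ID']].reset_index(drop=True)


result = daily_mapping(df, '2019-11-03')
print(result)
```

To always stop at the previous day rather than a fixed date, one option is to pass pd.Timestamp.now().normalize() - pd.Timedelta(days=1) as end_date.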


