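I want to fetch the RSS feed from http://www.reddit.com/new/.rss?sort=new and store it in a SQL table.
I was able to read the RSS feed into Python (code below).
I just don't know how to get from here to importing it into a SQL database.
I'm working in a Jupyter notebook and just need some help getting this project off the ground. I also want to make sure every row is DISTINCT, with no duplicates.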
``` python
import feedparser

a_reddit_rss_url = 'http://www.reddit.com/new/.rss?sort=new'
feed = feedparser.parse(a_reddit_rss_url)

# feedparser sets 'bozo' when the feed XML could not be parsed cleanly
if feed['bozo'] == 1:
    print("Error Reading/Parsing Feed XML Data")
else:
    for item in feed["items"]:
        print(item)
```
``` python
import feedparser
from bs4 import BeautifulSoup
from bs4.element import Comment


def tag_visible(element):
    # Skip text that lives inside non-visible tags, and skip HTML comments
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False
    return True


def text_from_html(body):
    # Strip the HTML markup and return only the visible text
    soup = BeautifulSoup(body, 'html.parser')
    texts = soup.find_all(string=True)
    visible_texts = filter(tag_visible, texts)
    return " ".join(t.strip() for t in visible_texts)


# Define URL of the RSS Feed I want
a_reddit_rss_url = 'http://www.reddit.com/new/.rss?sort=new'
feed = feedparser.parse(a_reddit_rss_url)

if feed['bozo'] == 1:
    print("Error Reading/Parsing Feed XML Data")
else:
    for item in feed["items"]:
        dttm = item["date"]
        title = item["title"]
        summary_text = text_from_html(item["summary"])
        link = item["link"]
        print("====================")
        print("Title: {} ({})\nTimestamp: {}".format(title, link, dttm))
        print("--------------------\nSummary:\n{}".format(summary_text))
```
I'd like the SQL table/database to have the date, title, summary, and link each in its own column.
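
For reference, here is a minimal sketch of what I think the storage step could look like. It assumes SQLite through Python's built-in `sqlite3` module, since I haven't settled on a particular SQL database. The table name `reddit_posts` and the helper names `create_table` and `save_entries` are made up for illustration. The idea is to make `link` the primary key and use `INSERT OR IGNORE`, so re-running the notebook should not create duplicate rows.

``` python
import sqlite3


def create_table(conn):
    # One column per field; 'link' is the primary key, so each post is stored once.
    # (Table and column names are hypothetical.)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS reddit_posts (
            link    TEXT PRIMARY KEY,
            dttm    TEXT,
            title   TEXT,
            summary TEXT
        )
        """
    )
    conn.commit()


def save_entries(conn, rows):
    # rows: iterable of dicts with keys 'link', 'dttm', 'title', 'summary'.
    # INSERT OR IGNORE skips any row whose link is already in the table.
    conn.executemany(
        "INSERT OR IGNORE INTO reddit_posts (link, dttm, title, summary) "
        "VALUES (:link, :dttm, :title, :summary)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    # Small offline demo with made-up rows and an in-memory database
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    sample = [
        {"link": "https://example.com/a", "dttm": "2020-01-01T00:00:00+00:00",
         "title": "First", "summary": "Example summary"},
        {"link": "https://example.com/a", "dttm": "2020-01-01T00:00:00+00:00",
         "title": "First", "summary": "Example summary"},
    ]
    save_entries(conn, sample)
    print(conn.execute("SELECT COUNT(*) FROM reddit_posts").fetchone()[0])
```

In the feed loop above, each item would become one dict, e.g. `{"link": link, "dttm": dttm, "title": title, "summary": summary_text}`, collected into a list and passed to `save_entries` with a connection such as `sqlite3.connect("reddit.db")` (file name also just an example).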