Need to optimize scraping code: selecting a URL with parameters

This is simple code that builds a URL from search parameters. It does work, but I think it needs optimizing.

def target_url(search_term,include_term,intext_term,target_site_in,page):

    # Replace spaces before building the templates; done afterwards, it has no effect on the URLs
    search_term = search_term.replace(' ','+')

    base_template_0 = f'https://www.google.com/search?q={search_term}+"{include_term}"+intext:{intext_term}+site:{target_site_in}&hl=en&rlz='
    base_template_1 = f'https://www.google.com/search?q={search_term}+"{include_term}"+intext:{intext_term}&hl=en&rlz='
    base_template_2 = f'https://www.google.com/search?q={search_term}+"{include_term}"&hl=en&rlz='
    base_template_3 = f'https://www.google.com/search?q={search_term}&hl=en&rlz='

    base_url_0 = base_template_0.format(search_term)
    base_url_1 = base_template_1.format(search_term)
    base_url_2 = base_template_2.format(search_term)
    base_url_3 = base_template_3.format(search_term)

    url_template_0 = base_url_0 + '&start={}'
    url_template_1 = base_url_1 + '&start={}'
    url_template_2 = base_url_2 + '&start={}'
    url_template_3 = base_url_3 + '&start={}'

    if page == 0 and search_term and include_term and intext_term and target_site_in:
        return base_url_0
    if page == 0 and search_term and include_term and intext_term:
        return base_url_1
    if page == 0 and search_term and include_term:
        return base_url_2
    if page == 0 and search_term:
        return base_url_3
    else:
        if search_term and include_term and intext_term and target_site_in:
            return url_template_0.format(page)
        if search_term and include_term and intext_term:
            return url_template_1.format(page)
        if search_term and include_term:
            return url_template_2.format(page)
        if search_term:
            return url_template_3.format(page)


It takes four search parameters: search_term, include_term, intext_term, and target_site_in. Depending on which of them are supplied, the URL is built differently.

Please suggest a better way to optimize this.

queencce answered:

Instead of keeping several template strings and choosing among them, you can write a function that builds the final search query for you:


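def get_search_query(search_term,include_term,intext_term,target_site_in):
  # Start from the plain search term, with spaces replaced by '+'
  response = search_term.replace(' ','+')
  # Append each optional part only when it was supplied
  if include_term:
    # Quoted, as in the original templates, so the term is matched exactly
    response = f'{response}+"{include_term}"'
  if intext_term:
    response = f"{response}+intext:{intext_term}"
  if target_site_in:
    response = f"{response}+site:{target_site_in}"
  return response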
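
The same idea can also be written by collecting the optional pieces in a list and joining them once at the end. This is only a sketch: the name `build_query` and the empty-string defaults are assumptions made for illustration, not part of the original answer.

```python
def build_query(search_term, include_term="", intext_term="", target_site_in=""):
    """Join the search term with whichever optional operators were supplied."""
    # The base term always comes first, with spaces replaced by '+'
    parts = [search_term.replace(" ", "+")]
    if include_term:
        parts.append(f'"{include_term}"')        # exact-match term in quotes
    if intext_term:
        parts.append(f"intext:{intext_term}")
    if target_site_in:
        parts.append(f"site:{target_site_in}")
    return "+".join(parts)


print(build_query("web scraping"))
# web+scraping
print(build_query("web scraping", "python", "requests", "example.com"))
# web+scraping+"python"+intext:requests+site:example.com
```

Adding another operator later means adding one more `if` block, rather than another template and another pair of branches.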

Now you can call it from your function:

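def target_url(search_term,include_term,intext_term,target_site_in,page):
  # Build the query once, passing all four search parameters
  query = get_search_query(search_term,include_term,intext_term,target_site_in)
  url = f'https://www.google.com/search?q={query}&hl=en&rlz='
  # Only add the paging parameter past the first page ('start', as in the original code)
  if page != 0:
    url = f"{url}&start={page}"
  return url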
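
If the search terms may contain characters such as `&` or `#`, hand-built strings can produce a broken URL. One possible way around that, shown here only as a sketch, is to let the standard library's `urllib.parse.urlencode` do the escaping. The function name `build_search_url` and its `start` parameter are assumptions for illustration; it takes the query as ordinary text with spaces, not the `+`-joined form used above.

```python
from urllib.parse import urlencode


def build_search_url(query, start=0):
    """Return a search URL with the query string percent-encoded."""
    # Same fixed parameters as the original templates (hl=en and an empty rlz)
    params = {"q": query, "hl": "en", "rlz": ""}
    if start:
        # Paging offset, only added past the first page
        params["start"] = start
    # urlencode uses quote_plus by default: spaces become '+',
    # and characters like '"' and ':' are percent-encoded
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("web scraping"))
# https://www.google.com/search?q=web+scraping&hl=en&rlz=
print(build_search_url('python scraping "requests" site:example.com', 10))
# https://www.google.com/search?q=python+scraping+%22requests%22+site%3Aexample.com&hl=en&rlz=&start=10
```

The percent-encoded quotes and colon are decoded again on the server side, so the query that is received is the same text that was passed in.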