How to scrape mobile phone numbers using BeautifulSoup

2024-05-02 • Q&A

I only want to scrape mobile numbers in the following format:

+1 NXX-NXX-XXXX

N = digits 2–9, X = digits 0–9

+1 is the country code that includes the US; there are 17 other countries, e.g., Canada and the Caribbean islands.

Suppose we need to find all numbers whose first NXX is 986, 965, and so on (we have a set of such prefixes).

Here is my code for getting emails:

    email = soup(text=re.compile(r'[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*'))

    _emailtokens = str(email).replace("\\t","").replace("\\n","").split(' ')

    if len(_emailtokens):
        print([match.group(0) for token in _emailtokens for match in [re.search(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)",str(token.strip()))] if match])

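But I need to change it to get mobile numbers.

Assuming you have already written a scraper that stores the number strings (mobile and non-mobile) in a list (in your case, you have most likely already split the text into tokens in your code), the following snippet (using regular expressions) may help you.

Code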

    import re

    # Format: NXX-NXX-XXXX
    # The first NXX must be 986 or 965
    # N = digits 2-9, X = digits 0-9

    # The lookahead checks the overall NXX-NXX-XXXX shape up to the end of the string;
    # the named groups then capture the number according to its prefix.
    pattern = r'(?=[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}$)((?P<hello>986.+)|(?P<world>965.+))'

    # Note: give your groups (986 and 965) sensible names; hello and world are for demonstration only

    sent = ['986-233-8901', '965-345-8745', '123-456-7890', '986-134-5987', '1234', '$5@67^73']
    # Matched, Matched, None, None, None, None

    regexp = re.compile(pattern)

    # The match results
    result = [regexp.match(item) for item in sent]
    # Change to regexp.search() if needed

    # One way to retrieve the numbers with prefix 986 (group hello);
    # entries that matched through the 965 group yield None
    hello_group = [item.group('hello') for item in result if item is not None]

Output

    print(result)
    # [<re.Match object; span=(0, 12), match='986-233-8901'>, <re.Match object; span=(0, 12), match='965-345-8745'>, None, None, None, None]

    print(hello_group)
    # ['986-233-8901', None]
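
If the prefixes come from a longer list, or the numbers sit inside free text rather than pre-split tokens, a variation like the following might be easier to maintain. This is only a sketch under a few assumptions: `PREFIXES`, `build_pattern`, and `find_numbers` are hypothetical names, the `+1` country code is treated as optional, and the input is a plain string (for example, the text returned by `soup.get_text()`).

```python
import re

# Hypothetical prefix set; replace with your own list of leading NXX codes.
PREFIXES = ["986", "965"]


def build_pattern(prefixes):
    """Compile a regex for '[+1 ]NXX-NXX-XXXX' whose first NXX is one of prefixes."""
    for p in prefixes:
        # Each prefix must itself follow the NXX rule (N = 2-9, X = 0-9).
        if not re.fullmatch(r"[2-9][0-9]{2}", p):
            raise ValueError(f"invalid NXX prefix: {p!r}")
    first = "|".join(re.escape(p) for p in prefixes)
    # (?<!\d) and (?!\d) stop a match from starting or ending inside a longer digit run.
    # Assumption: the '+1' country code is optional and may be followed by a space or hyphen.
    return re.compile(
        r"(?<!\d)(?:\+1[\s-]?)?(?P<prefix>" + first + r")-[2-9][0-9]{2}-[0-9]{4}(?!\d)"
    )


def find_numbers(text, prefixes=PREFIXES):
    """Return every matching number found anywhere in text, in order of appearance."""
    pattern = build_pattern(prefixes)
    return [m.group(0) for m in pattern.finditer(text)]


sample = (
    "Call +1 986-233-8901 or 965-345-8745; "
    "not 123-456-7890, 986-134-5987 or +1 986-233-89012."
)
print(find_numbers(sample))
# ['+1 986-233-8901', '965-345-8745']
```

Building the alternation from a list avoids one named group per prefix; the `prefix` group still tells you which code matched if you need to sort the results afterwards.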

uerk123 answered: How to scrape mobile phone numbers using BeautifulSoup
