manager.List

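I'm trying to use multiprocessing to speed up a parser in a Python AWS Lambda function.

It works fine locally, but when I try it on Lambda I get the error below. Why does this error occur, and how can I fix it?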

{
    "errorMessage": "[Errno 24] Too many open files",
    "errorType": "OSError",
    "stackTrace": [
        "  File \"/var/task/lambda_function.py\", line 64, in lambda_handler\n    p.start()\n",
        "  File \"/var/lang/lib/python3.7/multiprocessing/process.py\", line 112, in start\n    self._popen = self._Popen(self)\n",
        "  File \"/var/lang/lib/python3.7/multiprocessing/context.py\", line 223, in _Popen\n    return _default_context.get_context().Process._Popen(process_obj)\n",
        line 277, in _Popen\n    return Popen(process_obj)\n",
        "  File \"/var/lang/lib/python3.7/multiprocessing/popen_fork.py\", line 20, in __init__\n    self._launch(process_obj)\n",
        line 69, in _launch\n    parent_r, child_w = os.pipe()\n"
    ]
}

My code is as follows:

import multiprocessing
from multiprocessing import Manager

import requests


def parse(item, L):
    r = requests.get(
        item[0],
        headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'},
        cookies={'name': 'Parser', 'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'},
    )
    if r.status_code == 200:
        if item[1] not in L:
            L.append([r.text, r.status_code, item[1]])
    else:
        L.append([None, item[1]])


def lambda_handler(event, context):
    # Make DB Connection
    # (cursor, sqlSelectVars and sqlUpdateRawHtml are set up here; omitted from the post)

    with Manager() as manager:
        L = manager.list()  # <-- can be shared between processes.
        processes = []

        # One process per row, all started before any of them is joined
        for item in sqlSelectVars:
            p = multiprocessing.Process(target=parse, args=(item, L))
            processes.append(p)
            p.start()

        for process in processes:
            process.join()

        # Commit my parsed values to the DB with PyMySQL
        try:
            cursor.executemany(sqlUpdateRawHtml, list(L))  # copy rows out of the manager proxy
            cursor.connection.commit()
        # ... (the rest of the handler is cut off in the original post)

Answer from sddpy:

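Going only by your description, I think the cause is that you are creating too many processes. Every p.start() opens a pipe in the parent (the os.pipe() call at the bottom of the traceback), and the parent keeps descriptors for each child open until that Process object is closed. Because every process is started before any of them is joined, these descriptors accumulate with each row until the per-process limit is reached.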
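To see this descriptor growth directly, the sketch below starts several processes the same way the question's loop does and counts the entries in /proc/self/fd before starting, while the children are still unjoined, and after join() plus close(). It assumes Linux (for /proc) and uses the "fork" start method shown in the traceback; idle_worker and measure_fd_growth are hypothetical names used only for illustration.

```python
import multiprocessing
import os
import time


def open_fd_count():
    """Number of file descriptors currently open in this process (Linux only)."""
    return len(os.listdir("/proc/self/fd"))


def idle_worker():
    # Hypothetical stand-in for parse(): just stays alive briefly.
    time.sleep(0.1)


def measure_fd_growth(n):
    """Start n processes without joining them first, as the question's loop does.

    Returns the fd counts (before, while unjoined, after join + close).
    """
    ctx = multiprocessing.get_context("fork")  # same start method as in the traceback
    before = open_fd_count()
    procs = [ctx.Process(target=idle_worker) for _ in range(n)]
    for p in procs:
        p.start()  # with "fork", each start() calls os.pipe() in the parent
    during = open_fd_count()
    for p in procs:
        p.join()
        p.close()  # releases the descriptors the parent held for this child
    after = open_fd_count()
    return before, during, after


if __name__ == "__main__":
    print(measure_fd_growth(20))
```

The middle count grows by at least one descriptor per started child and falls back only after the children are closed. With one process per database row, a large enough sqlSelectVars is therefore enough to reach the limit.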

The AWS Lambda Limits documentation lists:

File descriptors: 1,024
Execution processes/threads: 1,024
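To compare these quotas with what a running function actually sees, the standard resource module can report the soft RLIMIT_NOFILE limit, and /proc/self/fd shows how many descriptors are currently open. This is a minimal sketch that assumes a Linux runtime. Whether the soft limit reported inside Lambda equals exactly 1,024 is an assumption worth checking in your own environment; fd_headroom is a hypothetical helper name.

```python
import os
import resource


def fd_headroom():
    """Return (open descriptors, soft RLIMIT_NOFILE) for the current process.

    Linux-only because of /proc. Compare the soft limit with the 1,024
    quota from the Lambda limits page (assumed to match; verify it yourself).
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    in_use = len(os.listdir("/proc/self/fd"))
    return in_use, soft


if __name__ == "__main__":
    used, limit = fd_headroom()
    print(f"{used} of {limit} file descriptors in use")
```

Logging this at the start of the handler and again after the start() loop shows how quickly the loop uses up the available headroom.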

So this part, which starts one process per row of sqlSelectVars with no upper bound, needs to change:

        for item in sqlSelectVars:
            p = multiprocessing.Process(target=parse, args=(item, L))

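On the other hand, you want to "use multiprocessing to speed up the parser". If parse only does computation, then running more processes than there are CPU cores will not speed it up, because the extra processes cannot run in parallel. Creating them only adds overhead.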
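Combining both points, one possible change to the loop is to run the rows in fixed-size batches. It starts at most batch_size processes, joins and closes them, then moves on to the next batch, so descriptors are released before new ones are opened. The sketch below is one way to do this and is not the answerer's code. It keeps the question's Process + Manager().list() pattern because multiprocessing.Pool is reported not to work on Lambda (no /dev/shm). run_in_batches and parse_stub are hypothetical names. parse_stub stands in for parse() and makes no network call. Defaulting batch_size to os.cpu_count() follows the CPU-bound reasoning above.

```python
import multiprocessing
import os


def parse_stub(item, results):
    # Hypothetical stand-in for the question's parse(): no HTTP request,
    # it just appends one row per item to the shared list.
    results.append([item[0].upper(), 200, item[1]])


def run_in_batches(items, worker, batch_size=None):
    """Call worker(item, shared_list) for every item, keeping at most
    batch_size child processes alive at any one time."""
    if batch_size is None:
        batch_size = os.cpu_count() or 1  # more processes than cores only adds overhead
    ctx = multiprocessing.get_context("fork")  # the start method seen in the traceback
    with ctx.Manager() as manager:
        results = manager.list()
        for start in range(0, len(items), batch_size):
            batch = [ctx.Process(target=worker, args=(item, results))
                     for item in items[start:start + batch_size]]
            for p in batch:
                p.start()
            for p in batch:
                p.join()
                p.close()  # free this child's descriptors before the next batch starts
        return list(results)  # copy out of the proxy before the manager shuts down


if __name__ == "__main__":
    rows = run_in_batches([("a", 1), ("b", 2), ("c", 3)], parse_stub, batch_size=2)
    print(sorted(rows, key=lambda r: r[2]))
```

The batching trades some concurrency for a hard cap on open descriptors. The rows come back in completion order, so sort them if the database update depends on order.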

Original link: https://www.f2er.com/3152455.html
