Rename .doc or .docx files with Python based on text in the document

I need to rename .doc or .docx files based on a specific piece of text inside each document.

I already have this working for .txt files, using the following code:

import os
import re

pat = r"ID number(\d\d\d\d\d)"   # Text to find in the file; group 1 becomes the new name
ext = '.txt'                     # Type of file to process
mydir = ''                       # Path of the directory to process

for arch in os.listdir(mydir):
    if not arch.endswith(ext):   # Skip files of any other type
        continue
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    if s is None:                # No ID found, leave the file alone
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name + ext)
    if not os.path.exists(newpath):   # Never overwrite an existing file
        os.rename(archpath, newpath)

Does anyone have any ideas on how to do the same for .doc or .docx files?

zhaojiyong answered: Rename .doc or .docx files with Python based on text in the document

You will need python-docx to read the text of a .docx file:

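import os
import re
from docx import Document

# pat and mydir are defined as in the question
for arch in os.listdir(mydir):
    if not arch.endswith('.docx'):
        continue
    archpath = os.path.join(mydir, arch)
    document = Document(archpath)
    s = None
    for para in document.paragraphs:
        s = re.search(pat, para.text)
        if s is not None:        # Stop at the first paragraph that matches
            break
    if s is None:
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name + '.docx')
    if not os.path.exists(newpath):
        os.rename(archpath, newpath)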

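I found the answer. The problem was on my side: I was trying to search for a value, but I needed to specify a cell, because the value is in a table.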

Here is the result:

import os
import re
from docx import Document

pat = r"(\d+)"      # Pattern for the value that becomes the new file name
ext = '.docx'       # Type of file to process
mydir = ''          # Path of the directory to process

for arch in os.listdir(mydir):
    if not arch.endswith(ext):
        continue
    archpath = os.path.join(mydir, arch)
    document = Document(archpath)
    table = document.tables[0]                   # The value is in the first table
    s = re.search(pat, table.cell(1, 2).text)    # Row 1, column 2 (zero-based)
    if s is None:
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name + ext)
    if not os.path.exists(newpath):              # Never overwrite an existing file
        os.rename(archpath, newpath)
        print(newpath)
input("Press Enter to exit")
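
The script above reads one fixed cell, table.cell(1,2). If the position of the value is not known in advance, one possible variation is to scan every paragraph and every table cell until the pattern matches. This is only a sketch under assumptions: iter_docx_text and first_match are hypothetical helper names (not part of python-docx), python-docx is assumed to be installed, and nested tables are not visited.

```python
import re


def iter_docx_text(path):
    """Yield the text of every paragraph, then of every table cell, in a .docx file."""
    from docx import Document  # python-docx, imported only when a file is actually read

    document = Document(path)
    for para in document.paragraphs:
        yield para.text
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                yield cell.text


def first_match(texts, pattern):
    """Return group 1 of the first text that matches pattern, or None if none match."""
    for text in texts:
        match = re.search(pattern, text)
        if match is not None:
            return match.group(1)
    return None
```

Inside the loop, the fixed-cell lookup could then be replaced with something like name = first_match(iter_docx_text(archpath), pat), skipping the file when the result is None.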

Keep in mind that this method only works for .docx files, the format used by Word 2007 and later, because python-docx does not support earlier versions or .doc files.

So my next project is to implement a .doc to .docx converter.
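
One possible starting point for such a converter is to call LibreOffice in headless mode from Python. This is an assumption, since the post does not name a tool: it requires LibreOffice to be installed with its soffice executable on the PATH, and build_convert_command and convert_doc_to_docx are hypothetical names chosen for this sketch.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to .docx."""
    return [
        soffice,
        "--headless",              # Run without opening a window
        "--convert-to", "docx",    # Target format
        "--outdir", str(out_dir),  # Directory for the converted file
        str(doc_path),
    ]


def convert_doc_to_docx(doc_path, out_dir, soffice="soffice"):
    """Convert doc_path and return the path where the .docx is expected to appear."""
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

The converted files could then be passed to the renaming script above. Keeping the command construction in its own function makes it easy to check or adjust without running LibreOffice.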

Thanks to everyone for taking part.

Article link: https://www.f2er.com/2972108.html
