Parse, find chapters, and write each one to a separate file

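I'm having trouble getting the code right to parse the individual chapters out of this e-book and then write the 27 chapters to their own text files. The farthest I've gotten is printing "CHAPTER-1.txt". I don't want to hard-code anything, and I'm not sure where I've gone wrong.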

infile = open('dracula.txt','r')

readlines = infile.readlines()

toc_list = readlines[74:185]

toc_text_lines = []
for line in toc_list:
    # Skip lines that hold nothing but a newline
    if len(line) > 1:
        stripped_line = line.strip()
        toc_text_lines.append(stripped_line)

#print(len(toc_text_lines))

chaptitles = []
for text_lines in toc_text_lines:
    split_text_line = text_lines.split()
    # Table-of-contents entries end with a page number
    if split_text_line[-1].isdigit():
        chaptitles.append(text_lines)

#print(len(chaptitles))
print(chaptitles)

infile.close()

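import re

with open('dracula.txt') as f:
    book = f.readlines()

# Only the first three headings are listed here; the book has 27.
chapters_names_list = ['CHAPTER I.', 'CHAPTER II.', 'CHAPTER III.']

outfile = None
while book:
    line = book.pop(0)
    # Peek at the next line instead of popping it, so no line is lost.
    if line.strip() in chapters_names_list and book and book[0] == '\n':
        if outfile:
            outfile.close()
        outfile = open("{}.txt".format(line.strip().rstrip('.')), 'w')
    if outfile:
        outfile.write(line)

if outfile:
    outfile.close()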
wangleaishang3 answered: Parse, find chapters, and write each one to a separate file

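I think you could benefit from generators. If one of the e-books is too large to fit into memory, reading it all at once with readlines() will cause problems, whereas a generator hands you one line at a time.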
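As a minimal sketch of that idea: the hypothetical helper `read_lines` below (the name is mine, not part of any library) yields one line at a time from any file-like object, so only the current line is held in memory. `io.StringIO` stands in for an open file so the snippet runs on its own.

```python
import io

def read_lines(stream):
    """Yield one line at a time, without its trailing newline."""
    for line in stream:
        yield line.rstrip('\n')

# io.StringIO is a stand-in for open('dracula.txt') in this sketch
sample = io.StringIO("CHAPTER I\n\nsome text\n")
for line in read_lines(sample):
    print(repr(line))
```

Iterating over a real file object works the same way, because Python file objects are already lazy line iterators.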

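What you can do is build a small data-processing pipeline, keeping every function as generic as possible. First, look for the file (ebook.txt) in the file system. Once we have the filename, we open the file and yield one line at a time. Finally, we scan each line for "CHAPTER I.", "CHAPTER II.", and so on.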

import os
import re
import fnmatch

def find_files(pattern,path):
    """
    Here you can find all the filenames that match a specific pattern
    using shell wildcard pattern that way you avoid hardcoding
    the file pattern i.e 'dracula.txt'
    """
    for root,dirs,files in os.walk(path):
        for name in fnmatch.filter(files,pattern):
            yield os.path.join(root,name)

def file_opener(filenames):
    """
    Open a sequence of filenames one at a time
    and make sure to close each file once we are done
    scanning its content.
    """
    for filename in filenames:
        # Only text files are opened; other names are skipped
        if filename.endswith('.txt'):
            with open(filename,'rt') as f:
                yield f

def chain_generators(iterators):
    """
    Chain a sequence of iterators together
    """
    for it in iterators:
        # Look up yield from if you're unsure what it does
        yield from it

def grep(pattern,lines):
    """
    Look for a pattern in a line i.e 'CHAPTER I.'
    """
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

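# A simple way to use these functions together

filenames = find_files('dracula*','Path/to/files')
files = file_opener(filenames)
lines = chain_generators(files)
# Note: '.' in a regex matches any character, so this pattern also
# matches 'CHAPTER II', 'CHAPTER IV', etc. Use re.escape('CHAPTER I.')
# if you want the literal text only.
each_line = grep('CHAPTER I.',lines)
for match in each_line:
    print(match)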

You can build on these implementations to do what you want.
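For instance, here is a rough sketch of the splitting step itself. It assumes a chapter heading is a line that holds only "CHAPTER" plus a Roman numeral (optionally followed by a period), which may not match your edition of the text, so adjust the pattern as needed. The names `HEADING`, `split_chapters`, and `write_chapters` are hypothetical helpers, not part of the functions above.

```python
import re

# Assumption: a heading line looks like "CHAPTER I" or "CHAPTER II."
# Table-of-contents lines with a title and page number will not match.
HEADING = re.compile(r'^CHAPTER\s+([IVXLC]+)\.?\s*$')

def split_chapters(lines):
    """Yield (numeral, chapter_lines) for each chapter found in lines."""
    numeral, chunk = None, []
    for line in lines:
        match = HEADING.match(line)
        if match:
            # A new heading closes the chapter collected so far
            if numeral is not None:
                yield numeral, chunk
            numeral, chunk = match.group(1), [line]
        elif numeral is not None:
            chunk.append(line)
        # Lines before the first heading (front matter) are skipped
    if numeral is not None:
        yield numeral, chunk

def write_chapters(path, out_pattern='CHAPTER-{}.txt'):
    """Write each chapter of the book at path to its own numbered file."""
    written = []
    with open(path, 'rt') as book:
        for index, (numeral, chunk) in enumerate(split_chapters(book), start=1):
            out_name = out_pattern.format(index)
            with open(out_name, 'w') as out:
                out.writelines(chunk)
            written.append(out_name)
    return written
```

Calling `write_chapters('dracula.txt')` would then produce CHAPTER-1.txt, CHAPTER-2.txt, and so on, one file per heading found, with only one chapter held in memory at a time.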

Let me know if this helps.
