How do I wrap text into paragraphs of x characters without importing any modules?

2024-05-21 • Q&A

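I have a list of words (lowercase) parsed from an article. I joined them with .join() into one long string, with a space between each word. Punctuation is treated as a word (i.e., it has a space before and after it).

I want to write this string to a file with at most X characters per line (90 in this case), without breaking any words. No line may start or end with a space.

As part of the assignment I am not allowed to import modules; from what I understand, textwrap would have helped.

I basically have a while loop nested in a for loop that goes through every 90 characters of the string and first checks whether the character is not a space (i.e., in the middle of a word). The while loop then iterates through the string until it reaches the next space (i.e., it keeps the word on the same line). I then check whether this line, minus the leading and trailing whitespace, is longer than 90 characters, and if it is, a while loop iterates backwards to the character before the word that extends past 90 characters.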

x = 0
for i in range(89,len(text),90):
    while text[i] != " ":
        i += 1
    if len(text[x:i].strip()) > 90:
        while text[i - 1] != " ":
            i = i - 1
    file.write("".join(text[x:i]).strip() + "\n")
    x = i

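After comparing against a file with the correct output, this code works for about 90% of the file. Occasionally, some lines exceed 90 characters without the extra word being wrapped onto the next line.

Example:

Actual output on one line (93 characters):

extraordinary thing , but i never read a patent medicine advertisement without being impelled

Expected output, with "impelled" on a new line (84 characters + 8 characters):

extraordinary thing , but i never read a patent medicine advertisement without being\nimpelled

Is there a better way to do this? Any suggestions would be appreciated.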

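While building each line of output, consider using a "buffer" to hold the data. As you read each new word, check whether adding it to the buffer would exceed the line length. If it would, print the buffer, then reset the buffer so that it starts with the word that did not fit.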

data = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis a risus nisi. Nunc arcu sapien, ornare sit amet pretium id, faucibus et ante. Curabitur cursus iaculis nunc id convallis. Mauris at enim finibus, fermentum est non, fringilla orci. Proin nibh orci, tincidunt sed dolor eget, iaculis sodales justo. Fusce ultrices volutpat sapien, in tincidunt arcu. Vivamus at tincidunt tortor. Sed non cursus turpis. Sed tempor neque ligula, in elementum magna vehicula in. Duis ultricies elementum pellentesque. Pellentesque pharetra nec lorem at finibus. Pellentesque sodales ligula sed quam iaculis semper. Proin vulputate, arcu et laoreet ultrices, orci lacus pellentesque justo, ut pretium arcu odio at tellus. Maecenas sit amet nisi vel elit sagittis tristique ac nec diam. Suspendisse non lacus purus. Sed vulputate finibus facilisis."""
sentence_limit = 40
buffer = ""
for word in data.split():
    word_length = len(word)
    buffer_length = len(buffer)

    # a single word that cannot fit on any line is an error
    if word_length > sentence_limit:
        print(f"ERROR: the word '{word}' is longer than the sentence limit of {sentence_limit}")
        break

    # "<" rather than "<=" leaves room for the joining space
    if buffer_length + word_length < sentence_limit:
        if buffer:
            buffer += " "
        buffer += word
    else:
        # the word does not fit: flush the buffer and start a new line with it
        if buffer:  # avoids a blank line when the very first word fills a whole line
            print(buffer)
        buffer = word
print(buffer)

Output

Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Duis a risus nisi. Nunc
arcu sapien, ornare sit amet pretium id,
faucibus et ante. Curabitur cursus
iaculis nunc id convallis. Mauris at
enim finibus, fermentum est non,
fringilla orci. Proin nibh orci,
tincidunt sed dolor eget, iaculis
sodales justo. Fusce ultrices volutpat
sapien, in tincidunt arcu. Vivamus at
tincidunt tortor. Sed non cursus turpis.
Sed tempor neque ligula, in elementum
magna vehicula in. Duis ultricies
elementum pellentesque. Pellentesque
pharetra nec lorem at finibus.
Pellentesque sodales ligula sed quam
iaculis semper. Proin vulputate, arcu et
laoreet ultrices, orci lacus
pellentesque justo, ut pretium arcu odio
at tellus. Maecenas sit amet nisi vel
elit sagittis tristique ac nec diam.
Suspendisse non lacus purus. Sed
vulputate finibus facilisis.
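To apply the buffer idea to the question's own setup (a list of words, a 90-character limit, output written to a file), the loop can be packaged as a function. The following is only a minimal sketch, not the answerer's code: the names wrap_words and write_wrapped, the sample words, and the output path are assumptions made for illustration. No modules are imported.

```python
def wrap_words(words, limit):
    """Greedily pack words into lines of at most `limit` characters."""
    lines = []
    current = ""
    for word in words:
        if len(word) > limit:
            raise ValueError(f"word {word!r} is longer than the limit of {limit}")
        if not current:
            # first word on the line: no joining space needed
            current = word
        elif len(current) + 1 + len(word) <= limit:
            # the word fits together with one joining space
            current += " " + word
        else:
            # the word does not fit: close this line, start the next one
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def write_wrapped(words, limit, path):
    """Write the wrapped lines to `path` (hypothetical helper)."""
    with open(path, "w") as f:
        for line in wrap_words(words, limit):
            f.write(line + "\n")


# Sample words in the question's format (punctuation as its own word).
words = ("extraordinary thing , but i never read a patent medicine "
         "advertisement without being impelled").split()
lines = wrap_words(words, 90)
for line in lines:
    print(len(line), line)

# Usage with a file, assuming a writable path:
# write_wrapped(words, 90, "output.txt")
```

Because lines are built only from whole words joined by single spaces, no line can start or end with a space, which covers the second requirement in the question.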

Using a regular expression:

import re

with open('f0.txt', 'r') as f:
    # the file must be one long single line of text
    text = f.read().rstrip()
    for line in re.finditer(r'(.{1,70})(?:$|\s)', text):
        print(line.group(1))
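One behavior of this pattern is worth knowing before relying on it. The sketch below wraps the same pattern in a function so it can be tried on short strings; the name regex_wrap, the width parameter, and the sample strings are assumptions for illustration. It suggests that a word longer than the width is not reported as an error: the scan skips ahead until a match is possible, so the leading characters of that word are silently lost.

```python
import re


def regex_wrap(text, width):
    """Apply the pattern from the answer with a configurable width."""
    pattern = r'(.{1,%d})(?:$|\s)' % width
    return [m.group(1) for m in re.finditer(pattern, text)]


# Ordinary case: every word fits within the width.
normal = regex_wrap("one two three four five", 9)
print(normal)

# Edge case: a 12-character word with a width of 9.
# The first three characters of the long word do not appear in any line.
lossy = regex_wrap("a" * 12 + " b", 9)
print(lossy)
```

If the input may contain words longer than the limit, checking for them up front (as the buffer approach does) avoids losing text this way.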

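Another way, without a regular expression:

# Constant: maximum line width
J = 70

# output list
out = []

with open('f0.txt', 'r') as f:
    # assumes the file is one long line of text
    line = f.read().rstrip()

    i = 0

    while i + J < len(line):
        # last space within the next J characters (the end index is exclusive)
        idx = line.rfind(' ', i, i + J)
        if idx != -1:
            out.append(line[i:idx])
            i = idx + 1
        else:
            # no space found: hard-break the word and mark it with a hyphen
            out.append(line[i:i + J] + '-')
            i += J

    out.append(line[i:])  # get ending line portion

for line in out:
    print(line)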

Here is the file content (one long single string):

I have basically a while loop nested in a for loop that goes through every 90 characters of the string, and firstly checks if it is not a space (ie. in the middle of a word). The while loop would then iterate through the string until it reaches the next space (ie. incorporates the word unto the same line). I then check if this line, minus the leading and trailing whitespaces, is longer than 90 characters, and if it is, the while loop iterates backwards and reaches the character before the word that extends over 90 characters.

Output:

I have basically a while loop nested in a for loop that goes through
every 90 characters of the string, and firstly checks if it is not a
space (ie. in the middle of a word). The while loop would then
iterate through the string until it reaches the next space (ie.
incorporates the word unto the same line). I then check if this line,
minus the leading and trailing whitespaces, is longer than 90
characters, and if it is, the while loop iterates backwards and
reaches the character before the word that extends over 90 characters.
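Whichever approach is used, the requirements from the question (no line over the limit, no line starting or ending with a space, no words lost) can be checked mechanically instead of by eye. The following is a rough sketch of such a check; the function name check_wrapped and the wording of its messages are assumptions, and it uses no imports.

```python
def check_wrapped(lines, text, limit):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if len(line) > limit:
            problems.append(f"line {number} has {len(line)} characters")
        if line != line.strip():
            problems.append(f"line {number} starts or ends with whitespace")
    # Joining the lines again must give back exactly the original words.
    if " ".join(lines).split() != text.split():
        problems.append("the words differ from the original text")
    return problems


text = "one two three"
good = check_wrapped(["one two", "three"], text, 7)
too_long = check_wrapped(["one two three"], text, 7)
damaged = check_wrapped(["one ", "three"], text, 7)
print(good)
print(too_long)
print(damaged)
```

Running a check like this over the whole output would point directly at lines such as the 93-character one described in the question.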
