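When building each output line, consider using a "buffer" to hold the data. As you read each new word, check whether adding it to the buffer would exceed the line length. If it would, print the buffer and then reset it, starting with the word that did not fit.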
data = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis a risus nisi. Nunc arcu sapien, ornare sit amet pretium id, faucibus et ante. Curabitur cursus iaculis nunc id convallis. Mauris at enim finibus, fermentum est non, fringilla orci. Proin nibh orci, tincidunt sed dolor eget, iaculis sodales justo. Fusce ultrices volutpat sapien, in tincidunt arcu. Vivamus at tincidunt tortor. Sed non cursus turpis. Sed tempor neque ligula, in elementum magna vehicula in. Duis ultricies elementum pellentesque. Pellentesque pharetra nec lorem at finibus. Pellentesque sodales ligula sed quam iaculis semper. Proin vulputate, arcu et laoreet ultrices, orci lacus pellentesque justo, ut pretium arcu odio at tellus. Maecenas sit amet nisi vel elit sagittis tristique ac nec diam. Suspendisse non lacus purus. Sed vulputate finibus facilisis."""
sentence_limit = 40
buffer = ""
for word in data.split():
    word_length = len(word)
    buffer_length = len(buffer)
    if word_length > sentence_limit:
        print(f"ERROR: the word '{word}' is longer than the sentence limit of {sentence_limit}")
        break
    # A separating space is only needed when the buffer already holds a word.
    separator_length = 1 if buffer else 0
    if buffer_length + separator_length + word_length <= sentence_limit:
        if buffer:
            buffer += " "
        buffer += word
    else:
        # The word does not fit: print the current line and start a new one with it.
        print(buffer)
        buffer = word
# Print whatever is left in the buffer after the last word.
print(buffer)
Output:
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Duis a risus nisi. Nunc
arcu sapien, ornare sit amet pretium id,
faucibus et ante. Curabitur cursus
iaculis nunc id convallis. Mauris at
enim finibus, fermentum est non,
fringilla orci. Proin nibh orci,
tincidunt sed dolor eget, iaculis
sodales justo. Fusce ultrices volutpat
sapien, in tincidunt arcu. Vivamus at
tincidunt tortor. Sed non cursus turpis.
Sed tempor neque ligula, in elementum
magna vehicula in. Duis ultricies
elementum pellentesque. Pellentesque
pharetra nec lorem at finibus.
Pellentesque sodales ligula sed quam
iaculis semper. Proin vulputate, arcu et
laoreet ultrices, orci lacus
pellentesque justo, ut pretium arcu odio
at tellus. Maecenas sit amet nisi vel
elit sagittis tristique ac nec diam.
Suspendisse non lacus purus. Sed
vulputate finibus facilisis.
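For comparison, Python's standard library has a `textwrap` module that performs the same kind of greedy word wrapping. The following is a minimal sketch, assuming the goal is simply lines of at most 40 characters; `sample` is a shortened stand-in for the text above, not the full data.

```python
import textwrap

# Shortened sample text (an assumption for illustration, not the full data).
sample = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Duis a risus nisi. Nunc arcu sapien, ornare sit amet pretium id, "
    "faucibus et ante."
)

# textwrap.wrap returns a list of lines, each at most `width` characters long.
lines = textwrap.wrap(sample, width=40)
for line in lines:
    print(line)
```

Unlike the loop above, `textwrap.wrap` by default splits a word that is longer than `width` instead of reporting an error. Passing `break_long_words=False` keeps such words whole.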
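Using a regular expression:
import re
with open('f0.txt', 'r') as f:
    # the file must contain one long single line of text
    text = f.read().rstrip()
# match up to 70 characters that are followed by whitespace or the end of the text
for line in re.finditer(r'(.{1,70})(?:$|\s)', text):
    print(line.group(1))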
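To try the pattern without a file, it can be wrapped in a small function. This is only a sketch: `wrap_regex` and its `width` parameter are illustrative names, not part of the answer above. It also exposes an edge case worth knowing about: a word longer than `width` can never match from its first character, so its leading characters are silently skipped.

```python
import re

def wrap_regex(text, width=70):
    """Split text into chunks of at most `width` characters, breaking at whitespace."""
    # Same pattern as above, with the width made configurable (hypothetical helper).
    pattern = re.compile(r'(.{1,%d})(?:$|\s)' % width)
    return [m.group(1) for m in pattern.finditer(text)]

print(wrap_regex("aaa bbb ccc", width=7))    # ['aaa bbb', 'ccc']
# The first five characters of the over-long word are lost:
print(wrap_regex("abcdefghij kl", width=5))  # ['fghij', 'kl']
```

If the input may contain words longer than the limit, check for them first, as the buffer-based version does.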
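Another approach, without regular expressions:
# Constant: maximum line width
J = 70
# output list
out = []
with open('f0.txt', 'r') as f:
    # assumes the file is one long line of text
    line = f.read().rstrip()
i = 0
while i + J < len(line):
    # find the last space inside the current window of J characters
    idx = line.rfind(' ', i, i + J)
    if idx != -1:
        out.append(line[i:idx])
        i = idx + 1
    else:
        # no space in the window: cut the word and mark the cut with a hyphen
        out.append(line[i:i + J] + '-')
        i += J
out.append(line[i:])  # get the ending line portion
for line in out:
    print(line)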
Here is the file content (one long single-line string):
I have basically a while loop nested in a for loop that goes through every 90 characters of the string, and firstly checks if it is not a space (ie. in the middle of a word). The while loop would then iterate through the string until it reaches the next space (ie. incorporates the word unto the same line). I then check if this line, minus the leading and trailing whitespaces, is longer than 90 characters, and if it is, the while loop iterates backwards and reaches the character before the word that extends over 90 characters.
Output (from the version without regular expressions):
I have basically a while loop nested in a for loop that goes through
every 90 characters of the string, and firstly checks if it is not a
space (ie. in the middle of a word). The while loop would then
iterate through the string until it reaches the next space (ie.
incorporates the word unto the same line). I then check if this line,
minus the leading and trailing whitespaces, is longer than 90
characters, and if it is, the while loop iterates backwards and
reaches the character before the word that extends over 90 characters.