How do I loop in Python when the number of iterations is not known in advance?

python pandas loops

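I have a large DataFrame containing chapter numbers, titles, subtitles, and text, all stored as strings. I want to extract specific pieces of text that sit between the titles and subtitles, keeping them in their original order, but the number of subtitles per chapter is not fixed, so I don't know the loop bounds in advance.

I can find the indices of all titles and subtitles, and I can locate and extract the specific text I need, but only when I type in each subtitle string by hand: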

import pandas as pd

# Example of the contents of the file
series = (["1.1.1.1", "lots of useless text", "more useless text", "I want this text", "1.1.1.2","I want this text","Not this text","1.1.1.3","1.1.2.1","some lines of text","1.2.1.1","Interesting text","1.2.1.2" ])

# These two operations are to get the same structure as I have in my imported file
df2 = pd.DataFrame(series)
df2 = df2.iloc[:,0]

# Start of finding the first chapter
title = 1
subtitle = 1

# Build the heading prefix and find the rows that contain it literally
# (regex=False, so "." is not treated as a regex wildcard)
string_title = "1." + str(title) + "." + str(subtitle)
process_loc = df2[df2.str.contains(string_title, na=False, regex=False)]
idx = process_loc.index

#Locate text I want
true_text   = df2.str[0] == "I"
# Locate text for the subtitle.
text_range  = df2.loc[idx[0]:idx[2]]
text_list   = text_range[true_text == True]

# Combine the headings and the wanted text into one DataFrame, in original order
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
new_df2 = pd.concat([process_loc, text_list]).sort_index().to_frame(name='Ordered')

The output I want:

1.1.1.1
I want this text
1.1.1.2
I want this text
1.1.1.3
1.1.2.1
1.2.1.1
Interesting text
1.2.1.2

Is it possible to do this in a loop, or do I have to look up every subtitle number manually?

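You can use str.match to find the rows that match your conditions, for example all rows that start with I, or rows where a digit is followed by a dot:

df2[df2.str.match(r'^I.*|^\d\..*')]

Output: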

0              1.1.1.1
3     I want this text
4              1.1.1.2
5     I want this text
7              1.1.1.3
8              1.1.2.1
10             1.2.1.1
11    Interesting text
12             1.2.1.2
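
For reference, the filter above can be written as a self-contained sketch. The names `lines`, `is_heading`, `is_wanted`, and `result` are illustrative and not part of the original answer, and the heading pattern (dot-separated numbers) is an assumption based on the sample data. It uses `str.fullmatch` for headings and `str.startswith` for the wanted text instead of one combined regex, which should select the same rows for this sample.

```python
import pandas as pd

# Sample data from the question
lines = pd.Series([
    "1.1.1.1", "lots of useless text", "more useless text", "I want this text",
    "1.1.1.2", "I want this text", "Not this text", "1.1.1.3", "1.1.2.1",
    "some lines of text", "1.2.1.1", "Interesting text", "1.2.1.2",
])

# Assumption: a heading is a line made up only of dot-separated numbers
is_heading = lines.str.fullmatch(r"\d+(?:\.\d+)+")

# Assumption: the wanted text is any line starting with "I", as in the question
is_wanted = lines.str.startswith("I")

# Keep headings and wanted text; the original row order is preserved
result = lines[is_heading | is_wanted]
print(result)
```

Because the mask is computed for the whole Series at once, no loop bound is needed, however many subtitles a chapter has.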

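Have you tried using regular expressions? If you know what you are looking for, you can combine pandas' loc with a regex to collect it.

Thanks! Had I known this from the start, it would have saved me a few hours.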
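
If the text under each heading has to be handled separately, one possible approach is to number the sections with a cumulative sum over the heading mask and let `groupby` do the iteration, so the number of subtitles never has to be known up front. This is only a sketch under the same assumptions as the sample data (headings are dot-separated numbers, wanted text starts with "I", and the first line is a heading); `lines`, `section_id`, and `sections` are hypothetical names.

```python
import pandas as pd

# Sample data from the question
lines = pd.Series([
    "1.1.1.1", "lots of useless text", "more useless text", "I want this text",
    "1.1.1.2", "I want this text", "Not this text", "1.1.1.3", "1.1.2.1",
    "some lines of text", "1.2.1.1", "Interesting text", "1.2.1.2",
])

# Assumption: a heading is a line made up only of dot-separated numbers
is_heading = lines.str.fullmatch(r"\d+(?:\.\d+)+")

# Each heading increments the counter, so every row gets the id of its section
section_id = is_heading.cumsum()

# Iterate over however many sections exist; no bound is specified anywhere
sections = {}
for _, block in lines.groupby(section_id):
    heading = block.iloc[0]   # first row of each block is the heading itself
    body = block.iloc[1:]     # remaining rows are the text under that heading
    sections[heading] = body[body.str.startswith("I")].tolist()

print(sections)
```

Headings with no matching text simply map to an empty list.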