Python: how do I add rows to a DataFrame while iterating in a for loop?
I want to add a row to an existing DataFrame wherever a regex pattern has no matching value. For example:
import pandas as pd
import numpy as np
import re

lst = ['Sarah Kim', 'Added on January 21']
df = pd.DataFrame(lst)
df.columns = ['Info']

name_pat = r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
date_pat = r"\b(\w*Added on\w*)\b"
title_pat = r"\b(\w*at\w*)\b"

for index, row in df.iterrows():
    info = str(row['Info'])
    if re.findall(name_pat, info):
        print("Name matched")
    elif re.findall(title_pat, info):
        print("Title matched")
        # re.findall returns a list, never None, so test for an empty list
        if not re.findall(title_pat, info):
            pass  # Add a row here in the dataframe
    elif re.findall(date_pat, info):
        print("Date matched")
        if not re.findall(date_pat, info):
            pass  # Add a row here in the dataframe
So in my DataFrame df there are no titles, only names and dates. While looping over df, I want to add an empty placeholder row for the missing title.

The current output is:

Info
0 Sarah Kim
1 Added on January 21

My expected output is:

Info
0 Sarah Kim
1 None
2 Added on January 21

Is there a way to add the empty row, or is there a better approach?

+++

The dataset I am using is a single column with many rows. The rows follow a repeating structure of "name, title, date". For example:
Info
0 Sarah Kim
1 Added on January 21
2 Jesus A. Moore
3 Marketer
4 Added on May 30
5 Bobbie J. Garcia
6 CEO
7 Anita Jobe
8 Designer
9 Added on January 3
...
998 Michael B. Reedy
999 Salesman
1000 Added on December 13
I have already sliced the DataFrame, so I can extract a single section that looks like this:
Info
0 Sarah Kim
1 Added on January 21
I tried running a loop over each section, filling in an empty row whenever the date or title is missing. In the end I would have:
Info
0 Sarah Kim
1 **NULL**
2 Added on January 21
3 Jesus A. Moore
4 Marketer
5 Added on May 30
6 Bobbie J. Garcia
7 CEO
8 **NULL**
9 Anita Jobe
10 Designer
11 Added on January 3
...
998 Michael B. Reedy
999 Salesman
1000 Added on December 13
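A minimal sketch of that loop idea, not a definitive solution. Rather than inserting rows into df while iterating over it (the pandas documentation for iterrows advises against modifying what you iterate over), it collects the padded values in a plain list and builds a new DataFrame at the end. The helpers classify and pad_records are hypothetical names, the date pattern is a simplified "Added on", and the sketch assumes every record starts with a name, the order is name, title, date, and anything that is neither a name nor a date is a title.

```python
import re
import pandas as pd

name_pat = r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
date_pat = r"\bAdded on\b"  # simplified version of the question's date pattern (assumption)


def classify(text):
    """Label a value as 'Name', 'Date' or 'Title' (hypothetical helper)."""
    if re.search(name_pat, text):
        return "Name"
    if re.search(date_pat, text):
        return "Date"
    return "Title"  # assumption: whatever is not a name or a date is a title


def pad_records(values, placeholder=None):
    """Return a new list in which every record has name, title, date slots."""
    padded = []
    record = None  # the [name, title, date] record currently being filled
    for text in values:
        kind = classify(str(text))
        if kind == "Name":
            if record is not None:
                padded.extend(record)  # flush the previous record
            record = [text, placeholder, placeholder]
        elif record is not None:
            record[1 if kind == "Title" else 2] = text
    if record is not None:
        padded.extend(record)  # flush the last record
    return padded


values = ['Sarah Kim', 'Added on January 21', 'Jesus A. Moore', 'Marketer',
          'Added on May 30', 'Bobbie J. Garcia', 'CEO', 'Anita Jobe',
          'Designer', 'Added on January 3']
padded = pad_records(values)
padded_df = pd.DataFrame({'Info': padded})
print(padded_df)
```

Building the list first and creating the DataFrame once avoids growing a DataFrame row by row, which tends to be slow on long inputs.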
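I see that you have one long DataFrame of information, and that each group of entries is different. I think your goal may be a DataFrame with three columns: Name, Title, and Date.

Here is how I would approach it, with some example code. I would use df.shift so that I can tie related entries together and build a new DataFrame from your existing one.

I am also making some assumptions based on what you listed above. First, I assume that only the Title and Date fields can be missing. Second, I assume the order is always Name, Title, Date, as you described.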
import re
import pandas as pd

# First step: create test data
test_list = ['Sarah Kim', 'Added on January 21', 'Jesus A. Moore', 'Marketer',
             'Added on May 30', 'Bobbie J. Garcia', 'CEO', 'Anita Jobe',
             'Designer', 'Added on January 3']
test_df = pd.DataFrame(test_list, columns=['Info'])

# Second step: use your regexes to label each Info value as Name, Date or Title
name_pat = r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
date_pat = r"\b(\w*Added on\w*)\b"
title_pat = r"\b(\w*at\w*)\b"  # unused here: anything that is not a Name or Date is treated as a Title
test_df['Col'] = test_df['Info'].apply(
    lambda x: 'Name' if re.findall(name_pat, x)
    else ('Date' if re.findall(date_pat, x) else 'Title'))

# Third step: look one and two rows ahead using df.shift
test_df['Next_col'] = test_df['Col'].shift(-1)
test_df['Next_col2'] = test_df['Col'].shift(-2)
test_df['Next_val1'] = test_df['Info'].shift(-1)
test_df['Next_val2'] = test_df['Info'].shift(-2)

# Now filter to only the names and apply a function to get our name, title and date
new_df = test_df[test_df['Col'] == 'Name']

def apply_func(row):
    name = row['Info']
    title = None
    date = None
    if row['Next_col'] == 'Title':
        title = row['Next_val1']
        # Only look two rows ahead when a title sits in between;
        # otherwise that row belongs to the next person
        if row['Next_col2'] == 'Date':
            date = row['Next_val2']
    elif row['Next_col'] == 'Date':
        date = row['Next_val1']
    row['Name'] = name
    row['Title'] = title
    row['date'] = date
    return row

final_df = new_df.apply(apply_func, axis=1)[['Name', 'Title', 'date']].reset_index(drop=True)
print(final_df)
               Name     Title                 date
0         Sarah Kim      None  Added on January 21
1    Jesus A. Moore  Marketer      Added on May 30
2  Bobbie J. Garcia       CEO                 None
3        Anita Jobe  Designer   Added on January 3
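As a small aside on the shift step, the following sketch shows what shift(-1) and shift(-2) produce on a toy label column. The labels and the name peek are made up for illustration; the point is only that each row gets to see the labels of the next one and two rows, and that the last rows see missing values.

```python
import pandas as pd

# Toy label column: one record without a title, then one complete record
col = pd.Series(['Name', 'Date', 'Name', 'Title', 'Date'], name='Col')

peek = pd.DataFrame({
    'Col': col,
    'Next_col': col.shift(-1),   # label of the following row
    'Next_col2': col.shift(-2),  # label two rows ahead
})
print(peek)
```

Row 0 (a Name) sees Date directly after it, so its title is missing. Row 2 sees Title and then Date, so the record is complete. The rows at the end have nothing to look ahead to, so their shifted values are missing.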
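There may be a way to do this in fewer lines of code, and I welcome anyone who can make it more efficient, but I believe this should work. If you want to flatten it back into a single column: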
flattened_df = pd.DataFrame(final_df.values.flatten(),columns=['Info'])
print(flattened_df)
Info
0 Sarah Kim
1 None
2 Added on January 21
3 Jesus A. Moore
4 Marketer
5 Added on May 30
6 Bobbie J. Garcia
7 CEO
8 None
9 Anita Jobe
10 Designer
11 Added on January 3
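One possibly shorter variant, sketched under the same assumptions (the order is Name, Title, Date and only Title or Date can be missing). It labels every value with vectorized string matching, numbers the records with a cumulative sum over the Name rows, and pivots to one row per record. The date pattern is a simplified "Added on", the names kind, person, wide and flat are illustrative, and missing fields come out as NaN rather than None. pivot raises an error if a record contains two values of the same kind.

```python
import numpy as np
import pandas as pd

name_pat = r"^[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
date_pat = r"\bAdded on\b"  # simplified version of the date pattern (assumption)

info = pd.Series(['Sarah Kim', 'Added on January 21', 'Jesus A. Moore',
                  'Marketer', 'Added on May 30', 'Bobbie J. Garcia', 'CEO',
                  'Anita Jobe', 'Designer', 'Added on January 3'], name='Info')

# Label each value; anything that is neither a name nor a date counts as a title
is_name = info.str.contains(name_pat, regex=True)
is_date = info.str.contains(date_pat, regex=True)
kind = np.select([is_name, is_date], ['Name', 'Date'], default='Title')

# Every Name row starts a new record, so a running count of names is a record id
person = is_name.cumsum()

wide = (
    pd.DataFrame({'person': person, 'kind': kind, 'Info': info})
    .pivot(index='person', columns='kind', values='Info')
    .reindex(columns=['Name', 'Title', 'Date'])  # fixed order, even if a kind never occurs
)
print(wide)

# Flatten back to a single column, one slot per field
flat = pd.DataFrame({'Info': wide.to_numpy().ravel()})
print(flat)
```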
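"Is there a way to add an empty column?" Yes. Have you tried? It is best to use vectorized operations, and you should read the pandas documentation. In any case, there are plenty of resources on this. Can you clarify what the issue is?

@AMC Could you at least point me to some resources on what to research? I don't need complete code for this; it is more that I am having trouble working out how to approach the problem. Yes, I tried adding an empty column, but nothing worked.

I find the official pandas documentation very good!

Please explain more; I can't understand your question.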