Python: check string values in a column of one df against a column in another df
Suppose I have two pandas DataFrames that look like this:
import pandas as pd
data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]
first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])
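Is there a way to compare the Words column in the second DF against the Word_set column in the first DF? Ideally, any matching values would be saved to a new DF.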
Example output:
Column 1                                  Column 2
-----------                               ------------
'A big string of words', 'string of'      30
'Big string of words', 'Big swords'
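The logic here is to look for matching strings at each index position and then join each matching pair to build the final result. The match itself is tested with this expression:
any(x in first_df['Word_set'][i] for x in j.split())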
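To see what that test does in isolation, here is a minimal sketch on plain strings. The helper name `shares_word` is hypothetical and only used for illustration; it is not part of the answer's code.

```python
def shares_word(phrase, text):
    """Return True if any whitespace-separated word of `phrase` occurs in `text`."""
    # `word in text` is a case-sensitive substring test, not a whole-word match.
    return any(word in text for word in phrase.split())

print(shares_word('string of', 'A big string of words'))    # True
print(shares_word('Character value', 'Random data point'))  # False
print(shares_word('Big swords', 'Big string of words'))     # True ('Big' matches)
print(shares_word('Big swords', 'A big string of words'))   # False ('Big' != 'big')
```

Because the test works on substrings, a short word such as 'of' would also match inside a longer word; splitting `text` into words first would be one way to tighten it, if that matters for your data.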
Take a look at this code:
import pandas as pd

data_set_1 = [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]]
data_set_2 = [['string of', 30], ['Character value', 40], ['Big swords', 90]]
first_df = pd.DataFrame(data_set_1, columns = ['Word_set', 'Numbers'])
second_df = pd.DataFrame(data_set_2, columns = ['Words', 'Numbers'])

# Walk both frames row by row and keep the pairs that share at least one word.
col1 = []
for i, j in zip(range(len(first_df)), second_df['Words']):
    # `x in <string>` is a case-sensitive substring test against the row's Word_set value.
    if any(x in first_df['Word_set'][i] for x in j.split()):
        col1.append(', '.join([first_df['Word_set'][i], j]))

# Keep the numbers that are equal at the same index in both frames.
col2 = list(first_df['Numbers'][first_df['Numbers'] == second_df['Numbers']])

# Each list starts as a row; transposing turns the rows into columns.
df = pd.DataFrame(
    data= [col1, col2],
    index=['Column 1', 'Column 2']
).T
print(df)
Output:
                           Column 1 Column 2
0  A big string of words, string of       30
1   Big string of words, Big swords     None
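The loop above only compares rows that sit at the same index. If the goal is to compare every `Words` value against every `Word_set` value and save the matches to a new DataFrame, one possible sketch uses a cross join. This is an alternative, not the answer's method; it assumes pandas 1.2 or later (for `how='cross'`), and the names `pairs`, `mask` and `matches` are made up for this example.

```python
import pandas as pd

first_df = pd.DataFrame(
    [['A big string of words', 30], ['Random data point', 60], ['Big string of words', 50]],
    columns=['Word_set', 'Numbers'],
)
second_df = pd.DataFrame(
    [['string of', 30], ['Character value', 40], ['Big swords', 90]],
    columns=['Words', 'Numbers'],
)

# Pair every row of first_df with every row of second_df (assumes pandas >= 1.2).
# Both frames have a 'Numbers' column, so suffixes keep the two apart.
pairs = first_df.merge(second_df, how='cross', suffixes=('_first', '_second'))

# Keep a pair when at least one word of `Words` occurs in `Word_set`.
# This is the same case-sensitive substring test the answer's loop uses.
mask = [
    any(word in text for word in phrase.split())
    for text, phrase in zip(pairs['Word_set'], pairs['Words'])
]

# The matching pairs, saved to a new DataFrame.
matches = pairs[mask].reset_index(drop=True)
print(matches)
```

With the sample data this keeps three pairs, because 'string of' matches both 'A big string of words' and 'Big string of words'. A cross join produces len(first_df) * len(second_df) rows, so it is only practical for modestly sized frames.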
Thanks, that solved the problem. Can you explain your code?
@LuiHelleSee It should be fine now!