Python: checking whether a row contains a string
I am trying to write a program that matches cracked password hashes against a CSV file containing hashes and emails. I want to take "Email" from ex.csv and "Pass" from found.txt wherever the hashes coincide. But I get an error:
raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
My code:
import pandas as pd
import numpy as np
ex = pd.read_csv("ex.csv",delimiter=",")
found = pd.read_csv("found.txt",delimiter=":")
temp = ex[["Hash","Email"]]
te = found[["Hash","Pass"]]
for index,row in te.iterrows(): #Looping through file
    if temp.loc[temp['Hash'] == row['Hash'][index]]: # If pandas can't locate Hash string inside a first file, list is empty. And I am comparing that here
        print(temp['Email'][index]) # If successful, print out the
        print(te['Pass'][index]) # found values in the console
Sample from ex.csv:
Hash Email
0 210ac64b3c5a570e177b26bb8d1e3e93f72081fd example@example.com
1 707a1b7b7d9a12112738bcef3acc22aa09e8c915 example@example.com
2 24529d87ea25b05daba92c2b7d219a470c3ff3a0 example@example.com
Sample from found.txt:
Hash Pass
0 f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1 pass1
1 ecdc5a7c21b2eb84dfe498657039a4296cbad3f4 pass2
2 f61946739c01cff69974093452057c90c3e0ba14 pass3
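Or maybe there is a better way to iterate over the rows and check whether a row contains a string from a row of the other file? ;)
I stored the values like this:
1) c.csv
2) d.csv
To print the matches, use the following code: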
for _, row in te.iterrows():
    rowHash = row.Hash
    matches = temp.Hash == rowHash  # boolean mask
    if matches.any():
        mails = temp[matches].Email.tolist()
        print(f'Found: {rowHash} / {row.Pass} / {", ".join(mails)}')
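Compare my code with yours carefully. I think that comparison will let you locate the error in your code.
You did not say exactly where it happens, but I suspect your error occurs in the if statement (my version is different).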
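To illustrate the suspected cause, here is a minimal sketch. The hashes and emails in it are made up for the example and are not taken from your files. It shows that selecting rows with a boolean mask returns a DataFrame, that using a DataFrame directly as an `if` condition raises the `ValueError` from the question, and that `.empty` or `.any()` give an explicit True/False instead.

```python
import pandas as pd

# Illustrative data only; these values are invented for the sketch.
temp = pd.DataFrame({
    "Hash": ["aaa", "bbb", "ccc"],
    "Email": ["a@example.com", "b@example.com", "c@example.com"],
})

# Boolean-mask selection returns a DataFrame, not a single True/False.
selected = temp.loc[temp["Hash"] == "bbb"]

try:
    if selected:  # pandas refuses to reduce a DataFrame to one truth value
        pass
    raised = False
except ValueError:
    raised = True

# Explicit alternatives that do produce a single truth value.
has_match = not selected.empty              # True: one row matched "bbb"
mask_any = (temp["Hash"] == "zzz").any()    # False: no row has hash "zzz"

print(raised, has_match, bool(mask_any))
```

Either form works as the condition; `.any()` on the mask avoids building the intermediate DataFrame when you only need to know whether a match exists.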
Edit
You can also try another concept. Because it looks rows up by index, it should run much faster than the loop above.
# Set 'Hash' column as the index in both DataFrames
temp2 = temp.set_index('Hash')
te2 = te.set_index('Hash')
# Loop over rows in 'te2', index (Hash) in 'teHash'
for teHash, row in te2.iterrows():
    try:
        res = temp2.loc[teHash]  # Attempt to find corresponding row(s) in 'temp2'
        if isinstance(res, pd.Series):  # Single match found
            mails = res.Email
        else:  # Multiple matches found
            mails = ', '.join(res.Email)
        print(f'Found: {teHash} / {row.Pass} / {mails}')
    except KeyError:
        pass  # Not found
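As a different technique from the two loops above (not part of the original answer), an inner join with `DataFrame.merge` may also fit this task, since it pairs rows on the shared `Hash` column without an explicit Python loop for the matching. The sketch below uses small invented frames in place of ex.csv and found.txt; only the column names `Hash`, `Email` and `Pass` come from the question.

```python
import pandas as pd

# Stand-ins for ex.csv and found.txt; all values are invented for this sketch.
ex = pd.DataFrame({
    "Hash": ["h1", "h2", "h3"],
    "Email": ["one@example.com", "two@example.com", "three@example.com"],
})
found = pd.DataFrame({
    "Hash": ["h9", "h2"],
    "Pass": ["pass9", "pass2"],
})

# Inner join: keep only the hashes present in both frames.
matched = found.merge(ex, on="Hash", how="inner")

for row in matched.itertuples(index=False):
    print(f"Found: {row.Hash} / {row.Pass} / {row.Email}")
```

If a hash appears several times in ex.csv, the join yields one output row per email rather than a comma-joined list, so the printed format differs slightly from the loops above.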
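Thanks for your answer. But with your code I get an error: Traceback (most recent call last): File "a.py", line 14, in if temp['Hash'].str == row['Hash'][index]: IndexError: string index out of range
@Mike2233 I was assuming that both DataFrames are the same size. That is why.
The ex.csv file is bigger. But I get no output at all. It just jumps to index out of range.
@Mike2233 I have edited the code, but it looks like Valdi_Bo's answer is better and easier to work with.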
Hash,Pass
f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1,pass1
ecdc5a7c21b2eb84dfe498657039a4296cbad3f4,pass2
f61946739c01cff69974093452057c90c3e0ba14,pass3