Python: checking whether a row contains a string (Python, pandas)

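I am trying to write a program that matches cracked password hashes against a CSV file containing hashes and emails. I want to get "Email" from ex.csv and "Pass" from found.txt for the rows where the hash values coincide. But I get an error: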

raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My code:

import pandas as pd
import numpy as np

ex = pd.read_csv("ex.csv",delimiter=",")
found = pd.read_csv("found.txt",delimiter=":")

temp = ex[["Hash","Email"]]
te = found[["Hash","Pass"]]

for index,row in te.iterrows(): #Looping through file
    if temp.loc[temp['Hash'] == row['Hash'][index]]: # If pandas can't locate Hash string inside a first file, list is empty. And I am comparing that here
        print(temp['Email'][index]) # If successful, print out the
        print(te['Pass'][index])    # found values in the console

Sample from ex.csv:

                                          Hash                    Email
0     210ac64b3c5a570e177b26bb8d1e3e93f72081fd  example@example.com
1     707a1b7b7d9a12112738bcef3acc22aa09e8c915  example@example.com
2     24529d87ea25b05daba92c2b7d219a470c3ff3a0  example@example.com

Sample from found.txt:

                                         Hash         Pass
0    f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1     pass1
1    ecdc5a7c21b2eb84dfe498657039a4296cbad3f4     pass2
2    f61946739c01cff69974093452057c90c3e0ba14     pass3

Or maybe there is a better way to loop over the rows and check whether a row contains a string from a row of the other file? ;)

I stored the values like this:

1) c.csv

2) d.csv

To print the matches, use the following code:

for _, row in te.iterrows():
    rowHash = row.Hash
    matches = temp.Hash == rowHash  # boolean mask
    if matches.any():
        mails = temp[matches].Email.tolist()
        print(f'Found:  {rowHash} / {row.Pass} / {", ".join(mails)}')

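Compare my code with yours carefully. I think such a comparison will let you locate the errors in your code.

You did not say exactly where, but I suppose your error occurred in the if statement (my version is different).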
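The error can be reproduced in isolation. The sketch below uses a small made-up frame (the hash and email values are placeholders, not the asker's data) to show that putting a DataFrame directly in an `if` raises `ValueError`, while `.empty` on the selection or `.any()` on the boolean mask gives an unambiguous answer.

```python
import pandas as pd

# Placeholder data standing in for ex.csv (values are invented)
temp = pd.DataFrame({
    "Hash": ["aaa", "bbb"],
    "Email": ["a@example.com", "b@example.com"],
})

# Selecting with a boolean mask returns a DataFrame, possibly with zero rows
selected = temp.loc[temp["Hash"] == "aaa"]

try:
    if selected:      # pandas refuses to reduce a DataFrame to a single bool
        pass
    raised = False
except ValueError:
    raised = True

has_match = not selected.empty               # explicit: is there at least one row?
mask_match = (temp["Hash"] == "aaa").any()   # same question, asked of the mask itself
print(raised, has_match, mask_match)
```

Either explicit form can replace the bare `if` in the question's loop; which one reads better is a matter of taste.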

Edit: You can also try another concept. Because it looks rows up by index, it should run considerably faster than the loop above.

# Set 'Hash' column as the index in both DataFrames
temp2 = temp.set_index('Hash')
te2 = te.set_index('Hash')
# Loop over rows in 'te2', index (Hash) in 'teHash'
for teHash, row in te2.iterrows():
    try:
        res = temp2.loc[teHash]  # Attempt to find corresponding row(s) in 'temp2'
        if isinstance(res, pd.Series):  # Single match found
            mails = res.Email
        else:                           # Multiple matches found
            mails = ', '.join(res.Email)
        print(f'Found: {teHash} / {row.Pass} / {mails}')
    except KeyError:
        pass      # Not found
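As a further alternative to looping, the same matching can probably be expressed as a join on the `Hash` column. This is only a sketch under the assumption that both files load into frames shaped like `temp` and `te` above; the frames below are invented placeholders, and `matched` is a name introduced here for illustration.

```python
import pandas as pd

# Placeholder frames standing in for ex.csv and found.txt (values are invented)
temp = pd.DataFrame({
    "Hash": ["h1", "h2", "h3"],
    "Email": ["a@example.com", "b@example.com", "c@example.com"],
})
te = pd.DataFrame({"Hash": ["h2", "h9"], "Pass": ["pass1", "pass2"]})

# An inner join keeps only the hashes that occur in both frames;
# a hash that appears several times in 'temp' yields one output row per email
matched = te.merge(temp, on="Hash", how="inner")

for row in matched.itertuples(index=False):
    print(f"Found: {row.Hash} / {row.Pass} / {row.Email}")
```

Unlike the loops above, this prints one line per (hash, email) pair rather than joining several emails into one line.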

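Thanks for the answer. But with your code I get an error: Traceback (most recent call last): File "a.py", line 14, in if temp['Hash'].str == row['Hash'][index]: IndexError: string index out of range

@Mike2233 I was assuming that the two DataFrames are the same size. That is why.

The ex.csv file is bigger. But I get no prints at all. It just jumps to index out of range.

@Mike2233 I have edited the code, but it looks like Valdi_Bo's answer is better and easier to work with.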
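The "string index out of range" message in the comments is consistent with the expression `row['Hash'][index]`: `row['Hash']` is already a single string, so `[index]` picks one character out of it, and the lookup fails once the index reaches the length of the string. The sketch below illustrates that effect with one of the sample hashes. It is an interpretation of the traceback, not something the commenters confirmed.

```python
# One of the sample hashes from found.txt: a plain 40-character string
row_hash = "f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1"

# Indexing a str yields a single character, not the whole hash
first_char = row_hash[0]

try:
    row_hash[40]          # valid positions are 0..39, so this fails
    out_of_range = False
except IndexError:
    out_of_range = True

print(len(row_hash), first_char, out_of_range)
```

Comparing against `row['Hash']` directly, without the extra `[index]`, avoids the problem.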

Hash,Pass
f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1,pass1
ecdc5a7c21b2eb84dfe498657039a4296cbad3f4,pass2
f61946739c01cff69974093452057c90c3e0ba14,pass3
