How to resolve a ValueError when testing the truth value of DataFrame contents in Python 3.x?

I have a DataFrame like this:

   done    sentence                        3_tags
0  0       ['What', 'were', 'the', '...]   ['WP', 'VBD', 'DT']
1  0       ['What', 'was', 'the', '...]    ['WP', 'VBD', 'DT']
2  0       ['Why', 'did', 'John', '...]    ['WP', 'VBD', 'NN']
...

For each row, I want to check whether the list in the '3_tags' column is in the list temp1, as follows:

import pandas as pd

a = pd.read_csv('sentences.csv')
temp1 = [ ['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT'] ]
q = a['3_tags'] 
q in temp1

For the first sentence in row 0, the value of '3_tags' is ['WP', 'VBD', 'DT'], which is in temp1, so I expect the result above to be:

True

However, I get this error:

ValueError: Arrays were different lengths: 1 vs 3

I suspect there is a problem with the data type of q:

print(type(q))
<class 'pandas.core.series.Series'>

Is the problem that q is a Series while temp1 contains lists? What should I do to get the logical result True?

You want these lists to be tuples instead. Then use pd.Series.isin:

*temp1, = map(tuple, temp1)

q = a['3_tags'].apply(tuple)

q.isin(temp1)

0     True
1     True
2    False
Name: 3_tags, dtype: bool
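
A likely reason the tuple conversion is needed is that membership checks of this kind generally rely on hashing, and Python lists are not hashable while tuples of strings are. The following is a minimal pandas-free sketch of that idea; the names rows, allowed and matches are illustrative and do not come from the original post.

```python
# Lists are mutable and therefore unhashable; tuples of strings are hashable.
tags_list = ['WP', 'VBD', 'DT']

try:
    hash(tags_list)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Converting both sides to tuples makes hash-based membership tests possible.
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
allowed = set(map(tuple, temp1))  # the duplicate pattern collapses into one entry

# Hypothetical stand-in for the values of the '3_tags' column.
rows = [['WP', 'VBD', 'DT'], ['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']]
matches = [tuple(row) in allowed for row in rows]

print(list_is_hashable)  # False
print(matches)           # [True, True, False]
```

The result mirrors the boolean Series shown above: the first two rows match a pattern in temp1 and the third does not.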

However, the '3_tags' column appears to consist of strings that look like lists. In that case, we want to use ast.literal_eval:

from ast import literal_eval

*temp1, = map(tuple, temp1)

q = a['3_tags'].apply(lambda x: tuple(literal_eval(x)))

q.isin(temp1)

0     True
1     True
2    False
Name: 3_tags, dtype: bool
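
If it is unclear whether the column holds real lists or list-like strings, one option is a small helper that accepts both. This is only a sketch under that assumption: to_tag_tuple is a hypothetical name, not part of pandas or of the original answer, and the cells list stands in for the column values.

```python
from ast import literal_eval


def to_tag_tuple(cell):
    """Return the cell as a tuple, parsing it first if it is a list-like string."""
    if isinstance(cell, str):
        # literal_eval only evaluates Python literals, so it is safer than eval.
        cell = literal_eval(cell)
    return tuple(cell)


temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
allowed = {tuple(tags) for tags in temp1}

# Hypothetical mix of a real list and list-like strings.
cells = [['WP', 'VBD', 'DT'], "['WP', 'VBD', 'DT']", "['WP', 'VBD', 'NN']"]
result = [to_tag_tuple(cell) in allowed for cell in cells]

print(result)  # [True, True, False]
```

On the DataFrame itself, passing to_tag_tuple to a['3_tags'].apply(...) should be usable in place of either apply call above, before calling isin.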

Setup 1 (the '3_tags' column holds lists)

import pandas as pd

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN')))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

Setup 2 (the '3_tags' column holds list-like strings)

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str, map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN'))))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

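In the setup, I don't understand how to prepare my (very large) DataFrame the way you show. How do I convert it to tuples?

The setup is there to generate the variables a and temp1. You shouldn't have to do anything. It is there for others who may want to test it. You only need to use the code in the top part.

Thanks, got it. When I use the top part, it gives another error: ValueError: Buffer has wrong number of dimensions (expected 1, got 2) when doing q = a['3_tags'].apply(tuple). When I then print(q), I get: ([，'，D，T，，，，，，，，，N，N，，，，，，，，I，…

That means your DataFrame is all messed up. In your post, the elements of '3_tags' are shown as lists, whereas they are actually list-like strings. I'll update my post to account for this. In fact, if you can, you should provide a way to reproduce your data exactly.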
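
The character-by-character output quoted in the comments is what tuple() produces when it is applied to a string rather than a list, which is what cells written as lists usually become after being read back from a CSV file. A minimal sketch, with cell as an illustrative value:

```python
from ast import literal_eval

# A list-like string, as typically read back from a CSV file.
cell = "['WP', 'VBD', 'DT']"

as_chars = tuple(cell)               # iterates over the string one character at a time
as_tags = tuple(literal_eval(cell))  # parses the string into a list first

print(as_chars[:6])  # ('[', "'", 'W', 'P', "'", ',')
print(as_tags)       # ('WP', 'VBD', 'DT')
```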