Python: comparing multiple strings in a DataFrame column
I have the following DataFrame in Python 3.x, with several numeric columns and two string columns:
import numpy as np
import pandas as pd
# Named "data" rather than "dict" so the built-in dict type is not shadowed.
data = {"numericvals": np.repeat(25, 8),
        "numeric": np.repeat(42, 8),
        "first": ["beneficiary, duke", "compose", "herd primary", "stall", "deep", "regular summary classify", "timber", "property"],
        "second": ["abcde", "abcde", "abcde", "abcde", "abcde", "abcde", "abcde", "abcde"]}
df = pd.DataFrame(data)
df = df[['numeric', 'numericvals', 'first', 'second']]
print(df)
   numeric  numericvals                     first  second
0       42           25         beneficiary, duke   abcde
1       42           25                   compose   abcde
2       42           25              herd primary   abcde
3       42           25                     stall   abcde
4       42           25                      deep   abcde
5       42           25  regular summary classify   abcde
6       42           25                    timber   abcde
7       42           25                  property   abcde
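The column first contains one or more strings. If there are several, they are separated by a space or a comma.

My goal is to create columns that record the lengths of the strings in first that are longer or shorter than the string in second. If the lengths are equal, that case should be ignored.

My idea was to create two lists: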
longer = []
shorter = []
If a string in first is longer, append its length, computed with len(), to longer. If the string is shorter, record its length, again computed with len(), in shorter.
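The two-list idea above can be sketched in plain Python for a single pair of values before bringing pandas into it. The helper name split_lengths is hypothetical, and the sketch assumes that commas and whitespace are the only separators. Unlike the output shown further down, it returns empty lists rather than [0] when nothing qualifies.

```python
import re

def split_lengths(first, second):
    """Return (longer, shorter): token lengths above and below len(second)."""
    # Assumption: tokens are separated by commas and/or whitespace.
    tokens = [t for t in re.split(r"[,\s]+", first) if t]
    ref = len(second)
    longer = [len(t) for t in tokens if len(t) > ref]
    shorter = [len(t) for t in tokens if len(t) < ref]
    # Tokens with exactly the same length as second are ignored.
    return longer, shorter

print(split_lengths("beneficiary, duke", "abcde"))         # ([11], [4])
print(split_lengths("regular summary classify", "abcde"))  # ([7, 7, 8], [])
print(split_lengths("stall", "abcde"))                     # ([], [])
```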
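The result of the analysis should be in DataFrame format.

I don't know how to handle multiple strings in first, especially when there are three of them. How should this comparison be done in pandas?

You can use pandas.DataFrame.apply with a row-wise function (the code is listed after the output below).

This works for any number of strings, assuming that any whitespace or comma marks the start of a new string.

Here is the output: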
   numeric  numericvals                     first  second     longer  shorter
0       42           25         beneficiary, duke   abcde       [11]      [4]
1       42           25                   compose   abcde        [7]      [0]
2       42           25              herd primary   abcde        [7]      [4]
3       42           25                     stall   abcde        [0]      [0]
4       42           25                      deep   abcde        [0]      [4]
5       42           25  regular summary classify   abcde  [7, 7, 8]      [0]
6       42           25                    timber   abcde        [6]      [0]
7       42           25                  property   abcde        [8]      [0]
I did my best. I hope this helps.
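import operator

def transform(row, op):
    # Treat commas as spaces, split on whitespace, and measure each token.
    lengths = [len(s) for s in row['first'].replace(',', ' ').split()]
    # Keep the lengths for which op(length, len(second)) holds; fall back to [0].
    return [n for n in lengths if op(n, len(row['second']))] or [0]

df['longer'] = df.apply(transform, axis=1, args=(operator.gt,))
df['shorter'] = df.apply(transform, axis=1, args=(operator.lt,))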
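As a possible alternative to a row-wise apply, the same comparison can be sketched at column level: tokenize first once with Series.str.findall, take the reference length from second with Series.str.len, and build both list columns from those. This is only a sketch on a shortened, illustrative copy of the data; the names token_lengths and ref_lengths are not from the original, and empty lists are kept instead of the [0] placeholder.

```python
import pandas as pd

# Shortened illustrative copy of the question's data.
df = pd.DataFrame({
    "first": ["beneficiary, duke", "compose", "stall", "regular summary classify"],
    "second": ["abcde"] * 4,
})

# Every run of characters that is neither a comma nor whitespace counts as one token.
token_lengths = df["first"].str.findall(r"[^,\s]+").apply(
    lambda tokens: [len(t) for t in tokens]
)
ref_lengths = df["second"].str.len()

# Build each list column explicitly as a Series so pandas stores one list per cell.
df["longer"] = pd.Series(
    [[n for n in lens if n > ref] for lens, ref in zip(token_lengths, ref_lengths)],
    index=df.index,
)
df["shorter"] = pd.Series(
    [[n for n in lens if n < ref] for lens, ref in zip(token_lengths, ref_lengths)],
    index=df.index,
)
print(df)
```

If the [0] placeholder is wanted for rows where nothing qualifies, appending "or [0]" to each inner list comprehension would restore it.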