The simplest feature mapper in Python
I am trying to build the simplest possible feature mapper in Python 3. I have two goals: to get the best performance and to learn how to program in Python. This is my code, and it does not work:
import pandas as pd
source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})
# trim column value selecting first two symbols
def s_trim(x):
    return x[:2]
# make new column from two selecting first two symbols from each
def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])
features = [
    ('trim', ['Country'], s_trim),
    ('trim1', ['Country', 'City'], s_trim_concat),
    ('trim2', ['City', 'Country'], s_trim_concat)
]
for feature_name, columns, func in features:
    source[feature_name] = source[columns].apply(func, axis=1)
print(source)
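Update: the code works now, but I had to complicate the functions, so I am still looking for a good solution that allows simple functions without type handling inside them: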
import pandas as pd
source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})
# trim column value selecting first two symbols (x is a whole column here)
def s_trim(x):
    return x.str[:2]
# make new column from two selecting first two symbols from each
def s_trim_concat(row):
    # .iloc makes the positional lookup explicit on a label-indexed row
    x = row.iloc[0]
    y = row.iloc[1]
    return '%s-%s' % (x[:2], y[:2])
features = [
    ('trim', ['Country'], s_trim),
    ('trim1', ['Country', 'City'], s_trim_concat),
    ('trim2', ['City', 'Country'], s_trim_concat)
]
for feature_name, columns, func in features:
    if len(columns) == 1:
        source[feature_name] = source[columns].apply(func)
    else:
        source[feature_name] = source[columns].apply(func, axis=1)
print(source)
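I think the problem is that you are passing a single list to s_trim_concat instead of two separate arguments. Could you provide an example of the final output for this sample? First, I need to clarify which key the value returned from s_trim_concat should be associated with.
Update. Try this: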
import pandas as pd
source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})
# trim column value selecting first two symbols
def s_trim(x):
    return x[:2]
# make new column from two selecting first two symbols from each
def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])
features = [
    ('trim', ['Country'], s_trim),
    ('trim1', ['Country', 'City'], s_trim_concat),
    ('trim2', ['City', 'Country'], s_trim_concat)
]
for feature_name, columns, func in features:
    source[feature_name] = apply(func, columns)
print(source)
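A note on the snippet above: the built-in `apply` existed only in Python 2 and was removed in Python 3, where argument unpacking with `*` plays the same role. The sketch below is a minimal illustration, assuming the goal is to call a two-argument function with a list of values. The names `args`, `result`, and `from_names` are hypothetical and not part of the original code. It also shows that passing the column names themselves, as the loop above does, operates on the name strings rather than on the column data.

```python
def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])

# Python 2 would write apply(s_trim_concat, args);
# in Python 3 the list is unpacked into positional arguments instead.
args = ['USA', 'New-York']
result = s_trim_concat(*args)
print(result)  # US-Ne

# Unpacking the column *names* only trims the names, not the data.
from_names = s_trim_concat(*['Country', 'City'])
print(from_names)  # Co-Ci
```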
Maybe I have found a solution:
import pandas as pd
source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York']})
# trim column value selecting first two symbols (x is a one-element row here)
def s_trim(x):
    return x.str[:2]
# make new column from two selecting first two symbols from each
def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])
features = [
    ('trim', ['Country'], s_trim),
    ('trim1', ['Country', 'City'], s_trim_concat),
    ('trim2', ['City', 'Country'], s_trim_concat)
]
for feature_name, columns, func in features:
    # .iloc makes the positional lookup explicit on a label-indexed row
    source[feature_name] = source[columns].apply(
        func if len(columns) == 1
        else lambda x: func(x.iloc[0], x.iloc[1]), axis=1)
print(source)
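One possible way to keep both functions as plain string functions, with no `.str` accessor and no branch on the number of columns, is to zip the selected columns and unpack each tuple of values into the function. This is only a sketch under that assumption, not the only approach; the name `column_values` is hypothetical.

```python
import pandas as pd

source = pd.DataFrame({
    'Country': ['USA', 'USA', 'Russia', 'USA'],
    'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York'],
})

# Both functions work on plain strings only.
def s_trim(x):
    return x[:2]

def s_trim_concat(x, y):
    return '%s-%s' % (x[:2], y[:2])

features = [
    ('trim', ['Country'], s_trim),
    ('trim1', ['Country', 'City'], s_trim_concat),
    ('trim2', ['City', 'Country'], s_trim_concat),
]

for feature_name, columns, func in features:
    # One Series per requested column, in the requested order.
    column_values = [source[c] for c in columns]
    # zip walks the columns row by row; *args spreads each row's
    # values over the function's positional parameters.
    source[feature_name] = [func(*args) for args in zip(*column_values)]

print(source)
```

Because the arguments are unpacked, the same loop handles functions of one, two, or more columns, as long as the number of listed columns matches the function's parameters.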
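What is this supposed to do?
Before solving a classification or regression task, I want to add new transformed columns, that is, clean the source data or normalize it. In my sample code, I want s_trim to cut a column down to its first two characters, and s_trim_concat to build one column from the first two characters of two columns. That is, for 'USA' and 'New-York' I want to get 'US-Ne'. I would get 'trim' as 'US', 'US', 'Ru', 'US' and 'trim1' as 'US-Ne', 'US-Ne', 'Ru-Sa', 'US-Ne'. I had given the wrong column names in my post; they are corrected now.
The apply function in your solution does not work: Python cannot find it.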
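Since performance was one of the stated goals, the expected output described above can also be produced with pandas' vectorized string accessor instead of a per-row function call. This is a sketch of an alternative, not the mapper design asked for in the question: it gives up the generic `(name, columns, func)` table in exchange for column-wise operations.

```python
import pandas as pd

source = pd.DataFrame({
    'Country': ['USA', 'USA', 'Russia', 'USA'],
    'City': ['New-York1', 'New-York', 'Sankt-Petersburg', 'New-York'],
})

# .str[:2] slices every string in the column at once.
source['trim'] = source['Country'].str[:2]
# String Series concatenate element-wise with +.
source['trim1'] = source['Country'].str[:2] + '-' + source['City'].str[:2]

print(source[['trim', 'trim1']])
```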