Capitalize the last two characters of string elements in a Pandas Series
I have the following pandas Series:
import pandas as pd

test_series = pd.Series(['canton, nc', 'leicester, nc', 'asheville, nc', 'candler, nc',
                         'marshall, nc', 'waynesville, nc', 'fletcher, nc',
                         'hendersonville, nc', 'old fort, nc', 'horse shoe, nc',
                         'black mountain, nc', 'maggie valley, nc', 'burnsville, nc',
                         'weaverville, nc', 'zirconia, nc', 'swannanoa, nc',
                         'hot springs, nc', 'arden, nc', 'east flat rock, nc', 'marion, nc',
                         'mars hill, nc', 'flat rock, nc', 'rutherfordton, nc', 'clyde, nc',
                         'saluda, nc', 'alexander, nc', 'fairview, nc', 'mill spring, nc',
                         'brevard, nc', 'mills river, nc', 'penrose, nc',
                         'pisgah forest, nc', 'barnardsville, nc', 'etowah, nc',
                         'travelers rest, sc', 'lake lure, nc', 'montreat, nc', 'dana, nc',
                         'greenville, sc', 'flag pond, tn', 'laurel park, nc'])
I want to convert the state abbreviations to uppercase. This is my best guess:
test_series.str[-2:].upper()
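but I get an AttributeError. What is the most efficient way to do this?

test_series.str[-2:] returns a new Series, and a Series has no upper method, so the uppercase conversion also has to go through the .str accessor. Select every value without its last 2 characters, then append the last 2 characters converted to uppercase:
test_series.str[:-2] + test_series.str[-2:].str.upper()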
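A minimal sketch reproducing the error and the fix on a small sample taken from the question's data (the short Series s is an assumption made only to keep the example small):

```python
import pandas as pd

# A few values from the question's Series
s = pd.Series(['canton, nc', 'travelers rest, sc', 'flag pond, tn'])

# s.str[-2:] is itself a Series, and Series has no .upper() method,
# which is what triggers the AttributeError in the question.
try:
    s.str[-2:].upper()
    raised = False
except AttributeError:
    raised = True

# Going through the .str accessor a second time applies str.upper to each element.
fixed = s.str[:-2] + s.str[-2:].str.upper()
print(fixed.tolist())
```

The print call should show ['canton, NC', 'travelers rest, SC', 'flag pond, TN'].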
For better performance, use a list comprehension, provided the data contains no NaN values:
test_series = pd.Series([i[:-2] + i[-2:].upper() for i in test_series])
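If the data might contain NaN, one possible variation (an illustrative guard, not part of the original answer) is to slice only actual strings and pass everything else through unchanged. Passing index=s.index also keeps the original index, which the plain list comprehension above replaces with a default RangeIndex:

```python
import numpy as np
import pandas as pd

s = pd.Series(['canton, nc', np.nan, 'flag pond, tn'])

# Uppercase the last two characters of real strings only;
# non-strings such as NaN (a float) are returned as-is.
result = pd.Series(
    [x[:-2] + x[-2:].upper() if isinstance(x, str) else x for x in s],
    index=s.index,
)
print(result.tolist())
```

The isinstance check adds a little per-element overhead, so it is worth timing against the .str version on your own data before choosing it.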
Check that both approaches give the same result:
(test_series.str[:-2] + test_series.str[-2:].str.upper() == pd.Series([i[:-2] + i[-2:].upper() for i in test_series])).all()
True
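A sketch of an alternative equality check using pandas.testing.assert_series_equal and Series.equals on a small sample (the sample Series is an assumption). Unlike == followed by .all(), both treat NaN values in the same position as equal, and assert_series_equal also compares index and dtype and describes any mismatch it finds:

```python
import pandas as pd
from pandas.testing import assert_series_equal

s = pd.Series(['canton, nc', 'greenville, sc', 'flag pond, tn'])

vectorized = s.str[:-2] + s.str[-2:].str.upper()
listcomp = pd.Series([i[:-2] + i[-2:].upper() for i in s])

# Raises AssertionError if values, index, or dtype differ.
assert_series_equal(vectorized, listcomp)

# Boolean alternative that does not raise.
same = vectorized.equals(listcomp)
print(same)
```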
Timings:
%timeit test_series.str[:-2] + test_series.str[-2:].str.upper()
1000 loops, best of 3: 1.1 ms per loop
%timeit pd.Series([i[:-2] + i[-2:].upper() for i in test_series])
1000 loops, best of 3: 245 µs per loop
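%timeit is an IPython magic; outside IPython, a rough equivalent can be sketched with the standard timeit module. The 4,000-element Series built by repeating a few values is an assumption chosen for illustration, and the numbers you get will depend on the machine, the pandas version, and the data size:

```python
import timeit

import pandas as pd

# Hypothetical larger input made by repeating a few values from the question
s = pd.Series(['canton, nc', 'leicester, nc', 'greenville, sc', 'flag pond, tn'] * 1000)

def with_str_accessor():
    return s.str[:-2] + s.str[-2:].str.upper()

def with_list_comprehension():
    return pd.Series([i[:-2] + i[-2:].upper() for i in s])

timings = {}
for func in (with_str_accessor, with_list_comprehension):
    # Best of 3 repeats of 20 calls each, converted to seconds per call.
    timings[func.__name__] = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f"{func.__name__}: {timings[func.__name__] * 1e3:.3f} ms per call")
```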
Output:
0 canton, NC
1 leicester, NC
2 asheville, NC
3 candler, NC
4 marshall, NC
5 waynesville, NC
6 fletcher, NC
7 hendersonville, NC
8 old fort, NC
9 horse shoe, NC
10 black mountain, NC
11 maggie valley, NC
12 burnsville, NC
13 weaverville, NC
14 zirconia, NC
15 swannanoa, NC
16 hot springs, NC
17 arden, NC
18 east flat rock, NC
19 marion, NC
20 mars hill, NC
21 flat rock, NC
22 rutherfordton, NC
23 clyde, NC
24 saluda, NC
25 alexander, NC
26 fairview, NC
27 mill spring, NC
28 brevard, NC
29 mills river, NC
30 penrose, NC
31 pisgah forest, NC
32 barnardsville, NC
33 etowah, NC
34 travelers rest, SC
35 lake lure, NC
36 montreat, NC
37 dana, NC
38 greenville, SC
39 flag pond, TN
40 laurel park, NC
dtype: object
Yes, but it is better to use the pandas .str solution, because the pure Python version fails if there is at least one NaN.
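A small sketch of that difference, using a sample Series with one missing value (the sample is an assumption): the .str accessor propagates NaN, while plain slicing raises a TypeError because NaN is a float:

```python
import numpy as np
import pandas as pd

s = pd.Series(['canton, nc', np.nan, 'flag pond, tn'])

# .str methods skip missing values, so NaN stays NaN in the result.
via_str = s.str[:-2] + s.str[-2:].str.upper()

# The list comprehension tries nan[:-2], and a float is not subscriptable.
try:
    pd.Series([i[:-2] + i[-2:].upper() for i in s])
    listcomp_failed = False
except TypeError:
    listcomp_failed = True

print(via_str.tolist(), listcomp_failed)
```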