Python: adding values in a column
I have a DataFrame df, and I want to append "/" to the cast and genres columns so that each cell contains three "/" characters.
id  movie      cast     genres   runtime
1   Furious    a/b/c/d  a/b      23
2   Minions    a/b/c    a/b/c    55
3   Mission    a/b      a        67
4   Kingsman   a/b/c/d  a/b/c/d  23
5   Star Wars  a        a/b/c    45
So the output should look like this:
id  movie      cast     genres   runtime
1   Furious    a/b/c/d  a/b//    23
2   Minions    a/b/c/   a/b/c/   55
3   Mission    a/b//    a///     67
4   Kingsman   a/b/c/d  a/b/c/d  23
5   Star Wars  a///     a/b/c/   45
Apply this function to every element of each column to update it:
def update_string(string):
    total_occ = 3  # total no. of occurrences of character '/'
    for element in string:  # for each element,
        if element == "/":  # if there is '/', decrease 'total_occ'
            total_occ = total_occ - 1
    for i in range(total_occ):  # add remaining no. of '/' at the end
        string += "/"
    return string
x = "a/b"
print(update_string(x))
Output:
a/b//
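The counting loop above can also be condensed with the built-in str.count. The following is only a sketch of the same idea; the name pad_slashes and its target and sep parameters are hypothetical and not part of the original answer.

```python
def pad_slashes(text, target=3, sep="/"):
    """Append `sep` until `text` contains at least `target` separators."""
    # max(0, ...) leaves strings that already have enough separators untouched
    missing = max(0, target - text.count(sep))
    return text + sep * missing


for sample in ["a", "a/b", "a/b/c", "a/b/c/d"]:
    print(sample, "->", pad_slashes(sample))
```

Like update_string, this never removes characters, so a cell that already holds three or more "/" is returned unchanged.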
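You can split on /, pad the resulting list with empty strings until its length is 4, and then join it with / again.
To change the values in an entire column, try this: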
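import pandas as pd
from io import StringIO

# columns are separated by two or more spaces, so "Star Wars" stays one value
df = pd.read_csv(StringIO("""id  movie  cast  genres  runtime
1  Furious  a/b/c/d  a/b  23
2  Minions  a/b/c  a/b/c  55
3  Mission  a/b  a  67
4  Kingsman  a/b/c/d  a/b/c/d  23
5  Star Wars  a  a/b/c  45"""), sep=r"\s\s+", engine="python")

def pad_cells(value):
    parts = value.split("/")
    # pad with empty strings up to 4 parts, i.e. 3 separators
    parts += [""] * (4 - len(parts))
    return "/".join(parts)

df["cast"] = df["cast"].apply(pad_cells)
df["genres"] = df["genres"].apply(pad_cells)
print(df)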
Here is one way to do it by defining a custom function:
def add_values(df, *cols):
    for col in cols:
        # amount of "/" to add at each row
        c = df[col].str.count('/').rsub(3)
        # translate the above to as many "/" as required
        ap = [i * '/' for i in c.tolist()]
        # Add the above to the corresponding column
        df[col] = [i + j for i, j in zip(df[col], ap)]
    return df
add_values(df, 'cast', 'genres')
id movie cast genres runtime
0 1 Furious a/b/c/d a/b// 23
1 2 Minions a/b/c/ a/b/c/ 55
2 3 Mission a/b// a/// 67
3 4 Kingsman a/b/c/d a/b/c/d 23
4 5 StarWars a/// a/b/c/ 45
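The same count-and-append idea can be written without the intermediate Python lists. This is a sketch under assumptions rather than a drop-in replacement: the function name pad_slashes_vectorized and its target parameter are hypothetical, and the small frame below only mirrors part of the question's data.

```python
import pandas as pd


def pad_slashes_vectorized(df, cols, target=3):
    """Return a copy of df where each listed column has at least `target` slashes."""
    out = df.copy()
    for col in cols:
        # number of "/" still missing per row, never negative
        missing = (target - out[col].str.count("/")).clip(lower=0)
        # turn each count into a run of "/" and append it element-wise
        out[col] = out[col] + missing.map(lambda n: "/" * int(n))
    return out


df = pd.DataFrame({
    "movie": ["Furious", "Mission", "Star Wars"],
    "cast": ["a/b/c/d", "a/b", "a"],
    "genres": ["a/b", "a", "a/b/c"],
})
result = pad_slashes_vectorized(df, ["cast", "genres"])
print(result)
```

Unlike add_values, this version works on a copy, so the caller's frame is left unmodified.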
In [217]: df
Out[217]:
id movie cast genres runtime
0 1 Furious a/b/c/d a/b 23
1 2 Minions a/b/c a/b/c 55
2 3 Mission a/b a 67
3 4 Kingsman a/b/c/d a/b/c/d 23
4 5 Star Wars a a/b/c 45
In [218]: from itertools import chain, zip_longest
In [219]: def ensure_slashes(x):
     ...:     return ''.join(chain.from_iterable(zip_longest(x.split('/'), '///', fillvalue='')))
     ...:
In [220]: df[['cast','genres']] = df[['cast','genres']].applymap(ensure_slashes)
In [221]: df
Out[221]:
id movie cast genres runtime
0 1 Furious a/b/c/d a/b// 23
1 2 Minions a/b/c/ a/b/c/ 55
2 3 Mission a/b// a/// 67
3 4 Kingsman a/b/c/d a/b/c/d 23
4 5 Star Wars a/// a/b/c/ 45
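To see why the zip_longest approach works, it can help to look at the intermediate pairs outside of pandas. The snippet below reuses ensure_slashes from the session above; the sample strings are chosen here for illustration only.

```python
from itertools import chain, zip_longest


def ensure_slashes(x):
    # pair each part with one "/" from the constant '///'; whichever side
    # runs out first is padded with the empty string
    return ''.join(chain.from_iterable(zip_longest(x.split('/'), '///', fillvalue='')))


pairs = list(zip_longest("a/b".split("/"), "///", fillvalue=""))
print(pairs)                        # [('a', '/'), ('b', '/'), ('', '/')]
print(ensure_slashes("a/b"))        # a/b//
print(ensure_slashes("a/b/c/d"))    # a/b/c/d

# Caveat: with more than four parts the extra parts are joined without a slash
print(ensure_slashes("a/b/c/d/e"))  # a/b/c/de
```

The caveat does not affect the sample data, where no cell has more than four parts. Separately, in pandas 2.1 and later DataFrame.applymap is deprecated in favor of DataFrame.map, so on newer versions the call in In [220] may emit a warning.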
Here you go:
=^^=
import pandas as pd
from io import StringIO

# create raw data
raw_data = StringIO("""
id movie cast genres runtime
1 Furious a/b/c/d a/b 23
2 Minions a/b/c a/b/c 55
3 Mission a/b a 67
4 Kingsman a/b/c/d a/b/c/d 23
5 Star_Wars a a/b/c 45
""")

# load data into data frame
df = pd.read_csv(raw_data, sep=' ')

# iterate over rows and add character
# (df.at replaces DataFrame.set_value, which was removed in pandas 1.0)
for index, row in df.iterrows():
    count_character_cast = row['cast'].count('/')
    if count_character_cast < 3:
        df.at[index, 'cast'] = row['cast'] + '/' * (3 - count_character_cast)
    count_character_genres = row['genres'].count('/')
    if count_character_genres < 3:
        df.at[index, 'genres'] = row['genres'] + '/' * (3 - count_character_genres)
A short solution with the following features and functions:
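The key function to apply is:
def ensure_slashes(x):
    return ''.join(chain.from_iterable(zip_longest(x.split('/'), '///', fillvalue='')))
OK, the idea is to create a function that does the necessary work and apply it to the desired columns. The function replaces the existing slashes with empty strings and zips the strings in the cell with a constant list of exactly 3 slashes. The result is the concatenation of the elements of that zip, and that is how it works :)
Output: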
id movie cast genres runtime
1 furious a/b/c/ a/b// 23
2 Mininons a/b/c/ a/b/c/ 55
3 mission a/b// a/// 67
4 Kingsman a/b/c/ a/b/c/ 23
5 star Wars a/// a/b/c/ 45
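Share the code you have written and explain what is wrong with it. That shows your effort.
This looks like an assignment/homework question. You should try it yourself first and ask when you get stuck.
Thanks, this is exactly the solution I wanted.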