Python 3.x: Looping with Python
Below is a dataset. The goal is to select the song IDs that have at least one composer and at least one publisher on at least one row. For example, SongID 4 has two rows with two different composers but no publisher, while SongID 1 has no composer. The goal is to reject Excel sheets like this using Python (pandas). Any suggestions?
import pandas as pd
import numpy as np
import smtplib
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
df_header = pd.read_csv('New York Yankees Twins at Yankees-FNG-042318.csv',header=None,skiprows=1)
cuesheetprepareremail = df_header.iloc[0,7]
print(cuesheetprepareremail)
df = pd.read_csv('New York Yankees Twins at Yankees-FNG-042318.csv',
                 names=['CUE', 'SONG TITLE', 'USAGE', 'RUNNING TIME',
                        'COMPOSER', 'COMPOSER PRO', 'COMPOSER % SHARE',
                        'PUBLISHER', ' PUBLISHER PRO', 'PUBLISHER % SHARE',
                        'TRACK ID', 'LIBRARY', 'ARTIST', 'START TIME'],
                 skiprows=7)
#select all rows with same cue number
columns = ['CUE','COMPOSER','PUBLISHER']
df1 = pd.DataFrame(df,columns=columns)
df1 = df1.replace('', np.nan)
gp = df1.groupby('CUE').count()
fileToSend = 'New York Yankees Twins at Yankees-FNG-042318.csv'
emailfrom = ''
emailto = 'xyz@abc.com'
username= ''
password = ''
msg = MIMEMultipart()
msg['Subject'] = 'Enco error testing'
msg['From'] = emailfrom
msg['To'] = emailto
msg.preamble = 'Enco error testing'
if gp[(gp['COMPOSER'] == 0) | (gp['PUBLISHER'] == 0)]:
    # Send the email via our own SMTP server.
    server = smtplib.SMTP('localhost')
    server.starttls()
    server.login(username,password)
    server.sendmail(emailfrom, emailto, msg.as_string())
    server.quit()
Given your DataFrame df:
   Song_Id        SONG TITLE  *USAGE  RUNNING  COMPOSER(s)  COMPOSE  PUBLISHER(s)
0        1    Testing Moment     BGI                          ASCAP         audio
1        2  Rented Dreams-JP     BGI                Andrew  ABRAMUS          Nova
2        2                                            Paul      UBC
3        2                                           Molly      UBC
4        3     Gridiron Rock     BGI                 Brian    ASCAP        Client
5        3                                          Daniel    ASCAP
6        4          Rock Run     BGI               Sharron    ASCAP
7        4                                      Kyle Towns    ASCAP
You should fill the empty strings with np.nan. You can then use groupby + count and apply your logic to the grouped object.
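import numpy as np
# count() ignores NaN, so turn blank cells into NaN first
df = df.replace('', np.nan)
# number of non-null values per column for each song
gp = df.groupby('Song_Id').count()
# keep only the songs with at least one composer and one publisher
gp[(gp['COMPOSER(s)'] > 0) & (gp['PUBLISHER(s)'] > 0)]
#          *USAGE  COMPOSE  COMPOSER(s)  PUBLISHER(s)  RUNNING  SONG TITLE
# Song_Id
# 2             1        3            3             1        0           1
# 3             1        2            2             1        0           1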
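For a version that can be run without the original file, here is a minimal self-contained sketch of the same groupby + count idea. The sample rows are rebuilt by hand from the listing above, so the placement of the blank cells is an assumption, and the names `counts`, `accepted` and `rejected` are illustrative rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand; blank cells are assumed to be empty strings.
df = pd.DataFrame({
    "Song_Id": [1, 2, 2, 2, 3, 3, 4, 4],
    "COMPOSER(s)": ["", "Andrew", "Paul", "Molly",
                    "Brian", "Daniel", "Sharron", "Kyle Towns"],
    "PUBLISHER(s)": ["audio", "Nova", "", "", "Client", "", "", ""],
})

# count() skips NaN, so blanks have to become NaN before grouping.
df = df.replace("", np.nan)
counts = df.groupby("Song_Id")[["COMPOSER(s)", "PUBLISHER(s)"]].count()

# A song passes only if both columns have at least one non-null value.
ok = (counts["COMPOSER(s)"] > 0) & (counts["PUBLISHER(s)"] > 0)
accepted = counts.index[ok].tolist()
rejected = counts.index[~ok].tolist()

print(accepted)  # [2, 3]
print(rejected)  # [1, 4]
```

A non-empty `rejected` list would mean the sheet as a whole should be rejected, which matches the question's examples of SongID 1 (no composer) and SongID 4 (no publisher).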
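You should format your question so that the dataset is included as code.
Thank you for your help. I tried editing my code with your suggestion, but it throws an error. Any suggestions would be helpful.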
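The comment does not say which error was raised, so the following is only a guess at one likely cause: pandas refuses to evaluate a DataFrame directly as a boolean, so a condition such as `if gp[...]:` raises `ValueError`. Below is a minimal sketch of a check that returns a plain `bool` instead. `has_incomplete_songs` is a hypothetical helper name, the tiny `sheet` frame is made up for illustration, and the column names follow the question's cue sheet.

```python
import numpy as np
import pandas as pd


def has_incomplete_songs(df, key="CUE", cols=("COMPOSER", "PUBLISHER")):
    """Return True if any group has no value at all in one of `cols`."""
    counts = df.replace("", np.nan).groupby(key)[list(cols)].count()
    # (counts == 0) is a boolean frame; reduce it to one Python bool.
    return bool((counts == 0).to_numpy().any())


# Made-up data: cue 1 has no composer, cue 2 is complete.
sheet = pd.DataFrame({
    "CUE": [1, 2, 2],
    "COMPOSER": ["", "Andrew", "Paul"],
    "PUBLISHER": ["audio", "Nova", ""],
})

try:
    if sheet[sheet["COMPOSER"] == ""]:  # a DataFrame has no single truth value
        pass
except ValueError as exc:
    print("ValueError:", exc)

if has_incomplete_songs(sheet):
    print("reject sheet")  # the notification email would be sent here
```

Reducing the comparison to a single boolean keeps the `if` statement unambiguous; testing `.empty` on the filtered frame would be an equivalent alternative.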