Python: Iterating over a DataFrame and writing NaN back to MySQL

python mysql pandas

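I am trying to write regression results back to MySQL, but I am having trouble iterating over the fitted values and getting the NaNs written as NULL. Originally, I iterated like this: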

for i in dataframe:
    cur = cnx.cursor()
    query = ("UPDATE Regression_Data.Input SET FITTEDVALUES="+(dataframe['yhat'].__str__())+" where timecount="+(dataframe['timecount'].__str__())+";")
    cur.execute(query)
    cnx.commit()
    cur.close()

...which gave me:

 "mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'NaN'

So I have been trying to filter out the NaNs by only asking Python to commit when yhat is not NaN:

for i in dataframe:
    if cleandf['yhat']>(-1000):
        cur = cnx.cursor()
        query = ("UPDATE Regression_Data.Input SET FITTEDVALUES="+(dataframe['yhat'].__str__())+" where timecount="+(dataframe['timecount'].__str__())+";")
        cur.execute(query)
        cnx.commit()
        cur.close()

But I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So I tried to get around this by using the following in the code above:

if cleandf['yhat'][i]>(-1000):

But then I get:

ValueError: Can only tuple-index with a MultiIndex

Then I tried adding iterrows() to both, as in:

for i in dataframe.iterrows():
    if cleandf['yhat'][i]>(-1000):

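But I get the same problem as above.

I'm not sure what I'm doing wrong here, but I assume it has to do with how I iterate over the pandas DataFrame. Even if I got the iteration right, I would still want to write NULL to SQL wherever a NaN appears.

So, how do you think I should do this?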

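I don't have a complete answer, but perhaps I have some tips that might help. I believe you are thinking of your `dataframe` as an object similar to a SQL record set.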

for i in dataframe

This iterates over the column-name strings in `dataframe`. `i` will take on column names, not rows.
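As a small sketch of this behaviour, the following uses a made-up three-row frame (the values are assumptions for illustration, not the asker's data) and collects what a plain `for` loop over the frame actually yields:

```python
import pandas as pd

# Hypothetical data standing in for the regression output.
dataframe = pd.DataFrame(
    {"timecount": [1, 2, 3], "yhat": [0.5, float("nan"), 1.5]}
)

# Iterating over a DataFrame directly yields its column labels, not its rows.
labels = [i for i in dataframe]
print(labels)
```

So the loop body in the question runs once per column, not once per row.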

dataframe['yhat']

This returns the entire column (a `pandas.Series`, which is backed by a `numpy.ndarray`), not a single value. Therefore:

dataframe['yhat'].__str__()

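will give a string representation of the entire column, formatted for humans to read. It is certainly not a single value converted to a string that you can put in your query.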

if cleandf['yhat']>(-1000)

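This gives an error because, again, `cleandf['yhat']` is an entire array of values, not just a single value. Think of it as a whole column, not the value from a single row.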
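To make the failure concrete, here is a minimal sketch on an assumed toy frame. The comparison produces one boolean per row, and `if` refuses to reduce that to a single True or False:

```python
import pandas as pd

# Hypothetical data; only the shape matters here.
cleandf = pd.DataFrame(
    {"timecount": [1, 2, 3], "yhat": [0.5, float("nan"), 1.5]}
)

# Element-wise comparison: the result is a boolean Series, one entry per row.
# Note that NaN compares as False.
mask = cleandf["yhat"] > -1000
print(mask.tolist())

error_text = None
try:
    if mask:  # a multi-element Series has no single truth value
        pass
except ValueError as exc:
    error_text = str(exc)
    print(error_text)
```

The mask itself is still useful: it can select rows, as in `cleandf[mask]`, instead of being tested with `if`.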

if cleandf['yhat'][i]>(-1000):

This is getting closer, but you really want `i` to be an integer here, not another column name.

for i in dataframe.iterrows():
    if cleandf['yhat'][i]>(-1000):

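Using `iterrows` seems like the right idea for you. However, `i` then takes on an (index, row) tuple for each row, not an integer that can be used to index into a column (`cleandf['yhat']` is a full column). Indexing a column with a tuple is what leads to the tuple-index error.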
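A short sketch of what `iterrows` hands back, again on an assumed toy frame. Each item is a two-element tuple that should be unpacked before use:

```python
import pandas as pd

# Hypothetical data standing in for the regression output.
dataframe = pd.DataFrame(
    {"timecount": [1, 2, 3], "yhat": [0.5, float("nan"), 1.5]}
)

# Take just the first item the iterator produces.
first = next(dataframe.iterrows())

# Each item is (index label, row as a Series keyed by column name).
row_index, row_values = first
print(type(first).__name__, row_index, row_values["yhat"])
```

After unpacking, `row_values['yhat']` is a single scalar, which is what the comparison and the query need.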

Also, note that pandas has better ways to check for missing values than relying on a huge negative number. Try something like this:

import pandas

# Boolean mask that is True where yhat is present (not NaN).
non_missing_index = pandas.notnull(dataframe['yhat'])
cleandf = dataframe[non_missing_index]
for row in cleandf.iterrows():
    # iterrows() yields (index label, row as a Series) pairs.
    row_index, row_values = row
    query = ("UPDATE Regression_Data.Input SET FITTEDVALUES="+str(row_values['yhat'])+" where timecount="+str(row_values['timecount'])+";")
    execute_my_query(query)

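I expect you can implement `execute_my_query` better than I can. However, this solution is not quite what you want. You really want to iterate over all rows and issue two kinds of update. Try this: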

for row in dataframe.iterrows():
    row_index, row_values = row
    if pandas.isnull(row_values['yhat']):
        # NaN is not valid SQL, so write the NULL keyword for missing values.
        query = ("UPDATE Regression_Data.Input SET FITTEDVALUES=NULL where timecount="+str(row_values['timecount'])+";")
    else:
        query = ("UPDATE Regression_Data.Input SET FITTEDVALUES="+str(row_values['yhat'])+" where timecount="+str(row_values['timecount'])+";")
    execute_my_query(query)
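An alternative to building the SQL text by hand is to pass the values as query parameters and let the driver bind Python `None` as SQL NULL. The sketch below is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for MySQL so that it runs without a server, the helper `nan_to_none` is a hypothetical name, and the data are made up. With `mysql.connector` the placeholder is `%s` rather than `?`.

```python
import math
import sqlite3


def nan_to_none(value):
    """Return None for a float NaN so the driver binds SQL NULL."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# In-memory SQLite stands in for MySQL; column names mirror the question.
cnx = sqlite3.connect(":memory:")
cnx.execute(
    "CREATE TABLE Input (timecount INTEGER PRIMARY KEY, FITTEDVALUES REAL)"
)
cnx.executemany(
    "INSERT INTO Input (timecount, FITTEDVALUES) VALUES (?, 0.0)",
    [(1,), (2,), (3,)],
)

# Hypothetical (yhat, timecount) pairs, as they might come from the rows.
fitted = [(0.5, 1), (float("nan"), 2), (1.5, 3)]
params = [(nan_to_none(yhat), timecount) for yhat, timecount in fitted]

# One parameterized statement covers both the NULL and non-NULL cases.
cnx.executemany(
    "UPDATE Input SET FITTEDVALUES = ? WHERE timecount = ?", params
)
cnx.commit()

rows = cnx.execute(
    "SELECT timecount, FITTEDVALUES FROM Input ORDER BY timecount"
).fetchall()
print(rows)
```

Binding parameters this way means no NaN text ever reaches the SQL parser, and a single statement replaces the two string-built branches.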

Hope that helps.

Have you tried using `write_frame` and `read_frame`?

Awesome. Very helpful. I'll let you know if I still have problems.