Using python-docx to iterate over values in a DataFrame and print them to Word

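I'm working with a pandas DataFrame that contains the titles, sources, and links of various news articles pulled from the Google News API. I then tag each row with the keyword I used to find the article. I'm trying to iterate over the 'keyword' column to print the data neatly, and then export the result to Word using python-docx.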

To pull the GoogleNews data, I used a for loop over a list of keywords. It looks like this:

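import pandas as pd
from GoogleNews import GoogleNews

df = pd.DataFrame()  # accumulates the articles found for every keyword

for i in keywords:  # keywords: the list of search terms, defined elsewhere
    googlenews = GoogleNews()
    googlenews.get_news(i)
    googlenews.set_lang('en')
    googlenews.set_period('1d')
    result = googlenews.result()
    df_ivar = pd.DataFrame(result)
    df_ivar = df_ivar[df_ivar['date'].notna()]
    df_ivar = df_ivar[df_ivar['date'].str.contains('hours ago')]  # to only pull articles from within the last 24 hours
    df_ivar = df_ivar[['site', 'title', 'desc', 'link']]
    df_ivar['keyword'] = i  # tag each row with the keyword that found it
    df = pd.concat([df_ivar, df], ignore_index=True)  # DataFrame.append was removed in pandas 2.0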

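So far I've found a way to print the data correctly, but I can't find a way to show each keyword only once and then print all of the article titles, descriptions, and links under the corresponding keyword.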

My data currently looks like this:

article 1    link 1    description 1    keyword 1
article 2    link 2    description 2    keyword 1
article 3    link 3    description 3    keyword 2
article 4    link 4    description 4    keyword 3

When exporting, I'd like the python-docx document to show the data grouped by keyword, for example:

keyword 1
article 1
article 2

keyword 2
article 3

keyword 3
article 4

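My python-docx script runs fine, but in the generated document the keyword appears before every article title, whereas I want each keyword to appear only once with its related articles listed beneath it. Currently, my for loop looks like this: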

from docx import Document

document = Document()

for i in df.index:
    document.add_heading(df['keyword'][i], level=1)  # runs for every row, so the keyword repeats
    document.add_paragraph().add_run(df['title'][i]).underline = True
    document.add_paragraph(df['desc'][i], style='List Bullet')
    document.add_paragraph(df['link'][i], style='List Bullet')
    document.add_paragraph('Source: ' + df['site'][i], style='List Bullet')

Any help or guidance would be greatly appreciated! Thanks in advance.

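You can use groupby with keyword as the parameter. Iterating over the result yields the name of each group (in this case, the keyword) together with the DataFrame for that keyword. You can then pass name to the add_heading function and reuse the rest of the logic you've already built, but iterate over the group variable (for i in g.index).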
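To make the shape of those (name, group) pairs concrete, here is a minimal pandas-only sketch. The sample rows are hypothetical stand-ins for the question's data, and the variable names groups and labels are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical sample rows shaped like the data in the question
df = pd.DataFrame({
    "title": ["article 1", "article 2", "article 3", "article 4"],
    "keyword": ["keyword 1", "keyword 1", "keyword 2", "keyword 3"],
})

groups = {}  # keyword -> titles of the articles in that group
labels = {}  # keyword -> original row labels kept by the sub-DataFrame

# Each iteration yields the group name and the sub-DataFrame for that group
for name, g in df.groupby("keyword"):
    groups[name] = g["title"].tolist()
    labels[name] = list(g.index)

print(groups)
print(labels)
```

Because each sub-DataFrame keeps the row labels of the original frame, looking values up with df['title'][i] for i in g.index still works. Note that groupby sorts the group keys by default; passing sort=False should keep the keywords in order of first appearance instead.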

for name, g in df.groupby('keyword'):
    document.add_heading(name, level=1)  # one heading per keyword
    for i in g.index:
        document.add_paragraph().add_run(df['title'][i]).underline = True
        document.add_paragraph(df['desc'][i], style='List Bullet')
        document.add_paragraph(df['link'][i], style='List Bullet')
        document.add_paragraph('Source: ' + df['site'][i], style='List Bullet')

This is perfect, exactly what I was looking for! Thank you so much.

Glad I could help.