Python: Append key/values returned by a function as new columns to a DataFrame


I have a DataFrame with a list of URLs from which I want to extract a few values. The returned keys/values should then be added to the original DataFrame, with each key as a new column holding the corresponding value.

I assumed this would happen automatically with result_type='expand', but apparently it doesn't. When I try

df5["data"] = df5.apply(lambda x: request_function(x['url']),axis=1, result_type='expand')
I end up with all the results in a single data column:

[{'title': ['Python Notebooks: Connect to Google Search Console API and Extract Data - Adapt'], 'description': []}]
My goal is a DataFrame with the following three columns:

| URL | Title | Description |

Here is my code:

import requests
from requests_html import HTMLSession
import pandas as pd
from urllib import parse

ex_dic = {'url': ['https://www.searchenginejournal.com/reorganizing-xml-sitemaps-python/295539/', 'https://searchengineland.com/check-urls-indexed-google-using-python-259773', 'https://adaptpartners.com/technical-seo/python-notebooks-connect-to-google-search-console-api-and-extract-data/']}

df5 = pd.DataFrame(ex_dic)
session = HTMLSession()

def request_function(url):
    try:
        found_results = []
        r = session.get(url)
        title = r.html.xpath('//title/text()')
        description = r.html.xpath("//meta[@name='description']/@content")
        found_results.append({ 'title': title, 'description': description})
        return found_results


    except requests.RequestException:
        print("Connectivity error")
    except KeyError:
        print("another error")

df5.apply(lambda x: request_function(x['url']),axis=1, result_type='expand')
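For reference, a list-of-dicts column like the one above can also be flattened after the fact. This is a sketch, not the original code: the DataFrame below simulates the shape shown in the question output (each cell a one-element list containing a dict), and pd.json_normalize expands the dict keys into columns.

```python
import pandas as pd

# Simulated result of the apply call above: each 'data' cell holds a
# one-element list containing a dict, matching the output shown.
df5 = pd.DataFrame({
    'url': ['https://example.com/a', 'https://example.com/b'],
    'data': [
        [{'title': ['Title A'], 'description': ['Desc A']}],
        [{'title': ['Title B'], 'description': []}],
    ],
})

# Unwrap the one-element list, then expand the dict keys into columns.
expanded = pd.json_normalize(df5['data'].str[0].tolist())
df7 = df5.drop(columns='data').join(expanded)
print(df7)
```

Note that the title and description cells are still lists (the raw XPath results); the answers below address turning them into plain strings.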

ex_dic should be a list of dicts so that you can update the attributes inside the applied function:

import requests
from requests_html import HTMLSession
import pandas as pd
from urllib import parse

ex_dic = {'url': ['https://www.searchenginejournal.com/reorganizing-xml-sitemaps-python/295539/', 'https://searchengineland.com/check-urls-indexed-google-using-python-259773', 'https://adaptpartners.com/technical-seo/python-notebooks-connect-to-google-search-console-api-and-extract-data/']}

ex_dic['url'] = [{'url': item} for item in ex_dic['url']]

df5 = pd.DataFrame(ex_dic)
session = HTMLSession()

def request_function(url):
    try:
        print(url)
        r = session.get(url['url'])
        title = r.html.xpath('//title/text()')
        description = r.html.xpath("//meta[@name='description']/@content")
        url.update({ 'title': title, 'description': description})
        return url


    except requests.RequestException:
        print("Connectivity error")
    except KeyError:
        print("another error")

df6 = df5.apply(lambda x: request_function(x['url']), axis=1, result_type='expand')
print(df6)

It actually works as you expect if your function returns a plain dict rather than a list of dicts. Additionally, supply a string for each key's value rather than a list; then it behaves as intended. See my example code:

import requests
import pandas as pd
from urllib import parse

ex_dic = {'url': ['https://www.searchenginejournal.com/reorganizing-xml-sitemaps-python/295539/', 'https://searchengineland.com/check-urls-indexed-google-using-python-259773', 'https://adaptpartners.com/technical-seo/python-notebooks-connect-to-google-search-console-api-and-extract-data/']}

df5 = pd.DataFrame(ex_dic)
# print(df5)

def request_function(url):
    return {'title': 'Python Notebooks: Connect to Google Search Console API and Extract Data - Adapt', 
            'description': ''}


df6 = df5.apply(lambda x: request_function(x['url']), axis=1, result_type='expand')
df7 = pd.concat([df5, df6], axis=1)


df7
which gives you this:

You can also adjust the lambda function:

df6 = df5.apply(lambda x: request_function(x['url'])[0], axis=1, result_type='expand')

But you still need to make sure the values are strings, not lists.
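That normalization step could be sketched with a small helper (first_or_empty is an illustrative name, not from the answer). XPath queries return a list even for a single match, so take the first element, or an empty string when nothing matched:

```python
def first_or_empty(values):
    """Return the first element of an XPath result list, or '' if it is empty."""
    return values[0] if values else ''

# XPath results are lists even for a single match:
title = ['Example Page Title']
description = []  # no matching <meta name="description"> tag found

row = {'title': first_or_empty(title), 'description': first_or_empty(description)}
print(row)  # {'title': 'Example Page Title', 'description': ''}
```

Returning such a dict from request_function then expands cleanly into string-valued columns via result_type='expand'.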
