Python 3.x: URL parsing problem with a pandas DataFrame

Tags: python-3.x, urllib

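I am trying to extract the domain name from the URLs in one column and put it into another column. It works on a single string, but not when I apply it to a DataFrame. How can I apply it to a DataFrame?

What I tried: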

from urllib.parse import urlparse
import numpy as np  # needed for np.nan below
import pandas as pd
id1 = [1,2,3]
ls = ['https://google.com/tensoflow','https://math.com/some/website',np.nan]
df = pd.DataFrame({'id':id1,'url':ls})
df
# urlparse(df['url']) # ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# df['url'].map(urlparse) # AttributeError: 'float' object has no attribute 'decode'
Working on a single string:

string = 'https://google.com/tensoflow'
parsed_uri = urlparse(string)
result = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
result
Desired column:

col3
https://google.com/
https://math.com/
nan

Errors: see the exceptions noted in the comments of the code above.

You can try something like this.

Here I used pandas.Series.apply() to solve the problem.

» Initialization and imports

>>> from urllib.parse import urlparse
>>> import pandas as pd
>>> id1 = [1,2,3]
>>> import numpy as np
>>> ls = ['https://google.com/tensoflow','https://math.com/some/website',np.nan]
>>> ls
['https://google.com/tensoflow', 'https://math.com/some/website', nan]
>>> 
» Check the newly created DataFrame

>>> df = pd.DataFrame({'id':id1,'url':ls})
>>> df
   id                            url
0   1   https://google.com/tensoflow
1   2  https://math.com/some/website
2   3                            NaN
>>> 
>>> df["url"]
0     https://google.com/tensoflow
1    https://math.com/some/website
2                              NaN
Name: url, dtype: object
>>>
» Apply a function to the url column using pandas.Series.apply(func)

» Store the result in a variable (not mandatory, just for simplicity)
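The two steps above might be sketched with a named helper instead of an inline lambda. This is only a sketch: `base_url` is a hypothetical name, and unlike the answer's version it returns a real NaN for missing values rather than the string 'nan'.

```python
from urllib.parse import urlparse

import numpy as np
import pandas as pd


def base_url(url):
    """Return 'scheme://netloc/' for a URL string, or NaN for a missing value."""
    if pd.isna(url):
        return np.nan  # assumption: keep missing values missing, not the string 'nan'
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/"


df = pd.DataFrame({
    "id": [1, 2, 3],
    "url": ["https://google.com/tensoflow", "https://math.com/some/website", np.nan],
})

# Series.apply calls base_url once per element of the url column.
s = df["url"].apply(base_url)
print(s.tolist())
```

Guarding with `pd.isna` first matters because the missing value is a float, which `urlparse` cannot handle.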

» Finally, build a DataFrame from the result

>>> df2 = pd.DataFrame({"col3": s})
>>> df2
                  col3
0  https://google.com/
1    https://math.com/
2                  nan
>>> 
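Since the question asks for the result as another column next to the URLs, one possible variation is to attach the Series to the original frame rather than building a separate one. The sketch below assumes the same sample data and uses `DataFrame.assign`; the variable name `out` is only illustrative.

```python
from urllib.parse import urlparse

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "url": ["https://google.com/tensoflow", "https://math.com/some/website", np.nan],
})

# Same per-element logic as in the answer: missing values become the string 'nan'.
s = df["url"].apply(
    lambda url: "{uri.scheme}://{uri.netloc}/".format(uri=urlparse(url))
    if not pd.isna(url)
    else str(np.nan)
)

# assign returns a new DataFrame with col3 added; df itself is left unchanged.
out = df.assign(col3=s)
print(list(out.columns))
```

Writing `df["col3"] = s` instead would modify `df` in place, which may be preferable when no copy is needed.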
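» To confirm what s and df2 are, check their types (again, not mandatory)

>>> type(s)
<class 'pandas.core.series.Series'>
>>> 
>>> type(df2)
<class 'pandas.core.frame.DataFrame'>
>>> 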

Please post the full error message you received.

@ForceBru I just added the error.
» The apply step from the answer, with its result stored in s

>>> s = df["url"].apply(lambda url: "{uri.scheme}://{uri.netloc}/".format(uri=urlparse(url)) if not pd.isna(url) else str(np.nan))
>>> s
0    https://google.com/
1      https://math.com/
2                    nan
Name: url, dtype: object
>>> 
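As an alternative to checking for NaN inside the function, `Series.map` accepts `na_action="ignore"`, which leaves missing values untouched and never passes them to the function. The sketch below assumes an object-dtype url column like the one in the question; `base_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse

import numpy as np
import pandas as pd


def base_url(url):
    """Return 'scheme://netloc/' for a URL string (no NaN handling needed here)."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/"


df = pd.DataFrame({
    "id": [1, 2, 3],
    "url": ["https://google.com/tensoflow", "https://math.com/some/website", np.nan],
})

# na_action="ignore" propagates NaN without calling base_url on it,
# which avoids the AttributeError from the question's df['url'].map(urlparse).
df["col3"] = df["url"].map(base_url, na_action="ignore")
print(df["col3"].tolist())
```

With this variant the third row of col3 stays a real missing value, so methods such as `isna()` and `dropna()` still recognize it.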