Python: slicing data out of a DataFrame cell after reading a CSV


python string python-3.x pandas


I'm reading data from Twitter Analytics using a CSV file and a DataFrame.

I want to extract the URL from a particular cell.

The output looks like this:

tweet number tweet id               tweet link              tweet text
1            1.0086341313026E+018   "tweet link goes here"  tweet text goes here https://example.com"

How can I split this tweet text to get its URL? I can't slice it with [-1:-12], because many tweets have different numbers of characters.

I believe you want:

print (df['tweet text'].str[-12:-1])
0    example.com
Name: tweet text, dtype: object
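To see what the fixed slice does, here is a minimal sketch assuming each cell ends with a stray double quote, as in the question's sample row. The second row is a hypothetical tweet with a shorter domain, showing why a fixed-width slice only works when every domain is 11 characters long.

```python
import pandas as pd

# Hypothetical sample: row 0 mirrors the question (note the trailing '"'),
# row 1 uses a shorter domain to show where fixed slicing breaks.
df = pd.DataFrame({'tweet text': [
    'tweet text goes here https://example.com"',
    'another tweet https://other.org"',
]})

# Take the last 12 characters, then drop the final one (the stray quote).
sliced = df['tweet text'].str[-12:-1]
print(sliced.tolist())
# ['example.com', '//other.org']
```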

A more general solution is to build a list of all links in each cell and, if necessary, select the first one with str[0]:
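A minimal sketch of that idea, using `Series.str.findall` with a regular expression and then `.str[0]`. The pattern is an assumption (an `http://` or `https://` prefix followed by anything that is not whitespace or a double quote, so the stray quote from the question's data is excluded), and the sample rows are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'tweet text': [
    'tweet text goes here https://example.com"',
    'two links http://a.example.org and https://b.example.net/page',
    'no link in this tweet',
]})

# Assumed URL pattern: scheme, then characters up to whitespace or a quote.
links = df['tweet text'].str.findall(r'https?://[^\s"]+')

# .str[0] picks the first link; rows with an empty list become NaN.
first_link = links.str[0]
print(links.tolist())
print(first_link.tolist())
```

Because `findall` returns every match, the same `links` column also covers tweets that contain more than one URL.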

If the domain length varies instead of always being 11 characters long, here is an alternative:

In [2]: df['tweet text'].str.split('//').str[-1]

Out[2]:
1    example.com
Name: tweet text, dtype: object
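A minimal sketch of the `split('//')` approach on hypothetical tweets with domains of different lengths. It assumes the URL is the last thing in the text: the last row shows that any words after the URL are kept as well.

```python
import pandas as pd

s = pd.Series([
    'tweet text goes here https://example.com',
    'short https://t.co',
    'longer one https://www.a-much-longer-domain.example',
    'see https://example.com for more',
])

# Split on '//' and keep everything after the last occurrence.
domains = s.str.split('//').str[-1]
print(domains.tolist())
# ['example.com', 't.co', 'www.a-much-longer-domain.example', 'example.com for more']
```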

Here is one way that checks each word against a list of test strings to find the URL:

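import pandas as pd

s = pd.Series(['tweet text goes here https://example.com',
               'some http://other.com example',
               'www.thirdexample.com is here'])

# Substrings that mark a word as a URL
test_strings = ['http', 'www']

def url_finder(x):
    # Return the first word containing any test string, or None if there is none
    return next((i for i in x.split() if any(t in i for t in test_strings)), None)

res = s.apply(url_finder)

print(res)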

0     https://example.com
1        http://other.com
2    www.thirdexample.com
dtype: object
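The same lookup can also be done without `apply`. This is a sketch using `Series.str.extract` with a regex that captures the first whitespace-delimited token containing `http` or `www`; the pattern is an assumption meant to mirror `url_finder`, and the fourth sample row is hypothetical. Rows without a match come back as NaN instead of raising.

```python
import pandas as pd

s = pd.Series(['tweet text goes here https://example.com',
               'some http://other.com example',
               'www.thirdexample.com is here',
               'no link here'])

# First run of non-whitespace characters that contains "http" or "www".
res = s.str.extract(r'(\S*(?:http|www)\S*)', expand=False)
print(res.tolist())
```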

This is exactly what I needed. Thanks.

Even better: df['tweet text'].str.split('/').str[-1]

Thanks, I figured there had to be a better way than apply but couldn't find it; will edit.
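A quick sketch of the single-slash variant from the comment, on hypothetical tweets. It returns the domain when the URL has no path, but for a URL with a path it returns only the last path segment, so it assumes bare-domain links like the ones in the question.

```python
import pandas as pd

s = pd.Series(['tweet text goes here https://example.com',
               'see https://example.com/page'])

# Split on single slashes and keep the last piece.
last_piece = s.str.split('/').str[-1]
print(last_piece.tolist())
# ['example.com', 'page']
```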