Python: extracting integers from a string column

python pandas for-loop

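I have two dataframes: longdf and shortdf. longdf is the "master" list, and I basically need to match shortdf against longdf by value; where values match, the values in other columns get replaced. Both longdf and shortdf need a lot of data cleaning.

The goal is to arrive at the dataframe 'goal'. I am trying to use a for loop in which I want to 1) extract all the numbers in each cell and 2) strip the blanks/whitespace from the cell. First: why doesn't this for loop work? Second: is there a better way?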

import pandas as pd

a = pd.Series(['EY', 'BAIN', 'KPMG', 'EY'])
b = pd.Series(['   10wow this is terrible data8 ', '10/ USED TO BE ANOTHER NUMBER/ 2', ' OMG 106 OMG ', '    10?7'])
y = pd.Series(['BAIN', 'KPMG', 'EY', 'EY' ])
z = pd.Series([108, 102, 106, 107 ])

goal = pd.DataFrame
shortdf = pd.DataFrame({'consultant': a, 'invoice_number':b})
longdf = shortdf.copy(deep=True)
goal = pd.DataFrame({'consultant': y, 'invoice_number':z})

shortinvoice = shortdf['invoice_number']
longinvoice = longdf['invoice_number']

frames = [shortinvoice, longinvoice]
new_list=[]

for eachitemer in frames:
    eachitemer.str.extract('(\d+)').astype(float) #extracing all numbers in the df cell
    eachitemer.str.strip() #strip the blank/whitespaces in between the numbers
    new_list.append(eachitemer)

new_short_df = new_list[0]
new_long_df = new_list[1]

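If I understand correctly, you want to take a Series of strings that contain integers and remove every non-digit character. That does not need a for loop. Instead, you can solve it with a simple regular expression: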

b.replace(r'\D+', '', regex=True).astype(int)

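The regular expression replaces every run of non-digit characters (denoted by \D) with an empty string, so only the digits remain. .astype(int) then converts the Series to an integer type. You can combine the result into the final dataframe as usual: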

result = pd.DataFrame({
    'consultant': a, 
    'invoice_number': b.replace(r'\D+', '', regex=True).astype(int)
})
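
As for why the original for loop appears to do nothing: the .str methods return new objects instead of modifying the Series in place, so the results of .str.extract and .str.strip are discarded unless they are assigned. Note also that extract with the pattern (\d+) only captures the first run of digits in each cell, which is not the same as removing every non-digit. Below is a minimal sketch of both points, reusing the Series b from the question; the helper name clean_invoice_numbers and the variables original, cleaned and first_run_only are made up for illustration.

```python
import pandas as pd

b = pd.Series([
    '   10wow this is terrible data8 ',
    '10/ USED TO BE ANOTHER NUMBER/ 2',
    ' OMG 106 OMG ',
    '    10?7',
])


def clean_invoice_numbers(series):
    """Hypothetical helper: drop every non-digit character, then cast to int."""
    return series.str.replace(r'\D+', '', regex=True).astype(int)


# The .str methods return new objects; without an assignment the
# original Series is left exactly as it was.
original = b.copy()
b.str.extract(r'(\d+)')
b.str.strip()
assert b.equals(original)

# If a loop over several Series is still wanted, keep the returned values.
frames = [b, b.copy()]
cleaned = [clean_invoice_numbers(s) for s in frames]

# extract() with this pattern keeps only the FIRST run of digits per cell,
# so '    10?7' yields '10' rather than '107'.
first_run_only = b.str.extract(r'(\d+)', expand=False)
```

With this, cleaned[0] holds the integers 108, 102, 106 and 107, while first_run_only holds the strings '10', '10', '106' and '10'.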

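I'm confused: why are your shortdf and longdf exactly identical?

Good question: I started out asking a longer question, then cut it down and never changed the variable names.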