Python: select DataFrame rows containing strings that start with an integer

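I have created a DataFrame containing a single string column. I want to copy some of its rows into a second DataFrame: only the rows where the characters before the first space are an integer greater than or equal to 300 and the characters after the first space are "Broadway". In the following example, only the first row should be copied.

I would prefer to solve this without simply writing a boolean expression in pure Python. Assume I want to convince someone of the benefits of using pandas over plain Python without pandas. Thank you.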


import pandas as pd

d = {
    "address": [
        "300 Broadway",      #Ok.
        "300 Wall Street",   #Sorry, not "Broadway".
        "100-10 Broadway",   #Sorry, "100-10" is not an integer.
        "299 Broadway",      #Sorry, 299 is less than 300.
        "Broadway"           #Sorry, no space at all.
    ]
}

df = pd.DataFrame(d)
df2 = df[what goes here?]   #Broadway addresses greater than or equal to 300
print(df2)

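I think it is best to clean up the data first, for example:

# prepare data
df[['number', 'street']] = df.address.str.split(r'\s+', n=1, expand=True)
df['number'] = pd.to_numeric(df.number, errors='coerce')
The first line splits the address into a number and a street; the second converts the number to an actual numeric value. Note that values that are not integers are converted to NaN. Then you can do: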

# create mask to filter
mask = df.number.ge(300) & df.street.str.contains("Broadway")
print(df[mask])
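This creates a boolean mask that is True where the number is greater than or equal to 300 and the street contains "Broadway". Putting it all together, you have: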

# prepare data
df[['number', 'street']] = df.address.str.split(r'\s+', n=1, expand=True)
df['number'] = pd.to_numeric(df.number, errors='coerce')

# create mask to filter
mask = df.number.ge(300) & df.street.str.contains("Broadway")
print(df[mask])
Output

        address  number    street
0  300 Broadway   300.0  Broadway

Note that this solution assumes your data follows the pattern:
Number Street
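
The split-then-filter approach above can be sketched as a self-contained script. This is only an illustration under stated assumptions: the intermediate names parts, number and street are hypothetical, the pieces are kept in separate Series rather than added as columns, and str.contains is swapped for an exact eq("Broadway") comparison, since the question asks for the text after the first space to be "Broadway".

```python
import pandas as pd

df = pd.DataFrame({
    "address": [
        "300 Broadway",
        "300 Wall Street",
        "100-10 Broadway",
        "299 Broadway",
        "Broadway",
    ]
})

# Split on the first run of whitespace only; a row with no space
# ("Broadway") gets a missing value in the second column.
parts = df["address"].str.split(r"\s+", n=1, expand=True)

# Non-integer prefixes such as "100-10" or "Broadway" become NaN.
number = pd.to_numeric(parts[0], errors="coerce")
street = parts[1]

# Comparisons against NaN or a missing street evaluate to False,
# so those rows drop out of the mask without special handling.
mask = number.ge(300) & street.eq("Broadway")
df2 = df[mask]
print(df2)
```

Keeping the helper Series separate leaves df unchanged, so df2 has only the original address column.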


You can use str.contains, str.extract, and ge:

# rows which contain broadway
m1 = df['address'].str.contains('(?i)broadway')
# extract the first run of digits from the string and check whether it is greater than or equal to 300
m2 = df['address'].str.extract(r'(\d+)')[0].astype(float).ge(300)

# get all the rows which have True for both conditions
df[m1&m2]
Output

        address
0  300 Broadway
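
One caveat with the unanchored patterns above: (\d+) takes the first run of digits anywhere in the string, and str.contains matches "broadway" at any position. A stricter variant, sketched here as an assumption rather than as part of the original answer, anchors a single pattern to the whole string. The extra row "300-10 Broadway" is hypothetical and added only to show the difference; the names loose and strict are also illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "address": [
        "300 Broadway",
        "300 Wall Street",
        "100-10 Broadway",
        "299 Broadway",
        "Broadway",
        "300-10 Broadway",  # hypothetical extra row, not in the question
    ]
})

# Unanchored version from the answer: takes the first run of digits
# and looks for "broadway" anywhere in the string.
loose = (
    df["address"].str.contains("(?i)broadway")
    & df["address"].str.extract(r"(\d+)")[0].astype(float).ge(300)
)

# Anchored version: the whole string must be "<digits> Broadway".
# Rows that do not match yield NaN, which compares as False in ge().
number = df["address"].str.extract(r"^(\d+) Broadway$", expand=False)
strict = pd.to_numeric(number, errors="coerce").ge(300)

print(df[loose])
print(df[strict])
```

With this data the loose mask also keeps the hypothetical "300-10 Broadway" row, while the anchored mask keeps only "300 Broadway".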
