Python: how do I extract 2 coordinate pairs from a string?
I'm new to Python and have run into a small problem I can't solve. I'm trying to extract two coordinate pairs from a string, but I'm stuck because the strings don't share a common separator such as a comma. My strings look like this:
&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&
&BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75
&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375
&BBOX=156328.125,6576748.046875,156357.421875,6576777.34375&?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&
&BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&
&BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875
?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&
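Each string starts with "BBOX=", followed by 4 coordinates: x_min, y_min, x_max and y_max. I use "BBOX=" to locate my coordinates within a longer string. x_min and x_max should have 6 digits before the decimal point, and y_min and y_max should have 7. The values can be floats or integers.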
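I thought about splitting each coordinate into the part before the "." and the part after it, but I really don't know whether that's the way to go.
My regex currently looks like this: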
rexp_bbox = r"(^.+BBOX=(?P<bbox_xmin_before>\d.*?)[.,%&\s](?P<bbox_xmin_after>.*?)[.,%2C&\s](?P<bbox_ymin_before>\d.*?)[.,%&\s](?P<bbox_ymin_after>.*?)[.,%&\s](?P<bbox_xmax_before>\d.*?)[.,%&\s](?P<bbox_xmax_after>.*?)[.,%&\s](?P<bbox_ymax_before>\d.*?)[.,%&\s](?P<bbox_ymax_after>.*?)[.,%&\s])"
How can I construct the regex to extract the two coordinate pairs?
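The rules stated in the question (x values with 6 digits before the decimal point, y values with 7, an optional fractional part, and either %2C or "," between values) can be encoded directly in one regex with named groups. This is a minimal sketch under exactly those assumptions; BBOX_RE and extract_pairs are hypothetical names, not taken from any answer below. It returns the two corner pairs as floats, or None when the string does not fit the rules.

```python
import re

# Assumed rules from the question: x values have 6 digits before the decimal
# point, y values have 7, the fractional part is optional, and values are
# separated by "%2C" or ",".
X = r"\d{6}(?:\.\d+)?"
Y = r"\d{7}(?:\.\d+)?"
SEP = r"(?:%2C|,)"

BBOX_RE = re.compile(
    rf"BBOX=(?P<x_min>{X}){SEP}(?P<y_min>{Y}){SEP}"
    rf"(?P<x_max>{X}){SEP}(?P<y_max>{Y})(?![\d.])"  # y_max must not run on
)


def extract_pairs(s):
    """Return ((x_min, y_min), (x_max, y_max)) as floats, or None."""
    m = BBOX_RE.search(s)
    if m is None:
        return None
    v = {name: float(text) for name, text in m.groupdict().items()}
    return (v["x_min"], v["y_min"]), (v["x_max"], v["y_max"])


print(extract_pairs("&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&"))
# ((151406.25, 6579062.5), (151875.0, 6579531.25))
```

With these rules the 8-field string "?BBOX=156328%2C125%2C..." is rejected, because "125" cannot be a 7-digit y value.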
a = "&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&"
ans = a.split('=')[1].split('&')[0].split('%2C')  # ['151406.25', '6579062.5', '151875', '6579531.25']
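A variation on this split approach, sketched with a hypothetical helper split_bbox: percent-decoding the value with urllib.parse.unquote turns every %2C into a comma, so strings that mix %2C and "," end up with a single separator before splitting. It assumes the BBOX value ends at the next "&" or whitespace.

```python
import re
from urllib.parse import unquote


def split_bbox(s):
    """Return the four BBOX values as floats, or None if they can't be read."""
    # Take the BBOX value up to the next "&" or whitespace
    m = re.search(r"BBOX=([^&\s]+)", s)
    if m is None:
        return None
    # unquote() decodes "%2C" to ",", so mixed separators collapse into one
    parts = unquote(m.group(1)).split(",")
    if len(parts) != 4:
        return None  # e.g. the 8-field string from the question
    return tuple(float(p) for p in parts)


print(split_bbox("&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375"))
# (156328.125, 6576806.640625, 156357.421875, 6576835.9375)
```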
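Splitting may be more useful here than a complex regex, but that also depends on what the full strings look like.
The pattern "(?:.*BBOX=)(\d{6}(?:\.?[\d]*))(?:%2C|,)(\d{7}(?:\.?[\d]*))(?:%2C|,)(\d{6}(?:\.?[\d]*))(?:%2C|,)(\d{7}(?:\.?[\d]*))" extracts the coordinates into 4 groups: group 1 = min_x, group 2 = min_y, group 3 = max_x, group 4 = max_y.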
The following code shows the pattern in action:
import re
orig_coords = [
'&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&',
'&BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75',
'&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375',
'&BBOX=156328.125,6576748.046875,156357.421875,6576777.34375&',
'?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&',
'&BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&',
'&BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875',
'?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&'
]
bbox_start = "(?:.*BBOX=)"
separator = "(?:%2C|,)"
coord_6 = r"(\d{6}(?:\.?[\d]*))"
coord_7 = r"(\d{7}(?:\.?[\d]*))"
regex_str = bbox_start + coord_6 + separator + coord_7 + separator + coord_6 + separator + coord_7
reg = re.compile(regex_str)
# Groups: 1 = x_min, 2 = y_min, 3 = x_max, 4 = y_max
for c in orig_coords:
    r = reg.match(c)
    if r:
        print('Coordinates for {}'.format(c))
        print('x_min: {} x_max: {}'.format(r.group(1), r.group(3)))
        print('y_min: {} y_max: {}'.format(r.group(2), r.group(4)))
    else:
        print('No match for {}'.format(c))
Output:
Coordinates for &BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&
x_min: 151406.25 x_max: 151875
y_min: 6579062.5 y_max: 6579531.25
Coordinates for &BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75
x_min: 156298.828125 x_max: 156328.125
y_min: 6576689.453125 y_max: 6576718.75
Coordinates for &BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375
x_min: 156328.125 x_max: 156357.421875
y_min: 6576806.640625 y_max: 6576835.9375
Coordinates for &BBOX=156328.125,6576748.046875,156357.421875,6576777.34375&
x_min: 156328.125 x_max: 156357.421875
y_min: 6576748.046875 y_max: 6576777.34375
No match for ?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&
Coordinates for &BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&
x_min: 156269.53125 x_max: 156298.828125
y_min: 6576689.453125 y_max: 6576718.75
Coordinates for &BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875
x_min: 156298.828125 x_max: 156328.125
y_min: 6576718.75 y_max: 6576748.046875
Coordinates for ?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&
x_min: 156386.71875 x_max: 156416.015625
y_min: 6576806.640625 y_max: 6576835.9375
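You can run the code yourself. The coordinates that don't match this pattern don't seem to follow the rules you posted in the question.
Something like this also seems to work, though I'm not sure whether there is any runtime difference between it and Jim Wright's answer: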
import re
coords = ["&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&",
"&BBOX=156298.828125%2C6576689.453125%2C156328.125%2C6576718.75",
"&BBOX=156328.125,6576806.640625%2C156357.421875%2C6576835.9375",
"&BBOX=156328.125,6576748.046875,156357.421875,6576777.34375& ?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&",
"&BBOX=156269.53125%2C6576689.453125%2C156298.828125%2C6576718.75&",
"&BBOX=156298.828125%2C6576718.75%2C156328.125%2C6576748.046875",
"?BBOX=156386.71875%2C6576806.640625%2C156416.015625%2C6576835.9375&"]
# Capture everything after "BBOX=" (preceded by "&" or "?") up to the next "&" or the end of the string
r = re.compile(r"[&?]BBOX=(.+?)(?=&|$)")
x_coords = []

def split_coords(coords_string):
    # Split on "%2C" or ","; some strings mix both separators
    bbox = re.split(r"%2C|,", coords_string)
    x_min, x_max = [bbox[0], bbox[2]]
    return (x_min, x_max)

# If a match is found using the regex, split the coords and add the x_min and x_max coords to the x_coords list
for i in coords:
    match = r.match(i)
    if match:
        match = match.group(1)
        x_coords.append(split_coords(match))
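On the runtime question, a quick way to compare is the standard timeit module. The sketch below times a trimmed-down version of each approach on one sample string; the function names are hypothetical, the split variant is simplified, and the numbers depend on the machine and Python version, so no result is claimed here.

```python
import re
import timeit

SAMPLE = "&BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&"

# Approach A: one regex capturing all four values (Jim Wright's pattern)
FULL_RE = re.compile(
    r"(?:.*BBOX=)(\d{6}(?:\.?[\d]*))(?:%2C|,)(\d{7}(?:\.?[\d]*))"
    r"(?:%2C|,)(\d{6}(?:\.?[\d]*))(?:%2C|,)(\d{7}(?:\.?[\d]*))"
)


def x_with_full_regex(s):
    m = FULL_RE.match(s)
    return (m.group(1), m.group(3)) if m else None


# Approach B: short regex to isolate the BBOX value, then split it
PARAM_RE = re.compile(r"&BBOX=(.+?)(?=&|$)")


def x_with_split(s):
    m = PARAM_RE.match(s)
    if m is None:
        return None
    bbox = re.split(r"%2C|,", m.group(1))
    return (bbox[0], bbox[2])


# Both approaches should agree before timing them
assert x_with_full_regex(SAMPLE) == x_with_split(SAMPLE)

for fn in (x_with_full_regex, x_with_split):
    seconds = timeit.timeit(lambda: fn(SAMPLE), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 10,000 calls")
```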
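Your comments helped me look at it from another angle. I simply hadn't noticed that %2C is the common separator between the coordinates. I changed my regex to: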
rexp_bbox = r"(^.+BBOX=(?P\d.?)(%2C)(?P\d.?)(%2C)(?P\d.?)(%2C)(?P\d.?)(%2C)(?P\d.)
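That did the job when I used the regex in my log-file parsing, where I count the occurrences of certain bounding boxes (the coordinates in my question are the corner coordinates of those bounding boxes).
Is that one big string, or is each line a separate string?
Isn't %2C a common separator?
@JimWright: each line is a separate string.
Could you give an example of what the output should look like? Or at least highlight which part of the string is actually a coordinate.
What's wrong with splitting?
For the effort (Y): interesting approach!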
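For the log-file use case mentioned above (counting how often certain bounding boxes occur), a minimal sketch with collections.Counter might look like this. It assumes each log line can contain zero or more BBOX parameters in the %2C- or comma-separated form from the question; count_bboxes and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Value of each BBOX parameter: everything up to the next "&" or whitespace
BBOX_PARAM = re.compile(r"BBOX=([^&\s]+)")


def count_bboxes(lines):
    """Count identical bounding boxes across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # finditer also catches lines that hold more than one BBOX parameter
        for m in BBOX_PARAM.finditer(line):
            fields = re.split(r"%2C|,", m.group(1))
            if len(fields) == 4:  # skip malformed values such as the 8-field one
                counts[tuple(float(f) for f in fields)] += 1
    return counts


# Hypothetical sample log lines
sample_log = [
    "GET /wms?BBOX=151406.25%2C6579062.5%2C151875%2C6579531.25&",
    "GET /wms?BBOX=151406.25,6579062.5,151875,6579531.25",
    "GET /wms?BBOX=156328%2C125%2C6576777%2C34375%2C156357%2C421875%2C6576806%2C640625&",
]
for bbox, n in count_bboxes(sample_log).most_common():
    print(bbox, n)
```

Converting the fields to floats before counting means "151875" and "151875.0" are treated as the same box, whichever separator was used.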