Python: split a string after every second occurrence of an unknown element
I have a string that represents a list of polygon coordinates. In this list, every polygon has the same start and end coordinate. I need each polygon in a separate string (or list).
17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094
From this simple example I need two elements:
The first is 17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875
The second is 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094
The string may contain more polygons, and each one needs to be separated.
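I can only use the standard Python library for this.

How about splitting the long string at each ',' and putting the pieces into an array? Then run a for loop and do the following:
// array holds the comma-separated pieces; arrPolygons collects the results
var intStart = 0;
for (var i = 1; i < array.length; i++) {
    if (array[intStart] == array[i]) {
        var string = "";
        for (var j = intStart; j <= i; j++) {
            string += (j > intStart ? "," : "") + array[j];
        }
        arrPolygons.push(string);
        intStart = i + 1;
        i = intStart;  // the loop's i++ then moves to the point after the new start
    }
}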
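Since the question asks for Python, the split-and-scan idea above could be sketched roughly as follows. This is only one possible reading of that approach: the function name split_closed_rings is made up for illustration, and it assumes every point is a comma-separated piece whose surrounding whitespace can be ignored.

```python
def split_closed_rings(text):
    """Split a comma-separated coordinate string into closed polygons."""
    points = [p.strip() for p in text.split(",")]
    polygons = []
    start = 0
    while start < len(points):
        try:
            # Look for the next point equal to the polygon's first point.
            end = points.index(points[start], start + 1)
        except ValueError:
            break  # the remaining points never close, so stop here
        polygons.append(",".join(points[start:end + 1]))
        start = end + 1
    return polygons


sample = "1 1,2 2,3 3,1 1, 5 5,6 6,5 5"
print(split_closed_rings(sample))  # ['1 1,2 2,3 3,1 1', '5 5,6 6,5 5']
```

Using list.index with a start offset keeps the scan in the standard library and avoids tracking the loop counter by hand.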

Here is a rather ugly but working solution that simply puts the obvious approach into code:
# Note that your string has inconsistent separators -- sometimes ',', sometimes ', '.
# I'm going to separate on `,` and not worry about it -- you need to work out
# what the correct separator is.
s = '17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094'
coordinates = s.split(',')
polygon = []
polygons = []
new = True
for coordinate in coordinates:
    polygon.append(coordinate)
    if new:
        start = coordinate
        new = False
    elif coordinate == start:
        polygons.append(polygon)
        polygon = []
        new = True
result = [",".join(polygon) for polygon in polygons]
print(result)
Out:
['17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875', ' 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094']
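The code comment above leaves the choice of separator open. One possible way to sidestep the ',' versus ', ' inconsistency, sketched here with the standard re module, is to split on a comma together with any surrounding whitespace. The helper name split_points is hypothetical, and the sketch assumes whitespace next to a comma never carries meaning.

```python
import re


def split_points(text):
    # Split on a comma plus any whitespace around it, so "a,b" and "a, b"
    # both yield clean points without leading or trailing spaces.
    return re.split(r"\s*,\s*", text.strip())


print(split_points("1 1,2 2, 3 3 ,1 1"))  # ['1 1', '2 2', '3 3', '1 1']
```

With points cleaned this way, the equality test against the start coordinate no longer depends on both occurrences having the same stray space.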
Output:
[['17.17165756225586 -28.102264404296875', '17.184370040893555 -28.200496673583984', '17.1986083984375 -28.223613739013672', '17.17165756225586 -28.102264404296875'], [' 28.865726470947266 -28.761619567871094', '28.80694007873535 -28.75750160217285', '28.792499542236328 -28.706947326660156', ' 28.865726470947266 -28.761619567871094']]

Input data:
lst = [
    '17.17165756225586 -28.102264404296875',
    '17.184370040893555 -28.200496673583984',
    # ...
    '17.1986083984375 -28.223613739013672',
    '17.17165756225586 -28.102264404296875',
    '28.865726470947266 -28.761619567871094',
    # ...
    '28.80694007873535 -28.75750160217285',
    '28.792499542236328 -28.706947326660156',
    '28.865726470947266 -28.761619567871094',
]
# keep only the first occurrence of each coordinate pair
lst1 = []
for cord in lst:
    if cord not in lst1:
        lst1.append(cord)
print(lst1)
Output:
[
    '17.17165756225586 -28.102264404296875',
    '17.184370040893555 -28.200496673583984',
    '17.1986083984375 -28.223613739013672',
    '28.865726470947266 -28.761619567871094',
    '28.80694007873535 -28.75750160217285',
    '28.792499542236328 -28.706947326660156',
]
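
Since your input is already a string (and your expected result, too?), you can try this super lazy solution using a regex with a backreference: (([^,]+).*\2). Here, [^,]+ is the first pair of coordinates, .* the other pairs, and \2 the first pair again.
>>> import re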
>>> s = '17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094'
>>> re.findall(r"(([^,]+).*\2)", s)
[('17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875',
'17.17165756225586 -28.102264404296875'),
(' 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094',
' 28.865726470947266 -28.761619567871094')]
Or use finditer and take the group to get a list of strings directly:
>>> [m.group() for m in re.finditer(r"(([^,]+).*\2)", s)]
['17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875',
' 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094']
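With some post-processing you can turn these into actual lists of number pairs: split each matched string on ',' and each pair on whitespace (with the result of findall, take element [0] of each tuple first; with finditer, the [0] is not needed).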
This may not be the fastest solution for longer strings, but I have not timed it.
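As a hedged variation on the regex idea, the sketch below swaps in a stricter pattern than the one in the answer: it normalises the separators first, makes the middle part non-greedy, and requires the repeated start point to be a whole comma-delimited item, so a start such as "1 1" cannot match inside "1 10". The names RING, find_rings and to_pairs are invented for this example, and the float conversion assumes each point is two whitespace-separated numbers.

```python
import re

# A point at the start of the text or right after a comma, then as few
# further points as possible, then the same point again as a whole item.
RING = re.compile(r"(?:^|(?<=,))([^,]+)(?:,[^,]+)*?,\1(?=,|$)")


def find_rings(text):
    # Normalise "," and ", " to a bare comma before matching.
    clean = re.sub(r"\s*,\s*", ",", text.strip())
    return [m.group() for m in RING.finditer(clean)]


def to_pairs(ring):
    # "x y,x y" -> [(x, y), (x, y)] with float coordinates
    return [tuple(map(float, point.split())) for point in ring.split(",")]


rings = find_rings("1 1,2 2,3 3,1 1, 5 5,6 6,5 5")
print(rings)                          # ['1 1,2 2,3 3,1 1', '5 5,6 6,5 5']
print([to_pairs(r) for r in rings])
```

The non-greedy middle part closes each polygon at the first repeat of its start point, which matters if the same start point shows up again in a later polygon.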

I really like @newbie's concise solution. Here is a more verbose/readable one:
s = '17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094'
vertices = [c.strip() for c in s.split(",")]  # split and clean vertex data
polygons = []
current_polygon = None
for vertex in vertices:
    if current_polygon is None:  # start a new polygon
        current_polygon = [vertex]
    elif current_polygon[0] == vertex:  # conclude the current polygon
        current_polygon.append(vertex)
        polygons.append(current_polygon)
        current_polygon = None
    else:  # continue the current polygon
        current_polygon.append(vertex)
for polygon in polygons:  # print polygons
    print(",".join(polygon))

A recursive approach:
def split_polygons(s):
    if s == '':  # base case
        return []
    start, rest = s.split(',', 1)
    # head: points between the two occurrences of start; tail: remaining polygons
    head, tail = map(lambda x: x.strip(', '), rest.split(start, 1))
    poly = start + ',' + head + ',' + start  # reconstruct the first polygon
    return [poly] + split_polygons(tail)
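A recursive split uses one stack frame per polygon, so an input with more polygons than Python's recursion limit (1000 by default in CPython) would raise RecursionError. A loop-based sketch of the same split-on-start idea might look like this. The name split_polygons_iter is hypothetical, and like the recursive version it matches the start point as a plain substring, so it assumes no other point contains the start point's text.

```python
def split_polygons_iter(s):
    polygons = []
    s = s.strip(", ")
    while s:
        start, sep, rest = s.partition(",")
        if not sep:
            break  # a single leftover point cannot form a polygon
        head, found, tail = rest.partition(start)
        if not found:
            break  # the start point never recurs, so stop
        head = head.strip(", ")
        middle = head + "," if head else ""
        polygons.append(start + "," + middle + start)
        s = tail.strip(", ")
    return polygons


print(split_polygons_iter("1 1,2 2,3 3,1 1, 5 5,6 6,5 5"))
# ['1 1,2 2,3 3,1 1', '5 5,6 6,5 5']
```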

Here is another approach. It works for a string of any length, as long as it follows the input format you provided:
strng = "17.17165756225586,-28.102264404296875,17.184370040893555,-28.200496673583984,17.1986083984375,-28.223613739013672,17.17165756225586,-28.102264404296875,28.865726470947266,-28.761619567871094,28.80694007873535,-28.75750160217285,28.792499542236328,-28.706947326660156,28.865726470947266,-28.761619567871094"
# convert to a list of (x, y) tuples; list() is needed on Python 3, where zip is lazy
l_tuple = list(zip(*[iter(strng.split(','))]*2))
# get the index lists of points that occur more than once
l_index = []
for Tuple in l_tuple:
    x = [i for i, t in enumerate(l_tuple) if t == Tuple]
    if len(x) > 1:
        l_index.append(x)
# slice out each polygon (sorted so the polygons come out in input order)
New_list = []
for IND in sorted(set(map(tuple, l_index))):
    print(l_tuple[IND[0]:IND[1]+1])
    New_list.append(l_tuple[IND[0]:IND[1]+1])
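The zip(*[iter(...)]*2) expression above is a compact idiom for grouping a flat sequence into pairs. A small sketch with made-up sample values shows what it does: both arguments to zip are the same iterator, so each tuple consumes two consecutive items.

```python
flat = "1,1,2,2,3,3,1,1".split(",")

it = iter(flat)
# zip pulls from the same iterator twice per tuple, so items pair up in order
pairs = list(zip(it, it))
print(pairs)  # [('1', '1'), ('2', '2'), ('3', '3'), ('1', '1')]

# The one-line form used in the answer builds the same pairs.
same = list(zip(*[iter(flat)] * 2))
print(same == pairs)  # True
```

On Python 3, zip returns a lazy iterator, so wrapping it in list() is what makes indexing and repeated iteration possible.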

Comments:
The bold parts are the identical points (start/end) of each polygon.
Oh, could someone please format it? I'm in a hurry.
I think the formatting is fine once the meaning of the bold is explained. Where is the difficulty? You read the first pair and scan until it appears again?
Yes, every pair is separated by a comma.
Why are you posting an answer in another language? If anything, it only confuses the issue.
I don't think we can escape "ugly" on this one :)
This completely misses the mark. You are not creating coordinate pairs or splitting the list; you are just filtering it in a way that makes no sense.
Nice, I came up with r'([\-\d\.]+).*… , 15 minutes too late. Is it possible to get a list of strings directly instead of a list of tuples of strings?
It seems to work; I still have to test it on strings with more polygons.
@EricDuminil I tried non-capturing groups, but with findall this seems to be the only way. You could also use [m.group() for m in re.finditer(...)], though, just to get a list of strings.