Python: split a string after every second occurrence of an unknown element


I have a string holding a list of coordinates that represent polygons. In this list, each polygon begins and ends with the same coordinate. I need each polygon in a separate string (or list).

"17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094"

From this simple example I need two elements:

First: 17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875
Second: 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156,28.865726470947266 -28.761619567871094

There may be more polygons in the string, and each one needs to be separated.

I can only use the standard Python library for this.

How about splitting the long string on every "," and putting the pieces into an array? Then run a for loop and do the following:

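// JavaScript-style pseudocode; `array` holds the pieces from the split
var arrPolygons = [];
var intStart = 0;
for (var i = 1; i < array.length; i++) {
    if (array[intStart] == array[i]) {
        // the start coordinate came round again: copy intStart..i inclusive
        var string = "";
        for (var j = intStart; j <= i; j++) {
            string += (j > intStart ? "," : "") + array[j];
        }
        arrPolygons.push(string);
        intStart = i + 1;
        i = intStart;  // skip comparing the next start with itself
    }
}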
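
Since the question asks for Python, the same index-based idea could be sketched roughly as follows. This is only an illustration: `split_by_index` is a made-up name, the short coordinates are placeholders rather than the real data, and it assumes every polygon in the string is properly closed.

```python
def split_by_index(s):
    """Split a comma-separated coordinate string into closed polygons."""
    # Assumption: stray spaces around the commas carry no meaning.
    array = [part.strip() for part in s.split(",")]
    polygons = []
    start = 0  # index of the first coordinate of the current polygon
    for i in range(len(array)):
        # i > start avoids comparing the start coordinate with itself
        if i > start and array[i] == array[start]:
            polygons.append(",".join(array[start:i + 1]))
            start = i + 1  # the next polygon begins right after this one
    return polygons


print(split_by_index("1 1,2 2,3 3,1 1, 5 5,6 6,5 5"))
# ['1 1,2 2,3 3,1 1', '5 5,6 6,5 5']
```

Slicing with `array[start:i + 1]` keeps the closing coordinate, which a `j < i` loop would leave out.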

Here is a fairly ugly but working solution that simply turns the obvious approach into code:
# Note that your string has inconsistent separators -- sometimes ',', sometimes ', '.
# I'm going to separate on `,` and not worry about it -- you need to work out
# what the correct separator is.
s = '17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094'

coordinates = s.split(',')

polygon = []
polygons = []

new = True

for coordinate in coordinates:
    polygon.append(coordinate)

    if new:
        start = coordinate
        new = False

    elif coordinate == start:
        polygons.append(polygon)
        polygon = []
        new = True

result = [",".join(polygon) for polygon in polygons]
print(result)

Out:
['17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875', ' 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094']


Printing polygons itself, before the join, gives a list of lists:
[['17.17165756225586 -28.102264404296875', '17.184370040893555 -28.200496673583984', '17.1986083984375 -28.223613739013672', '17.17165756225586 -28.102264404296875'], [' 28.865726470947266 -28.761619567871094', '28.80694007873535 -28.75750160217285', '28.792499542236328 -28.706947326660156', ' 28.865726470947266 -28.761619567871094']]
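
The leading spaces in the second polygon come from the inconsistent separators mentioned in the code comments. If the whitespace around the commas turns out to be meaningless (an assumption you would have to confirm for your data), one possible way to normalise it with the standard library is `re.split`. The helper name `split_coordinates` is hypothetical.

```python
import re


def split_coordinates(s):
    """Split on commas, swallowing any whitespace around them."""
    # \s*,\s* matches a comma together with the spaces on either side;
    # empty pieces (e.g. from a trailing comma) are dropped.
    return [c for c in re.split(r"\s*,\s*", s.strip()) if c]


print(split_coordinates("1 1,2 2, 3 3 ,1 1"))
# ['1 1', '2 2', '3 3', '1 1']
```

With the pieces cleaned up like this, the start/end comparison no longer depends on both occurrences happening to carry the same leading space.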


Input data:
lst = [
    '17.17165756225586 -28.102264404296875',
    '17.184370040893555 -28.200496673583984',
    ...
    '17.1986083984375 -28.223613739013672',
    '17.17165756225586 -28.102264404296875',
    '28.865726470947266 -28.761619567871094',
    ...
    '28.80694007873535 -28.75750160217285',
    '28.792499542236328 -28.706947326660156',
    '28.865726470947266 -28.761619567871094',
]

lst1 = []
for cord in lst:
    if cord not in lst1:
        lst1.append(cord)
print(lst1)

Output:
[
    '17.17165756225586 -28.102264404296875',
    '17.184370040893555 -28.200496673583984',
    '17.1986083984375 -28.223613739013672',
    '28.865726470947266 -28.761619567871094',
    '28.80694007873535 -28.75750160217285',
    '28.792499542236328 -28.706947326660156',
]


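Since your input is already a string (and your expected result too?), you can try this super lazy solution using a regex with a backreference: (([^,]+).*\2). Here, [^,]+ is the first pair of coordinates, .* the other pairs, and \2 the first pair again.
>>> import re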
>>> s = '17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094'
>>> re.findall(r"(([^,]+).*\2)", s)
[('17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875',
  '17.17165756225586 -28.102264404296875'),
 (' 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094',
  ' 28.865726470947266 -28.761619567871094')]

Or use finditer and take each match's group() to get a list of strings directly:
>>> [m.group() for m in re.finditer(r"(([^,]+).*\2)", s)]
['17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875',
 ' 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094']

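With some post-processing you can also get actual lists of number pairs (starting from the result of findall; for finditer, drop the [0]).
For longer strings this may not be the fastest solution, but I haven't timed it.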
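
The snippet for that post-processing step is not shown here, so the following is only a guess at what it might look like, not the answerer's code. It assumes consistent "," separators and uses short placeholder coordinates; `matches` and `polygons` are illustrative names.

```python
import re

# Placeholder data with consistent separators (an assumption).
s = "1.5 -2.0,3.0 -4.5,1.5 -2.0,7.0 8.0,9.0 1.0,7.0 8.0"

# findall returns one (whole_polygon, first_pair) tuple per match.
matches = re.findall(r"(([^,]+).*\2)", s)

# m[0] is the whole polygon string; split it into pairs, then into floats.
polygons = [
    [tuple(map(float, pair.split())) for pair in m[0].split(",")]
    for m in matches
]

print(polygons)
# [[(1.5, -2.0), (3.0, -4.5), (1.5, -2.0)], [(7.0, 8.0), (9.0, 1.0), (7.0, 8.0)]]
```

With finditer there are no tuples to index, so `m.group()` would take the place of `m[0]`.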
I really like @newbie's concise solution. Here is a more verbose/readable one:
s = '17.17165756225586 -28.102264404296875,17.184370040893555 -28.200496673583984,17.1986083984375 -28.223613739013672,17.17165756225586 -28.102264404296875, 28.865726470947266 -28.761619567871094,28.80694007873535 -28.75750160217285,28.792499542236328 -28.706947326660156, 28.865726470947266 -28.761619567871094'
vertices = [c.strip() for c in s.split(",")] # split and clean vertex data

polygons = []           
current_polygon = None

for vertex in vertices:
    if current_polygon is None:             # start a new polygon
        current_polygon = [vertex]
    elif current_polygon[0] == vertex:      # conclude the current polygon
        current_polygon.append(vertex)
        polygons.append(current_polygon)
        current_polygon = None
    else:                                   # continue the current polygon
        current_polygon.append(vertex)

for polygon in polygons:    # print polygons
    print(",".join(polygon))
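
One thing the loop above does quietly is drop a trailing polygon that never returns to its start coordinate. If you would rather be told about that, a stricter variant might look like the sketch below. The function name `split_polygons_strict` and the choice of `ValueError` are my own assumptions, and the coordinates are placeholders.

```python
def split_polygons_strict(s):
    """Like the loop above, but complain about an unclosed polygon."""
    vertices = [c.strip() for c in s.split(",") if c.strip()]
    polygons = []
    current = []
    for vertex in vertices:
        current.append(vertex)
        # a polygon is closed when its first vertex shows up again
        if len(current) > 1 and vertex == current[0]:
            polygons.append(current)
            current = []
    if current:  # leftover vertices that never got closed
        raise ValueError("unclosed polygon starting at %r" % current[0])
    return polygons


print(split_polygons_strict("1 1,2 2,1 1, 3 3,4 4,3 3"))
# [['1 1', '2 2', '1 1'], ['3 3', '4 4', '3 3']]
```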


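A recursive approach:
def split_polygons(s):
    if s == '':  # base case: nothing left to split
        return []
    start, rest = s.split(',', 1)  # first coordinate pair and everything after it
    # cut at the next occurrence of the start pair and strip stray separators
    head, tail = map(lambda x: x.strip(', '), rest.split(start, 1))
    # reconstruct the first polygon; head lost its trailing comma to strip()
    poly = ','.join(p for p in (start, head, start) if p)
    return [poly] + split_polygons(tail)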



Here is another approach; it works for a string of any length, as long as it follows the input format you gave:
strng = "17.17165756225586,-28.102264404296875,17.184370040893555,-28.200496673583984,17.1986083984375,-28.223613739013672,17.17165756225586,-28.102264404296875,28.865726470947266,-28.761619567871094,28.80694007873535,-28.75750160217285,28.792499542236328,-28.706947326660156,28.865726470947266,-28.761619567871094"
# convert to a list of (x, y) tuples; list() is needed on Python 3, where zip is lazy
l_tuple = list(zip(*[iter(strng.split(','))]*2))
# get the index positions of every point that occurs more than once
l_index = []
for Tuple in l_tuple:
    x = [i for i, x in enumerate(l_tuple) if x == Tuple]
    if len(x) > 1:
        l_index.append(x)
# get separate lists, one per (start, end) index pair, in order of appearance
New_list = []
for IND in sorted(set(map(tuple, l_index))):
    print(l_tuple[IND[0]:IND[1]+1])
    New_list.append(l_tuple[IND[0]:IND[1]+1])
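
The `zip(*[iter(...)]*2)` line is the least obvious part of this answer, so here is a small sketch of what that idiom does. The values are placeholders; `values`, `it` and `pairs` are illustrative names only.

```python
values = "1,2,3,4,1,2".split(",")

# The same iterator object is handed to zip twice, so each output tuple
# consumes two consecutive items: (values[0], values[1]), (values[2], ...).
it = iter(values)
pairs = list(zip(it, it))

print(pairs)
# [('1', '2'), ('3', '4'), ('1', '2')]

# The one-liner from the answer builds the same thing:
print(pairs == list(zip(*[iter(values)] * 2)))
# True
```

Note that `zip` stops at the shortest input, so a dangling odd value at the end of the string would be dropped without any warning.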


Bold marks the identical points of each polygon (start/end). Oh, and would somebody please format it? I'm in a hurry.
I think the formatting is fine once the meaning of the bold is explained. Where is the difficulty? You read the first pair and scan until it appears again?
Yes, each pair is separated by a comma.
Why are you posting an answer in another language? If anything, it just confuses the question.
I don't think we can escape "ugly" on this one :)
This completely misses the mark. You are not creating coordinate pairs or splitting the list; you are just filtering it in a way that makes no sense.
Nice, I came up with r'(([\-\d\.]+).*\2)', 15 minutes too late. Is it possible to get a list of strings directly instead of a list of tuples of strings? It seems to work; it will have to be tested on strings with more polygons.
@EricDuminil I tried non-capturing groups, but with findall this seems to be the only way. You could also use [m.group() for m in re.finditer(...)], though, just to get a list of strings.