Extracting GPS coordinates from a .docx file with Python (Python, Regex, Docx)

I have some tedious busywork to do and could use some help with Python. Please see this Word document.

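I need to extract the text and the GPS coordinates from every row. There are currently more than 100 coordinates spread across 10 .docx files. My "extensive" Python knowledge got me this far: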

from docx import Document
import re

main_file = Document("D:/DOCUMENTS/Google_Link/1  Category I/1  Category I.docx")
table = main_file.tables[1]  # this is the same for every document

data = []
keys = None

for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    if i == 0:
        keys = tuple(text)  # header row
        continue

    row_data = tuple(text)
    data.append(row_data)

regexReference = re.compile(r"(C.-)\w+")
colReference = [item[1] for item in data]  # column 2 holds the reference IDs

listReference = filter(regexReference.match, colReference)

for i in listReference:
    print i.encode('UTF-8')
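The loop above reads the header row into `keys` but never uses it. A minimal sketch, written for Python 3 and assuming each row has already been turned into a list of cell strings (for example with `[cell.text for cell in row.cells]` from python-docx), that pairs every data row with the header so columns can be looked up by name instead of by index. The function name `rows_to_dicts` and the header names in the example are hypothetical; the real table's headers may differ.

```python
def rows_to_dicts(rows):
    """Pair every data row with the header row (the first row).

    `rows` is a list of lists of cell strings; the first entry is treated
    as the header. Returns one dict per remaining row.
    """
    if not rows:
        return []
    header = [h.strip() for h in rows[0]]
    records = []
    for row in rows[1:]:
        # zip() stops at the shorter sequence, so ragged rows do not raise
        records.append(dict(zip(header, row)))
    return records


# Example with made-up header names (hypothetical, not from the real table)
rows = [
    ["No.", "Reference", "Description", "Location"],
    ["1", "C1-20701-17-1", "The existing CMC Office at Bariyodhala", "some site, some region"],
]
for record in rows_to_dicts(rows):
    print(record["Reference"])  # prints C1-20701-17-1
```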
With this I can print the 16 reference IDs from column 2. Please help me get output like this:

C1-20701-17-1

some site, some region

The existing CMC Office at Bariyodhala (22°40'34.3"N; 91°38'28.2"E) requires 
some repair/maintenance works including electrical wiring and electrical 
lights and appliances like ceiling fans supplies. Detail specification of 
the works are attached

x = 91°38'28.2"E
y = 22°40'34.3"N
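These XY positions and descriptions will later be used to create a KML file that is attached to each document. I would prefer one variable for each of the parts above (reference ID, location, description, x and y) so that I can automate that step as well.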
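KML expects decimal degrees written as `longitude,latitude`, so strings such as `22°40'34.3"N` need converting before they go into a file: decimal = degrees + minutes/60 + seconds/3600, negated for S and W. A minimal sketch, written for Python 3 and assuming the coordinates always use the degree sign, a straight apostrophe for minutes and a straight double quote for seconds, as in the sample above (Word's curly quotes would need adding to the pattern). `dms_to_decimal` is a hypothetical helper name.

```python
import re

# degrees°minutes'seconds"hemisphere, e.g. 22°40'34.3"N
# (assumes straight quotes, as in the sample text)
DMS_PATTERN = re.compile(r"""(\d+)\s*°\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)\s*"\s*([NSEW])""")


def dms_to_decimal(text):
    """Convert the first DMS coordinate found in `text` to signed decimal degrees."""
    match = DMS_PATTERN.search(text)
    if match is None:
        raise ValueError("no DMS coordinate found in %r" % text)
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60.0 + float(seconds) / 3600.0
    # South latitudes and west longitudes are negative in decimal notation
    if hemisphere in "SW":
        value = -value
    return value


y = dms_to_decimal('22°40\'34.3"N')  # latitude
x = dms_to_decimal('91°38\'28.2"E')  # longitude
print("%.6f,%.6f" % (x, y))  # KML order: longitude,latitude
```

Because `search` returns the first match, passing a whole description string yields its first coordinate, so the longitude has to be picked out of the text separately (for example from the part after the semicolon).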


I don't know whether this will work if some files follow a different pattern (note that I'm using Python 2.7.11):


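I suggest you add a link to a demo .docx file.
Added a link to the demo .docx file. Thanks. This almost works, except for the coordinate part. I'm not using the current directory, because the files are not in the same folder. The file names also contain no underscores. Please add an os.walk (for all files in the folder) and remove the underscores.
OK, I added os.walk and replaced the file-name filter with a check for the .docx extension.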
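The directory walk discussed in these comments can be isolated into a small generator. A minimal sketch, written for Python 3 and assuming the documents live somewhere below one root folder (the `root` argument is a placeholder for it). It matches the extension case-insensitively and skips the `~$...` owner files Word leaves next to open documents, which are not valid .docx packages. `find_docx_files` is a hypothetical name.

```python
import os


def find_docx_files(root):
    """Yield the path of every .docx file below `root`, recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Word creates "~$name.docx" owner files while a document is open
            if name.startswith("~$"):
                continue
            if name.lower().endswith(".docx"):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in find_docx_files("."):
        print(path)
```

Each yielded path can be passed straight to `Document(...)` in place of the `endswith` check inside the loop.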
# -*- coding: utf-8 -*-
# Python 2.7 (as in the question)
from docx import Document
import sys
import os
import re

# Python 2-only workaround so str.format() can mix byte strings with
# non-ASCII unicode such as the degree sign
reload(sys)
sys.setdefaultencoding('utf8')

regexReference = re.compile(r"(C.-[0-9-]+)")
# group(1) = latitude ("...N"), group(4) = longitude ("...E")
regexCoordinate = re.compile(r'(N-(.{,12})([0-9]|\')|[0-9].{,12}N)[;, ]+(E-(.{,12})([0-9]|\')|[0-9].{,12}E)')

for root, dirs, files in os.walk("."):
    for name in files:
        doc_file = os.path.join(root, name)
        if doc_file.lower().endswith('.docx'):
            main_file = Document(doc_file)
            table = main_file.tables[1]  # this is the same for every document

            data = []
            keys = None

            for i, row in enumerate(table.rows):
                text = (cell.text for cell in row.cells)

                if i == 0:
                    keys = tuple(text)  # header row
                    continue

                row_data = tuple(text)
                data.append(row_data)

            result = []
            for item in data:
                tmp = dict()
                matchReference = regexReference.search(item[1])
                matchCoordinate = regexCoordinate.search(unicode(item[2]))
                if matchReference:
                    tmp['reference'] = matchReference.group()
                if matchCoordinate:
                    # x is the longitude (E) and y the latitude (N), as requested
                    tmp['x'] = matchCoordinate.group(4)
                    tmp['y'] = matchCoordinate.group(1)
                tmp['description'] = unicode(item[2])
                tmp['location'] = unicode(item[3])
                result.append(tmp)

            # print in a fixed order (plain dicts are unordered in Python 2)
            for rs in result:
                if 'reference' in rs:
                    for k in ('reference', 'location', 'description', 'x', 'y'):
                        if k in rs:
                            print('{} = {}'.format(k, rs[k]))
                    print

# Output:
# --------------------------------
# reference = C1-20701-17-1
# location = xxxxx Site, c Region
# description = The existing CMC Office at Bariyodhala (22°40'34.3"N; 91°38'28.2"E) requires some repair/maintenance works including electrical wiring and electrical lights and appliances like ceiling fans supplies. Detail specification of the works are attached.
# x = 91°38'28.2"E
# y = 22°40'34.3"N
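The question's end goal is a KML file built from these results. A minimal sketch, written for Python 3, assuming `x` and `y` have already been converted to decimal longitude and latitude (the code above stores the raw DMS strings) and that each record carries the `reference` and `description` keys used above. `records_to_kml` is a hypothetical helper; it emits a KML 2.2 `Document` with one `Placemark` per record, escaping text with `xml.sax.saxutils.escape`. KML `coordinates` are written `longitude,latitude`.

```python
from xml.sax.saxutils import escape

KML_HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n')
KML_FOOTER = '</Document>\n</kml>\n'


def records_to_kml(records):
    """Build a KML string with one Placemark per record that has x and y."""
    parts = [KML_HEADER]
    for rec in records:
        if "x" not in rec or "y" not in rec:
            continue  # rows without coordinates cannot be placed on a map
        parts.append(
            "<Placemark>\n"
            "  <name>%s</name>\n"
            "  <description>%s</description>\n"
            "  <Point><coordinates>%.6f,%.6f</coordinates></Point>\n"
            "</Placemark>\n"
            % (escape(rec.get("reference", "")),
               escape(rec.get("description", "")),
               rec["x"], rec["y"])  # longitude first, then latitude
        )
    parts.append(KML_FOOTER)
    return "".join(parts)


example = [{"reference": "C1-20701-17-1",
            "description": "The existing CMC Office at Bariyodhala",
            "x": 91.641167, "y": 22.676194}]
print(records_to_kml(example))
```

To save one file per document, write the returned string to a `.kml` file opened with `encoding="utf-8"`.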