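Python: identifying two-word cities in text ('New York')

For this code, I am given a text file that mentions several cities. I want to identify the cities mentioned and print their state and country.

Requirements: If a mentioned city exists in two or more countries, I ask the user which one they are talking about. Also, if there is a slight typo, I ask the user whether they meant a particular city. For example, if they type 'Dalls' instead of 'Dallas', I need to offer an option such as "Did you mean 'Dallas' instead of 'Dalls'?"

Problem: So far I have managed to meet these requirements, but my program fails to identify two-word cities such as "New York" or "San Francisco". This is because I read the text word by word. If you have any suggestions for a better way to read the text, please let me know.

P.S. I know the code could be simplified with more advanced Python techniques, but my knowledge of Python is not at that level yet. Still, please tell me how else I could simplify my program, because parts of it feel unnecessary right now. Thanks!

File descriptions: I am using three files named 'world-cities.csv', 'TEXT.txt', and 'usa.txt'. 'world-cities.csv' contains many of the world's cities. 'TEXT.txt' contains the sentences I analyze for cities. 'usa.txt' contains common English words; I compare 'TEXT.txt' against it to remove common words. I had a problem where words like "and" showed up as typos, so this is a makeshift way of getting rid of them.

Text file: Today I went to Hyderabad, then I went to Chennai and New York in USA. Now I am going to Tokyo and back to Rochester tomorrow. Dalls and Sydny are my next destinations.

I have used Geotext, and it works, but it causes a problem with cities such as "New York". The part of my program without Geotext reads it as "York", and when I add Geotext it is read as "New York". As a result, my city list contains both "York" and "New York". I was told I could use the NLTK package, but I am still looking for an effective approach.
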
import pandas as pd
import re
#imported dataset
dataset = pd.read_csv('world-cities.csv')
#assigned certain parts of data set to variable
data = dataset.iloc[:,:-1]
city = dataset.iloc[:,0]
state = dataset.iloc[:,2]
country = dataset.iloc[:,1]
#opened and imported textfile
txtfile = open('TEXT.txt','r')
txtfile = txtfile.read()
words = open('usa.txt','r')
words = words.read()
#getting rid of punctuation
altered = re.sub("[.,:]",'',txtfile)
templist = [] #holds the cities(state and country) info of the places
final = [] #final array
all_cities = [] #used to check for repeating cities
repeat = {} #contains only city names
repeatinfo = [] #contains all info about repeating cities
stupid = 0
close = 0
typo = []
typodict = {}
typecount = 0
finaltypo = []
#finding out where the talked about cities are
for x in altered.split():
    count = 0
    zcount = 0
    for y in city:
        if x == y:
            zcount +=1
            templist.append([city[count], state[count], country[count]])
            all_cities.append(city[count])
        count+=1
    if zcount > 1:
        repeat[x] = zcount
#put in all assumed Typos
for x in altered.split():
    if x not in all_cities:
        x = x.lower()
        if x not in words:
            typo.append(x)
#narrow down options of typos
many = 0
for a in typo:
    for b in city:
        b = b.lower()
        if len(a) >= (len(b)-1) and len(a) <= (len(b)+1):
            if a[0] == b[0] or a[-1::] == b[-1::]:
                if a[0:3] == b[0:3] or a[-3::] == b[-3::]:
                    #print(f'{a} vs {b}')
                    many = 0
                    for x in a:
                        if x in b:
                            many+=1
                    if many >= (len(b)-1) and many <= (len(b)+1):
                        typodict[b] = a
#let user choose if it is a typo or not
print('TYPO Checking')
for a in typo:
    p =0
    q = 0
    while(p < len(typo) and q == 0):
        for x,y in typodict.items():
            go2 = True
            while(go2 and q==0):
                if y == a:
                    user2 = input(f" Did you mean to type '{x}' instead of '{y}'? Enter 'y' or 'n': ")
                    user2 = user2.lower()
                    if user2 == 'y':
                        go2 = False
                        finaltypo.append(x)
                        p+=1
                        q+=1
                    elif user2 == 'n':
                        go2 = False
                    else:
                        print('You have entered a invalid value')
                else:
                    go2 = False
#adding typoed cities into list
for x in finaltypo:
    x = x.capitalize()
    count = 0
    zcount = 0
    for y in city:
        if x == y:
            zcount +=1
            templist.append([city[count], state[count], country[count]])
            all_cities.append(city[count])
        count+=1
    if zcount > 1:
        repeat[x] = zcount
#finding out what cities repeat and adding all their information to repeatinfo
for x in repeat:
    rcount = 0
    for y in city:
        if x == y:
            repeatinfo.append([city[rcount], state[rcount], country[rcount]])
        rcount +=1
#determining which country they mean when they mentioned repeating cities
print('Which City?')
for x,y in repeat.items():
    i = 0
    e = 0
    while(i < y and e == 0):
        go = True
        for c in repeatinfo:
            go = True
            while(go and e == 0):
                if x == c[0]:
                    user = input(f'Do you mean {x} in {c[1]},{c[2]} enter y or n: ')
                    user = user.lower()
                    i +=1
                    if user == 'y':
                        final.append(f' {x} in {c[1]}, {c[2]}')
                        go = False
                        i +=1
                        e +=1
                    elif user == 'n':
                        go = False
                        i+=1
                    else:
                        print('You have entered a invalid input')
                else:
                    go = False
#removing repeating cities from templist
for y in list(templist):
    if y[0] in list(repeat):
        templist.remove(y)
#adding remaining elements of templist to final list
for y in list(templist):
    final.append(f' {y[0]} in {y[1]}, {y[2]}')
#printing final output
print('\n You have entered the following cities:')
for x in final:
    print(x)
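
The typo-narrowing loops in the code above compare lengths and leading or trailing characters by hand. As a possible alternative, the standard library's `difflib.get_close_matches` can rank candidate city names by similarity. The following is only a minimal sketch: `KNOWN_CITIES`, `COMMON_WORDS`, the helper `suggest_cities`, and the `cutoff` value of 0.8 are all assumptions made for illustration, standing in for the `city` column and the contents of 'usa.txt'.

```python
import difflib

# Hypothetical stand-ins for the city column and the common-word file.
KNOWN_CITIES = ["Dallas", "Sydney", "Tokyo", "Chennai", "Rochester"]
COMMON_WORDS = {"and", "are", "my", "next", "destinations"}


def suggest_cities(word, known_cities, cutoff=0.8):
    """Return known city names that closely resemble word, best match first."""
    # Compare in lower case, but report the original spelling of the city.
    by_lower = {c.lower(): c for c in known_cities}
    matches = difflib.get_close_matches(
        word.lower(), list(by_lower), n=3, cutoff=cutoff
    )
    return [by_lower[m] for m in matches]


for word in "Dalls and Sydny are my next destinations".split():
    # Skip common words (set membership, not substring search) and exact hits.
    if word.lower() in COMMON_WORDS or word in KNOWN_CITIES:
        continue
    print(word, "->", suggest_cities(word, KNOWN_CITIES))
```

Storing the common words in a set makes `in` an exact-word test. In the code above, `words` is one long string, so `x not in words` is a substring test and may filter out more than intended.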
Assuming "New York" appears in your city list, I think you can do it like this:
#finding out where the talked about cities are
for count,y in enumerate(city):
    if y in altered:
        zcount +=1
        templist.append([city[count], state[count], country[count]])
        all_cities.append(city[count])
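I hope this helps you get the basic idea. If you need more help, let me know.

Hi @Susanth Kakarla. If any of the answers solved your problem, please consider accepting it by clicking the check mark. This signals to the wider community that you have found a solution, and it gives some reputation to both the answerer and yourself. There is no obligation to do so.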
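
One caveat with a plain substring test such as `y in altered` is that it also matches "York" inside "New York", which is the duplicate the question describes. A hedged alternative is to split the text into words and, at each position, try the longest multi-word phrase first. The sketch below assumes a small hypothetical `KNOWN_CITIES` set in place of the `city` column, and the helper `find_cities` and its `max_words` parameter are illustrative names, not part of any library.

```python
import re

# Hypothetical stand-in for the `city` column of world-cities.csv.
KNOWN_CITIES = {
    "New York", "York", "San Francisco", "Tokyo",
    "Chennai", "Hyderabad", "Rochester",
}


def find_cities(text, known_cities, max_words=3):
    """Return city names found in text, preferring the longest match."""
    tokens = re.findall(r"[A-Za-z']+", text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in known_cities:
                found.append(candidate)
                i += n  # skip the words consumed by this match
                break
        else:
            i += 1  # no city starts at this token
    return found


sample = ("Today I went to Hyderabad, then I went to Chennai and New York "
          "in USA. Now I am going to Tokyo and back to Rochester tomorrow.")
print(find_cities(sample, KNOWN_CITIES))
# ['Hyderabad', 'Chennai', 'New York', 'Tokyo', 'Rochester']
```

Because the words "New" and "York" are consumed together, "York" is not reported separately. This sketch drops punctuation before matching, so a phrase that spans a sentence boundary could still match by accident, and hyphenated names would need a broader token pattern.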