How to remove non-English words in Python?


python pandas twitter nlp


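I'm working on a sentiment analysis project in Python (using natural language processing). I've collected data from Twitter and saved it as a CSV file. The file contains tweets, mostly about cryptocurrency. I've cleaned the data, but there is one more thing to do before applying sentiment analysis with a classification algorithm. Here is how I import the libraries: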

# importing Libraries
from pandas import DataFrame, read_csv
import chardet
import matplotlib.pyplot as plt; plt.rcdefaults()
from matplotlib import rc
%matplotlib inline
import pandas as pd
plt.style.use('ggplot')
import numpy as np
import re
import warnings

#Visualisation
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
from IPython.display import display
from mpl_toolkits.basemap import Basemap
from wordcloud import WordCloud, STOPWORDS

#nltk
from nltk.stem import WordNetLemmatizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.sentiment.util import *
from nltk import tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.snowball import SnowballStemmer


matplotlib.style.use('ggplot')
pd.options.mode.chained_assignment = None
warnings.filterwarnings("ignore")

%matplotlib inline

# Read the CSV file into a DataFrame named ltweet
ltweet=pd.read_csv("C:\\Users\\name\\Documents\\python assignment\\litecoin1.csv",index_col = None, skipinitialspace = True)
print(ltweet)

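I've already cleaned most of the data, so I don't need to include the code for that part. In my Tweets column, some tweets contain mostly non-English text. I want to remove all of it (only the non-English text). Here is a sample of the output: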

ltweet['Tweets'][0:4]

output:
0      the has published a book on understanding العَرَبِيَّة‎
1      accepts litecoin gives % discount on all iphon...
2      days until litepay launches accept store and s...
3           ltc to usd price litecoin ltc cryptocurrency

Is there a way to remove non-English words from the data? Can someone help me write the code? By the way, the code is based on pandas.

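You can try the pyenchant library: create an English dictionary object d and call d.check() on a word.

For a word that is not in the English dictionary, this will return False.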

Do this for every word in the text:

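import enchant  # provided by the pyenchant package
d = enchant.Dict("en_US")  # English (US) dictionary
english_words = []
for word in text.split():  # split the string into words; iterating a str yields characters
    if d.check(word):
        english_words.append(word)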

Edit: watch out for words that appear in more than one language.
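The per-word check above can be wrapped in a small helper that takes the dictionary lookup as a parameter. This is only a minimal sketch, not the answerer's code: keep_english_words, is_english, in_sample_vocab and the tiny SAMPLE_VOCAB set are hypothetical names introduced for illustration, and the set stands in for a real dictionary so the example runs without pyenchant installed.

```python
def keep_english_words(text, is_english):
    """Return text with only the whitespace-separated words accepted by is_english."""
    return " ".join(word for word in text.split() if is_english(word))


# Hypothetical stand-in vocabulary; a real run would use a full dictionary.
SAMPLE_VOCAB = {"the", "has", "published", "a", "book", "on", "understanding"}


def in_sample_vocab(word):
    # Case-insensitive membership test against the stand-in vocabulary.
    return word.lower() in SAMPLE_VOCAB


tweet = "the has published a book on understanding العَرَبِيَّة"
cleaned = keep_english_words(tweet, in_sample_vocab)
print(cleaned)
```

With pyenchant available, d.check could be passed as is_english, and the helper could be applied to the whole column with something like ltweet['Tweets'].apply(lambda t: keep_english_words(t, d.check)). Domain terms such as "litecoin" or "ltc" may not be in a general English dictionary, so they might need to be allowed explicitly.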

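You could remove all words that don't use the Latin alphabet, but for the remaining words, are you also prepared to remove all English misspellings?

Yes, those too.

The same question has been answered here.

I get an error when importing: ERROR: Could not find a version that satisfies the requirement enchant (from versions: none) ERROR: No matching distribution found for enchant

You can still use

pip install pyenchant

to install the package, although the package is no longer maintained.

Edit: OK, I just updated pip, and now I seem to get the same error. This is probably because the project is no longer maintained :(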
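The suggestion in the comments, dropping every word that is not written in the Latin alphabet, needs no dictionary at all. Here is a rough sketch, assuming that plain ASCII is a good enough stand-in for "Latin alphabet" (it would also drop accented words such as "café"); keep_ascii_words is a hypothetical helper name, and str.isascii() requires Python 3.7 or later.

```python
def keep_ascii_words(text):
    """Drop every whitespace-separated word that contains a non-ASCII character."""
    return " ".join(word for word in text.split() if word.isascii())


tweet = "the has published a book on understanding العَرَبِيَّة"
ascii_only = keep_ascii_words(tweet)
print(ascii_only)
```

This keeps non-English words written in Latin letters as well as English misspellings, which is the limitation the comment points out.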
