Classifying data with Python


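I'm a Python beginner working on a project in which I have to classify my data into different categories. I want to access my list of categories. I tried the approach below, as you can see here, but I keep getting the same error.

Any help or solution to my problem would be much appreciated.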

insport = any(ele in text for ele in categorie[3])
if insport:
    data["cat"] = 'sport'
    nsport = nsport + 1
else:
    insante = any(ele in text for ele in categorie[1])
    if insante:
        data["cat"] = 'sante'
        nsante = nsante + 1
    else:
        inpolitique = any(ele in text for ele in categorie[2])
        if inpolitique:
            data["cat"] = 'politique'
            npoli = npoli + 1
        else:
            incalture = any(ele in text for ele in categorie[6])
            if incalture:
                data["cat"] = 'culture'
                ncalt = ncalt + 1
            else:
                inreligion = any(ele in text for ele in categorie[4])
                if inreligion:
                    data["cat"] = 'religion'
                    nrelig = nrelig + 1
                else:
                    ineducation = any(ele in text for ele in categorie[5])
                    if ineducation:
                        data["cat"] = 'social'
                        neduc = neduc + 1
                    else:
                        print(" the tweet---------------------------------------------------------------------------------------")
                        print(text)
This results in the following error:

Traceback (most recent call last):
  File "C:\Users\NIHAD\PycharmProjects\pythonProject3\classification.py", line 52, in <module>
    ifin = any(ele in text for ele in categorie[0] )
  File "C:\Users\NIHAD\PycharmProjects\pythonProject3\classification.py", line 52, in <genexpr>
    ifin = any(ele in text for ele in categorie[0] )
TypeError: 'in <string>' requires string as left operand, not tuple
Here is my list of categories:


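Here is a different, simpler approach. I use texts as an example list whose entries contain the category names. Once you have identified the category name contained in one of the texts, you can save that text in your database together with the category that was found. Note that each example text here matches only one category.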

texts=[
 "bla bla bla santebla bla bla ",
 "bla bla bla sport bla bla bla ",
 "bla bla bla education bla bla bla ",
 "bla bla bla social  bla bla bla ",
 "bla bla bla religion bla bla bla ",
 "bla bla bla politique bla bla bla ",
 "bla bla bla culture  bla bla bla "
]

counts={
 "sante":0,
 "sport":0,
 "education":0,
 "social":0,
 "religion":0,
 "politique":0,
 "culture":0
}

categorie= [[(1, 'education'), (2, 'sante'), (3, 'politique'), (4, 'sport'), (5, 'religion'), (6, 'social'), (7, 'culture')]]

for i,t in enumerate(texts):
    for c in categorie[0]:
        catid=c[0]
        catname=c[1]
        if catname in t:
            counts[catname]+=1
            print("you can save texts[",i,"] with catname:",catname," or catid:",catid, "in your database")


print("Show categories counts:")
print(counts)
Output:

you can save texts[ 0 ] with catname: sante  or catid: 2 in your database
you can save texts[ 1 ] with catname: sport  or catid: 4 in your database
you can save texts[ 2 ] with catname: education  or catid: 1 in your database
you can save texts[ 3 ] with catname: social  or catid: 6 in your database
you can save texts[ 4 ] with catname: religion  or catid: 5 in your database
you can save texts[ 5 ] with catname: politique  or catid: 3 in your database
you can save texts[ 6 ] with catname: culture  or catid: 7 in your database
Show categories counts:
{'sante': 1, 'sport': 1, 'education': 1, 'social': 1, 'religion': 1, 'politique': 1, 'culture': 1}
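If a text may mention several category names, the loop above counts every one of them. Should a single category per text be required, one possible sketch is to stop at the first match. The helper name `first_category` and the two sample texts are hypothetical, introduced only for illustration; the priority between categories is simply the order of the list.

```python
def first_category(text, categories):
    """Return the (id, name) of the first category whose name occurs in text, or None."""
    for cat_id, cat_name in categories:
        if cat_name in text:
            return cat_id, cat_name
    return None


# Same structure as in the answer: a list holding one list of (id, name) tuples.
categorie = [[(1, 'education'), (2, 'sante'), (3, 'politique'), (4, 'sport'),
              (5, 'religion'), (6, 'social'), (7, 'culture')]]

# Hypothetical sample texts: the first mentions two categories, the second none.
texts = [
    "bla bla sport and politique bla",
    "bla bla bla",
]

results = [first_category(t, categorie[0]) for t in texts]
print(results)
```

Because 'politique' comes before 'sport' in the list, the first text is assigned 'politique'; reorder the list if a different priority is wanted.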

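Please consider using `elif` instead of these nested `if`/`else` statements, and please provide the correct `categorie` data. Your list of tuples would throw an `IndexError: list index out of range` on the first line. If you want to match the words in the tuples against the text, every check should be `ele[1] in text`.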
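To make the comment above concrete, here is a minimal sketch of both errors and of the `ele[1]` fix. It assumes `categorie` has the shape shown in the answer (the question's own list was not posted), and the sample `text` is made up for illustration.

```python
# Assumed data shape, copied from the answer: one inner list of (id, name) tuples.
categorie = [[(1, 'education'), (2, 'sante'), (3, 'politique'), (4, 'sport'),
              (5, 'religion'), (6, 'social'), (7, 'culture')]]

text = "bla bla bla sport bla bla bla"  # hypothetical sample text

# The outer list has a single element, so categorie[3] does not exist.
try:
    categorie[3]
    index_error = None
except IndexError as err:
    index_error = str(err)

# Each element of categorie[0] is a tuple, and `tuple in str` is not allowed.
try:
    any(ele in text for ele in categorie[0])
    type_error = None
except TypeError as err:
    type_error = str(err)

# Comparing the name part of each tuple, ele[1], works as intended.
matches = [ele[1] for ele in categorie[0] if ele[1] in text]

print(index_error)
print(type_error)
print(matches)
```

Looping over the tuples this way also removes the need for one `if`/`else` level per category.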