Unable to compare 2 Python sets containing strings


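I have created 2 Python sets from 2 different CSV files, each of which contains some strings.

I am trying to match the two sets so that I get their intersection (the strings common to both sets should be returned).

This is what my code looks like: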

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string
import nltk
#using a context manager to open and read the file
#converted the text file into csv file at the source using Notepad++
with open(r'skills.csv', 'r', encoding="utf-8-sig") as f:
    myskills = f.readlines()
    #converting all the strings in the list to lowercase
    list_of_myskills = map(lambda x: x.lower(), myskills)
    set_of_myskills = set(list_of_myskills)
    #print(type(nodup_filtered_content))
print(set_of_myskills)
#open and read by line from the text file
with open(r'list_of_skills.csv', 'r') as f2:
    #using readlines() instead of read(), because it reads line by line
    #(each line as a string obj in the python list)
    contents_f2 = f2.readlines()
    #converting all the strings in the list to lowercase
    list_of_skills = map(lambda x: x.lower(), contents_f2)
    #converting into sets
    set_of_skills = set(list_of_skills)
print(set_of_skills)
This is the function I am using:

def set_compare(set1, set2):
    #elements present in both sets
    common = set1 & set2
    if common:
        print('The matching skills are:', common)
    else:
        print("No matching skills")
After running the code:

    set_compare(set_of_skills,set_of_myskills)
Output:

No matching skills
Contents of "skills.csv":

{'critical thinking,identify user needs,business intelligence,business analysis,teamwork,database,data visualization,data analysis,relational database,mysql,oracle sql,design,entity-relationship,develop ,use-cases ,scenarios,project development ,user requirement,design,sequence diagram,state diagram,identifying,uml diagrams,html5,css3,php,clean,analyze,plot,data,python,pandas,numpy,matplotlib,ipython notebook,spyder,anaconda,jupyterlab,data analysis,data visualization,tableau,database,surveys,prototyping,logical data models,data models,requirement elicitation.,leadreship,mysq,team,prioratization,analyze,articulate,'}

Contents of the "list_of_skills.csv" file:

{'assign passwords and maintain database access,agile development,agile project methodology,amazon web services (aws),analytics,analytical,analyze and recommend database improvements,analyze impact of database changes to the business,audit database access and requests,apis,application and server monitoring tools,applications,application development,attention to detail,architecture,big data,business analytics,business intelligence,business process modeling,cloud applications,cloud based visualizations,cloud hosting services,cloud maintenance tasks,cloud management tools,cloud platforms,cloud scalability,cloud services,cloud systems administration,code,coding,computer,communication,configure database software,configuration,configuration management,content strategy,content management,continually review processes for improvement ,continuous deployment,continuous integration,critical thinking,customer support,database,data analysis,data analytics,data imports,data imports,data intelligence,data mining,data modeling,data science,data strategy,data storage,data visualization tools,data visualizations,database administration,deploying applications in a cloud environment,deployment automation tools,deployment of cloud services,design,desktop support,design,design and build database management system,design principles,design prototypes,design specifications,design tools,develop and secure network structures,develop and test methods to synchronize data ,developer,development,documentation,emerging technologies,file systems,flexibility,front end design,google analytics,hardware,help desk,identify user needs ,implement backup and recovery plan ,implementation,information architecture,information design,information systems,interaction design,interaction flows,"install, maintain, and merge databases ",installation,integrated technologies,integrating security protocols with cloud design,internet,it optimization,it security,it soft skills,it solutions,it support,languages,logical 
thinking,leadership,linux,management,messaging,methodology,metrics,microsoft office,migrating existing workloads into cloud systems,mobile applications,motivation,networks,network operations,networking,open source technology integration,operating systems,operations,optimize queries on live data,optimizing user experiences,optimizing website performance,organization,presentation,programming,problem solving,process flows,product design,product development,prototyping methods,product development,product management,product support,product training,project management,repairs,reporting,research emerging technology,responsive design,review existing solutions,search engine optimization (seo),security,self motivated,self starting,servers,software,software development,software engineering,software quality assurance (qa),solid project management capabilities ,solid understanding of company’s data needs ,storage,strong technical and interpersonal communication ,support,systems software,tablets,team building,team oriented,teamwork,technology,tech skills,technical support,technical writing,testing,time management,tools,touch input navigation,training,troubleshooting,troubleshooting break-fix scenarios,user research,user testing,usability,user-centered design,user experience,user flows,user interface,user interaction diagrams,user research,user testing,ui / ux,utilizing cloud automation tools,virtualization,visual design,web analytics,web applications,web development,web design,web technologies,wireframes,work independently,'}
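Although I can see matching keywords in both files, I don't understand why I am not getting them in the output.

I am not getting any errors either.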

Comparing two sets of strings does not compare substrings of those strings. What your program is effectively doing is:

foo = {'ABC', 'DEF', 'GHI'}
bar = {'AB', 'CD', 'DE', 'FG', 'HI'}

foo.intersection(bar) # returns set()
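Just because strings in different sets share characters does not mean the sets have an intersection. The string 'ABC' is in the first set but not the second, the string 'AB' is in the second but not the first, and so on.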
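To see why the question's code finds nothing, here is a minimal sketch that mimics what `readlines()` returns for a one-line file. The sample strings are shortened stand-ins assumed for illustration, not the real file contents.

```python
# Shortened stand-ins for what f.readlines() returns on a one-line CSV file:
# a list containing one long string (assumed sample data, not the real files).
my_lines = ["critical thinking,teamwork,database,python\n"]
all_lines = ["agile development,critical thinking,database,teamwork\n"]

set_of_myskills = set(line.lower() for line in my_lines)
set_of_skills = set(line.lower() for line in all_lines)

# Each set holds exactly one element: the whole line.
print(len(set_of_myskills), len(set_of_skills))

# Set intersection compares whole elements, so nothing matches
# even though the two lines share comma-separated entries.
common = set_of_myskills & set_of_skills
print(common)
```

Checking `len()` of each set is a quick way to confirm whether the elements are individual skills or entire lines.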

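Right now it is a little unclear what exactly you want the intersection of the two CSVs to be. Do you want to find the individual cells that appear in both? Do they also have to match by column? If you provide more information about the expected output, I can edit this answer to be more helpful.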

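[Edit] Based on your comment, it looks like what you want is to split the giant strings on commas so that the elements of the sets are the individual cells. Currently each set has only one element, and that element is just one giant string with lots of skills in it. If you replace

list_of_myskills = map(lambda x: x.lower(), myskills)

with

list_of_myskills = [y.strip().lower() for x in myskills for y in x.split(',')]

and change the other similar line accordingly, you should get closer to what you expect.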
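The split-on-commas idea can be sketched end to end as follows. This is only an illustration under stated assumptions: `split_skills` is a hypothetical helper name, the sample lines are made up, and `set_compare` is a variant of the question's function that also returns the common elements.

```python
def split_skills(lines):
    """Split each line on commas and return a set of cleaned, lowercased cells."""
    return {
        cell.strip().lower()
        for line in lines
        for cell in line.split(",")
        if cell.strip()  # skip empty cells, e.g. after a trailing comma
    }


def set_compare(set1, set2):
    """Print the elements common to both sets and return them."""
    common = set1 & set2
    if common:
        print("The matching skills are:", common)
    else:
        print("No matching skills")
    return common


# Stand-ins for the lists returned by readlines() (assumed sample data).
my_lines = ["Critical Thinking,teamwork,database ,python,\n"]
all_lines = ["agile development,critical thinking,database,teamwork,\n"]

matches = set_compare(split_skills(all_lines), split_skills(my_lines))
```

Note that a plain `split(',')` does not understand quoted cells such as `"install, maintain, and merge databases "`; it would cut that cell into three pieces. The standard library `csv` module parses such cells correctly.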

Do it this way: change the .csv files so that they contain the skill words separated by ",", with one line per file.

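import pandas as pd
# header=None: the single line holds data, not column names
myskills = pd.read_csv("skills.csv", header=None)
# first (and only) row as a set of cells
set_of_my_skills = set(myskills.iloc[0])
list_of_skills = pd.read_csv("list_of_skills.csv", header=None)
set_of_skills = set(list_of_skills.iloc[0])
# cells present in both files
print(set_of_my_skills & set_of_skills)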

{'business intelligence', 'design', 'critical thinking', 'data analysis', 'database', 'teamwork'}

skills.csv : critical thinking,identify user needs,business intelligence,business analysis,teamwork,database,data visualization,data analysis,relational database,mysql,oracle sql,design,entity-relationship,develop ,use-cases ,scenarios,project development ,user requirement,design,sequence diagram,state diagram,identifying,uml diagrams,html5,css3,php,clean,analyze,plot,data,python,pandas,numpy,matplotlib,ipython notebook,spyder,anaconda,jupyterlab,data analysis,data visualization,tableau,database,surveys,prototyping,logical data models,data models,requirement elicitation.,leadreship,mysq,team,prioratization,analyze,articulate         
list_of_skills.csv: assign passwords and maintain database access,agile development,agile project methodology,amazon web services (aws),analytics,analytical,analyze and recommend database improvements,analyze impact of database changes to the business,audit database access and requests,apis,application and server monitoring tools,applications,application development,attention to detail,architecture,big data,business analytics,business intelligence,business process modeling,cloud applications,cloud based visualizations,cloud hosting services,cloud maintenance tasks,cloud management tools,cloud platforms,cloud scalability,cloud services,cloud systems administration,code,coding,computer,communication,configure database software,configuration,configuration management,content strategy,content management,continually review processes for improvement ,continuous deployment,continuous integration,critical thinking,customer support,database,data analysis,data analytics,data imports,data imports,data intelligence,data mining,data modeling,data science,data strategy,data storage,data visualization tools,data visualizations,database administration,deploying applications in a cloud environment,deployment automation tools,deployment of cloud services,design,desktop support,design,design and build database management system,design principles,design prototypes,design specifications,design tools,develop and secure network structures,develop and test methods to synchronize data ,developer,development,documentation,emerging technologies,file systems,flexibility,front end design,google analytics,hardware,help desk,identify user needs ,implement backup and recovery plan ,implementation,information architecture,information design,information systems,interaction design,interaction flows,"install, maintain, and merge databases ",installation,integrated technologies,integrating security protocols with cloud design,internet,it optimization,it security,it soft skills,it solutions,it 
support,languages,logical thinking,leadership,linux,management,messaging,methodology,metrics,microsoft office,migrating existing workloads into cloud systems,mobile applications,motivation,networks,network operations,networking,open source technology integration,operating systems,operations,optimize queries on live data,optimizing user experiences,optimizing website performance,organization,presentation,programming,problem solving,process flows,product design,product development,prototyping methods,product development,product management,product support,product training,project management,repairs,reporting,research emerging technology,responsive design,review existing solutions,search engine optimization (seo),security,self motivated,self starting,servers,software,software development,software engineering,software quality assurance (qa),solid project management capabilities ,solid understanding of company’s data needs ,storage,strong technical and interpersonal communication ,support,systems software,tablets,team building,team oriented,teamwork,technology,tech skills,technical support,technical writing,testing,time management,tools,touch input navigation,training,troubleshooting,troubleshooting break-fix scenarios,user research,user testing,usability,user-centered design,user experience,user flows,user interface,user interaction diagrams,user research,user testing,ui / ux,utilizing cloud automation tools,virtualization,visual design,web analytics,web applications,web development,web design,web technologies,wireframes,work independently
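Some cells in these files carry trailing spaces (for example `identify user needs ` in list_of_skills.csv), so they only match after whitespace is stripped. The following is a minimal standard-library sketch of that normalization, offered as an alternative rather than as the answer's method. `read_skill_set` is a hypothetical helper, and the in-memory strings are shortened stand-ins for the two files.

```python
import csv
import io


def read_skill_set(handle):
    """Read comma-separated skills from an open text file into a normalized set."""
    skills = set()
    for row in csv.reader(handle):  # csv.reader keeps quoted cells intact
        for cell in row:
            cleaned = cell.strip().lower()
            if cleaned:  # ignore empty cells
                skills.add(cleaned)
    return skills


# In-memory stand-ins for the two files (assumed sample data).
my_file = io.StringIO("critical thinking,identify user needs,develop ,mysql\n")
all_file = io.StringIO(
    'critical thinking,identify user needs ,'
    '"install, maintain, and merge databases ",linux\n'
)

my_skills = read_skill_set(my_file)
all_skills = read_skill_set(all_file)
common = my_skills & all_skills
print(sorted(common))
```

With real files, the handle would come from something like `open("skills.csv", newline="", encoding="utf-8-sig")`. After stripping, "identify user needs" is found in both sets, and the quoted cell stays a single entry.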

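Basically, I want to be able to match whole strings from the two CSV files. I agree with what you said about 'ABC' and 'AB'. I know there are strings present in both files, because I deliberately edited the source files so that I could check whether they match when the code runs. To answer your question: I am trying to match individual cells.

Does the edit above make sense? Does it sound like what you want?

It threw an error at me:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in ()
      6     myskills = f.readlines()
      7     #converting all the strings in the list to lowercase
----> 8     list_of_myskills = [x.strip().lower() for x in myskills.split(',')]
      9     set_of_myskills = set(list_of_myskills)
     10     #print(type(nodup_filtered_content))
AttributeError: 'list' object has no attribute 'split'

Replacing the code with this line works great: list_of_myskills = [y.strip().lower() for x in myskills for y in x.split(',')]. Thank you.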