Pythonic way to find two items from two lists in another list
I have some Twitter data that I split, elegantly and Pythonically, into tweets with happy emoticons and tweets with sad emoticons, as follows:
happy_set = [":)",":-)","=)",":D",":-D","=D"]
sad_set = [":(",":-(","=("]
# any() adds each tweet once, even if it contains several faces from the set
happy = [tweet.split() for tweet in data if any(face in tweet for face in happy_set)]
sad = [tweet.split() for tweet in data if any(face in tweet for face in sad_set)]
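This works; however, a single tweet may contain emoticons from both happy_set and sad_set. What is the Pythonic way to make sure the happy list only contains tweets with emoticons from happy_set, and vice versa?
What you are looking for is something like this: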
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}
happy_maybe_sad = [tweet.split() for tweet in data if any(face in tweet for face in happy_set)]
sad_maybe_happy = [tweet.split() for tweet in data if any(face in tweet for face in sad_set)]
# keep only the tweets that do not also appear in the other list
happy = [item for item in happy_maybe_sad if item not in sad_maybe_happy]
sad = [item for item in sad_maybe_happy if item not in happy_maybe_sad]
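This gives happy without the tweets that are also sad, and sad without the tweets that are also happy. I stuck with a list-based solution because the order of the items may be relevant. If it is not, sets would probably perform better. In addition, sets already provide union, intersection, and so on.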
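If order does not matter, a set-based variant might look like the sketch below. It assumes `data` is a list of tweet strings (the sample tweets are hypothetical). Tweets are kept as whole strings rather than `tweet.split()` lists, because lists are unhashable and cannot be stored in a set; set difference then drops the tweets found in both groups.

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}

# Hypothetical sample data, for illustration only.
data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]

# Store tweets as strings (hashable), not as tweet.split() lists (unhashable).
happy_maybe_sad = {tweet for tweet in data if any(face in tweet for face in happy_set)}
sad_maybe_happy = {tweet for tweet in data if any(face in tweet for face in sad_set)}

# Set difference removes tweets that contain both happy and sad faces.
happy = happy_maybe_sad - sad_maybe_happy
sad = sad_maybe_happy - happy_maybe_sad

print(happy)  # {'Happy day :-)'}
print(sad)    # {'Boooh :-('}
```

The result is unordered, which is exactly the trade-off against the list version above.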
You can try using sets, in particular set.isdisjoint. Check whether the set of tokens in a happy tweet is disjoint from sad_set. If it is, the tweet definitely belongs in happy.
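A minimal sketch of the `set.isdisjoint` idea, assuming tweets are whitespace-separated strings and the emoticon collections are real sets. `isdisjoint` is a set method, so it has to be called on the set (calling it on the list returned by `tweet.split()` raises `AttributeError: 'list' object has no attribute 'isdisjoint'`); its argument, however, may be any iterable. The helper name `classify` and the sample data are hypothetical. Note that this matches whole tokens, not substrings.

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}

# Hypothetical sample data, for illustration only.
data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]


def classify(tweets):
    """Split tweets into (happy, sad) token lists, dropping mixed tweets."""
    happy, sad = [], []
    for tweet in tweets:
        tokens = tweet.split()
        # Call isdisjoint on the set; the token list is a valid argument.
        has_happy = not happy_set.isdisjoint(tokens)
        has_sad = not sad_set.isdisjoint(tokens)
        if has_happy and not has_sad:
            happy.append(tokens)
        elif has_sad and not has_happy:
            sad.append(tokens)
    return happy, sad


happy, sad = classify(data)
print(happy)  # [['Happy', 'day', ':-)']]
print(sad)    # [['Boooh', ':-(']]
```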
I would use lambdas:
>>> is_happy = lambda tweet: any(map(lambda x: x in happy_set, tweet.split()))
>>> is_sad = lambda tweet: any(map(lambda x: x in sad_set, tweet.split()))
>>> data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]
>>> # in Python 3, filter() returns an iterator; list() materializes it
>>> list(filter(lambda tweet: is_happy(tweet) and not is_sad(tweet), data))
['Happy day :-)']
>>> list(filter(lambda tweet: is_sad(tweet) and not is_happy(tweet), data))
['Boooh :-(']
This avoids creating intermediate copies of the data.
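If the data is really large and you are on Python 2, you can replace filter with ifilter from the itertools module to get an iterator instead of a list. (In Python 3, the built-in filter already returns an iterator, and itertools.ifilter no longer exists.)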
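A sketch of the lazy variant in Python 3, where the built-in `filter` already returns an iterator. It swaps the assigned lambdas for named functions, since PEP 8 recommends `def` over binding a lambda to a name; the emoticon sets and sample data are restated here (the data is hypothetical) so the block runs on its own.

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}


def is_happy(tweet):
    # True if any whitespace-separated token is a happy emoticon.
    return any(token in happy_set for token in tweet.split())


def is_sad(tweet):
    # True if any whitespace-separated token is a sad emoticon.
    return any(token in sad_set for token in tweet.split())


# Hypothetical sample data, for illustration only.
data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]

# filter() is lazy in Python 3: no tweet is examined until the iterator is consumed.
happy_iter = filter(lambda t: is_happy(t) and not is_sad(t), data)
sad_iter = filter(lambda t: is_sad(t) and not is_happy(t), data)

first_happy = next(happy_iter)  # pulls items only until the first match
all_sad = list(sad_iter)        # consumes the whole iterator
print(first_happy)  # Happy day :-)
print(all_sad)      # ['Boooh :-(']
```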
You want only happy tweets in happy and only sad tweets in sad, discarding tweets that are both happy and sad?
@Sylvain Leroux, can you really take an intersection of the {happy,sad}_set and the tweet? For this kind of question, you should provide an example, as it may help testing and disambiguating the question.
This raises an error: AttributeError: 'list' object has no attribute 'isdisjoint'