Python: unwanted None values when merging lists into a single-column DataFrame

python pandas dataframe

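When I merge a list of lists into a single-column DataFrame, the result is padded with unwanted "None" values. I have already run the raw data through NLTK tokenization.

My code: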

import pandas as pd
from nltk.tokenize import word_tokenize
from gensim.models.phrases import Phrases, Phraser

def apwords(words):
    # Tokenize one address string into a list of words
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence
addwords = lambda x: apwords(x)
clean = data['Clean_addr'].apply(addwords)


clean = list(clean)
bigram = Phrases(clean, min_count=150, threshold=2)
bigrams = Phraser(bigram)

# Apply the bigram model to every tokenized address
x = []
for i in clean:
    x.append(bigrams[i])
y = pd.DataFrame(x)
data['Phrases_Clean_Addr'] = y.apply(lambda x: ' '.join(x.astype(str)), axis=1)

robeco des_voeux rd central f man yee building room central 
nikko asset management hk limi f man yee building des_voeux rd central 
cfa institute office f man yee building des_voeux rd central 
victon registrations ltd room f regent centre queens_rd central central 
ding fung ltd room crawford house queens_rd central central 
quam ltd queens_rd central th th floors china building

Output of clean:

[['robeco', 'des', 'voeux', 'rd', 'central', 'f', 'man', 'yee', 'building', 'room', 'central'],
 ['nikko', 'asset', 'management', 'hk', 'limi', 'f', 'man', 'yee', 'building', 'des', 'voeux', 'rd', 'central'],
 ['cfa', 'institute', 'office', 'f', 'man', 'yee', 'building', 'des', 'voeux', 'rd', 'central'],
 ['victon', 'registrations', 'ltd', 'room', 'f', 'regent', 'centre', 'queens', 'rd', 'central', 'central'],
 ['ding', 'fung', 'ltd', 'room', 'crawford', 'house', 'queens', 'rd', 'central', 'central'],
 ['quam', 'ltd', 'queens', 'rd', 'central', 'th', 'th', 'floors', 'china', 'building'],
 ['f', 'des', 'voeux', 'rd', 'central'],
 ['f', 'wincome', 'centre', 'des', 'voeux', 'rd', 'central'],
 ['ags', 'f', 'chuangs', 'tower', 'connaught', 'rd', 'central']]

My current output:

robeco des_voeux rd central f man yee building room central None None None None None None None None None None
nikko asset management hk limi f man yee building des_voeux rd central None None None None None None None None
cfa institute office f man yee building des_voeux rd central None None None None None None None None None None
victon registrations ltd room f regent centre queens_rd central central None None None None None None None None None None
ding fung ltd room crawford house queens_rd central central None None None None None None None None None None None
quam ltd queens_rd central th th floors china building None None None None None None None None None None None
canara bank aon china bldng queens_rd centeal central None None None None None None None None None None None None
gia room f aon china building queens_rd central None None None None None None None None None None None None
zaaba capital ltd_unit b f china building queens_rd central None None None None None None None None None None None
firestar diamond hk nd_floor new henry house ice house rd None None None None None None None None None None

Expected output:

[['robeco', 'des', 'voeux', 'rd', 'central', 'f', 'man', 'yee', 'building', 'room', 'central'],
 ['nikko', 'asset', 'management', 'hk', 'limi', 'f', 'man', 'yee', 'building', 'des', 'voeux', 'rd', 'central'],
 ['cfa', 'institute', 'office', 'f', 'man', 'yee', 'building', 'des', 'voeux', 'rd', 'central'],
 ['victon', 'registrations', 'ltd', 'room', 'f', 'regent', 'centre', 'queens', 'rd', 'central', 'central'],
 ['ding', 'fung', 'ltd', 'room', 'crawford', 'house', 'queens', 'rd', 'central', 'central'],
 ['quam', 'ltd', 'queens', 'rd', 'central', 'th', 'th', 'floors', 'china', 'building'],
 ['f', 'des', 'voeux', 'rd', 'central'],
 ['f', 'wincome', 'centre', 'des', 'voeux', 'rd', 'central'],
 ['ags', 'f', 'chuangs', 'tower', 'connaught', 'rd', 'central']]

None of the None values appended to the DataFrame should be there.

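This is expected behavior, because you are building the DataFrame from a list of lists of unequal length. In your example the longest list in x has 13 entries, so the DataFrame y has 13 columns. Every row with fewer than 13 entries is padded with NA values, and astype(str) then turns those into the literal text "None" before the join.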
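The padding is easy to reproduce on a tiny input. The sketch below uses a made-up list called rows (not the asker's data) and only checks the shape of the resulting frame and how many cells per row are missing:

```python
import pandas as pd

# Hypothetical ragged input: inner lists of length 3, 1 and 2.
rows = [["a", "b", "c"], ["d"], ["e", "f"]]

frame = pd.DataFrame(rows)

# One column per position in the longest inner list.
print(frame.shape)  # (3, 3)

# Number of padded (missing) cells in each row.
missing_per_row = frame.isna().sum(axis=1).tolist()
print(missing_per_row)  # [0, 2, 1]

# Dropping the missing cells recovers the original short row.
print(frame.iloc[1].dropna().tolist())  # ['d']
```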

To get the output you asked for, just add dropna inside the function passed to apply:

data['Phrases_Clean_Addr']=y.apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)

So the complete solution is:

import pandas as pd

x = [['robeco', 'des','voeux', 'rd','central','f','man','yee','building','room','central'],['nikko','asset','management','hk','limi','f','man','yee','building','des','voeux','rd','central'],['cfa','institute','office','f','man','yee','building','des','voeux','rd','central'],['victon','registrations','ltd','room','f','regent','centre','queens','rd','central','central'],['ding','fung','ltd','room','crawford','house','queens','rd','central','central'],['quam','ltd','queens','rd','central','th','th','floors','china','building'],['f', 'des', 'voeux', 'rd', 'central'],['f', 'wincome', 'centre', 'des', 'voeux', 'rd', 'central'],['ags', 'f', 'chuangs', 'tower', 'connaught', 'rd', 'central']]

y = pd.DataFrame(x)

z = y.apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)

>>> z.values
   array(['robeco des voeux rd central f man yee building room central',
   'nikko asset management hk limi f man yee building des voeux rd central',
   'cfa institute office f man yee building des voeux rd central',
   'victon registrations ltd room f regent centre queens rd central central',
   'ding fung ltd room crawford house queens rd central central',
   'quam ltd queens rd central th th floors china building',
   'f des voeux rd central', 'f wincome centre des voeux rd central',
   'ags f chuangs tower connaught rd central'], dtype=object)
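If the intermediate DataFrame is not needed for anything else, one possible alternative (not part of the answer above) is to join each token list directly, so no rectangular frame is built and nothing gets padded. The name token_lists and its contents are illustrative stand-ins for x:

```python
import pandas as pd

# Hypothetical token lists of unequal length, standing in for x above.
token_lists = [
    ["robeco", "des_voeux", "rd", "central"],
    ["f", "des_voeux", "rd", "central", "hk"],
    ["ags", "f"],
]

# Join each list on its own; rows never share columns, so there is no padding.
phrases = pd.Series([" ".join(tokens) for tokens in token_lists])

print(phrases.tolist())
# ['robeco des_voeux rd central', 'f des_voeux rd central hk', 'ags f']
```

Assigning such a Series to a column of data assumes its index lines up with the index of data, which holds for a default RangeIndex but should be checked otherwise.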

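Show us the data…

@Stephernauch, added to the question.

Please format the data so that it can be run, i.e. clean=…

@stephernauch, done: the output of clean is now updated in the question.

Your data contains syntax errors, and the code has undefined class references. If you create an example, you will get more and better answers. In particular, make sure the input data and the expected data are complete (not pseudo-data) and can easily be cut and pasted into an editor to test a proposed solution.

@Rahulrajan please see the complete solution above. Please say if I have misread your post, since this is my interpretation of it.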