C++ wifstream with imbue and locale produces valgrind errors


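I have implemented a language detector based on n-grams, and so far everything works fine. To detect a set of languages, I have one language-specific n-gram file per supported language, which the detector has to read in before the actual detection starts.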
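Reading the n-gram files through a wide stream (as the wifstream in the title suggests) yields std::wstring data. On Linux wchar_t is 32 bits, so each wchar_t holds one whole Unicode code point, and character n-grams of Chinese or Japanese text never split a multi-byte UTF-8 sequence. A minimal sketch of extracting overlapping character n-grams; char_ngrams is a hypothetical helper, not part of the original detector:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original detector): returns all
// overlapping character n-grams of `text`, in order of appearance.
std::vector<std::wstring> char_ngrams(const std::wstring& text, std::size_t n) {
    std::vector<std::wstring> grams;
    if (n == 0 || text.size() < n) {
        return grams;  // nothing to extract
    }
    grams.reserve(text.size() - n + 1);
    // Slide a window of width n over the text, one code point at a time.
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        grams.push_back(text.substr(i, n));
    }
    return grams;
}
```

For example, char_ngrams(L"hello", 3) yields L"hel", L"ell" and L"llo".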

To read these files, I set up the system default locale (en_US.UTF-8 on my Ubuntu machine) as shown below. These snippets are in my language_identifier constructor:

// utf8_codecvt_t is a codecvt facet typedef defined elsewhere (not shown in this snippet)
std::locale def_lc(""); // --- line 37 (see valgrind)
const utf8_codecvt_t &utf8_codecvt = std::use_facet<utf8_codecvt_t>(def_lc);
std::locale utf8_locale(def_lc, &utf8_codecvt);
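The snippet above only builds utf8_locale; the n-gram files themselves would then be read through a std::wifstream imbued with it. A minimal sketch, assuming a hypothetical file format of one n-gram per line and a hypothetical helper name read_ngram_file:

```cpp
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical loader: reads one n-gram per line from `path`, decoding the
// bytes with the codecvt facet of `loc` (e.g. a UTF-8 locale).
// Empty lines are skipped; returns an empty vector if the file can't be opened.
std::vector<std::wstring> read_ngram_file(const std::string& path,
                                          const std::locale& loc) {
    std::vector<std::wstring> ngrams;
    std::wifstream in;
    // Imbue before any characters are read, so the conversion facet is
    // already in place when decoding starts.
    in.imbue(loc);
    in.open(path.c_str());
    if (!in) {
        return ngrams;  // open failed
    }
    std::wstring line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            ngrams.push_back(line);
        }
    }
    return ngrams;
}
```

Usage would look like read_ngram_file("de.ngrams", utf8_locale), where the file name is hypothetical. Decoding non-ASCII text needs a UTF-8 locale such as en_US.UTF-8; under the plain "C" locale the conversion will likely fail at the first non-ASCII byte.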
When I run the detector, valgrind gives the following output:

==21669== Memcheck, a memory error detector
==21669== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==21669== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21669== Command: ./language_identifier
==21669== 
==21669== Invalid read of size 8
==21669==    at 0x56E5E08: wcscmp (wcscmp.S:479)
==21669==    by 0x4EA2113: std::moneypunct<wchar_t, false>::~moneypunct() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4EA2198: std::moneypunct<wchar_t, false>::~moneypunct() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E96A79: std::locale::_Impl::~_Impl() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E96C4C: std::locale::~locale() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x40429E: language_identifier::language_identifier() (language_identifier.cpp:137)
==21669==    by 0x409802: Singleton<language_identifier>::instance() (Singleton.h:29)
==21669==    by 0x4050C1: main (language_identifier.cpp:270)
==21669==  Address 0x5a07248 is 0 bytes after a block of size 8 alloc'd
==21669==    at 0x4C2AC27: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21669==    by 0x4EA1DED: std::moneypunct<wchar_t, false>::_M_initialize_moneypunct(__locale_struct*, char const*) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E9911E: std::locale::_Impl::_Impl(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E9965E: std::locale::locale(char const*) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x403C36: language_identifier::language_identifier() (language_identifier.cpp:37)
==21669==    by 0x409802: Singleton<language_identifier>::instance() (Singleton.h:29)
==21669==    by 0x4050C1: main (language_identifier.cpp:270)
==21669== 
==21669== Invalid read of size 8
==21669==    at 0x56E5E08: wcscmp (wcscmp.S:479)
==21669==    by 0x4EA2003: std::moneypunct<wchar_t, true>::~moneypunct() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4EA2088: std::moneypunct<wchar_t, true>::~moneypunct() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E96A79: std::locale::_Impl::~_Impl() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E96C4C: std::locale::~locale() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x40429E: language_identifier::language_identifier() (language_identifier.cpp:137)
==21669==    by 0x409802: Singleton<language_identifier>::instance() (Singleton.h:29)
==21669==    by 0x4050C1: main (language_identifier.cpp:270)
==21669==  Address 0x5a07478 is 0 bytes after a block of size 8 alloc'd
==21669==    at 0x4C2AC27: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21669==    by 0x4EA17FD: std::moneypunct<wchar_t, true>::_M_initialize_moneypunct(__locale_struct*, char const*) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E9916B: std::locale::_Impl::_Impl(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x4E9965E: std::locale::locale(char const*) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==21669==    by 0x403C36: language_identifier::language_identifier() (language_identifier.cpp:37)
==21669==    by 0x409802: Singleton<language_identifier>::instance() (Singleton.h:29)
==21669==    by 0x4050C1: main (language_identifier.cpp:270)
==21669== 

--- lang is en

--- lang is zh

--- lang is de

--- lang is ja

--- lang is ja

--- lang is zh

--- lang is zh

--- T1: lang is de

--- T2: lang is de
==21669== 
==21669== HEAP SUMMARY:
==21669==     in use at exit: 0 bytes in 0 blocks
==21669==   total heap usage: 366,286 allocs, 366,286 frees, 17,016,689 bytes allocated
==21669== 
==21669== All heap blocks were freed -- no leaks are possible
==21669== 
==21669== For counts of detected and suppressed errors, rerun with: -v
==21669== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 2)

Thanks for any hints.

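Further investigation led me to a Red Hat/Fedora bug report describing a similar problem: in glibc-2.14 (at least on x86_64), wcscmp() uses MMX, which may read past the end of the input strings it is checking. valgrind does not like this.
So this seems to be a valgrind problem?! I compiled the latest valgrind release (3.8.1), but the problem persists.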
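One way to check whether the report comes from glibc's wcscmp itself rather than from libstdc++'s moneypunct is to call wcscmp directly on a heap string whose allocation has the same size as in the report: 8 bytes, i.e. two wchar_t on x86_64 Linux. This is a sketch with a hypothetical function probe_wcscmp; whether it actually triggers the report depends on the glibc build and on heap alignment, so treat it as an experiment rather than a guaranteed reproducer:

```cpp
#include <cwchar>

// Hypothetical probe: compares a heap string whose allocation is exactly
// 8 bytes (two 4-byte wchar_t on x86_64 Linux), the block size valgrind
// reports above. No std::locale is involved.
int probe_wcscmp() {
    wchar_t* s = new wchar_t[2];  // one character plus the terminator
    s[0] = L'-';
    s[1] = L'\0';
    // Equal strings force wcscmp to scan all the way to the terminator.
    int result = std::wcscmp(s, L"-");
    delete[] s;
    return result;  // 0, since the strings are equal
}
```

Calling this from a small main and running the binary under valgrind would show whether the same "Invalid read of size 8" appears without any locale code.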
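A minimal test program that creates and destroys a de_DE.UTF-8 std::locale, once on the heap and once on the stack:

#include <iostream>
#include <locale>
#include <stdexcept>

int main() {
    try {
        // Heap-allocated locale, destroyed explicitly.
        std::locale* l1 = new std::locale("de_DE.UTF-8");
        delete l1;

        // Stack locale, destroyed at the end of the try block.
        std::locale l2("de_DE.UTF-8");
    } catch (const std::runtime_error& e) {
        // Thrown if de_DE.UTF-8 is not installed on this system.
        std::cerr << "locale error: " << e.what() << '\n';
        return 1;
    }
    return 0;
}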
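The std::locale constructor throws std::runtime_error when the named locale is not installed, so a repro like this only exercises the locale code if de_DE.UTF-8 actually exists on the machine. A small check, using a hypothetical helper locale_available:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if a std::locale with the given name
// can be constructed, i.e. the locale is installed on this system.
bool locale_available(const std::string& name) {
    try {
        std::locale probe(name.c_str());
        return true;
    } catch (const std::runtime_error&) {
        return false;  // the standard requires runtime_error for invalid names
    }
}
```

On Ubuntu a missing locale can typically be generated with sudo locale-gen de_DE.UTF-8 before running the repro under valgrind.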
The installed C library, as reported by running it directly:

$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu EGLIBC 2.15-0ubuntu10.3) stable release version 2.15, by Roland McGrath et al.
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.6.3.
Compiled on a Linux 3.2.30 system on 2012-10-05.
Available extensions:
    crypt add-on version 2.1 by Michael Glad and others
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC