String similarity with Python + Sqlite (Levenshtein distance / edit distance)


Is there a string similarity measure available in Python + Sqlite, for example with the sqlite3 module?

Example use case:

import sqlite3
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE mytable (id integer, description text)')
c.execute('INSERT INTO mytable VALUES (1, "hello world, guys")')
c.execute('INSERT INTO mytable VALUES (2, "hello there everybody")')

This query should match the row with ID 1, but not the row with ID 2:

c.execute('SELECT * FROM mytable WHERE dist(description, "He lo wrold gyus") < 6')

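However, I have not found any such "similarity distance" for string comparison. The FTS features MATCH and NEAR do not seem to provide a similarity measure that tolerates letter changes and the like.

Moreover, another answer explains that: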

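SQLite's FTS engine is based on tokens: keywords that the search engine tries to match.
A variety of tokenizers are available, but they are relatively simple. The "simple" tokenizer just splits the text into words and lowercases them: for example, in the string "The quick brown fox jumps over the lazy dog", the word "jumps" would match, but "jump" would not. The "porter" tokenizer is a bit more advanced and strips word endings, so that "jumps" and "jumping" would match, but a typo like "jmups" would not.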

Sadly, the latter (the fact that "jmups" cannot be found as similar to "jumps") makes FTS impractical for my use case.
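The tokenizer behaviour quoted above can be checked with a few lines. This is a minimal sketch that assumes the SQLite build behind Python's sqlite3 module includes FTS5 (the question only says "FTS"); the table name `docs` and column `body` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    # Assumption: FTS5 is compiled in; 'porter' wraps the default tokenizer with stemming.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='porter')")
except sqlite3.OperationalError:
    fts_hits = None  # this SQLite build has no FTS5, nothing to demonstrate
else:
    conn.execute(
        "INSERT INTO docs VALUES ('The quick brown fox jumps over the lazy dog')"
    )
    fts_hits = {}
    for term in ("jumps", "jumping", "jmups"):
        rows = conn.execute(
            "SELECT body FROM docs WHERE docs MATCH ?", (term,)
        ).fetchall()
        fts_hits[term] = len(rows)  # number of matching rows for this search term

print(fts_hits)
```

On a build with FTS5, "jumps" and "jumping" should each find the row because both stem to the same token, while the typo "jmups" finds nothing.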

Here is a ready-to-use example, test.py. It loads the spellfix extension, which has to be compiled first:

import sqlite3
db = sqlite3.connect(':memory:')
db.enable_load_extension(True)
db.load_extension('./spellfix')                 # for Linux
#db.load_extension('./spellfix.dll')            # <-- UNCOMMENT HERE FOR WINDOWS
db.enable_load_extension(False)
c = db.cursor()
c.execute('CREATE TABLE mytable (id integer, description text)')
c.execute('INSERT INTO mytable VALUES (1, "hello world, guys")')
c.execute('INSERT INTO mytable VALUES (2, "hello there everybody")')
c.execute('SELECT * FROM mytable WHERE editdist3(description, "hel o wrold guy") < 600')
print(c.fetchall())
# Output: [(1, 'hello world, guys')]   (Python 2 prints u'hello world, guys')
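If compiling and loading an extension is not an option, a pure-Python fallback is possible, at the cost of speed, since the function runs once per row. The sketch below registers a plain Levenshtein function under the name `dist` using `Connection.create_function`. The helper `levenshtein` and the threshold of 8 are assumptions for illustration: this is a unit-cost, case-sensitive distance, so its values are not comparable with those of `editdist3`.

```python
import sqlite3


def levenshtein(a, b):
    """Unit-cost edit distance (insert, delete, substitute), case-sensitive."""
    if a is None or b is None:
        return None  # propagate SQL NULL
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("dist", 2, levenshtein)  # expose it to SQL as dist(x, y)
conn.execute("CREATE TABLE mytable (id integer, description text)")
conn.execute("INSERT INTO mytable VALUES (1, 'hello world, guys')")
conn.execute("INSERT INTO mytable VALUES (2, 'hello there everybody')")

# Threshold 8 is an illustrative choice for this particular metric.
rows = conn.execute(
    "SELECT * FROM mytable WHERE dist(description, ?) < 8",
    ("He lo wrold gyus",),
).fetchall()
print(rows)
```

Note that a swapped pair of letters such as "wrold" counts as two edits here, which is why the threshold differs from the one in the question.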

Compile the extension on Windows with gcc:

gcc -g -shared spellfix.c -I ~/sqlite-amalgamation-3230100/ -o spellfix.dll

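Upgrade Python's sqlite3 module if needed, then check which SQLite version it uses:

pip install --upgrade pysqlite

import sqlite3; print(sqlite3.sqlite_version)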
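Before building anything, it can help to confirm that the Python interpreter is able to load SQLite extensions at all. A small sketch, relying on the fact that Python builds compiled without loadable-extension support do not expose `enable_load_extension` on the connection object; the variable names are just for illustration.

```python
import sqlite3

# Version of the SQLite library that the sqlite3 module is linked against.
sqlite_version = sqlite3.sqlite_version
print(sqlite_version)

conn = sqlite3.connect(":memory:")
# The method is absent when Python was built without extension loading support.
can_load_extensions = hasattr(conn, "enable_load_extension")
print(can_load_extensions)
```

If the second line printed is False, `db.load_extension(...)` in test.py cannot work with that interpreter, whatever the compiled library looks like.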

Building on Linux from the SQLite source tarball:

wget https://www.sqlite.org/src/tarball/27392118/SQLite-27392118.tar.gz
tar xvfz SQLite-27392118.tar.gz
cd SQLite-27392118 ; sh configure ; make sqlite3.c ; cd ..
gcc -g -fPIC -shared SQLite-27392118/ext/misc/spellfix.c -I SQLite-27392118/src/ -o spellfix.so
python test.py   # [(1, u'hello world, guys')]

Building on Windows with Visual Studio (MSVC):

call "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x64
cl /I sqlite-amalgamation-3110100/ sqlite-src-3110100/ext/misc/spellfix.c /link /DLL /OUT:spellfix.dll
python test.py

Building on Debian/Ubuntu:

apt-get -y install unzip build-essential libsqlite3-dev
wget https://sqlite.org/2016/sqlite-src-3110100.zip
unzip sqlite-src-3110100.zip
gcc -shared -fPIC -Wall -Isqlite-src-3110100 sqlite-src-3110100/ext/misc/spellfix.c -o spellfix.so
python test.py
