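Overview
TNTSearch is a full-text search (FTS) engine written entirely in PHP. With a simple configuration, you can add a great search experience in just a few minutes.
Features include:
- Dynamic index updates (no need to reindex every time)
- Easy deployment via Packagist.org
We have also created some demo pages that show n-grams being used for tolerant retrieval. The package ships with a number of helper functions, such as Jaro-Winkler and cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built-in stemmers are not enough, the engine lets you plug in any compatible Snowball stemmer. Some forks of the package even support Chinese. Contributions adding support for other languages are welcome!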
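To make the idea of n-gram based tolerant retrieval concrete, here is a minimal, self-contained sketch. The helper names `charNgrams` and `ngramSimilarity` are hypothetical and are not part of TNTSearch; the sketch assumes ASCII input and uses the Dice coefficient as one possible way to compare n-gram sets.

```php
<?php
// Illustrative sketch only: these helpers are hypothetical, not TNTSearch functions.

/** Split a word into overlapping character n-grams, padded so word edges count too. */
function charNgrams(string $word, int $n = 3): array
{
    $padded = str_repeat('_', $n - 1) . strtolower($word) . str_repeat('_', $n - 1);
    $grams = [];
    for ($i = 0; $i <= strlen($padded) - $n; $i++) {
        $grams[] = substr($padded, $i, $n);
    }
    return array_values(array_unique($grams));
}

/** Dice coefficient over the two n-gram sets: 1.0 for identical sets, 0.0 for no overlap. */
function ngramSimilarity(string $a, string $b, int $n = 3): float
{
    $ga = charNgrams($a, $n);
    $gb = charNgrams($b, $n);
    $shared = count(array_intersect($ga, $gb));
    return 2 * $shared / (count($ga) + count($gb));
}

// A misspelling still shares several trigrams with the intended word.
$close = ngramSimilarity('juleit', 'juliet'); // 0.5
$far   = ngramSimilarity('juleit', 'hamlet'); // 0.125
```

Because a typo only disturbs the n-grams that overlap it, the misspelled word keeps a high score against the intended term and a low score against unrelated ones.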
Unlike many other engines, the TNTSearch index can be updated easily, without a full reindex and without delta updates.
Installation
The easiest way to install TNTSearch is via composer:
composer require teamtnt/tntsearch
Requirements
Before you proceed, make sure your server meets the following requirements:
Examples
Creating an index
To be able to run full-text search queries, you first have to create an index. Usage:
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig([
    'driver'   => 'mysql',
    'host'     => 'localhost',
    'database' => 'dbname',
    'username' => 'user',
    'password' => 'pass',
    'storage'  => '/var/www/tntsearch/examples/',
    'stemmer'  => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class // optional
]);
$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
//$indexer->setLanguage('german');
$indexer->run();
Important: the "storage" setting marks the folder where all of your indexes will be saved, so make sure you have write permission on that folder. Otherwise the following exception may be thrown: [PDOException] SQLSTATE[HY000] [14] unable to open database file
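One way to catch this problem early is to check the folder before indexing. The helper below is a hypothetical sketch and not part of TNTSearch; it uses only PHP built-ins, and the system temp directory stands in for the real "storage" path.

```php
<?php
// Hypothetical helper, not part of TNTSearch: verify the storage directory before indexing.
function assertStorageWritable(string $dir): void
{
    if (!is_dir($dir)) {
        throw new RuntimeException("Storage directory does not exist: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Storage directory is not writable: $dir");
    }
}

// Example: the system temp directory stands in for the configured 'storage' path.
assertStorageWritable(sys_get_temp_dir());
```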
Note: if your primary key is not id, set it like this:
$indexer->setPrimaryKey('article_id');
Making the primary key searchable
By default, the primary key is not searchable. If you want to make it searchable, simply run:
$indexer->includePrimaryKey();
Searching
Searching for a phrase or keyword is straightforward:
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$res = $tnt->search("This is a test search", 12);
print_r($res); // returns an array of the 12 document IDs that best match the query
// To display the results, you need an additional query against your application database:
// SELECT * FROM articles WHERE id IN $res ORDER BY FIELD(id, $res);
The ORDER BY FIELD clause is important; without it, the database engine will not return the results in the required order.
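The follow-up query can be assembled with placeholders rather than by interpolating IDs into the SQL string. This is a sketch under assumptions: `buildRankedQuery` is a hypothetical helper, `$ids` is taken to be a plain list of document IDs in ranked order, and `FIELD()` is MySQL-specific.

```php
<?php
// Sketch under assumptions: $ids is a plain list of document IDs in ranked order,
// and the table/column names (articles, id) follow the example above.
function buildRankedQuery(array $ids): array
{
    if ($ids === []) {
        return [null, []]; // nothing to fetch; "IN ()" would be a SQL syntax error
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT * FROM articles WHERE id IN ($placeholders) "
         . "ORDER BY FIELD(id, $placeholders)";
    // Each ID is bound twice: once for IN (...), once for FIELD(...).
    return [$sql, array_merge($ids, $ids)];
}

[$sql, $params] = buildRankedQuery([7, 3, 12]);
// $sql:    SELECT * FROM articles WHERE id IN (?, ?, ?) ORDER BY FIELD(id, ?, ?, ?)
// $params: [7, 3, 12, 7, 3, 12]
```

The resulting pair can be passed to a PDO prepared statement (`prepare($sql)` followed by `execute($params)`), which keeps the ranking order produced by the search.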
Boolean search
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
// Returns all documents that contain romeo but not juliet
$res = $tnt->searchBoolean("romeo -juliet");
// Returns all documents that contain romeo or hamlet
$res = $tnt->searchBoolean("romeo or hamlet");
// Returns all documents that contain either romeo AND juliet, or prince AND hamlet
$res = $tnt->searchBoolean("(romeo juliet) or (prince hamlet)");
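Conceptually, these three queries correspond to set difference, union, and intersection over the lists of documents that contain each term. The snippet below illustrates that with made-up posting lists and plain PHP array functions; it does not show how TNTSearch evaluates boolean queries internally.

```php
<?php
// Toy posting lists (term => IDs of documents containing it); the data is made up.
$postings = [
    'romeo'  => [1, 2, 3],
    'juliet' => [2, 3, 4],
    'prince' => [3, 5],
    'hamlet' => [5, 6],
];

// "romeo -juliet": documents with romeo, minus those with juliet.
$romeoNotJuliet = array_values(array_diff($postings['romeo'], $postings['juliet']));

// "romeo or hamlet": union of the two lists.
$romeoOrHamlet = array_values(array_unique(
    array_merge($postings['romeo'], $postings['hamlet'])
));

// "(romeo juliet) or (prince hamlet)": two intersections, then their union.
$eitherPair = array_values(array_unique(array_merge(
    array_intersect($postings['romeo'], $postings['juliet']),
    array_intersect($postings['prince'], $postings['hamlet'])
)));
// $romeoNotJuliet is [1], $romeoOrHamlet is [1, 2, 3, 5, 6], $eitherPair is [2, 3, 5]
```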
Fuzzy search
The fuzziness can be tuned by setting the following member variables:
public $fuzzy_prefix_length = 2;
public $fuzzy_max_expansions = 50;
public $fuzzy_distance = 2; // the maximum Levenshtein distance
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$tnt->fuzziness(true);
// With the fuzziness flag set to true, the keyword juleit returns documents that match the word juliet; the default Levenshtein distance is 2
$res = $tnt->search("juleit");
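The three settings can be read as: candidate terms must share a prefix with the keyword, must lie within a maximum edit distance, and only a limited number of them are kept. The sketch below mimics that idea with PHP's built-in `levenshtein()`. The function `fuzzyCandidates` and its vocabulary are hypothetical, and the order in which the limits are applied is an assumption, not a description of TNTSearch internals.

```php
<?php
// Hypothetical sketch of prefix-limited fuzzy expansion; not TNTSearch's actual code.
function fuzzyCandidates(
    string $keyword,
    array $vocabulary,
    int $prefixLength = 2,
    int $maxExpansions = 50,
    int $maxDistance = 2
): array {
    $prefix = substr($keyword, 0, $prefixLength);
    $matches = [];
    foreach ($vocabulary as $term) {
        // Only terms sharing the first $prefixLength characters are considered.
        if (strncmp($term, $prefix, strlen($prefix)) !== 0) {
            continue;
        }
        $distance = levenshtein($keyword, $term);
        if ($distance <= $maxDistance) {
            $matches[$term] = $distance;
        }
    }
    asort($matches); // closest terms first
    return array_slice($matches, 0, $maxExpansions, true);
}

$terms = fuzzyCandidates('juleit', ['juliet', 'julius', 'june', 'hamlet']);
// $terms is ['juliet' => 2]: swapping "e" and "i" costs two substitutions
```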
Updating the index
Once you have created an index, you do not need to reindex every time you change your document collection. TNTSearch supports dynamic index updates.
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$index = $tnt->getIndex();
// Insert a new document into the index
$index->insert(['id' => '11', 'title' => 'new title', 'article' => 'new article']);
// Update an existing document
$index->update(11, ['id' => '11', 'title' => 'updated title', 'article' => 'updated article']);
// Delete a document from the index
$index->delete(12);
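To see why such updates do not require a full reindex, consider a toy in-memory inverted index: inserting or deleting one document only touches the posting lists of that document's own terms. The class `TinyIndex` below is a hypothetical illustration and says nothing about TNTSearch's on-disk format.

```php
<?php
// Toy in-memory inverted index; a hypothetical illustration, not TNTSearch's storage format.
class TinyIndex
{
    /** @var array term => set of document IDs (stored as array keys) */
    private array $postings = [];
    /** @var array document ID => its terms, kept so a delete knows what to undo */
    private array $docTerms = [];

    public function insert(int $id, string $text): void
    {
        $terms = array_unique(preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY));
        $this->docTerms[$id] = $terms;
        foreach ($terms as $term) {
            $this->postings[$term][$id] = true;
        }
    }

    public function delete(int $id): void
    {
        foreach ($this->docTerms[$id] ?? [] as $term) {
            unset($this->postings[$term][$id]);
            if ($this->postings[$term] === []) {
                unset($this->postings[$term]); // drop terms no document uses any more
            }
        }
        unset($this->docTerms[$id]);
    }

    public function update(int $id, string $text): void
    {
        $this->delete($id);        // remove the old postings for this document only
        $this->insert($id, $text); // add the new ones; other documents are untouched
    }

    public function search(string $term): array
    {
        return array_keys($this->postings[strtolower($term)] ?? []);
    }
}

$toyIndex = new TinyIndex();
$toyIndex->insert(11, 'new article');
$toyIndex->insert(12, 'old article');
$toyIndex->update(11, 'updated article');
$toyIndex->delete(12);
// search('article') now returns [11]; search('new') and search('old') return []
```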
Custom tokenizer
First, create your own tokenizer class. It should extend the AbstractTokenizer class, define the $pattern value used for word splitting, and must implement the TokenizerInterface interface:
use TeamTNT\TNTSearch\Support\AbstractTokenizer;
use TeamTNT\TNTSearch\Support\TokenizerInterface;
class SomeTokenizer extends AbstractTokenizer implements TokenizerInterface
{
    static protected $pattern = '/[\s,\.]+/';
    public function tokenize($text) {
        return preg_split($this->getPattern(), strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }
}
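To see what the pattern does on its own, the snippet below applies the same regular expression directly with `preg_split`, so it runs without the library. The sample text is made up for illustration.

```php
<?php
// Same pattern as SomeTokenizer::$pattern, applied directly so it runs without TNTSearch.
$pattern = '/[\s,\.]+/';
$text = 'Romeo, Juliet. Prince  Hamlet';

// Lowercase first, then split on runs of whitespace, commas and periods.
$tokens = preg_split($pattern, strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
// $tokens is ['romeo', 'juliet', 'prince', 'hamlet']
```

The `PREG_SPLIT_NO_EMPTY` flag drops the empty strings that would otherwise appear when the text starts or ends with a separator.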
This tokenizer splits words on spaces, commas and periods. Once the tokenizer is ready, pass it to TNTIndexer via the setTokenizer method.
use TeamTNT\TNTSearch\Indexer\TNTIndexer;
$someTokenizer = new SomeTokenizer;
$indexer = new TNTIndexer;
$indexer->setTokenizer($someTokenizer);
Another way is to pass the tokenizer via the config:
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig([
    'driver'    => 'mysql',
    'host'      => 'localhost',
    'database'  => 'dbname',
    'username'  => 'user',
    'password'  => 'pass',
    'storage'   => '/var/www/tntsearch/examples/',
    'stemmer'   => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class, // optional
    'tokenizer' => \TeamTNT\TNTSearch\Support\SomeTokenizer::class
]);
$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
$indexer->run();
Geo search
Indexing
use TeamTNT\TNTSearch\Indexer\TNTGeoIndexer;
$candyShopIndexer = new TNTGeoIndexer;
$candyShopIndexer->loadConfig($config);
$candyShopIndexer->createIndex('candyShops.index');
$candyShopIndexer->query('SELECT id, longitude, latitude FROM candy_shops;');
$candyShopIndexer->run();
Searching
use TeamTNT\TNTSearch\TNTGeoSearch;
$currentLocation = [
    'longitude' => 11.576124,
    'latitude'  => 48.137154
];
$distance = 2; // kilometers
$candyShopIndex = new TNTGeoSearch();
$candyShopIndex->loadConfig($config);
$candyShopIndex->selectIndex('candyShops.index');
$candyShops = $candyShopIndex->findNearest($currentLocation, $distance, 10);
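For intuition about what a "nearest within N kilometers" query computes, here is a brute-force sketch based on the haversine great-circle formula. The functions `haversineKm` and `nearest` and the shop coordinates are hypothetical; whether TNTGeoSearch uses this exact formula is an assumption, and a real index would avoid scanning every row.

```php
<?php
// Haversine great-circle distance in kilometers; a generic formula, assumed for illustration.
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0; // mean Earth radius
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

/** Keep places within $radiusKm of the origin, nearest first, at most $limit of them. */
function nearest(array $origin, array $places, float $radiusKm, int $limit): array
{
    $hits = [];
    foreach ($places as $id => $p) {
        $d = haversineKm($origin['latitude'], $origin['longitude'], $p['latitude'], $p['longitude']);
        if ($d <= $radiusKm) {
            $hits[$id] = $d;
        }
    }
    asort($hits); // smallest distance first, keys (IDs) preserved
    return array_slice($hits, 0, $limit, true);
}

$here = ['longitude' => 11.576124, 'latitude' => 48.137154];
$shops = [ // made-up coordinates
    1 => ['longitude' => 11.580000, 'latitude' => 48.140000], // a few hundred meters away
    2 => ['longitude' => 11.576124, 'latitude' => 49.137154], // one degree of latitude north
];
$nearby = nearest($here, $shops, 2, 10); // only shop 1 is within 2 km
```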
Classification
use TeamTNT\TNTSearch\Classifier\TNTClassifier;
$classifier = new TNTClassifier();
$classifier->learn("A great game", "Sports");
$classifier->learn("The election was over", "Not sports");
$classifier->learn("Very clean match", "Sports");
$classifier->learn("A clean but forgettable game", "Sports");
$guess = $classifier->predict("It was a close election");
var_dump($guess['label']); // returns "Not sports"
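A common technique for this kind of text classification is naive Bayes. The sketch below is a minimal multinomial naive Bayes with add-one smoothing, trained on the same four sentences. The class `TinyBayes` is hypothetical, and that TNTClassifier scores documents in exactly this way is an assumption.

```php
<?php
// Minimal multinomial naive Bayes with add-one smoothing; a hypothetical sketch,
// not the library's implementation.
class TinyBayes
{
    private array $wordCounts = []; // label => [word => count]
    private array $docCounts = [];  // label => number of training documents
    private array $vocabulary = []; // word => true

    private function words(string $text): array
    {
        return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    public function learn(string $text, string $label): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($this->words($text) as $word) {
            $this->wordCounts[$label][$word] = ($this->wordCounts[$label][$word] ?? 0) + 1;
            $this->vocabulary[$word] = true;
        }
    }

    public function predict(string $text): ?string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $best = null;
        $bestScore = -INF;
        foreach ($this->docCounts as $label => $docs) {
            $labelWords = array_sum($this->wordCounts[$label] ?? []);
            $score = log($docs / $totalDocs); // log prior
            foreach ($this->words($text) as $word) {
                $count = $this->wordCounts[$label][$word] ?? 0;
                // Smoothed log likelihood; unseen words get a small non-zero probability.
                $score += log(($count + 1) / ($labelWords + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = (string) $label;
            }
        }
        return $best;
    }
}

$bayes = new TinyBayes();
$bayes->learn('A great game', 'Sports');
$bayes->learn('The election was over', 'Not sports');
$bayes->learn('Very clean match', 'Sports');
$bayes->learn('A clean but forgettable game', 'Sports');
$label = $bayes->predict('It was a close election'); // "Not sports"
```

Even though "Sports" has the higher prior (three of four training sentences), the words "was" and "election" appear only under "Not sports", which is enough to tip the score.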
Saving the classifier
$classifier->save('sports.cls');
Loading the classifier
$classifier = new TNTClassifier();
$classifier->load('sports.cls');
This article was last edited on 2025/4/3 18:32:49.