Compare commits


No commits in common. "b4320f611b4581eec48014b080e006f838d90c8a" and "785a517d62e94338825fa8177f32b9ddd27c0f00" have entirely different histories.

14 changed files with 81 additions and 162 deletions


@@ -1,15 +1,4 @@
# looqs: Release notes
## 2022-10-22 - v0.8
CHANGES:
- For new, not previously indexed files, start creating an additional index using sqlite's experimental trigram tokenizer. Thanks to that, we can now match substrings >= 3 of an unicode sequence. Results of the usual index are prioritized.
- GUI: Ensure order of previews matches ranking exactly. Previously, it depended simply on the time preview generators took, i. e. it was more or less a race.
- Report progress more often during indexing, so users don't get the impression that it's stuck when processing dirs with large documents.
- Fix a regression that caused phrase queries to be broken
- Minor improvements
- Add packages: Ubuntu 22.10.
## 2022-09-10 - v0.7
CHANGES:
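The v0.8 entry in the hunk above mentions an additional index built with sqlite's trigram tokenizer, which allows matching substrings of three or more characters. The following is a minimal sketch of the idea only, not sqlite's implementation. It assumes byte-based windows instead of Unicode code points, and the names `trigrams` and `mayContain` are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Split text into all overlapping 3-character windows.
// Byte-based for brevity; a real tokenizer would work on Unicode code points.
std::set<std::string> trigrams(const std::string &text)
{
    std::set<std::string> result;
    for (std::size_t i = 0; i + 3 <= text.size(); ++i)
    {
        result.insert(text.substr(i, 3));
    }
    return result;
}

// A query of length >= 3 can only occur in a document if every trigram of the
// query is also a trigram of the document. This is a candidate filter: a real
// index would still verify the positions before reporting a match.
bool mayContain(const std::set<std::string> &docTrigrams, const std::string &query)
{
    if (query.size() < 3)
    {
        return false; // too short to form a single trigram
    }
    for (const std::string &t : trigrams(query))
    {
        if (docTrigrams.count(t) == 0)
        {
            return false;
        }
    }
    return true;
}
```

With this sketch, a document containing "desktop search" is a candidate for the query "sktop", while queries shorter than three characters never match, which mirrors the ">= 3" limit stated in the release note.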


@@ -1,7 +1,12 @@
# looqs - Hacking
## Introduction
If you are interested on how to contribute, please see the file [CONTRIBUTING.md](CONTRIBUTING.md) which contains the instructions on how to submit patches etc.
Without elaborating here, I hacked looqs because I was not satisfied with the state of desktop search on Linux.
Originally a set of CLI python scripts, it is now written in C++ and offers a GUI made using Qt. While a "web app" would have been an option, I prefer a desktop application for something like looqs. I chose Qt because I am more familiar with it than with any other GUI framework. To my knowledge, potential alternatives like GTK do not include as many "batteries" as Qt anyway, so the job presumably would have been harder there,
at least for me.
If you are interested in how to contribute, please see the file [CONTRIBUTING.md](CONTRIBUTING.md) which contains the instructions on how to submit patches etc.
## Security
The architecture ensures that the parsing of documents and the preview generation is sandboxed by [exile.h](https://github.com/quitesimpleorg/exile.h). looqs uses a multi-process architecture to achieve this.
@@ -11,8 +16,7 @@ Qt code is considered trusted in this model. While one may critize this, it was
Set the enviornment variable `LOOQS_DISABLE_SANDBOX=1` to disable sandboxing. It's intended for troublehshooting.
## Database
The heart is sqlite, with the FTS5 extensions behind the full-text search. While FTS may not be sqlite's strong suit, I definitely did not want to run one of those oftenly recommended heavy (Java based) solutions. I explored other options like Postgresql, I've discard them due to some limitations back then. It's also natural to use sqlite as it's
used for metadata in general.
The heart is sqlite, with the FTS5 extensions behind the full-text search. While FTS may not be sqlite's strong suit, I definitly did not want to run one of those oftenly recommended heavy (Java based) solutions. I explored other options like Postgresql, I've discard them due to some limitations back then.
Down the road, alternatives will be explored of course if sqlite should not suffice anymore.
@@ -23,7 +27,7 @@ looqs simply strips the tags and that seems to work fine so far. Naturally, this
Naturally, looqs won't be able to index and render previews for everything. Such approach would create a huge bloated binary. In the future, there will be some plugin system of some sorts, either we will load .so objects or use subprocesses.
## Name
looqs looks for files. You as the user can also look inside them. The 'k' in "looks" was replaced by a 'q'. Originally, I wanted my projects to have "qs" (for quitesimple) in their name. While that quirk is abandoned now, this got us to looqs.
looqs looks for files. You as the user can also look inside them. The 'k' in "looks" was replaced by a 'q'. Originally, I wanted my projects to have "qs" (for quitesimple) in their name. While abandoned now, this got us to looqs.


@@ -28,12 +28,9 @@ There is no need to write the long form of filters. There are also booleans avai
The screenshots in this section may occasionally be slightly outdated, but they are usually recent enough to get an overall impression of the current state of the GUI.
## Current status
Latest version: 2022-10-22, v0.8
Latest version: 2022-09-10, v0.7
Please keep in mind: looqs is still at an early stage and may exhibit some weirdness and contain bugs.
Please see [Changelog](CHANGELOG.md) for a human readable list of changes. For download instructions, see
further down this document.
Please see [Changelog](CHANGELOG.md) for a human readable list of changes.
## Goals and principles
@@ -97,7 +94,7 @@ The GUI is located in `gui/looqs-gui`, the binary for the CLI is in `cli/looqs`
## Packages
At this point, looqs is not in any official distro package repo, but I maintain some packages.
### Ubuntu 22.04, 22.10
### Ubuntu 22.04
Latest release can be installed using apt from the repo.
```
# First, obtain key, assume it's trusted.


@@ -643,6 +643,7 @@ void MainWindow::previewReceived(QSharedPointer<PreviewResult> preview, unsigned
{
QString docPath = preview->getDocumentPath();
auto previewPage = preview->getPage();
ClickLabel *headerLabel = new ClickLabel();
headerLabel->setText(QString("Path: ") + preview->getDocumentPath());
@@ -684,24 +685,7 @@ void MainWindow::previewReceived(QSharedPointer<PreviewResult> preview, unsigned
previewWidget->setLayout(previewLayout);
QBoxLayout *layout = static_cast<QBoxLayout *>(ui->scrollAreaWidgetContents->layout());
int pos = previewOrder[docPath + QString::number(previewPage)];
if(pos <= layout->count())
{
layout->insertWidget(pos, previewWidget);
for(auto it = previewWidgetOrderCache.constKeyValueBegin();
it != previewWidgetOrderCache.constKeyValueEnd(); it++)
{
if(it->first <= layout->count())
{
layout->insertWidget(it->first, it->second);
}
}
}
else
{
previewWidgetOrderCache[pos] = previewWidget;
}
ui->scrollAreaWidgetContents->layout()->addWidget(previewWidget);
}
}
@@ -954,10 +938,6 @@ void MainWindow::makePreviews(int page)
renderConfig.scaleY = QGuiApplication::primaryScreen()->physicalDotsPerInchY() * (currentScale / 100.);
renderConfig.wordsToHighlight = wordsToHighlight;
this->previewOrder.clear();
this->previewWidgetOrderCache.clear();
int previewPos = 0;
QVector<RenderTarget> targets;
for(SearchResult &sr : this->previewableSearchResults)
{
@@ -972,7 +952,6 @@ void MainWindow::makePreviews(int page)
renderTarget.path = sr.fileData.absPath;
renderTarget.page = (int)sr.page;
targets.append(renderTarget);
this->previewOrder[renderTarget.path + QString::number(renderTarget.page)] = previewPos++;
}
int numpages = ceil(static_cast<double>(targets.size()) / previewsPerPage);
ui->spinPreviewPage->setMaximum(numpages);
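The hunks above touch the code that keeps previews in ranking order even though the generators finish at different times: a lookup (`previewOrder`) records the position each preview should have, and a cache (`previewWidgetOrderCache`) holds those that arrive too early. The sketch below shows a related but simpler variant, an in-order release buffer, using only the standard library. The class name `OrderedPreviewList` and the use of strings in place of widgets are assumptions for illustration; unlike the code in the diff, it holds a preview back until all of its predecessors have arrived.

```cpp
#include <map>
#include <string>
#include <vector>

class OrderedPreviewList
{
  public:
    // Called whenever a preview finishes; 'rank' is its position in the search ranking.
    void arrive(int rank, const std::string &preview)
    {
        pending[rank] = preview;
        // Release every buffered preview whose predecessors have all been shown.
        auto it = pending.find(nextRank);
        while (it != pending.end())
        {
            shown.push_back(it->second);
            pending.erase(it);
            ++nextRank;
            it = pending.find(nextRank);
        }
    }

    // Previews currently visible, always in ranking order.
    const std::vector<std::string> &visible() const
    {
        return shown;
    }

  private:
    int nextRank = 0;                   // rank of the next preview allowed to appear
    std::map<int, std::string> pending; // previews that arrived out of order
    std::vector<std::string> shown;
};
```

Erasing an entry from `pending` once it is shown keeps each preview from being released twice, at the cost of showing nothing until rank 0 has arrived.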


@@ -23,6 +23,16 @@ class MainWindow : public QMainWindow
{
Q_OBJECT
public:
explicit MainWindow(QWidget *parent, QString socketPath);
~MainWindow();
signals:
void beginSearch(const QString &query);
void startPdfPreviewGeneration(QVector<SearchResult> paths, double scalefactor);
protected:
void closeEvent(QCloseEvent *event) override;
private:
DatabaseFactory *dbFactory;
SqliteDbService *dbService;
@@ -30,40 +40,37 @@ class MainWindow : public QMainWindow
IPCPreviewClient ipcPreviewClient;
QThread ipcClientThread;
QThread syncerThread;
Indexer *indexer;
IndexSyncer *indexSyncer;
QProgressDialog progressDialog;
Indexer *indexer;
QFileIconProvider iconProvider;
bool previewDirty;
QSqlDatabase db;
QFutureWatcher<QVector<SearchResult>> searchWatcher;
void add(QString path, unsigned int page);
QVector<SearchResult> previewableSearchResults;
LooqsQuery contentSearchQuery;
QVector<QString> searchHistory;
int currentSearchHistoryIndex = 0;
QString currentSavedSearchText;
QHash<QString, int> previewOrder; /* Quick lookup for the order a preview should have */
QMap<int, QWidget *>
previewWidgetOrderCache /* Saves those that arrived out of order to be inserted later at the correct pos */;
bool previewDirty;
int previewsPerPage;
unsigned int processedPdfPreviews;
unsigned int currentPreviewGeneration = 1;
void connectSignals();
void makePreviews(int page);
bool previewTabActive();
bool indexerTabActive();
void keyPressEvent(QKeyEvent *event) override;
unsigned int processedPdfPreviews;
void handleSearchResults(const QVector<SearchResult> &results);
void handleSearchError(QString error);
LooqsQuery contentSearchQuery;
int previewsPerPage;
void createSearchResutlMenu(QMenu &menu, const QFileInfo &fileInfo);
void openDocument(QString path, int num);
void openFile(QString path);
unsigned int currentPreviewGeneration = 1;
void initSettingsTabs();
int currentSelectedScale();
void processShortcut(int key);
bool eventFilter(QObject *object, QEvent *event);
QVector<QString> searchHistory;
int currentSearchHistoryIndex = 0;
QString currentSavedSearchText;
private slots:
void lineEditReturnPressed();
void treeSearchItemActivated(QTreeWidgetItem *item, int i);
@@ -83,16 +90,6 @@ class MainWindow : public QMainWindow
void startIpcPreviews(RenderConfig config, const QVector<RenderTarget> &targets);
void stopIpcPreviews();
void beginIndexSync();
public:
explicit MainWindow(QWidget *parent, QString socketPath);
~MainWindow();
signals:
void beginSearch(const QString &query);
void startPdfPreviewGeneration(QVector<SearchResult> paths, double scalefactor);
protected:
void closeEvent(QCloseEvent *event) override;
};
#endif // MAINWINDOW_H


@@ -1,24 +1,20 @@
#include "../shared/common.h"
#include "previewgenerator.h"
#include <QMutexLocker>
#include "previewgeneratorpdf.h"
#include "previewgeneratorplaintext.h"
#include "previewgeneratorodt.h"
static PreviewGenerator *plainTextGenerator = new PreviewGeneratorPlainText();
static QHash<QString, PreviewGenerator *> generators{
static QMap<QString, PreviewGenerator *> generators{
{"pdf", new PreviewGeneratorPdf()}, {"txt", plainTextGenerator}, {"md", plainTextGenerator},
{"py", plainTextGenerator}, {"java", plainTextGenerator}, {"js", plainTextGenerator},
{"cpp", plainTextGenerator}, {"c", plainTextGenerator}, {"sql", plainTextGenerator},
{"odt", new PreviewGeneratorOdt()}};
static QMutex generatorsMutex;
PreviewGenerator *PreviewGenerator::get(QFileInfo &info)
{
QMutexLocker locker(&generatorsMutex);
PreviewGenerator *result = generators.value(info.suffix(), nullptr);
locker.unlock();
if(result == nullptr)
{
if(Common::isTextFile(info))


@@ -7,12 +7,10 @@ static QMutex cacheMutex;
Poppler::Document *PreviewGeneratorPdf::document(QString path)
{
QMutexLocker locker(&cacheMutex);
if(documentcache.contains(path))
{
return documentcache.value(path);
}
locker.unlock();
Poppler::Document *result = Poppler::Document::load(path);
if(result == nullptr)
{
@@ -21,7 +19,7 @@ Poppler::Document *PreviewGeneratorPdf::document(QString path)
}
result->setRenderHint(Poppler::Document::TextAntialiasing);
locker.relock();
QMutexLocker locker(&cacheMutex);
documentcache.insert(path, result);
locker.unlock();
return result;
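One side of the hunks above guards the document cache with a single locker that is released while the document loads and reacquired before the insert (`locker.unlock()` / `locker.relock()`). A minimal standard-library sketch of that lock, check, unlock, load, relock pattern follows. `Document` and `DocumentCache` are hypothetical stand-ins, not the Poppler or looqs types, and `std::unique_lock` takes the place of `QMutexLocker`.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for an expensive-to-load document such as a parsed PDF.
struct Document
{
    std::string path;
};

class DocumentCache
{
  public:
    std::shared_ptr<Document> get(const std::string &path)
    {
        std::unique_lock<std::mutex> lock(cacheMutex);
        auto it = cache.find(path);
        if (it != cache.end())
        {
            return it->second;
        }
        lock.unlock(); // release the lock while the slow load runs

        auto loaded = std::make_shared<Document>();
        loaded->path = path;

        lock.lock(); // reacquire before touching the shared map again
        // emplace() does not overwrite, so if another thread stored the same
        // path in the meantime, its entry wins and ours is discarded.
        return cache.emplace(path, loaded).first->second;
    }

  private:
    std::mutex cacheMutex;
    std::map<std::string, std::shared_ptr<Document>> cache;
};
```

Because the lock is dropped during loading, two threads may load the same path at once; the sketch accepts that duplicate work and keeps whichever entry was stored first.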


@@ -152,14 +152,12 @@ void Indexer::processFileScanResult(FileScanResult result)
++this->currentIndexResult.erroredPaths;
}
QTime currentTime = QTime::currentTime();
if(currentScanProcessedCount++ == progressReportThreshold || this->lastProgressReportTime.secsTo(currentTime) >= 10)
if(currentScanProcessedCount++ == progressReportThreshold)
{
emit indexProgress(this->currentIndexResult.total(), this->currentIndexResult.addedPaths,
this->currentIndexResult.skippedPaths, this->currentIndexResult.erroredPaths,
this->dirScanner->pathCount());
currentScanProcessedCount = 0;
this->lastProgressReportTime = currentTime;
}
}
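The two conditions in the hunk above differ in whether progress is reported only after a fixed number of processed files or also once ten seconds have passed since the last report. A minimal sketch of the combined rule is given below. `ProgressThrottle` is a hypothetical helper, it uses `std::chrono::steady_clock` rather than `QTime`, and it compares with `>=` instead of the post-increment equality test in the diff.

```cpp
#include <chrono>

class ProgressThrottle
{
  public:
    using Clock = std::chrono::steady_clock;

    ProgressThrottle(unsigned int threshold, std::chrono::seconds interval, Clock::time_point start)
        : threshold(threshold), interval(interval), lastReport(start)
    {
    }

    // Call once per processed item. Returns true when a progress report is due,
    // either because enough items were processed or because enough time passed.
    bool shouldReport(Clock::time_point now)
    {
        ++processed;
        if (processed >= threshold || now - lastReport >= interval)
        {
            processed = 0;
            lastReport = now;
            return true;
        }
        return false;
    }

  private:
    unsigned int threshold;
    std::chrono::seconds interval;
    Clock::time_point lastReport;
    unsigned int processed = 0;
};
```

Passing the current time in as a parameter keeps the helper easy to test; a caller would normally hand it `ProgressThrottle::Clock::now()`.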


@@ -72,8 +72,6 @@ class Indexer : public QObject
IndexResult currentIndexResult;
void launchWorker(ConcurrentQueue<QString> &queue, int batchsize);
QTime lastProgressReportTime = QTime::currentTime();
public:
bool isRunning();


@@ -1,3 +0,0 @@
CREATE VIRTUAL TABLE fts_trigram USING fts5(content, content='',tokenize="trigram");
ALTER TABLE content ADD COLUMN fts_trigramid integer;
CREATE INDEX content_fts_trigramid ON content (fts_trigramid);


@@ -3,6 +3,5 @@
<file>1.sql</file>
<file>2.sql</file>
<file>3.sql</file>
<file>4.sql</file>
</qresource>
</RCC>


@@ -110,44 +110,6 @@ unsigned int SqliteDbService::getFiles(QVector<FileData> &results, QString wildC
return processedRows;
}
bool SqliteDbService::insertToFTS(bool useTrigrams, QSqlDatabase &db, int fileid, QVector<PageData> &pageData)
{
QString ftsInsertStatement;
QString contentInsertStatement;
if(useTrigrams)
{
ftsInsertStatement = "INSERT INTO fts_trigram(content) VALUES(?)";
contentInsertStatement = "INSERT INTO content(fileid, page, fts_trigramid) VALUES(?, ?, last_insert_rowid())";
}
else
{
ftsInsertStatement = "INSERT INTO fts(content) VALUES(?)";
contentInsertStatement = "INSERT INTO content(fileid, page, ftsid) VALUES(?, ?, last_insert_rowid())";
}
for(const PageData &data : pageData)
{
QSqlQuery ftsQuery(db);
ftsQuery.prepare(ftsInsertStatement);
ftsQuery.addBindValue(data.content);
if(!ftsQuery.exec())
{
Logger::error() << "Failed fts insertion " << ftsQuery.lastError() << Qt::endl;
return false;
}
QSqlQuery contentQuery(db);
contentQuery.prepare(contentInsertStatement);
contentQuery.addBindValue(fileid);
contentQuery.addBindValue(data.pagenumber);
if(!contentQuery.exec())
{
Logger::error() << "Failed content insertion " << contentQuery.lastError() << Qt::endl;
return false;
}
}
return true;
}
SaveFileResult SqliteDbService::saveFile(QFileInfo fileInfo, QVector<PageData> &pageData)
{
QString absPath = fileInfo.absoluteFilePath();
@@ -187,18 +149,24 @@ SaveFileResult SqliteDbService::saveFile(QFileInfo fileInfo, QVector<PageData> &
}
int lastid = inserterQuery.lastInsertId().toInt();
if(!insertToFTS(false, db, lastid, pageData))
for(const PageData &data : pageData)
{
db.rollback();
Logger::error() << "Failed to insert data to FTS index " << Qt::endl;
return DBFAIL;
}
if(!insertToFTS(true, db, lastid, pageData))
{
db.rollback();
Logger::error() << "Failed to insert data to FTS index " << Qt::endl;
return DBFAIL;
QSqlQuery ftsQuery(db);
ftsQuery.prepare("INSERT INTO fts(content) VALUES(?)");
ftsQuery.addBindValue(data.content);
ftsQuery.exec();
QSqlQuery contentQuery(db);
contentQuery.prepare("INSERT INTO content(fileid, page, ftsid) VALUES(?, ?, last_insert_rowid())");
contentQuery.addBindValue(lastid);
contentQuery.addBindValue(data.pagenumber);
if(!contentQuery.exec())
{
db.rollback();
Logger::error() << "Failed content insertion " << contentQuery.lastError() << Qt::endl;
return DBFAIL;
}
}
if(!db.commit())
{
db.rollback();


@@ -13,7 +13,6 @@ class SqliteDbService
{
private:
DatabaseFactory *dbFactory = nullptr;
bool insertToFTS(bool useTrigrams, QSqlDatabase &db, int fileid, QVector<PageData> &pageData);
public:
SqliteDbService(DatabaseFactory &dbFactory);


@@ -82,10 +82,11 @@ QString SqliteSearch::escapeFtsArgument(QString ftsArg)
{
value = value.mid(0, value.size() - 1);
}
result += "\"" + value + "\"* ";
result += "\"" + value + "\"*";
}
else
{
value = "\"\"" + value + "\"\"";
result += "\"" + value + "\" ";
}
}
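The fragment of `escapeFtsArgument` above wraps each value in double quotes and keeps a trailing `*` outside the quotes so that it acts as a prefix query. Only part of the function is visible in this hunk, so the following is a rough sketch of that quoting step under stated assumptions rather than the function itself: it splits on whitespace, doubles embedded double quotes (the SQL-style escaping that FTS5 strings use), and the name `escapeFtsTerms` is hypothetical.

```cpp
#include <sstream>
#include <string>

// Quote each whitespace-separated term for use in an FTS5 MATCH expression.
std::string escapeFtsTerms(const std::string &input)
{
    std::istringstream stream(input);
    std::string word;
    std::string result;
    while (stream >> word)
    {
        // A trailing '*' requests a prefix match and must stay outside the quotes.
        bool prefix = word.size() > 1 && word.back() == '*';
        if (prefix)
        {
            word.pop_back();
        }
        std::string quoted = "\"";
        for (char c : word)
        {
            if (c == '"')
            {
                quoted += '"'; // escape an embedded quote by doubling it
            }
            quoted += c;
        }
        quoted += '"';
        if (prefix)
        {
            quoted += '*';
        }
        if (!result.empty())
        {
            result += ' ';
        }
        result += quoted;
    }
    return result;
}
```

For the input `foo bar*` the sketch produces `"foo" "bar"*`, which has the same shape as the strings assembled in the hunk.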
@@ -141,7 +142,9 @@ QPair<QString, QVector<QString>> SqliteSearch::createSql(const Token &token)
}
if(token.type == FILTER_CONTENT_CONTAINS)
{
return {" fts MATCH ? ", {escapeFtsArgument(value)}};
return {" content.id IN (SELECT fts.ROWID FROM fts WHERE fts.content MATCH ? ORDER BY "
"rank) ",
{escapeFtsArgument(value)}};
}
throw LooqsGeneralException("Unknown token passed (should not happen)");
}
@@ -161,14 +164,26 @@ QSqlQuery SqliteSearch::makeSqlQuery(const LooqsQuery &query)
auto tokens = query.getTokens();
for(const Token &token : tokens)
{
auto sql = createSql(token);
whereSql += sql.first;
bindValues.append(sql.second);
if(token.type == FILTER_CONTENT_CONTAINS)
{
if(!ftsAlreadyJoined)
{
joinSql += " INNER JOIN fts ON content.ftsid = fts.ROWID ";
ftsAlreadyJoined = true;
}
whereSql += " fts.content MATCH ? ";
bindValues.append(escapeFtsArgument(token.value));
}
else
{
auto sql = createSql(token);
whereSql += sql.first;
bindValues.append(sql.second);
}
}
QString prepSql;
QString sortSql = createSortSql(query.getSortConditions());
int bindIterations = 1;
if(isContentSearch)
{
if(sortSql.isEmpty())
@@ -176,24 +191,12 @@ QSqlQuery SqliteSearch::makeSqlQuery(const LooqsQuery &query)
if(std::find_if(tokens.begin(), tokens.end(),
[](const Token &t) -> bool { return t.type == FILTER_CONTENT_CONTAINS; }) != tokens.end())
{
sortSql = "ORDER BY prio, rank";
sortSql = "ORDER BY rank";
}
}
QString whereSqlTrigram = whereSql;
whereSqlTrigram.replace("fts MATCH", "fts_trigram MATCH"); // A bit dirty...
prepSql =
"SELECT DISTINCT path, page, mtime, size, filetype FROM ("
"SELECT file.path AS path, content.page AS page, file.mtime AS mtime, file.size AS size, "
"file.filetype AS filetype, 0 AS prio, fts.rank AS rank FROM file INNER JOIN content ON file.id = "
"content.fileid "
"INNER JOIN fts ON content.ftsid = fts.ROWID WHERE 1=1 AND " +
whereSql +
"UNION ALL SELECT file.path AS path, content.page AS page, file.mtime AS mtime, file.size AS size, "
"file.filetype AS filetype, 1 as prio, fts_trigram.rank AS rank FROM file INNER JOIN content ON file.id = "
"content.fileid " +
"INNER JOIN fts_trigram ON content.fts_trigramid = fts_trigram.ROWID WHERE 1=1 AND " + whereSqlTrigram +
" ) " + sortSql;
++bindIterations;
prepSql = "SELECT file.path AS path, content.page AS page, file.mtime AS mtime, file.size AS size, "
"file.filetype AS filetype FROM file INNER JOIN content ON file.id = content.fileid " +
joinSql + " WHERE 1=1 AND " + whereSql + " " + sortSql;
}
else
{
@@ -213,14 +216,11 @@ QSqlQuery SqliteSearch::makeSqlQuery(const LooqsQuery &query)
QSqlQuery dbquery(*db);
dbquery.prepare(prepSql);
for(int i = 0; i < bindIterations; i++)
for(const QString &value : bindValues)
{
for(const QString &value : bindValues)
if(value != "")
{
if(value != "")
{
dbquery.addBindValue(value);
}
dbquery.addBindValue(value);
}
}
return dbquery;
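One side of the `makeSqlQuery` hunks builds a `UNION ALL` over the regular and the trigram index, reuses the same WHERE clause for both branches, and therefore binds the values twice (`bindIterations`). The sketch below illustrates why the bind list has to be repeated. The table names `regular` and `trigram` and all function names are simplified placeholders, not the schema or API from the diff.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Replace every occurrence of 'from' in 'text' with 'to'.
std::string replaceAll(std::string text, const std::string &from, const std::string &to)
{
    if (from.empty())
    {
        return text;
    }
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size();
    }
    return text;
}

// Two branches share one WHERE clause: prio 0 for the regular index, prio 1
// for the trigram index, so regular hits sort first.
std::string buildUnionSql(const std::string &whereSql)
{
    std::string whereTrigram = replaceAll(whereSql, "fts MATCH", "fts_trigram MATCH");
    return "SELECT path, page FROM ("
           "SELECT path, page, 0 AS prio, rank FROM regular WHERE " + whereSql +
           " UNION ALL "
           "SELECT path, page, 1 AS prio, rank FROM trigram WHERE " + whereTrigram +
           ") ORDER BY prio, rank";
}

// Each branch contains the same placeholders, so the non-empty values are
// repeated once per branch.
std::vector<std::string> expandBindValues(const std::vector<std::string> &values, int iterations)
{
    std::vector<std::string> result;
    for (int i = 0; i < iterations; ++i)
    {
        for (const std::string &value : values)
        {
            if (!value.empty())
            {
                result.push_back(value);
            }
        }
    }
    return result;
}

std::size_t countPlaceholders(const std::string &sql)
{
    return static_cast<std::size_t>(std::count(sql.begin(), sql.end(), '?'));
}
```

The invariant worth checking in such code is that the number of `?` placeholders in the final SQL equals the number of bound values; with a single-branch query one pass over the values is enough.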