Relevancy

Ordering documents

By default, DataparkSearch sorts results first by relevancy and then by popularity rank.

Relevancy calculation

The relevancy of every found document is calculated as 100% multiplied by the cosine of the angle between the weights vector for the request and the weights vector for the found document. The number of vector coordinates is equal to the number of word forms in the search query multiplied by the number of sections defined in indexer.conf. Every coordinate corresponds to a word in the search query that matches one of the document's sections. The value of this coordinate depends on the weight of that section, defined by the wf parameter (see the Section called Changing different document parts weights at search time), and on what kind of match the word is: exactly the same as in the search query, a word form of it, or a synonym. One more coordinate is equal to the average distance between the searched words in the document. For the query vector, this coordinate is equal to 0.
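
The cosine formula above can be illustrated with a minimal Python sketch. It is not DataparkSearch's own code: the function name cosine_relevancy and the sample coordinate values are hypothetical, and the real coordinate values depend on the wf weights and match types described above.

```python
import math

def cosine_relevancy(query_vec, doc_vec):
    """Return 100 * cos(angle) between two equal-length weight vectors."""
    if len(query_vec) != len(doc_vec):
        raise ValueError("vectors must have the same number of coordinates")
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_d = math.sqrt(sum(d * d for d in doc_vec))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # no angle is defined for a zero vector
    return 100.0 * dot / (norm_q * norm_d)

# Hypothetical vectors: one query word, two sections, plus the trailing
# word-distance coordinate, which is 0 for the query vector.
query = [2.0, 1.0, 0.0]
same_direction = [2.0, 1.0, 0.0]   # word found in both sections
second_only = [0.0, 1.0, 0.0]      # word found only in the second section

full_match = cosine_relevancy(query, same_direction)
partial_match = cosine_relevancy(query, second_only)
```

In this sketch a document whose vector points in the same direction as the query vector gets the maximum relevancy, and a document that matches in fewer sections gets a lower value.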

Since sections are defined only in the indexer.conf file, use the NumSections command in searchd.conf or in search.htm to specify the number of sections used. By default, this value is 256. Note that NumSections does not affect document ordering, only the relevancy value.

Popularity rank

DataparkSearch supports two methods of popularity rank calculation. The method used in previous versions is called "Goo", and the new method is called "Neo". By default, the Goo method is used. To select the desired PopRank calculation method, use the PopRankMethod command:


PopRankMethod Neo

For the Neo method, and for full functionality of the Goo method, you need to enable links collection with the CollectLinks yes command in your indexer.conf file. This slows indexing down a bit. By default, links collection is not enabled.

If you place the PopRankSkipSameSite yes command in the indexer.conf file, indexer will take only inter-site links (i.e. links from a page on one site to a page on another site) into account for popularity rank calculation.
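
The inter-site restriction can be sketched as a simple filter over collected links. This is an illustration only: the helper name inter_site_links is hypothetical, and treating the URL host name as the "site" is an assumption, not a statement about how indexer identifies sites.

```python
from urllib.parse import urlsplit

def inter_site_links(links):
    """Keep only (source, target) pairs whose host names differ.

    Assumption: a "site" is identified by the URL host name.
    """
    return [
        (src, dst)
        for src, dst in links
        if urlsplit(src).hostname != urlsplit(dst).hostname
    ]

# Hypothetical collected links.
collected = [
    ("http://a.example/1", "http://a.example/2"),  # same site, skipped
    ("http://a.example/1", "http://b.example/x"),  # inter-site, kept
]
kept = inter_site_links(collected)
```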

"Goo" popularity rank calculation method

The popularity rank is calculated in two stages. At the first stage, the value of the Weight parameter for every server is divided by the number of links from this server. This gives the weight of one link from this server. At the second stage, for every page, the weights of all links pointing to this page are summed. This sum is the popularity rank of the page.

By default, the value of the Weight parameter is equal to 1 for all indexed servers. You may change this value with the Weight command in the indexer.conf file or directly in the server table, if you load the servers configuration from this table.
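
The two stages can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the indexer's implementation: the function name goo_popularity is hypothetical, a server is identified here by the URL host name, and any server without an explicit weight gets the default Weight of 1.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def goo_popularity(links, server_weight=None):
    """links: list of (source_url, target_url) pairs.
    server_weight: optional dict mapping host name to its Weight value.
    """
    server_weight = server_weight or {}

    # Stage 1: weight of one link = server Weight / number of links from it.
    out_count = defaultdict(int)
    for src, _ in links:
        out_count[urlsplit(src).hostname] += 1
    link_weight = {
        host: server_weight.get(host, 1.0) / count  # default Weight is 1
        for host, count in out_count.items()
    }

    # Stage 2: rank of a page = sum of the weights of links pointing to it.
    rank = defaultdict(float)
    for src, dst in links:
        rank[dst] += link_weight[urlsplit(src).hostname]
    return dict(rank)

# Hypothetical link graph: four links, all from the server a.example.
links = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://a.example/2", "http://b.example/x"),
    ("http://a.example/1", "http://c.example/y"),
    ("http://a.example/2", "http://c.example/y"),
]
default_rank = goo_popularity(links)
boosted_rank = goo_popularity(links, {"a.example": 2.0})
```

With the default Weight, each of the four links carries 1/4, so both target pages receive 0.5. Doubling the Weight of a.example doubles the weight of every link from it.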

If you place the PopRankFeedBack yes command in the indexer.conf file, indexer will calculate site weights before the page rank calculation. To do that, indexer calculates the sum of the popularity ranks of all pages from the same site. If this sum is greater than 1, the site weight is set to this sum; otherwise, the site weight is set to 1.
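
The feedback rule can be sketched as below. The function name feedback_site_weights is hypothetical, and grouping pages by URL host name is an assumption made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def feedback_site_weights(page_rank):
    """page_rank: dict mapping page URL to its current popularity rank.

    Returns a dict mapping site (host name, by assumption) to its weight:
    the sum of its pages' ranks if that sum exceeds 1, otherwise 1.
    """
    totals = defaultdict(float)
    for url, rank in page_rank.items():
        totals[urlsplit(url).hostname] += rank
    return {
        site: (total if total > 1.0 else 1.0)
        for site, total in totals.items()
    }

# Hypothetical popularity ranks.
ranks = {
    "http://a.example/1": 0.75,
    "http://a.example/2": 0.75,
    "http://b.example/x": 0.25,
}
weights = feedback_site_weights(ranks)
```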

If you place the PopRankUseTracking yes command in the indexer.conf file, indexer will calculate the site weight as the number of tracked queries with a restriction on this site.

If you place the PopRankUseShowCnt yes command in the search.htm (or searchd.conf) file, then for every result shown to the user the corresponding url.shows value is increased by 1, provided the relevancy of this result is greater than or equal to the value specified by the PopRankShowCntRatio command (default value is 25.0). If you place PopRankUseShowCnt yes in the indexer.conf file, indexer will add to the URL's PopularityRank the value of url.shows multiplied by the value specified by the PopRankShowCntWeight command (default value is 0.01).
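
The two halves of this mechanism, counting shows at search time and adding the bonus at indexing time, can be sketched as follows. The function names and the in-memory dictionaries are hypothetical stand-ins for the url.shows column and the stored PopularityRank; only the default values 25.0 and 0.01 come from the text above.

```python
def record_shows(results, shows, ratio=25.0):
    """Search side: count a show for every result with relevancy >= ratio.

    results: list of (url, relevancy) pairs shown to the user.
    shows:   dict mapping url to its show counter (updated in place).
    """
    for url, relevancy in results:
        if relevancy >= ratio:
            shows[url] = shows.get(url, 0) + 1
    return shows

def add_show_bonus(rank, shows, weight=0.01):
    """Indexer side: add shows * weight to every URL's popularity rank."""
    return {url: r + shows.get(url, 0) * weight for url, r in rank.items()}

# Hypothetical result page: u2 falls below the relevancy threshold.
shows = record_shows([("u1", 80.0), ("u2", 10.0), ("u3", 25.0)], {})
adjusted = add_show_bonus({"u1": 0.5, "u2": 0.5}, {"u1": 50})
```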

"Neo" popularity rank calculation method

This method assumes that all pages are neurons and that links between pages are links between neurons. This makes it possible to use an error back-propagation algorithm to train this neural network. The popularity rank of a page is the activity level of the corresponding neuron.

You may use the PopRankNeoIterations command to specify the number of iterations of the Neo popularity rank calculation. The default value is 3.

Boolean search

Please note that when performing a boolean search for two or more words, you have to enter the operators (&, |, ~) explicitly. I.e. it is necessary to enter "a & book" instead of "a book" (without the quotation marks).
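
If queries are built by a front-end script, the operator can be inserted programmatically. The helper below is a hypothetical sketch, not part of DataparkSearch; it only joins the words with one of the operators listed above.

```python
def boolean_query(words, operator="&"):
    """Join query words with an explicit boolean operator."""
    if operator not in ("&", "|", "~"):
        raise ValueError("operator must be one of &, |, ~")
    return f" {operator} ".join(words)

query_string = boolean_query(["a", "book"])
```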

Crosswords

This feature assigns the words between <a href="xxx"> and </a> also to the document that the link leads to. It works in SQL database mode and is not supported in cache mode. To enable Crosswords, use the CrossWords yes command in indexer.conf and search.htm.
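
The idea can be illustrated with a small Python sketch that collects the words inside each link and groups them by the target URL. It uses the standard html.parser module; the class name AnchorWords and the sample page are hypothetical, and the sketch does not reflect how indexer stores crosswords internally.

```python
from html.parser import HTMLParser

class AnchorWords(HTMLParser):
    """Collect the words found between <a href="..."> and </a>."""

    def __init__(self):
        super().__init__()
        self.words = {}    # target URL -> list of words from link text
        self._href = None  # href of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._href:
            self.words.setdefault(self._href, []).extend(data.split())

# Hypothetical page containing one link.
parser = AnchorWords()
parser.feed(
    '<p>See the <a href="http://b.example/x">search engine manual</a> '
    "for details.</p>"
)
parser.close()
crosswords = parser.words
```

Here the words "search", "engine" and "manual" would be attributed to http://b.example/x in addition to the page that contains the link.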