Threshold estimation

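Currently, threshold estimation can be performed with the kth_threshold tool, which computes the k-th highest impact score of each term of a query. Assume that a document's score for a query is the sum of non-negative per-term scores. Then the k documents with the highest impact for a term t each have a query score at least as large as the k-th highest impact of t. At least k documents therefore reach that score, so the top-k threshold of a query is lower-bounded by the maximum of the k-th highest impact scores of the query terms.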
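To make the bound concrete, the Python sketch below computes it on a toy in-memory index and compares it with the exact top-k threshold. The index layout, the scores, and the function names are illustrative assumptions; this is not how kth_threshold is implemented.

```python
import heapq

# Toy in-memory index (hypothetical layout): term -> {doc_id: impact score}.
POSTINGS = {
    "cat": {1: 2.0, 2: 1.5, 3: 0.5},
    "dog": {2: 1.0, 3: 3.0, 4: 0.2},
}


def kth_highest(scores, k):
    """k-th highest value in scores; 0.0 if there are fewer than k values."""
    if len(scores) < k:
        return 0.0  # fewer than k candidates: no useful bound
    return heapq.nlargest(k, scores)[-1]


def term_lower_bound(postings, query, k):
    """Maximum over query terms of each term's k-th highest impact score."""
    return max(
        (kth_highest(list(postings.get(t, {}).values()), k) for t in set(query)),
        default=0.0,
    )


def exact_threshold(postings, query, k):
    """Exact top-k threshold: k-th highest summed score over all documents."""
    totals = {}
    for t in set(query):
        for doc, score in postings.get(t, {}).items():
            totals[doc] = totals.get(doc, 0.0) + score
    return kth_highest(list(totals.values()), k)


query = ["cat", "dog"]
print(term_lower_bound(POSTINGS, query, 2))  # 1.5
print(exact_threshold(POSTINGS, query, 2))   # 2.5
```

On this toy index the estimate (1.5, the second-highest impact of cat) stays below the exact threshold (2.5), as the argument requires.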

In addition to the k-th highest score for each individual term, it is possible to use the k-th highest score for certain pairs and triples of terms.
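A pair or triple can give a higher bound than any of its terms alone. The Python sketch below assumes, hypothetically, that the score of a term group for a document is the sum of its members' impacts and is defined only on documents that contain every member. Under that assumption each such document has a query score at least as large as the group score, so the group's k-th highest score is also a valid lower bound. The posting layout, function names, and the way cached groups are passed in are illustrative and are not PISA's API.

```python
import heapq

# Toy in-memory index (hypothetical layout): term -> {doc_id: impact score}.
POSTINGS = {
    "cat": {1: 2.0, 2: 1.5, 3: 0.5},
    "dog": {2: 1.0, 3: 3.0, 4: 0.2},
}


def kth_highest(scores, k):
    """k-th highest value in scores; 0.0 if there are fewer than k values."""
    if len(scores) < k:
        return 0.0
    return heapq.nlargest(k, scores)[-1]


def group_kth(postings, terms, k):
    """k-th highest summed score over documents containing every term in terms."""
    lists = [postings.get(t, {}) for t in terms]
    common = set(lists[0]).intersection(*lists[1:])
    return kth_highest([sum(pl[d] for pl in lists) for d in common], k)


def estimate(postings, query, k, cached_groups=()):
    """Max of single-term bounds and bounds of cached groups fully inside the query."""
    bound = max((group_kth(postings, (t,), k) for t in set(query)), default=0.0)
    query_terms = set(query)
    for group in cached_groups:
        if set(group) <= query_terms:  # only groups made of query terms apply
            bound = max(bound, group_kth(postings, group, k))
    return bound


print(estimate(POSTINGS, ["cat", "dog"], 2))                                # 1.5
print(estimate(POSTINGS, ["cat", "dog"], 2, cached_groups=[("cat", "dog")]))  # 2.5
```

With the pair (cat, dog) treated as cached, the bound rises from 1.5 to 2.5, which on this toy data happens to equal the exact top-2 threshold.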

To perform threshold estimation, use the kth_threshold command:

A tool for performing threshold estimation using the k-highest impact score for each term, pair or triple of a query. Pairs and triples are only used if provided with --pairs and --triples respectively.
Usage: ./bin/kth_threshold [OPTIONS]

Options:
  -h,--help                   Print this help message and exit
  -e,--encoding TEXT REQUIRED Index encoding
  -i,--index TEXT REQUIRED    Inverted index filename
  -w,--wand TEXT REQUIRED     WAND data filename
  --compressed-wand Needs: --wand
                              Compressed WAND data file
  --tokenizer TEXT:{english,whitespace}=english
                              Tokenizer
  -H,--html UINT=0            Strip HTML
  -F,--token-filters TEXT:{krovetz,lowercase,porter2} ...
                              Token filters
  --stopwords TEXT            Path to file containing a list of stop words to filter out
  -q,--queries TEXT           Path to file with queries
  --terms TEXT                Term lexicon
  --weighted                  Weights scores by query frequency
  -k INT REQUIRED             The number of top results to return
  -s,--scorer TEXT REQUIRED   Scorer function
  --bm25-k1 FLOAT Needs: --scorer
                              BM25 k1 parameter.
  --bm25-b FLOAT Needs: --scorer
                              BM25 b parameter.
  --pl2-c FLOAT Needs: --scorer
                              PL2 c parameter.
  --qld-mu FLOAT Needs: --scorer
                              QLD mu parameter.
  -L,--log-level TEXT:{critical,debug,err,info,off,trace,warn}=info
                              Log level
  --config TEXT               Configuration .ini file
  -p,--pairs TEXT Excludes: --all-pairs
                              A tab separated file containing all the cached term pairs
  -t,--triples TEXT Excludes: --all-triples
                              A tab separated file containing all the cached term triples
  --all-pairs Excludes: --pairs
                              Consider all term pairs of a query
  --all-triples Excludes: --triples
                              Consider all term triples of a query
  --quantized                 Quantizes the scores
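The constraints in the help output above (the REQUIRED options, and --pairs/--all-pairs and --triples/--all-triples being mutually exclusive) can be captured in a small Python sketch that assembles an argument list. The helper is hypothetical and not part of PISA; the file names and the values block_simdbp and bm25 are placeholders, so check which encodings and scorers your build supports.

```python
# Hypothetical helper (not part of PISA) that builds an argv list for
# ./bin/kth_threshold using only options shown in its help output.
def kth_threshold_args(encoding, index, wand, k, scorer, queries=None,
                       pairs=None, all_pairs=False,
                       triples=None, all_triples=False,
                       binary="./bin/kth_threshold"):
    # Mirror the "Excludes:" constraints from the help text.
    if pairs is not None and all_pairs:
        raise ValueError("--pairs and --all-pairs are mutually exclusive")
    if triples is not None and all_triples:
        raise ValueError("--triples and --all-triples are mutually exclusive")
    # Options marked REQUIRED in the help text.
    args = [binary, "-e", encoding, "-i", index, "-w", wand,
            "-k", str(k), "-s", scorer]
    if queries is not None:
        args += ["-q", queries]
    if pairs is not None:
        args += ["--pairs", pairs]
    elif all_pairs:
        args.append("--all-pairs")
    if triples is not None:
        args += ["--triples", triples]
    elif all_triples:
        args.append("--all-triples")
    return args


args = kth_threshold_args("block_simdbp", "index.bin", "index.wand", 10, "bm25",
                          queries="queries.txt", all_pairs=True)
print(" ".join(args))
```

The resulting list can be passed to subprocess.run(args, check=True), which requires a built kth_threshold binary and an existing index and WAND data file.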

Use --all-pairs and --all-triples to treat every pair and every triple of terms in a query as if it had been previously cached, instead of providing a file with --pairs or --triples.
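For a query with n distinct terms, considering all pairs and triples means C(n, 2) pairs and C(n, 3) triples, so the number of triples grows roughly as n^3/6 with query length. The Python sketch below enumerates them; collapsing duplicate terms and keeping query order are assumptions, not necessarily what kth_threshold does.

```python
from itertools import combinations


def all_pairs_and_triples(query):
    """Every 2- and 3-term combination of the distinct terms of a query."""
    # Assumption: duplicate terms are collapsed and first-occurrence order is kept.
    terms = list(dict.fromkeys(query))
    return list(combinations(terms, 2)), list(combinations(terms, 3))


pairs, triples = all_pairs_and_triples(["new", "york", "city", "hotels"])
print(len(pairs), len(triples))  # 6 4
print(pairs[0], triples[0])      # ('new', 'york') ('new', 'york', 'city')
```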