Mspm - Multi-String Pattern Matching Algorithm for information retrieval

README.md

Multi-String Pattern Matching algorithm.

This implementation is inspired from Aho-Corasick algorithm

Getting Started

modelA = mspm.NewModel("mspm_model_A")
patternsToSearch = strings.NewReader(words) // words is newline seperated list of words/keywords

// build mspm model with patterns
modelA.Build(patternsToSearch)

inputString := "input document where the above pattern is searched for."
document := strings.NewReader(inputString)
output, err := modelA.MultiTermMatch(document)

// output ~= [{matched_word: n_count}, ..]

Test Coverage

TrieNode vs TrieHashNode benchmark

$ cd github.com/BlackRabbitt/mspm/ds/trie
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/BlackRabbitt/mspm/ds/trie
BenchmarkTrieNodeInsert-4             500000          2582 ns/op
BenchmarkTrieNodeSearch-4           10000000           205 ns/op
BenchmarkTrieHashNodeInsert-4        1000000          1491 ns/op
BenchmarkTrieHashNodeSearch-4       10000000           206 ns/op
PASS
ok      github.com/BlackRabbitt/mspm/ds/trie	8.795s

Resources

  1. Trie
  2. mspm - Multi-String Pattern Matching algorithm. Generally used for Information Retrieval.
  3. Aho-Corasick algorithm

Nguồn : GitHub