Reddit Reddit reviews Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences

We found 2 Reddit comments about Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences. Here are the top ones, ranked by their Reddit score.

Computers & Technology
Books
Networking & Cloud Computing
Computer Network Administration
Network Storage & Retrieval Administration
Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences
Check price on Amazon

2 Reddit comments about Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences:

u/librik · 13 pointsr/programming

Bit-parallel text search algorithms like this are all covered in ridiculous detail in the book Flexible Pattern Matching In Strings by Gonzalo Navarro and Matthew Raffinot. (Also don't skip the errata here.) It's a good book (if you're willing to accept that it's biased propaganda for bitwise string searching while claiming to be an impartial recipe book for all string search algorithms) which even deals with fuzzy approximate text matching using these techniques.

What makes it so interesting is that this is the first time I've seen the full extent of bit twiddling hacks, developed in theoretical depth in "Hacker's Delight", really pushed extensively to solve a real problem. I mean, occasionally you would see Population Count or Clear Lowest Bit as an optimization trick in a chess program, but these guys use all of it as the basic technology for their field.

Sun Wu and Udi Manber were probably the first people to spot that, so long as the complete set of states fits inside a machine word, the CPU can be seen as simulating a Nondeterministic Finite Automaton very, very quickly. At that point, the race was on, and every new "bit hack" discovered extended the range of NFAs that could run inside a register. Combine that with code generation, as the article here does, and you've got something that runs like a bat out of hell. (But you'll notice the big problem is that it can tell you a regular expression match ends, but it can't tell you where it starts!)

u/burntsushi · 8 pointsr/rust

Aye. And personally, I'm not a huge fan of using edit distance for fuzzy searches in most cases. I've found n-gram (with n=3 or n=4) to be much more effective. And you can use that in conjunction with bitap, for example, by using an n-gram index to narrow down your search space. I use an n-gram index in imdb-rename.

If you like algorithms like bitap, you'll definitely like Flexible Pattern Matching in Strings, which covers generalizations of bitaps for regexes themselves, for both Thompson and Glushkov automata.