Netrics Matching Engine – Technical Overview

The Netrics Matching Engine is an in-memory database search application that can be attached to virtually any data source, including Oracle, Microsoft SQL Server, IBM DB2, MySQL, and many others. The search function is integrated into applications via a standard API using common programming languages including Java, .NET, Python, and C/C++. The software runs natively under Linux, Windows, and all major UNIX platforms, on 32- and 64-bit processors. The engine can provide sustained, real-time, highly accurate search for small, medium, large, and very large databases (700 million records in one case).

The engine architecture allows all requests to be load balanced across application instances and partitioned to handle a database of any size (billions of records) with sub-second latency. Multi-threaded, federated queries are possible, so you can take advantage of a wide range of server environments, data schemas, and business application needs.
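
To make the partitioning idea concrete, here is a minimal Python sketch of a federated query: each partition is searched on its own thread and the partial results are merged into one ranked list. Everything in it is an assumption for illustration. The function names are hypothetical, the partitions are plain in-memory lists, and the scoring uses Python's difflib as a stand-in; it is not the engine's API or its scoring model.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
import heapq


def score(query, record):
    """Stand-in similarity in [0, 1] (difflib), not the engine's model."""
    return SequenceMatcher(None, query.lower(), record.lower()).ratio()


def search_partition(query, partition, k):
    """Return the k best (score, record) pairs from a single partition."""
    return heapq.nlargest(k, ((score(query, r), r) for r in partition))


def federated_search(query, partitions, k=3):
    """Query every partition in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda p: search_partition(query, p, k), partitions)
        merged = [hit for partial in partials for hit in partial]
    # The global top k is always contained in the union of the partial top k.
    return heapq.nlargest(k, merged)


# Hypothetical data split across two partitions.
partitions = [
    ["John Smith", "Jon Smyth", "Alice Jones"],
    ["Jonathan Smith", "Bob Brown"],
]
results = federated_search("Jon Smith", partitions, k=2)
for s, record in results:
    print(f"{s:.3f}  {record}")
```

Because each partition only has to return its own best k candidates, the merge step stays small no matter how many records each partition holds.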

Architectural Overview

Throughput is not limited by the engine itself. Rather, it’s a matter of how much processing power is configured, and any commodity hardware will do. One of our largest implementations has 700 million records and processes 25 queries per second around the clock on a relatively modest blade server infrastructure. The highly compact engine is contained within a single executable of about 1 MB, which makes deployment possible on any size platform.

The engine uses advanced mathematical modeling and bipartite graph based matching to calculate similarity scores. The really clever (patented) part is how it processes extremely large numbers of match calculations in very short amounts of time on standard hardware. The result is that the engine can distinguish between patterns of data in ways that strict SQL-type search and other types of fuzzy matching simply cannot. The engine is completely agnostic as to the type of data or domain. It doesn’t make any assumptions about whether your data is name and address, product data, medical records or Chinese characters. Spoken languages can be intermixed. Its cultural and domain independence allows you to deploy the engine within hours without any prior knowledge of the type, structure or state of the data.
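
For readers unfamiliar with the term, the general idea of bipartite matching as a similarity measure can be sketched in a few lines of Python. This is a textbook illustration only, not the patented Netrics algorithm: the tokens of the query form one side of a graph, the tokens of the record form the other, and the score comes from the best one-to-one pairing between them. The function names are hypothetical, token similarity is borrowed from Python's difflib, and the brute-force search is practical only for a handful of tokens.

```python
from difflib import SequenceMatcher
from itertools import permutations


def token_similarity(a, b):
    """Character-level similarity of two tokens, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def bipartite_score(query, record):
    """Score two strings by the best one-to-one pairing of their tokens.

    Each token on the shorter side is paired with a distinct token on the
    longer side; edge weights are token similarities. All pairings are
    tried, so this only suits short strings.
    """
    q, r = query.split(), record.split()
    if not q or not r:
        return 0.0
    short, long_ = (q, r) if len(q) <= len(r) else (r, q)
    best = max(
        sum(token_similarity(s, t) for s, t in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    # Normalise by the longer side so unmatched tokens lower the score.
    return best / len(long_)


print(bipartite_score("John Smith", "Smith John"))      # word order ignored
print(bipartite_score("John Smith", "Smith, John A."))  # extra token, punctuation
```

A strict equality test treats "John Smith" and "Smith John" as different values, whereas a pairing-based score does not depend on token order at all.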

The four live demos on this site have had no preprocessing, cleaning, matching rules, scrubbing or normalizing whatsoever. All four data sources are running on a single instance of the Netrics Matching Engine, on a commodity single-CPU, dual-core server (Dell PowerEdge 1950) running Red Hat Linux.

This all means that you don't have to build rules or perform data profiling or normalization in order to find and capture meaningful information from your data sources. Connecting the engine to all your EXISTING applications requires as few as 10-15 lines of code.

Are we saying that we don't use rule sets, that they just don't matter?

Yes. It doesn't matter at all. If you like, the engine does provide for cross-token, cross-field matching, back and forth across the field boundaries, any way you choose. You also have fine-grained control over which fields are used for which part of the query, how they are combined and how they relate to each other. You can also define the individual field or token sensitivity, weighting, and many other parameters that control the matching process. But in most applications, very little tuning is required to obtain extremely accurate results.
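
As a rough illustration of what field weighting means, the Python sketch below combines per-field similarity scores into one record score using a weight per field. The field names, the weights, and the functions are all hypothetical, and the per-field similarity comes from Python's difflib. The engine's actual parameters and scoring are not shown here.

```python
from difflib import SequenceMatcher


def field_score(a, b):
    """Stand-in similarity of two field values, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_score(query, record, weights):
    """Weighted average of per-field similarities.

    `weights` maps a field name to its relative importance; a field that
    is missing from the record is compared against an empty string.
    """
    total = sum(weights.values())
    return sum(
        w * field_score(query[f], record.get(f, "")) for f, w in weights.items()
    ) / total


query = {"name": "John Smith", "city": "Boston"}
record = {"name": "John Smith", "city": "Austin"}

name_heavy = weighted_score(query, record, {"name": 3.0, "city": 1.0})
city_heavy = weighted_score(query, record, {"name": 1.0, "city": 3.0})
print(f"name-weighted: {name_heavy:.3f}  city-weighted: {city_heavy:.3f}")
```

The same pair of records scores higher when the matching field (name) carries more weight than the mismatched one (city), which is the kind of effect a weighting parameter is meant to control.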

For every query, the engine returns a result set (however many records you like), with each record ranked and scored according to its similarity to the search request. The engine provides not only a total score across all fields but can also provide individual scoring at the field and character level. A standard feature provides HTML tokens embedded in the result record data that can visually highlight which portions of the data records, at the field and character level, contributed most to the match and to what degree.
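
The shape of such a result set can be sketched in Python as follows: records are scored, sorted, and returned with the matching characters wrapped in an HTML tag. This is an assumption-laden illustration, not the engine's output format. The function names and the choice of the <b> tag are hypothetical, and the character matching again relies on Python's difflib.

```python
from difflib import SequenceMatcher
from html import escape


def highlight(query, text, tag="b"):
    """Wrap the characters of `text` that match `query` in an HTML tag.

    Assumes lower() preserves string length, which holds for ASCII text.
    """
    matcher = SequenceMatcher(None, query.lower(), text.lower())
    out, pos = [], 0
    for block in matcher.get_matching_blocks():
        if block.size == 0:  # the final block is a zero-length sentinel
            continue
        out.append(escape(text[pos:block.b]))
        matched = escape(text[block.b:block.b + block.size])
        out.append(f"<{tag}>{matched}</{tag}>")
        pos = block.b + block.size
    out.append(escape(text[pos:]))
    return "".join(out)


def ranked_results(query, records, limit=10):
    """Return up to `limit` (score, highlighted record) pairs, best first."""
    scored = [
        (SequenceMatcher(None, query.lower(), r.lower()).ratio(), r)
        for r in records
    ]
    scored.sort(reverse=True)
    return [(round(s, 3), highlight(query, r)) for s, r in scored[:limit]]


for s, html_record in ranked_results("Jon Smith", ["Alice Jones", "John Smith"]):
    print(s, html_record)
```

Escaping the record text before adding the tags keeps any markup characters in the data from being interpreted by the browser.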

Due to its simplicity of implementation combined with its flexible search capability, it was easy for us to provide a full working demo on this web site – no other technology can do this. Please let us know if you would like to try our engine with your own data via our cloud computing option.
