
Entity resolution

ℹ️ Entity Resolution is currently only available in Linkurious Enterprise v4.1.12 and later, as a beta feature.

If you would like to try this feature, please get in touch.

Feature overview

The Entity Resolution feature lets you automatically detect duplicate records of people or organizations in your data. It can explain why two records were identified as the same entity, and display duplicate records as a single node while you explore the graph.

All Linkurious Enterprise licenses include a discovery package that lets you ingest up to 250k records for entity detection. If you would like to ingest more records, please get in touch with your account executive.

Technical overview

Entity resolution is provided by an additional service that must be installed and run independently of the Linkurious Enterprise service. Internally, the Entity Resolution service uses the Senzing library to resolve records to entities. To use it, you need to install it, start it, and set its URL in the Linkurious Enterprise configuration.

Technical requirements

The Entity Resolution service must be run with docker. It is not a standalone product: it requires Linkurious Enterprise.

Hardware

For a local testing environment:

  • Linux or macOS
  • At least 4 GB of RAM.
  • A modern x86_64 CPU with SSE2 and AVX support (ARM / Apple Silicon CPUs are not supported).

For a production environment:

  • Linux
  • At least 16 GB of RAM.
  • At least 4 modern x86_64 CPU cores, with SSE2 and AVX support.
  • A connection to a database server that uses solid-state drives (SSD) or NVMe storage (provision approximately 1 GB of storage per 50k records).

For more details, please refer to Senzing's technical documentation.

Local testing setup

⚠️ This configuration is meant for quick local testing and should not be used in production:

  • It does not include data persistence, authentication, or network security.
  • It requires Linkurious Enterprise to be running locally.

For production deployments, please refer to the "Production setup" section.

The following steps will help you run Linkurious Entity Resolution using the docker command line.

ℹ️ In the following commands, please replace LINKURIOUS_PRIVATE_REGISTRY, YOUR_EMAIL and YOUR_DOWNLOAD_KEY with the values found in the "download" section of the customer center.

First, download the Entity Resolution service:

  • docker login -u 'YOUR_EMAIL' -p 'YOUR_DOWNLOAD_KEY' LINKURIOUS_PRIVATE_REGISTRY
  • docker pull --platform linux/x86_64 LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0

Then, start the Entity Resolution service:

docker run -p 8080:8080 --platform linux/x86_64 LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0

Then, check that the service is running by opening this URL in your browser: http://localhost:8080/status

You should see a response similar to the following, where "uptime" is the number of seconds since the service started:

{"status":"API is up","uptime":10}
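If you prefer to script this check instead of using a browser, here is a minimal Python sketch (standard library only) that polls the /status endpoint until it returns the response shown above. The helper names, the 60-second timeout and the 2-second polling interval are illustrative choices, not part of the product.

```python
import json
import time
import urllib.request

# URL from the local testing setup; change the port if you published a different one.
STATUS_URL = "http://localhost:8080/status"

# Illustrative helper names; they are not part of the Entity Resolution service.


def parse_status(body: str) -> int:
    """Return the reported uptime (seconds) if the body says the API is up."""
    payload = json.loads(body)  # raises ValueError (JSONDecodeError) on non-JSON bodies
    if payload.get("status") != "API is up":
        raise ValueError(f"unexpected status response: {payload!r}")
    return int(payload["uptime"])


def wait_until_up(url: str = STATUS_URL, timeout_s: float = 60.0, interval_s: float = 2.0) -> int:
    """Poll `url` until the service reports it is up, then return its uptime."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                return parse_status(response.read().decode("utf-8"))
        except (OSError, ValueError, KeyError):
            # OSError covers refused connections, timeouts and HTTP errors (URLError/HTTPError).
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval_s)
```

For example, call `wait_until_up()` right after starting the container: it returns the reported uptime, or re-raises the last error once the timeout expires.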

Then, in Linkurious Enterprise, open the "Global configuration" page and set the following values in the "Entity resolution" section:

{
   "enabled": true,
   "url": "http://localhost:8080"
}

Finally, in Linkurious Enterprise, select the "Entity Resolution" entry in the main menu to configure a data mapping and start ingesting records.

(Screenshot: the "Entity Resolution" entry in the main menu.)

Persistence for testing

The above steps are meant for local testing and do not include data persistence, which leads to inconsistent results as soon as the service is restarted.

By default, the service uses an SQLite database located within the container filesystem at /usr/src/app/project/var/sqlite/G2C.db. The easiest way to enable persistence is to configure a persistent volume mapping for this file.

The following example mounts a host directory at that location; replace /path/to/host with a valid path on your system:

docker run -p 8080:8080 --platform linux/x86_64 -v /path/to/host/sqlite:/usr/src/app/project/var/sqlite LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0

Please refer to the docker documentation to learn how to configure named volumes. You can also use another database vendor; see the "Persistence for production" section.

Production setup

In production, you must configure persistence, and we strongly recommend enabling authentication and using private networking.

Persistence for production

In production, we recommend setting up persistence using MySQL or MSSQL. To configure persistence, you need to set environment variables when starting the linkurious-entity-resolution docker container.

MySQL

Example configuration for MySQL:

SENZING_DATABASE_VENDOR: 'mysql'
SENZING_DATABASE_HOST: '127.0.0.1'
SENZING_DATABASE_PORT: '3306'
SENZING_DATABASE_USER: 'my-database-user'
SENZING_DATABASE_NAME: 'my-database-name'
SENZING_DATABASE_PASSWORD: 'my-database-password'
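These settings are passed as environment variables, for instance with one `-e NAME=value` flag per variable on `docker run`. As an illustration, here is a minimal Python sketch that assembles such a command from the MySQL example values. The `docker_run_args` helper is hypothetical, `-d` (run detached) is an illustrative choice, and `LINKURIOUS_PRIVATE_REGISTRY` remains a placeholder to replace as in the local testing setup.

```python
import shlex

# Image from the local testing setup; replace LINKURIOUS_PRIVATE_REGISTRY with your registry.
IMAGE = "LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0"

# Values from the MySQL example above (placeholders, not real credentials).
MYSQL_ENV = {
    "SENZING_DATABASE_VENDOR": "mysql",
    "SENZING_DATABASE_HOST": "127.0.0.1",
    "SENZING_DATABASE_PORT": "3306",
    "SENZING_DATABASE_USER": "my-database-user",
    "SENZING_DATABASE_NAME": "my-database-name",
    "SENZING_DATABASE_PASSWORD": "my-database-password",
}


def docker_run_args(env: dict[str, str], host_port: int = 8080, image: str = IMAGE) -> list[str]:
    """Hypothetical helper: build a detached `docker run` command with one `-e` flag per setting."""
    args = ["docker", "run", "-d", "-p", f"{host_port}:8080", "--platform", "linux/x86_64"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args


# Print a shell-quoted command line that you can review before running it.
print(shlex.join(docker_run_args(MYSQL_ENV)))
```

Passing the list to `subprocess.run(..., check=True)` starts the container. Keep in mind that inside a container, `127.0.0.1` refers to the container itself: unless the container uses host networking, `SENZING_DATABASE_HOST` should be an address of the database server that is reachable from the container.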

MSSQL

SENZING_DATABASE_TRUST_CERTIFICATE is optional and only supported with MSSQL. When it is set to true, connections to a database server that uses a self-signed certificate are allowed. We recommend setting this variable to false in production.

Example configuration for MSSQL:

SENZING_DATABASE_VENDOR: 'mssql'
SENZING_DATABASE_HOST: '127.0.0.1'
SENZING_DATABASE_PORT: '1433'
SENZING_DATABASE_USER: 'my-database-user'
SENZING_DATABASE_NAME: 'my-database-name'
SENZING_DATABASE_PASSWORD: 'my-database-password'
SENZING_DATABASE_TRUST_CERTIFICATE: 'true'
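Instead of listing each variable on the command line, you can store the settings in a file and pass it with docker's `--env-file` option. Docker reads one `NAME=value` per line and does not strip quotes from values, so the quotes used in the examples above should not be copied into the file. The following minimal Python sketch renders such a file for MSSQL and rejects SENZING_DATABASE_TRUST_CERTIFICATE for other vendors. The helper names are hypothetical, and treating all six connection settings as required is an assumption based on the examples.

```python
from pathlib import Path

# Assumption: the six connection settings shown in the examples are all required.
REQUIRED = (
    "SENZING_DATABASE_VENDOR",
    "SENZING_DATABASE_HOST",
    "SENZING_DATABASE_PORT",
    "SENZING_DATABASE_USER",
    "SENZING_DATABASE_NAME",
    "SENZING_DATABASE_PASSWORD",
)

# Values from the MSSQL example, with certificate trust disabled as recommended for production.
MSSQL_ENV = {
    "SENZING_DATABASE_VENDOR": "mssql",
    "SENZING_DATABASE_HOST": "127.0.0.1",
    "SENZING_DATABASE_PORT": "1433",
    "SENZING_DATABASE_USER": "my-database-user",
    "SENZING_DATABASE_NAME": "my-database-name",
    "SENZING_DATABASE_PASSWORD": "my-database-password",
    "SENZING_DATABASE_TRUST_CERTIFICATE": "false",
}


def render_env_file(env: dict[str, str]) -> str:
    """Hypothetical helper: validate the settings and return `--env-file` content."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    if "SENZING_DATABASE_TRUST_CERTIFICATE" in env and env["SENZING_DATABASE_VENDOR"] != "mssql":
        raise ValueError("SENZING_DATABASE_TRUST_CERTIFICATE is only supported with mssql")
    # One NAME=value per line, unquoted: docker would keep quotes as part of the value.
    return "".join(f"{name}={value}\n" for name, value in env.items())


def write_env_file(path: Path, env: dict[str, str]) -> None:
    """Hypothetical helper: write the validated settings, which include the database password."""
    path.write_text(render_env_file(env), encoding="utf-8")
    path.chmod(0o600)  # owner read/write only (effective on POSIX systems)
```

For example, call `write_env_file(Path("er-mssql.env"), MSSQL_ENV)`, then start the container with `docker run --env-file er-mssql.env -p 8080:8080 --platform linux/x86_64 LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0`.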

Authentication

You can enable authentication so that the Entity Resolution service only accepts API calls that include an API key.

To enable authentication:

  1. Start the linkurious-entity-resolution docker container with a LINKURIOUS_ENTITY_RESOLUTION_API_KEY environment variable (the key must be at least 32 characters long).
  2. In Linkurious Enterprise, in the "Global Configuration", edit the "Entity Resolution" section to use the same API Key:
    {
      "enabled": true,
      "url": "http://my-linkurious-entity-resolution-server.svc:8080",
      "serviceApiKey": "your api key here"
    }
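The only stated requirement for the key is a minimum length of 32 characters. One way to produce such a key is the standard `secrets` module; the minimal Python sketch below generates a key and renders the matching "Entity Resolution" configuration section. The helper names and the example URL are illustrative.

```python
import json
import secrets

MIN_KEY_LENGTH = 32  # minimum key length required by the Entity Resolution service


def generate_api_key(num_bytes: int = 32) -> str:
    """Illustrative helper: return a random hex key (two characters per byte, so 64 by default)."""
    key = secrets.token_hex(num_bytes)
    if len(key) < MIN_KEY_LENGTH:
        raise ValueError(f"the API key must be at least {MIN_KEY_LENGTH} characters long")
    return key


def entity_resolution_config(url: str, api_key: str) -> str:
    """Illustrative helper: render the "Entity Resolution" section of the Global configuration."""
    return json.dumps({"enabled": True, "url": url, "serviceApiKey": api_key}, indent=2)


api_key = generate_api_key()
# Use this value as LINKURIOUS_ENTITY_RESOLUTION_API_KEY on the container,
# and the same value as "serviceApiKey" in Linkurious Enterprise:
print(entity_resolution_config("http://my-linkurious-entity-resolution-server.svc:8080", api_key))
```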
    

Networking

The docker image exposes port 8080 for HTTP connections, but you can publish it on a different host port with docker port mapping (for example, -p 9090:8080). Please refer to the docker documentation to learn how to publish the ports of a container.

In production, it is strongly advised to have a private network between Linkurious Enterprise and the Entity Resolution service.