Entity resolution
ℹ️ Entity Resolution is currently only available in Linkurious Enterprise v4.1.12 and later, as a beta feature.
If you would like to try this feature, please get in touch.
Feature overview
The Entity Resolution feature allows you to automatically detect duplicate records for people or organizations in your data. It can show details about why two records have been detected as the same entity, and display duplicate records as a single node as you explore the graph.
All Linkurious Enterprise licenses include a discovery package that allows you to ingest up to 250k records for entity detection. If you would like to purchase the ability to ingest more records, please get in touch with your account executive.
Technical overview
The Entity Resolution feature relies on an additional service that needs to be installed and run independently of the Linkurious Enterprise service. The Entity Resolution service uses the Senzing library internally to resolve records to entities. To use it, you need to install it, start it, and set its URL in the Linkurious Enterprise configuration.
Technical requirements
The Entity Resolution service needs to be run with Docker. It is not standalone and requires Linkurious Enterprise.
Hardware
For a local testing environment:
- Linux or macOS
- At least 4 GB of RAM.
- A modern x86_64 CPU with SSE2 and AVX support (ARM / Apple Silicon CPUs are not supported).
For a production environment:
- Linux
- At least 16 GB of RAM.
- At least 4 modern x86_64 CPU cores, with SSE2 and AVX support.
- Connection to a database server that uses solid-state drives (SSD) or NVMe storage (provision approximately 1 GB of storage per 50K records).
For more details, please refer to Senzing's technical documentation.
Local testing setup
⚠️ This configuration is for quick local testing, and should not be used for production:
- It does not include data persistence, authentication or network security.
- It requires Linkurious Enterprise to be running locally.
For production deployments, please refer to the production setup documentation.
The following steps show how to run Linkurious Entity Resolution from the Docker command line.
ℹ️ In the following commands, please replace LINKURIOUS_PRIVATE_REGISTRY, YOUR_EMAIL and YOUR_DOWNLOAD_KEY with the values found in the "download" section of the customer center.
First, download the Entity Resolution service:
docker login -u 'YOUR_EMAIL' -p 'YOUR_DOWNLOAD_KEY' LINKURIOUS_PRIVATE_REGISTRY
docker pull --platform linux/x86_64 LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0
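docker login warns when the password is passed with -p, because the value can end up in your shell history. A sketch of the alternative using Docker's --password-stdin option, assuming the download key has been saved in a file named download-key.txt (a hypothetical name):

```shell
# Read the download key from standard input instead of the command line.
# download-key.txt is a hypothetical file containing only YOUR_DOWNLOAD_KEY.
docker login -u 'YOUR_EMAIL' --password-stdin LINKURIOUS_PRIVATE_REGISTRY < download-key.txt
```

The docker pull command stays the same.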
Then, start the Entity Resolution service:
docker run -p 8080:8080 --platform linux/x86_64 LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0
Then, check that the service is running by opening this URL in your browser: http://localhost:8080/status
You should see ("uptime" is the number of seconds since the service started):
{"status":"API is up","uptime":10}
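If you script the setup, you may prefer to wait for the service from the command line rather than a browser. The following is a minimal POSIX shell sketch, assuming curl is installed; the function name wait_for_entity_resolution is hypothetical, and its default URL matches the docker run command above.

```shell
# Hypothetical helper: poll the Entity Resolution /status endpoint until it
# answers with an HTTP 2xx status, or give up after a number of attempts.
#   $1: status URL (default: http://localhost:8080/status)
#   $2: maximum number of attempts, 2 seconds apart (default: 30)
wait_for_entity_resolution() {
  er_url="${1:-http://localhost:8080/status}"
  er_attempts="${2:-30}"
  er_i=1
  while :; do
    # -f: treat HTTP errors as failures; -sS: no progress bar, but show errors;
    # --max-time: do not hang forever on an unreachable host
    if curl -fsS --max-time 5 "$er_url"; then
      echo   # end the line after the JSON body
      return 0
    fi
    if [ "$er_i" -ge "$er_attempts" ]; then
      echo "Entity Resolution service not reachable at $er_url" >&2
      return 1
    fi
    er_i=$((er_i + 1))
    sleep 2
  done
}
```

Calling wait_for_entity_resolution right after docker run prints the same JSON status as the browser check once the service is up, and returns a non-zero exit code if it never answers.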
Then, in Linkurious Enterprise, open the "Global configuration" page and set the following values in the "Entity resolution" section:
{
"enabled": true,
"url": "http://localhost:8080"
}
Finally, in Linkurious Enterprise, select the "Entity Resolution" entry in the main menu to configure a data mapping and start ingesting records.
Persistence for testing
The above steps are meant for local testing and do not include data persistence, which leads to inconsistent results as soon as the service is restarted.
By default, the service uses an SQLite database located within the container filesystem at /usr/src/app/project/var/sqlite/G2C.db.
The easiest way to enable persistence is to configure a persistent volume mapping for this file.
The following example mounts a host directory over the directory that contains this file. Please replace /path/to/host with a valid path on your system.
docker run -p 8080:8080 --platform linux/x86_64 -v /path/to/host/sqlite:/usr/src/app/project/var/sqlite LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0
Please refer to the Docker documentation to learn how to configure named volumes. You can also configure another database vendor; see the "Persistence for production" section for details.
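As a sketch of the named-volume alternative, assuming a volume named er-sqlite (a hypothetical name), Docker manages the storage itself and the data survives container removal until the volume is deleted:

```shell
# Create a named volume (er-sqlite is a hypothetical name).
docker volume create er-sqlite

# Mount it over the directory that holds G2C.db.
docker run -p 8080:8080 --platform linux/x86_64 \
  -v er-sqlite:/usr/src/app/project/var/sqlite \
  LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0
```

Unlike a host directory mount, an empty named volume is populated with the existing contents of the target directory from the image the first time it is mounted.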
Production setup
In production, you must configure persistence, and we strongly recommend enabling authentication and using private networking.
Persistence for production
In production, we recommend setting up persistence using MySQL or MSSQL.
To configure persistence, you need to set environment variables when starting the linkurious-entity-resolution Docker container.
MySQL
Example configuration for MySQL:
SENZING_DATABASE_VENDOR: 'mysql'
SENZING_DATABASE_HOST: '127.0.0.1'
SENZING_DATABASE_PORT: '3306'
SENZING_DATABASE_USER: 'my-database-user'
SENZING_DATABASE_NAME: 'my-database-name'
SENZING_DATABASE_PASSWORD: 'my-database-password'
MSSQL
SENZING_DATABASE_TRUST_CERTIFICATE is optional, and only supported with MSSQL. When set to true, connecting to a server that uses a self-signed certificate is allowed. We recommend setting this variable to false in production.
Example configuration for MSSQL:
SENZING_DATABASE_VENDOR: 'mssql'
SENZING_DATABASE_HOST: '127.0.0.1'
SENZING_DATABASE_PORT: '1433'
SENZING_DATABASE_USER: 'my-database-user'
SENZING_DATABASE_NAME: 'my-database-name'
SENZING_DATABASE_PASSWORD: 'my-database-password'
SENZING_DATABASE_TRUST_CERTIFICATE: 'true'
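One way to pass either set of variables is to write them to a file and use docker run's --env-file option. A minimal sketch for MySQL, assuming a file named er-database.env and a database host named db.internal.example (both hypothetical). Docker env files use KEY=value lines and do not strip quotes, so the values are written without the quotes shown above. Also note that inside the container, 127.0.0.1 refers to the container itself, so SENZING_DATABASE_HOST should normally be the database server's address as reachable from the container.

```shell
# Write the database settings to an env file (hypothetical file name).
# Replace the host and credentials with your own values.
cat > er-database.env <<'EOF'
SENZING_DATABASE_VENDOR=mysql
SENZING_DATABASE_HOST=db.internal.example
SENZING_DATABASE_PORT=3306
SENZING_DATABASE_USER=my-database-user
SENZING_DATABASE_NAME=my-database-name
SENZING_DATABASE_PASSWORD=my-database-password
EOF

# The file contains credentials: keep it readable by the current user only.
chmod 600 er-database.env

docker run -p 8080:8080 --platform linux/x86_64 \
  --env-file er-database.env \
  LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0
```

For MSSQL, write the MSSQL variables to the file instead.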
Authentication
You can enable authentication to make sure that the Entity Resolution service only accepts API calls that contain an API key.
To enable authentication:
- Start the linkurious-entity-resolution Docker container with a LINKURIOUS_ENTITY_RESOLUTION_API_KEY environment variable (the key must be at least 32 characters long).
- In Linkurious Enterprise, in the "Global Configuration", edit the "Entity Resolution" section to use the same API key:
{
"enabled": true,
"url": "http://my-linkurious-entity-resolution-server.svc:8080",
"serviceApiKey": "your api key here"
}
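One way to produce a key that meets the 32-character minimum is to hex-encode random bytes. A minimal sketch using only standard tools (head, od, tr); 32 random bytes give a 64-character key:

```shell
# 32 random bytes, hex-encoded: a 64-character key.
# od -v stops od from collapsing repeated lines; tr removes spaces and newlines.
LINKURIOUS_ENTITY_RESOLUTION_API_KEY="$(head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n')"
export LINKURIOUS_ENTITY_RESOLUTION_API_KEY

# Print the key so it can be copied into "serviceApiKey".
echo "$LINKURIOUS_ENTITY_RESOLUTION_API_KEY"
```

In the same shell, you can then start the container with -e LINKURIOUS_ENTITY_RESOLUTION_API_KEY: when no value is given, Docker copies the variable from the current environment.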
Networking
The Docker image exposes port 8080 for HTTP connections, but you can publish it on another host port using Docker port mapping.
Please visit the Docker documentation to learn how to publish the ports of a container.
In production, it is strongly advised to have a private network between Linkurious Enterprise and the Entity Resolution service.
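As an illustration of both points, Docker's -p option also accepts a host IP address, so you can publish the service on another port and only on a private network interface. The address 10.0.0.5 and port 9090 below are hypothetical values:

```shell
# Publish container port 8080 on host port 9090, bound only to the host's
# private interface (10.0.0.5 is a hypothetical address).
docker run -p 10.0.0.5:9090:8080 --platform linux/x86_64 \
  LINKURIOUS_PRIVATE_REGISTRY/linkurious/linkurious-entity-resolution:1.0.0
```

With this mapping, the "url" value in the "Entity Resolution" section of the Linkurious Enterprise configuration would be http://10.0.0.5:9090.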