Search index: Incremental Indexing
Incremental indexing allows you to keep your Elasticsearch index in sync with your Neo4j graph database.
Linkurious Enterprise will index new and updated items from your database at a regular interval. This way, you avoid a complete reindex every time you update your data.
This is achieved by keeping track of a timestamp on every node and edge in the database, thereby allowing the indexer to consider only the nodes and edges with newer timestamps and consequently reducing the indexing time.
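The idea of timestamp-based selection can be illustrated with a minimal Python sketch. This is not how Linkurious Enterprise is implemented; the `Item` class, the `last_edit_timestamp` field and the `items_to_index` helper are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A node or edge, reduced to an id and a timestamp (hypothetical model)."""
    id: int
    last_edit_timestamp: int  # e.g. milliseconds since epoch (assumption)


def items_to_index(items, last_indexed_at):
    """Return only the items modified after the previous indexing run."""
    return [item for item in items if item.last_edit_timestamp > last_indexed_at]


items = [Item(1, 1000), Item(2, 2500), Item(3, 4000)]
changed = items_to_index(items, last_indexed_at=2000)
print([item.id for item in changed])  # prints [2, 3]
```

Only items 2 and 3 are selected because their timestamps are newer than the previous run, which is why an increment is cheaper than a complete reindex.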
You should consider this option if your database holds a significant number of items and needs to be updated frequently.
Requirements
1. APOC
Linkurious Enterprise relies on APOC triggers to ensure that every node and edge created or updated has a timestamp.
You need to make sure that you have installed APOC correctly and enabled APOC triggers. You can find all the information you need to install APOC in the Neo4j documentation.
You can quickly verify that you have installed everything correctly by executing the following command from your Neo4j browser:
CALL apoc.trigger.list()
2. A property name
You need to carefully choose the property that will hold the timestamp on every node and edge of your database. Linkurious Enterprise will create triggers that store a timestamp on this property for all new and updated nodes and edges.
This means that any information already stored on that property will be overwritten by the trigger.
Enabling incremental indexing with Elasticsearch
After you have installed and configured APOC, you can enable incremental indexing from the data-source configuration page or by editing the configuration file with the following options:
incrementalIndexation
You can enable incremental indexing by switching this option to true.
timestampPropertyName
You must then provide the name of the timestamp property to track during incremental indexing, e.g. "lastEditTimestamp".
After you have enabled incremental indexing for the first time, Linkurious Enterprise needs to perform a complete re-indexing of the data-source to ensure that every item has been indexed. You simply need to click on the "Start indexing" button to complete the configuration.
Linkurious Enterprise will index your data-source incrementally from that point forward using the timestamps generated by the APOC triggers.
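For readers who prefer to script the configuration change, here is a minimal Python sketch that sets the two options described above on an in-memory configuration. It rests on assumptions: the `enable_incremental_indexing` helper is hypothetical, and the layout (a top-level `dataSources` list whose entries each contain an `index` object holding these options) is assumed rather than confirmed, so check it against your own configuration file before relying on it.

```python
import json


def enable_incremental_indexing(config, property_name):
    """Set the two incremental indexing options on every data-source.

    Assumes a top-level "dataSources" list with an "index" object per entry.
    """
    for source in config.get("dataSources", []):
        index = source.setdefault("index", {})
        index["incrementalIndexation"] = True
        index["timestampPropertyName"] = property_name
    return config


# Hypothetical, stripped-down configuration used only for this example.
config = {"dataSources": [{"index": {}}]}
enable_incremental_indexing(config, "lastEditTimestamp")
print(json.dumps(config, indent=2))
```

A complete re-indexing is still required after enabling the options for the first time.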
Scheduling incremental indexing
Once you have set up incremental indexing, you can configure how often it is triggered. You can customize this schedule by adding a cron expression to your Elasticsearch configuration.
You can make this change from the configuration file located at linkurious/data/config/production.json, under datasources.index for each dataSource.
By default, incremental indexing is launched every Sunday at 12PM, but you can change this to the frequency that best suits your needs. We advise that you schedule incremental indexing to run after you have updated your database with new information. Here are some examples of cron expressions:
- Update index every 5 minutes:
"incrementalIndexationCron": "*/5 * * * *"
- Update index every 30 minutes:
"incrementalIndexationCron": "*/30 * * * *"
- Update index every hour:
"incrementalIndexationCron": "0 * * * *"
- Update index every day at 3PM:
"incrementalIndexationCron": "00 15 * * *"
- Update index twice a month (on the 1st and 15th):
"incrementalIndexationCron": "00 00 1,15 * *"
- Update index every month:
"incrementalIndexationCron": "00 00 1 * *"
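To make the expressions above concrete, here is a minimal Python sketch that checks whether a five-field cron expression (minute, hour, day of month, month, day of week) fires at a given moment. It is an illustration only, not the parser used by Linkurious Enterprise: the `field_matches` and `cron_matches` helpers are hypothetical, and they support only `*`, `*/n`, plain numbers and comma-separated lists, which is enough for the examples listed here.

```python
from datetime import datetime


def field_matches(field, value, start=0):
    """Check one cron field: '*', '*/n', a number, or a comma-separated list."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps are counted from the first valid value of the field.
            if (value - start) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression, moment):
    """Return True if the expression fires at the given minute."""
    minute, hour, day, month, weekday = expression.split()
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day, start=1)
        and field_matches(month, moment.month, start=1)
        # Cron counts Sunday as 0; isoweekday() counts Sunday as 7.
        and field_matches(weekday, moment.isoweekday() % 7)
    )


print(cron_matches("*/5 * * * *", datetime(2024, 1, 1, 10, 15)))     # True
print(cron_matches("00 15 * * *", datetime(2024, 1, 1, 15, 1)))      # False
print(cron_matches("00 00 1,15 * *", datetime(2024, 1, 15, 0, 0)))   # True
```

Ranges such as `1-5` and the special day-of-month/day-of-week combination rules of full cron implementations are deliberately left out of this sketch.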
If you need to index your data-source at irregular intervals, you can trigger indexing manually with the “Start indexation” button in the data-source configuration page or trigger it automatically with the start indexation API.
You can disable automatic indexing with the following configuration: "incrementalIndexationCron": "none"
Limitations
Here are some important considerations when choosing incremental indexing:
Incremental indexing is only available on Neo4j v3.5.1 and above.