🔖 Introduction
Set up and run a local ES instance on Linux
One of my recurring problems with NLP tasks is how to store unstructured data on my PC. Say, for example, that you have extracted a bunch of text comments from Twitter and you want to store all the entities found for each comment. How do you store them? CSV and other tabular structures aren’t a good fit for this kind of unstructured data.
A good solution could be to set up a local Elasticsearch (ES) instance and use it. At the cost of slower writes, we get some nice benefits:
First-class support for unstructured data
You can use it not only to store the data but also to analyze it
Expose the data to more than one service at the same time
Easy scaling if required: you can always copy your local ES indices to cloud-based ES instances
So let’s set up a local ES instance and use it as a local database for all the unstructured data.
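To make the Twitter example concrete, here is a minimal sketch of what storing one comment with its entities could look like once an instance is running. The index name `comments`, the field names `text` and `entities`, and the helper names are all hypothetical, and the inputs are assumed to contain no characters that need JSON escaping.

```shell
#!/bin/bash
# Sketch only: index name, field names and helper names are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a JSON document for one comment.
# $1 is the comment text, the remaining arguments are the entities.
# Assumes the inputs need no JSON escaping (no quotes or backslashes).
comment_doc() {
  local text="$1"
  shift
  local entities="" e
  for e in "$@"; do
    entities="${entities:+$entities, }\"$e\""
  done
  printf '{"text": "%s", "entities": [%s]}\n' "$text" "$entities"
}

# POST the document to the (hypothetical) "comments" index,
# letting ES generate the document id.
index_comment() {
  comment_doc "$@" | curl -s -X POST "$ES_URL/comments/_doc" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

With the instance up, `index_comment "Trying out ES on Pop!_OS" "ES" "Pop!_OS"` would send one document; `comment_doc` alone only prints the JSON, which is handy for checking the payload first.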
🔄 Updates
21/03/2021
- Added instructions to disable Kibana auto-updates
🚀 Setup ES “start” and “stop”
Tested on Linux - Pop!_OS 20.10
Download and extract the ES folder as described in the official documentation. In this guide the downloaded folder is elasticsearch-7.10.2.

Move the downloaded folder:

sudo mkdir -p /opt/elasticsearch/
sudo mv elasticsearch-7.10.2 /opt/elasticsearch

Create the ES managers:

- File /usr/bin/elasticsearch-start.sh

  #!/bin/bash
  bash /opt/elasticsearch/elasticsearch-7.10.2/bin/elasticsearch -p /tmp/elasticsearch-pid -d
  echo "Started es instance"

- File /usr/bin/elasticsearch-stop.sh

  #!/bin/bash
  ES_PID=$(cat /tmp/elasticsearch-pid)
  echo "Killing es at pid $ES_PID"
  kill -SIGTERM "$ES_PID"

Make the scripts executable:

sudo chmod +x /usr/bin/elasticsearch-start.sh
sudo chmod +x /usr/bin/elasticsearch-stop.sh

Now you can start and stop an ES instance from the CLI:

# Start ES instance
❯ elasticsearch-start.sh
Started es instance
# Check the instance status
❯ watch -n1 curl localhost:9200
# Stop the ES instance
❯ elasticsearch-stop.sh
Killing es at pid 34010
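The stop script above assumes the PID file exists and the process is still alive. A slightly more defensive variant might look like the sketch below: it checks both conditions, sends the same TERM signal, and waits for the process to exit. The function name `stop_by_pidfile` and the 30-second default timeout are my own assumptions, not part of the original scripts.

```shell
#!/bin/bash
# Sketch of a more defensive stop helper.
# $1: PID file (defaults to the one used by the start script)
# $2: seconds to wait for the process to exit (assumed default: 30)
stop_by_pidfile() {
  local pid_file="${1:-/tmp/elasticsearch-pid}"
  local timeout="${2:-30}"
  local pid waited=0

  if [ ! -f "$pid_file" ]; then
    echo "No PID file at $pid_file, nothing to stop"
    return 1
  fi

  pid=$(cat "$pid_file")
  # kill -0 sends no signal, it only checks that the process exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "Process $pid is not running"
    return 1
  fi

  echo "Killing es at pid $pid"
  kill -TERM "$pid"

  # Poll once per second until the process is gone or the timeout expires.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Process $pid still running after ${timeout}s"
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Stopped"
}
```

Calling `stop_by_pidfile` with no arguments targets `/tmp/elasticsearch-pid`, the same file the start script writes.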
🖥 Setup kibana
Kibana is a GUI application that makes it easy to interact with ES.
We will install Kibana using the official guide, and then start and stop it with these commands:
# Start / check / stop kibana instance
❯ sudo systemctl start kibana # Visit localhost:5601
❯ sudo systemctl status kibana
❯ sudo systemctl stop kibana
Exclude kibana from auto update
When installed as a Linux package, Kibana is updated along with all the other system packages. This can lead to a mismatch between your Elasticsearch and Kibana versions (and make it impossible to run the kibana service).
To avoid this inconvenience, exclude the kibana package from the auto-updates:
# Disable kibana auto-update
❯ sudo apt-mark hold kibana
# To re-enable kibana auto-update
❯ sudo apt-mark unhold kibana
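To see whether the two versions have already drifted apart, a quick check could look like the sketch below. It assumes ES answers on localhost:9200 and that Kibana was installed from the Debian package; the helper names are hypothetical, and treating any difference as a mismatch is a deliberately strict simplification.

```shell
#!/bin/bash
# Sketch: compare the running ES version with the installed Kibana package.

# Read the ES root-endpoint JSON on stdin and print the "number" field.
parse_es_version() {
  grep -o '"number" *: *"[^"]*"' | head -n 1 | cut -d'"' -f4
}

# Compare two version strings; any difference counts as a mismatch here.
check_versions() {
  local es="$1" kibana="$2"
  if [ "$es" = "$kibana" ]; then
    echo "OK: ES and Kibana are both $es"
  else
    echo "MISMATCH: ES $es, Kibana $kibana"
    return 1
  fi
}

# Example call (not executed here):
#   check_versions \
#     "$(curl -s localhost:9200 | parse_es_version)" \
#     "$(dpkg-query -W -f='${Version}' kibana)"
```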
🗒 Notes
Why not install ES using Debian packages, like Kibana?
- This way, we can easily switch between multiple ES folders, keeping the instances separate both by version and by “project”.
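As a rough illustration of that idea, the start script could take the folder name as a parameter instead of hardcoding it. This is only a sketch: the layout (one folder per instance under /opt/elasticsearch), the per-instance PID file naming and the function names are assumptions.

```shell
#!/bin/bash
# Sketch: start any ES folder living under a common base directory.
ES_BASE="${ES_BASE:-/opt/elasticsearch}"

# Hypothetical layout: $ES_BASE/<folder>, e.g. elasticsearch-7.10.2
es_home() {
  echo "$ES_BASE/$1"
}

# One PID file per folder, so instances don't overwrite each other's PID.
es_pid_file() {
  echo "/tmp/$1-pid"
}

start_es() {
  local name="$1"
  local home
  home=$(es_home "$name")
  if [ ! -x "$home/bin/elasticsearch" ]; then
    echo "No ES install found at $home"
    return 1
  fi
  "$home/bin/elasticsearch" -p "$(es_pid_file "$name")" -d
  echo "Started es instance $name"
}
```

For example, `start_es elasticsearch-7.10.2` would start the folder used in this guide, and a folder extracted for another version or project could be started by passing its name instead.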
💤 Todo
- Register the ES instance as a systemctl service
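I haven’t done this yet, so take the following as an untested starting point rather than a working recipe. The helper below just prints a unit file; the unit name `elasticsearch-local.service`, the choice of `Type=simple` (running ES in the foreground, without `-d`) and the other settings are assumptions on my part.

```shell
#!/bin/bash
# Sketch: print a systemd unit for the local ES folder.
# $1: user that should run ES (ES refuses to run as root)
# $2: ES folder (defaults to the one used in this guide)
render_es_unit() {
  local user="$1"
  local es_home="${2:-/opt/elasticsearch/elasticsearch-7.10.2}"
  cat <<EOF
[Unit]
Description=Local Elasticsearch instance
After=network.target

[Service]
Type=simple
User=$user
ExecStart=$es_home/bin/elasticsearch
# The JVM exits with 143 after SIGTERM; treat that as a clean stop.
SuccessExitStatus=143
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF
}

# Example installation (not executed here, unit name is hypothetical):
#   render_es_unit "$USER" | sudo tee /etc/systemd/system/elasticsearch-local.service
#   sudo systemctl daemon-reload
#   sudo systemctl start elasticsearch-local
```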
🔗 Links
Stopping Elasticsearch | Elasticsearch Reference [master] | Elastic
Why do most systemd examples contain WantedBy=multi-user.target? - Unix & Linux Stack Exchange
How do I make my systemd service run via specific user and start on boot? - Ask Ubuntu
Install Kibana with Debian package | Kibana Guide [7.10] | Elastic
How to Exclude Specific Package from apt-get Upgrade | article