🔍 Elasticsearch local setup

1.2.2021 3-minute read

🔖 Introduction

Set up and run a local ES instance on Linux

One of my recurring problems with NLP tasks is how to store unstructured data on my PC. Say, for example, that you have extracted a bunch of text comments from Twitter and want to store all the entities found in each comment. How do you store them? CSV and other tabular structures aren’t a good fit for this kind of unstructured data.

A good solution is to set up a local Elasticsearch (ES) instance and use it. At the cost of slower writes, we get some nice benefits:

First-class support for unstructured data
Use it not only to store the data but also to analyze it
Expose the data to more than one service at the same time
Easy scaling if required: you can always copy your local ES indexes to cloud-based ES instances

So let’s set up a local ES instance and use it as a local database for all the unstructured data.

🔄 Updates

21/03/2021
- Added instructions to disable Kibana auto-updates

🚀 Setup ES “start” and “stop”

Tested on Linux - Pop!_OS 20.10

Download and extract the ES folder as described in the official documentation. In this guide, the downloaded folder is elasticsearch-7.10.2.

Move the downloaded folder

sudo mkdir -p /opt/elasticsearch/
sudo mv elasticsearch-7.10.2 /opt/elasticsearch

Create the ES managers

File /usr/bin/elasticsearch-start.sh

#!/bin/bash
bash /opt/elasticsearch/elasticsearch-7.10.2/bin/elasticsearch -p /tmp/elasticsearch-pid -d
echo "Started es instance"

File /usr/bin/elasticsearch-stop.sh

#!/bin/bash
ES_PID=$(cat /tmp/elasticsearch-pid)
echo "Killing es at pid $ES_PID"
kill -SIGTERM "$ES_PID"
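
The stop script above assumes the PID file exists and that the process is still alive. A slightly more defensive variant could look like the sketch below. Note that `stop_by_pidfile` and its timeout argument are hypothetical names introduced here for illustration; only the PID file path comes from the start script.

```shell
#!/bin/bash
# Sketch of a more defensive stop routine (not the original script).
# stop_by_pidfile is a hypothetical helper; the default path matches the
# -p argument passed to elasticsearch in the start script.
stop_by_pidfile() {
  local pid_file="${1:-/tmp/elasticsearch-pid}"
  local timeout="${2:-30}"   # seconds to wait for a clean shutdown

  if [ ! -f "$pid_file" ]; then
    echo "No PID file at $pid_file, nothing to stop"
    return 1
  fi

  local pid
  pid=$(cat "$pid_file")

  # kill -0 sends no signal; it only checks that the process exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "Process $pid is not running"
    return 1
  fi

  echo "Killing es at pid $pid"
  kill -TERM "$pid"

  # Poll until the process exits or the timeout expires.
  local waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Process $pid still running after ${timeout}s"
      return 2
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Stopped"
}
```

Calling `stop_by_pidfile` with no arguments uses the same PID file as the start script. The return codes (1 for nothing to stop, 2 for a timeout) are an arbitrary choice for this sketch.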

Make the scripts executable

sudo chmod +x /usr/bin/elasticsearch-start.sh
sudo chmod +x /usr/bin/elasticsearch-stop.sh

Now you can start and stop an ES instance from the CLI

# Start ES instance
❯ elasticsearch-start.sh
Started es instance
    
# Check the instance status
❯ watch -n1 curl localhost:9200
    
# Stop the ES instance
❯ elasticsearch-stop.sh 
Killing es at pid 34010
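
Since ES is started as a daemon, it may take a few seconds before it answers on port 9200. Instead of watching `curl` by hand, a script could poll until the instance responds. This is a minimal sketch: `wait_for_es` is a hypothetical helper, and the retry count and per-request timeout are assumptions to tune for your machine.

```shell
#!/bin/bash
# wait_for_es is a hypothetical helper, not part of Elasticsearch.
# It polls the HTTP endpoint once per second until it answers or the
# number of attempts runs out.
wait_for_es() {
  local url="${1:-http://localhost:9200}"
  local retries="${2:-30}"   # number of attempts, roughly one per second
  local i=1
  while [ "$i" -le "$retries" ]; do
    # -s: silent, -f: treat HTTP errors as failures, --max-time: cap each attempt
    if curl -sf --max-time 2 "$url" >/dev/null 2>&1; then
      echo "ES is up at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "ES did not answer at $url after $retries attempts"
  return 1
}
```

For example, `elasticsearch-start.sh && wait_for_es` would block until the instance is reachable, or return a non-zero status if it never comes up.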

🖥 Setup Kibana

Kibana is a GUI application that makes it easy to interact with ES.

We will install Kibana using the official guide, then start and stop it with these commands:

# Start / check / stop kibana instance
❯ sudo systemctl start kibana # Visit localhost:5601
❯ sudo systemctl status kibana
❯ sudo systemctl stop kibana

Exclude Kibana from auto-updates

Kibana installed as a Linux package is updated automatically along with all the other system packages. This can lead to a mismatch between your Elasticsearch and Kibana versions (and make it impossible to run the Kibana service).

To avoid this inconvenience, exclude the kibana package from the auto-updates:

# Disable kibana auto-update
❯ sudo apt-mark hold kibana

# To re-enable kibana auto-update
❯ sudo apt-mark unhold kibana

🗒 Notes

Why not install ES from Debian packages, like Kibana?
- This way we can easily switch between multiple ES folders, keeping instances separate both by version and by “project”.
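
One possible way to do that switching is to keep every ES folder under /opt/elasticsearch and point a symlink at the one currently in use. This is only a sketch of the idea: `es_use`, the `ES_ROOT` variable and the `current` symlink name are all hypothetical and not part of the setup described above.

```shell
#!/bin/bash
# es_use is a hypothetical helper; ES_ROOT and the "current" symlink name
# are assumptions, not something the original setup defines.
ES_ROOT="${ES_ROOT:-/opt/elasticsearch}"

es_use() {
  local name="$1"
  local target="$ES_ROOT/$name"
  if [ ! -d "$target" ]; then
    echo "No such ES folder: $target"
    return 1
  fi
  # -s: symbolic link, -f: replace an existing link,
  # -n: do not follow the old link when replacing it
  ln -sfn "$target" "$ES_ROOT/current"
  echo "current -> $name"
}
```

With this in place, the start script could call /opt/elasticsearch/current/bin/elasticsearch instead of a hard-coded version folder, and `es_use elasticsearch-7.10.2` would select the instance to run. Because /opt/elasticsearch was created with sudo, the `ln` call will likely need root privileges as well.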