๐Ÿ” Elasticsearch local setup

🔖 Introduction

Set up and run a local ES instance on Linux

One of my recurring problems with NLP tasks is how to store unstructured data on my PC. Say, for example, that you have extracted a bunch of text comments from Twitter and want to store all the entities found in each comment. How do you store them? CSV and other tabular structures aren’t a good fit for this kind of unstructured data.

A good solution is to set up a local Elasticsearch (ES) instance and use it. At the cost of slower writes, we get some welcome benefits:

  • First-class support for unstructured data

  • It can analyze the data, not just store it

  • The data can be exposed to more than one service at the same time

  • Easy scaling if required: you can always copy your local ES indexes to cloud-based ES instances
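
To make the Twitter example concrete, here is a minimal sketch of how one comment and its entities could be stored once the local instance set up later in this guide is running. The index name `comments` and the `text` / `entities` field names are my own assumptions for illustration, and the JSON builder is deliberately naive: it assumes the values contain no characters that need JSON escaping.

```shell
#!/bin/bash
# Base URL of the local instance (default ES HTTP port).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for one comment and its entities.
# Hypothetical document shape; no JSON escaping is performed.
comment_doc() {
  local text="$1"; shift
  local entities="" e
  for e in "$@"; do
    # Append each entity as a quoted string, comma-separated.
    entities="${entities:+$entities,}\"$e\""
  done
  printf '{"text":"%s","entities":[%s]}' "$text" "$entities"
}

# Send the document to the (hypothetical) "comments" index.
index_comment() {
  comment_doc "$@" | curl -s -X POST "$ES_URL/comments/_doc" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

For example, `index_comment "Pizza in Rome with Anna" Rome Anna` would post a document whose `entities` field is a list of two strings, something a CSV row cannot hold naturally.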

So let’s set up a local ES instance and use it as a local database for all the unstructured data.

🔄 Updates

  • 21/03/2021

    • Added instructions to disable Kibana auto-updates

🚀 Setup ES “start” and “stop”

Tested on Linux - Pop!_OS 20.10

  • Download and extract the ES archive as described in the official documentation. In this guide the extracted folder is elasticsearch-7.10.2

  • Move the downloaded folder

    sudo mkdir -p /opt/elasticsearch/
    sudo mv elasticsearch-7.10.2 /opt/elasticsearch
    
  • Create the ES managers

    • File /usr/bin/elasticsearch-start.sh

      #!/bin/bash
      bash /opt/elasticsearch/elasticsearch-7.10.2/bin/elasticsearch -p /tmp/elasticsearch-pid -d
      echo "Started es instance"
    
    • File /usr/bin/elasticsearch-stop.sh

      #!/bin/bash
      ES_PID=$(cat /tmp/elasticsearch-pid)
      echo "Killing es at pid $ES_PID"
      kill -SIGTERM "$ES_PID"
      
    • Make the scripts executable

      sudo chmod +x /usr/bin/elasticsearch-start.sh
      sudo chmod +x /usr/bin/elasticsearch-stop.sh
      
    • Now you can run and stop an ES instance from CLI

      # Start ES instance
      โฏ elasticsearch-start.sh
      Started es instance
          
      # Check the instance status
      โฏ watch -n1 curl localhost:9200
          
      # Stop the ES instance
      โฏ elasticsearch-stop.sh 
      Killing es at pid 34010
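
Two rough edges in the scripts above are worth a sketch. The start script returns as soon as ES daemonizes, before the HTTP port answers, and the stop script fails with a confusing error when the PID file is missing. The helper functions below are my own additions, not part of the original scripts: `wait_for_es` polls the same URL as the `watch` command, and `stop_es` checks the PID file before sending the signal.

```shell
#!/bin/bash
# Same PID file the start script writes with -p.
PID_FILE="${PID_FILE:-/tmp/elasticsearch-pid}"

# Poll the ES HTTP endpoint once per second until it answers.
# $1: URL (default localhost:9200), $2: maximum number of attempts.
wait_for_es() {
  local url="${1:-http://localhost:9200}" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Stop ES only if the PID file exists and points to a live process.
stop_es() {
  local pid
  if [ ! -f "$PID_FILE" ]; then
    echo "No PID file at $PID_FILE; is ES running?" >&2
    return 1
  fi
  pid="$(cat "$PID_FILE")"
  # kill -0 sends no signal; it only tests that the process exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "No process with pid $pid" >&2
    return 1
  fi
  echo "Killing es at pid $pid"
  kill -TERM "$pid"
}
```

A possible use is `elasticsearch-start.sh && wait_for_es && echo "ES is up"`, which only prints once the instance actually responds.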
      

🖥 Setup Kibana

Kibana is a GUI application for interacting with ES.

  • Install Kibana following the official guide, then start and stop it with these commands:

    # Start / check / stop kibana instance
    ❯ sudo systemctl start kibana # Visit localhost:5601
    ❯ sudo systemctl status kibana
    ❯ sudo systemctl stop kibana
    

Exclude Kibana from auto-updates

Kibana installed as a Linux package is updated along with all the other system packages. This can lead to a mismatch between your Elasticsearch and Kibana versions, which can prevent the Kibana service from running.

To avoid this inconvenience, exclude the kibana package from the auto-updates:

    # Disable kibana auto-update
    ❯ sudo apt-mark hold kibana

    # To re-enable kibana auto-update
    ❯ sudo apt-mark unhold kibana

🗒 Notes

  • Why not install ES from Debian packages, like Kibana?

    • With plain folders, we can easily switch between multiple ES folders, keeping instances separate by version and by “project”.
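
As a sketch of what that switching could look like, the start script can take the version as an argument instead of hard-coding it. The layout `/opt/elasticsearch/elasticsearch-<version>` follows this guide; the function names and the per-version PID file are assumptions of mine.

```shell
#!/bin/bash
# Parent folder that holds one extracted ES folder per version.
ES_BASE="${ES_BASE:-/opt/elasticsearch}"

# Print the install folder for a given version, e.g. 7.10.2.
es_home() {
  printf '%s/elasticsearch-%s' "$ES_BASE" "$1"
}

# Start the instance for the given version as a daemon.
es_start() {
  local version="$1" home
  home="$(es_home "$version")"
  if [ ! -x "$home/bin/elasticsearch" ]; then
    echo "No ES install found at $home" >&2
    return 1
  fi
  # One PID file per version, so each instance can be stopped separately.
  "$home/bin/elasticsearch" -p "/tmp/elasticsearch-$version-pid" -d
  echo "Started es $version"
}
```

`es_start 7.10.2` then behaves like the original start script, and another extracted folder can be started by passing its version instead.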

💤 Todo

  • Register the ES instance as a systemd service
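
A possible starting point for this item, untested and offered only as a sketch: the function below prints a systemd unit that wraps the same daemonized start command used in this guide. The unit contents, the `Type=forking` / `PIDFile` pairing and the need to pass a non-root user are my assumptions (ES refuses to run as root), so review it before installing it anywhere.

```shell
#!/bin/bash
# Print a unit file for the local ES folder.
# $1: user that owns the ES folder, $2: ES folder (defaults to this guide's).
es_unit() {
  local user="$1" home="${2:-/opt/elasticsearch/elasticsearch-7.10.2}"
  cat <<EOF
[Unit]
Description=Local Elasticsearch instance
After=network.target

[Service]
Type=forking
User=$user
ExecStart=$home/bin/elasticsearch -d -p /tmp/elasticsearch-pid
PIDFile=/tmp/elasticsearch-pid

[Install]
WantedBy=multi-user.target
EOF
}
```

Something like `es_unit "$USER" | sudo tee /etc/systemd/system/elasticsearch-local.service` followed by `sudo systemctl daemon-reload` would register it; the unit name `elasticsearch-local` is hypothetical.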

🔗 Links