
Converting from MySQL to Elasticsearch

Written by Mayur Jain | Jun 25, 2014 1:03:07 PM

A project called Lantanacloud at e-Zest Solutions Ltd. India was initially implemented using MySQL for database storage. Later on we decided to move to Elasticsearch for two primary reasons:

  1. To handle Big Data, i.e. a NoSQL-style workload
  2. To handle millions of data records on a daily basis

Before I begin with the transition from MySQL to Elasticsearch, you should know what Elasticsearch is.
Here's a brief description: Elasticsearch is an end-to-end distributed, RESTful search and analytics platform.

Some people have the misconception that Elasticsearch is just a search application, but it is actually a NoSQL database in which you can store and retrieve data.

As Wikipedia says : Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.

For more details, visit: What is Elasticsearch?

Moving on: if your application is big and complex, you first have to identify the tables you need to convert. I suggest beginning with the tables that hold the most data and are queried most often on a daily basis.

Now you have to decide the database schema for Elasticsearch.

Refer to this to understand the analogy between an RDBMS and Elasticsearch.

Next you need to understand the concepts of sharding & replicas.
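Shard and replica counts are fixed when an index is created. As a minimal sketch using the PHP client, the settings body could be built like this; the shard/replica counts and the index name are only placeholders, not recommendations:

```php
<?php
// Sketch: settings for a new index. Shards split the index across nodes;
// replicas are copies kept for redundancy and read throughput.
function buildIndexSettings( $shards, $replicas )
{
    return array(
        'settings' => array(
            'number_of_shards'   => $shards,
            'number_of_replicas' => $replicas,
        ),
    );
}

$body = buildIndexSettings( 5, 1 );
// With the PHP client this would be applied as:
// $client->indices()->create( array( 'index' => 'myindex', 'body' => $body ) );
```

Note that the number of shards cannot be changed after index creation, so it is worth deciding this up front.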

To begin with Elasticsearch you first need to download and install it. Refer to the installation guide for the steps.

I prefer using the bulk API for indexing documents because it makes it possible to perform many index/delete operations in a single API call. This can greatly increase indexing speed and reduce the number of API calls.

Read data from your database/source with an iterator and buffer the rows into an array. Then pass this array in the body field of the bulk API call.

The bulk API is "bulk" mainly in the sense that you can send a lot of data over at once; it does not allow Elasticsearch to perform any special optimizations when inserting the data.
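The buffering step can be sketched as follows. The bulk body alternates an action line (where to index) with a source line (the document itself); here `$rows` stands in for records read from MySQL, and the index/type names are placeholders:

```php
<?php
// Sketch: buffer database rows into the alternating action/source format
// expected by the bulk API.
function buildBulkBody( array $rows, $index, $type )
{
    $body = array();
    foreach ( $rows as $row ) {
        // Action line: tells Elasticsearch where to put the document.
        $body[] = array( 'index' => array(
            '_index' => $index, '_type' => $type, '_id' => $row['id'],
        ) );
        // Source line: the document itself.
        $body[] = $row;
    }
    return array( 'body' => $body );
}

$rows = array(
    array( 'id' => 1, 'name' => 'alpha' ),
    array( 'id' => 2, 'name' => 'beta' ),
);
$bulk = buildBulkBody( $rows, 'myindex', 'mytype' );
// One round trip indexes all the buffered rows:
// $client->bulk( $bulk );
```

A few thousand documents per bulk call is a reasonable starting batch size; very large batches can exhaust memory on either side.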

How I did it:

My colleague and I created a class file with functions to perform various Elasticsearch operations:
Connect to Elasticsearch:

This function connects to Elasticsearch and stores the client handle.

/*Arguments:
@param port : connection should be established at this 'port'
@param host : connection should be established at this 'host'
*/
public function elasticsearch_connect( $host = 'localhost',$port = '9200' )
{
 $connect = array();
 $hostPort = $host . ':' . $port;
 $connect['hosts'] = $hostPort;
 $this->clientHdlr = new Elasticsearch\Client( $connect );
}

Insert data into Elasticsearch:

This function inserts data into Elasticsearch using the bulk API.

/*Arguments:
@param index : elasticsearch index in which data is to be inserted.
@param type  : elasticsearch type in which data is to be inserted.
@param inputData : Array to be inserted.
*/
public function elasticsearch_insert( $index, $type, $inputData )
{
 $inputData['index'] = $index;
 $inputData['type']  = $type;
 $index_document     = $this->clientHdlr->bulk( $inputData );
 return $index_document ? 1 : 0;
}

Check if index exists or not:

This function checks whether the given index and type exist in Elasticsearch, by probing for the document with id 1.

public function existsIndex( $index, $type )
{
 $checkArray = array();
 $checkArray['index'] = $index;
 $checkArray['type'] = $type;
 $checkArray['id'] = 1;
 // A single exists() call on document id 1 serves as the existence check.
 return (bool) $this->clientHdlr->exists( $checkArray );
}

Search Operation:

This function searches for a document in Elasticsearch. If we search directly for a particular document and its index or type is absent, we get an error and the rest of the code stops executing.
So we first check whether the required index and type exist; if the document also exists we return the result, but if it is missing we return false.

/*Arguments:
@param index : elasticsearch index to be searched.
@param type  : elasticsearch type to be searched.
@param json  : A search query in json format.
*/
public function searchIndex( $index, $type, $json )
{
 $searchArray = array();
 $searchArray['index'] = $index;
 $searchArray['type']  = $type;
 $searchArray['body'] = $json;
 if( !( $this->existsIndex( $index, $type ) ) )
  { return false; }
 else
 {
  $retArray = $this->clientHdlr->search( $searchArray );
  if( $retArray['hits']['total'] == 0 )   { return false; }
  else { return $retArray ;} // Leaving with return array
 }
}

Get the aggregate value:

This function returns the aggregated value of the intended field from Elasticsearch.

One limitation of Elasticsearch is that it does not support the auto-increment feature provided by MySQL.
So before calling this function we first check whether the required index and type exist; if a document exists, we fetch the maximum value of the id field using this function. This way we get the last value of the field that was indexed and increment it before indexing further documents.

/*Arguments:
@param fieldName : Name of the field for which aggregated value is to be fetched.
@param aggType   : Type of aggregation( e.g.: "max", "min", "mean" .... ).
@return          : Result of aggregation query.
*/
public function getAggIdValue( $index, $type, $fieldName, $aggType )
{
 $aggArray = array();
 $aggArray['index'] = $index;
 $aggArray['type']  = $type;
 //statistical facet query: returns statistics (max, min, mean, ...) for the field.
 $json = '{
  "query" : {
   "match_all" : {
    }
   },
  "facets" : {
   "'.$fieldName.'"  :  {
    "statistical" : {
     "field" : "'.$fieldName.'"
    }
   }
  }
 }';

 $aggArray['body'] = $json;
 $retValue = $this->clientHdlr->search( $aggArray );
 return $retValue['facets'][$fieldName][$aggType];
}
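The increment step then looks like the sketch below; `$aggResult` mirrors the facet response shape consumed by getAggIdValue() above, with purely illustrative values, and `nextDocumentId()` is a hypothetical helper:

```php
<?php
// Sketch: derive the next document id from an aggregation result, to mimic
// MySQL's auto-increment. The response shape mirrors a statistical facet.
function nextDocumentId( array $aggResult, $fieldName )
{
    // Take the current maximum of the id field and increment it.
    return (int) $aggResult['facets'][$fieldName]['max'] + 1;
}

$aggResult = array( 'facets' => array( 'doc_id' => array( 'max' => 41 ) ) );
$nextId = nextDocumentId( $aggResult, 'doc_id' );  // 42
```

Be aware that this read-then-increment pattern is not atomic: two writers racing on the same index can compute the same id, so it is only safe with a single indexing process.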

Update a document:

When we update a document, Elasticsearch overwrites the document entirely.

So to avoid re-writing the entire document again, I suggest the following steps:

  1. Search the document to be modified and assign the result to a temporary array.
  2. Assign the new values to the specific key-value pairs of the array that you want to modify.
  3. Again perform insert operation with this temporary array.
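The three steps above can be sketched as follows; the document fields are illustrative and `mergeChanges()` is a hypothetical helper:

```php
<?php
// Sketch: read-modify-reindex. Overwrite only the keys being changed;
// every other field of the fetched document is preserved.
function mergeChanges( array $source, array $changes )
{
    foreach ( $changes as $key => $value ) {
        $source[$key] = $value;
    }
    return $source;
}

// 1. The document source, as fetched via searchIndex():
//    $hit = $result['hits']['hits'][0]['_source'];
$hit = array( 'id' => 7, 'status' => 'pending', 'owner' => 'mayur' );
// 2. Apply the new values.
$updated = mergeChanges( $hit, array( 'status' => 'done' ) );
// 3. Re-index $updated with elasticsearch_insert() so the new version
//    replaces the old document.
```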

Plugins:

Elasticsearch provides many plugins that prove very useful for analysis. To install them, visit: Plugins.

For developers who like a good GUI to see what the cluster looks like, I recommend installing the elasticsearch-head plugin. Using this 'head' plugin you can generate queries, as well as test your own queries before implementing them in your code.

I hope this post helps you.

Feel free to ask questions, drop comments, or mail me.

You can also share this post if you like.