e-Zest members share technology ideas to foster digital transformation.

Basic Apache Solr search analysis using PHP & log4j

Written by Pravin Patil | Oct 29, 2012 6:17:44 PM

If you are using Apache Solr as a search engine for your application then it is necessary to analyze Solr logs to improve your user experience and application performance.

You can track your user activities using logs files and based on that you can offer better services to your users.

I have listed some simple steps to setup logs file, parse logs using PHP and analyze data using MySQL database.

I have executed the following steps on Apache Solr3.5 & Apache Tomcat: 7.0

Assuming that Solr and Tomcat is installed under /opt on Linux OS

Solr Path: /opt/solr
Tomcat Path: /opt/tomcat

log4j.rootLogger=INFO, logfile

log4j.appender.logfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.logfile.File=/opt/solr/example/logs/solr.log

log4j.appender.logfile.DatePattern='.'yyyy-MM-dd
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d{yyyy-MM-ddHH:mm:ss} %p [%c{3}] %m%n
  • Step4:
  • Start Solr server. It will create log file under /opt/solr/example/logs/solr.log. It will rotate log on daily basis and archive the file like solr.log.2012-09-02

    Note: Make sure that /opt/solr/example/logs/ having write permission

  • Step5:
  • Create MySQL database table with the following schema to store parsed data into table.

    CREATE TABLE `qlog` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `core` varchar(255) NOT NULL,
      `qterm` varchar(255) NOT NULL,
      `param` varchar(1000) NOT NULL,
      `start` int(5) NOT NULL,
      `rows` int(5) NOT NULL,
      `qtime` int(5) NOT NULL,
      `hits` int(11) NOT NULL,
      `date` date NOT NULL,
      `time` time NOT NULL,
      PRIMARY KEY (`id`)
    )
    
  • Step6:
  • Parse solr log file using following PHP script and store into MySQL database

    Note: You need to update your MySQL sever details

    <?php
    
    //log file path
    $logFilePath = '/opt/solr/example/logs/solr.log.yyyy-mm-dd';
    
    $handle = @fopen($logFilePath, "r");
    
    if (file_exists($logFilePath))
    {
    	//connect to mysql
    	mysql_connect("localhost", "root", "") or die("Could not connect: " . mysql_error());
    	mysql_select_db("solr");
    
    	if ($handle) {
    
    		$rCnt = 0;
    		while (!feof($handle)) {
    
    		$buffer = fgets($handle, 4096);
    		preg_match("/^(S+) (S+) (INFO) ([solr.core.SolrCore]) [(S*)] (webapp=/(S+)) (path=/(S+)/) (params={(S+)}) (hits=(d+)) (status=(d+)) (QTime=(d+))/", $buffer, $matches); 
    
    		if(isset($matches[0])) {
    
    			$queryString = $matches[11];
    
    			preg_match_all('/([^?&=#]+)=([^&#]*)/',$queryString,$matches1);
    			$param = @array_combine( $matches1[1], $matches1[2]);
    
    			//collect required data
    			$core = $matches[5];
    			$q = urldecode($param['q']);
    			$start = $param['start'];
    			$rows = $param['rows'];
    			$qtime = $matches['17'];
    			$hits = $matches['13'];
    			$date = $matches['1'];
    			$time = $matches['2'];
    
    			//Insert into mysql table
    			$sql = "INSERT INTO qlog (core,qry,param,start,rows,qtime,hits,date,time)
    					VALUES ('".$core."','".$q."','".$queryString."',".$start.",".$rows.",".$qtime.",".$hits.",'".$date."','".$time."')";
    			$result = mysql_query($sql) or die("Unable to execute SQL: " . mysql_error());
    			$rCnt++;
    
    			} //if
    		} //while
    		fclose($handle);
    	} //if
    	echo $rCnt.' rows imported successfully...';
    }
    else
    {
    	echo 'Sorry cannot open log file.';
    }
    ?>
    
  • Step7:
  • Set crontab to run log parser script on daily basis or you can parse it manually

    00 05 * * * php path/to/your/logfile.php

  • Step8:
  • Analyze your solr log data using following simple queries

    • Top 5 search terms
    • This will help us to find top 5 searches

      SELECT qterm,COUNT(*) AS cnt FROM qlog WHERE hits 0 GROUP BY qterm ORDER BY cnt DESC LIMIT 0,5

    • Get number of queries per second
    • This will help us to know how many requests are coming per second.

      SELECT COUNT(*) AS cnt FROM qlog GROUP BY time ORDER BY cnt DESC LIMIT 0,1

    • Get search term which returns empty result set
    • This will help us to find out how many search terms are not having any matching records.

      SELECT qterm, COUNT(*) AS cnt FROM qlog WHERE hits = 0 GROUP BY qterm

    • Get search term which took more than 10 milliseconds to execute
    • This will help us to find out the queries taking long time to execute, and then we can optimize those search terms.

      SELECT qterm, qtime FROM qlog WHERE qtime> 10 ORDER BY qtime DESC

    We can also analyze data in more details for facet and mlt

  • Step9:
  • We can write simple PHP script to execute all above SQL queries. It will look like this

    For more details about solrlog4j and tomcat please refer:

    http://globalgateway.wordpress.com/2010/01/06/configuring-solr-1-4-logging-with-log4j-in-tomcat/