If you are using Apache Solr as a search engine for your application then it is necessary to analyze Solr logs to improve your user experience and application performance.
You can track your user activities using logs files and based on that you can offer better services to your users.
I have listed some simple steps to setup logs file, parse logs using PHP and analyze data using MySQL database.
I have executed the following steps on Apache Solr3.5 & Apache Tomcat: 7.0
Assuming that Solr and Tomcat is installed under /opt on Linux OS
Solr Path: /opt/solr
Tomcat Path: /opt/tomcat
- Step1:
- Step2:
- Step3:
Download log4j & slf4j from
http://www.apache.org/dyn/closer.cgi/logging/log4j/1.2.17/log4j-1.2.17.zip
http://www.slf4j.org/download.html
Remove "slf4j-jdk14-1.6.1.jar" from /opt/tomcat/webapps/solr/WEB-INF/lib
Extract downloaded files, copy "log4j-1.2.16.jar" & "slf4j-log4j12-1.6.4.jar" into /opt/tomcat/webapps/solr/WEB-INF/lib
Create new folder "classes" under /opt/tomcat/webapps/solr/WEB-INF/
Create new file "log4j.properties" under /opt/tomcat/webapps/solr/WEB-INF/classes
and add following content
log4j.rootLogger=INFO, logfile log4j.appender.logfile=org.apache.log4j.DailyRollingFileAppender log4j.appender.logfile.File=/opt/solr/example/logs/solr.log log4j.appender.logfile.DatePattern='.'yyyy-MM-dd log4j.appender.logfile.layout=org.apache.log4j.PatternLayout log4j.appender.logfile.layout.ConversionPattern=%d{yyyy-MM-ddHH:mm:ss} %p [%c{3}] %m%n
Start Solr server. It will create log file under /opt/solr/example/logs/solr.log
. It will rotate log on daily basis and archive the file like solr.log.2012-09-02
Note: Make sure that /opt/solr/example/logs/
having write permission
Create MySQL database table with the following schema to store parsed data into table.
CREATE TABLE `qlog` ( `id` int(11) NOT NULL AUTO_INCREMENT, `core` varchar(255) NOT NULL, `qterm` varchar(255) NOT NULL, `param` varchar(1000) NOT NULL, `start` int(5) NOT NULL, `rows` int(5) NOT NULL, `qtime` int(5) NOT NULL, `hits` int(11) NOT NULL, `date` date NOT NULL, `time` time NOT NULL, PRIMARY KEY (`id`) )
Parse solr log file using following PHP script and store into MySQL database
Note: You need to update your MySQL sever details
<?php
//log file path
$logFilePath = '/opt/solr/example/logs/solr.log.yyyy-mm-dd';
$handle = @fopen($logFilePath, "r");
if (file_exists($logFilePath))
{
//connect to mysql
mysql_connect("localhost", "root", "") or die("Could not connect: " . mysql_error());
mysql_select_db("solr");
if ($handle) {
$rCnt = 0;
while (!feof($handle)) {
$buffer = fgets($handle, 4096);
preg_match("/^(S+) (S+) (INFO) ([solr.core.SolrCore]) [(S*)] (webapp=/(S+)) (path=/(S+)/) (params={(S+)}) (hits=(d+)) (status=(d+)) (QTime=(d+))/", $buffer, $matches);
if(isset($matches[0])) {
$queryString = $matches[11];
preg_match_all('/([^?&=#]+)=([^&#]*)/',$queryString,$matches1);
$param = @array_combine( $matches1[1], $matches1[2]);
//collect required data
$core = $matches[5];
$q = urldecode($param['q']);
$start = $param['start'];
$rows = $param['rows'];
$qtime = $matches['17'];
$hits = $matches['13'];
$date = $matches['1'];
$time = $matches['2'];
//Insert into mysql table
$sql = "INSERT INTO qlog (core,qry,param,start,rows,qtime,hits,date,time)
VALUES ('".$core."','".$q."','".$queryString."',".$start.",".$rows.",".$qtime.",".$hits.",'".$date."','".$time."')";
$result = mysql_query($sql) or die("Unable to execute SQL: " . mysql_error());
$rCnt++;
} //if
} //while
fclose($handle);
} //if
echo $rCnt.' rows imported successfully...';
}
else
{
echo 'Sorry cannot open log file.';
}
?>
Set crontab to run log parser script on daily basis or you can parse it manually
00 05 * * * php path/to/your/logfile.php
Analyze your solr log data using following simple queries
- Top 5 search terms
- Get number of queries per second
- Get search term which returns empty result set
- Get search term which took more than 10 milliseconds to execute
This will help us to find top 5 searches
SELECT qterm,COUNT(*) AS cnt FROM qlog WHERE hits 0 GROUP BY qterm ORDER BY cnt DESC LIMIT 0,5
This will help us to know how many requests are coming per second.
SELECT COUNT(*) AS cnt FROM qlog GROUP BY time ORDER BY cnt DESC LIMIT 0,1
This will help us to find out how many search terms are not having any matching records.
SELECT qterm, COUNT(*) AS cnt FROM qlog WHERE hits = 0 GROUP BY qterm
This will help us to find out the queries taking long time to execute, and then we can optimize those search terms.
SELECT qterm, qtime FROM qlog WHERE qtime> 10 ORDER BY qtime DESC
We can also analyze data in more details for facet and mlt
We can write simple PHP script to execute all above SQL queries. It will look like this
For more details about solrlog4j and tomcat please refer:
http://globalgateway.wordpress.com/2010/01/06/configuring-solr-1-4-logging-with-log4j-in-tomcat/