Developer of software for creating technical documentation

How to create a site search with the most relevant results

site search implementation
configuring Apache Solr on three servers
custom module development for highlighting search results
setting up relevance sorting with Ngram

Client

The client is an international IT company that develops software for creating technical documentation.

Request

There was so much information on the client’s website that it needed a site search. The client provided a detailed specification for developers that covered a significant part of the requirements for formats, functionality, search results, and the Drupal CMS itself.

Search on a Drupal site

Search on a Drupal site is usually configured with the Search API module. This module acts as an intermediate layer between Drupal and various search engines, including Apache Solr, which the ADCI Solutions team uses on projects most often. The Search API has many settings for both the Drupal site and Apache Solr. The Solr side determines which information is stored in the index, and that choice is what makes the search work well.

In such cases, Drupal developers’ work comes down to installing the module and configuring standard parameters and the server. It looked like the work on this project would follow the same pattern. But our assumption was wrong.

Server

The client gave us access to the development, stage, and production servers. To set up Apache Solr on all three servers from scratch, our team lead took on the role of a DevOps engineer. It was also crucial to choose the right version of Solr: Solr 7 was considered more reliable, but Solr 8 worked best with Drupal. We eventually settled on Solr 8.

Because the task was brand new to us, and because of its initial conditions, we could not complete the job quickly: the work dragged on for a couple of dozen hours. On the bright side, we now have the skills to get it done in 2-3 hours on a similar project. Still, the time a task takes always depends on the circumstances.

Site search

As we’ve mentioned, the spec explained how the search should work in great detail. But several points were still missing.

A site search consists of two major parts: the search box in the website header with a magnifying glass icon and the page with search results.

search box in website header

website search results page

There are also search categories to filter general search results.

on-site search filters

A search result includes a title, a snippet with the keyword and its derivatives in bold, and breadcrumbs.

on-site search results

The site has several language versions. The search results depend on the language version the person is using: searching for German words in the English version will not yield anything, and vice versa.
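In Solr terms, restricting results to the visitor's language version amounts to a filter query. The Python sketch below only shows what such a request could look like; it is not the exact request the Search API sends. The function `build_select_url`, the core name, and the field names `title`, `body` and `langcode` are hypothetical (the Search API Solr module generates its own field names), while `q`, `defType`, `qf`, `fq` and `rows` are standard Solr query parameters.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, user_query: str,
                     langcode: str, rows: int = 10) -> str:
    """Build a Solr /select URL that only returns documents in one language.

    Hypothetical: the field names `title`, `body` and `langcode` are
    placeholders; a real Search API index uses generated field names.
    """
    params = {
        "q": user_query,
        "defType": "edismax",         # extended DisMax query parser
        "qf": "title^3 body",         # search both fields, weight the title higher
        "fq": f"langcode:{langcode}", # filter query: keep only the current language
        "rows": rows,                 # number of results per page
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983/", "drupal", "installation guide", "en")
print(url)
```

A filter query (`fq`) narrows the result set without affecting relevance scores, which fits a hard rule like "German words in the English version yield nothing".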

Breadcrumbs

Breadcrumbs became the number one problem. In the design layout, they were very long and ran off the edge of the page. The client suggested hiding the part that did not fit behind an ellipsis. However, each part of the breadcrumbs was supposed to be clickable: if we had hidden the overflowing part, users would not understand where they were and could not go to the sections hidden behind the ellipsis. We explained this to the client and set up line wrapping with CSS instead.

Search results page

According to the technical specifications, each search result summary had to include the keywords from the query, and the keywords had to be in bold. Initially, it didn't work that way. The summary was generated by the Search API, which scanned the page from top to bottom and pulled out fragments of text containing the requested word. Sometimes the summary included the first part of a sentence containing the word, but the word itself was cut off. In the Search API, the highlight processor (a PHP class) is responsible for this, so we developed a custom module, copied the processor's PHP file into it, and adjusted the highlighting mechanism to our needs.
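The key fix is to build the excerpt around the match rather than from the top of the page. The Python sketch below illustrates that idea only; the actual module is PHP code based on the Search API highlight processor, and the function name `make_excerpt`, the `radius` parameter, and the "word starts with the keyword" rule for derivatives are all hypothetical simplifications.

```python
from __future__ import annotations

import html
import re


def make_excerpt(text: str, keywords: list[str], radius: int = 60) -> str:
    """Return an HTML snippet centred on the first keyword match.

    Every match is wrapped in <strong>. A keyword also matches words that
    start with it ("search" -> "searching"), a rough stand-in for
    "the keyword and its derivatives".
    """
    if not keywords:
        return html.escape(text[: 2 * radius])
    # \b anchors the match at a word start; \w* extends it to the end of the word.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\w*",
        re.IGNORECASE,
    )
    match = pattern.search(text)
    if match is None:
        return html.escape(text[: 2 * radius])

    # Centre the window on the match so the keyword itself is always visible.
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    # Move the window edges to spaces so that no word is cut in half.
    if start > 0:
        space = text.find(" ", start)
        start = space + 1 if 0 <= space < match.start() else match.start()
    if end < len(text):
        space = text.rfind(" ", match.end(), end)
        end = space if space != -1 else match.end()
    snippet = text[start:end]

    # Escape the plain text and wrap each match in <strong>.
    parts, last = [], 0
    for m in pattern.finditer(snippet):
        parts.append(html.escape(snippet[last:m.start()]))
        parts.append("<strong>" + html.escape(m.group(0)) + "</strong>")
        last = m.end()
    parts.append(html.escape(snippet[last:]))
    prefix = "… " if start > 0 else ""
    suffix = " …" if end < len(text) else ""
    return prefix + "".join(parts) + suffix


print(make_excerpt("Search the docs", ["search"]))
```

For a short text the whole string is kept, so the call above returns `<strong>Search</strong> the docs`; for a long page the ellipses mark where the excerpt was trimmed.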

Search results sorting

We arranged for the search results to be sorted by relevance. Materials with the keyword in the heading ranked highest, followed by results with the word or its derivatives in the article body. For this sorting, we used Ngram, a Solr plugin. It splits indexed words into short character sequences (n-grams), so it also recognizes words derived from the query, which makes the search deeper and more complete.
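To show why n-grams catch derived words, here is a toy in-memory Python sketch, not Solr itself. The function names and the default lengths of 3 to 10 characters are hypothetical; in Solr the equivalent limits are set with the `minGramSize` and `maxGramSize` parameters of the n-gram filter. Words are split into n-grams at index time, and each whole query word is looked up among them.

```python
from __future__ import annotations


def ngrams(word: str, min_gram: int = 3, max_gram: int = 10) -> set[str]:
    """All character n-grams of `word` with lengths from min_gram to max_gram."""
    word = word.lower()
    grams: set[str] = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(word) - size + 1):
            grams.add(word[start:start + size])
    return grams


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index-time analysis: map every n-gram to the ids of documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in text.split():
            for gram in ngrams(word):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Query time: look up each whole query word; all words must match (AND).

    Query words are not split into n-grams here, so a query word longer
    than max_gram can never match.
    """
    hits: set[str] | None = None
    for term in query.lower().split():
        found = index.get(term, set())
        hits = set(found) if hits is None else hits & found
    return hits or set()


index = build_index({"a": "Searching the documentation", "b": "Installation guide"})
print(search(index, "search"), search(index, "install"))
```

The same mechanism has a cost: "search" would also match "research", which is one reason to rank results with the keyword in the heading above body-only matches.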

Using the boosting procedure, each search result is assigned its own place in the search results. The position is calculated by multiplying the boost factor, which the developer sets, by the relevance score of the content piece, which Solr calculates. The higher the boost factor, the higher the result ranks.
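The multiplication itself is easy to illustrate. In the Python sketch below, the boost values 3.0 for a title match and 1.0 for a body match, the `Result` class, and the `matched_in` field are hypothetical; in a real setup the relevance score comes from Solr, and where the boost is applied depends on the configuration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical boost factors chosen by the developer: a keyword found in the
# title is worth three times as much as one found only in the body.
BOOST_FACTORS = {"title": 3.0, "body": 1.0}


@dataclass
class Result:
    title: str
    relevance: float  # relevance score as computed by the search engine
    matched_in: str   # field where the keyword was found: "title" or "body"


def final_score(result: Result) -> float:
    """Boost factor multiplied by the relevance score."""
    return BOOST_FACTORS[result.matched_in] * result.relevance


def rank(results: list[Result]) -> list[Result]:
    """Sort results so that the highest final score comes first."""
    return sorted(results, key=final_score, reverse=True)


ranked = rank([
    Result("Body mention", relevance=4.0, matched_in="body"),
    Result("Title match", relevance=2.0, matched_in="title"),
])
print([(r.title, final_score(r)) for r in ranked])
```

Here the title match wins (3.0 × 2.0 = 6.0 against 1.0 × 4.0 = 4.0) even though its raw relevance score is lower, which is exactly the effect a higher boost factor is meant to have.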

Results

Apart from some Solr issues that do not affect the search results, our team completed the work successfully.
