In a recent project, we had a chance to work with mongo dB, document oriented NoSQLdatabase.
The problem statement was to provide generic search capabilities on proprietary nested XMLs structures. These proprietary XMLs contained building construction rules and were defined by studying construction rules laid out by government agencies.
These XMLs were used as an input to AutoCAD for validating construction site plan drawings.
A new product solution was envisioned by the customer where they planned to develop a portal from where the registered user (architects, builders) could search the rules for their civil and commercial projects for various local city planning authorities.
The information in the XML is represented as nested structures and each of the nodes in the XML tree had key attributes on which search was required. The structure and content of the XML was different for each local city planning authority.
The key points for designing the solution:
- Adaptability for changes in structure in future.
- Simple and generic search interface from usability perspective.
- Fast response to complex search queries.
- Scalability of solution with increase in data and user requests.
Solution approaches
MySQL - Relation Database Approach
The First approach of representing the XML tree in a relational model was not fit for the requirement due to following constraints:
- XML structure was subjected to change frequently.
- Search component logic i.e. code would also need to change to adapt to change in structure.
- For every Rule category, separate SQL will be required to be built dynamically.
- Complex import program will be required to load the xml and populate the database
So, the approach was rejected in our internal design discussions.
ExistDB - XML Database Approach
The Second approach to use the XML as data source and their by using XPath based search queries to pull the resultant nodes.
As a part of the proof of concept, the following items were taken up:
- Storing the XML in a database column and then using SQL, XPath combination to retrieve results.
- Storing the XML in existDB which is specialised XML database.
- Storing the XML in the specialised existDB or a XML column in relational database, solves the storage problem. The next problem was to query and apply filter on XML content, we defined several XPATH based on the query-able elements and attributes.
We faced issue with selecting of XML nodes from the XML. Since we needed the content from the root node, and filtered children nodes in the same result set.
Mongo DB – Document Oriented NoSQL Database Approach
The Third Approach: Pre-processing the XML tree structure and converting it into a flat model in mongo DB document store.
In our case we have a single collection in mongo DB which will have combination of the above attributes. Querying on this model is easy, since the schema is de-normalized. No joins are required. So we plan to use the Mongo DB based solution for Storage and Querying of the Rule data.
Advantages of document oriented database:
- They don’t enforce as fixed schema, which makes them more flexible than traditional database tables.
- No initial setup for tables is required as in case of relational database.
- No object-relation impedance mismatch output of mongo dB is JSON which can be directly passed to the UI Layer without conversion the result to object models.
- The full text search capabilities of mongo db, enabled free text search on descriptive field and textual content.
- The Lookup operation enabled joining two different collection and project a customized result data as expected by UI component.
- Aggregation Framework and rich set of operations helped to build customized data processing pipeline.
- A generic search component could be easily built on top of Mongo DB with UI layer handling the change in search fields and search based on client side configuration.
After our initial assessment we used the JMeter tool to measure performance variety of search queries. The performance of the queries was as expected.
Finally, our decision was to choose Mongo DB, after a carefully planned proof of concept phase that helped us to make the right decision.