Big Data? - Get Big Insights!
by Sanjeet Banerji, on Sep 25, 2019 11:34:00 AM
As data grows exponentially, its increasing complexity and dimensionality become difficult to handle. This data can take the form of consumer sentiment expressed on social media websites, online marketplaces, and online communities. It can also take the form of consumer behaviour captured in video and audio interactions. Here, a Big Data and Analytics-based framework provides a breakthrough: instant access to powerful predictive and visual analytics tools for businesses in general and market research in particular. It helps to draw and visualize powerful insights from a wide range of data sources.
Where to begin:
Data exploration entails searching for relationships and patterns that you did not know existed. Hence, the primary need is to build an engine that understands enterprise data and develops models to gain more insights, leading to advanced use of data in predictive modelling and decision support systems. The engine should be based on an intelligent analytics framework and a flexible tool chain that help uncover business insights from complex datasets.
The engine should be able to support the following tasks:
- Flexible, intuitive text analytics
- Semantic analysis of text
- Recognition of patterns and text semantics
- Establish relationships between information entities
- Ingestion of data from different sources, e.g. text files, PDF files, MS Office documents, emails, RSS feeds, web content, JSON, TIFF, etc.
- Organize and classify data in terms of definable Ontology and Taxonomy
- Extensive data visualization platform
- Geo-spatial or temporal representation of data
- Fast search on the document keywords / metadata of text entities
- Auto-summarization of text / documents
- Multimedia analysis
- Machine Learning (ML) algorithms
- Natural Language Processing (NLP)
- Infinitely scalable data store
- High performance by using MAPR / YARN
- Interface with third-party analytics tools
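To make one of these tasks concrete, the "fast search on document keywords" item can be illustrated with an inverted index, which maps each keyword to the documents that contain it. The following is only a minimal sketch and not the framework's actual implementation: the function names (`tokenize`, `build_index`, `search`) and the sample documents are assumptions introduced for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keyword tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, for illustration only.
documents = {
    "doc1": "Consumer sentiment on social media",
    "doc2": "Fraud analytics for legal documents",
    "doc3": "Social media fraud detection",
}
index = build_index(documents)
hits = search(index, "social media")
print(sorted(hits))
```

Because a lookup touches only the postings for the query keywords rather than scanning every document, this structure is a common starting point for keyword and metadata search; a production system would typically distribute the index across the cluster.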
The Big Data Analytics Framework:
The framework is highly scalable and powered by NLP, Text Analytics, and ML. It has easy-to-plug-in modules that help ingest data from different sources. It helps to analyse data statistically based on business needs and to visualize the output in a graphical format. In essence, the framework aims to turn text into data for analysis by using NLP and Analytics. It aggregates data from different sources into a Big Data infrastructure by using custom user-defined functions (UDFs), customizable workflows, and data modellers. It provides statistical packages, text analytics tools, and ML algorithms to perform functions such as search, text analysis, geospatial analysis, multimedia analysis, statistical and mathematical modelling and analysis, and graphical analysis.
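The idea of easy-to-plug-in ingestion modules can be sketched as a small registry in which each source type contributes its own function. This is an illustrative assumption about how such plug-ins might be organised, not a description of the framework's real API; the names `INGESTORS`, `ingestor`, and `ingest`, and the record layout, are hypothetical.

```python
import json

# Registry mapping a source type to the function that ingests it (hypothetical).
INGESTORS = {}


def ingestor(source_type):
    """Decorator that plugs an ingestion function into the registry."""
    def register(func):
        INGESTORS[source_type] = func
        return func
    return register


@ingestor("text")
def ingest_text(raw):
    """Plain text: keep the body, trimmed of surrounding whitespace."""
    return {"body": raw.strip()}


@ingestor("json")
def ingest_json(raw):
    """JSON: assumes an object with a 'body' field (illustrative schema)."""
    record = json.loads(raw)
    return {"body": str(record.get("body", ""))}


def ingest(source_type, raw):
    """Dispatch raw content to the module registered for its source type."""
    try:
        handler = INGESTORS[source_type]
    except KeyError:
        raise ValueError(f"no ingestion module registered for {source_type!r}")
    record = handler(raw)
    record["source_type"] = source_type
    return record


records = [
    ingest("text", "  Quarterly survey notes  "),
    ingest("json", '{"body": "Customer email thread"}'),
]
print(records)
```

Supporting a new source, such as PDF or RSS, would then mean registering one more function without touching the dispatch logic, which is the property that "pluggable" usually implies.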
The framework provides engines for visualization, statistics & math, and text analytics & correlation:
- The statistics & math engine supports interactive algebra, calculus, discrete mathematics, graphics, numerical computation, and many other areas of mathematics. It also supports statistical computing, including statistical and graphical techniques such as linear and nonlinear modelling, statistical tests, time series analysis, classification, and clustering.
- The text analytics & correlation engine has three main components: a text mining engine, a semantic correlation engine that relates text segments, and a probabilistic inference engine.
  - The text mining engine includes modules for text classification, text summarization, document annotation, and ontology definition. It derives meaningful insights from very large volumes of documents. The text classification module uses supervised learning algorithms to conduct document classification and regression analysis. The text summarization module uses algorithms to extract key sentences from the text and assemble them into a coherent summary. The document annotation module applies Natural Language Toolkit (NLTK) concepts in a Hadoop HDFS environment with Storm to capture noun phrases and verb phrases and form annotations. The ontology definition module uses the Web Ontology Language (OWL) to determine the classes and properties in the domain, the domains and ranges of properties, the characteristics of classes, etc.
  - The semantic correlation engine uses several well-researched algorithms to derive probabilistic inferences from similarities between text sources. This method outperforms vector-based lexical matching and is more accurate.
  - The probabilistic inference engine uses Bayesian Networks via a Probabilistic Relational Data Mining (PRDM) approach to provide probabilistic, directed graphical models for statistical inference.
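The text summarization step described above (extract key sentences, then assemble them) can be sketched with a simple frequency-based extractive summarizer. The source does not name the algorithms the framework uses, so this is a swapped-in, well-known baseline under stated assumptions: the stop-word list, the scoring rule, and the function names are all illustrative.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # reassemble in document order
    return " ".join(sentences[i] for i in chosen)


# Hypothetical sample text, for illustration only.
sample = ("Big data needs analytics. The weather was pleasant. "
          "Analytics turns big data into insight.")
print(summarize(sample, max_sentences=2))
```

Sentences are scored by how often their content words recur across the whole text, and the selected sentences are emitted in their original order so that the summary still reads naturally.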
Benefits, limitations, and the future scope:
The Big Data Analytics framework is limited only by imagination. It brings agility to efforts to unearth latent knowledge and discover critical business insights from high data volumes. The massively parallel processing offered by the Big Data infrastructure provides an extremely fast way of crunching data, allowing responses within a couple of minutes for petabytes of data, with the response time increasing accordingly for higher orders of magnitude.
The Big Data Analytics framework is recommended for document analysis and exploratory work. It is typically used in Market Research scenarios, where the researcher does not know what he or she is looking for and needs to explore the data and its relationships to gain deeper insights into the area of study. The framework is also used in scenarios where systematic data discovery is needed, in fields such as Intelligence Analysis, Fraud Analytics, Legal document analysis, and clinical research.
Future scope includes refining the framework to include high-speed Audio and Video Analytics. Using streams instead of batch processing systems would significantly improve performance for real-time motion analytics or video analytics. Semantic analytics, or understanding the emotional context underpinning voluminous data, is also an emerging field, and work is in progress to deliver high performance for live data even in continuously streaming mode.
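The difference between batch and stream processing mentioned above can be illustrated with a toy keyword counter. This sketch only shows the change in when results become available; it makes no claim about the framework's streaming engine or its performance, and the function names and sample events are assumptions.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: wait for the full collection, then compute once."""
    return Counter(word for event in events for word in event.split())


def streaming_counts(event_stream):
    """Streaming style: update state per event and yield a result each time."""
    counts = Counter()
    for event in event_stream:
        counts.update(event.split())
        yield dict(counts)  # snapshot available right after each event


# Hypothetical events, for illustration only.
events = ["late delivery", "great service", "late refund"]
snapshots = list(streaming_counts(iter(events)))
print(snapshots[0])   # result after the first event only
print(snapshots[-1])  # matches the batch result once all events have arrived
```

Both approaches reach the same final counts, but the streaming version exposes an up-to-date result after every event, which is what matters for live audio, video, or sentiment feeds.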
Excerpt from the white paper 'Big Data and Big Insights' written by Mr. Sanjeet Banerji, Executive Vice President & Head – Artificial Intelligence & Cognitive Sciences, Datamatics Global Services Limited