Introduction

Yewno’s mission is to extract and organize valuable insights from an overwhelming quantity of unstructured data. We are building the next-generation Knowledge Graph, which helps people overcome information overload and research and understand the world in a more natural manner. In contrast to classical information-retrieval engines, which are rooted in theoretical computer science, our approach is inspired by the way humans process information from multiple sensory channels. It leverages state-of-the-art Computational Linguistics, Network Theory, and Machine Learning, as well as methods from traditional Artificial Intelligence.

At the core of our technology is a framework that extracts, processes, links and represents atomic units of knowledge, called concepts, from heterogeneous data sources. A Deep Learning Network continuously “reads” high-quality sources and projects concepts into a multi-layered, multi-dimensional Embedding Space, where similarity measures are used to group related concepts along different dimensions (semantic and syntactic, to name just two).

Yewno Knowledge Graph

Thanks to these techniques, a graph-like network is induced, and advanced tools from the field of complex networks are used to extract insights and detect emergent properties. Different sources of data (e.g. news, patents, stock prices) and different similarity metrics (e.g. semantic, syntactic, factual) yield different networks. These networks are collectively called the Yewno Knowledge Graph (YKG). Additionally, Yewno’s approach explicitly addresses the evolution of the Knowledge Graph over time and extracts insights by analyzing not only the nature of the interconnected concepts and communities, but also their temporal dynamics. Figure 1 shows a snapshot of our Knowledge Graph at a particular point in time.
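
To make the idea of inducing a network from similarity measures concrete, here is a minimal sketch. It assumes that each concept already has an embedding vector and that two concepts are linked when their cosine similarity reaches a chosen threshold. The function names, the threshold and the toy vectors are illustrative assumptions, not a description of Yewno's actual pipeline.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def induce_graph(embeddings, threshold):
    """Return {(a, b): similarity} for every concept pair at or above threshold."""
    names = sorted(embeddings)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine_similarity(embeddings[a], embeddings[b])
            if sim >= threshold:
                edges[(a, b)] = sim
    return edges


# Toy 2-D vectors, invented purely for illustration.
toy_embeddings = {
    "vaccine": [1.0, 0.1],
    "pandemic": [0.9, 0.3],
    "stock market": [0.1, 1.0],
}
edges = induce_graph(toy_embeddings, threshold=0.9)
for (a, b), sim in edges.items():
    print(f"{a} -- {b}: {sim:.3f}")
```

Using a different similarity function or a different source of vectors in the same loop would yield a different layer, which mirrors how different metrics and sources produce different networks.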

Figure 1: A snapshot of the Yewno Knowledge Graph. Note: only a limited number of connections per concept are shown.

Content Universe

As mentioned earlier, the layers of the Yewno Knowledge Graph are induced from a wide range of sources, listed below. Unless otherwise specified, we provide a five-year history of related data:

  • News
  • Patents
  • Official Filings
  • Transcripts
  • Clinical Trials
  • Judicial Documents
  • Scientific Publications
  • (Intraday) Stock Prices

Each source above undergoes a number of processing steps implemented within the Yewno Intelligent Engine, which culminate in the creation and continual update of one or more layers of the Knowledge Graph. It is worth noting that different sources have intrinsically different update patterns, from news, which is updated in real time, to official filings, which are updated quarterly. Most other networks are updated daily.

An Evolving Data Framework

Yewno’s Knowledge Graph is a powerful and dynamic representation of knowledge across a vast corpus of documents, and evolves in real time.  It is interpreted by an Inference Engine which detects and explains the relationships and changes in connections over time.

Connections among the represented concepts can be traced back to the original source document, down to the individual sentence and the ontological class of each concept. This methodology allows new information to be analyzed on an intraday basis, providing anomaly detection as well as novelty and relevance scores.

Graph Embeddings And Exposures: A Multidimensional Data Framework 

The dynamic nature of the Yewno Knowledge Graph enables the use of advanced machine learning algorithms to extract hidden insights that emerge as a consequence of network effects. Networks such as supply chain networks, social networks, word co-occurrence networks, and communication networks occur naturally in many real-world applications. Analyzing them yields insight into financial dependencies, the structure of society, language, and patterns of communication, to mention a few. Many approaches have been proposed for such analysis. Effective graph analytics give users a deeper understanding of what lies behind the data, and can thus benefit many real-world applications such as node classification, node recommendation, and link prediction. However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective and efficient way to address this problem [1]. It converts the graph data into a low-dimensional space in which the graph’s structural information and topological properties are preserved as far as possible.

Graph Embeddings

We have used Deep Neural Network embedding and DeepWalk techniques [2-3], which are unsupervised learning methods for building latent topological representations of graph vertices. Topological representations are hidden features of the vertices that capture neighbourhood similarity and community membership. The hidden representations encode topological relations (weights assigned to the edges) in a continuous vector space with a relatively small number of dimensions. A Graph Embedding algorithm takes a graph as input and produces a hidden representation as output. It is both online and scalable: it uses local information obtained from truncated random walks to learn hidden representations, treating the walks as an unsupervised training set. It is also suitable for a broad class of real-world applications, such as network classification and anomaly detection.
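
The truncated random walks mentioned above can be sketched in a few lines. The example below only builds the walk corpus; in DeepWalk [2] these walks are then treated like sentences and fed to a skip-gram model to learn the vertex vectors, a step omitted here. The sketch assumes an unweighted graph stored as an adjacency dictionary and uniform neighbour sampling, so it ignores edge weights. All names and the toy graph are hypothetical.

```python
import random


def truncated_random_walk(adjacency, start, walk_length, rng):
    """Walk from `start`, picking a uniformly random neighbour at each step."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk


def build_walk_corpus(adjacency, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every vertex, in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(truncated_random_walk(adjacency, node, walk_length, rng))
    return corpus


# Toy undirected graph, invented for illustration.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
corpus = build_walk_corpus(toy_graph, walks_per_node=3, walk_length=5)
print(len(corpus), "walks; first walk:", corpus[0])
```

Because each walk only touches the neighbourhood of its starting vertex, new walks can be generated for the changed part of a graph without recomputing everything, which is what makes this family of methods online and scalable.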

Figure 2: Embedding a graph into a lower dimensional metric space

Exposures: Quantifying the importance of connections in the Knowledge Graph

While a visual inspection of networks can reveal interesting connectivity patterns, finding suitable algorithms that can extract hidden insights is still an active research problem. Here we propose Exposures as measures of distance between any two nodes, or groups of nodes, within the network or its embedded counterpart. Exposure computation takes into account not only first-order relationships in the graph but also higher-order relationships, allowing hidden patterns and emergent properties to surface, a task that would otherwise be impossible without the support of a Knowledge Graph.

Concept exposures are defined as a similarity metric that takes higher-order relationships into account. The metric is a linear combination of multiple components derived from the Knowledge Graph. Each component is a standalone metric in its own right, and the components vary in complexity. Four factors (Contribution, Pure Play, Centrality, Similarity) are computed to measure the exposure of any entity to a set of target concepts. We focus our analysis here on entity-to-concept exposures, but the method is generally applicable to the exposure between any pair of concepts.

  1. Contribution Score measures how often each entity was mentioned in, or published, documents related to the target concept, relative to all the companies in the asset universe.
  2. Pure Play Score measures the percentage of each entity’s mentions or document publications that relate to the target concept, relative to its mentions or publications across other concepts.
  3. Centrality Score is based on centrality diffusion (local PageRank) over the network constructed from mentions/publications between concepts. It incorporates second-order connections and favors central nodes that are connected to other central nodes.
  4. Similarity Score is based on how close the company and concept projections are in the embedded space.
  5. The Aggregated Score f(Contribution, Pure Play, Centrality, Similarity) is calculated as a weighted linear combination of the previous scores, normalized by the maximum.
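
As an illustration of the first two factors, the sketch below computes them from raw mention counts. It assumes one plausible reading of the definitions: Contribution is an entity's share of all mentions of the target concept across the universe, and Pure Play is the share of an entity's own mentions that relate to the target concept. The exact formulas are not given in this article, so the function names, the formulas and the toy counts are all assumptions.

```python
def contribution_scores(mentions, target):
    """Entity's share of all mentions of `target` across the universe."""
    total = sum(counts.get(target, 0) for counts in mentions.values())
    if total == 0:
        return {entity: 0.0 for entity in mentions}
    return {entity: counts.get(target, 0) / total
            for entity, counts in mentions.items()}


def pure_play_scores(mentions, target):
    """Share of each entity's own mentions that relate to `target`."""
    scores = {}
    for entity, counts in mentions.items():
        entity_total = sum(counts.values())
        scores[entity] = counts.get(target, 0) / entity_total if entity_total else 0.0
    return scores


# Toy mention counts per entity and concept, invented for illustration.
toy_mentions = {
    "Company A": {"vaccine": 30, "oncology": 70},
    "Company B": {"vaccine": 10, "oncology": 0},
}
contribution = contribution_scores(toy_mentions, "vaccine")
pure_play = pure_play_scores(toy_mentions, "vaccine")
print(contribution)
print(pure_play)
```

In this toy universe Company A dominates the conversation about the concept (high Contribution), while Company B talks about nothing else (high Pure Play), which is why the two factors are kept separate.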

Using the combination of these components, we are able to derive hidden relationships between concepts. Feeding different weights to the objective function changes the exposure values between the target concepts and the source concept, and therefore the order of concepts in the result. The method is thus highly flexible: it can be tuned to capture local connections rather than hidden ones, and vice versa. The next section highlights how such metrics can help uncover hidden connections in a real-world scenario: the discovery of a “Vaccine” for Covid-19.
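
The effect of the weights on the ordering can be shown with a small sketch of the aggregated score. It assumes that the weighted sum is divided by the largest value across entities, which is one possible reading of "normalized by the maximum". It also assumes that Contribution stands in for local, first-order evidence while Centrality and Similarity stand in for higher-order evidence. The factor values and weights are invented.

```python
FACTORS = ("contribution", "pure_play", "centrality", "similarity")


def aggregated_scores(factor_table, weights):
    """Weighted linear combination of the four factors, scaled so the top entity is 1.0."""
    raw = {
        entity: sum(weights[f] * factors[f] for f in FACTORS)
        for entity, factors in factor_table.items()
    }
    top = max(raw.values(), default=0.0)
    if top <= 0.0:
        return {entity: 0.0 for entity in raw}
    return {entity: value / top for entity, value in raw.items()}


def ranking(scores):
    """Entities ordered from most to least exposed."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy factor values, invented for illustration.
toy_factors = {
    "Company A": {"contribution": 0.8, "pure_play": 0.2, "centrality": 0.3, "similarity": 0.4},
    "Company B": {"contribution": 0.2, "pure_play": 0.9, "centrality": 0.6, "similarity": 0.7},
}
# Hypothetical weightings: one stresses direct mentions, the other higher-order structure.
local_weights = {"contribution": 0.7, "pure_play": 0.1, "centrality": 0.1, "similarity": 0.1}
hidden_weights = {"contribution": 0.1, "pure_play": 0.1, "centrality": 0.4, "similarity": 0.4}

local_ranking = ranking(aggregated_scores(toy_factors, local_weights))
hidden_ranking = ranking(aggregated_scores(toy_factors, hidden_weights))
print("local emphasis: ", local_ranking)
print("hidden emphasis:", hidden_ranking)
```

With the same factor values, the two weightings put a different company first, which is the flexibility described above.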

Discovery of a “Vaccine” for Covid-19

Covid-19, which originated in Wuhan, China, defined 2020. The disease, which continues to be rampant, has killed over 1.3 million people globally and warped financial markets and global economies in ways never seen before.

It is generally accepted that the only long-term solution to eradicating the virus is the discovery of a viable vaccine. Global markets are so focused on this “Vaccine” solution that any breakthrough by a pharmaceutical company, small or large, has moved them dramatically.

It is now imperative for investors to stay on top of any developments in this “Vaccine” solution. As more and more companies declare breakthroughs, it is key to understand their real potential impact and benefit to wider society, and their implications for the investment universe. This is certainly a challenge, as the information landscape, from news coverage to market data, shifts, compounds and accumulates each instant.

Analysis of exposures

Yewno’s Knowledge Graph autonomously performed an exposure analysis of the companies most influenced by newly emerging concepts, in this case the Vaccine, as measured by the aggregated score described above. The following table lists the Top 10 companies exposed to the Vaccine, ordered by the average of their daily aggregated exposure scores from 7 to 15 November 2020:

Ranking  Company                          Exposure
1        Pfizer                           0.733886625
2        Beximco Pharmaceuticals Ltd.     0.416748
3        Moderna Therapeutics             0.4000245
4        Immunoprecise Antibodies Ltd.    0.381429
5        CSL Limited                      0.364196875
6        Cansino Biologics Inc.           0.354410833
7        AstraZeneca                      0.350555375
8        Sinovac Biotech                  0.34255975
9        Novavax                          0.300888
10       Biontech Se                      0.294341875

Table 1: The ten companies most exposed to Vaccine, ordered by average daily exposure
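
The ordering used in Table 1 can be reproduced from a feed of daily scores with a short aggregation. The sketch below assumes the feed is a list of (date, company, score) rows and that a company's average is taken over the days on which it has a score within the window. The row layout, the function name and the toy values are assumptions and do not reproduce the figures in the table.

```python
from collections import defaultdict
from datetime import date


def top_by_average_exposure(daily_scores, start, end, n):
    """Average each company's daily score inside [start, end] and return the top n."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, company, score in daily_scores:
        if start <= day <= end:
            totals[company] += score
            counts[company] += 1
    averages = {company: totals[company] / counts[company] for company in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy rows, invented for illustration.
toy_rows = [
    (date(2020, 11, 7), "Company A", 0.50),
    (date(2020, 11, 8), "Company A", 0.70),
    (date(2020, 11, 7), "Company B", 0.40),
    (date(2020, 11, 16), "Company B", 0.90),  # outside the window, ignored
]
top = top_by_average_exposure(toy_rows, date(2020, 11, 7), date(2020, 11, 15), n=10)
for rank, (company, average) in enumerate(top, start=1):
    print(rank, company, round(average, 6))
```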

In addition, we can see the temporal evolution of daily exposures in November 2020 for the above Top 10 companies exposed to the Vaccine. We can observe how Yewno technology was capable of recognizing the influence that the Vaccine concept had on these companies:

Temporal Evolution

Figure 3: Temporal evolution of the companies with the highest average exposure over the time period

Top 10 Companies Exposed to Vaccine over timeframe:

Figure 4: Evolution of exposures over time

Selected Metadata for Pfizer

Selected Snippets for Pfizer

Article ID:
79b42f821a5aaa90510473315cdaa3ad
Snippet:
Pfizer and BioNTech’s preliminary analysis suggests their vaccine is 90% effective at preventing symptomatic COVID-19, offering the best hope yet of curbing the pandemic, as coronavirus cases and related hospitalizations reach record highs.

Concepts Extracted from snippet:
Vaccine, Covid-19, Biontech Se, Coronavirus, Pandemic, Hospital, Pfizer

 

Article ID:
908502c457b3fdeea9c0e699c3489fa9
Snippet:
Stock markets rocketed higher after Pfizer said early data show its coronavirus vaccine is effective. Investors breathed a sigh of relief after days of U.S. presidential limbo ended with Democrat Joe Biden declared the president-elect. Markets were already sharply higher on the U.S. election result when Pfizer said that data shows vaccine shots may be 90 per cent effective at preventing COVID-19, indicating the company is on track this month to file an emergency use application with U.S. regulators.

Concepts Extracted from snippet:
Covid-19, Pfizer, Joe Biden, 2020 US Presidential Election, Stock market

 

Article ID:
B86b116a75aab3b8fb9f1934b897d634
Snippet:
In July, my administration reached an agreement with Pfizer to provide $1.95 billion to support the mass manufacturing and distribution of 100 million doses, with the option to purchase a total of 600 million doses shortly thereafter, Trump said. “Our investment will make it possible for the vaccine to be provided by Pfizer free of charge,” he added.

Concepts Extracted from snippet:
Donald Trump, Pfizer, Covid-19

 

Article ID:
D1f1e9af183e30418b4782b67172b8cc
Snippet:
Pricing for the future Covid-19 vaccines ranges from AstraZeneca’s $3-$5 a dose and Johnson & Johnson’s $10 on a not-for-profit basis guaranteed for the current pandemic to Pfizer’s $19.50 a shot and Moderna’s $37.

Concepts Extracted from snippet:
Johnson&Johnson, Covid-19, Pfizer, Moderna, AstraZeneca

Explaining the Connections and Exposures

One of the key features of Yewno’s Knowledge Graph is its ability to point back to the fragments of information that caused certain connections or exposures to appear. This allows the data to be provided either as a data feed, a fragment of which is shown above for specific days in November 2020, or through Yewno|Edge, an AI-based investment research platform.
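
One way to picture this traceability is an index that maps each pair of concepts to the snippets in which they were extracted together. The sketch below is a hypothetical data structure that mirrors the fields shown in the snippets above (article ID, snippet text, extracted concepts). It is not Yewno's actual data-feed schema, and the IDs and texts in it are placeholders.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Evidence:
    """A fragment of a source document that supports a connection."""
    article_id: str
    snippet: str


class ProvenanceIndex:
    """Maps each unordered pair of concepts to the snippets mentioning both."""

    def __init__(self):
        self._evidence = {}

    def add_snippet(self, evidence, concepts):
        # Every pair of concepts extracted from the same snippet gets linked to it.
        for a, b in combinations(sorted(set(concepts)), 2):
            self._evidence.setdefault((a, b), []).append(evidence)

    def explain(self, concept_a, concept_b):
        """Return the evidence behind a connection, in insertion order."""
        key = tuple(sorted((concept_a, concept_b)))
        return list(self._evidence.get(key, []))


# Placeholder records, invented for illustration.
index = ProvenanceIndex()
first = Evidence("doc-1", "Company A reports early vaccine trial data.")
second = Evidence("doc-2", "Markets rise after Company A vaccine news.")
index.add_snippet(first, ["Company A", "Vaccine"])
index.add_snippet(second, ["Company A", "Vaccine", "Stock market"])

for item in index.explain("Vaccine", "Company A"):
    print(item.article_id, "->", item.snippet)
```

Keeping the evidence next to each connection is what lets a score be explained by the sentences that produced it.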

Conclusions on “Vaccine”

Identifying the companies exposed to a trending concept (or event) makes it possible to further analyze the effects these exposures could have on the financial system; such exposure measures return a negative or positive factor model against the detected companies and their stocks. Similarly, we can perform the same analysis using bulk data feeds that provide all the factors in a format suitable for machine analysis and interpretation.

Complementing Research with Yewno|Edge

We want to show the breadth and depth of data that an analyst would have access to through our Yewno|Edge platform in the Coronavirus case study. We can run an overall search for the concept “Vaccine” in Document search, finding all the information that contains the concept in News, Patents, Official Filings, Transcripts and Clinical Trials, as shown in Figure 5.

Figure 5: All the information containing the concept “Vaccine”

Yewno|Edge can also help us compare the exposure values with the stock prices and sentiment for a specific company and/or concept. Figure 6 shows the evolution of exposures between AstraZeneca PLC, one of the highly exposed companies, and Vaccine. We can observe an increasing trend in exposures as well as in the positive sentiment values, likely due to the announcement of successful trials against the virus.

Figure 6: Exposure of AstraZeneca towards “Vaccine” year to date

In addition, Yewno|Edge can provide evidence of such exposure and positive sentiment through the Key Development module shown in Figure 7.

Figure 7: Key Developments between AstraZeneca and Vaccine

References

[1] Palash Goyal, Emilio Ferrara. “Graph Embedding Techniques, Applications, and Performance: A Survey”. Knowledge-Based Systems, vol. 151, pp. 78–94, 2018, doi: https://doi.org/10.1016/j.knosys.2018.03.022

[2] Bryan Perozzi, Rami Al-Rfou, Steven Skiena. “DeepWalk: Online Learning of Social Representations”. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, ACM, pp. 701–710, New York, New York, USA, 2014, doi: https://doi.org/10.1145/2623330.2623732

[3] Aditya Grover, Jure Leskovec. “node2vec: Scalable Feature Learning for Networks”. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, 2016.

[4] Ruggero Gramatica. “Information network with linked information nodes”. US10025862B2

[5] Ruggero Gramatica, Haris Dindo. “Structuring data in a knowledge graph”. US10528871B1
