Yewno’s mission is to extract and organize valuable insights from an overwhelming quantity of unstructured data. We are building the next-generation Knowledge Graph, which helps people overcome the information-overload problem and research and understand the world in a more natural manner. In contrast to classical information-retrieval engines, rooted in theoretical computer science, our approach is inspired by the way humans process information from multiple sensory channels, and it leverages state-of-the-art Computational Linguistics, Network Theory, Machine Learning, as well as methods from traditional Artificial Intelligence.
At the core of our technology is the framework that extracts, processes, links and represents atomic units of knowledge – concepts – from heterogeneous data sources. A Deep Learning Network continuously “reads” high-quality sources and projects concepts into a multi-layered and multi-dimensional Embedding Space where similarity measures are used to group together related concepts along different dimensions (semantic, syntactic, just to name a few).
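To illustrate how similarity measures group related concepts in an embedding space, the following sketch computes cosine similarity between toy concept vectors. The embeddings here are hand-set, 4-dimensional, and purely illustrative; Yewno's actual embeddings are learned by the Deep Learning Network and are far higher-dimensional.

```python
import numpy as np

# Hypothetical, hand-set 4-dimensional embeddings for a few concepts
# (illustrative only; real embeddings are learned, not hand-crafted).
embeddings = {
    "vaccine":  np.array([0.9, 0.1, 0.3, 0.0]),
    "antibody": np.array([0.8, 0.2, 0.4, 0.1]),
    "election": np.array([0.0, 0.9, 0.1, 0.7]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Related concepts score near 1; unrelated ones score near 0.
print(cosine_similarity(embeddings["vaccine"], embeddings["antibody"]))
print(cosine_similarity(embeddings["vaccine"], embeddings["election"]))
```

Grouping concepts then reduces to thresholding or clustering these pairwise similarities along each dimension of interest (semantic, syntactic, and so on).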
Yewno Knowledge Graph
Thanks to these techniques, a graph-like network is induced, and advanced tools from the field of complex networks are utilized to extract insights and detect emerging properties. Different sources of data (e.g. news, patents, stock prices) and different similarity metrics (e.g. the above-mentioned semantic, syntactic, factual) yield different networks. These networks are collectively called the Yewno Knowledge Graph (YKG). Additionally, Yewno’s approach explicitly addresses the evolution of the Knowledge Graph over time and extracts insights by analyzing not only the nature of the interconnected concepts and communities, but also their temporal dynamics. Figure 1 shows a snapshot of our Knowledge Graph at a particular point in time.
Figure 1: A snapshot of the Yewno Knowledge Graph. Note: only a limited number of connections per concept are shown.
As mentioned earlier, the layers of the Yewno Knowledge Graph are induced from a plethora of sources, listed below. Unless otherwise specified, we provide a five-year history of related data:
- Official Filings
- Clinical Trials
- Judicial Documents
- Scientific Publications
- (Intraday) Stock Prices
Each source above undergoes a number of processing steps implemented within the Yewno Intelligent Engine that eventually culminate in the creation and continual update of one or more layers of the Knowledge Graph. It is worth noting that different sources exhibit intrinsically different update patterns, from news being updated in real time to official filings updated quarterly. Most other networks are updated on a daily basis.
An Evolving Data Framework
Yewno’s Knowledge Graph is a powerful and dynamic representation of knowledge across a vast corpus of documents, and evolves in real time. It is interpreted by an Inference Engine which detects and explains the relationships and changes in connections over time.
Connections amongst represented concepts can be traced back to the original source document, down to the sentence level and its ontological class. This methodology allows for the analysis of new information on an intraday basis, providing anomaly detection as well as novelty and relevance scores.
Graph Embeddings And Exposures: A Multidimensional Data Framework
The dynamic nature of the Yewno Knowledge Graph enables the usage of advanced machine learning algorithms to extract hidden insights that emerge as a consequence of network effects. Networks, such as supply chain networks, social networks, word co-occurrence networks, and communication networks, occur naturally in various real-world applications. Analyzing them yields insight into financial dependencies, the structure of society, language, and different patterns of communication, to mention a few. Many approaches have been proposed to perform such analyses. Effective graph analytics provide users with a deeper understanding of what is behind the data, and thus can benefit a plethora of real-world applications such as node classification, node recommendation, and link prediction. However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective yet efficient way to solve the graph analytics problem [1]. It converts the graph data into a low-dimensional space in which the graph's structural information and topological properties are maximally preserved.
We have used Deep Neural Network embedding and DeepWalk techniques [2-3], which are unsupervised learning methods for building latent topological representations of graph vertices. Topological representations are hidden features of the vertices that capture neighbourhood similarity and community membership. The hidden representations encode topological relations (weights assigned to the edges) in a continuous vector space with a relatively small number of dimensions. Graph embedding takes a graph as input and produces a hidden representation as output. DeepWalk is an online, scalable algorithm. It uses local information obtained from truncated random walks to learn hidden representations, treating the walks as an unsupervised training set. It is suitable for a broad class of real-world applications such as network classification and anomaly detection.
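The first stage of DeepWalk can be sketched as follows: generate truncated random walks from every vertex, then treat each walk as a "sentence" of node tokens for a skip-gram model. This is a minimal stdlib-only sketch over a toy adjacency list; the graph, walk counts, and lengths are illustrative assumptions, and the skip-gram training step (e.g. word2vec) is only indicated in a comment.

```python
import random

# Toy undirected graph as an adjacency list; in practice the graph is a
# layer of the Knowledge Graph.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def truncated_random_walks(graph, num_walks=10, walk_length=5, seed=42):
    """Generate DeepWalk-style truncated random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in graph:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

walks = truncated_random_walks(graph)
# Each walk is a "sentence" of node tokens; feeding these sentences to a
# skip-gram model yields the latent topological representation of each vertex.
```

Because walks are generated independently and incrementally, the procedure scales to large graphs and supports online updates as new edges arrive.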
Figure 2: Embedding a graph into a lower dimensional metric space
Exposures: Quantifying the importance of connections in the Knowledge Graph
While a visual inspection of networks can reveal interesting connectivity patterns, finding suitable algorithms that can extract hidden insights is still an active research problem. Here we propose Exposures: measures of distance between any two nodes, or groups of nodes, within the network or its embedded counterpart. Exposure computation takes into account not only first-order relationships from the graph, but also higher-order relationships, allowing hidden patterns and emergent properties to surface – a task that would otherwise be impossible without the support of a Knowledge Graph.
Concept exposures are defined as a similarity metric that takes higher-order relationships into account. The metric is calculated by combining multiple components derived from the Knowledge Graph into a linear combination. These components are standalone metrics in their own right, varying in complexity. A total of four factors (Contribution, Pureplay, Centrality, Similarity) are computed that measure the exposure of any entity to a set of target concepts. We focus our analysis here on entity-to-concept exposures, but the method applies generally to the exposure between any pair of concepts.
- Contribution Score measures how often each entity was mentioned in, or published, documents related to the target concept, relative to all the companies in the asset universe
- Pure Play Score measures the percentage of each entity’s mentions or document publications related to the target concept, relative to its mentions or publications across all other concepts
- Centrality Score is based on centrality diffusion (local PageRank) of the network constructed from mentions/publications between concepts. It incorporates second-order connections, favoring central nodes connected to other central nodes
- Similarity Score is based on how close the company and concept projections are in the embedded space
- The Aggregated Score f(Contribution, Pure play, Centrality, Similarity) is a weighted linear combination of the previous scores, normalized by the maximum
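The aggregated score above can be sketched as follows. The per-company factor values and the weights here are hypothetical placeholders; real values come from the Knowledge Graph and the chosen weighting scheme.

```python
import numpy as np

# Hypothetical factor scores per company, in the order
# (Contribution, Pureplay, Centrality, Similarity). Illustrative only.
factors = {
    "CompanyA": np.array([0.8, 0.6, 0.7, 0.9]),
    "CompanyB": np.array([0.4, 0.9, 0.3, 0.5]),
    "CompanyC": np.array([0.2, 0.1, 0.2, 0.3]),
}
weights = np.array([0.3, 0.2, 0.2, 0.3])  # illustrative weights, sum to 1

# Weighted linear combination of the four factors for each company.
raw = {name: float(weights @ f) for name, f in factors.items()}

# Normalize by the maximum so the top-exposed company scores 1.0.
max_raw = max(raw.values())
aggregated = {name: score / max_raw for name, score in raw.items()}
```

Changing `weights` reorders the ranking, which is exactly the flexibility discussed next: emphasizing Similarity surfaces hidden embedded-space connections, while emphasizing Contribution favors direct, local ones.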
Using the combination of these components, we are able to derive hidden relationships between concepts. Feeding different weights to the method’s objective function changes the ordering of concepts in the result, since the exposure values from the target concepts to the source concept change accordingly. The method is therefore highly flexible: it can be tuned to capture local connections rather than hidden ones, and vice versa. The next section highlights how such metrics can help uncover hidden connections in a real-world scenario: the discovery of a “Vaccine” for Covid-19.
Discovery of a “Vaccine” for Covid-19
Covid-19, which originated in Wuhan, China, defined 2020. The disease, which continues to be rampant, has killed over 1.3 million people globally and warped financial markets and global economies in ways never seen before.
It is generally accepted that the only long-term solution to eradicating the virus is the discovery of a viable vaccine. So focused are global markets on this “Vaccine” solution that any breakthrough, small or large, by pharmaceutical companies has moved global markets dramatically.
It is now imperative for investors to stay on top of any developments toward this “Vaccine” solution. As more and more companies declare breakthroughs, it is key to understand their real potential impact and benefit to wider society, and their implications for the investment universe. This is certainly a challenge, as the information landscape – from news coverage to market data – shifts, compounds and accumulates each instant.
Analysis of exposures
Yewno’s Knowledge Graph autonomously performed an exposure analysis of the companies most influenced by new emerging concepts – in this case the Vaccine – as measured by the above-mentioned aggregated score. The following table shows the Top 10 companies exposed to the Vaccine, ordered by the average of daily aggregated exposure scores over 7–15 November 2020:
| Rank | Company | Avg. Daily Exposure |
|------|---------|---------------------|
| 2 | Beximco Pharmaceuticals Ltd. | 0.416748 |
| 4 | Immunoprecise Antibodies Ltd. | 0.381429 |
| 6 | Cansino Biologics Inc. | 0.354410833 |
Table 1: Ten most exposed companies to Vaccine ordered by average daily exposure
In addition, we can see the temporal evolution of daily exposures in November 2020 for the above Top 10 companies exposed to the Vaccine. We can observe how Yewno’s technology was capable of recognizing the influence that the Vaccine concept had on these companies:
Figure 3: Temporal Evolution of highest average exposed companies over time period
Top 10 companies exposed to Vaccine over the timeframe:
Figure 4: Evolution of exposures over time
Selected Metadata for Pfizer
Selected Snippets for Pfizer
Pfizer and BioNTech’s preliminary analysis suggests their vaccine is 90% effective at preventing symptomatic COVID-19, offering the best hope yet of curbing the pandemic, as coronavirus cases and related hospitalizations reach record highs.
Concepts Extracted from snippet:
Vaccine, Covid-19, Biontech Se, Coronavirus, Pandemic, Hospital, Pfizer
Stock markets rocketed higher after Pfizer said early data show its coronavirus vaccine is effective. Investors breathed a sigh of relief after days of U.S. presidential limbo ended with Democrat Joe Biden declared the president-elect. Markets were already sharply higher on the U.S. election result when Pfizer said that data shows vaccine shots may be 90 per cent effective at preventing COVID-19, indicating the company is on track this month to file an emergency use application with U.S. regulators.
Concepts Extracted from snippet:
Covid-19, Pfizer, Joe Biden, 2020 US Presidential Election, Stock market
In July, my administration reached an agreement with Pfizer to provide $1.95 billion to support the mass manufacturing and distribution of 100 million doses, with the option to purchase a total of 600 million doses shortly thereafter, Trump said. “Our investment will make it possible for the vaccine to be provided by Pfizer free of charge,” he added.
Concepts Extracted from snippet:
Donald Trump, Pfizer, Covid-19
Pricing for the future Covid-19 vaccines ranges from AstraZeneca’s $3-$5 a dose and Johnson & Johnson’s $10 on a not-for-profit basis guaranteed for the current pandemic to Pfizer’s $19.50 a shot and Moderna’s $37.
Concepts Extracted from snippet:
Johnson & Johnson, Covid-19, Pfizer, Moderna, AstraZeneca
Explaining the Connections and Exposures
One of the key features of Yewno’s Knowledge Graph is its ability to point back to the fragments of information that caused certain connections or exposures to appear. This allows Yewno to provide data either as a data feed, a fragment of which is shown for specific days in November 2020, or through Yewno|Edge, an AI-based investment research platform.
Conclusions on “Vaccine”
Identifying companies exposed to a trending concept (or event) makes it possible to analyze the effects these exposures could have on the financial system; such exposure measures yield a negative or positive factor model for the detected companies and their stocks. Similarly, the same analysis can be performed using bulk data feeds that provide all the factors in a format suitable for machine analysis and interpretation.
Complementing Research with Yewno|Edge
We want to show the breadth and depth of data that an analyst would have access to through our Yewno|Edge platform in the Coronavirus case study. We can run an overall search for the concept “Vaccine” in Document Search, finding all the information containing the concept across News, Patents, Official Filings, Transcripts and Clinical Trials, as shown in Figure 5.
Figure 5: All the information containing the concept “Vaccine”
Yewno|Edge can also help us compare exposure values with the stock prices and sentiment for a specific company and/or concept. Figure 6 shows the evolution of exposures between AstraZeneca PLC, one of the most highly exposed companies, and Vaccine. We can observe an increasing trend in exposures as well as positive sentiment values, likely due to the announcement of successful trials against the virus.
Figure 6: Exposure of AstraZeneca towards “Vaccine” year to date
In addition, Yewno|Edge can provide evidence of such exposure and positive sentiment through the Key Developments module shown in Figure 7.
Figure 7: Key Developments between Astrazeneca and Vaccine
[1] Palash Goyal, Emilio Ferrara. “Graph Embedding Techniques, Applications, and Performance: A Survey”. Knowledge-Based Systems, vol. 151, pp. 78–94, 2018. doi: https://doi.org/10.1016/j.knosys.2018.03.022
[2] Bryan Perozzi, Rami Al-Rfou, Steven Skiena. “DeepWalk: Online Learning of Social Representations”. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), ACM, pp. 701–710, New York, NY, USA, 2014. doi: https://doi.org/10.1145/2623330.2623732
[3] Aditya Grover, Jure Leskovec. “node2vec: Scalable Feature Learning for Networks”. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), ACM, pp. 855–864, 2016.
[4] Ruggero Gramatica. “Information network with linked information nodes”. US Patent US10025862B2.
[5] Ruggero Gramatica, Haris Dindo. “Structuring data in a knowledge graph”. US Patent US10528871B1.