  • Introduction

    • Yewno Knowledge Graph

    • Content Universe

    • An Evolving Data Framework

  • Graph Embeddings and Exposures: A Multidimensional Data Framework

    • Graph Embeddings

    • Exposures: Quantifying the importance of connections in the Knowledge Graph

  • The case of Coronavirus outbreak 

    • Analysis of exposures

    • Explaining the Connections and Exposures

  • Conclusions

    • Yewno|Edge

    • Yewno Data Feeds

    • References

Introduction

Yewno’s mission is to extract and organize valuable insights from an overwhelming quantity of unstructured data. We are building the next-generation Knowledge Graph, which helps people overcome information overload and research and understand the world in a more natural manner. In contrast to classical information-retrieval engines, which are rooted in theoretical computer science, our approach is inspired by the way humans process information from multiple sensory channels. It leverages state-of-the-art Computational Linguistics, Network Theory and Machine Learning, as well as methods from traditional Artificial Intelligence.

At the core of our technology is a framework that extracts, processes, links and represents atomic units of knowledge, called concepts, from heterogeneous data sources. A Deep Learning Network continuously “reads” high-quality sources and projects concepts into a multi-layered and multi-dimensional Embedding Space, where similarity measures are used to group related concepts together along different dimensions (semantic and syntactic, to name just two).

Yewno Knowledge Graph

Thanks to these techniques, a graph-like network is induced, and advanced tools from the field of complex networks are used to extract insights and detect emergent properties. Different sources of data (e.g. news, patents, stock prices) and different similarity metrics (e.g. semantic, syntactic, factual) yield different networks. These networks are collectively called the Yewno Knowledge Graph (YKG). Additionally, Yewno’s approach explicitly addresses the evolution of the Knowledge Graph over time and extracts insights by analyzing not only the nature of the interconnected concepts and communities, but also their temporal dynamics. Figure 1 shows a snapshot of our Knowledge Graph at a particular point in time.

Figure 1: A snapshot of the Yewno Knowledge Graph. Note: only a limited number of connections per concept are shown.

Content Universe

As mentioned earlier, the layers of the Yewno Knowledge Graph are induced from a plethora of sources, listed below. Unless otherwise specified, we provide a five-year history of the related data:

  • News
  • Patents
  • Official Filings
  • Transcripts
  • Clinical Trials
  • Judicial Documents
  • Scientific Publications
  • (Intraday) Stock Prices

Each source above undergoes a number of processing steps, implemented within the Yewno Intelligent Engine, that culminate in the creation and continual update of one or more layers of the Knowledge Graph. Different sources have intrinsically different update patterns: news is updated in real time, official filings are updated quarterly, and most other networks are updated daily.

An Evolving Data Framework

Yewno’s Knowledge Graph is a powerful and dynamic representation of knowledge across a vast corpus of documents, and evolves in real time.  It is interpreted by an Inference Engine which detects and explains the relationships and changes in connections over time.

Connections among the represented concepts can be traced back to the original source document, down to the individual sentence and its ontological class. This methodology allows new information to be analyzed on an intraday basis, providing anomaly detection as well as novelty and relevance scores.

Graph Embeddings And Exposures: A Multidimensional Data Framework

The dynamic nature of the Yewno Knowledge Graph enables the use of advanced machine learning algorithms to extract hidden insights that emerge as a consequence of network effects. Networks such as supply chain networks, social networks, word co-occurrence networks and communication networks occur naturally in many real-world applications. Analyzing them yields insight into financial dependencies, the structure of society, language and patterns of communication, to mention a few. Many approaches have been proposed to perform such analysis. Effective graph analytics gives users a deeper understanding of what is behind the data, and can thus benefit many real-world applications such as node classification, node recommendation and link prediction. However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective and efficient way to address this problem [1]: it converts the graph data into a low-dimensional space in which the graph’s structural information and topological properties are maximally preserved.

Graph Embeddings

We use Deep Neural Network embedding and DeepWalk techniques [2-3], which are unsupervised learning methods for building latent topological representations of graph vertices. Topological representations are hidden features of the vertices that capture neighbourhood similarity and community membership. These hidden representations encode topological relations (the weights assigned to the edges) in a continuous vector space with a relatively small number of dimensions. A graph embedding method takes a graph as input and produces a hidden representation as output. The method is online and scalable: it uses local information obtained from truncated random walks to learn hidden representations, treating the walks as an unsupervised training set. It is also suitable for a broad class of real-world applications, such as network classification and anomaly detection.

Figure 2: Embedding a graph into a lower dimensional metric space
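
To make the walk-based idea concrete, the following is a minimal Python sketch of the first stage of a DeepWalk-style pipeline [2]: generating truncated random walks and turning them into (centre, context) pairs that a skip-gram model would then train on. It is an illustration under stated assumptions, not Yewno’s implementation: the toy graph, the function names and the parameters (`walks_per_node`, `walk_length`, `window`) are hypothetical, the walks are uniform (edge weights are ignored), and the embedding training step itself is omitted.

```python
import random


def truncated_random_walks(adjacency, walks_per_node, walk_length, seed=0):
    """Generate fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit the start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: the walk is cut short
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def context_pairs(walks, window):
    """Yield (centre, context) pairs: the examples a skip-gram model consumes."""
    for walk in walks:
        for i, centre in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield centre, walk[j]


# Hypothetical toy concept graph, undirected, stored as adjacency lists.
toy_graph = {
    "Coronavirus": ["Vaccine", "Airline"],
    "Vaccine": ["Coronavirus", "Biotech"],
    "Biotech": ["Vaccine"],
    "Airline": ["Coronavirus"],
}

walks = truncated_random_walks(toy_graph, walks_per_node=2, walk_length=5)
pairs = list(context_pairs(walks, window=2))
```

Nodes that often appear in each other’s walk windows end up with similar vectors once a skip-gram model is trained on these pairs, which is how neighbourhood similarity and community membership become geometric closeness in the embedded space.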

Exposures: Quantifying the importance of connections in the Knowledge Graph

While a visual inspection of networks can reveal interesting connectivity patterns, finding suitable algorithms that can extract hidden insights is still an active research problem. Here we propose Exposures as measures of proximity between any two nodes, or groups of nodes, within the network or its embedded counterpart. Exposure computation takes into account not only first-order relationships from the graph but also higher-order relationships, allowing hidden patterns and emergent properties to surface, a task that would otherwise be impossible without the support of a Knowledge Graph.

Concept exposures are defined as a similarity metric that takes higher-order relationships into account. The metric is a linear combination of multiple components derived from the Knowledge Graph. Each component is a standalone metric in its own right, and the components vary in complexity. A total of four factors are computed that measure the exposure of any entity to a set of target concepts. We focus our analysis here on entity-to-concept exposures, but the method is generally applicable to the exposure between any pair of concepts.

  • Importance Scores are based on the number of co-occurrences between the entity and the target concepts, or on the documents published by the entity that mention the target concepts. Two scores are provided:
    • Contribution Score measures how often each entity is mentioned with, or publishes documents related to, the target concept, relative to all the companies in the asset universe.
    • Pure Play Score measures the percentage of each entity’s mentions or document publications that relate to the target concept, relative to its mentions or publications across other concepts.
  • Centrality Score is based on centrality diffusion (local PageRank) over the network constructed from mentions/publications between concepts. It incorporates second-order connections that favor central nodes connected to other central nodes.
  • Similarity Score is based on how close the company and concept projections are in the embedded space.
  • The Aggregated Score is calculated as a weighted linear combination of the previous scores, normalized by the maximum.
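
The Centrality Score above is described only as “centrality diffusion (local PageRank)”. One common way to localize PageRank is to send all restart probability back to a single node (personalized PageRank); the Python sketch below assumes that reading. The function name, the damping factor, the iteration count and the toy edge list are all hypothetical and are not taken from the Yewno implementation.

```python
def personalized_pagerank(edges, source, damping=0.85, iterations=50):
    """Power iteration for PageRank with all restart mass placed on `source`.

    `edges` is a list of (node_a, node_b, weight) tuples, treated as undirected.
    """
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    out = {n: {} for n in nodes}
    for a, b, w in edges:
        out[a][b] = out[a].get(b, 0.0) + w
        out[b][a] = out[b].get(a, 0.0) + w

    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        # Restart step: a (1 - damping) share of the mass returns to the source.
        nxt = {n: (1.0 - damping) if n == source else 0.0 for n in nodes}
        for u in nodes:
            total = sum(out[u].values())
            if total == 0:
                nxt[source] += damping * rank[u]  # isolated node: send mass home
                continue
            for v, w in out[u].items():
                # Diffusion step: spread mass in proportion to edge weight.
                nxt[v] += damping * rank[u] * w / total
        rank = nxt
    return rank


# Hypothetical mention network; weights stand in for co-occurrence counts.
toy_edges = [
    ("Coronavirus", "CompanyA", 3.0),
    ("Coronavirus", "Vaccine", 2.0),
    ("Vaccine", "CompanyB", 2.0),
]
scores = personalized_pagerank(toy_edges, source="Coronavirus")
```

In this toy network CompanyB never co-occurs with Coronavirus directly, yet it receives a non-zero score through its link to Vaccine. That is the kind of second-order connection the Centrality Score is meant to pick up.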

Combining these components lets us derive hidden relationships between concepts. Feeding different weights into the method’s objective function changes the exposure values between the target concepts and the source concept, and therefore the order of the concepts in the result. The method is thus highly flexible: it can be tuned to capture local connections rather than hidden ones, and vice versa. The next section shows how such metrics can help uncover hidden connections in a real-world scenario, the 2020 Wuhan Coronavirus outbreak.
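
The effect of the weights can be illustrated with a small Python sketch. It computes the two Importance Scores from hypothetical mention counts and combines them with two different weightings. Several assumptions are made here that the text does not settle: each component is divided by its maximum over entities before the weighted sum, the Pure Play denominator is all of an entity’s mentions, and only two of the four components are used for brevity (Centrality and Similarity scores would enter as further dictionary entries). All names and numbers are illustrative.

```python
def contribution_scores(mentions, target):
    """Share of all target-concept mentions in the universe held by each entity."""
    total = sum(counts.get(target, 0) for counts in mentions.values())
    return {e: (c.get(target, 0) / total if total else 0.0) for e, c in mentions.items()}


def pure_play_scores(mentions, target):
    """Share of each entity's own mentions that concern the target concept."""
    scores = {}
    for entity, counts in mentions.items():
        own = sum(counts.values())
        scores[entity] = counts.get(target, 0) / own if own else 0.0
    return scores


def normalise_by_max(scores):
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top else 0.0) for k, v in scores.items()}


def aggregated_scores(components, weights):
    """Weighted linear combination of max-normalised component scores."""
    normed = {name: normalise_by_max(s) for name, s in components.items()}
    entities = next(iter(components.values())).keys()
    return {e: sum(weights[n] * normed[n][e] for n in components) for e in entities}


def ranking(scores):
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical mention counts per entity and concept.
mentions = {
    "BiotechCo": {"Coronavirus": 8, "Oncology": 2},
    "PharmaGiant": {"Coronavirus": 12, "Oncology": 88},
}
contribution = contribution_scores(mentions, "Coronavirus")
pure_play = pure_play_scores(mentions, "Coronavirus")
components = {"contribution": contribution, "pure_play": pure_play}

volume_led = aggregated_scores(components, {"contribution": 0.8, "pure_play": 0.2})
focus_led = aggregated_scores(components, {"contribution": 0.2, "pure_play": 0.8})
```

With these numbers, `ranking(volume_led)` places PharmaGiant first because it accounts for most of the mentions, while `ranking(focus_led)` places BiotechCo first because a larger share of its coverage concerns the target concept.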

The Case Of Coronavirus Outbreak

The newest Coronavirus, which originated in Wuhan, China, continues to spread as more countries report victims of the infection. Despite China’s best efforts to lock down 16 cities comprising 50 million people, cases of the disease continue to compound. Fear of a global pandemic is rising as the death toll continues to mount.

With no halt to the spread in sight, economic impacts are starting to be felt across the globe.  Companies with exposure to the global travel industry in particular are experiencing disruption. For example, airlines such as British Airways have suspended China flights and many airlines globally are following suit, citing a lack of demand for flights to mainland China and even Hong Kong. Ripple effects throughout global supply chains and logistics are starting to present themselves more significantly.

On Sunday, February 2nd, 2020, Chinese stocks took a hit and lost over 7%. Economists believe this virus could paralyze the Chinese economy.

It is now imperative for investors to be on top of this evolving global threat. The effect on the Chinese economy will also have an impact on global stocks in the weeks or months to come. It is key to be able to understand the potential impact in your investment universe. This is certainly a challenge as the information landscape – from news coverage to market data – shifts, compounds and accumulates each instant.

Analysis of exposures

Yewno’s Knowledge Graph autonomously performed an exposure analysis of the companies most influenced by newly emerging concepts, in this case the Wuhan Coronavirus outbreak, as measured by the aggregated score described above. The following table lists the Top 10 companies exposed to the Coronavirus, ordered by the average of their daily aggregated exposure scores in January 2020:

Ranking  Company                    Average Exposure
1        Inovio Biomedical Corp.    0.105160
2        Novavax                    0.098364
3        Gilead Sciences            0.075550
4        BioCryst Pharmaceuticals   0.072604
5        AbbVie Inc.                0.040073
6        China Southern Airlines    0.038125
7        Hoffmann-La Roche          0.036747
8        Johnson & Johnson          0.034025
9        Merck & Co.                0.032821
10       Meridian Bioscience        0.031959

Table 1: The ten companies most exposed to Coronavirus, ordered by average daily exposure
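
The ordering used in Table 1 (average of daily aggregated exposure scores over a period) can be reproduced from a daily series with a few lines of Python. The sketch below uses hypothetical company names and made-up daily values purely to show the computation; it does not reproduce the figures in the table.

```python
from statistics import mean


def rank_by_average_exposure(daily_scores, top_n=10):
    """Return the top_n (company, average) pairs, highest average first."""
    averages = {
        company: mean(series)
        for company, series in daily_scores.items()
        if series  # skip companies with no observations in the period
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical daily aggregated exposure scores for three companies.
daily = {
    "CompanyA": [0.02, 0.10, 0.18],
    "CompanyB": [0.05, 0.05, 0.05],
    "CompanyC": [0.00, 0.00, 0.03],
}
top_two = rank_by_average_exposure(daily, top_n=2)
```

Averaging over the month smooths out single-day spikes, so a company with steady coverage can rank above one with a brief burst of mentions.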

In addition, we can follow the temporal evolution of daily exposures in January 2020 for the Top 10 companies listed above. The figure below shows how Yewno technology was able to recognize the influence that the virus outbreak had on these companies:

Figure 3: Evolution of exposures over time

Explaining the Connections and Exposures

One of the key features of Yewno’s Knowledge Graph is its ability to point back to the fragments of information that caused certain connections or exposures to appear. This makes it possible to deliver the data either as a data feed, a fragment of which is shown below for specific days in January 2020, or through Yewno|Edge, an AI-based investment research platform.

2020-01-21

• concepts: (Coronavirus, Inovio Biomedical Corp.)

event/class: health treatment, sentiment: 0.2292

Source snippet extraction: Inovio Pharmaceuticals Inc, which is working on an anti-Middle East respiratory syndrome coronavirus DNA vaccine, and BioCryst Pharmaceuticals, Inc., which is developing galidesivir, primarily seen as a yellow fever and Marburg virus drug, but which has demonstrated in vitro success against other viruses, including coronaviruses, also saw their stocks move higher.

2020-01-22

• concepts: (Coronavirus, Inovio Biomedical Corp.),

event/class: pharmacology, sentiment: 0.1917

Source snippet extraction: Other companies that develop vaccines, including Aethlon Medical, Inovio Pharmaceuticals and BioCryst Pharmaceuticals, also saw their shares rise as the coronavirus spread.

• concepts: (Coronavirus, Novavax),

event/class: medical research, sentiment: 0.2315

Source snippet extraction: Novavax has initiated development of a vaccine candidate in response to the emergence of the Wuhan-version of the coronavirus seen recently in China.

• concepts: (Coronavirus, Novavax),

event/class: scientific research, sentiment: 0.3500

Source snippet extraction: Novavax touted its experience developing vaccine candidates for coronaviruses, including vaccines to protect against Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS).

2020-01-23

• concepts: (Coronavirus, Inovio Biomedical Corp.),

event/class: pharmacology, sentiment: 0.3315

Source snippet extraction: Inovio Pharmaceuticals Inc is developing a vaccine to treat the MERS strain of coronavirus.

• concepts: (Coronavirus, Novavax),

event/class: scientific research, sentiment: -0.0500

Source snippet extraction: Several biotech firms have said they are developing a vaccine to the Wuhan coronavirus, including Moderna, Novavax and Inovio.

• concepts: (Coronavirus, Gilead Sciences),

event/class: health and safety at work, sentiment: -0.4159

Source snippet extraction: Gilead Sciences Inc said on Thursday it was assessing whether its experimental Ebola treatment could be used against the new coronavirus. The new coronavirus has sickened hundreds of people in China.

• concepts: (Coronavirus, Gilead Sciences),

event/class: health treatment, sentiment: 0.2300

Source snippet extraction: Gilead Sciences Inc said it was assessing whether its experimental Ebola treatment could be used to treat coronavirus infection.

• concepts: (Coronavirus, BioCryst Pharmaceuticals),

event/class: health treatment, sentiment: 0.0000

Source snippet extraction: BioCryst Pharmaceuticals is in a Phase 1 study for a range of viruses, including coronaviruses.

2020-01-24

• concepts: (Coronavirus, Novavax),

event/class: scientific research, sentiment: 0.3450

Source snippet extraction: Novavax says it is now working on one for the Wuhan coronavirus. Novavax has a vaccine in development against MERS.

2020-01-26

• concepts: (Coronavirus, AbbVie Inc.),

event/class: pharmacology, sentiment: 0.1364

Source snippet extraction: China is testing an HIV drug as a treatment for symptoms of the new coronavirus that is rapidly spreading, said drugmaker AbbVie Inc on Sunday.

2020-01-27

• concepts: (Coronavirus, Inovio Biomedical Corp.),

event/class: scientific research, sentiment: 0.2250

Source snippet extraction: Inovio Pharmaceuticals, Inc. recently announced the Coalition for Epidemic Preparedness Innovations (CEPI) has awarded Inovio a grant of up to $9 million to develop a vaccine against the recently emerged strain of coronavirus (2019-nCoV).

• concepts: (Coronavirus, Inovio Biomedical Corp.),

event/class: scientific research, sentiment: 0.0667

Source snippet extraction: A few days later, Inovio Pharmaceuticals disclosed that it, too, was working on an experimental coronavirus vaccine.

• concepts: (Coronavirus, Johnson & Johnson),

event/class: scientific research, sentiment: 0.2000

Source snippet extraction: Pharmaceutical giant Johnson & Johnson is working to develop a vaccine for the emergent coronavirus identified last month in Wuhan, China, launching initial efforts to construct from the virus’ genetic sequence a candidate that could be tested in humans.

• concepts: (Coronavirus, Johnson & Johnson),

event/class: scientific research, sentiment: 0.0333

Source snippet extraction: Dr Paul Stoffels, Johnson & Johnson’s chief scientific officer, said the drug-maker can develop a vaccine in a few months to fight the coronavirus but that it could take as much as a year to bring it to market.

2020-01-28

• concepts: (Coronavirus, Inovio Biomedical Corp.),

event/class: scientific research, sentiment: 0.4333

Source snippet extraction: Shares of clinical-stage DNA vaccine maker Inovio Pharmaceuticals Inc. (INO) have gained more than 60 percent over the last 5 trading days as the deadly Chinese coronavirus outbreak continues to make headlines in the news.

2020-01-29

• concepts: (Coronavirus, China Southern Airlines),

event/class: air transport, sentiment: 0.0000

Source snippet extraction: Passengers arriving on a China Southern Airlines flight from Changsha are screened for coronavirus.

• concepts: (Coronavirus, Johnson & Johnson),

event/class: health and safety at work, sentiment: 0.1364

Source snippet extraction: Johnson & Johnson today announced that it is mobilizing resources at its Janssen Pharmaceutical Companies to launch a multi-pronged response to the novel coronavirus (also known as 2019-nCoV or Wuhan coronavirus) outbreak.

• concepts: (Coronavirus, Johnson & Johnson),

event/class: medical service, sentiment: 0.1833

Source snippet extraction: Johnson & Johnson’s multi-pronged approach includes a review of known pathways in coronavirus pathophysiology to determine whether previously tested medicines can be used to help patients survive a 2019-nCoV infection and reduce the severity of disease in non-lethal cases.

2020-01-30

• concepts: (Coronavirus, Inovio Biomedical Corp.),

event/class: scientific research, sentiment: 0.6333

Source snippet extraction: Dr Kate Broderick, from Inovio Pharmaceuticals in Pennsylvania thinks her team is the closest to developing a vaccine to protect against the Wuhan coronavirus.

• concepts: (Coronavirus, Inovio Biomedical Corp.),

event/class: health treatment, sentiment: 0.3000

Source snippet extraction: Early in the week, the Wall Street Journal reported that shares of companies including Inovio Pharmaceuticals, Moderna, and Novavax were surging as the coronavirus spread further.

• concepts: (Coronavirus, Gilead Sciences),

event/class: medical research, sentiment: -0.4500

Source snippet extraction: Gilead Sciences is reviewing its experimental Ebola drug for potential use against the coronavirus.

• concepts: (Coronavirus, Hoffmann-La Roche),

event/class: international relations, sentiment: -0.0200

Source snippet extraction: Roche does not anticipate rising business from its coronavirus tests.

• concepts: (Coronavirus, Hoffmann-La Roche),

event/class: medical service, sentiment: 0.4932

Source snippet extraction: Swiss drugmaker Roche Holding (RHHBY) said it will introduce the first commercial test for the new coronavirus. A development could speed up testing of patients.

• concepts: (Coronavirus, Meridian Bioscience),

event/class: livestock farming, sentiment: 0.0000

Source snippet extraction: Meridian Bioscience announced that its freeze-dried Lyo-Ready 1-Step RT q-PCR mix is being used in the coronavirus outbreak in China.

• concepts: (Coronavirus, Johnson & Johnson),

event/class: scientific research, sentiment: 0.1000

Source snippet extraction: ‘Skunkworks’ at Johnson & Johnson is rushing to develop a coronavirus vaccine by Sam Wood.

2020-01-31

• concepts: (Coronavirus, Gilead Sciences),

event/class: pharmacology, sentiment: 0.0833

Source snippet extraction: Gilead Sciences Inc (GILD.O) said on Friday it provided its experimental Ebola therapy for use in a small number of patients with the Wuhan coronavirus and is working with China’s health authorities to set up a trial to test the drug against the virus.

• concepts: (Coronavirus, Gilead Sciences),

event/class: health treatment, sentiment: -0.0625

Source snippet extraction: Gilead Sciences Inc (GILD.O) said on Friday it provided its experimental Ebola therapy for use in a small number of patients with the coronavirus that has killed over 200 so far in China and is working with the country’s authorities to set up a study.

• concepts: (Coronavirus, Johnson & Johnson),

event/class: scientific research, sentiment: 0.1000

Source snippet extraction: After Father Quarantined for Coronavirus Projects to develop a vaccine, which are part of that process, have launched across the world at research institutions in the U.S., U.K., China, Belgium, Germany, Russia and Australia, as well as biotech companies such as Inovio, Moderna and the pharmaceutical giant Johnson and Johnson.
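
Records like those in the fragment above lend themselves to simple programmatic aggregation. The Python sketch below loads five of the records shown (dates, companies, event classes and sentiment values are copied from the fragment) and computes the mean sentiment per company per day. The `FeedRecord` type and its field names are hypothetical; the actual schema of the Yewno data feed is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FeedRecord:
    """One row of the feed; the field names are illustrative, not the real schema."""
    date: str
    concept: str
    entity: str
    event_class: str
    sentiment: float


records = [
    FeedRecord("2020-01-22", "Coronavirus", "Novavax", "medical research", 0.2315),
    FeedRecord("2020-01-22", "Coronavirus", "Novavax", "scientific research", 0.3500),
    FeedRecord("2020-01-23", "Coronavirus", "Novavax", "scientific research", -0.0500),
    FeedRecord("2020-01-23", "Coronavirus", "Gilead Sciences", "health and safety at work", -0.4159),
    FeedRecord("2020-01-23", "Coronavirus", "Gilead Sciences", "health treatment", 0.2300),
]


def daily_mean_sentiment(rows):
    """Average the sentiment of all records sharing a (date, entity) pair."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row.date, row.entity)].append(row.sentiment)
    return {key: mean(values) for key, values in grouped.items()}


daily_sentiment = daily_mean_sentiment(records)
```

For example, the two Novavax records on 2020-01-22 average to 0.29075, while the two Gilead Sciences records on 2020-01-23 partly cancel out and average to about -0.093.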

Conclusions

Identifying companies exposed to a trending concept (or event) makes it possible to analyze further the effects these exposures could have on the financial system; such exposure measures yield a negative or positive factor model for the detected companies and their stocks. The same analysis can also be performed using bulk data feeds that provide all the factors in a format suitable for machine analysis and interpretation.

Yewno|Edge

We want to show the breadth and depth of data that an analyst can access through our Yewno|Edge platform in the Coronavirus case study. Figure 4 gives an overall view of the portfolio of companies most exposed to Coronavirus.

Figure 4: The most exposed companies to the concept Coronavirus

We can also check which other concepts or companies our portfolio is exposed to, as shown in Figure 5.

Figure 5: Portfolio exposures

Yewno|Edge can also help us compare exposure values with stock prices and sentiment for a specific company and/or concept. Figure 6 shows the evolution of the exposure between Inovio, one of the most exposed companies, and Coronavirus. We can observe an increasing trend in exposure as well as positive sentiment values, likely due to the announcement of successful trials against the virus. Yewno detected Coronavirus as a top exposure with positive sentiment from January 21 onward!

Figure 6: Exposures and Sentiment (Inovio vs. Coronavirus)

In addition, Yewno|Edge can provide evidence for such high exposure through our Key Development module, shown in Figure 7.

Figure 7: Key Developments between Inovio and Coronavirus

Yewno Data Feeds

A similar analysis can be performed programmatically with access to the Yewno Concept Exposures and Sentiment Data Feeds. Figure 8 shows an analysis in which exposures and sentiment act as predictive signals for the price of the exposed stock, in this case Inovio Pharmaceuticals, Inc. An increase in exposure and a change in sentiment were detected days before the stock actually reacted to these events, demonstrating the power of the patented Yewno Knowledge Graph technology [4-5].

Figure 8: Exposures and Sentiment (computed with respect to Coronavirus) and open prices of Inovio Pharmaceuticals, Inc.
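
One simple way to check whether a signal leads a price series is to correlate the signal at day t with the price at day t + lag and see which lag fits best. The Python sketch below does this with a hand-written Pearson correlation. The two series are synthetic and constructed so that the second is the first delayed by two days; they are not Yewno or market data, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def lagged_correlation(signal, target, lag):
    """Correlate signal[t] with target[t + lag]; lag >= 0 means the signal leads."""
    if len(signal) != len(target):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(signal) - 1:
        raise ValueError("lag out of range")
    return pearson(signal[: len(signal) - lag], target[lag:])


def best_lead(signal, target, max_lag):
    """Lag (in days) at which the signal correlates most strongly with the target."""
    return max(range(max_lag + 1), key=lambda lag: lagged_correlation(signal, target, lag))


# Synthetic series: the price move repeats the exposure pattern two days later.
exposure = [0, 0, 1, 3, 2, 1, 0, 0, 0, 0]
price_move = [0, 0, 0, 0, 1, 3, 2, 1, 0, 0]

lead_days = best_lead(exposure, price_move, max_lag=4)
corr_at_lead = lagged_correlation(exposure, price_move, lead_days)
```

On these synthetic series the best lag is two days, with a correlation of 1 at that lag. On real data, a short window like the one in Figure 8 would give only a rough indication, and a longer history would be needed to treat the lead as stable.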

References

[1] Palash Goyal, Emilio Ferrara. “Graph Embedding Techniques, Applications, and Performance: A Survey”. Knowledge-Based Systems, vol. 151, pp. 78-94, 2018, doi: https://doi.org/10.1016/j.knosys.2018.03.022

[2] Bryan Perozzi, Rami Al-Rfou, Steven Skiena. “DeepWalk: Online Learning of Social Representations”. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, ACM, pp. 701–710,  New York, New York, USA, 2014, doi: https://doi.org/10.1145/2623330.2623732

[3] Aditya Grover, Jure Leskovec. “node2vec: Scalable Feature Learning for Networks”. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, 2016

[4] Ruggero Gramatica. “Information network with linked information nodes”. US10025862B2

[5] Ruggero Gramatica, Haris Dindo. “Structuring data in a knowledge graph”. US10528871B1