by Bernardo Bravo, Senior Data Scientist, Yewno,
and Ali Limon, Senior Data Scientist, Yewno
- About Yewno
- Yewno Knowledge Graph
- Content Universe
- An Evolving Data Framework
Graph Embeddings And Exposures: A Multidimensional Data Framework
- Graph Embeddings
- Sentiment: Quantifying positivity/negativity of connections in the Knowledge Graph
- Sentiment: Trend Discovery and Prediction
Sentiment to ‘Covid-19’ Concept
- Sentiment Data Retrieval Using Yewno’s API
- Sentiment of Companies Over Time
- Finding the Most Positive and Negative Companies
- Case Studies: Sentiment vs. Stock Performance
- Additional Case Studies: Sentiment vs. Stock Performance
Conclusion on ‘Covid-19’ Sentiment
Sentiment data is particularly useful for examining trends over time. Broadly speaking, sentiment data becomes valuable if it allows us to identify trends as they develop early on, gauge the evolution of trends over time, and use trailing sentiment to understand longer-term stock price movements. This whitepaper demonstrates how Yewno’s sentiment data feed provides all the data necessary to uncover important trends and inform business outlooks.
For this paper we worked through a variety of examples and case studies showing how company sentiment in the context of Covid-19 served as a predictor of trends over time. We chose Covid-19 for two reasons. First, it was a largely unknown and unpredictable trend. Second, it proved economically pervasive across industries and geographies, giving us ample data to test and backtest whether the sentiment data was predictive overall. For pharmaceutical and travel companies, we found particularly high correlations between sentiment and stock price. For example, we saw the stock prices of pharmaceutical companies highly exposed to Covid-19 sentiment rise well before the approval of any vaccines.
Overall, sentiment data provides a more comprehensive pulse of key developments in the world, allowing investors and risk managers to note hidden trends as they begin to develop and position their investments accordingly. For the travel and tourism industry, whose business fundamentals are highly tied to ‘Covid-19’, we saw that sentiment was often a lagging indicator of directional stock price movements.
Sentiment analysis can be extrapolated to new concepts, enabling investors to better understand business drivers, performance, and underlying risk. In this paper we explain Yewno’s technology, highlight the underlying data, and demonstrate how our sentiment scores are calculated. We then apply sentiment analysis to the concept of Covid-19, highlighting which companies and industries were most affected, and how exposure to new concepts and trends can be predictive in the future.
About Yewno
Yewno’s mission is to extract and organize valuable insights from an overwhelming quantity of unstructured data. We are building the next-generation Knowledge Graph, which helps people overcome information overload and research and understand the world in a more natural manner. In contrast to classical information-retrieval engines, which are rooted in theoretical computer science, our approach is inspired by the way humans process information from multiple sensory channels. It leverages state-of-the-art Computational Linguistics, Network Theory, and Machine Learning, as well as methods from traditional Artificial Intelligence.
At the core of our technology is a framework that extracts, processes, links and represents atomic units of knowledge, called concepts, from heterogeneous data sources. A Deep Learning Network continuously “reads” high-quality sources and projects concepts into a multi-layered, multi-dimensional Embedding Space, where similarity measures are used to group related concepts along different dimensions (semantic and syntactic, to name a few).
Yewno Knowledge Graph
Thanks to these techniques, a graph-like network is induced, and advanced tools from the field of complex networks are used to extract insights and detect emergent properties. Different sources of data (e.g. news, patents, stock prices) and different similarity metrics (e.g. the semantic, syntactic and factual measures mentioned above) yield different networks. These networks are collectively called the Yewno Knowledge Graph (YKG). Additionally, Yewno’s approach explicitly addresses the evolution of the Knowledge Graph over time and extracts insights by analyzing not only the nature of the interconnected concepts and communities, but also their temporal dynamics. Figure 1 shows a snapshot of our Knowledge Graph at a particular point in time.
Figure 1: A snapshot of the Yewno Knowledge Graph. Note: only a limited number of connections per concept are shown.
Content Universe
As mentioned earlier, the layers of the Yewno Knowledge Graph are induced from a plethora of sources, listed below. Unless otherwise specified, we provide five years of history for the related data:
- Official Filings
- Clinical Trials
- Judicial Documents
- Scientific Publications
- (Intraday) Stock Prices
Each source above undergoes a number of processing steps implemented within the Yewno Intelligent Engine, which culminate in the creation and continual update of one or more layers of the Knowledge Graph. It is worth noting that different sources update the Knowledge Graph at intrinsically different rates, from news, which is updated in real time, to official filings, which are updated quarterly. Most other networks are updated daily.
An Evolving Data Framework
Yewno’s Knowledge Graph is a powerful and dynamic representation of knowledge across a vast corpus of documents, and evolves in real time. It is interpreted by an Inference Engine which detects and explains the relationships and changes in connections over time.
Connections amongst represented concepts can be traced back to the original source document, down to the sentence and their ontological class. This methodology allows for the analysis of new information on an intraday basis, providing anomaly detection, novelty and relevance scores.
Graph Embeddings And Exposures: A Multidimensional Data Framework
The dynamic nature of the Yewno Knowledge Graph enables the use of advanced machine learning algorithms to extract hidden insights that emerge as a consequence of network effects. Networks, such as supply chain networks, social networks, word co-occurrence networks, and communication networks, occur naturally in various real-world applications. Analyzing them yields insight into financial dependencies, the structure of society, language, and different patterns of communication, to mention a few. Many approaches have been proposed to perform such an analysis. Effective graph analytics give users a deeper understanding of what is behind the data, and thus can benefit a plethora of real-world applications such as node classification, node recommendation, and link prediction. However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective and efficient way to solve the graph analytics problem. It converts the graph data into a low-dimensional space in which the graph’s structural information and topological properties are maximally preserved.
We use deep neural network embedding and DeepWalk techniques [2-3], which are unsupervised learning methods for building latent topological representations of graph vertices. Topological representations are hidden features of the vertices that capture neighbourhood similarity and community membership. The hidden representations encode topological relations (weights assigned to the edges) in a continuous vector space with a relatively small number of dimensions. The embedding algorithm takes a graph as input and produces a hidden representation as output. It is both online and scalable: it uses local information obtained from truncated random walks to learn hidden representations, treating the walks as an unsupervised training set. It is also suitable for a broad class of real-world applications, such as network classification and anomaly detection.
Figure 2: Embedding a graph into a lower dimensional metric space
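To make the random-walk step concrete, the following is a minimal sketch of how truncated random walks might be generated over a small weighted concept graph. It is not Yewno’s implementation: the toy graph, the concept names, the function names and the choice to sample neighbours in proportion to edge weight are all assumptions made for illustration (the original DeepWalk algorithm samples neighbours uniformly). In a full pipeline the resulting walks would be passed to a skip-gram style model to learn the vertex vectors; that step is omitted here.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    adjacency = defaultdict(dict)
    for u, v, weight in edges:
        adjacency[u][v] = weight
        adjacency[v][u] = weight
    return adjacency


def truncated_random_walk(adjacency, start, walk_length, rng):
    """Walk from `start`, picking each next vertex with probability
    proportional to the edge weight, until the walk has `walk_length` vertices."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:
            break  # isolated vertex: nothing to walk to
        nodes = list(neighbours)
        weights = [neighbours[node] for node in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def generate_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Start `walks_per_vertex` walks from every vertex, in shuffled order."""
    rng = random.Random(seed)
    vertices = list(adjacency)
    walks = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)
        for vertex in vertices:
            walks.append(truncated_random_walk(adjacency, vertex, walk_length, rng))
    return walks


# Hypothetical toy graph; names and weights are invented for illustration.
edges = [
    ("Covid-19", "Vaccine", 0.9),
    ("Vaccine", "PharmaCo", 0.8),
    ("Covid-19", "AirlineCo", 0.6),
    ("AirlineCo", "TravelAgentCo", 0.7),
]
adjacency = build_adjacency(edges)
walks = generate_walks(adjacency, walks_per_vertex=2, walk_length=5)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Each walk plays the role of a “sentence” of vertices, which is why the walks can serve as an unsupervised training set.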
Sentiment: Quantifying positivity/negativity of connections in the Knowledge Graph
Yewno sentiment combines a custom Deep Learning-based model with the Yewno Knowledge Graph to deliver concept-to-concept sentiment, where a concept can also be a company. Sentiment scores are constructed by propagating the average concept sentiment score from each concept to the other concepts to which it is exposed.
Raw (level 1) sentiment is calculated on text snippets related to concepts of interest, taking into account the sentence structure and financial domain-specific text.
Concept(company)-to-concept (level 2) sentiment measures the polarity of how directly and indirectly connected concepts impact each other. Two versions of the score are provided:
- Average Sentiment is the average of all sentiment scores of text snippets that mention both concepts over the reference window. This measure best captures the true sentiment between two concepts, but only exists when they are directly connected.
- Exposure Sentiment leverages Yewno Concept Exposures to propagate the sentiment of a concept to any other concept by weighing by the exposure score. This measures how concepts can indirectly impact each other.
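The two level 2 scores can be illustrated with a small sketch. The exact propagation formula is not given in this paper, so the exposure-weighted mean below is only one plausible reading of “weighing by the exposure score”; the data layout, function names, company names and all numbers are hypothetical.

```python
from statistics import mean


def average_sentiment(snippets, concept_a, concept_b):
    """Mean raw sentiment over snippets that mention both concepts.
    Returns None when the two concepts are never mentioned together."""
    scores = [
        snippet["score"]
        for snippet in snippets
        if concept_a in snippet["concepts"] and concept_b in snippet["concepts"]
    ]
    return mean(scores) if scores else None


def exposure_sentiment(snippets, exposures, source, target):
    """ASSUMED formula: exposure-weighted mean of the average sentiment
    between `target` and each concept that `source` is exposed to."""
    weighted_sum = 0.0
    total_weight = 0.0
    for intermediate, exposure in exposures.get(source, {}).items():
        score = average_sentiment(snippets, intermediate, target)
        if score is None:
            continue  # no direct evidence through this intermediate concept
        weighted_sum += exposure * score
        total_weight += exposure
    return weighted_sum / total_weight if total_weight else None


# Invented raw (level 1) snippet scores and exposure scores.
snippets = [
    {"concepts": {"AirlineCo", "Covid-19"}, "score": -0.6},
    {"concepts": {"AirlineCo", "Covid-19"}, "score": -0.2},
    {"concepts": {"PharmaCo", "Covid-19"}, "score": 0.5},
]
exposures = {"TravelAgentCo": {"AirlineCo": 0.75, "PharmaCo": 0.25}}

direct = average_sentiment(snippets, "AirlineCo", "Covid-19")
missing = average_sentiment(snippets, "TravelAgentCo", "Covid-19")
indirect = exposure_sentiment(snippets, exposures, "TravelAgentCo", "Covid-19")
print(direct, missing, indirect)
```

In this toy example the hypothetical TravelAgentCo is never mentioned alongside ‘Covid-19’, so it has no Average Sentiment, but it still receives an Exposure Sentiment through the companies it is exposed to.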
Sentiment: Trend Discovery and Prediction
Sentiment data is particularly useful for examining trends over time. For instance, tracking the sentiment of a company to a concept like ‘sustainability’ may be important in understanding the company’s future prospects, as companies are increasingly under pressure to meet certain environmental goals. Broadly speaking, sentiment data becomes valuable to analysts if it enables them to identify trends early as they develop, gauge how those trends evolve over time, and potentially use trailing sentiment to understand longer-term stock price movements. As the case studies below highlight, Yewno’s sentiment data feed supports a variety of analyses that uncover important trends and inform business outlooks.
Sentiment to ‘Covid-19’ Concept
Covid-19, which originated in Wuhan, China, has defined most of 2020. The virus has had 81.9 million reported infections globally and 1.79 million deaths. The repercussions for financial markets and global economies persist as new infections remain on the rise.
It is generally accepted that the only long-term solution to eradicating the virus is the discovery of a viable vaccine. To date, Pfizer, Moderna, and AstraZeneca have all had their vaccines approved by regulators. As mass rollout and vaccinations proceed, the investment community continues to be hyper-focused on the idiosyncratic effect of ‘Covid-19’ on individual businesses. The mass of new data and news coverage makes understanding various companies’ exposure to Covid-19 difficult, yet important. The following discussion provides some insight into how Yewno’s sentiment data, derived from the Yewno Knowledge Graph, reveals company-specific insights.
Sentiment Data Retrieval Using Yewno’s API
Extracting sentiment data from Yewno’s Concept Exposure and Sentiment API is quick, and simply requires a few search parameters to define the data scope. Below is a screenshot of the API request with a brief description of the output dataset used in the following analysis.
The output dataset consists of sentiment scores derived from news sources, spanning 2020-04-30 to 2020-12-29. The concept of interest was ‘Covid-19’ (cd8d87ad8fbc347df01d4036d0f0e6b1), and the output dataset had ~1.6 million rows. Cleaning the data meant keeping only the rows whose concepts involve public companies. These public companies were then used for the analysis below.
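The cleaning step described above could look roughly like the sketch below. It does not reproduce the Yewno API or its output schema: the column names (`date`, `concept`, `sentiment`), the inline sample rows, the sentiment values and the list of public companies are all assumptions chosen to keep the example self-contained.

```python
import csv
import io
from datetime import date

# Hypothetical export format; real column names and values will differ.
RAW_CSV = """date,concept,sentiment
2020-04-30,Delta Air Lines,-0.42
2020-04-30,Quarantine,-0.10
2020-05-01,Expedia,-0.31
2020-05-01,Delta Air Lines,-0.38
2021-01-05,Expedia,0.12
"""

# Assumed lookup of concepts that correspond to public companies.
PUBLIC_COMPANIES = {"Delta Air Lines", "Expedia"}
START, END = date(2020, 4, 30), date(2020, 12, 29)


def load_company_rows(csv_text, companies, start, end):
    """Keep rows whose concept is a public company and whose date lies in
    the closed interval [start, end]; parse dates and scores on the way."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(record["date"])
        if record["concept"] in companies and start <= day <= end:
            rows.append(
                {
                    "date": day,
                    "company": record["concept"],
                    "sentiment": float(record["sentiment"]),
                }
            )
    return rows


company_rows = load_company_rows(RAW_CSV, PUBLIC_COMPANIES, START, END)
for row in company_rows:
    print(row["date"], row["company"], row["sentiment"])
```

With a dataset of ~1.6 million rows, a dataframe library would likely be more convenient; the standard-library version is used here only to avoid extra dependencies.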
Sentiment of Companies Over Time
Given the sentiment scores of a myriad of companies over time, simple Python scripts enable us to plot the sentiment of various companies in relation to ‘Covid-19’ over the time span. Below we include the 25 companies with the most sentiment data, and provide sentiment scores using 1-week, 1-month, and 3-month moving averages. Note how the volatility of sentiment differs across the three windows. The 1-week moving averages fluctuate more, which may be useful for quantitative hedge funds looking for short-term trading signals. The 3-month moving averages, on the other hand, are smoother, which may help long-term investors treat trailing sentiment as a trend to better understand risk and headwinds/tailwinds, and generally to perform more robust due diligence.
Exhibit 1: Company sentiment score to ‘Covid-19’ concept over time (1 week moving average)
Exhibit 2: Company sentiment score to ‘Covid-19’ concept over time (1 month moving average)
Exhibit 3: Company sentiment score to ‘Covid-19’ concept over time (3 month moving average)
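As an illustration of the smoothing behind these exhibits, here is a minimal sketch of a trailing moving average. It assumes one sentiment observation per calendar day and maps the 1-week, 1-month and 3-month windows to 7, 30 and 90 observations; those window sizes, the handling of the first few days (averaging whatever is available so far) and the sample series are assumptions, not details taken from the paper.

```python
from collections import deque


def moving_average(values, window):
    """Trailing mean over the most recent `window` observations.
    Early points average over the observations available so far."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is about to be evicted
        recent.append(value)
        running_sum += value
        averages.append(running_sum / len(recent))
    return averages


# Invented daily sentiment scores for one company.
daily_sentiment = [-0.5, -0.1, -0.6, 0.2, -0.3, 0.4, 0.1, 0.5, -0.2, 0.6]

windows = {"1 week": 7, "1 month": 30, "3 month": 90}
smoothed = {
    label: moving_average(daily_sentiment, size) for label, size in windows.items()
}
for label, series in smoothed.items():
    print(label, [round(x, 3) for x in series[-3:]])
```

Shorter windows track recent changes more closely, while longer windows average over more history and therefore move more slowly.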
Finding the Most Positive and Negative Companies
Of particular interest to investors is pinpointing the companies with the most positive and most negative sentiment to various themes (green energy, 5G, artificial intelligence, Covid-19, etc.). Below we extract the 10 companies with the most positive sentiment and the 10 with the most negative sentiment to ‘Covid-19’, based on average sentiment score over the period 2020-04-30 to 2020-12-29.
Exhibit 4: Companies with most positive sentiment (1 month moving average) to ‘Covid-19’
Exhibit 5: Companies with most negative sentiment (1 month moving average) to ‘Covid-19’
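The ranking step can be sketched as follows, assuming the cleaned data is available as (company, sentiment) pairs. The function name, the placeholder company names and the scores are hypothetical, and the sketch ranks by the plain mean over the whole period.

```python
from collections import defaultdict
from statistics import mean


def rank_companies(rows, top_n):
    """Return (most_positive, most_negative): the `top_n` companies with the
    highest and the lowest mean sentiment over all supplied rows."""
    scores = defaultdict(list)
    for company, sentiment in rows:
        scores[company].append(sentiment)
    averages = {company: mean(values) for company, values in scores.items()}
    ordered = sorted(averages, key=averages.get, reverse=True)
    return ordered[:top_n], ordered[::-1][:top_n]


# Invented scores for placeholder companies.
rows = [
    ("Company A", 0.5),
    ("Company A", 0.3),
    ("Company B", -0.4),
    ("Company C", 0.1),
    ("Company D", -0.7),
    ("Company D", -0.5),
]

most_positive, most_negative = rank_companies(rows, top_n=2)
print("Most positive:", most_positive)
print("Most negative:", most_negative)
```

In practice one might also require a minimum number of observations per company, so that a single extreme snippet does not dominate the ranking.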
Case Studies: Sentiment vs. Stock Performance
Below we select a few interesting case studies from the top positive and negative sentiment companies identified above to highlight unique insights as they relate to stock performance over the relevant time period. Specifically, we plot the 1-month moving average sentiment score in red and the company stock price in blue.
Exhibit 6: Dr. Reddy’s Laboratories sentiment score (1 month moving average) versus stock price performance
Dr. Reddy’s Laboratories (NYSE: RDY) is an Indian multinational pharmaceutical company headquartered in Hyderabad, India. The graph above is notable because sentiment in relation to ‘Covid-19’ during this period appears to be a strong lagging indicator of stock price. Further, in early July Fujifilm Holdings Corp said it was partnering with Dr. Reddy’s Laboratories to sell the anti-flu drug Avigan in India and elsewhere to treat COVID-19. Since then, sentiment has generally improved. While this does not establish causality between the monthly average sentiment score and stock price, there appears to be a high correlation, which indicates that the sentiment score data can be used to understand the thematic shifts the investing community is focused on.
Exhibit 7: BioNTech SE’s sentiment score (1 month moving average) versus stock price performance
Above we see another interesting example with BioNTech SE (NASDAQ: BNTX). BioNTech partnered with Pfizer to create the first US-based COVID-19 vaccine, which recently advanced to an FDA advisory committee meeting and was approved for rollout across the United States. In the time frame presented, sentiment towards ‘Covid-19’ has clearly been what investors are trading on, given the positive economic repercussions of mass-producing a needed vaccine in the US and globally. What is particularly interesting is that in many time periods sentiment appears to be a lagging indicator of stock price movement. As such, a particularly compelling use case for Yewno’s sentiment data would be incorporating sentiment into a factor model for companies whose business operations are closely related to a given concept.
Additional Case Studies: Sentiment vs. Stock Performance
In addition to the most positive and negative companies flagged and presented above, the data enables us to select other companies of interest to examine. The travel and hospitality industry is of particular interest, as it was the hardest-hit industry after travel restrictions were imposed. Specifically, we were curious to track the sentiment of cruise lines, airlines, and travel booking companies in relation to their respective stock price performance. Below we provide a case study of one company from each of these industries.
Exhibit 8: Norwegian Cruise Line sentiment score (1 month moving average) versus stock price performance
Exhibit 9: Delta Air Lines sentiment score (1 month moving average) versus stock price performance
Exhibit 10: Expedia, Inc. sentiment score (1 month moving average) versus stock price performance
As with the two case studies on Dr. Reddy’s Laboratories and BioNTech, the hand-selected travel company cases above indicate a strong correlation between sentiment and stock performance, which makes sense given that the impact of ‘Covid-19’ is driving the business fundamentals of these companies. What is particularly interesting is the high correlation (0.755) of sentiment score to stock performance for Delta Air Lines, which indicates that various metrics derived from Yewno’s Knowledge Graph can be particularly useful for understanding specific equities and building models to trade them. More broadly, it is interesting to note that in the months leading up to the approval of Pfizer’s and Moderna’s vaccines in December, sentiment for all companies was generally trending upwards. Overall, sentiment data provides a more comprehensive pulse of key developments in the world, allowing investors and risk managers to note hidden trends as they begin to develop and position their investments accordingly.
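A correlation of this kind, and the question of whether sentiment leads or lags price, can be explored with a short sketch like the one below. The paper does not state which correlation measure was used, so Pearson correlation is an assumption, as are the function names, the lag convention and the toy series (which are invented and do not reproduce the Delta Air Lines figures).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


def lagged_correlation(sentiment, prices, lag):
    """Correlate sentiment on day t with price on day t + lag.
    A strong value at positive lag suggests sentiment leads price;
    a strong value at negative lag suggests sentiment trails price."""
    if lag > 0:
        return pearson(sentiment[:-lag], prices[lag:])
    if lag < 0:
        return pearson(sentiment[-lag:], prices[:lag])
    return pearson(sentiment, prices)


# Invented series: sentiment is built to mirror the NEXT day's price.
prices = [10.0, 11.0, 13.0, 12.0, 15.0, 16.0, 18.0]
sentiment = [0.1 * price for price in prices[1:]] + [1.9]

for lag in (-1, 0, 1):
    print(lag, round(lagged_correlation(sentiment, prices, lag), 3))
```

Scanning a range of lags and checking where the correlation peaks gives a simple, if rough, indication of whether sentiment behaves as a leading or a lagging indicator for a given equity.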
Conclusion on ‘Covid-19’ Sentiment
Yewno’s innovative Knowledge Graph technology analyses thousands of news articles on a daily basis, identifying relevant concepts such as Covid-19, Vaccine, or companies like Pfizer and Moderna. The graph representation of the concepts allows for the computation of exposures and sentiment scores providing relevant information for investors.
Specifically, identifying the sentiment exposure of companies to a trending concept (or event) makes it possible to analyze further the effects these exposures could have on the financial system. Such exposure allows investors to better understand what is driving stock performance, and to adjust their portfolios to reflect drastic changes in sentiment that may be apparent in various data feeds before certain catalysts. The use cases presented above highlight the ability of sentiment data to delineate trends as they develop. Further, for specific equities in the travel and tourism industry, whose business fundamentals are highly tied to ‘Covid-19’, we saw that sentiment was often a lagging indicator of directional stock price movements. This analysis can be extrapolated to new concepts, enabling investors to better understand business drivers, performance, and underlying risk.
For our analysis above, we used 1-month moving averages to balance the responsiveness of the sentiment score against noise reduction. As seen in Exhibits 11 and 12 below, weekly moving averages retain a lot of noise, making trend delineation difficult. By contrast, Exhibits 13 and 14 show that 3-month moving averages make it difficult to capture short-term sentiment changes. As such, 1-month moving averages are a viable compromise.
Exhibit 11: Companies with most positive sentiment (1 week moving average) to ‘Covid-19’
Exhibit 12: Companies with most negative sentiment (1 week moving average) to ‘Covid-19’
Exhibit 13: Companies with most positive sentiment (3 month moving average) to ‘Covid-19’
Exhibit 14: Companies with most negative sentiment (3 month moving average) to ‘Covid-19’
[1] Palash Goyal, Emilio Ferrara. “Graph Embedding Techniques, Applications, and Performance: A Survey”. Knowledge-Based Systems, vol. 151, pp. 78–94, 2018, doi: https://doi.org/10.1016/j.knosys.2018.03.022
[2] Bryan Perozzi, Rami Al-Rfou, Steven Skiena. “DeepWalk: Online Learning of Social Representations”. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, ACM, pp. 701–710, New York, New York, USA, 2014, doi: https://doi.org/10.1145/2623330.2623732
[3] Aditya Grover, Jure Leskovec. “node2vec: Scalable Feature Learning for Networks”. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, 2016.
[4] Ruggero Gramatica. “Information network with linked information nodes”. US10025862B2
[5] Ruggero Gramatica, Haris Dindo. “Structuring data in a knowledge graph”. US10528871B1