

High level overview of the research and findings

The key aim of the research was to find interesting alternative datasets with alpha-generating signals, and to use machine learning to construct portfolios from them. Another key responsibility was onboarding new datasets. With Yewno data, I focused on finding the standalone value of individual concepts (for use as a signal, or factor, in investing).


How would a Quant Manager turn Yewno Concept Exposure Data into actionable insights for investing in their models?

In turning Concept Exposure Data into actionable insights, the first challenge is onboarding the new dataset. The next is selecting which concepts, and therefore which signals, to test. Once the concepts are selected, the Quant Manager needs to decide which scores to use and what time frame to test over. With those choices made, the Quant Manager can begin feeding the signals into a model and testing for predictive power.
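As a rough illustration of the final step, a common way to test a signal for predictive power is to compute its information coefficient: the cross-sectional rank correlation between the signal and forward returns, period by period. The data below is entirely synthetic and the column layout is an assumption, not Yewno's schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical panel: one concept-exposure score and next-period returns
# for 300 stocks over 24 monthly periods (synthetic data for illustration).
n_stocks, n_periods = 300, 24
scores = pd.DataFrame(rng.normal(size=(n_periods, n_stocks)))
fwd_returns = 0.2 * scores + rng.normal(scale=1.0, size=(n_periods, n_stocks))

# Information coefficient: cross-sectional Spearman correlation between the
# signal and forward returns, computed per period and then averaged.
ic_per_period = scores.corrwith(fwd_returns, axis=1, method="spearman")
mean_ic = ic_per_period.mean()
ic_t_stat = mean_ic / (ic_per_period.std() / np.sqrt(n_periods))

print(f"mean IC: {mean_ic:.3f}, t-stat: {ic_t_stat:.2f}")
```

A consistently positive mean IC with a healthy t-statistic is one simple indication that a concept score carries predictive information worth feeding into a model.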


Why is machine learning becoming more important for generating alpha?

Machine learning is becoming more important for alpha generation because it is simply impossible for humans to test every signal across today's vast amounts of data; machine learning provides the computational power to do so. It also enables investors to exploit nonlinear and complex patterns. In my research, Boosted Tree models were used, which find patterns that are difficult for humans to discern. This allows investors to re-run models and capture dynamic trends. You can also combine models and find synergies across them, which reduces bias in decision making.
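A toy example of the nonlinearity point: when the target depends on an interaction between two signals, a linear model captures almost nothing while a boosted tree model does well. This is a generic scikit-learn sketch, not the models or data used in the research:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic target driven purely by an interaction between two features:
# individually each feature is uninformative, so a linear fit fails.
X = rng.normal(size=(2000, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2000)

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

linear = LinearRegression().fit(X_train, y_train)
boosted = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Out-of-sample R^2: near zero for the linear model, much higher for trees.
print(f"linear R^2:  {linear.score(X_test, y_test):.2f}")
print(f"boosted R^2: {boosted.score(X_test, y_test):.2f}")
```

The gap between the two scores is exactly the kind of nonlinear structure that tree ensembles pick up automatically.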


What was the most surprising finding?

The most surprising finding was that the Concept IDs could serve as standalone signals. For example, companies exposed to concepts such as ‘partnership’ outperformed the broader market, and there was economically and financially sound reasoning for why certain concepts were strong signals. Another interesting finding was that certain signals were significant over different time horizons. For example, ‘securities fraud’ was important in the short term, but not the long term. In the longer term, fundamental concepts like ‘assets’ and ‘earnings’ proved to be more important.


How did you first hear of Yewno’s AI technology, and how did you come to use it for your research?

The first time I encountered Yewno and their AI technology was at an alternative data conference. As a quant researcher, my job is to source new information and potential sources of data and innovation. The data was interesting because it was built on knowledge graphs, which are a strong interest of mine.


Could you break down your process for onboarding the Yewno data and your methodology?

First, topics for stocks were generated, and those topics were then used to select concepts for further analysis and research. The topics used for concept generation were selected primarily through personal domain knowledge.


How does Yewno’s data differentiate from some of the data sets you have seen in the industry?

Yewno has a new and different ambition from other datasets in the industry. Their mission is to model how the human brain processes information, creating and mapping new links using knowledge graph technology. As such, the nature of the data is different: it captures the map of connections between pieces of information.


How were you able to find the concepts that provided the strongest signals? What methodologies can Yewno implement to identify the concepts that carry these signals?

Yewno’s database is very extensive – there are millions of concepts, of which about 30 were selected. In doing this, numerous factors were taken into account. Here are some take-aways for a beneficial selection process:

  • For those using systematic strategies: use your own domain knowledge to select the topics you think will be most relevant, then use NLP methods to find concepts and events related to them. Yewno helps here because, if you have a concept in mind, you can start from it and find the concepts that are exposed to it.
  • Focus on only certain scores: depending on the data, the use case and the companies covered, you might find some factors more useful than others. I used the Centrality, Similarity and Aggregated scores in order to achieve maximum coverage and a better cross-section, since they are based on higher-order connections in the graph.
  • The type of data is also important: I used a 2-day rolling lookback window, which kept the concepts very dynamic. In addition, I focused only on financial news, which has far more publications (around 12,000 articles a day) than other content types, where information arrives more infrequently.
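The short rolling window in the last point can be sketched with pandas. The column names and values here are invented for illustration, not Yewno's actual schema:

```python
import pandas as pd

# Hypothetical daily concept-exposure scores for one stock
# (concept names and values are assumptions for illustration).
daily_scores = pd.DataFrame(
    {"partnership": [0.2, 0.5, 0.1, 0.9, 0.4],
     "securities_fraud": [0.0, 0.0, 0.7, 0.3, 0.0]},
    index=pd.date_range("2020-11-02", periods=5, freq="B"),
)

# A 2-day rolling mean keeps the signal responsive to fresh news flow;
# a longer window would smooth away the dynamics described above.
rolling_signal = daily_scores.rolling(window=2).mean()
print(rolling_signal)
```

With only two days in the window, yesterday's news dominates the signal, which is what makes the resulting concept exposures so dynamic.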


Did you incorporate transaction costs and liquidity constraints in your exercises, and is the signal from the data still strong after considering liquidity and costs?

I kept transaction costs off, but applied a liquidity filter to select around 300 of the most liquid stocks in the S&P 500. Signals amplify when you go into mid and small caps, so I wanted to test first in a very efficient environment.
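A liquidity filter of this kind is typically a simple rank on average daily dollar volume. This is a generic sketch with a made-up universe, not the actual screening code:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical 500-stock universe with average daily dollar volume
# (tickers and volumes are synthetic).
universe = pd.DataFrame({
    "ticker": [f"STK{i:03d}" for i in range(500)],
    "avg_dollar_volume": rng.lognormal(mean=18, sigma=1.5, size=500),
})

# Keep the ~300 most liquid names, mirroring the filter described above.
liquid_universe = universe.nlargest(300, "avg_dollar_volume")
print(len(liquid_universe))
```

Testing first on the most liquid names is conservative: if a signal survives in an efficient large-cap universe, it is more likely to be genuine rather than an artifact of illiquidity.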


Can you explain the benefits of training your models on the tails of the data?

There are three benefits of training models on the tails of the data:

  • The ability to capture more pronounced movements, which tend to occur in the tails
  • Reduced computation time, since only a subset of the data is used
  • Generally better out-of-sample results
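Selecting the tails before training might look like the following sketch, where the decile cutoffs and the synthetic return series are assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical forward-return observations (synthetic, roughly daily scale).
returns = pd.Series(rng.normal(scale=0.02, size=10_000))

# Keep only the outer deciles, where moves are most pronounced; this also
# shrinks the training set to roughly 20% of its original size.
lo, hi = returns.quantile(0.10), returns.quantile(0.90)
tail_mask = (returns <= lo) | (returns >= hi)
tail_sample = returns[tail_mask]

print(f"kept {len(tail_sample)} of {len(returns)} observations")
```

The retained subset carries the large moves the model should learn from, while the discarded middle is mostly noise around zero.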


Can you summarize why it is so important to include concept based approaches for Quants going forward, and why those not adopting this approach will be left behind?

Some of the benefits of including concept based approaches are a broader choice of concepts/topics, the ability to model according to alpha/semantic themes, and the ability to go beyond text parsing when attempting to understand text. Finance has been moving intuitively in this direction for a decade, and will only continue to do so.


As we round out 2020, we look forward to the new year. What trends will you be watching for 2021 and beyond that leverage the power of AI?

Knowledge Graphs as a whole could be blended beautifully with deep learning and genetic algorithms, which is going to be more of an industry focus for 2021. 


What data sets would you be curious to test next? 

I would like to test Yewno’s patent data alongside and as a complement to the concept exposure data in the models that I’ve built. 


What limitations is the finance industry still facing in harnessing data for actionable use? 

Technology is driving change in all aspects of our lives, and the same applies to investing – whether you are talking about quant investing, quant signals or quant strategies (the lines between them are blurring), technology can have a huge impact. It can really help quantitative investing by allowing investors to identify more data points. Embracing technology requires an open mind, but some parts of the finance industry, including the quantitative space, are slow to change.


Do you have any ideas as to how these issues could be addressed?

In order to drive change, we need a community that promotes and shares knowledge; without it, there will be less innovation. Legacy systems, legacy politics, and the abundance of egos in the finance industry are all hindrances to innovation. Knowledge Graph technology is a great example of innovation – not just from a risk perspective, but also from an investment (idea generation) perspective.
