M.R. Temple-Raston, PhD, Chief Data Scientist at Decision Machine, Precision Alpha and Brandthro
A pandemic can be compared to a hurricane. It is a force of nature with effects we can only hope to avoid. Pandemics have been studied throughout history, as have hurricanes, typhoons and cyclones. Through shared experience and science, we now know a great deal about hurricanes. They are at their worst when the eyewall, the eye’s boundary, passes overhead. The lower the pressure in the eye, the fiercer the storm. After a day or perhaps two, the hurricane passes over, and further along its path its impact weakens as energy is drawn out of the storm and not replaced by energy from the warm waters that fed it. But a pandemic, unlike a hurricane, has no physical force that we can see acting on structures, no changes in pressure and humidity that we can feel, and no winds or lashing rain that we can hear. What can we know about pandemics, and what can we (quickly) learn from the global data we are gathering? And how can we use this information to speed our economic recovery?
This article proposes to use the global data available to us through our web browsers, humanity’s highly developed science and considerable computational power to improve on existing rules-of-thumb for infectious diseases, and, with what we learn, see how it could be applied to our recovery.
The Death Toll Data
Comprehensive testing for a disease is packed with difficulties. First, foresight and effective action are required to deploy testing and protocols before the infection begins to spread. Even then, testing is limited by the sensitivity and specificity of the tests, and by the operational expertise of the testing program. How many false negatives are we getting? How many false positives? Are we reaching and testing the right people? For positive results, do we perform comprehensive “contact tracing” to test all the people who have been in contact with the person who tested positive? If we are late to the game, then whatever we discover in testing is only the tip of the iceberg. Without comprehensive and proactive Covid-19 testing, confirmed cases can tell us that a wave is going to hit, but not its size.
In cases where the infection is already spreading in the population, death data, unlike confirmed case data, can be counted with high confidence and used to our benefit. With an assumed mortality in hand, a good picture of the size of the infection can be calculated. But, as with a hurricane, the energy of the disease can change with time. So, while the mortality is assumed to be constant, the energy driving the disease and the size of the infection are most certainly not constant. As with a hurricane, we need to understand the source strength of the disease.
This section presents how machine learning on daily death toll data can be used to create a Covid-19 Geiger Counter. A Geiger Counter measures the source strength of radioactive material (indirectly); we intend to do essentially the same thing, measure the source strength of the Covid-19 disease (indirectly, through daily death toll). If you can picture a Geiger Counter, and would rather understand how it might be used for economic recovery, then you can skip this section with no harm done. The rest of this section will get a little technical.
Machine Learning at its very foundation integrates measurement data and states with the mathematical machinery of science to express probabilities, classifiers (ratios of probabilities), and so on. In some problems, the probabilities can be calculated exactly, in other more complex problems or for convenience, the probabilities are estimated numerically. In our problem, which counts the death toll and has a single variable for the mortality, the probabilities can be solved exactly, as can the disease source strength and the daily infection rate. Instead of creating a generic model with arbitrary parameters to fit the data, we create a detector designed for the problem at hand with no arbitrary parameters. Effectively, we design and create a Geiger Counter for Covid-19.
Our objective is to use the Covid-19 Geiger Counter on the daily death toll to measure the source strength of the disease in the population, and the inferred infections per day that agree with the observed death toll at a conservative 3% mortality. But how important is the mortality value of 3% to our conclusions? Let’s say the mortality is less than 3%, which would sound like good news. Then, to align with the observed deaths, the source strength and the number of people infected would need to be much larger than what we would report for 3%. So, reducing the mortality should not be viewed as good news. In fact, it could be viewed as alarming, and the value should be changed only with compelling scientific evidence. However, it is important to understand that the life-cycle phases of the pandemic that we introduce below to signal phases of economic recovery would not change with different mortalities.
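To make the sensitivity to the assumed mortality concrete, here is a minimal sketch in Python. It is not the CV19-GC itself; it assumes the simplest possible relationship, namely that inferred infections equal deaths divided by mortality, and it ignores the delay between infection and death. The function name and the sample death counts are hypothetical, chosen only for illustration.

```python
def inferred_infections(daily_deaths, mortality=0.03):
    """Infections needed to explain each day's deaths at a fixed mortality.

    Assumption (not from the article): infections = deaths / mortality,
    with no time lag between infection and death.
    """
    if not 0 < mortality <= 1:
        raise ValueError("mortality must be a fraction in (0, 1]")
    return [deaths / mortality for deaths in daily_deaths]


# Hypothetical daily death counts, for illustration only.
sample_deaths = [3, 6, 12]

at_3_percent = inferred_infections(sample_deaths, mortality=0.03)
at_1_percent = inferred_infections(sample_deaths, mortality=0.01)

for deaths, high, low in zip(sample_deaths, at_3_percent, at_1_percent):
    print(f"deaths={deaths:>3}  inferred at 3%={high:8.0f}  inferred at 1%={low:8.0f}")
```

Under these assumptions, lowering the assumed mortality from 3% to 1% triples the inferred infections for the same observed deaths, which is why a lower mortality is not automatically good news.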
First, the Covid-19 Geiger Counter can be used to evaluate the effectiveness of public health measures and help inform when and how they can be relaxed so that we can return to work, and to the company of our extended family, friends and communities. Our analysis identifies four life-cycle phases to the pandemic to help plan our return:
- Capture early indications of ebbing infection source power. This is our first indicator that our communities are winning the battle. While infection rates will continue to increase, the disease energy will begin to fall. This is also the first indication that we are bending the source strength and infection curves down. This is a critical measure for hospitals and support personnel (EMTs, Police, Fire, and so on), as it implies that the backlog of infected people can be spread more easily over our limited hospital and public resources.
- Respond early to any infection resurgence. There is always a possibility of resurgence. Our behavior is what is winning the battle. If our resolve weakens, then our hospitals and community support services are left to clean up the mess.
- Report a peak in the infection rate (inferred daily infections). At this point, the daily infections will start to go down. We can now start putting in place plans to loosen the public health measures we have taken. The loosening can begin if the daily infections continue to drop for ten days.
- Monitor the power of the infection source post-peak for resurgence. Public health policy measures are withdrawn, and the region monitored.
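The ten-day rule in the third phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "continue to drop" is read as a strict day-over-day decrease on each of the most recent ten days, the series is taken to be already smoothed, and the function names and the sample numbers are hypothetical rather than part of the CV19-GC.

```python
def consecutive_declining_days(daily_infections):
    """Count the most recent run of strict day-over-day decreases."""
    count = 0
    # Walk backwards from the latest day until a non-decrease is found.
    for i in range(len(daily_infections) - 1, 0, -1):
        if daily_infections[i] < daily_infections[i - 1]:
            count += 1
        else:
            break
    return count


def can_begin_loosening(daily_infections, required_days=10):
    """True once infections have dropped on each of the last required_days days."""
    return consecutive_declining_days(daily_infections) >= required_days


# Hypothetical inferred daily infections: a rise to a peak of 80,
# followed by ten straight days of decline.
rising = [10, 20, 40, 80]
falling = [80 - 5 * k for k in range(1, 11)]  # 75, 70, ..., 30
series = rising + falling

print(consecutive_declining_days(series))
print(can_begin_loosening(series))
# A single uptick (possible resurgence) resets the count.
print(can_begin_loosening(series + [35]))
```

Resetting the count on any uptick is a deliberately cautious choice that matches the emphasis on watching for resurgence; real daily data is noisy, so a practical version would likely smooth the series first.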
Second, as the shape of the pandemic becomes visible and certain regional milestones are achieved, regional social and economic recovery opportunities become possible. Instead of speculating or delaying, social and economic recovery can be scientifically anchored to a region’s improving public health conditions, to address the needs of communities through funding and support as soon as possible, with reduced risk and increased likelihood of an effective result.
Third, and finally, since the analysis is quantitative, CV19-GC measurements can be taken as an alternative data source that can enrich existing regional financial models, including machine learning models. For example, Decision Machine has an existing financial markets product (Precision Alpha) that uses non-equilibrium behavior in financial markets to identify large alpha opportunities in 85+ global financial markets by calculating non-equilibrium probabilities exactly. Precision Alpha can take CV19-GC data about improved health conditions to calculate conditional probabilities that a regional stock on any global exchange will go up (or down). Those probabilities can in turn be used to inform risk models to justify investment. This formal approach to justifying financial investment, however, understates the opportunity by treating it as a “correlation”. Improving regional health conditions is much more than a “correlation”. It is a “cause-and-effect”. Still, investment dollars want to know where they are best put.
What are we currently seeing?
Decision Machine has built a Covid-19 Geiger Counter (CV19-GC). With raw data from Johns Hopkins and the New York Times, Decision Machine has used the Covid-19 Geiger Counter to analyze the pandemic:
- Country by country (Johns Hopkins global deaths file),
- US county by county (New York Times US county deaths file).
At the time of writing, we are in New York City and have seen the first week or two of the contagion. Yesterday (March 28th), New York City’s daily death toll jumped by over 250%. Decision Machine is reporting the results of our machine learning analyses with the Covid-19 Geiger Counter (CV19-GC) for an assorted group of countries and for the New York City area. We hope to expand our coverage to other communities. From the CV19-GC we see the first evidence that Iran has reached the first life-cycle event (LC-1): the disease source strength is ebbing. We believe that we are beginning to detect similar LC-1 signals for Italy. Of course, we must always be on guard for resurgence through constant monitoring.
A full life-cycle analysis can now be done for Hubei, China. Based on our CV19-GC results from Hubei, 40 days now seems a prudent number for self-isolation, and not overly cautious.
In Hubei province, China (capital Wuhan) the peak disease source strength developed about a month after the infection took off (roughly January 25th in our data). Adding another 10 days after the peak infection ensures that the trend is established. In fact, Wuhan waited yet an additional two weeks before they loosened public health policy a few days ago.
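The timeline arithmetic behind the 40-day figure can be checked with a short Python sketch. It rests on assumptions that are not stated precisely in the text: that January 25th (taken here to be in 2020) is the take-off date, and that "about a month" means 30 days. The variable names are hypothetical and the resulting dates are approximate.

```python
from datetime import date, timedelta

# Assumptions for illustration: take-off on 2020-01-25, "about a month" = 30 days.
takeoff = date(2020, 1, 25)
days_to_peak = 30
confirmation_days = 10  # extra days to confirm the downward trend

peak = takeoff + timedelta(days=days_to_peak)
trend_confirmed = peak + timedelta(days=confirmation_days)
total_days = (trend_confirmed - takeoff).days

print("approximate peak:", peak.isoformat())
print("trend confirmed: ", trend_confirmed.isoformat())
print("days since take-off:", total_days)
```

Under these assumptions, the month to the peak plus the ten-day confirmation window adds up to 40 days from take-off.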
South Korea has been very effective in testing and contact tracing, but the energy has still found a higher level in the last week or so (see figure). Given the timeline seen for Hubei, we expect to see an energy drop for South Korea this week or next that would signal the LC-1 phase. South Korea should see peak infection by April 15th and be able to ease public health policy by the end of April.
The word “quarantine” entered English in the mid-17th century from the Italian quarantina, meaning forty days. So, based on human experience, there is already a rule-of-thumb for the length of time to isolate from disease. From the Covid-19 data for Hubei, China, we observed that the 40-day rule-of-thumb is a reasonable duration to self-isolate in this case, too. We go further. We show how to use data science to better understand the outlines and life-cycle of the Covid-19 pandemic, to be able to say when the storm has passed, and to be able to identify where and when to mobilize for recovery. We hope to expand our unbiased and purely scientific analysis to other communities in the United States and around the world.