This post is for the data geeks in the house. It pairs with my previous post, which explains at a high level why San Francisco’s contact tracing is broken. This post supports that one by digging into the data details of why I think San Francisco’s testing and contact tracing is currently broken.

What is contact tracing and why should you care? In brief.

To understand this post you will need to know the basics of contact tracing, which are summarized in the picture below. I’ve added a few more steps into the cycle than you might normally see because those steps will be important when we examine San Francisco’s dataset.

The previous post talks about when contact tracing is effective. For the purposes of this post, know that contact tracing is effective when it moves fast, faster than the virus is spreading. If you catch the virus before it has had a chance to spread, you can break the chain of transmission. Contact tracing is also effective when it catches a lot of people. The more people you can catch before they spread COVID-19, the better you will quash the virus.

Two key questions: How long is the testing and contact tracing loop taking, and how many people is the contact tracing effort reaching? If we knew the answer to either of these, we would have an indication as to whether San Francisco’s contact tracing effort is working.

You should care whether San Francisco’s testing and contact tracing cycle is functional, because a broken cycle means that San Francisco will be continually going in and out of various stages of lockdown / social distancing. This social distancing dance is frustrating and annoying to everyone. Without effective testing and contact tracing the only control we currently have over this virus is social distancing, and we definitely won’t be moving beyond our current reopening state.

Explanation of data categories

All of San Francisco’s reopening metrics looked good in June, yet San Francisco was still having a surge. Even now at the start of August, apart from the new case count, San Francisco’s metrics look relatively good. I started scouring San Francisco’s website looking for data to tell me what was going on, and I found the following hidden gem.

This simple bar chart had some categories that are very relevant to contact tracing.

Contact with a known case: This indicates that the person who tested positive came in contact with someone who they knew was positive. Presumably then they got their case of COVID-19 from this known contact.
Community Contact: This indicates that the patient who has COVID-19 doesn’t know where they got their infection from.
Unknown: This means that the COVID-19 patient hasn’t been asked the question about where they got their infection from, and we can’t place the source of transmission into either of the other two categories.

If a contact tracer closes the contact tracing loop successfully, then the transmission category of this new case will show up as contact with a known case. If only we knew how the prevalence of these categories was changing over time.

The beauty of SF’s open data model

The beauty of San Francisco’s open data source is that it makes the data behind this simple bar graph available to everyone, and it turns out that this data is rich. If you follow the data source back to the following web page and download the file, you find that each of these categories has an entry for every date since March. The date in the dataset is the date that the specimen was taken (i.e., the date that the person’s nose was swabbed). With the addition of this date information, we can start to get at answers to how SF’s contact tracing effort is going.

So let us graph this dataset and start asking questions.

The thin lines above the thick lines roughly indicate the amount of uncertainty in the thick line. They are the totals that occur when you add all of the unknown case categorizations to either the community spread, in the case of the red line, or to the known case categorizations, in the case of the blue line.
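For the data geeks, here is a minimal sketch in Python of how those thin upper lines can be computed. The function name and the sample counts are made up for illustration; they are not values from San Francisco’s dataset, and I am assuming one count per specimen date for each category.

```python
def uncertainty_bands(community, known, unknown):
    """Return the upper bounds for the community and known-contact lines.

    Each argument is a list of daily case counts, one entry per specimen date.
    The upper bound assumes every unknown case belongs to that category.
    """
    upper_community = [c + u for c, u in zip(community, unknown)]
    upper_known = [k + u for k, u in zip(known, unknown)]
    return upper_community, upper_known


# Illustrative counts only (hypothetical, not real data).
community = [12, 15, 20]
known = [10, 11, 12]
unknown = [3, 8, 25]

upper_community, upper_known = uncertainty_bands(community, known, unknown)
print(upper_community)
print(upper_known)
```

The true value for each category lies somewhere between the thick line (no unknowns assigned) and the thin line (all unknowns assigned).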

First observation: Community transmission is going up.

In the first of these plots, the red line is community transmission, and this line indicates that the rate of community transmission is going up. Of course, it is not good for us to have a surge of cases in SF, but it is even worse to have a surge of cases and not know where they’re coming from.

Second observation: The growth in community transmission is even faster than the growth in known cases.

Back in May and early June, when San Francisco had a better handle on the pandemic, the number of new cases in each of the transmission categories was about equal. Unfortunately, since then community transmission (the red line) has been growing even faster than the blue line, which is the number of transmissions from known contact. To see how these categories grow at different rates, we plot the ratio of community to known transmission. This curve has grown since late May and early June. San Francisco’s contact tracing effort is definitely overwhelmed if the growth in known contact transmissions isn’t keeping up with the growth of community transmissions.
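As a rough sketch, the ratio can be computed per specimen date like this. The function name and the sample counts are hypothetical, and returning None when there are no known-contact cases on a date is my own choice for the sketch.

```python
def community_to_known_ratio(community, known):
    """Ratio of community cases to known-contact cases for each specimen date.

    Returns None for dates with zero known-contact cases, where the
    ratio is undefined.
    """
    return [c / k if k else None for c, k in zip(community, known)]


# Illustrative counts only (hypothetical, not real data).
community = [10, 18, 30]
known = [10, 12, 15]

print(community_to_known_ratio(community, known))
```

A ratio near 1 means the two categories are about equal. A ratio that keeps climbing means community transmission is outpacing transmission from known contacts.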

To summarize: a surge is bad, a surge of unknown origin is even worse, and a surge that causes you to lose ground is even worse than that. This answers the second of the key questions I started this post with. San Francisco is not reaching as many people as it needs to in order to make its contact tracing effective.

More tricks with the dataset: how to use changes in the dataset over time.


Those static graphs above are not the only things we can do with this dataset. I also noticed that this dataset is changing and evolving over time. For you to understand this next section of analysis, I need to explain the basics of how the dataset is evolving. The easiest way for me to do this is to animate the dataset. Ever since July 16th, I have been collecting a snapshot of this data every day. I currently have 19 snapshots of this data, which I have animated in a stacked bar graph.

You might want to play this animation more than once, paying attention to a different aspect of the animation each time. The first time you play the animation, focus on how long the lag is between the newest date and the most recent date whose bars have stopped changing. The second time through, pick a single specimen date and focus on how the total height of the bar for that date grows. Quantifying these two observations lets us put a number on how long it takes a test result to be returned to the city.

Watch the animation a third time. This time, again, focus on a single specimen date. Instead of focusing on the total height of the bar, focus on the way in which the large unknown (yellow) values eventually sort themselves out into the community and known categories. The movement of a case from the unknown category into one of the other two categories means that a contact tracer has reached out to the COVID-19 patient and has gotten at least this little bit of information. Quantifying this observation gives us an indication of how long the contact tracing delay is.

The evolution of this dataset allows us to roughly answer the first key question I asked at the start of this post. Is SF doing its testing and contact tracing fast enough?

Quantifying the testing and contact tracing delay. Currently way too long.

For every specimen date, I quantified how long it took on average for results to be returned for that date. This weighted average delay is the large blue ball on the graph below. Since San Francisco has a green goal of reaching 90% of the people that test positive for COVID-19, I also asked how long it took for the bar on this specimen date to fill up 90% of the way towards full. This answer is the small blue ball.
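Here is a minimal sketch of one way to compute these two numbers from a series of snapshots. It is one plausible reading of the calculation, not a definitive implementation: I am assuming that the increase in a specimen date’s total between two consecutive snapshots is the number of results that arrived at that delay, and that the largest total seen so far stands in for the “full” bar. The function name and sample numbers are hypothetical.

```python
from datetime import date


def reporting_delays(specimen_date, snapshots, fraction=0.9):
    """Estimate test reporting delays for one specimen date.

    snapshots: list of (snapshot_date, total_cases) pairs for this specimen
    date, sorted by snapshot_date.
    Returns (weighted_mean_delay_days, days_until_fraction_reported).
    """
    # Assumption: the largest total seen so far is the "full" bar height.
    eventual = max(total for _, total in snapshots)
    if eventual <= 0:
        return None, None

    weighted_sum = 0.0
    seen_so_far = 0
    days_to_fraction = None
    for snapshot_date, total in snapshots:
        delay = (snapshot_date - specimen_date).days
        # Results that first appeared in this snapshot arrived at this delay.
        new_results = max(total - seen_so_far, 0)
        weighted_sum += delay * new_results
        seen_so_far = max(seen_so_far, total)
        if days_to_fraction is None and total >= fraction * eventual:
            days_to_fraction = delay
    return weighted_sum / eventual, days_to_fraction


# Illustrative snapshots only (hypothetical, not real data).
snapshots = [
    (date(2020, 7, 22), 10),
    (date(2020, 7, 24), 40),
    (date(2020, 7, 26), 95),
    (date(2020, 7, 30), 100),
]
mean_delay, days_to_90 = reporting_delays(date(2020, 7, 20), snapshots)
print(mean_delay, days_to_90)
```

Because the snapshots are taken once a day, delays measured this way are only accurate to about a day. Specimen dates that were already partly reported before the first snapshot will have their delay overestimated.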

Similarly, for every specimen date I quantified how long it took on average for results to be sorted into the community and known contact categories. This weighted average is the large red ball, and it is the length of time between the specimen being taken and the contact tracer starting to get information from the person carrying COVID-19. Again, because just reaching the average number of people isn’t enough, and because San Francisco and the Bay Area have set themselves an ambitious 90% goal, I asked how long it took for the red and blue bars together (the community and known case transmission categories) to reach 90% of the eventual total height. This delay is represented by the small red ball. Sometimes the number of unknown cases for a particular specimen date remains too large, and not enough cases are categorized to reach the 90% threshold. In this instance, no small red ball is plotted.
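The 90% threshold for categorized cases can be sketched as follows. Again, this is a sketch under assumptions: I take the “eventual total height” to be the total across all three categories in the latest snapshot, and the function name and sample numbers are hypothetical.

```python
from datetime import date


def days_to_categorize(specimen_date, snapshots, fraction=0.9):
    """Days until categorized cases reach a fraction of the eventual total.

    snapshots: list of (snapshot_date, community, known, unknown) tuples for
    one specimen date, sorted by snapshot_date.
    Returns None if the threshold is never reached (no small red ball).
    """
    # Assumption: the latest snapshot gives the eventual total bar height.
    _, community, known, unknown = snapshots[-1]
    eventual_total = community + known + unknown
    if eventual_total == 0:
        return None

    for snapshot_date, community, known, _unknown in snapshots:
        if community + known >= fraction * eventual_total:
            return (snapshot_date - specimen_date).days
    return None


# Illustrative snapshots only (hypothetical, not real data).
snapshots = [
    (date(2020, 7, 24), 5, 5, 30),
    (date(2020, 7, 27), 15, 15, 20),
    (date(2020, 7, 31), 24, 22, 4),
]
print(days_to_categorize(date(2020, 7, 20), snapshots))
```

If too many cases are still unknown in the latest snapshot, the function returns None, which corresponds to a specimen date with no small red ball.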

Remember that a good turnaround time for the entire contact tracing loop is three days, ideally less. In this plot, you can see that even the average delay of 5 to 6 days just for test results is well beyond the target for the entire loop. The 90th percentile value for test return is disastrous, hovering near 10 days, although as I write this post it is starting to drop. That said, I tend to distrust the most recent numbers, as those are the ones that are still updating.

On average, the contact tracers do seem to be relatively efficient at their job, with the large red balls typically being within a single day of the test result delay. The 90th percentile of the contact tracing delay (the small red balls) is disastrous because the test delay is disastrous. Once tests take more than a week to come back, contact tracing is mostly useless, and I would not really fault contact tracing staff for abandoning those cases. Still, having so many unknown cases is not a good look.

Recap

The metrics San Francisco currently has up on its website are inadequate to gauge whether San Francisco is prepared to make progress in fighting COVID-19. These metrics are inadequate because, despite the metrics being in good standing back in early June, San Francisco had a surge in cases.

I have shown that San Francisco’s testing and contact tracing loop is broken because:

community spread is going up, not down
the growth in community spread is faster than the growth in spread through known contact
the testing delays are longer than the time the entire contact tracing cycle should take

In the process of quantifying these problems, I have created a couple of new metrics I am going to keep an eye on:

Is community transmission going up or down?
Is community transmission growing faster or slower than transmission by known contact?
What is the testing delay?
What is the contact tracing delay?

An archeological dig into a dataset and what that tells us about the state of testing and contact tracing in San Francisco.