We’re confident that we’ve found substantially all the federal covid coverage cases, in part because our database includes many federal cases that Lex Machina does not code as Covid-19 cases. But we know that we don’t have all the state cases, and the reason is simple: there is no PACER for state courts.
We now have what we believe is the first data-based estimate of the total number of state cases: 696, which is 317 more than the 379 we have in the database. That puts the total number of covid coverage cases nationwide at nearly 1,700.
Our estimation method is crude but defensible. It relies on the delay between the date a complaint is filed in state court and the date the case is removed to federal court. That delay gives us a window in which to find the state case through our usual state court methods (CourtWire, Courthouse News, Bloomberg, and networking with lawyers). If we haven’t found a case by the time it’s removed, the odds are that those methods would not have found it at all.
We use the percentage of removed cases that are already in our database at the time of removal to estimate the percentage of all state cases that we have in the database. For example, if half of the removed cases are in our database before they’re removed, we infer that there are about twice as many state cases as we have in the database.
(A technical note: to make this estimate we rely only on cases removed after August 15, because we didn’t have a stable method of identifying state cases until early August.)
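The arithmetic in the last two paragraphs amounts to a simple ratio estimator: measure the share of removed cases we had already found, then divide the number of state cases we have by that share. The Python sketch below is one way to express it under stated assumptions. The record layout (`RemovedCase`, `in_database_before_removal`) and the function names are hypothetical, the sample records are made up, and the cutoff year of 2020 is an assumption, since the post gives only “August 15.”

```python
from dataclasses import dataclass
from datetime import date

# The post says only "after August 15"; the year 2020 is an assumption.
CUTOFF = date(2020, 8, 15)


@dataclass
class RemovedCase:
    """Hypothetical record for a state case later removed to federal court."""
    removal_date: date
    in_database_before_removal: bool  # found by state court methods before removal?


def capture_rate(removed_cases, cutoff=CUTOFF):
    """Share of cases removed strictly after `cutoff` that were already in the database."""
    eligible = [c for c in removed_cases if c.removal_date > cutoff]
    if not eligible:
        raise ValueError("no removed cases after the cutoff date")
    found = sum(c.in_database_before_removal for c in eligible)
    return found / len(eligible)


def estimate_total_state_cases(state_cases_in_db, rate):
    """Scale the state cases we have by the inverse of the capture rate."""
    if not 0 < rate <= 1:
        raise ValueError("capture rate must be in (0, 1]")
    return state_cases_in_db / rate


# Usage with made-up records (not our actual data):
cases = [
    RemovedCase(date(2020, 9, 1), True),
    RemovedCase(date(2020, 9, 2), False),
    RemovedCase(date(2020, 10, 1), True),
    RemovedCase(date(2020, 10, 5), False),
    RemovedCase(date(2020, 7, 1), True),  # removed before the cutoff, so ignored
]
rate = capture_rate(cases)                     # 2 of 4 eligible cases
total = estimate_total_state_cases(100, rate)  # 100 / 0.5
print(rate, total)  # prints "0.5 200.0"
```

Read backward, the post’s own figures imply a capture rate of about 379 / 696, or roughly 54 percent.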
The accuracy of our estimate depends on the assumption that we are as likely to find a state case that will be removed as we are to find one that will not. One obvious way that assumption may fail follows from the $75,000 amount-in-controversy threshold for diversity jurisdiction. The reporters who provide the raw material for CourtWire and Courthouse News likely regard smaller cases as less newsworthy, which would bias the cases we hear about from those sources toward larger cases, and larger cases are more likely to be removed because they meet that threshold. If so, we find a larger share of removed cases than of state cases overall, and our estimate of the total would tend to be too low.
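To see how that kind of bias plays out, the toy calculation below splits state cases into two hypothetical groups, larger and smaller cases, each with its own removal share and its own chance of being found. The `Stratum` type, the function name, and every number are invented for illustration; nothing here comes from our data.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """A hypothetical group of state cases sharing removal and detection rates."""
    n_cases: int          # true number of state cases in the group
    removal_share: float  # fraction of the group later removed to federal court
    capture_rate: float   # fraction of the group we find through state court methods


def estimate_vs_truth(strata):
    """Return (estimated_total, true_total) when the capture rate is measured
    only on removed cases and then applied to all state cases."""
    true_total = sum(s.n_cases for s in strata)
    found = sum(s.n_cases * s.capture_rate for s in strata)
    removed = sum(s.n_cases * s.removal_share for s in strata)
    removed_found = sum(s.n_cases * s.removal_share * s.capture_rate for s in strata)
    rate_among_removed = removed_found / removed
    return found / rate_among_removed, true_total


# Made-up numbers: larger cases are both easier to find and more often removed.
biased = [Stratum(400, 0.5, 0.8), Stratum(600, 0.1, 0.3)]
print(estimate_vs_truth(biased))  # roughly (730.3, 1000)

# If both groups are equally easy to find, the estimate recovers the truth.
unbiased = [Stratum(400, 0.5, 0.6), Stratum(600, 0.1, 0.6)]
print(estimate_vs_truth(unbiased))  # about (1000.0, 1000), up to rounding
```

With these inputs we would find 500 of 1,000 cases, but 178 of the 260 removed cases, so the removed-case capture rate overstates how much of the whole we see.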
We’re eager to hear about ways to test these and other assumptions underlying our estimate.