The Impact of Data Mining on Information Disclosure by Regulatory Agencies: With an Application to Redlining
Date of Publication:
Recommended Citation
Will Bunting, The Impact of Data Mining on Information Disclosure by Regulatory Agencies: With an Application to Redlining, 56 Harv. J. on Legis. 355 (2019)
Data mining techniques can be used to locate statistical outliers that are incorrectly characterized as evidence of unlawful conduct. Using home mortgage loan data made publicly available by financial regulators, a simple data mining exercise finds that approximately three percent of all lender-MSA pairs (or approximately seven to nine percent of all lending institutions) flagged as having redlined minority neighborhoods are attributable to a failure to correct for the multiple hypothesis testing problem, where an MSA is a metropolitan statistical area. The false positive rate does not, however, fully explain the estimated high frequency of statistical redlining. Three possible models of information disclosure by regulatory agencies are considered: (1) full information, (2) no information, and (3) limited information. Under a limited information model, litigation serves to correctly implement statistical hypothesis testing: a plaintiff must formulate a hypothesis prior to examination of the data and obtains the information necessary to test this hypothesis only through discovery.
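The multiple hypothesis testing problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the article's method or data: the function names are hypothetical, the p-values are simulated uniform draws standing in for lender-MSA pairs with no true redlining, and Bonferroni and Benjamini-Hochberg are used only as two standard examples of a correction.

```python
import random


def naive_reject(p_values, alpha=0.05):
    """Flag every test with p <= alpha, with no correction for multiplicity."""
    return [p <= alpha for p in p_values]


def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: compare each p-value to alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject


def simulate_null_p_values(n_tests, seed=0):
    """Hypothetical stand-in data: p-values for tests in which the null is true."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_tests)]


if __name__ == "__main__":
    # Assumption: 1,000 lender-MSA pairs, none of which actually redlined.
    p_values = simulate_null_p_values(1000)
    print("flagged without correction:", sum(naive_reject(p_values)))
    print("flagged with Bonferroni:", sum(bonferroni_reject(p_values)))
    print("flagged with Benjamini-Hochberg:", sum(benjamini_hochberg_reject(p_values)))
```

When every null hypothesis is true, an uncorrected test at the five percent level is expected to flag roughly five percent of the pairs by chance alone. Either correction is designed to shrink that count, which is the sense in which uncorrected flags can overstate the frequency of statistical redlining.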