How Well Can Computers Connect Symptoms to Diseases?

Friday, 10 January 2020

Rob Matheson, for MIT News:

The team analyzed how various models used electronic health record (EHR) data, containing medical and treatment histories of patients, to automatically “learn” patterns of disease-symptom correlations. They found that the models performed particularly poorly for diseases that have high percentages of very old or young patients, or high percentages of male or female patients – but that choosing the right data for the right model, and making other modifications, can improve performance.

We are still in the very early stages of medical AI, but we have to start somewhere. Every indication is that this decade will be AI-focused.

Choices in the dataset-creation process impacted the model performance as well. One of the datasets aggregates each of the 140,400 patient histories as one data point each. Another dataset treats each of the 7.4 million annotations as a separate data point. A final one creates “episodes” for each patient, defined as a continuous series of visits without a break of more than 30 days, yielding a total of around 1.4 million episodes.
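To make the "episode" definition concrete, here is a rough Python sketch of how one patient's visits could be grouped under that 30-day rule. This is only my reading of the quoted description, not the researchers' actual code: the function name `split_into_episodes`, the sample dates, and the choice to treat a gap of exactly 30 days as part of the same episode are all my own assumptions.

```python
from datetime import date, timedelta

# Threshold taken from the quoted definition: a break of more than
# 30 days ends an episode. Everything else here is illustrative.
MAX_GAP = timedelta(days=30)


def split_into_episodes(visit_dates, max_gap=MAX_GAP):
    """Group one patient's visit dates into episodes.

    A visit joins the current episode when it falls within `max_gap`
    of the previous visit; otherwise it starts a new episode.
    """
    episodes = []
    for visit in sorted(visit_dates):
        if episodes and visit - episodes[-1][-1] <= max_gap:
            episodes[-1].append(visit)
        else:
            episodes.append([visit])
    return episodes


# Made-up visit dates for a single hypothetical patient.
visits = [
    date(2019, 1, 1),
    date(2019, 1, 20),
    date(2019, 3, 15),
    date(2019, 3, 30),
    date(2019, 6, 1),
]

episodes = split_into_episodes(visits)
for number, episode in enumerate(episodes, start=1):
    print(f"Episode {number}: {len(episode)} visit(s), "
          f"{episode[0]} to {episode[-1]}")
```

With these sample dates, the five visits collapse into three episodes, which gives a feel for how the same records can turn into very different numbers of data points depending on how they are grouped.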

Computers work through data far faster than we can. Bringing this much data together is remarkable.

I picture this data in my mind like The Matrix.