ILPD (Indian Liver Patient Dataset)

Deaths from liver cirrhosis continue to increase, driven by rising rates of alcohol consumption, chronic hepatitis infection, and obesity-related liver disease. Despite the high mortality of liver disease, it does not affect all sub-populations equally. Early detection of pathology is a determinant of patient outcomes, yet female patients appear to be marginalized when it comes to early diagnosis of liver pathology. The dataset comprises 583 patient records collected from the northeast of Andhra Pradesh, India. The prediction task is to determine whether a patient has liver disease from age, gender, and several biochemical markers, including bilirubin, total proteins, albumin, and the liver enzymes alkaline phosphatase, SGPT, and SGOT.

Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification

Attribute Type
--
# Instances
583
# Attributes
10

Info

This dataset contains records of 416 patients diagnosed with liver disease and 167 patients without liver disease; this information is stored in the class label named 'Selector'. There are 10 variables per patient: age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT, and Alkphos. Of the 583 patient records, 441 are male and 142 are female. The dataset has been used to study:
  • differences between US and Indian patients with liver disease;
  • gender-based disparities in predicting liver disease, since previous studies have found that biochemical markers are not equally effective for male and female patients.
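Because the two classes are unbalanced, raw accuracy can look better than it is. The short worked check below uses only the counts stated above (the variable names are illustrative): a trivial classifier that labels every record as liver disease already reaches about 71% accuracy, and female patients make up roughly a quarter of the records, which leaves a much smaller sample for any female-only evaluation.

```python
# Counts stated in the dataset description.
LIVER_DISEASE = 416
NO_LIVER_DISEASE = 167
MALE = 441
FEMALE = 142

total = LIVER_DISEASE + NO_LIVER_DISEASE
# Both breakdowns should cover the same 583 records.
assert total == MALE + FEMALE == 583

# A classifier that always predicts "liver disease" is correct on every
# positive record and wrong on every negative one.
majority_baseline_accuracy = LIVER_DISEASE / total
# Number of positive records per negative record.
imbalance_ratio = LIVER_DISEASE / NO_LIVER_DISEASE
female_share = FEMALE / total

print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")  # 0.714
print(f"positive:negative ratio: {imbalance_ratio:.2f}")  # 2.49
print(f"female share of records: {female_share:.3f}")  # 0.244
```

A model trained on this data is probably better judged against this 0.714 baseline, or with class-aware metrics such as sensitivity and specificity, than on raw accuracy alone.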


Introductory Paper

Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction

By I. Straw, Honghan Wu. 2022

Published in BMJ Health & Care Informatics
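The introductory paper evaluates models separately for male and female patients. The sketch below shows one way such a sex-stratified check can be computed: recall (sensitivity) within each group, meaning the share of actual liver-disease cases that a model detects. This connects directly to the early-diagnosis concern raised in the overview. It assumes labels are already encoded as 1 = liver disease and 0 = no liver disease. The function `recall_by_group` and the toy predictions are hypothetical and do not reproduce the paper's models, metrics, or results.

```python
def recall_by_group(groups, y_true, y_pred, positive=1):
    """Compute recall (sensitivity) separately within each group.

    recall = true positives / actual positives, counted per group.
    A group with no actual positives gets None instead of a ratio.
    """
    if not (len(groups) == len(y_true) == len(y_pred)):
        raise ValueError("groups, y_true and y_pred must have equal length")

    true_pos = {}
    actual_pos = {}
    for group, truth, pred in zip(groups, y_true, y_pred):
        # Register every group, even one without positive cases.
        actual_pos.setdefault(group, 0)
        true_pos.setdefault(group, 0)
        if truth == positive:
            actual_pos[group] += 1
            if pred == positive:
                true_pos[group] += 1

    return {
        group: (true_pos[group] / n if n else None)
        for group, n in actual_pos.items()
    }


# Hypothetical toy labels and predictions, not taken from the dataset.
sex = ["Male", "Male", "Male", "Female", "Female", "Female"]
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0]

print(recall_by_group(sex, y_true, y_pred))  # {'Male': 1.0, 'Female': 0.5}
```

A gap like the one in this toy output, where every male case is detected but half of the female cases are missed, is the kind of disparity a sex-stratified analysis is meant to surface. With only 142 female records, per-group estimates for women will also be noisier than those for men.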

Provided by
University of California, Irvine


Creators
  • Bendi Ramana
  • N. Venkateswarlu

DOI

10.24432/C5D02C

Features

Attribute Name | Role | Type | Demographic | Description | Units | Missing Values
Age | Feature | Integer | Age | Age of the patient. Any patient whose age exceeded 89 is listed as being of age "90". | years | no
Gender | Feature | Binary | Gender | Gender of the patient | | no
TB | Feature | Continuous | | Total bilirubin | | no
DB | Feature | Continuous | | Direct bilirubin | | no
Alkphos | Feature | Integer | | Alkaline phosphatase | | no
Sgpt | Feature | Integer | | Alanine aminotransferase (SGPT) | | no
Sgot | Feature | Integer | | Aspartate aminotransferase (SGOT) | | no
TP | Feature | Continuous | | Total proteins | | no
ALB | Feature | Continuous | | Albumin | | no
A/G Ratio | Feature | Continuous | | Albumin/globulin ratio | | no
Selector | Target | Binary | | Class label assigned by experts that splits the records into two groups: patients with and without liver disease | | no
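Below is a minimal sketch of reading records into the types listed in the table, using only the Python standard library. Several details are assumptions that this page does not state and that should be checked against the downloaded file: the file is a headerless CSV with columns in the table's order, Gender is written as `Male`/`Female`, and Selector uses 1 for patients with liver disease and 2 for those without. A quick check is that the two counts come out as 416 and 167. The helper names and the sample rows are hypothetical, and the rows are made-up values rather than dataset records.

```python
import csv
import io

# Assumed column order: the same as the Features table.
COLUMNS = ["Age", "Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot",
           "TP", "ALB", "A/G Ratio", "Selector"]
INTEGER_COLUMNS = ["Age", "Alkphos", "Sgpt", "Sgot"]
CONTINUOUS_COLUMNS = ["TB", "DB", "TP", "ALB", "A/G Ratio"]


def parse_row(row):
    """Convert one CSV row (a list of strings) into a typed dict."""
    if len(row) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
    raw = {name: value.strip() for name, value in zip(COLUMNS, row)}

    record = {name: int(raw[name]) for name in INTEGER_COLUMNS}
    for name in CONTINUOUS_COLUMNS:
        # Blank fields become None instead of raising (a defensive choice).
        record[name] = float(raw[name]) if raw[name] else None

    if raw["Gender"] not in ("Male", "Female"):
        raise ValueError(f"unexpected Gender value: {raw['Gender']!r}")
    record["Gender"] = raw["Gender"]

    # Assumed encoding: 1 = liver disease, 2 = no liver disease.
    if raw["Selector"] not in ("1", "2"):
        raise ValueError(f"unexpected Selector value: {raw['Selector']!r}")
    record["has_liver_disease"] = raw["Selector"] == "1"
    return record


def load_records(text):
    """Parse a headerless CSV string, skipping blank lines."""
    return [parse_row(row) for row in csv.reader(io.StringIO(text)) if row]


# Made-up rows for illustration only; the third has a blank A/G Ratio.
SAMPLE = (
    "45,Female,0.8,0.2,180,25,30,7.0,3.6,1.0,2\n"
    "60,Male,3.5,1.6,300,70,95,6.5,3.0,0.8,1\n"
    "52,Male,1.1,0.4,210,40,45,6.8,3.3,,1\n"
)

records = load_records(SAMPLE)
print(sum(r["has_liver_disease"] for r in records), "of", len(records))  # 2 of 3
```

Keeping the Selector mapping in one place makes it easy to correct if the downloaded file turns out to use a different encoding.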