Medical data signature extraction using modified TF-IDF in DataBridge project

Last Modified
  • February 28, 2019
  • Hu, Wei
    • Affiliation: School of Information and Library Science
  • This project is part of the DataBridge project, which aims to find similar datasets among the large number of medical datasets stored on the DataBridge server using keyword extraction and similarity algorithms. In this project, a sample of 1,000 datasets was randomly chosen from the corpus of 18,000 datasets. A modified TF-IDF was applied to the sample to generate keywords for each of the 1,000 datasets, followed by a similarity analysis. According to the results, the keyword extraction works well for calculating similarities between different datasets.
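The pipeline described in the abstract (keyword weighting followed by similarity analysis) can be illustrated with a minimal sketch. This record does not describe the author's modification to TF-IDF, so the code below uses plain textbook TF-IDF and cosine similarity as stand-ins. The function names, the `top_k` parameter, and the three sample descriptions are all hypothetical and do not come from the DataBridge corpus.

```python
import math
from collections import Counter


def tfidf_signatures(docs, top_k=5):
    """Return, per document, a dict of its top_k terms mapped to TF-IDF weights.

    Assumption: standard TF-IDF (relative term frequency times log inverse
    document frequency), not the modified variant used in the thesis.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    signatures = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending weight; break ties alphabetically for determinism.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        signatures.append(dict(top))
    return signatures


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * sig_b.get(term, 0.0) for term, w in sig_a.items())
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical dataset descriptions, invented purely for illustration.
sample_docs = [
    "blood pressure measurements for cardiac patients",
    "cardiac patients blood glucose records",
    "hospital billing codes and insurance claims",
]

signatures = tfidf_signatures(sample_docs)
for i in range(len(signatures)):
    for j in range(i + 1, len(signatures)):
        score = cosine_similarity(signatures[i], signatures[j])
        print(f"similarity(doc {i}, doc {j}) = {score:.3f}")
```

In this toy example, the two cardiac-related descriptions share keywords and so score higher than either does against the billing description, which shares no terms with them. A real corpus would likely need tokenization, stop-word removal, and whatever weighting changes the thesis itself defines.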
Date of publication
Resource type
Rights statement
  • In Copyright
  • Rajasekar, Arcot
  • Master of Science in Information Science
Degree granting institution
  • University of North Carolina at Chapel Hill
  • 40
Deposit record
  • 30f8261f-89ed-44cf-9b2a-ab48eda760ef