Discovering collectively informative descriptors from high-throughput experiments

Jeffries, Clark D; Ward, William O; Perkins, Diana; Wright, Fred A.

Creator

Jeffries, Clark D
- Affiliation: Eshelman School of Pharmacy, Renaissance Computing Institute
Ward, William O
- Other Affiliation: NHEERL Environmental Carcinogenesis Division, United States Environmental Protection Agency, Research Triangle Park, NC, USA
Perkins, Diana
- Affiliation: School of Medicine, Department of Psychiatry
Wright, Fred A.
- Affiliation: Gillings School of Global Public Health, Department of Biostatistics

Abstract

Background: Improvements in high-throughput technology and its increasing use have led to the generation of many highly complex datasets that often address similar biological questions. Combining information from these studies can increase the reliability and generalizability of results and also yield new insights that guide future research.

Results: This paper describes a novel algorithm called BLANKET for symmetric analysis of two experiments that assess informativeness of descriptors. The experiments are required to be related only in that their descriptor sets intersect substantially and their definitions of case and control are consistent. From the resulting lists of n descriptors ranked by informativeness, BLANKET determines shortlists of descriptors from each experiment, generally of different lengths p and q. For any pair of shortlists, four numbers are evident: the numbers of descriptors appearing in both shortlists, in only the first shortlist, in only the second shortlist, and in neither shortlist. From the associated contingency table, BLANKET computes Right Fisher Exact Test (RFET) values used as scores over a plane of possible pairs of shortlist lengths [1, 2]. BLANKET then chooses a pair or pairs with RFET score less than a threshold; the threshold depends upon n and the shortlist length limits and represents a quality of intersection achieved by less than 5% of random lists.

Conclusions: Researchers seek within a universe of descriptors some minimal subset that collectively and efficiently predicts experimental outcomes. Ideally, any smaller subset should be insufficient for reliable prediction and any larger subset should add little accuracy. As a method, BLANKET is easy to conceptualize and presents only moderate computational complexity. Many existing databases could be mined using BLANKET to suggest optimal sets of predictive descriptors.
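The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R program. It assumes that both ranked lists cover the same n descriptors and that the RFET score is the right tail of the hypergeometric distribution for the observed overlap. The function names (`contingency_table`, `rfet_p_value`, `scan_shortlists`) and the `max_len` limit are hypothetical, chosen for illustration only. The significance threshold is taken from a table in the article and is not reproduced here.

```python
from math import comb


def contingency_table(ranked_a, ranked_b, p, q):
    """Return (both, only_a, only_b, neither) for the top-p and top-q shortlists.

    Assumes ranked_a and ranked_b are rankings of the same n descriptors.
    """
    n = len(ranked_a)
    short_a = set(ranked_a[:p])
    short_b = set(ranked_b[:q])
    both = len(short_a & short_b)
    only_a = p - both
    only_b = q - both
    neither = n - both - only_a - only_b
    return both, only_a, only_b, neither


def rfet_p_value(n, p, q, k):
    """Right-tailed Fisher exact probability of an overlap of at least k.

    Models the second shortlist as q descriptors drawn at random from n,
    of which p belong to the first shortlist (hypergeometric tail).
    """
    tail = sum(comb(p, i) * comb(n - p, q - i) for i in range(k, min(p, q) + 1))
    return tail / comb(n, q)


def scan_shortlists(ranked_a, ranked_b, max_len):
    """Score every (p, q) pair of shortlist lengths up to max_len (hypothetical limit)."""
    n = len(ranked_a)
    scores = {}
    for p in range(1, max_len + 1):
        for q in range(1, max_len + 1):
            both, _, _, _ = contingency_table(ranked_a, ranked_b, p, q)
            scores[(p, q)] = rfet_p_value(n, p, q, both)
    return scores


if __name__ == "__main__":
    # Toy rankings of ten descriptors that agree on the top three only.
    list_a = list("abcdefghij")
    list_b = list("abcjihgfed")
    scores = scan_shortlists(list_a, list_b, max_len=5)
    best = min(scores, key=scores.get)
    print(best, scores[best])
```

In this sketch the pair of lengths with the smallest score would then be compared against the threshold for the given n and length limits, as the abstract describes. An exhaustive scan costs on the order of `max_len` squared table evaluations, which is consistent with the moderate computational complexity the abstract mentions.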

Date of publication

December 18, 2009

DOI

https://doi.org/10.17615/wdqy-hj63

Identifier

https://doi.org/10.1186/1471-2105-10-431
20021653

Resource type

Article

Rights statement

In Copyright

Rights holder

Clark D Jeffries et al.; licensee BioMed Central Ltd.

License

http://creativecommons.org/licenses/by/2.0

Journal title

BMC Bioinformatics

Journal volume

10

Journal issue

1

Page start

431

Language

English

Is the article or chapter peer-reviewed?

Yes

ISSN

1471-2105

Bibliographic citation

BMC Bioinformatics. 2009 Dec 18;10(1):431

Publisher

BioMed Central Ltd

Access right

Open Access

Date uploaded

August 23, 2012

Items

Title	Date Uploaded	Visibility
1471-2105-10-431.pdf	2019-05-07	Public
1471-2105-10-431.xml	2019-05-07	Public
R program for BLANKET. This program yields a value that can be tested in Table 1 for statistical significance of the discovered shortlists.	2019-05-07	Public
