Gene Myers
Gene Myers was one of the first seven group leaders to join the Janelia Farm Research Campus of the Howard Hughes Medical Institute in 2005. Gene came to Janelia from UC Berkeley, where he was on the faculty of Computer Science from 2003 to 2005.
He was formerly Vice President of Informatics Research at Celera Genomics for four years, where he and his team determined the sequences of the Drosophila, human, and mouse genomes using the whole-genome shotgun technique that he advocated in 1996. Prior to that, Gene was on the faculty of the University of Arizona for 17 years. He received his Ph.D. in Computer Science from the University of Colorado in 1981.
Gene is first and foremost a computer scientist who develops algorithms and software for complex problems. In his early career he focused on combinatorial problems in graph theory and sequence analysis. He then gradually transitioned to devoting most of his efforts to computational biology, focusing on search engines and DNA sequencing. Today, his focus is exclusively on trying to build models and atlases from microscope-generated imagery of biological entities.
He is best known for the development of BLAST, the most widely used tool in bioinformatics, and for the paired-end whole-genome shotgun sequencing protocol and the assembler he developed at Celera that delivered the fly, human, and mouse genomes in a three-year period. He has also written many seminal papers on the theory of sequence comparison. He was awarded the IEEE 3rd Millennium Achievement Award in 2000, the Newcomb Cleveland Best Paper in Science award in 2001, and the ACM Kanellakis Prize in 2002. He was voted the most influential person in bioinformatics in 2001 by Genome Technology Magazine and was elected to the National Academy of Engineering in 2003. In 2004 he won the International Max-Planck Research Prize, and in 2005 he was selected as one of two distinguished alumni (with David Haussler) at his alma mater, the University of Colorado. In 2006 Gene was inducted into Leopoldina, the German Academy of Science, and awarded an honorary doctorate at ETH Zurich.
Janelia Publications
Digital reconstruction of neurons from microscope images is an important and challenging problem in neuroscience. In this paper, we propose a model-based method to tackle this problem. We first formulate a model structure, then develop an algorithm for computing it by carefully taking into account morphological characteristics of neurons, as well as the image properties under typical imaging protocols. The method has been tested on the data sets used in the DIADEM competition and produced promising results for four out of the five data sets.
Full reconstruction of neuron morphology is of fundamental interest for the analysis and understanding of neuron function. We have developed a novel method capable of tracing neurons in three-dimensional microscopy data automatically. In contrast to template-based methods, the proposed approach makes no assumptions about the shape or appearance of the neuron's body. Instead, an efficient seeding approach is applied to find significant pixels almost certainly within complex neuronal structures, and the tracing problem is solved by computing a graph tree structure connecting these seeds. In addition, an automated neuron comparison method is introduced for performance evaluation and structure analysis. The proposed algorithm is computationally efficient. Experiments on different types of data show promising results.
The V3D system provides three-dimensional (3D) visualization of gigabyte-sized microscopy image stacks in real time on current laptops and desktops. V3D streamlines the online analysis, measurement and proofreading of complicated image patterns by combining ergonomic functions for selecting a location in an image directly in 3D space and for displaying biological measurements, such as from fluorescent probes, using the overlaid surface objects. V3D runs on all major computer platforms and can be enhanced by software plug-ins to address specific biological problems. To demonstrate this extensibility, we built a V3D-based application, V3D-Neuron, to reconstruct complex 3D neuronal structures from high-resolution brain images. V3D-Neuron can precisely digitize the morphology of a single neuron in a fruitfly brain in minutes, with about a 17-fold improvement in reliability and tenfold savings in time compared with other neuron reconstruction tools. Using V3D-Neuron, we demonstrate the feasibility of building a 3D digital atlas of neurite tracts in the fruitfly brain.
Automatic alignment (registration) of 3D images of adult fruit fly brains is often influenced by the significant displacement of the relative locations of the two optic lobes (OLs) and the center brain (CB). In one of our ongoing efforts to produce a better image alignment pipeline for adult fruit fly brains, we consider separating the CB and OLs and aligning them independently. This paper reports our automatic method to segregate the CB and OLs, in particular under conditions where the signal-to-noise ratio (SNR) is low, the variation of the image intensity is large, and the relative displacement of OLs and CB is substantial. We design an algorithm to find a minimum-cost 3D surface in a 3D image stack to best separate an OL (of one side, either left or right) from the CB. This surface is defined as an aggregation of the respective minimum-cost curves detected in each individual 2D image slice. Each curve is defined by a list of control points that best segregate OL and CB. To obtain the locations of these control points, we derive an energy function that includes an image energy term defined by local pixel intensities and two internal energy terms that constrain the curve's smoothness and length. A gradient descent method is used to optimize this energy function. To improve both the speed and robustness of the method, for each stack, the locations of optimized control points in a slice are taken as the initialization prior for the next slice. We have tested this approach on simulated and real 3D fly brain image stacks and demonstrated that this method can reasonably segregate OLs from CBs despite the aforementioned difficulties.
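The per-slice curve fitting described in this abstract can be illustrated with a minimal sketch. It is not the published implementation: it assumes one control point per chosen image row with only its column free to move, assumes the boundary lies along low-intensity pixels, and uses finite-difference gradients. The names `sample`, `energy`, `fit_curve`, and the weights `alpha` and `beta` are hypothetical.

```python
def sample(image, x, row):
    """Linearly interpolate the intensity of one row at fractional column x."""
    x = min(max(x, 0.0), len(image[row]) - 1.0)
    x0 = min(int(x), len(image[row]) - 2)
    f = x - x0
    return (1 - f) * image[row][x0] + f * image[row][x0 + 1]


def energy(image, rows, xs, alpha=0.1, beta=0.1):
    """Image term plus length and smoothness penalties (assumed forms)."""
    image_term = sum(sample(image, x, r) for x, r in zip(xs, rows))
    length_term = sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))
    smooth_term = sum((a - 2 * b + c) ** 2
                      for a, b, c in zip(xs, xs[1:], xs[2:]))
    return image_term + alpha * length_term + beta * smooth_term


def fit_curve(image, rows, xs, step=0.2, iters=200, eps=1e-3):
    """Plain gradient descent on the control-point columns xs."""
    xs = list(xs)
    for _ in range(iters):
        grad = []
        for i in range(len(xs)):
            hi = xs[:i] + [xs[i] + eps] + xs[i + 1:]
            lo = xs[:i] + [xs[i] - eps] + xs[i + 1:]
            # Central finite difference of the energy in coordinate i.
            grad.append((energy(image, rows, hi) - energy(image, rows, lo))
                        / (2 * eps))
        xs = [x - step * g for x, g in zip(xs, grad)]
    return xs
```

To mimic the slice-to-slice initialization, the columns returned for one slice would be passed as the starting `xs` for the next slice, for example `xs = fit_curve(next_slice, rows, xs)`.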
The centrosome is a dynamic structure in animal cells that serves as a microtubule organizing center during mitosis and also regulates cell-cycle progression and sets polarity cues. Automated and reliable tracking of centrosomes is essential for genetic screens that study the process of centrosome assembly and maturation in the nematode Caenorhabditis elegans.
Linking activity in specific cell types with perception, cognition, and action requires quantitative behavioral experiments in genetic model systems such as the mouse. In head-fixed primates, the combination of precise stimulus control, monitoring of motor output, and physiological recordings over large numbers of trials is the foundation on which many conceptually rich and quantitative studies have been built. Choice-based, quantitative behavioral paradigms for head-fixed mice have not been described previously. Here, we report a somatosensory absolute object localization task for head-fixed mice. Mice actively used their mystacial vibrissae (whiskers) to sense the location of a vertical pole presented to one side of the head and reported with licking whether the pole was in a target (go) or a distracter (no-go) location. Mice performed hundreds of trials with high performance (>90% correct) and localized to <0.95 mm (<6 degrees of azimuthal angle). Learning occurred over 1-2 weeks and was observed both within and across sessions. Mice could perform object localization with single whiskers. Silencing barrel cortex abolished performance to chance levels. We measured whisker movement and shape for thousands of trials. Mice moved their whiskers in a highly directed, asymmetric manner, focusing on the target location. Translation of the base of the whiskers along the face contributed substantially to whisker movements. Mice tended to maximize contact with the go (rewarded) stimulus while minimizing contact with the no-go stimulus. We conjecture that this may amplify differences in evoked neural activity between trial types.
Volume-object annotation system (VANO) is a cross-platform image annotation system that enables one to conveniently visualize and annotate 3D volume objects including nuclei and cells. An application of VANO typically starts with an initial collection of objects produced by a segmentation computation. The objects can then be labeled, categorized, deleted, added, split, merged and redefined. VANO has been used to build high-resolution digital atlases of the nuclei of Caenorhabditis elegans at the L1 stage and the nuclei of Drosophila melanogaster's ventral nerve cord at the late embryonic stage. AVAILABILITY: Platform-independent executables of VANO, a sample dataset, and a detailed description of both its design and usage are available at research.janelia.org/peng/proj/vano. VANO is open-source for co-development.
The C. elegans cell lineage provides a unique opportunity to look at how cell lineage affects patterns of gene expression. We developed an automatic cell lineage analyzer that converts high-resolution images of worms into a data table showing fluorescence expression with single-cell resolution. We generated expression profiles of 93 genes in 363 specific cells from L1 stage larvae and found that cells with identical fates can be formed by different gene regulatory pathways. Molecular signatures identified repeating cell fate modules within the cell lineage and enabled the generation of a molecular differentiation map that reveals points in the cell lineage when developmental fates of daughter cells begin to diverge. These results demonstrate insights that become possible using computational approaches to analyze quantitative expression from many genes in parallel using a digital gene expression atlas.
MOTIVATION: Caenorhabditis elegans, a roundworm found in soil, is a widely studied model organism with about 1000 cells in the adult. Producing high-resolution fluorescence images of C. elegans to reveal biological insights is becoming routine, motivating the development of advanced computational tools for analyzing the resulting image stacks. For example, worm bodies usually curve significantly in images. Thus one must 'straighten' the worms if they are to be compared under a canonical coordinate system. RESULTS: We develop a worm straightening algorithm (WSA) that restacks cutting planes orthogonal to a 'backbone' that models the anterior-posterior axis of the worm. We formulate the backbone as a parametric cubic spline defined by a series of control points. We develop two methods for automatically determining the locations of the control points. Our experimental results show that our approaches effectively straighten both 2D and 3D worm images.
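The restacking idea can be sketched in 2D under stated assumptions. The abstract does not say which cubic spline is used, so this sketch substitutes a Catmull-Rom spline (a cubic that passes through its control points), takes the control points as given rather than detecting them, and samples pixels by nearest neighbor. The function names and parameters are hypothetical.

```python
import math


def catmull_rom(p0, p1, p2, p3, t):
    """Point and tangent on the cubic segment from p1 to p2, t in [0, 1]."""
    point, tangent = [], []
    for a, b, c, d in zip(p0, p1, p2, p3):
        c1 = 0.5 * (c - a)
        c2 = a - 2.5 * b + 2.0 * c - 0.5 * d
        c3 = 0.5 * (d - a) + 1.5 * (b - c)
        point.append(b + c1 * t + c2 * t ** 2 + c3 * t ** 3)
        tangent.append(c1 + 2 * c2 * t + 3 * c3 * t ** 2)
    return point, tangent


def straighten(image, control_points, half_width, samples_per_segment=10):
    """Resample a 2D image (list of rows) along lines normal to the backbone.

    control_points are (x, y) pairs; each output row is one cutting line.
    """
    # Duplicate the end points so every control point lies on a segment.
    pts = [control_points[0]] + list(control_points) + [control_points[-1]]
    height, width = len(image), len(image[0])
    rows = []
    for s in range(len(pts) - 3):
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            (x, y), (tx, ty) = catmull_rom(*pts[s:s + 4], t)
            norm = math.hypot(tx, ty) or 1.0
            nx, ny = -ty / norm, tx / norm  # unit normal to the backbone
            row = []
            for off in range(-half_width, half_width + 1):
                col = min(max(int(round(x + off * nx)), 0), width - 1)
                r = min(max(int(round(y + off * ny)), 0), height - 1)
                row.append(image[r][col])
            rows.append(row)
    return rows
```

In 3D, each cutting line would become a cutting plane spanned by two vectors orthogonal to the backbone tangent; the 2D case is shown only because it is shorter.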
Prior Publications
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
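The coverage figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers given in the text, so it reproduces the quoted values approximately; the variable names are illustrative.

```python
reads = 27_271_853        # high-quality sequence reads
sequenced_bp = 14.8e9     # total bases in those reads (rounded in the text)
consensus_bp = 2.91e9     # size of the consensus sequence (rounded)

# Average read length implied by the totals: roughly 543 bp.
mean_read_length = sequenced_bp / reads

# Fold coverage = bases sequenced / genome size. The rounded inputs give
# about 5.09, close to the 5.11-fold figure quoted in the abstract.
celera_coverage = sequenced_bp / consensus_bp

# Adding the 2.9-fold shredded public data to the quoted 5.11-fold
# coverage gives about 8, matching the "eightfold" effective coverage.
effective_coverage = 5.11 + 2.9
```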
We report on the quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it. Three independent external data sources essentially agree with and support the assembly's sequence and ordering of contigs across the euchromatic portion of the genome. In addition, there are isolated contigs that we believe represent nonrepetitive pockets within the heterochromatin of the centromeres. Comparison with a previously sequenced 2.9-megabase region indicates that sequencing accuracy within nonrepetitive segments is greater than 99.99% without manual curation. As such, this initial reconstruction of the Drosophila sequence should be of substantial value to the scientific community.
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.
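The overcompression mentioned in this abstract is easy to reproduce with a toy example. The sketch below implements only the traditional shortest-superstring objective, using a simple greedy merge heuristic; it is not the maximum-likelihood formulation or the graph reductions the paper proposes, and the function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = sorted(set(fragments))
    # Drop fragments that are wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    if not frags:
        return ""
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags[0]


# Two fragments sampled from the repetitive target GATATATC.
target = "GATATATC"
assembly = greedy_superstring(["GATAT", "ATATC"])
```

Here the true overlap between the fragments is two bases, but the repeat lets them overlap by four, so the assembly comes out as `GATATC`, two bases shorter than the target. Minimizing length alone collapses a copy of the repeat.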
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
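The maximal segment pair (MSP) score can be made concrete with a brute-force sketch. It scans every diagonal of the comparison and keeps the highest-scoring ungapped run of aligned positions; BLAST itself approximates this quantity far faster, and its heuristics are not shown here. The match and mismatch scores are assumed values, and `maximal_segment_pair` is a hypothetical name.

```python
def maximal_segment_pair(a, b, match=5, mismatch=-4):
    """Return (score, start_a, start_b, length) of the best ungapped
    pair of equal-length segments, found by exhaustive search."""
    best = (0, 0, 0, 0)
    # Diagonal d pairs a[i] with b[j] wherever i - j == d.
    for d in range(-(len(b) - 1), len(a)):
        i, j = max(d, 0), max(-d, 0)
        run_score, run_start = 0, (i, j)
        while i < len(a) and j < len(b):
            if run_score <= 0:
                # A non-positive prefix never helps, so restart the run here.
                run_score, run_start = 0, (i, j)
            run_score += match if a[i] == b[j] else mismatch
            if run_score > best[0]:
                best = (run_score, run_start[0], run_start[1],
                        i - run_start[0] + 1)
            i += 1
            j += 1
    return best
```

For example, `maximal_segment_pair("TTGACCGTAAGG", "CCACCGTAATT")` finds the shared seven-base segment `ACCGTAA`. The exhaustive scan takes time proportional to the product of the sequence lengths, which is the cost a database search tool has to avoid.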






