Regulation of Gene Expression, Cellular Proliferation, and Differentiation in Male Germ Cell Development
Understanding the functional significance of protein and regulatory networks requires a complete description of the cellular transcriptome. Following the sequencing of the genomes of several mammalian species, the identification of functional genes within a network is based on in silico predictions substantiated by evidence of transcription in vivo. Conservative estimates suggest that there are about 20,000 protein-encoding genes in the mammalian genome. However, in-depth analyses of the transcriptional outputs from a range of experimental approaches suggest that the information content of the genome is much more complex than previously perceived. We used developing male germ cells as a model system to study key developmental processes that lead to the transition from one cell type to another. To explore the transcriptome complexity of male germ cells, we previously applied Serial Analysis of Gene Expression (SAGE) to profile the expression signatures of the major stages of spermatogenesis, including type A spermatogonia (Spga), pachytene spermatocytes (Spcy) and round spermatids (Sptd). This study identified a large number of novel transcripts, alternatively spliced transcript variants and antisense transcripts.
To provide an unbiased and higher-resolution profile of the transcriptome, we expanded our SAGE findings by incorporating whole-genome 25 bp–resolution tiling expression arrays (Affymetrix®). With 45 million oligonucleotide probes and 35 bp probe spacing, we generated a high-definition, germ cell–specific whole-genome expression map of developing male germ cells. Preliminary data indicated that combining the SAGE data set with the tiling platform is powerful and provides new insights into the germ cell transcriptome. In summary, we found that more than 45% of transcripts were not annotated; current annotation accounts for only about 30% of our data set, and the remainder consists mostly of expressed sequence tags (ESTs). We identified thousands of transcript cluster units located within introns and internal exons of protein-coding genes, indicating that promoter sites are common and that transcriptional organization is complex. This transcriptional architecture implies that most genomic regions serve multiple functions. Although a large proportion of human transcription occurs outside the boundaries of known genes, the functional significance of such transcription remains unknown.
To facilitate analysis of the high-throughput genomic data generated in our laboratory, we developed custom bioinformatics tools. GermSAGE is a comprehensive web-based database generated by Serial Analysis of Gene Expression (SAGE) of the major stages of mouse male germ cell development, with a sequence tag coverage of approximately 150,000 in each SAGE library. A total of 452,095 tags derived from Spga, Spcy and round Sptd were included. GermSAGE provides web-based tools for browsing, comparing and searching male germ cell transcriptome data at different stages with customizable search parameters. The data can be viewed in tabular format or analyzed further by alignment with the various annotations available in the UCSC genome browser. This flexible platform will be useful for gaining a better understanding of the genetic networks that regulate spermatogonial cell renewal and differentiation, and will allow novel gene discovery. GermSAGE is freely available at http://germsage.nichd.nih.gov/.
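As a rough illustration of the kind of stage-to-stage comparison described above, the following Python sketch scales raw tag counts to tags per million and ranks tags by fold change between two libraries. It is not the GermSAGE implementation: the function names, the tag sequences, the counts and the pseudocount of 1.0 are all hypothetical and chosen only to make the example run.

```python
from typing import Dict, List, Tuple


def tags_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw SAGE tag counts so libraries of different depth are comparable."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def rank_by_fold_change(
    lib_a: Dict[str, int],
    lib_b: Dict[str, int],
    pseudocount: float = 1.0,  # assumption: avoids division by zero for absent tags
) -> List[Tuple[str, float]]:
    """Return (tag, ratio of lib_b to lib_a) pairs, largest ratio first."""
    a = tags_per_million(lib_a)
    b = tags_per_million(lib_b)
    ratios = [
        (tag, (b.get(tag, 0.0) + pseudocount) / (a.get(tag, 0.0) + pseudocount))
        for tag in set(a) | set(b)
    ]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


# Hypothetical toy libraries; real libraries hold on the order of 150,000 tags.
spga_counts = {"AAAAAAAAAA": 50, "CCCCCCCCCC": 50}
sptd_counts = {"AAAAAAAAAA": 10, "GGGGGGGGGG": 90}

ranked = rank_by_fold_change(spga_counts, sptd_counts)
for tag, ratio in ranked:
    print(f"{tag}\t{ratio:.3g}")
```

Tags found only in the second library rise to the top of the ranking and tags found only in the first fall to the bottom. A real analysis would also need a statistical test for count data, which this sketch omits.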
We are also developing an online tool called TransfragMap to map tiling array data to various gene annotations and to mark particular gene features. This platform-independent tool will accelerate qualitative analysis of tiling microarray data. It is compatible with tiling microarrays from the major platforms, including Affymetrix and Nimblegen. The initial version will support the human and mouse genomes, and support will later be expanded to other species.
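The internals of TransfragMap are not described here, so the following Python sketch only illustrates the general idea of mapping a transcribed fragment (transfrag) onto a gene annotation. It assumes a single chromosome and strand, half-open coordinates, and a hypothetical gene model; the class and function names are illustrative and do not come from the tool.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


@dataclass
class Gene:
    name: str
    start: int
    end: int
    exons: List[Interval]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def classify_transfrag(start: int, end: int, genes: List[Gene]) -> str:
    """Label a transfrag as 'exonic', 'intronic' or 'intergenic'."""
    inside_gene = False
    for gene in genes:
        if not overlaps(start, end, gene.start, gene.end):
            continue
        inside_gene = True
        if any(overlaps(start, end, ex_start, ex_end) for ex_start, ex_end in gene.exons):
            return "exonic"
    return "intronic" if inside_gene else "intergenic"


# Hypothetical three-exon gene used only for illustration.
genes = [Gene("hypothetical_gene", 1000, 5000, [(1000, 1200), (3000, 3300), (4800, 5000)])]

for start, end in [(1100, 1150), (2000, 2100), (6000, 6100)]:
    print(start, end, classify_transfrag(start, end, genes))
```

A linear scan over all genes is adequate for a toy example; for whole-genome tiling data an interval tree or a sorted sweep would be the more usual choice.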
Previous work has shown that GDNF-receptor-α-1 (GFRA1) is specifically expressed in spermatogonial stem cells (SSCs) and is required for their stem-cell properties. To characterize the molecular phenotype of SSCs, GFRA1(+) and GFRA1(-) spermatogonia were isolated from 6-day-old mice using magnetic-activated cell sorting with an antibody to GFRA1, and their microarray-based expression profiles were compared. A number of genes were found to be up-regulated in GFRA1(+) spermatogonia. The most highly overexpressed was Csf1r, which encodes the receptor for colony stimulating factor 1 (CSF1, also known as macrophage colony-stimulating factor), a factor with a well-established role in hematopoietic stem cell function. A number of chemokine ligands were also highly overexpressed in SSCs. The analysis revealed a potential role for chemokine signaling in SSCs and suggested a common pathway for GFRA1 and Csf1r that may contribute to SSC self-renewal. This is a collaborative study with Dr. Martin Dym, Georgetown University.
Analyses of antisense transcripts suggested the existence of RNA-dependent RNA polymerase (RdRP) activity in mouse germ cells. Antisense transcripts complementary to multiple coding exons were identified for Tcte3, Ldh3, and Calm2. We focused our studies on the Calm2 antisense transcript, which was present in mouse testis and in 3 mouse cell lines, namely CRL-2576 (mouse spermatogonia cell line), CRL-1715 (mouse Sertoli cell line) and CRL-6436 (mouse kidney cell line). A knockdown experiment confirmed that the antisense transcript is a product of the sense transcript: knocking down the Calm2 sense transcript with siRNA reduced the levels of both sense and antisense transcripts, indicating that synthesis of the Calm2 antisense transcript depends on the sense transcript. The Calm2 antisense transcript was not synthesized starting from the 3’ end of the sense mRNA. The sequence representing the potential start site of RdRP action was defined. A hybrid RNA containing this sequence ligated to EGFP at its 5’ end was generated and introduced into CRL-6436 cells. Orientation-specific RT-PCR showed production of an antisense RNA derived from the hybrid RNA. The level of the Calm2 antisense transcript appeared to be independent of the growth stage of cultured cells, and the transcript was present in all mouse tissues examined (testis, ovary, liver, lung, kidney, spleen, thymus, heart, brain and embryo). These results provide further evidence for the existence of RdRP activity in mammalian cells. Experiments to isolate and purify the RdRP activity are underway.
Through algorithmic analysis of mouse tiling array signals from the 3 main cell stages of spermatogenesis against reference sequences, a pool of stage-specific genes was identified. Transmembrane and coiled-coil domain 5A (Tmco5A) and 4930563P21Rik (Tmco5B) were two of the novel genes identified; both are expressed exclusively in mouse testis during the post-meiotic stage of male germ cell development. Both genes have a single CAGE tag in testis. Comparative genomics showed that the 2 genes are conserved in different species, including human. Evidence from protein structure prediction, gene mapping, sequence alignment and expression profiles suggested that Tmco5A and Tmco5B very likely belong to a single gene family with a specific role in male post-meiotic development or sperm function. Our studies are aimed at characterizing the two genes in mouse testis.
Unlike most cancers, which peak in old age, testicular germ cell tumors (TGCTs) are common in young males. Genomic mutations may be one of the causes of familial TGCT, while accumulating information suggests that aberrant epigenetic changes may contribute to tumorigenesis in different cancers, including TGCT. DNA methylation is an epigenetic hallmark that affects chromatin structure, genomic stability and transcriptional activity. We compared global DNA methylation between normal and testicular tumor cells using methylated DNA immunoprecipitation and tiling array hybridization (MeDIP-chip). A high-resolution cytosine methylation map of human germ cell cancer was obtained, and more than 6,000 differentially methylated regions (DMRs) between normal and TGCT cells were identified. More than 70% of DMRs resided in intergenic regions. Promoter methylation accounted for 9%. About 27% of the corresponding genes demonstrated the generally expected relationship between promoter methylation and gene expression, i.e. hypermethylation associated with suppression of gene expression.
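The relationship between promoter methylation and expression described above can be summarized as a simple concordance fraction. The Python sketch below is one hypothetical way to compute it. The gene names and values are invented, positive numbers mean "higher in tumor than in normal cells", and counting hypomethylation with increased expression as concordant is an assumption added here for symmetry rather than something stated in the text.

```python
from typing import Iterable, Tuple

# (gene, promoter methylation change, expression change); positive = higher in tumor
Record = Tuple[str, float, float]


def inverse_relationship_fraction(records: Iterable[Record]) -> float:
    """Fraction of genes whose methylation and expression change in opposite directions."""
    rows = list(records)
    if not rows:
        raise ValueError("no genes supplied")
    concordant = sum(1 for _, meth, expr in rows if meth * expr < 0)
    return concordant / len(rows)


# Hypothetical records, not data from this study.
example_records = [
    ("geneA", 1.5, -2.0),   # hypermethylated, repressed: counted
    ("geneB", 0.8, 0.5),    # hypermethylated, induced: not counted
    ("geneC", -1.0, 1.2),   # hypomethylated, induced: counted (assumption)
    ("geneD", 1.1, 0.0),    # no expression change: not counted
]

fraction = inverse_relationship_fraction(example_records)
print(f"{fraction:.0%} of genes show the inverse relationship")
```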
A focused analysis of DMRs located in the regulatory regions of annotated genes yielded 207 differentially methylated genes. We selected 3 candidate genes for further characterization in primary tumor tissues: Apolipoprotein L domain containing 1 (APOLD1), Retrotransposon gag domain containing 1 (RGAG1) and Protocadherin 10 (PCDH10). The open reading frame of APOLD1 encodes an apolipoprotein-L domain-containing protein whose function is unknown. Remarkably, APOLD1 is located in 12p13.1, a TGCT susceptibility locus identified previously by genetic linkage analysis. RGAG1 is an X-linked retrotransposon-derived neogene of unknown function. Expressed sequence tags (ESTs) of RGAG1 were found predominantly in testis, suggesting that this retrogene might be important in male germ cell development. PCDH10 encodes a membrane protein involved in cell adhesion. It has been implicated as a tumor suppressor gene in nasopharyngeal, esophageal, breast, colorectal, cervical, lung, and hepatocellular carcinomas. The differential methylation of these three genes observed in cultured tumor cells was confirmed in primary testicular tumor tissues by bisulfite sequencing and methylation-sensitive PCR. Differential expression of these genes in tumor and control tissues was confirmed by RT-PCR. Thus, cultured cells could be used as a model for studying the mechanism of altered methylation in tumors, and these genes may serve as novel non-invasive epigenetic markers for the molecular diagnosis of TGCT.
The role of the intergenic DMRs is not clear. They could simply be a consequence of inappropriate epigenetic establishment during primordial germ cell (PGC) differentiation, or they might have a hidden regulatory role, such as the maintenance of genomic stability or chromatin condensation. Another possible function of non-genic DMRs is the regulation of noncoding RNAs. We mapped the non-genic DMRs to a current noncoding RNA database and found that 3 miRNAs (miR-199a-2, miR-124a-2 and miR-184) were hypermethylated in TGCT. Notably, miR-124a was first identified in embryonic stem cells, suggesting that this miRNA may be important in differentiation. This miRNA has also been reported to be epigenetically silenced in colon cancer, with consequent activation of the oncogene CDK6. Furthermore, we found that 3 snoRNAs (hb11-240, aca22 and aca8) were hypomethylated. Epigenetic changes in noncoding RNAs might deregulate genetic networks across a wider spectrum, as a single miRNA is capable of regulating numerous target genes.