Although cDNA sequence information is indispensable for analyzing gene function, most of the cDNA sequences stored in
current databases are incomplete in that they lack precise information about their 5' termini. To overcome this
difficulty, a team at the Human Genome Center, Institute of Medical Science, University of Tokyo developed an oligo-capping
method to obtain full-length cDNAs, and part of the resulting data has been deposited in public databases. In this
study, they further constructed human cDNA libraries enriched in clones containing the cap structure to systematically
explore the 5' end structure of expressed genes. Of the 284,687 5' end sequences obtained, 155,304 corresponded to
cDNA sequences of 8,996 known genes; these are presented in the DataBase of Transcriptional Start Sites (DBTSS).
Sequence comparison between the DBTSS entries and those of the reference sequence database, RefSeq, revealed that
4,802 (34.2%) of the RefSeq sequences should be extended toward their 5' ends. The team also mapped each sequence onto
the human draft genome sequence to identify its transcriptional start site, which provided more detailed information
on the distribution patterns of transcriptional start sites and their adjacent regulatory regions.
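The RefSeq comparison and the start-site mapping can be illustrated with a minimal Python sketch. It assumes that each oligo-capped clone's 5' end has already been aligned to the draft genome and assigned to a RefSeq gene, and that coordinates are 1-based. The record type `Mapped5End` and the helper functions are hypothetical names for illustration, not the DBTSS team's actual pipeline. The extension criterion used here (flag a RefSeq entry if at least one clone's 5' end maps upstream of it, taking strand into account) is likewise an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Mapped5End:
    """Hypothetical record: genome position of one clone's 5' end."""
    chrom: str
    strand: str  # "+" or "-"
    pos: int     # assumed 1-based coordinate of the first transcribed base


def upstream_offset(tss: int, ref_start: int, strand: str) -> int:
    """Bases by which `tss` lies upstream of `ref_start` (negative if downstream).

    On the "+" strand upstream means a smaller coordinate; on "-" a larger one.
    """
    return ref_start - tss if strand == "+" else tss - ref_start


def needs_5prime_extension(ref_start: int, strand: str,
                           clone_ends: Iterable[Mapped5End]) -> bool:
    """Assumed criterion: True if any clone's 5' end maps upstream of the RefSeq 5' end.

    Clones are assumed to be pre-filtered to the same gene, chromosome, and strand.
    """
    return any(upstream_offset(e.pos, ref_start, strand) > 0 for e in clone_ends)


def tss_distribution(clone_ends: Iterable[Mapped5End]) -> Counter:
    """Count how many clones start at each genomic position."""
    return Counter(e.pos for e in clone_ends)


if __name__ == "__main__":
    # Hypothetical clones for one plus-strand gene whose RefSeq entry starts at 1003.
    clones = [Mapped5End("chr1", "+", 1000),
              Mapped5End("chr1", "+", 1000),
              Mapped5End("chr1", "+", 1005)]
    print(needs_5prime_extension(1003, "+", clones))  # True: 1000 lies upstream of 1003
    dist = tss_distribution(clones)
    print(dist.most_common(1)[0][0])                  # 1000, the most frequent start site
```

Counting clones per position, rather than keeping only the most upstream one, keeps the spread of start sites visible, which is the kind of distribution pattern the mapping step is meant to reveal.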