GExplore User Guide

Welcome to GExplore. The help pages are arranged to help a new user navigate this website. Experienced users can jump directly to the FAQ section.

  • The introduction part provides background information.
  • The Databases section explains the content of the GExplore databases.
  • The Search fields section explains what is behind the search fields, some of which allow complex searches.
  • When you hit the 'search' button you will be presented with a result page that allows you to further explore the data related to genes that fulfilled your search conditions. Those are covered under the 'Display option' section.
  • The 'Tools' part contains two simple features that can help you to continue with the genes you found outside of GExplore.
  • The FAQ section might address some questions that come up, when you use the website.

Last updated on Oct 14, 2024

Introduction

Overview

GExplore is a tool designed to offer quick access to data related to gene or protein function in C. elegans. The interface is simple, and response times are fast to encourage exploratory searches with a large number of genes.

The database contains selected genome-wide data sets, including protein domain organization, gene expression, and phenotype data, which are important indicators for gene and protein function. Data sets were obtained from WormBase and selected original research papers and processed to integrate data and to simplify searches.

This site should be useful for experimental planning of genome-scale experiments quick and survey-type queries. With this interface, you should be able to get quick answers to questions such as:

  • How many small secreted proteins are there?
  • Which putative cell surface receptors are expressed in neurons?
  • How many kinases expressed in muscle cells have putative null alleles available?

Users can also enter their own lists of genes (up to several thousand) and extract subets of genes with certain characteristics.

What's new?

The latest release of GExplore (version 1.5, October 2024) is a major update including updates to the gene, mutation and protein databases as well as novel data sets and features. The web interface was completely redesigned

Database update
  • Gene and mutation databases were updated to WormBase release WS294.
  • The Gene database now contains disease associations, interacting genes and expression data from 'genomic studies' for text searches.
  • Display functions for the mutation database were expanded to show the breakpoints of deletions and whether the deletion causes a frameshift or not.
  • The protein database now contains proteome and ortholog data from 19 nematodes species and five 'model organsims'.
  • GExplore now hosts three genome-wide published RNA-seq data sets:
    • stages: expression profiles from 18 different embryonic and larval stages; see Boeck, M et al., (2016).
    • tissues: expression profiles for 27 tissues and cell types at the L2 larval stage; see Cao et al. (2017).
    • embryo: expression profiles for seven embryonic cell populations from gastrulation to terminal differentiation; see Warner et al. (2019).)

Website redesign

  • All webpages were redesigned to make the site more accessible for novel users
  • The help pages were reorganized and expanded.
  • The search pages have short help text to show what the purpose of the various search fields.
  • The result pages have a side bar with sections for search results data, display and download options as well as a help section.

Latest news - March 2025

A new RNAseq database was added. It contains data from several hundred embryonic cells at different stages in C. elegans and C. briggsae.

Last updated on March 16, 2025

Databases

GExplore consists of six different databases, each with its own search page.

Genes

The gene database contains curated datasets relevant to predict the putative function(s) of genes and proteins. These datasets are:

  • Protein domain organization
  • General gene description
  • Phenotype and expression data
  • Gene interaction data
  • Gene ontology (GO) terms
  • Disease associations

In addition the database contains the genetic/nucleotide position of the genes allowing the user to select/identify all genes in a particular region of a chromosome. All data have been sourced from WormBase. The current version is based on WormBase release 294 (October 2024).

The search interface allows users to quickly identify groups of genes that share specific features, such as genes associated with Parkinson's Disease that are expressed in dopaminergic neurons. Alternatively, users can input large lists (hundreds or thousands) of candidate genes and filter them based on certain characteristics.

The database contains all splice variants of a gene. However, gene-specific data such as description, expression, phenotype etc are only associated with the 'longest splice variant'. The other splice variants will have "N/A" in the corresponding columns. This means the database does not contains any splice-variant specific data. This includes the 'nucleotide position'.

The system’s fast response time and flexible output options make it easy to perform efficient, survey-type queries with large groups of genes in both the input and output.

Mutations

Mutation data were downloaded from WormBase release 294 (October 2024). The dataset includes only exonic alleles and alleles affecting splice junctions, while SNPs from wildtype isolates have been excluded. This curated list of mutations is enriched for alleles likely to affect protein function.

The location of the mutations is mapped onto the domain organization of the protein, providing a visual display of both the location and nature of the mutations. For deletions, a color code is used to distinguish between in-frame deletions (purple) and frameshift deletions (red). The deletion endpoints are indicated as being located either 'outside' the gene, in an 'intron', or in an 'exon'. Display options also include the ability to show the protein sequence with the affected amino acids highlighted. Mutations are mapped to all known splice variants of the gene, enabling easy identification of alleles specific for a particular splice variant.

The WS294 version of the database contains:

  • 227,186 alleles in 19,661 genes (28,244 splice variants)
  • 13,006 deletions
  • 183,553 missense alleles
  • 10,015 nonsense alleles
  • An additional 9,042 alleles without molecular information.
    These are mostly older alleles where the molecular information was never entered in WormBase.
Proteins

The protein database contains data from 13 Caenorhabditis species (C. becei, C. brenneri, C. briggsae, C. elegans, C. japonica, C. parvicauda, C. quiockensis, C. remanei, C. sulstoni, C. uteleia, C. tropicalis, C. waitukubuli, C. zanzibari), representing all branches of the Caenorhabditis phylogenetic tree.

In addition, the database contains proteome data from 7 other nematode species (B. malayi, O. tipulae, O. volvulus, P. pacificus, P. redivivus, S. ratti, T. muris) that have high-quality data available. For comparison, the database also includes the proteomes of major model organisms (S. cerevisiae, D. melanogaster, D. rerio, M. musculus, H. sapiens).

Protein data were downloaded either from WormBase (for nematode species) or from the reference proteome site for model organisms. SMART was used for protein domain prediction.

To facilitate comparisons across species, the database also contains information about homologs in all species included in the database. The search interface allows users to select proteins with specific protein domains or domain arrangements for a chosen species. The output can be limited to proteins with homologs in one or more other species, enabling quick identification of either nematode-specific genes or evolutionarily conserved genes of a certain type (e.g. kinases).

Expressions
Expression (Stages)

The expression data for the different stages of the life cycle of C. elegans were obtained as part of the NHGRI modENCODE project, in collaboration with the Waterston (Max Boeck, Chau Huynh, LaDeana Hillier, University of Washington), Reinke (Guilin Wang, Dionna Kasper, Yale University), and Miller (Clay Spencer, Vanderbilt University) labs ( Hillier et al. (2009), Gerstein et al. (2010), Gerstein et al. (2014), Boeck et al. (2016)).

Briefly, 18 samples were analyzed using RNA-seq. The results provided here represent a subset of that data, derived from synchronized whole animals at embryonic and post-embryonic stages. For post-embryonic stages, two or more biological replicates were obtained, and weighted averages are provided. For embryonic stages, four independent time series were collected, with samples taken every 30 minutes (though some samples were lost due to technical failures). For detailed data collection and analysis, see Boeck et al. (2016).

Expression for each gene is provided in dcpm units (depth of coverage per million 35 base reads)1. This measure offers base-pair resolution, which simplifies the analysis of shorter features such as exons and splice junctions. A dcpm value of 1.5 represents an average expression level. Values between 0.003 and 0.1 generally indicate significant expression. To approximate rpkm values, use the following equation: rpkm = dcpm * 1000 / 35.

Expression (Tissues)

The expression data for cells and tissues at the L2 larval stage were generated using Single-cell Combinatorial Indexing RNA-seq (sci-RNA-seq). For details on data collection and analysis, see Cao et al. (2017). In brief, the expression profiles of nearly 50,000 cells from the L2 stage were determined using sci-RNA-seq. From this dataset, consensus expression profiles for 27 cell types were defined.

The profiles available through GExplore correspond to the data in Tables S2 to S4 of Cao et al. (2017), while images of the expression profiles align with Figure S14. Expression levels for each gene are presented in transcripts per million (TPM).

Enrichment data were pre-calculated and are presented as the "ratio" of a gene's expression in the most highly expressing cell type compared to the second-most highly expressing cell type. "qval" represents the false detection rate (FDR) at which the gene is considered differentially expressed between the highest and second-highest expressing cell types.

For reference, in Cao et al. (2017), a gene is labeled as "cell type enriched" if the ratio >= 5 and q-val < 0.05.

Expression (Embryo)

The expression data for embryonic tissues were generated using RNA-seq. For detailed information on data collection and analysis, see Warner et al. (2019). In brief, synchronized populations of embryos expressing tissue-specific fluorescent markers were sampled at 90-minute intervals over five time points, starting 120 minutes after embryo isolation (about halfway through gastrulation, referred to as t0).

The final time point (480 minutes after isolation, t4) corresponds to the early 3-fold stage, when cells begin terminal differentiation, but before cuticle formation. Time series data were collected for five different tissues/organs—hypodermis, intestine, pharynx, muscle, and neurons—as well as for two lineage-specific populations (ABa and ABala).

Expression levels are reported in transcripts per million (TPM).

Expression (species) - coming soon

Synchronized embryos, spanning the 28-cell stage to terminal differentiation were dissociated into single cells and sequenced for their RNA content using the 10X Genomics platform. The dataset contains >200,000 C. elegans and >190,000 C. briggsae cells that were labeled for their cellular identity using known molecular markers. The data presented here are summary expression values in Transcripts Per Million (TPM) for each cell type annotated in the dataset. Terminally differentiated cell types were further subdivided into time bins using an estimated ‘embryo time’ value that attempts to approximate when in development the single cell originates from. Single-cell RNA-sequencing allows for the capture of expression for individual cells across an animal. However, due to methodological inefficiencies, the expression values are frequently ‘zero-inflated’, meaning that expression values are often zero for a given gene in a single cell. Using many individual cells of a cell type, we can estimate the average expression value in a summary metric such as the transcripts per million (TPM). However, genes with expression on the lower end of the spectrum tend to still be inherently noisy. We caution on the over interpretation of TPM values below 80 for most cell types in these data.

Using the strategy of correlating our single-cell expression values with the bulk embryo time course results in poor estimations for the germline, as these cells frequently correlate with early embryonic expression (<28-cell stage) greater than any other timepoint. To circumvent this issue, we use a method called pseudo time estimation from the Monocle3 package (https://www.nature.com/articles/nbt.2859). It tries to order cells along a developmental trajectory. We subdivided the germline into three pseudo time bins, ‘early’, ‘middle’ and ‘late’.

The database contains 615 data points (different cell types and/or different time points). Cells are grouped in the following way.

  • ABp lineage (96 cells)
  • ABal lineage (84 cells)
  • ABar lineage (41 cells)
  • MS lineage (53 cells)
  • E lineage (2 cells)
  • C lineage (22 cells)
  • D lineage (4 cells)
  • Ciliated neurons (20 cell types, 2-3 timepoints)
  • Non-Ciliated neurons (47 cell types, 1-3 timepoints)
  • Glia and excretory cells (10 cell types, 1-3 timepoints)
  • Muscle cells (8 cell types, 3 timepoints)
  • Mesoderm (8 cell types, 1-4 timepoints)
  • Hypodermis and seam (9 cell types, 1-4 timepoints)
  • Intestinal and rectal cells (6 cell types, 3-4 timepoints)
  • Pharyngeal cells (16 cell types, 1-3 timepoints)
  • Germline cells (3 timepoints)

The following color code is used in the expression graphs.

GExplore Logo

OrthoFinder2 was used to systematically group homologous C. elegans and C. briggsae genes into ‘orthogroups’ by reciprocal BLAST searches. On the display page the user has the option to display all members of the orthogroup(s) for a direct comparison of the expression of homologous genes in the two species.

Last updated on March 16, 2025

Search Fields

Overview

You only need to enter search terms in one of the search fields. If you enter search terms in multiple fields, they will be combined for an 'and' search (the result will contain only genes that fulfill all search criteria). You can hit the 'back' button on your broswer to go back to the search page to modify a search. When you start a completely new search, make sure that you clear the search fields. Any 'left-over' entries from previous search typically result in '0 genes found', since entires in all search fields are combined for an 'and' search.

What to expect?

Once the search is executed, the results will be displayed in table format. On the result page, you can:

  • Select additional data to display.
  • Remove genes from the result table to finetune the output
  • Export the table as text or csv file.
Gene Search Field

The gene search field accepts the following identifiers (or any combination):

  • WormBase gene ID: WBGene00003738
  • Gene sequence name: F54F3.1
  • Locus name: nid-1
  • Protein sequence name: F54F3.1a

Names can be separated by commas, semicolons, space or newline characters. You can enter (or copy) a list of thousands of gene names for a single search. * can be used as a wildcard character, representing one or more characters of any kind. For example, unc-* means all gene names beginning with unc-.

Protein Domain Search Fields
Domain

Enter one or more of the domain abbreviations from the table of domain abbreviations (example: EGF, KIN) or SMART/Pfam identifiers (example: SM00273, PF12423). Domain abbreviations are case-sensitive! This field features autosuggest for domain abbreviations. Search terms are not limited to the suggested terms, i.e. you can ignore them. You can use * as wildcard character, e.g. 7TM* will find all domains beginning with 7TM* (7TM_GPCR_Sra, 7TM_GPCR_Srab, 7TM_GPCR_Srb, etc)

Domain Pattern

Use this field to search for proteins with certain domain combinations. Think of a protein as a linear sequence of domains like IG IG IG FN3 FN3 TM 324. You can use * as wildcard character representing 0 or more domains of any kind and numbers (but no Boolean terms) in the following way:

Pattern Input Description
IG IG IG 3 IGs in a row with no other domain in between.
IG*FN3 The IG domain is eventually followed by an FN3 domain.
3IG Expanded to IG IG IG: 3 IG domains in a row. Ensure there’s no space between the number and the domain abbreviation.
3IG% Represents IG*IG*IG: 3 IGs with other domains interspersed (% acts as a wildcard during expansion).
3-5IG Represents 3 to 5 IG domains: IG IG IG, IG IG IG IG, or IG IG IG IG IG.
<3IG FN3 Translated as IG FN3 or IG IG FN3: meaning at least 1 but no more than 2 IGs followed by FN3.
>3IG FN3 Translated as IG IG IG IG*FN3: meaning at least 4 IGs followed by FN3.
Text Search Fields
General rules

The text search fields on the "Genes" search page are: Description, Phenotype, Expression, Gene Ontology, and Disease Association. All except for the 'description' text field use WormBase-defined vocabularies. This means the database only contains terms that are part of the corresponding vocabulary. To help the user identify, which terms can be used, these search fields offer autosuggest functionality. However, input is not limited to suggested terms. In any text search field you can use any term, e.g. searching for muscle will find any term containing muscle such as muscle_cell or vulval_muscle when used in the expression search field and spicule muscles of the male when used in the description text field.

Text search fields will return any match to the search term(s). You don't need to use * as wildcard character. You can ...

  • ... use comma-separated lists for an OR Search
  • ... use Boolean logic to combine search terms: ((axon and dendrite) or neuron) not muscle
Search Type Example Result Use Case
AND Search axon and dendrite Only results containing both "axon" and "dendrite". Use when you want to narrow down results to items that meet all criteria. This gives fewer, more specific results.
OR Search axon or dendrite Results containing either "axon", or "dendrite", or both. Use when you want to broaden the search to include items that match any of the criteria, increasing the number of results.
NOT Search neuron not muscle Results containing "neuron" but not "muscle". Use when you want to limit the search to exclude items.

On the output page, search terms for the text search fields will be highlighted for easier recognition.

Description Search Field

The Legacy Description is equivalent to the "Legacy manual gene description" in WormBase, while the Automated Description is the top-level description under the “Overview” tab in WormBase.

Wormbase Descriptions
Phenotype and Expression Search Fields

The Phenotype and Expression search fields use WormBase-defined vocabularies. These fields include an autosuggest function, which suggests search terms as you type. You are not required to use the suggested terms—you can search using any word, and GExplore will find matches containing your search word.

The Tissues and Stages (Genomic Studies) search field refers to data from single-cell RNAseq studies or enrichment analysis of high-throughput studies, where expression in specific tissues is compared to whole animals.

Gene Ontology and Disease Association Search Fields

These search fields also use predefined vocabularies and feature an autosuggest function to help find relevant terms.

Search Fields on the Mutation Search Page
Checkbox: "Do not show deletions affecting more than one gene"

Some deletions are large and affect multiple genes. If your goal is to identify mutations in a single gene, you can use this checkbox to exclude large deletions that affect more than one gene.

Mutation Type

The Mutation Type checkboxes allow you to limit the output to mutations that significantly impact the protein. Nonsense and Deletion mutation types give you more flexibility:

Nonsenses removing greater than x% and Deletions removing greater than x%:

Limits the output to nonsense allele/deletions that truncate the proteins by at least x%. This might be useful to identify putative null alleles by eliminating alleles that only affect the C-terminus of the protein.

Mutation Source (Million Mutation Project, KO Consortium, Other)

The Mutation Source field allows you to filter mutations based on their origin:

  • Million Mutation Project: A large number of mutations were identified by random mutagensis followed by whole-genome sequencing, see Thompson et al. (2013).
  • KO Consortium: The KO Consortium (Moerman and Barstead (2008)) provides putative null alleles.
  • Other Sources: Includes mutations from individual research projects by the C. elegans community.
Search Fields on the Expression Search Pages (Stages, Tissues, Embryos, Species)

Each of the four pages has unique search fields based on the underlying expression database. Details can be found directly on the search pages.

Last updated on Oct 24, 2024

Display Options

General Comments

All result pages now have a consistent format, with a sidebar showing either a summary of the results or various display options or a help section. Data are presented in a flexible table format. Users can export the results as raw text TXT or in CSV format for further local processing.

The "List of proteins/genes found" section contains the list of genes or proteins that meet the search criteria. This list can be copied and used to compare different search results using the Compare tool (refer to the Tools section for details).

Most display options add an extra column to the results table with the relevant data. The names of these options are generally self-explanatory.

Protein Domain Organization

Protein domains, such as SMART or Pfam domains, are displayed as small rectangles containing a unique domain abbreviation. Larger meta-domains, like 7TM_chemorcpt_1, which typically span the entire protein, are represented as larger rectangles. Note that the size of the rectangle does not correspond to the actual size of the domain.

A table of abbreviations is used for selected domains that are found in multiple proteins. SMART/Pfam domains that are not listed in the abbreviation table are shown as gray rectangles with the SMART/Pfam identifier inside. These rectangles link to the corresponding SMART or Pfam entry.

Numbers within the rectangles represent the number of amino acids and indicate parts of the protein that are not assigned to any domain. To improve performance, images representing the domain organization of all proteins were pre-calculated for faster response times.

Gaps between domains that are smaller than 30 amino acids are suppressed, as they likely reflect artificial boundaries of the consensus sequences used to predict domains, which tend to be smaller than the actual domain. For example, an Immunoglobulin (IG) domain is approximately 125 amino acids in size, while the consensus sequence for the SMART IG domain (SM000409) is only 100 amino acids.

On the mutation results page, users can adjust the suppression of gaps to a smaller value for more precise output.

Special Display Options for Mutations

Show gaps > x amino acids: By default, gaps greater than 30 amino acids are suppressed (as explained above). If a more precise location of the mutation is needed, users can lower this threshold to show smaller gaps between domains.

Special Display Options for Expression Data (Stages, Tissues, Embryo)

Checkboxes: Each of the three expression data sets (stages, tissues, embryo) has a graphic display of expression data for each gene. To provide a complete overview of these data sets, each results page includes checkboxes that allow users to also display the graphs for:

  • Stage-specific expression graph
  • L2 tissue expression graph
  • Embryonic tissue expression graph

By selecting these options, users can easily access all three graphical representations, offering a comprehensive view of the gene expression across different stages, tissues, and embryonic development.

Special Display Options for Species Expression Data

Checkboxes: The checkbox 'Homologs (orthogroups)' allows the display of all homologs of the original gene set. Sort the 'Orthogroup' column to have the corresponding homologs in adjacent rows. Rows can also be rearranged by dragging them into a new position.

Last updated on March 23, 2025

Tools

Compare Tool

Compare provides an easy way to find unique and common genes within two sets of genes (two lists of terms of any kind). The input fields accept a list of terms such as the "list of genes/proteins" found on the results pages of GExplore. Names can be separated by commas, semicolons, space or newline characters. The output consists of three lists: the common genes and the unique genes from each of the two input sets.

Convert Tool

The convert input field accepts any combination of C. elegans gene identifiers:

  • Locus Name: nid-1
  • Gene Sequence Name: F54F3.1
  • WormBase Gene ID: WBGene00003738
  • Protein Sequence Name: F54F3.1a
  • Wormpep ID: CE18731
  • Uniprot ID: Q93791

Names can be separated by commas, semicolons, space or newline characters. The output consists of a table with all the identifiers for all the genes in the input field. This can be used to standardize identifiers that come from different sources or to convert one set of identifiers into another one.

Last updated on Oct 14, 2024

Table of Domain Abbreviations

Domains are depicted as small rectangles containing a unique domain abbreviation. Meta-domains like 7TM_chemorcpt_1, which typically cover the entire protein, are displayed as larger rectangles. The size of the rectangle does not reflect the actual domain size.

Abbreviation Display Pfam SMART Description
14_3_3
PF00244 SM00101 14_3_3 domain
3b-HSD
PF01073 3-beta hydroxysteroid dehydrogenase/isomerase family
7TM_GPCR_Sra
PF02117 Nematode chemoreceptor, Sra
7TM_GPCR_Srab
PF10292 Serpentine type 7TM GPCR receptor class Srab
7TM_GPCR_Srb
PF02175 Serpentine beta receptor
7TM_GPCR_Srbc
PF10316 Serpentine type 7TM GPCR chemoreceptor Srbc
7TM_GPCR_Sre
PF03125 C. elegans Sre G protein-coupled chemoreceptor
7TM_GPCR_Srg
PF02118 Serpentine gamma receptor, Caenorhabditis species
7TM_GPCR_Srh
PF10318 Serpentine type 7TM GPCR receptor class Srh
7TM_GPCR_Sri
PF10327 Serpentine type 7TM GPCR receptor class Sri
7TM_GPCR_Srj
PF10319 Serpentine type 7TM GPCR receptor class Srj
7TM_GPCR_Srr
PF03268 Domain of unknown function DUF267
7TM_GPCR_Srsx
PF10320 Serpentine type 7TM GPCR receptor class Srsx
7TM_GPCR_Srt
PF10321 Serpentine type 7TM GPCR receptor class Srt
7TM_GPCR_Sru
PF10322 Serpentine type 7TM GPCR receptor class Sru
7TM_GPCR_Srv
PF10323 Serpentine type 7TM GPCR receptor class Srv
7TM_GPCR_Srw
PF10324 Serpentine type 7TM GPCR receptor class Srw
7TM_GPCR_Srx
PF10328 Serpentine type 7TM GPCR receptor class Srx
7TM_GPCR_Srxa
PF03383 Serpentine type 7TM GPCR receptor class Srxa
7TM_GPCR_Srz
PF10325 Serpentine type 7TM GPCR receptor class Srz
7TM_GPCR_Str
PF10326 Serpentine type 7TM GPCR receptor class Str
AA_perm
PF00324 amino acid permease
Aa_trans
PF01490 amino acid transporter
AAA
PF00004 SM00382 AAA ATPase
ABC2_m
PF01061 ABC-2 type transporter
ABC_transporter
PF00005 ABC transporter related
ABC_transporter_TM
PF00664 PF06472 ABC transporter, transmembrane region, type 1
ABHyd
PF00561 alpha/beta hydrolase fold
AbHyd3
PF07859 alpha/beta hydrolase fold
ACBP
PF00887 Acyl CoA binding protein
AcCoA_dh
PF00441 Acyl-CoA dehydrogenase, C-terminal domain
actin
PF00022 SM00268 Actin/actin-like
Acy_dh2
PF08028 Acyl-CoA dehydrogenase, C-terminal domain
Acy_dh_M
PF02770 Acyl-CoA dehydrogenase, middle domain
Acy_dh_N
PF02771 Acyl-CoA dehydrogenase, N-terminal domain
AcylT
PF00583 AhpC/TSA family
Acyltransferase
PF01757 Acyltransferase 3
AD
PF01421 Peptidase M12B, ADAM/reprolysin
ADH_N
PF00107 Zinc-binding dehydrogenase
Ah_perox
PF03098 Animal haem peroxidase
AhpC
PF00578 AhpC/TSA family
AKR
PF00248 Aldo/keto reductase family
Aldedh
PF00171 Aldehyde dehydrogenase family
AMPB
PF00501 AMP-binding enzyme
ANFR
PF01094 receptor family ligand binding region
Ankyr
PF00023 SM00248 Ankyrin repeat
APH
PF01636 Phosphotransferase enzyme family
Apple
PF00024 SM00223 SM00473 apple
Arm
PF00514 SM00185 armadillo repeat
Arre_C
PF02752 SM01017 Arrestin (or S-antigen), C-terminal domain
Arre_N
PF00339 Arrestin (or S-antigen), N-terminal domain
Asp
PF00026 Aspartate protease
AT
PF01821 SM00104 Anaphylatoxin/fibulin
AT_12
PF00155 Aminotransferase class I and II
AT_5
PF00266 Aminotransferase class-V
AT_hook
PF02178 SM00384 DNA binding domain with preference for A/T rich regions
B4_1N
PF00373 SM00295 Band 4.1, N-terminal
BACK
PF07707 SM00875 BTB And C-terminal Kelch
Band_7
PF01145 SM00244 Band 7 protein (found in stomatin)
Bestrophin
PF01062 Bestrophin
bHLH
PF00010 SM00353 Basic helix-loop-helix region
BPI2
PF02886 SM00329 BPI/LBP/CETP C-terminal domain
Branch
PF02485 Core-2/I-Branching enzyme
BRCT
PF00533 SM00292 BRCT
Bromo
PF00439 SM00297 Bromodomain
BTB
PF00651 SM00225 BTB/POZ-like
bZIP
PF07716 SM00338 Basic-leucine zipper (bZIP) transcription factor
C1
PF00130 SM00109 Protein kinase C, phorbol ester/diacylglycerol binding
C2
PF00168 SM00239 C2 calcium-dependent membrane targeting
C6
PF01681 PF02795 SM01048 Protein of unknown function C6
C_AcyT
PF00755 Choline/Carnitine o-acyltransferase
CA
PF00028 SM00112 cadherin
Carbesterase_B
PF00135 Carboxylesterase, type B
Cation_efflux
PF01545 cation efflux family
CBM
PF01607 SM00494 Peritrophin-A domain is found in chitin binding proteins
CBS
SM00116 Domain in cystathionine beta-synthase and other proteins
CH
PF00307 SM00033 Calponin homology domain
Chit
SM00636 Chitinase II
CHK
SM00587 ZnF_C4 abd HLH domain containing kinases domain
Chromo
PF00385 SM00298 Chromo domain
CK
SM00041 Growth factor, cystine knot
CL
PF00059 SM00034 C-type lectin
Claudin3
PF06653 Tight junction protein, Claudin-like
CLC
PF07062 claudin homolog
cNMP
SM00100 Cyclic nucleotide-monophosphate binding domain
COL
PF01391 collagen
ColN
PF01484 SM01088 N-terminus of cuticular collagens
CRIB
PF00786 SM00285 PAK-box/P21-Rho-binding
CT
PF00100 SM00241 Endoglin/CD105 antigen
Ctr
PF04145 Ctr copper transporter family
CtX
PF02363 C_tripleX; cysteine rich repeat
CUB
PF00431 PF02408 SM00042 CUB domain
CW
PF08277 SM00605 CW domain (found in nematodes only so far ...)
CX
PF01705 CX domain (found in nematodes only so far ...)
CY
PF00031 SM00043 Proteinase inhibitor I25, cystatin
CYCL
PF00134 SM00385 domain present in cyclins, TFIIB and Retinoblastoma
CysPc
PF00648 SM00230 Calpain-like thiol protease family
Cyt-b5
PF00173 Cytochrome b5
Cytochrome_P450
PF00067 Cytochrome P450
DAO
PF01266 FAD dependent oxidoreductase
DB
PF01682 DB domain (found in nematodes only so far ...)
DC
SM00289 DC-domain; Cysteine-rich repeat
DEAD_N
SM00487 DEAD-like helicase, N-terminal
Death
PF00531 SM00005 Death domain
DH_SDR
PF00106 SM00822 Short-chain dehydrogenase/reductase SDR
DI
PF00200 SM00050 disintegrin domain
DLH
PF01738 Dienelactone hydrolase family
DM
PF00751 SM00301 Doublesex DNA-binding motif
DnaJ
SM00271 DnaJ molecular chaperone homology domain
DOMON
PF03351 SM00664 DOMON domain
DSL
PF01414 SM00051 Delta/Serrate/lag-2 (DSL) domain
DSRM
PF00035 SM00358 Double-stranded RNA binding motif
DUF1096
PF06493 Domain of unknown function DUF1096
DUF1114
PF06542 Domain of unknown function DUF1114
DUF1248
PF06852 Domain of unknown function DUF1248
DUF1258
PF06869 Domain of unknown function DUF1258
DUF1261
PF06879 Domain of unknown function DUF1261
DUF1265
PF06887 Domain of unknown function DUF1265
DUF1280
PF06918 Domain of unknown function DUF1280
DUF13
PF01482 Domain of unknown function DUF13
DUF130
PF02343 Domain of unknown function DUF130
DUF148
PF02520 Domain of unknown function DUF148
DUF1605
PF07717 Domain of unknown function DUF1605
DUF1647
PF07801 Domain of unknown function 1647
DUF1679
PF07914 Domain of unknown function DUF1679
DUF19
PF01579 PF03236 Domain of unknown function DUF19
DUF227
PF02958 Domain of unknown function DUF227
DUF229
PF02995 Domain of unknown function DUF229
DUF23
PF01697 Domain of unknown function DUF23
DUF236
PF03057 Domain of unknown function DUF236
DUF2650
PF10853 Domain of unknown function DUF2650
DUF268
PF03269 Domain of unknown function DUF268
DUF271
PF03407 Domain of unknown function DUF271
DUF272
PF03312 Domain of unknown function DUF272
DUF273
PF03314 Domain of unknown function DUF273
DUF274
PF03409 Domain of unknown function DUF274
DUF281
PF03436 Domain of unknown function DUF281
DUF282
PF03380 Domain of unknown function DUF282
DUF288
PF03385 Domain of unknown function DUF288
DUF316
PF03761 Domain of unknown function DUF316
DUF3557
PF12078 Domain of unknown function DUF3557
DUF38
PF01827 Domain of unknown function DUF38
DUF595
PF04590 Domain of unknown function DUF595
DUF621
PF04789 Domain of unknown function DUF621, putative G-protein coupled receptor
DUF644
PF04870 Domain of unknown function DUF644
DUF672
PF05050 Domain of unknown function DUF672
DUF684
PF05075 Domain of unknown function DUF684
DUF780
PF05611 Domain of unknown function DUF780
DUF870
PF05912 Domain of unknown function DUF870
DUF976
PF06162 Domain of unknown function DUF976
DUF_CC
PF04942 Domian of unknown function, DUF-CC
E1E2ATP
PF00122 proton ATPase
EamA
PF00892 EamA-like transporter family
EB
PF01683 EB-domain (found in nematodes only so far ...)
ECH
PF00378 Enoyl-CoA hydratase/isomerase family
EF-Tu
PF00009 GTP-binding elongation factor family, EF-Tu/EF-1A subfamily
EF-Tu2
PF03144 Elongation factor Tu domain 2
EF_hand
PF00036 SM00054 Calcium-binding EF-hand
EGF
PF00008 PF07974 PF07645 SM00001 SM00179 SM00181 EGF module
Epi
PF01370 NAD dependent epimerase/dehydratase family
ET
PF01684 Domain of unknown function ET
Ets
PF00178 SM00413 Ets domain
EXOIII
SM00479 exonuclease domain in DNA-polymerase alpha and epsilon chain, ribonuclease T and other exonucleases
F5_8C
PF00754 SM00231 Coagulation factor 5/8 type, C-terminal
FADB2
PF00890 FAD binding domain
FARP
PF01581 FMRFamide related peptide family
Fba_2
PF07735 F-box associated type 2
Fbox
PF00646 SM00256 Fbox
FeoB_N
PF02421 Ferrous iron transport protein B
FERM_C
PF09380 FERM C-terminal PH-like domain
FG
PF00147 SM00186 Fibrinogen, alpha/beta/gamma chain, C-terminal globular
FHA
PF00498 SM00240 Forkhead-associated
FN3
PF00041 SM00060 Fibronectin, type III-like fold
FolN
PF09120 PF09289 SM00274 Follistatin-like, N-terminal
Fork_Hd
PF00250 SM00339 Fork head transcription factor
Fringe
PF02434 Fringe-like
FU
SM00261 furin repeat
Fukutin
PF06828 Fukutin-related
FZ
PF01392 SM00063 Frizzled cysteine-rich domain
G_alpha
PF00503 SM00275 Guanine nucleotide binding protein (G-protein), alpha subunit
G_patch
PF01585 SM00443 glycine rich nucleic binding domain
Gal_T
PF01762 Galactosyltransferase
Gcyc
PF00211 SM00044 guanylate cyclase
GL
PF00337 SM00276 SM00908 Galectin, galactose-binding lectin
Globin
PF00042 Globin-like
Glycoside_hydrolase
PF00933 Glycoside hydrolase, catalytic core
GlyT
PF01531 Glycosyl transferase family 11
GlyT28_C
PF04101 Glycosyltransferase family 28 C-terminal domain
GlyTr2
PF00535 Glycosyl transferase family 2
GPCR_Rhodopsin
PF00001 GPCR, rhodopsin-like superfamily
GPCR_secretin
PF00002 GPCR, family 2, secretin-like
GPS
PF01825 SM00303 GPS domain
GR
PF00462 Glutaredoxin
Gra6
PF05084 Gra6 domain
Gran
PF00396 SM00277 Granulin
Grd
PF04155 ground domain
GST_N
PF02798 Glutathione S-transferase, N-terminal domain
GTPase
PF00025 PF00071 PF08477 SM00010 SM00173 SM00174 SM00175 SM00176 SM00177 SM00178 small GTPases (all subfamilies, Ras, Ran, Rac, etc...)
H2A
SM00414 histone H2A
H2B
SM00427 histone H2B
H3
SM00428 histone H3
H4
SM00417 histone H4
HA2
PF04408 SM00847 Helicase associated domain (HA2)
HEAT
PF02985 HEAT
Helic_C
PF00271 SM00490 DNA/RNA helicase, C-terminal
HintC
SM00305 Hint (Hedgehog/Intein) domain C-terminal region
HintN
PF01079 SM00306 Hint (Hedgehog/Intein) domain N-terminal region
HisPhos2
PF00328 Histidine phosphatase superfamily (branch 2)
Histone
PF00125 PF00538 SM00526 Histone fold
HMG
PF00505 SM00398 High mobility group box, HMG1/HMG2, subgroup
HOM
PF00046 SM00389 homeobox
HSP20
PF00011 HSP20-like chaperone
Hsp70
PF00012 Heat shock protein 70
HX
PF00045 SM00120 hemopexin
Hydrol
PF00702 haloacid dehalogenase-like hydrolase
I29
SM00848 Cathepsin propeptide inhibitor domain (I29)
IBR
PF01485 SM00647 In Between Ring fingers
IFP
PF00038 Intermediate filament protein
IG
PF07654 PF07679 PF07686 PF00047 PF08205 SM00406 SM00407 SM00408 SM00409 SM00410 immunoglobulin domain
Innexin
PF00876 Innexin
Insulin
PF03488 SM00078 Nematode insulin-related peptide, beta type
Ion_glu_rcpt
PF00060 Ionotropic glutamate receptor
Ion_tr
PF00520 four TM helices additionally present in many channels
IQ
PF00612 SM00015 Short calmodulin-binding motif containing conserved Ile and Gln residues
JmjC
PF02373 PF08007 SM00558 A domain family that is part of the cupin metalloenzyme superfamily
K_chan_2pore
PF07885 Potassium channel, two pore-domain
K_chan_volt
PF02214 Potassium channel, voltage dependent, Kv, tetramerisation
Kelch
PF01344 PF07646 SM00612 Kelch repeat type 1
KH
PF00013 SM00322 K Homology
KIN
PF00069 PF07714 SM00219 SM00220 SM00221 protein kinase
Kin_Ct
SM00133 AGC-kinase, C-terminal
Kinesin_motor
PF00225 SM00129 Kinesin, motor region
KR
PF00051 SM00130 kringle
KU
PF00014 SM00131 Kunitz domain/Proteinase inhibitor I2
KZ
PF00050 PF07648 SM00280 Proteinase inhibitor I1, Kazal
LA
PF00057 SM00192 Low density lipoprotein-receptor, class A domain
Lactam_B
PF00753 SM00849 Metallo-beta-lactamase superfamily
LamB
PF00052 SM00281 Laminin B domain
LE
PF00053 SM00180 laminin EGF-like domain
LG
PF00054 PF02210 SM00210 SM00282 laminin G domain
LGC-GluB
PF10613 SM00918 Ligated ion channel L-glutamate- and glycine-binding site
LIM
PF00412 SM00132 LIM domain
Lin-8
PF03353 Ras-mediated vulval-induction antagonist
Lip_2
PF01674 Lipase (class 2)
Lip_3
PF01764 Lipase (class 3)
Lip_G
PF00657 GDSL-like Lipase/Acylhydrolase
LisH
SM00667 Lissencephaly type-1-like homology motif
LITAF
SM00714 Possible membrane-associated motif in LPS-induced tumor necrosis factor alpha factor
LN
PF00055 SM00136 laminin, N-terminal domain
LRR
PF00560 SM00369 SM00370 SM00368 leucine-rich repeat
LRRc
PF01463 SM00082 leucine-rich repeat, C-terminal
LRRn
PF01462 SM00013 leucine-rich repeat, N-terminal
LTD
PF00932 Lamin Tail Domain
LY
PF00058 SM00135 Low-density lipoprotein receptor, YWTD repeat
LZ
PF04916 laminin A domain
MAM
PF00629 SM00137 MAM domain
MATH
PF00917 SM00061 MATH domain
MBOAT
PF03062 membrane-bound o-acyltransferase
MD
SM00604 MD domain (found in nematodes only so far ...)
MFP2b
PF12150 Cytosolic motility protein
MFS_1
PF07690 Major facilitator superfamily MFS-1
Mitoch_carrier
PF00153 Mitochondrial substrate carrier
MMR
PF01926 50S ribosome-binding GTPase
MP
SM00235 metalloprotease (covers all, should be replaced in display by the appropriate subfamily, e.g. PepM12)
MreB
PF06723 MreB/Mbl protein
MSP
PF00635 Major sperm protein
MSP4
PF07993 Male sterility protein
MT_11
PF08241 Methyltransferase domain
MT_12
PF08242 Methyltransferase domain
MTS
PF05175 Methyltransferase small domain
MYND
PF01753 MYND finger
Myosin_head
PF00063 SM00242 Myosin head, motor region
Myosin_tail
PF01576 Myosin tail
N1
PF06119 SM00539 nidogen, N-terminal domain
Na_Ca_ex
PF01699 sodium/calcium exchanger protein
Na_chan_AS
PF00858 Na+ channel, amiloride-sensitive
Na_diCO_symport
PF00375 Sodium:dicarboxylate symporter
Na_H_ex
PF00999 sodium/hydrogen exchanger family
NC10
PF06482 endostatin
Neuro_channel_lgbd
PF02931 Neurotransmitter-gated ion-channel
Neuro_channel_tm
PF02932 Neurotransmitter-gated ion-channel transmembrane region
NhrLig
PF00104 SM00430 Nuclear hormone receptor, ligand-binding
NRF
SM00703 N-terminal domain in C. elegans NRF-6 (Nose Resistant to Fluoxetine-4) and NDG-4 (resistant to nordihydroguaiaretic acid-4)
Nuclease
PF00929 Polynucleotidyl transferase, Ribonuclease H fold
NucST
PF04142 Nucleotide-sugar transporter
NUDIX
PF00293 NUDIX domain
P-Diest
PF01663 Type I phosphodiesterase / nucleotide pyrophosphatase
P_ATP_C
PF00689 Cation transporting ATPase, C-terminus
P_ATP_N
SM00831 Cation transporter/ATPase, N-terminus
P_Propr
PF01483 Proprotein convertase P-domain
PALP
PF00291 Pyridoxal-phosphate dependent enzyme
Patatin
PF01734 Patatin-like phospholipase
Patched
PF02460 Patched
PAW
PF04721 SM00613 Protein of unknown function PAW
PAX
PF00292 SM00351 Paired box protein, N-terminal
PAZ
PF02170 SM00949 Argonaute and Dicer protein, PAZ
PBPb
PF00497 SM00062 Bacterial periplasmic substrate-binding proteins
PBPe
SM00079 Eukaryotic homologues of bacterial periplasmic substrate binding proteins
PDZ
PF00595 SM00228 PDZ domain
PepC1
SM00645 Peptidase family C1
PepM1
PF01433 Peptidase family M1
PepM10
PF00413 matrixin and adamalysin; Peptidase M10A and M12B
PepM12
PF01400 astacin, peptidase M12A
PepM13
PF01431 neprilysin
PepM13_N
PF05649 Peptidase family M13
PepM14
PF00246 SM00631 Peptidase M14, carboxypeptidase A
PepM16
PF00675 Insulinase (Peptidase family M16)
PepM16_C
PF05193 Peptidase M16 inactive domain
PepS1
PF00089 SM00020 Peptidase S1 and S6, chymotrypsin/Hap; Peptidase, trypsin-like serine and cysteine
PepS10
PF00450 Peptidase family S10
PepS28
PF05577 Serine carboxypeptidase S28
PepS8
PF00082 Peptidase S8 and S53, subtilisin, kexin, sedolisin
PepS9
PF00326 Prolyl oligopeptidase family
Pfam
all other Pfam domains
PGAM
PF00300 SM00855 Phosphoglycerate mutase family
PH
PF00169 PF00568 PF00640 SM00233 SM00461 SM00462 Pleckstrin-homology domain
Phox
PF00787 SM00312 Phox-like
Phtase
PF00149 SM00156 phosphoprotein phosphatase
PI3K
PF00794 PF00454 SM00144 SM00146 Phosphatidylinositol 3-kinase
PINT
PF01399 SM00088 motif in proteasome subunits, Int-6, Nip-1 and TRIP-15
Piwi
PF02171 SM00950 Stem cell self-renewal protein Piwi
PKD
PF08016 Polycystin cation channel
PlsC
SM00563 Phosphate acyltransferases
Pmp3
PF01679 proteolipid membrane potential modulator
Polys2
PF02719 Polysaccharide biosynthesis protein
POU
PF00157 SM00352 POU domain
PP2Cc
PF00481 PF07228 SM00331 SM00332 1) Sigma factor PP2C-like phosphatases, 2) Serine/threonine phosphatases, family 2C, catalytic domain
Prefoldin
PF01920 Prefoldin
PrmA
PF06325 Ribosomal protein L11 methyltransferase (PrmA)
Pro_iso
PF00160 Cyclophilin type peptidyl-prolyl cis-trans isomerase/CLD
Protea
PF00227 Proteasome subunit
PTP
PF00102 PF00782 SM00012 SM00194 SM00195 SM00404 protein-tyrosine phosphatase
Ptre
PF00088 SM00018 P-type trefoil domain
PUF
PF00806 SM00025 Pumilio-like repeats
Pyr_deC
PF00282 Pyridoxal-dependent decarboxylase conserved domain
PyrR2
PF07992 Pyridine nucleotide-disulphide oxidoreductase
Ras_as
PF00788 SM00314 ras-association domain
RasGAP
PF00616 SM00323 ras GTPase activating protein
RcpL
PF01030 EGF receptor, L domain
Redox
PF08534 Redoxin
RGS
PF00615 SM00315 Regulator of G protein signalling
RHOD
PF00581 SM00450 Rhodanese Homology Domain
RhoGAP
PF00620 SM00324 RhoGAP domain
RhoGEF
PF00621 SM00325 RhoGEF domain
Rhomb
PF01694 Rhomboid protease
RICIN
SM00458 Ricin-type beta-trefoil
RmSB
PF04321 RmlD substrate binding domain
RRM
PF00076 SM00360 SM00361 SM00362 RNA recognition motif, RNP-1
S1
PF00575 SM00316 Ribosomal protein S1-like RNA-binding domain
SAM
PF00536 PF07647 SM00454 Sterile alpha motif homology
SANT
PF00249 SM00717 SANT SWI3, ADA2, N-CoR and TFIIIB'' DNA-binding domains
SapB
SM00741 Saposin B
SCPlike
PF00188 SCP-like extracellular
SEA
PF01390 SM00200 SEA domain
SEC14
PF00650 SM00516 Domain in homologues of a S. cerevisiae phosphatidylinositol transfer protein (Sec14p)
SERPIN
PF00079 SM00093 SERine Proteinase Inhibitors
SET
PF00856 SM00317 SET domain
SH2
PF00017 SM00252 SH2 domain
SH3
PF00018 SM00326 SH3 domain
ShK
PF01549 SM00254 Metridin-like ShK toxin
Skp1
SM00512 Found in Skp1 protein family
Sm
PF01423 SM00651 snRNP Sm proteins
SM
PF01437 SM00423 semaphorin/plexin domain
SMART
all other SMART domains
SNF
PF00209 sodium neurotransmitter symporter
SNF2_N
PF00176 SNF2-related
Snf7
PF03357 Snf7
SP
signal peptide
Spec
PF00435 SM00150 spectrin repeat
SPK
PF04435 SM00583 SPK is a domain of unknown function found in SET and PHD domain containing proteins
SPRY
PF00622 SM00449 Domain in SPla and the RYanodine Receptor.
SR
PF00530 SM00202 Speract/scavenger receptor
Ster-S
PF12349 Sterol-sensing domain of SREBP cleavage-activation
Sugar_tr
PF00083 sugar transporter
Sushi
PF00084 SM00032 Sushi/SCR/CCP domain
SynN
SM00503 Syntaxin N-terminal domain
T1
PF00090 SM00209 Thrombospondin, type I
T_Box
PF00907 SM00425 Transcription factor, T-box
TBC
PF00566 SM00164 Domain in Tre-2, BUB2p, and Cdc16p. Probable Rab-GAPs.
TCP1
PF00118 HSP60
Tetraspanin
PF00335 tetraspanin
TGFb
PF00019 SM00204 TGF beta domain
TIL
PF01826 prot. Inhibitor domain
TM
transmembrane domain
TPR
PF00515 PF07719 SM00028 Tetratricopeptide region
TPT
PF03151 Triose-phosphate Transporter family
Transthyr_rel
PF01060 Transthyretin-like
Tro_My
PF00261 Tropomyosin
tSNARE
SM00397 helical region found in SNAREs
Tubulin_Cterm
PF03953 SM00865 Tubulin/FtsZ, C-terminal
Tubulin_GTPase
PF00091 SM00864 Tubulin/FtsZ, GTPase
TY
PF00086 SM00211 Thyroglobulin type-1 domain
UAA
PF08449 UAA transporter family
Ub_MT
PF01209 ubiE/COQ5 methyltransferase family
UBA
PF00627 SM00165 Ubiquitin associated domain
UBCc
SM00212 Ubiquitin-conjugating enzyme E2, catalytic domain
Ubiq
PF00240 SM00213 Ubiquitin
UBQ_conjugat_E2
PF00179 Ubiquitin-conjugating enzyme/RWD-like
UCH
PF00443 Ubiquitin carboxyl-terminal hydrolase
UDP_gluc_trans
PF00201 UDP-glucuronosyl/UDP-glucosyltransferase
UL
PF02140 D-galactoside lectin
UNC-93
PF05978 Ion channel regulatory protein UNC-93
V5_TPX
SM00198 Allergen V5/Tpx-1 related
VA
PF00092 SM00327 von Willebrand factor, type A domain
VC
PF00093 SM00214 SM00215 von Willebrand factor, type C domain
VD
PF00094 SM00216 von Willebrand factor, type D domain
Vincul
PF01044 Vinculin/alpha-catenin
VOMI
PF03762 Vitelline membrane outer layer protein I (VOMI)
WA
PF00095 SM00217 Whey acidic protein, 4-disulphide core
WD40
PF00400 SM00320 WD40/YVTN repeat
Wilms
PF02165 Wilm's tumour protein
WSN
PF02206 SM00453 WSN domain (found in nematodes only so far ...)
WW
PF00397 SM00456 WW/Rsp5/WWP
Zip
PF02535 zinc transporter
Znf
PF01428 PF01529 PF02135 PF01363 PF00643 PF00569 PF00642 PF00320 PF00105 PF00641 PF00098 PF00628 PF00096 SM00064 SM00154 SM00184 SM00249 SM00251 SM00291 SM00336 SM00343 SM0352 SM00355 SM00356 SM00399 SM00401 SM00451 SM00547 SM00551 SM00744 zinc finger
ZU5
PF00791 SM00218 ZU5 domain

Last updated on Oct 14, 2024

Frequently Asked Questions

I hope you can find an answer to your question here. If not - after you have made a serious attempt to figure it out by yourself - you can send an email to H. Hutter.

General Questions
Where can I find information on new features in GExplore?

To learn about the latest features and updates in each GExplore release, visit the What's new? section on our Help page.

Can I upload my favourite set of genes for analysis?

No, but you can paste even a large list of genes (I tested up to 5000) into the Gene Name search field. Simply generate a comma-delimited list of your favourite genes and paste them into the Gene Name search field.

Can I save the search results from one search (and use them as a starting point for additional searches)?

You can export the results as CSV or TXT files and save them for further processing.
However, The result of a search essentially is a list of genes fullfilling the search condition(s). This list of genes is always on the output page (in the search results sidebar). You can copy it from there and save it on your computer - or paste it into the Gene Name search field for combinatorial searches.

The table of abbreviations doesn't cover my favourite domain. How can I search for proteins containing this domain?

Find the SMART or Pfam identifier of your domain of interest and use it in the Domain search field.

The table of abbreviations collapses all the zinc-fingers (kinases, GTPases, etc). How can I retrieve only some zinc-finger proteins (kinases, GTPases)?

If you care about certain subgroups, you can always use the corresponding SMART or Pfam identifiers in your search. The domain will still be displayed as generic 'Znf' ('KIN', 'GTPase'). There are two reasons for 'collapsing' sometimes large numbers of interpro domains into a single abbreviation. First, this interface is aimed at the 'big picture' (actively ignoring subtle subdivisions) and second, SMART or Pfam domains sometimes do not have the discriminatory power to distinguish subgroups (EGF and IG are typical examples). However, currently some groups of domains might be collapsed too generously. If you are an expert in the field and can suggest a subdivision, please contact Dr. Harald Hutter.

I'm trying to get all transmembrane proteins, but a lot of proteins I got seem to have only a signal peptide (SP), but no transmembrane domain?

Try TM in the Domain Pattern field. Explanation: Signal peptides have a hydrophobic core, which triggers a hit with algorithms looking for hydrophobic stretches of amino acids to predict transmembrane domains. While searches in the Protein Domain field operate on the 'raw' domain annotations, searches in the Domain Pattern field operate on the domain pattern as you see it in the ouput, i.e. after the 'display rules' have been applied - and one of them suppresses display of transmembrane domains overlapping signal peptides.

Can I search for proteins that do not have any known domain?

Yes, all proteins not having any SMART or Pfam annotation are tagged with 'no_domain'. You can enter this in the Protein Domain search field to retrieve them. Note that signal peptides and transmembrane domains are not considered 'domains' here. If you want proteins that have neither signal sequene, nor transmembrane domain nor any known domain, use 'no_domain not (SP or TM)' in the Protein Domain search field (there are surprisignly few). Note also that you cannot use 'no_domain' as part of a search pattern in the Domain Pattern field. However, since entries in the Protein Domain search field and the Domain Pattern field and effectively combined as logical 'AND' search, you can get all single pass transmembrane domain proteins with no known protein domains by searching for 'no_domain' in the Protein Domain field and 'TM and no more TM' in the Domain Pattern field.

Text Searches and Boolean Logic in Searches
I tried something like body wall muscle AND neuron in the expression search field and got an error message. Why?

The short answer is, you cannot use multi-word search terms, because SPACE is used to separate search terms. You will not get an error message from just using 'body wall muscle', because it is translated into 'body OR wall OR muscle' - probably not what you wanted. As a work-around for the example above, you could use the following search strategy: search for body AND wall AND muscle, copy the list of genes from the output page, paste it into the Gene Name field of a new search page and search for neuron.

How does the autosuggestion feature work, and how can I navigate through the suggestions?

The autosuggestion feature helps you find the correct terms quickly by providing a list of suggestions as you type in input fields such as Protein Domains or Gene Ontology Term. You can navigate through the suggestions using the Up and Down arrow keys. To select a suggestion, press the Enter key. If you want to dismiss the suggestions list, press the Escape key. If the term you enter does not match any available suggestions, the message "No suggestions found" will be displayed. The function recognizes the , separator and will offer suggestions again for a second search term after you entered the separator. Note that the separator for the GO term field is;, since GO terms can contains commas.

Result Page Interface Features
Can I hide the sidebar to get more space for the results table?

Yes, you can hide the sidebar on the results page to get more space for the table. When the sidebar is open, a vertical line appears next to the active sidebar icon. To collapse the sidebar, simply click on the active icon. This will provide a more expansive view of the results table. If you click on a different sidebar icon while the sidebar is open, it will switch the content to reflect the new sidebar option you selected.

Sidebar Feature
Example of Sidebar Collapsing
The results page says: 'click on table headers to sort' - but nothing happens when I click?

There are three possible reasons: 1. only certain columns can be used for sorting (e.g. the domain organization column does not sort). 2. The output has more than 1000 genes, in which case this functionality is not available (it would be so slow, you wouldn't want to wait). 3. This requires that javascripts are enabled in your browser. Check your browser manual in this case to activate javascript.

Can I turn off this row-drag function? (it prevents me from copying text out of text columns)

Turning off the domain display (domain organization column) will turn off the row-drag functionality. It's main purpose is to allow you to drag proteins around to maybe group those with similar domain organization.

These checkboxes next to genes names on the output page: what exactly can I do with them ?

If you uncheck a gene and click 'Update Display' the unchecked gene will be removed from the display. You an use this to manually fine-tune the output and maybe eliminate false positives from a domain search or certain genes you are not interested in before continuing with the analysis. You can use the 'Back' button of your browser to 'undo' this quickly.

Last updated on Oct 14, 2024