gget Overview

gget is a command-line bioinformatics tool and Python package providing unified access to 20+ genomic databases and analysis methods. Query gene information, sequence analysis, protein structures, expression data, and disease associations through a consistent interface. All gget modules work both as command-line tools and as Python functions.

Important: The databases queried by gget are continuously updated, which sometimes changes their structure. gget modules are tested automatically on a biweekly basis and updated to match new database structures when necessary.

Installation

Install gget in a clean virtual environment to avoid conflicts:

Using uv (recommended)

uv pip install gget

Or using pip

pip install --upgrade gget

In Python/Jupyter

import gget

Quick Start

Basic usage pattern for all modules:

Command-line

gget <module> [arguments] [options]

Python

gget.module(arguments, options)

Most modules return:

Command-line: JSON (default), or CSV with the -csv flag
Python: DataFrame or dictionary
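A minimal sketch of how the command-line pattern and the common flags described in this section might be combined when gget is driven from a script. The helper name build_gget_command is hypothetical (it is not part of gget); it only assembles an argument list, which could then be handed to subprocess.run.

```python
import shlex


def build_gget_command(module, *args, out=None, quiet=False, csv=False):
    """Assemble a gget command line as a list of arguments.

    Hypothetical helper, not part of gget. Only flags named in this
    document are used: -o (output file), -q (quiet), -csv (CSV output).
    """
    cmd = ["gget", module, *map(str, args)]
    if out is not None:
        cmd += ["-o", str(out)]
    if quiet:
        cmd.append("-q")
    if csv:
        cmd.append("-csv")
    return cmd


cmd = build_gget_command("search", "gaba", "-s", "human", out="results.csv", csv=True)
print(shlex.join(cmd))  # gget search gaba -s human -o results.csv -csv
```

Building the command as a list rather than a single string avoids shell-quoting problems when arguments contain spaces.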

Common flags across modules:

-o/--out: Save results to file
-q/--quiet: Suppress progress information
-csv: Return CSV format (command-line only)

Module Categories

Reference & Gene Information

gget ref - Reference Genome Downloads

Retrieve download links and metadata for Ensembl reference genomes.

Parameters:

species: Genus_species format (e.g., 'homo_sapiens', 'mus_musculus'). Shortcuts: 'human', 'mouse'
-w/--which: Specify return types (gtf, cdna, dna, cds, cdrna, pep). Default: all
-r/--release: Ensembl release number (default: latest)
-l/--list_species: List available vertebrate species
-liv/--list_iv_species: List available invertebrate species
-ftp: Return only FTP links
-d/--download: Download files (requires curl)

Examples:

List available species

gget ref --list_species

Get all reference files for human

gget ref homo_sapiens

Download only GTF annotation for mouse

gget ref -w gtf -d mouse

Python

gget.ref("homo_sapiens")
gget.ref("mus_musculus", which="gtf", download=True)
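The Genus_species format and the two shortcuts mentioned above can be illustrated with a small normalizer for free-form user input. This is a sketch only: normalize_species and SPECIES_SHORTCUTS are hypothetical names, and the mapping covers just the shortcuts listed in this document.

```python
# Shortcuts named in this document; gget itself may accept others.
SPECIES_SHORTCUTS = {"human": "homo_sapiens", "mouse": "mus_musculus"}


def normalize_species(name):
    """Convert input such as 'Homo sapiens' or 'human' to 'homo_sapiens'."""
    # Lowercase, collapse whitespace, and join the words with underscores.
    key = "_".join(name.strip().lower().split())
    return SPECIES_SHORTCUTS.get(key, key)


print(normalize_species("Homo sapiens"))  # homo_sapiens
print(normalize_species("mouse"))         # mus_musculus
```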

gget search - Gene Search

Locate genes by name or description across species.

Parameters:

searchwords: One or more search terms (case-insensitive)
-s/--species: Target species (e.g., 'homo_sapiens', 'mouse')
-r/--release: Ensembl release number
-t/--id_type: Return 'gene' (default) or 'transcript'
-ao/--andor: 'or' (default) finds ANY searchword; 'and' requires ALL
-l/--limit: Maximum results to return

Returns: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL

Examples:

Search for GABA-related genes in human

gget search -s human gaba gamma-aminobutyric

Find specific gene, require all terms

gget search -s mouse -ao and pax7 transcription

Python

gget.search(["gaba", "gamma-aminobutyric"], species="homo_sapiens")
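The difference between the 'or' and 'and' settings of -ao/--andor can be shown with a small local filter. This sketch only illustrates the documented semantics; it is not gget's implementation, it assumes case-insensitive substring matching, and the descriptions are made up for the example.

```python
def matches(description, searchwords, andor="or"):
    """Return True if a description matches the search words.

    'or' requires ANY word to occur, 'and' requires ALL words to occur.
    Matching is case-insensitive substring matching (an assumption).
    """
    text = description.lower()
    hits = [word.lower() in text for word in searchwords]
    if andor == "and":
        return all(hits)
    if andor == "or":
        return any(hits)
    raise ValueError("andor must be 'or' or 'and'")


# Made-up descriptions, for illustration only.
descriptions = [
    "Pax7 paired box transcription factor",
    "gamma-aminobutyric acid type A receptor",
]
print([d for d in descriptions if matches(d, ["pax7", "transcription"], "and")])
print([d for d in descriptions if matches(d, ["gaba", "gamma-aminobutyric"], "or")])
```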

gget info - Gene/Transcript Information

Retrieve comprehensive gene and transcript metadata from Ensembl, UniProt, and NCBI.

Parameters:

ens_ids: One or more Ensembl IDs (also supports WormBase, Flybase IDs). Limit: ~1000 IDs
-n/--ncbi: Disable NCBI data retrieval
-u/--uniprot: Disable UniProt data retrieval
-pdb: Include PDB identifiers (increases runtime)

Returns: UniProt ID, NCBI gene ID, primary gene name, synonyms, protein names, descriptions, biotype, canonical transcript

Examples:

Get info for multiple genes

gget info ENSG00000034713 ENSG00000104853 ENSG00000170296

Include PDB IDs

gget info ENSG00000034713 -pdb

Python

gget.info(["ENSG00000034713", "ENSG00000104853"], pdb=True)
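Because of the limit of roughly 1000 IDs per call noted above, longer ID lists need to be split into batches. A minimal sketch, assuming a plain Python list of IDs; chunk_ids is a hypothetical helper and the IDs below are placeholders, not real Ensembl identifiers.

```python
def chunk_ids(ids, size=1000):
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Placeholder IDs, used only to show the batching.
ids = [f"ID{i:05d}" for i in range(2500)]
batches = list(chunk_ids(ids))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch could then be passed to gget.info in turn and the results combined, ideally with a short pause between calls to avoid overloading the upstream services.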

gget seq - Sequence Retrieval

Fetch nucleotide or amino acid sequences for genes and transcripts.

Parameters:

ens_ids: One or more Ensembl identifiers
-t/--translate: Fetch amino acid sequences instead of nucleotide
-iso/--isoforms: Return all transcript variants (gene IDs only)

Returns: FASTA format sequences

Examples:

Get nucleotide sequences

gget seq ENSG00000034713 ENSG00000104853

Get all protein isoforms

gget seq -t -iso ENSG00000034713

Python

gget.seq(["ENSG00000034713"], translate=True, isoforms=True)
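FASTA-format results are often easier to work with as a mapping from ID to sequence. The sketch below assumes the result is available as a flat list of strings alternating between '>' header lines and sequence lines; that layout is an assumption here, and fasta_list_to_dict is a hypothetical helper. The input list is made up.

```python
def fasta_list_to_dict(lines):
    """Map sequence ID -> sequence from a flat list of FASTA lines.

    Assumes headers start with '>' and that the ID is the first
    whitespace-delimited token of the header.
    """
    records = {}
    current_id = None
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines belonging to the same record are concatenated.
            records[current_id] += line.strip()
    return records


# Made-up FASTA lines, for illustration only.
example = [">seq1 some description", "ATCG", ">seq2", "GG", "CC"]
print(fasta_list_to_dict(example))  # {'seq1': 'ATCG', 'seq2': 'GGCC'}
```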

Sequence Analysis & Alignment

gget blast - BLAST Searches

BLAST nucleotide or amino acid sequences against standard databases.

Parameters:

sequence: Sequence string or path to FASTA/.txt file
-p/--program: blastn, blastp, blastx, tblastn, tblastx (auto-detected)
-db/--database:
    Nucleotide: nt, refseq_rna, pdbnt
    Protein: nr, swissprot, pdbaa, refseq_protein
-l/--limit: Max hits (default: 50)
-e/--expect: E-value cutoff (default: 10.0)
-lcf/--low_comp_filt: Enable low complexity filtering
-mbo/--megablast_off: Disable MegaBLAST (blastn only)

Examples:

BLAST protein sequence

gget blast MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

BLAST from file with specific database

gget blast sequence.fasta -db swissprot -l 10

Python

gget.blast("MKWMFK...", database="swissprot", limit=10)
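The program is described above as auto-detected from the input. One simple way such detection could work is an alphabet check, sketched below. This is an illustrative heuristic only; gget's actual detection logic may differ, and guess_sequence_type is a hypothetical name.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_sequence_type(sequence):
    """Guess 'nucleotide' or 'protein' from the characters in a sequence."""
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    # Treat the sequence as nucleotide only if every character is a base code.
    return "nucleotide" if set(seq) <= NUCLEOTIDE_CHARS else "protein"


print(guess_sequence_type("ATCGATCGATCGATCG"))  # nucleotide
print(guess_sequence_type("MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQ"))  # protein
```

A heuristic like this misclassifies short peptides composed only of A, C, G, T or N, which is one reason to set the program explicitly when the input is ambiguous.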

gget blat - BLAT Searches

Locate genomic positions of sequences using UCSC BLAT.

Parameters:

sequence: Sequence string or path to FASTA/.txt file
-st/--seqtype: 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' (auto-detected)
-a/--assembly: Target assembly (default: 'human'/hg38; options: 'mouse'/mm39, 'zebrafinch'/taeGut2, etc.)

Returns: genome, query size, alignment positions, matches, mismatches, alignment percentage

Examples:

Find genomic location in human

gget blat ATCGATCGATCGATCG

Search in different assembly

gget blat -a mm39 ATCGATCGATCGATCG

Python

gget.blat("ATCGATCGATCGATCG", assembly="mouse")

gget muscle - Multiple Sequence Alignment

Align multiple nucleotide or amino acid sequences using Muscle5.

Parameters:

fasta: Sequences or path to FASTA/.txt file
-s5/--super5: Use Super5 algorithm for faster processing (large datasets)

Returns: Aligned sequences in ClustalW format or aligned FASTA (.afa)

Examples:

Align sequences from file

gget muscle sequences.fasta -o aligned.afa

Use Super5 for large dataset

gget muscle large_dataset.fasta -s5

Python

gget.muscle("sequences.fasta", save=True)

gget diamond - Local Sequence Alignment

Perform fast local protein or translated DNA alignment using DIAMOND.

Parameters:

query: Sequences (string/list) or FASTA file path
--reference: Reference sequences (string/list) or FASTA file path (required)
--sensitivity: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive (default), ultra-sensitive
--threads: CPU threads (default: 1)
--diamond_db: Save database for reuse
--translated: Enable nucleotide-to-amino acid alignment

Returns: Identity percentage, sequence lengths, match positions, gap openings, E-values, bit scores

Examples:

Align against reference

gget diamond GGETISAWESQME -ref reference.fasta --threads 4

Save database for reuse

gget diamond query.fasta -ref ref.fasta --diamond_db my_db.dmnd

Python

gget.diamond("GGETISAWESQME", reference="reference.fasta", threads=4)

Structural & Protein Analysis

gget pdb - Protein Structures

Query RCSB Protein Data Bank for structure and metadata.

Parameters:

pdb_id: PDB identifier (e.g., '7S7U')
-r/--resource: Data type (pdb, entry, pubmed, assembly, entity types)
-i/--identifier: Assembly, entity, or chain ID

Returns: PDB format (structures) or JSON (metadata)

Examples:

Download PDB structure

gget pdb 7S7U -o 7S7U.pdb

Get metadata

gget pdb 7S7U -r entry

Python

gget.pdb("7S7U", save=True)

gget alphafold - Protein Structure Prediction

Predict 3D protein structures using simplified AlphaFold2.

Setup Required:

Install OpenMM first

uv pip install openmm

Then setup AlphaFold

gget setup alphafold

Parameters:

sequence: Amino acid sequence (string), multiple sequences (list), or FASTA file. Multiple sequences trigger multimer modeling
-mr/--multimer_recycles: Recycling iterations (default: 3; recommend 20 for accuracy)
-mfm/--multimer_for_monomer: Apply multimer model to single proteins
-r/--relax: AMBER relaxation for top-ranked model
plot: Python-only; generate interactive 3D visualization (default: True)
show_sidechains: Python-only; include side chains (default: True)

Returns: PDB structure file, JSON alignment error data, optional 3D visualization

Examples:

Predict single protein structure

gget alphafold MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

Predict multimer with higher accuracy

gget alphafold sequence1.fasta -mr 20 -r

Python with visualization

gget.alphafold("MKWMFK...", plot=True, show_sidechains=True)

Multimer prediction

gget.alphafold(["sequence1", "sequence2"], multimer_recycles=20)
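For command-line use, several chains can be collected in one FASTA file before running a multimer prediction. The sketch below writes such a file with the standard library only. write_fasta and the chain_N record names are hypothetical, and the two sequences are short placeholders rather than real proteins.

```python
import tempfile
from pathlib import Path


def write_fasta(sequences, path, line_width=60):
    """Write sequences to a FASTA file with headers chain_1, chain_2, ..."""
    path = Path(path)
    with path.open("w") as handle:
        for index, seq in enumerate(sequences, start=1):
            handle.write(f">chain_{index}\n")
            # Wrap long sequences at a fixed line width.
            for start in range(0, len(seq), line_width):
                handle.write(seq[start:start + line_width] + "\n")
    return path


# Placeholder sequences, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = write_fasta(["MKWMFK", "LIAQSIGQASFV"], Path(tmp) / "complex.fasta")
    fasta_text = fasta_path.read_text()
print(fasta_text)
```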

gget elm - Eukaryotic Linear Motifs

Predict Eukaryotic Linear Motifs in protein sequences.

Setup Required:

gget setup elm

Parameters:

sequence: Amino acid sequence or UniProt Acc
-u/--uniprot: Indicates sequence is UniProt Acc
-e/--expand: Include protein names, organisms, references
-s/--sensitivity: DIAMOND alignment sensitivity (default: "very-sensitive")
-t/--threads: Number of threads (default: 1)

Returns: Two outputs:

ortholog_df: Linear motifs from orthologous proteins
regex_df: Motifs directly matched in input sequence

Examples:

Predict motifs from sequence

gget elm LIAQSIGQASFV -o results

Use UniProt accession with expanded info

gget elm --uniprot Q02410 -e

Python

ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")

Expression & Disease Data

gget archs4 - Gene Correlation & Tissue Expression

Query ARCHS4 database for correlated genes or tissue expression data.

Parameters:

gene: Gene symbol or Ensembl ID (with --ensembl flag)
-w/--which: 'correlation' (default, returns 100 most correlated genes) or 'tissue' (expression atlas)
-s/--species: 'human' (default) or 'mouse' (tissue data only)
-e/--ensembl: Input is Ensembl ID

Returns:

Correlation mode: Gene symbols, Pearson correlation coefficients
Tissue mode: Tissue identifiers, min/Q1/median/Q3/max expression values

Examples:

Get correlated genes

gget archs4 ACE2

Get tissue expression

gget archs4 -w tissue ACE2

Python

gget.archs4("ACE2", which="tissue")

gget cellxgene - Single-Cell RNA-seq Data

Query CZ CELLxGENE Discover Census for single-cell data.

Setup Required:

gget setup cellxgene

Parameters:

--gene (-g): Gene names or Ensembl IDs (case-sensitive! 'PAX7' for human, 'Pax7' for mouse)
--tissue: Tissue type(s)
--cell_type: Specific cell type(s)
--species (-s): 'homo_sapiens' (default) or 'mus_musculus'
--census_version (-cv): Version ("stable", "latest", or dated)
--ensembl (-e): Use Ensembl IDs
--meta_only (-mo): Return metadata only
Additional filters: disease, development_stage, sex, assay, dataset_id, donor_id, ethnicity, suspension_type

Returns: AnnData object with count matrices and metadata (or metadata-only dataframes)

Examples:

Get single-cell data for specific genes and cell types

gget cellxgene --gene ACE2 ABCA1 --tissue lung --cell_type "mucus secreting cell" -o lung_data.h5ad

Metadata only

gget cellxgene --gene PAX7 --tissue muscle --meta_only -o metadata.csv

Python

adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="mucus secreting cell")
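Since gene names are case-sensitive here, it can help to normalize symbols before querying. The sketch below follows only the convention shown in the example above ('PAX7' for human, 'Pax7' for mouse). format_gene_symbol is a hypothetical helper, and some real gene symbols do not follow this simple pattern, so treat the output as a starting point.

```python
def format_gene_symbol(symbol, species="homo_sapiens"):
    """Apply the usual capitalization convention for a gene symbol."""
    symbol = symbol.strip()
    if species == "homo_sapiens":
        return symbol.upper()  # e.g. PAX7
    if species == "mus_musculus":
        return symbol[:1].upper() + symbol[1:].lower()  # e.g. Pax7
    raise ValueError(f"unsupported species: {species!r}")


print(format_gene_symbol("pax7"))                  # PAX7
print(format_gene_symbol("PAX7", "mus_musculus"))  # Pax7
```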

gget enrichr - Enrichment Analysis

Perform ontology enrichment analysis on gene lists using Enrichr.

Parameters:

genes: Gene symbols or Ensembl IDs
-db/--database: Reference database (supports shortcuts: 'pathway', 'transcription', 'ontology', 'diseases_drugs', 'celltypes')
-s/--species: human (default), mouse, fly, yeast, worm, fish
-bkg_l/--background_list: Background genes for comparison
-ko/--kegg_out: Save KEGG pathway images with highlighted genes
plot: Python-only; generate graphical results

Database Shortcuts:

'pathway' → KEGG_2021_Human
'transcription' → ChEA_2016
'ontology' → GO_Biological_Process_2021
'diseases_drugs' → GWAS_Catalog_2019
'celltypes' → PanglaoDB_Augmented_2021
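The shortcut table above amounts to a simple lookup, which can be written out when the resolved database name needs to be recorded alongside results. This is a sketch using only the mapping listed in this document; resolve_database is a hypothetical helper, and the names gget actually resolves to may vary by version or species.

```python
# Shortcut -> database name, exactly as listed in this document.
DATABASE_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
}


def resolve_database(name):
    """Return the full database name for a shortcut; pass other names through."""
    return DATABASE_SHORTCUTS.get(name, name)


print(resolve_database("ontology"))   # GO_Biological_Process_2021
print(resolve_database("ChEA_2016"))  # ChEA_2016
```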

Examples:

Enrichment analysis for ontology

gget enrichr -db ontology ACE2 AGT AGTR1

Save KEGG pathways

gget enrichr -db pathway ACE2 AGT AGTR1 -ko ./kegg_images/

Python with plot

gget.enrichr(["ACE2", "AGT", "AGTR1"], database="ontology", plot=True)

gget bgee - Orthology & Expression

Retrieve orthology and gene expression data from Bgee database.

Parameters:

ens_id: Ensembl gene ID or NCBI gene ID (for non-Ensembl species). Multiple IDs supported when type=expression
-t/--type: 'orthologs' (default) or 'expression'

Returns:

Orthologs mode: Matching genes across species with IDs, names, taxonomic info
Expression mode: Anatomical entities, confidence scores, expression status

Examples:

Get orthologs

gget bgee ENSG00000169194

Get expression data

gget bgee ENSG00000169194 -t expression

Multiple genes

gget bgee ENSBTAG00000047356 ENSBTAG00000018317 -t expression

Python

gget.bgee("ENSG00000169194", type="orthologs")

gget opentargets - Disease & Drug Associations

Retrieve disease and drug associations from OpenTargets.

Parameters:

Ensembl gene ID (required)
-r/--resource: diseases (default), drugs, tractability, pharmacogenetics, expression, depmap, interactions
-l/--limit: Cap results count
Filter arguments (vary by resource):
    drugs: --filter_disease
    pharmacogenetics: --filter_drug
    expression/depmap: --filter_tissue, --filter_anat_sys, --filter_organ
    interactions: --filter_protein_a, --filter_protein_b, --filter_gene_b
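Because the available filters depend on the resource, a script can check a filter/resource combination before issuing a query. A minimal sketch based only on the list above; check_filters and FILTERS_BY_RESOURCE are hypothetical names, and gget performs its own validation independently of this.

```python
_EXPRESSION_FILTERS = {"filter_tissue", "filter_anat_sys", "filter_organ"}

# Filters per resource, as listed in this document.
FILTERS_BY_RESOURCE = {
    "drugs": {"filter_disease"},
    "pharmacogenetics": {"filter_drug"},
    "expression": _EXPRESSION_FILTERS,
    "depmap": _EXPRESSION_FILTERS,
    "interactions": {"filter_protein_a", "filter_protein_b", "filter_gene_b"},
}


def check_filters(resource, **filters):
    """Raise ValueError if a filter is not listed for the given resource."""
    allowed = FILTERS_BY_RESOURCE.get(resource, set())
    unknown = sorted(set(filters) - allowed)
    if unknown:
        raise ValueError(f"filters {unknown} are not listed for resource {resource!r}")
    return filters


print(check_filters("expression", filter_tissue="brain"))
```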

Examples:

Get associated diseases

gget opentargets ENSG00000169194 -r diseases -l 5

Get associated drugs

gget opentargets ENSG00000169194 -r drugs -l 10

Get tissue expression

gget opentargets ENSG00000169194 -r expression --filter_tissue brain

Python

gget.opentargets("ENSG00000169194", resource="diseases", limit=5)

gget cbio - cBioPortal Cancer Genomics

Plot cancer genomics heatmaps using cBioPortal data.

Two subcommands:

search - Find study IDs:

gget cbio search breast lung

plot - Generate heatmaps:

Parameters:

-s/--study_ids: Space-separated cBioPortal study IDs (required)
-g/--genes: Space-separated gene names or Ensembl IDs (required)
-st/--stratification: Column to organize data (tissue, cancer_type, cancer_type_detailed, study_id, sample)
-vt/--variation_type: Data type (mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence)
-f/--filter: Filter by column value (e.g., 'study_id:msk_impact_2017')
-dd/--data_dir: Cache directory (default: ./gget_cbio_cache)
-fd/--figure_dir: Output directory (default: ./gget_cbio_figures)
-dpi: Resolution (default: 100)
-sh/--show: Display plot in window
-nc/--no_confirm: Skip download confirmations

Examples:

Search for studies

gget cbio search esophag ovary

Create heatmap

gget cbio plot -s msk_impact_2017 -g AKT1 ALK BRAF -st tissue -vt mutation_occurrences

Python

gget.cbio_search(["esophag", "ovary"])
gget.cbio_plot(["msk_impact_2017"], ["AKT1", "ALK"], stratification="tissue")
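The -f/--filter value is given above as a 'column:value' string. When such strings are assembled programmatically, a quick format check can catch mistakes early. This is a sketch of that check only; parse_filter is a hypothetical helper and does not reflect how gget parses the argument internally.

```python
def parse_filter(expr):
    """Split a 'column:value' filter string into (column, value)."""
    column, sep, value = expr.partition(":")
    if not sep or not column or not value:
        raise ValueError(f"expected 'column:value', got {expr!r}")
    return column, value


print(parse_filter("study_id:msk_impact_2017"))  # ('study_id', 'msk_impact_2017')
```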

gget cosmic - COSMIC Database

Search COSMIC (Catalogue Of Somatic Mutations In Cancer) database.

Important: License fees apply for commercial use. Requires COSMIC account credentials.

Parameters:

searchterm: Gene name, Ensembl ID, mutation notation, or sample ID
-ctp/--cosmic_tsv_path: Path to downloaded COSMIC TSV file (required for querying)
-l/--limit: Maximum results (default: 100)

Database download flags:

-d/--download_cosmic: Activate download mode
-gm/--gget_mutate: Create version for gget mutate
-cp/--cosmic_project: Database type (cancer, census, cell_line, resistance, genome_screen, targeted_screen)
-cv/--cosmic_version: COSMIC version
-gv/--grch_version: Human reference genome (37 or 38)
--email, --password: COSMIC credentials

Examples:

First download database

gget cosmic -d --email user@example.com --password xxx -cp cancer

Then query

gget cosmic EGFR -ctp cosmic_data.tsv -l 10

Python

gget.cosmic("EGFR", cosmic_tsv_path="cosmic_data.tsv", limit=10)

Additional Tools

gget mutate - Generate Mutated Sequences

Generate mutated nucleotide sequences from mutation annotations.

Parameters:

sequences: FASTA file path or direct sequence input (string/list)
-m/--mutations: CSV/TSV file or DataFrame with mutation data (required)
-mc/--mut_column: Mutation column name (default: 'mutation')
-sic/--seq_id_column: Sequence ID column (default: 'seq_ID')
-mic/--mut_id_column: Mutation ID column
-k/--k: Length of flanking sequences (default: 30 nucleotides)

Returns: Mutated sequences in FASTA format

Examples:

Single mutation

gget mutate ATCGCTAAGCT -m "c.4G>T"

Multiple sequences with mutations from file

gget mutate sequences.fasta -m mutations.csv -o mutated.fasta

Python

import pandas as pd
mutations_df = pd.DataFrame({"seq_ID": ["seq1"], "mutation": ["c.4G>T"]})
gget.mutate(["ATCGCTAAGCT"], mutations=mutations_df)
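To make the mutation notation concrete, the sketch below applies a single-nucleotide substitution such as c.4G>T to a sequence and keeps up to k flanking nucleotides on each side. It is an illustration only, not gget's implementation: it handles substitutions exclusively, apply_substitution is a hypothetical name, and reading k as a symmetric window around the mutated position is an assumption.

```python
import re

# Matches substitutions like 'c.4G>T': 1-based position, reference base, new base.
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(sequence, mutation, k=30):
    """Return the mutated sequence, trimmed to k nucleotides either side."""
    match = SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unsupported mutation notation: {mutation!r}")
    index = int(match.group(1)) - 1  # convert 1-based position to 0-based index
    ref, alt = match.group(2), match.group(3)
    if not 0 <= index < len(sequence):
        raise ValueError("mutation position lies outside the sequence")
    if sequence[index].upper() != ref:
        raise ValueError(f"expected {ref} at position {index + 1}, found {sequence[index]}")
    mutated = sequence[:index] + alt + sequence[index + 1:]
    return mutated[max(0, index - k):index + k + 1]


print(apply_substitution("ATCGCTAAGCT", "c.4G>T"))       # ATCTCTAAGCT
print(apply_substitution("ATCGCTAAGCT", "c.4G>T", k=2))  # TCTCT
```

Checking the reference base before substituting guards against off-by-one errors and against mutations annotated relative to a different transcript.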

gget gpt - OpenAI Text Generation

Generate natural language text using OpenAI's API.

Setup Required:

gget setup gpt

Important: Free tier limited to 3 months after account creation. Set monthly billing limits.

Parameters:

prompt: Text input for generation (required)
api_key: OpenAI authentication (required)
Model configuration: temperature, top_p, max_tokens, frequency_penalty, presence_penalty
Default model: gpt-3.5-turbo (configurable)

Examples:

gget gpt "Explain CRISPR" --api_key your_key_here

Python

gget.gpt("Explain CRISPR", api_key="your_key_here")

gget setup - Install Dependencies

Install/download third-party dependencies for specific modules.

Parameters:

module: Module name requiring dependency installation
-o/--out: Output folder path (elm module only)

Modules requiring setup:

alphafold - Downloads ~4GB of model parameters
cellxgene - Installs cellxgene-census (may not support latest Python)
elm - Downloads local ELM database
gpt - Configures OpenAI integration

Examples:

Setup AlphaFold

gget setup alphafold

Setup ELM with custom directory

gget setup elm -o /path/to/elm_data

Python

gget.setup("alphafold")

Common Workflows

Workflow 1: Gene Discovery to Sequence Analysis

Find and analyze genes of interest:

1. Search for genes

results = gget.search(["GABA", "receptor"], species="homo_sapiens")

2. Get detailed information

gene_ids = results["ensembl_id"].tolist()
info = gget.info(gene_ids[:5])

3. Retrieve sequences

sequences = gget.seq(gene_ids[:5], translate=True)

Workflow 2: Sequence Alignment and Structure

Align sequences and predict structures:

1. Align multiple sequences

alignment = gget.muscle("sequences.fasta")

2. Find similar sequences

blast_results = gget.blast(my_sequence, database="swissprot", limit=10)

3. Predict structure

structure = gget.alphafold(my_sequence, plot=True)

4. Find linear motifs

ortholog_df, regex_df = gget.elm(my_sequence)

Workflow 3: Gene Expression and Enrichment

Analyze expression patterns and functional enrichment:

1. Get tissue expression

tissue_expr = gget.archs4("ACE2", which="tissue")

2. Find correlated genes

correlated = gget.archs4("ACE2", which="correlation")

3. Get single-cell data

adata = gget.cellxgene(gene=["ACE2"], tissue="lung", cell_type="epithelial cell")

4. Perform enrichment analysis

gene_list = correlated["gene_symbol"].tolist()[:50]
enrichment = gget.enrichr(gene_list, database="ontology", plot=True)

Workflow 4: Disease and Drug Analysis

Investigate disease associations and therapeutic targets:

1. Search for genes

genes = gget.search(["breast cancer"], species="homo_sapiens")

2. Get disease associations

diseases = gget.opentargets("ENSG00000169194", resource="diseases")

3. Get drug associations

drugs = gget.opentargets("ENSG00000169194", resource="drugs")

4. Query cancer genomics data

study_ids = gget.cbio_search(["breast"])
gget.cbio_plot(study_ids[:2], ["BRCA1", "BRCA2"], stratification="cancer_type")

5. Search COSMIC for mutations

cosmic_results = gget.cosmic("BRCA1", cosmic_tsv_path="cosmic.tsv")

Workflow 5: Comparative Genomics

Compare proteins across species:

1. Get orthologs

orthologs = gget.bgee("ENSG00000169194", type="orthologs")

2. Get sequences for comparison

human_seq = gget.seq("ENSG00000169194", translate=True)
mouse_seq = gget.seq("ENSMUSG00000026091", translate=True)

3. Align sequences

alignment = gget.muscle([human_seq[1], mouse_seq[1]])  # index 1 is the sequence, assuming gget.seq returns [header, sequence]

4. Compare structures

human_structure = gget.pdb("7S7U")
mouse_structure = gget.alphafold(mouse_seq[1])  # pass the sequence string, assuming gget.seq returns [header, sequence]

Workflow 6: Building Reference Indices

Prepare reference data for downstream analysis (e.g., kallisto|bustools):

1. List available species

gget ref --list_species

2. Download reference files

gget ref -w gtf,cdna -d homo_sapiens

3. Build kallisto index

kallisto index -i transcriptome.idx transcriptome.fasta

4. Download genome for alignment

gget ref -w dna -d homo_sapiens

Best Practices

Data Retrieval

Use --limit to control result sizes for large queries
Save results with -o/--out for reproducibility
Check database versions/releases for consistency across analyses
Use --quiet in production scripts to reduce output

Sequence Analysis

For BLAST/BLAT, start with default parameters, then adjust sensitivity
Use gget diamond with --threads for faster local alignment
Save DIAMOND databases with --diamond_db for repeated queries
For multiple sequence alignment, use -s5/--super5 for large datasets

Expression and Disease Data

Gene symbols are case-sensitive in cellxgene (e.g., 'PAX7' vs 'Pax7')
Run gget setup before first use of alphafold, cellxgene, elm, gpt
For enrichment analysis, use database shortcuts for convenience
Cache cBioPortal data with -dd to avoid repeated downloads

Structure Prediction

AlphaFold multimer predictions: use -mr 20 for higher accuracy
Use -r flag for AMBER relaxation of final structures
Visualize results in Python with plot=True
Check PDB database first before running AlphaFold predictions

Error Handling

Database structures change; update gget regularly: uv pip install --upgrade gget
Process max ~1000 Ensembl IDs at once with gget info
For large-scale analyses, implement rate limiting for API queries
Use virtual environments to avoid dependency conflicts

Output Formats

Command-line

Default: JSON
CSV: Add -csv flag
FASTA: gget seq, gget mutate
PDB: gget pdb, gget alphafold
PNG: gget cbio plot

Python

Default: DataFrame or dictionary
JSON: Add json=True parameter
Save to file: Add save=True or specify out="filename"
AnnData: gget cellxgene

Resources

This skill includes reference documentation for detailed module information:

references/
    module_reference.md - Comprehensive parameter reference for all modules
    database_info.md - Information about queried databases and their update frequencies
    workflows.md - Extended workflow examples and use cases

For additional help:

Official documentation: https://pachterlab.github.io/gget/
GitHub issues: https://github.com/pachterlab/gget/issues
Citation: Luebbert, L. & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
