Gff Class

Gff

Gff(gff_file: str | Path, *, name: str | None = None, target_seqid: str | None = None)

GFF Parser Class

PARAMETER DESCRIPTION
gff_file

GFF file (*.gz, *.bz2, and *.zip compressed files can also be read)

TYPE: str | Path

name

Name (if None, the file name is used)

TYPE: str | None DEFAULT: None

target_seqid

Target seqid to be extracted. If None, only the record of the first seqid is extracted.

TYPE: str | None DEFAULT: None
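
The note on compressed input can be illustrated with a standard-library sketch. This is not the library's implementation; `read_gff_text` is a hypothetical helper, and it assumes the compression format can be chosen from the file extension and that a zip archive contains a single GFF file.

```python
import bz2
import gzip
import zipfile
from pathlib import Path
from typing import Union


def read_gff_text(gff_file: Union[str, Path]) -> str:
    """Return the decoded text of a plain or compressed GFF file (hypothetical helper)."""
    path = Path(gff_file)
    suffix = path.suffix.lower()
    if suffix == ".gz":
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return f.read()
    if suffix == ".bz2":
        with bz2.open(path, "rt", encoding="utf-8") as f:
            return f.read()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds one GFF file, so read its first member.
            first_member = archive.namelist()[0]
            return archive.read(first_member).decode("utf-8")
    # Anything else is treated as an uncompressed text file.
    return path.read_text(encoding="utf-8")
```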

name property

name: str

Name

seq_region property

seq_region: tuple[int, int]

GFF sequence-region start & end tuple

If the ##sequence-region pragma is not found, seq_region is set to (0, max_coords_value).

records property

records: list[GffRecord]

GFF records (only target seqid)

all_records property

all_records: list[GffRecord]

All GFF records

target_seqid property

target_seqid: str

Target seqid

seqid_list property

seqid_list: list[str]

List of seqids

genome_length property

genome_length: int

Genome length (target seqid record)

full_genome_length property

full_genome_length: int

Full genome length (all records concatenated)

get_seqid2size

get_seqid2size() -> dict[str, int]

Get a dict mapping each seqid to its genome size (complete genome, contig, or scaffold)

By default, size is defined by the ##sequence-region pragma of the target seqid. If ##sequence-region is not found, size is defined by the maximum end coordinate among the target seqid's features, which may differ from the actual genome size.

RETURNS DESCRIPTION
seqid2size

seqid & genome size dict

TYPE: dict[str, int]
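
The pragma-first, max-coordinate-fallback rule described for get_seqid2size can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `seqid2size_from_gff_text` is a hypothetical function, it works on GFF text rather than a Gff instance, and it reports every seqid it encounters.

```python
from typing import Dict


def seqid2size_from_gff_text(gff_text: str) -> Dict[str, int]:
    """Map each seqid to a size: pragma end if present, else max feature end."""
    pragma_size: Dict[str, int] = {}
    max_feature_end: Dict[str, int] = {}
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # Embedded sequence data follows; no more annotation rows.
        if line.startswith("##sequence-region"):
            # Pragma layout: ##sequence-region <seqid> <start> <end>
            parts = line.split()
            if len(parts) >= 4:
                pragma_size[parts[1]] = int(parts[3])
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # Skip malformed rows.
        seqid, end = cols[0], int(cols[4])
        max_feature_end[seqid] = max(max_feature_end.get(seqid, 0), end)
    # Pragma values take priority over the feature-derived fallback.
    return {**max_feature_end, **pragma_size}
```

In the fallback case the reported size is only a lower bound, since nothing guarantees that the last feature reaches the end of the sequence.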

get_seqid2features

get_seqid2features(
    feature_type: str | list[str] | None = "CDS", target_strand: int | None = None
) -> dict[str, list[SeqFeature]]

Get a dict mapping seqid to the features of the target seqid genome

PARAMETER DESCRIPTION
feature_type

Feature type (CDS, gene, mRNA, etc.). If None, extract regardless of feature type.

TYPE: str | list[str] | None DEFAULT: 'CDS'

target_strand

Extract target strand. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
seqid2features

seqid & features dict

TYPE: dict[str, list[SeqFeature]]

extract_features

extract_features(
    feature_type: str | list[str] | None = "CDS",
    *,
    target_strand: int | None = None,
    target_range: tuple[int, int] | None = None
) -> list[SeqFeature]

Extract features

If target_seqid was specified when the Gff instance was initialized, the features of that seqid are extracted. Otherwise, the features of the seqid in the first row are extracted.

PARAMETER DESCRIPTION
feature_type

Feature type (CDS, gene, mRNA, etc.). If None, extract regardless of feature type.

TYPE: str | list[str] | None DEFAULT: 'CDS'

target_strand

Extract target strand. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

target_range

Extract target range. If None, extract regardless of range.

TYPE: tuple[int, int] | None DEFAULT: None

RETURNS DESCRIPTION
features

Feature list

TYPE: list[SeqFeature]
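
How the three filters of extract_features combine can be shown with a small stand-in. The `Feature` dataclass and `filter_features` function below are hypothetical and only mimic the documented parameters. Two assumptions are made: strand uses Biopython's convention (1 for forward, -1 for reverse), and a feature passes target_range only when it lies entirely inside the range.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, Union


@dataclass
class Feature:
    """Minimal stand-in for a sequence feature (not Biopython's SeqFeature)."""
    type: str
    start: int
    end: int
    strand: int


def filter_features(
    features: Sequence[Feature],
    feature_type: Union[str, List[str], None] = "CDS",
    *,
    target_strand: Optional[int] = None,
    target_range: Optional[Tuple[int, int]] = None,
) -> List[Feature]:
    """Keep features that satisfy every filter that is not None."""
    if isinstance(feature_type, str):
        feature_type = [feature_type]
    kept = []
    for feat in features:
        if feature_type is not None and feat.type not in feature_type:
            continue
        if target_strand is not None and feat.strand != target_strand:
            continue
        if target_range is not None:
            low, high = min(target_range), max(target_range)
            # Assumption: require full containment, not mere overlap.
            if not (low <= feat.start <= feat.end <= high):
                continue
        kept.append(feat)
    return kept
```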

extract_exon_features

extract_exon_features(
    feature_type: str = "mRNA",
    *,
    target_strand: int | None = None,
    target_range: tuple[int, int] | None = None
) -> list[SeqFeature]

Extract exon structure features

Extract exons based on the parent feature and the exon ID-Parent relation.

PARAMETER DESCRIPTION
feature_type

Feature type (e.g. mRNA, ncRNA)

TYPE: str DEFAULT: 'mRNA'

target_strand

Extract target strand. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

target_range

Extract target range. If None, extract regardless of range.

TYPE: tuple[int, int] | None DEFAULT: None

RETURNS DESCRIPTION
features

Feature list

TYPE: list[SeqFeature]
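
The ID-Parent relation used to assemble exon structures can be sketched as follows. This is a simplified illustration, not the library's implementation: `group_exons_by_parent` and `parse_attributes` are hypothetical helpers that work on raw GFF3 text and return coordinate tuples instead of SeqFeature objects. The comma-separated Parent handling follows the GFF3 format, which allows a feature to list several parents.

```python
from typing import Dict, List, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column such as 'ID=m1;Parent=g1' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def group_exons_by_parent(
    gff_text: str, feature_type: str = "mRNA"
) -> Dict[str, List[Tuple[int, int]]]:
    """Map each parent feature ID to the sorted (start, end) pairs of its exons."""
    exons_by_parent: Dict[str, List[Tuple[int, int]]] = {}
    exon_rows = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # Skip malformed rows.
        attrs = parse_attributes(cols[8])
        if cols[2] == feature_type and "ID" in attrs:
            exons_by_parent[attrs["ID"]] = []
        elif cols[2] == "exon" and "Parent" in attrs:
            exon_rows.append((attrs["Parent"], int(cols[3]), int(cols[4])))
    # Attach exons after all parents are known, so row order does not matter.
    for parent_value, start, end in exon_rows:
        for parent_id in parent_value.split(","):
            if parent_id in exons_by_parent:
                exons_by_parent[parent_id].append((start, end))
    return {pid: sorted(exons) for pid, exons in exons_by_parent.items()}
```

Exons whose parent is not of the requested feature type are dropped, which mirrors the role of the feature_type parameter above.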