Gff Class

Gff

Gff(
    gff_file: str | Path,
    name: str | None = None,
    target_seqid: str | None = None,
    min_range: int | None = None,
    max_range: int | None = None,
)

GFF Parser Class

PARAMETER DESCRIPTION
gff_file

GFF file (*.gz, *.bz2, and *.zip compressed files can also be read)

TYPE: str | Path

name

Name (if None, the file name is used)

TYPE: str | None DEFAULT: None

target_seqid

Target seqid to extract. If None, only the records of the first seqid are extracted.

TYPE: str | None DEFAULT: None

min_range

Minimum of the range to extract. If None, the value is taken from the GFF records.

TYPE: int | None DEFAULT: None

max_range

Maximum of the range to extract. If None, the value is taken from the GFF records.

TYPE: int | None DEFAULT: None
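The two input behaviours described above (reading compressed files, and falling back to the first seqid when target_seqid is None) can be illustrated with a standard-library sketch. This is not the parser's own code: open_gff_text and first_seqid are hypothetical helpers, and choosing the decompressor by file extension and reading only the first member of a zip archive are assumptions.

```python
import bz2
import gzip
import io
import zipfile
from pathlib import Path
from typing import Iterable, Optional, TextIO, Union


def open_gff_text(gff_file: Union[str, Path]) -> TextIO:
    """Open a plain or compressed GFF file as a text stream (hypothetical helper)."""
    path = Path(gff_file)
    suffix = path.suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    if suffix == ".bz2":
        return bz2.open(path, "rt", encoding="utf-8")
    if suffix == ".zip":
        # Assumption: the archive holds a single GFF file, so read its first member.
        with zipfile.ZipFile(path) as archive:
            data = archive.read(archive.namelist()[0])
        return io.StringIO(data.decode("utf-8"))
    return open(path, encoding="utf-8")


def first_seqid(lines: Iterable[str]) -> Optional[str]:
    """Return the seqid (column 1) of the first non-comment, non-blank line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        return line.split("\t")[0]
    return None
```

With a helper like this, a caller that receives no target seqid could read the file once and use the value returned by first_seqid as the default.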

name property

name: str

Name

seq_region property

seq_region: tuple[int, int]

GFF sequence-region start & end tuple

If no ##sequence-region pragma is found, seq_region is (0, max_coords_value).

records property

records: list[GffRecord]

GFF records (target seqid only)

all_records property

all_records: list[GffRecord]

All GFF records

records_within_range property

records_within_range: list[GffRecord]

GFF records within min-max range

range_size property

range_size: int

Range size (max_range - min_range)
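As a rough illustration of how records_within_range and range_size relate, the sketch below filters simplified records against a min-max range. Record is a hypothetical stand-in for GffRecord, and treating "within" as fully contained in the range (rather than merely overlapping it) is an assumption, not documented behaviour.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """Hypothetical, simplified stand-in for a GFF record."""
    seqid: str
    start: int
    end: int


def within_range(records: List[Record], min_range: int, max_range: int) -> List[Record]:
    # Assumption: a record counts as "within" only if it lies entirely inside the range.
    return [r for r in records if min_range <= r.start and r.end <= max_range]


records = [
    Record("chr1", 100, 200),
    Record("chr1", 450, 600),
    Record("chr1", 900, 1200),  # extends past max_range, so it is excluded
]
min_range, max_range = 0, 1000

selected = within_range(records, min_range, max_range)
range_size = max_range - min_range
```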

target_seqid property

target_seqid: str

Target seqid

seqid_list property

seqid_list: list[str]

seqid list

get_seqid2size

get_seqid2size() -> dict[str, int]

Get a dict mapping each seqid to its genome size (complete genome, contig, or scaffold)

By default, the size is defined by the ##sequence-region pragma of the target seqid. If no ##sequence-region pragma is found, the size is defined by the maximum coordinate among the target seqid's features, which may differ from the actual genome size.

RETURNS DESCRIPTION
seqid2size

seqid & genome size dict

TYPE: dict[str, int]
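The pragma-first, max-coordinate-fallback rule described above can be sketched with plain string handling. This is an illustration under assumptions rather than the method's actual code: seqid2size_from_gff is a hypothetical function, it processes every seqid in the input, and it takes the pragma's end value as the size.

```python
from typing import Dict, Iterable


def seqid2size_from_gff(lines: Iterable[str]) -> Dict[str, int]:
    """Map each seqid to a size: pragma end if present, else max feature end."""
    pragma_sizes: Dict[str, int] = {}
    max_coords: Dict[str, int] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##sequence-region"):
            # Pragma format: ##sequence-region <seqid> <start> <end>
            _, seqid, _start, end = line.split()
            pragma_sizes[seqid] = int(end)
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            seqid, end = cols[0], int(cols[4])
            max_coords[seqid] = max(max_coords.get(seqid, 0), end)
    # Pragma values take precedence over the max-coordinate fallback.
    return {**max_coords, **pragma_sizes}


example = [
    "##gff-version 3",
    "##sequence-region chr1 1 5000",
    "chr1\t.\tCDS\t100\t900\t.\t+\t0\tID=cds1",
    "chr2\t.\tCDS\t50\t1200\t.\t+\t0\tID=cds2",
    "chr2\t.\tCDS\t1500\t3100\t.\t-\t0\tID=cds3",
]
sizes = seqid2size_from_gff(example)
```

Here chr1 takes its size from the pragma, while chr2 has no pragma and falls back to the largest feature end coordinate, which is why the fallback can underestimate the true sequence length.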

get_seqid2features

get_seqid2features(
    feature_type: str | None = "CDS",
    target_strand: int | None = None,
    pseudogene: bool | None = False,
) -> dict[str, list[SeqFeature]]

Get a dict mapping seqid to the features of the target seqid genome

PARAMETER DESCRIPTION
feature_type

Feature type (CDS, gene, mRNA, etc.). If None, extract regardless of feature type.

TYPE: str | None DEFAULT: 'CDS'

target_strand

Extract target strand. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

pseudogene

If True, extract only records tagged with pseudo= or pseudogene=. If False, extract only records without a pseudo= or pseudogene= tag. If None, extract regardless of pseudogene tag.

TYPE: bool | None DEFAULT: False

RETURNS DESCRIPTION
seqid2features

seqid & features dict

TYPE: dict[str, list[SeqFeature]]
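The three filters share one pattern: None disables a filter, and any other value must match. A minimal sketch of that logic follows, using a hypothetical Feature tuple instead of a real SeqFeature. The select_features and is_pseudo names, the 1 / -1 strand encoding, and the test for pseudo tags by qualifier key are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical, simplified stand-in for a sequence feature."""
    type: str
    strand: int  # assumption: 1 for forward, -1 for reverse
    qualifiers: Dict[str, str]


def is_pseudo(feature: Feature) -> bool:
    # Assumption: a feature is pseudo if either tag appears among its qualifiers.
    return "pseudo" in feature.qualifiers or "pseudogene" in feature.qualifiers


def select_features(
    features: List[Feature],
    feature_type: Optional[str] = "CDS",
    target_strand: Optional[int] = None,
    pseudogene: Optional[bool] = False,
) -> List[Feature]:
    selected = []
    for feature in features:
        if feature_type is not None and feature.type != feature_type:
            continue
        if target_strand is not None and feature.strand != target_strand:
            continue
        # Tri-state: True keeps only pseudo, False keeps only non-pseudo, None keeps both.
        if pseudogene is not None and is_pseudo(feature) != pseudogene:
            continue
        selected.append(feature)
    return selected


features = [
    Feature("CDS", 1, {"ID": "cds1"}),
    Feature("CDS", -1, {"ID": "cds2", "pseudo": "true"}),
    Feature("gene", 1, {"ID": "gene1"}),
]
default_hits = select_features(features)
pseudo_hits = select_features(features, pseudogene=True)
everything = select_features(features, feature_type=None, pseudogene=None)
```

With the defaults, only the untagged CDS is kept. Passing pseudogene=True keeps only the tagged CDS, and setting both feature_type and pseudogene to None returns all three features.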

extract_features

extract_features(
    feature_type: str | None = "CDS",
    target_strand: int | None = None,
    pseudogene: bool | None = False,
) -> list[SeqFeature]

Extract features within min-max range

PARAMETER DESCRIPTION
feature_type

Feature type (CDS, gene, mRNA, etc.). If None, extract regardless of feature type.

TYPE: str | None DEFAULT: 'CDS'

target_strand

Extract target strand. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

pseudogene

If True, extract only records tagged with pseudo= or pseudogene=. If False, extract only records without a pseudo= or pseudogene= tag. If None, extract all records regardless of pseudogene tag.

TYPE: bool | None DEFAULT: False

RETURNS DESCRIPTION
features

Feature list

TYPE: list[SeqFeature]

extract_exon_features

extract_exon_features(feature_type: str = 'mRNA') -> list[SeqFeature]

Extract exon structure features within min-max range

Exons are extracted by linking each parent feature's ID to the Parent attribute of its exon records.

PARAMETER DESCRIPTION
feature_type

Feature type (e.g. mRNA, ncRNA, etc.)

TYPE: str DEFAULT: 'mRNA'

RETURNS DESCRIPTION
features

Feature list

TYPE: list[SeqFeature]
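To make the ID-Parent relation concrete, the sketch below groups exon coordinates under parent features of a chosen type, working directly on GFF3 lines. It is an illustration only: group_exons_by_parent and parse_attributes are hypothetical helpers, the result is a plain dict of coordinate tuples rather than a list of SeqFeature objects, and range filtering is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attributes column such as 'ID=m1;Parent=g1' into a dict."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if "=" in pair)


def group_exons_by_parent(
    lines: Iterable[str], feature_type: str = "mRNA"
) -> Dict[str, List[Tuple[int, int]]]:
    parent_ids = []  # IDs of parent features with the requested type, in file order
    exons = defaultdict(list)  # Parent ID -> exon (start, end) pairs
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == feature_type and "ID" in attrs:
            parent_ids.append(attrs["ID"])
        elif ftype == "exon" and "Parent" in attrs:
            # GFF3 allows several comma-separated parents per exon.
            for parent_id in attrs["Parent"].split(","):
                exons[parent_id].append((start, end))
    # Keep only exons whose parent has the requested feature type.
    return {pid: sorted(exons[pid]) for pid in parent_ids if pid in exons}


example = [
    "##gff-version 3",
    "chr1\t.\tmRNA\t100\t400\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\t.\texon\t300\t400\t.\t+\t.\tID=e2;Parent=m1",
    "chr1\t.\texon\t100\t200\t.\t+\t.\tID=e1;Parent=m1",
    "chr1\t.\tncRNA\t600\t700\t.\t-\t.\tID=n1",
    "chr1\t.\texon\t600\t700\t.\t-\t.\tID=e3;Parent=n1",
]
mrna_exons = group_exons_by_parent(example)
ncrna_exons = group_exons_by_parent(example, feature_type="ncRNA")
```

Changing feature_type switches which parents are considered, so the same exon records yield different groupings for mRNA and ncRNA.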