Genbank Class

Genbank

Genbank(
    gbk_source: str | Path | TextIOWrapper,
    name: str | None = None,
    reverse: bool = False,
    min_range: int | None = None,
    max_range: int | None = None,
)

Genbank Parser Class

PARAMETER DESCRIPTION
gbk_source

Genbank file or source (*.gz, *.bz2, and *.zip compressed files are also readable)

TYPE: str | Path | TextIOWrapper

name

Name (if None, the file name or record name is used)

TYPE: str | None DEFAULT: None

reverse

If True, the reverse complement of the genome is used

TYPE: bool DEFAULT: False

min_range

Min range to be extracted (Default: 0)

TYPE: int | None DEFAULT: None

max_range

Max range to be extracted (Default: genome length)

TYPE: int | None DEFAULT: None

name property

name: str

Name

records property

records: list[SeqRecord]

Genbank records

full_genome_length property

full_genome_length: int

Full genome sequence length

genome_length property

genome_length: int

Genome sequence length within the min-max range (same as range_size)

range_size property

range_size: int

Range size (max_range - min_range)

full_genome_seq property

full_genome_seq: str

Full genome sequence

genome_seq property

genome_seq: str

Genome sequence within the min-max range
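
The relationship between the full and range properties can be sketched in plain Python. This is only an illustration, assuming that a `None` bound falls back to the documented default (0 for `min_range`, the full genome length for `max_range`). `resolve_range` is a hypothetical helper and is not part of the Genbank class.

```python
from typing import Optional, Tuple


def resolve_range(
    full_seq: str,
    min_range: Optional[int] = None,
    max_range: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: apply the documented defaults to a range."""
    start = 0 if min_range is None else min_range
    end = len(full_seq) if max_range is None else max_range
    if not 0 <= start < end <= len(full_seq):
        raise ValueError(f"invalid range: {start}-{end}")
    return start, end


full_genome_seq = "ATGCGCGTTAACCGGA"  # toy sequence, 16 bp
start, end = resolve_range(full_genome_seq, min_range=4, max_range=12)

genome_seq = full_genome_seq[start:end]  # analogous to genome_seq
range_size = end - start                 # analogous to range_size / genome_length
```

With no range given, the sketch returns the whole sequence, so the range properties and the full properties coincide.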

calc_genome_gc_content

calc_genome_gc_content() -> float

Calculate genome GC content
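
For reference, GC content is the share of G and C bases in a sequence. The sketch below computes it as a percentage in plain Python. Whether the method itself returns a percentage or a fraction is not stated here, so the scale is an assumption, and `gc_content_percent` is a hypothetical name.

```python
def gc_content_percent(seq: str) -> float:
    """Return GC content as a percentage (assumed scale) of the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid division by zero on an empty sequence
    gc_count = seq.count("G") + seq.count("C")
    return 100 * gc_count / len(seq)


print(gc_content_percent("ATGC"))  # half of the bases are G or C
```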

calc_gc_skew

calc_gc_skew(
    window_size: int | None = None,
    step_size: int | None = None,
    *,
    seq: str | None = None
) -> tuple[np.ndarray, np.ndarray]

Calculate GC skew in sliding window

PARAMETER DESCRIPTION
window_size

Window size (Default: genome_size / 500)

TYPE: int | None DEFAULT: None

step_size

Step size (Default: genome_size / 1000)

TYPE: int | None DEFAULT: None

seq

Sequence for GC skew calculation (Default: self.genome_seq)

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
gc_skew_result_tuple

Position list & GC skew list

TYPE: tuple[ndarray, ndarray]
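
A sliding-window GC skew calculation can be sketched as follows, using the usual definition (G - C) / (G + C) and the documented default window and step sizes. This is a simplified illustration, not the library's implementation: it anchors each window at its left edge (the library may position windows differently) and returns plain lists instead of NumPy arrays. `gc_skew` and `sliding_gc_skew` are hypothetical names.

```python
from typing import List, Optional, Tuple


def gc_skew(seq: str) -> float:
    """GC skew of one window: (G - C) / (G + C), or 0.0 if no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def sliding_gc_skew(
    seq: str,
    window_size: Optional[int] = None,
    step_size: Optional[int] = None,
) -> Tuple[List[int], List[float]]:
    """Return (positions, skews) for left-anchored sliding windows."""
    # Documented defaults: genome_size / 500 and genome_size / 1000
    window_size = window_size or max(1, len(seq) // 500)
    step_size = step_size or max(1, len(seq) // 1000)
    positions, skews = [], []
    for pos in range(0, len(seq), step_size):
        positions.append(pos)
        skews.append(gc_skew(seq[pos : pos + window_size]))
    return positions, skews


positions, skews = sliding_gc_skew("GGGGCCCC", window_size=4, step_size=4)
```

In this toy input the first window is all G (skew 1.0) and the second is all C (skew -1.0), which is the sign change typically looked for when plotting GC skew along a genome.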

calc_gc_content

calc_gc_content(
    window_size: int | None = None,
    step_size: int | None = None,
    *,
    seq: str | None = None
) -> tuple[np.ndarray, np.ndarray]

Calculate GC content in sliding window

PARAMETER DESCRIPTION
window_size

Window size (Default: genome_size / 500)

TYPE: int | None DEFAULT: None

step_size

Step size (Default: genome_size / 1000)

TYPE: int | None DEFAULT: None

seq

Sequence for GC content calculation (Default: self.genome_seq)

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
gc_content_result_tuple

Position list & GC content list

TYPE: tuple[ndarray, ndarray]

get_seqid2seq

get_seqid2seq() -> dict[str, str]

Get seqid & complete/contig/scaffold genome sequence dict

RETURNS DESCRIPTION
seqid2seq

seqid & genome sequence dict

TYPE: dict[str, str]

get_seqid2size

get_seqid2size() -> dict[str, int]

Get seqid & complete/contig/scaffold genome size dict

RETURNS DESCRIPTION
seqid2size

seqid & genome size dict

TYPE: dict[str, int]
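
The shape of the two dictionaries returned by get_seqid2seq and get_seqid2size can be illustrated with toy data. The record IDs and sequences below are made up, and the `(seqid, sequence)` tuples only stand in for parsed Genbank records.

```python
# Toy stand-ins for parsed records: (seqid, sequence) pairs
records = [("contig_1", "ATGCATGC"), ("contig_2", "GGCC")]

# seqid -> sequence, as in the get_seqid2seq return value
seqid2seq = {seqid: seq for seqid, seq in records}

# seqid -> sequence length, as in the get_seqid2size return value
seqid2size = {seqid: len(seq) for seqid, seq in seqid2seq.items()}
```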

get_seqid2features

get_seqid2features(
    feature_type: str | None = "CDS",
    target_strand: int | None = None,
    pseudogene: bool | None = False,
) -> dict[str, list[SeqFeature]]

Get dict of seqid & features found in each seqid's genome

PARAMETER DESCRIPTION
feature_type

Feature type (CDS, gene, mRNA, etc.). If None, extract regardless of feature type.

TYPE: str | None DEFAULT: 'CDS'

target_strand

Extract target strand. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

pseudogene

If True, extract only records tagged with pseudo= or pseudogene=. If False, extract only records without a pseudo= or pseudogene= tag. If None, extract regardless of pseudogene tag.

TYPE: bool | None DEFAULT: False

RETURNS DESCRIPTION
seqid2features

seqid & features dict

TYPE: dict[str, list[SeqFeature]]
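
The three filters described above, in particular the tri-state `pseudogene` flag, can be sketched in plain Python. `ToyFeature`, `is_pseudo`, and `filter_features` are hypothetical names used only for illustration; `ToyFeature` stands in for a SeqFeature, and the sketch is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyFeature:
    """Hypothetical stand-in for a SeqFeature."""
    type: str
    strand: int
    qualifiers: dict = field(default_factory=dict)


def is_pseudo(feature: ToyFeature) -> bool:
    """True if the feature carries a pseudo or pseudogene qualifier."""
    return "pseudo" in feature.qualifiers or "pseudogene" in feature.qualifiers


def filter_features(
    features: List[ToyFeature],
    feature_type: Optional[str] = "CDS",
    target_strand: Optional[int] = None,
    pseudogene: Optional[bool] = False,
) -> List[ToyFeature]:
    """Apply each filter only when its argument is not None."""
    selected = []
    for feature in features:
        if feature_type is not None and feature.type != feature_type:
            continue
        if target_strand is not None and feature.strand != target_strand:
            continue
        if pseudogene is not None and is_pseudo(feature) != pseudogene:
            continue
        selected.append(feature)
    return selected


features = [
    ToyFeature("CDS", 1),
    ToyFeature("CDS", -1, {"pseudo": [""]}),
    ToyFeature("gene", 1),
]
```

Passing `None` for any argument disables that filter, so `filter_features(features, feature_type=None, pseudogene=None)` returns every feature.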

extract_features

extract_features(
    feature_type: str = "CDS",
    target_strand: int | None = None,
    fix_position: bool = False,
    allow_partial: bool = False,
    pseudogene: bool = False,
) -> list[SeqFeature]

Extract features within min-max range

PARAMETER DESCRIPTION
feature_type

Extract feature type

TYPE: str DEFAULT: 'CDS'

target_strand

Extract target strand

TYPE: int | None DEFAULT: None

fix_position

If True, shift feature start & end positions by the specified min_range parameter (fixed_start = start - min_range, fixed_end = end - min_range)

TYPE: bool DEFAULT: False

allow_partial

If True, allow extraction of features that are partially included in range

TYPE: bool DEFAULT: False

pseudogene

If True and feature_type='CDS', only extract CDS features with /pseudo or /pseudogene qualifiers.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
features

Extracted features

TYPE: list[SeqFeature]
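
The effect of `fix_position` and `allow_partial` can be sketched with features reduced to `(start, end)` tuples. This is an illustration under assumptions: the exact boundary handling in the library (for example, whether partial features are clipped to the range) is not stated here, and `select_in_range` is a hypothetical helper.

```python
from typing import List, Tuple


def select_in_range(
    features: List[Tuple[int, int]],
    min_range: int,
    max_range: int,
    fix_position: bool = False,
    allow_partial: bool = False,
) -> List[Tuple[int, int]]:
    """Select (start, end) features within a min-max range."""
    selected = []
    for start, end in features:
        fully_inside = min_range <= start and end <= max_range
        overlaps = start < max_range and end > min_range
        if not (fully_inside or (allow_partial and overlaps)):
            continue
        if fix_position:
            # Documented shift: fixed = original - min_range.
            # Partial features are not clipped in this sketch, so a
            # shifted start can be negative.
            start, end = start - min_range, end - min_range
        selected.append((start, end))
    return selected


features = [(50, 150), (200, 300), (450, 550)]
inside = select_in_range(features, 100, 500)
shifted = select_in_range(features, 100, 500, fix_position=True)
partial = select_in_range(features, 100, 500, allow_partial=True)
```

Only the middle feature lies fully inside the 100-500 range; with `fix_position=True` its coordinates become relative to the start of the range.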

write_cds_fasta

write_cds_fasta(
    fasta_outfile: str | Path,
    seqtype: str = "protein",
    fix_position: bool = False,
    allow_partial: bool = False,
)

Write CDS features to a fasta file (protein or nucleotide sequences, depending on seqtype)

PARAMETER DESCRIPTION
fasta_outfile

CDS fasta file

TYPE: str | Path

seqtype

Sequence type (protein|nucleotide)

TYPE: str DEFAULT: 'protein'

fix_position

If True, shift feature start & end positions by the specified min_range parameter (fixed_start = start - min_range, fixed_end = end - min_range)

TYPE: bool DEFAULT: False

allow_partial

If True, features that are partially included in range are also extracted

TYPE: bool DEFAULT: False

write_genome_fasta

write_genome_fasta(outfile: str | Path) -> None

Write genome fasta file

PARAMETER DESCRIPTION
outfile

Output genome fasta file

TYPE: str | Path