Genbank Class

Genbank

Genbank(
    gbk_source: str | Path | TextIOWrapper | list[SeqRecord], *, name: str | None = None
)

Genbank Parser Class

PARAMETER DESCRIPTION
gbk_source

Genbank file or source. Compressed files (*.gz, *.bz2, *.zip) can also be read.

TYPE: str | Path | TextIOWrapper | list[SeqRecord]

name

Name. If None, the file name or record name is used.

TYPE: str | None DEFAULT: None
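
The note on compressed input can be pictured with a small standard-library sketch. This is not the class's own loading code: `open_text` is a hypothetical helper, and both the suffix-based dispatch and the single-member zip handling are assumptions made for illustration.

```python
from __future__ import annotations

import bz2
import gzip
import io
import zipfile
from pathlib import Path
from typing import IO


def open_text(path: str | Path) -> IO[str]:
    """Open a plain or compressed file as a text stream (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, mode="rt", encoding="utf-8")
    if suffix == ".bz2":
        return bz2.open(path, mode="rt", encoding="utf-8")
    if suffix == ".zip":
        # Assumption: the archive holds one member, which is read in full.
        with zipfile.ZipFile(path) as zf:
            member = zf.namelist()[0]
            return io.StringIO(zf.read(member).decode("utf-8"))
    # Anything else is treated as an uncompressed text file.
    return open(path, encoding="utf-8")
```

A stream opened this way is a text handle, one of the kinds of source `gbk_source` is typed to accept.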

name property

name: str

Name

records property

records: list[SeqRecord]

Genbank records

genome_seq property

genome_seq: str

Genome sequence (only first record)

genome_length property

genome_length: int

Genome length (only first record)

full_genome_seq property

full_genome_seq: str

Full genome sequence (all records concatenated)

full_genome_length property

full_genome_length: int

Full genome length (total across all records)
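
The difference between the first-record properties and the `full_` properties matters for multi-record files such as draft assemblies. A minimal sketch, using hypothetical `(record id, sequence)` pairs in place of real `SeqRecord` objects:

```python
# Hypothetical stand-in for parsed records: (record id, sequence) pairs.
records = [("contig_1", "ATGCGC"), ("contig_2", "TTAA"), ("contig_3", "GG")]

# First record only, as described for genome_seq / genome_length.
genome_seq = records[0][1]
genome_length = len(genome_seq)

# All records joined in file order, as described for the full_* properties.
full_genome_seq = "".join(seq for _, seq in records)
full_genome_length = len(full_genome_seq)

print(genome_seq, genome_length)            # ATGCGC 6
print(full_genome_seq, full_genome_length)  # ATGCGCTTAAGG 12
```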

calc_genome_gc_content

calc_genome_gc_content(seq: str | None = None) -> float

Calculate genome GC content

PARAMETER DESCRIPTION
seq

Sequence for GC content calculation (Default: self.genome_seq)

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
gc_content

GC content

TYPE: float
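
As an illustration of the quantity involved, GC content is the share of G and C bases in a sequence. The sketch below is not the library's implementation: `gc_content_percent` is a hypothetical name, and reporting a percentage (rather than a 0 to 1 fraction) and ignoring ambiguity codes are assumptions.

```python
def gc_content_percent(seq: str) -> float:
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    gc = upper.count("G") + upper.count("C")
    # Assumption: result is expressed as a percentage of all bases.
    return 100.0 * gc / len(upper)


print(gc_content_percent("ATGC"))  # 50.0
print(gc_content_percent("ggcc"))  # 100.0
```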

calc_gc_skew

calc_gc_skew(
    window_size: int | None = None,
    step_size: int | None = None,
    *,
    seq: str | None = None
) -> tuple[NDArray[np.int64], NDArray[np.float64]]

Calculate GC skew in sliding window

PARAMETER DESCRIPTION
window_size

Window size (Default: genome_size / 500)

TYPE: int | None DEFAULT: None

step_size

Step size (Default: genome_size / 1000)

TYPE: int | None DEFAULT: None

seq

Sequence for GC skew calculation (Default: self.genome_seq)

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
pos_list

Position list

TYPE: NDArray[int64]

gc_skew_list

GC skew list

TYPE: NDArray[float64]
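
GC skew is conventionally (G - C) / (G + C), computed per window. The sketch below shows one way a sliding-window calculation with the documented defaults could look. It is not the library's code: `gc_skew_windows` is a hypothetical name, windows are left-aligned at each step position (the library may place them differently), integer division is assumed for the defaults, and plain lists are returned instead of NumPy arrays.

```python
def gc_skew_windows(seq, window_size=None, step_size=None):
    """Return (positions, skews) for left-aligned sliding windows."""
    n = len(seq)
    # Documented defaults: genome_size / 500 and genome_size / 1000.
    # Assumption: integer division, with a floor of 1 for short sequences.
    if window_size is None:
        window_size = max(1, n // 500)
    if step_size is None:
        step_size = max(1, n // 1000)

    upper = seq.upper()
    positions, skews = [], []
    for start in range(0, n, step_size):
        window = upper[start:start + window_size]
        g, c = window.count("G"), window.count("C")
        # A window without any G or C gets a skew of 0.0.
        skews.append((g - c) / (g + c) if g + c else 0.0)
        positions.append(start)
    return positions, skews


print(gc_skew_windows("GGGGCCCC", window_size=4, step_size=2))
# ([0, 2, 4, 6], [1.0, 0.0, -1.0, -1.0])
```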

calc_gc_content

calc_gc_content(
    window_size: int | None = None,
    step_size: int | None = None,
    *,
    seq: str | None = None
) -> tuple[NDArray[np.int64], NDArray[np.float64]]

Calculate GC content in sliding window

PARAMETER DESCRIPTION
window_size

Window size (Default: genome_size / 500)

TYPE: int | None DEFAULT: None

step_size

Step size (Default: genome_size / 1000)

TYPE: int | None DEFAULT: None

seq

Sequence for GC content calculation (Default: self.genome_seq)

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
pos_list

Position list

TYPE: NDArray[int64]

gc_content_list

GC content list

TYPE: NDArray[float64]
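
To make the default window and step sizes concrete, the sketch below works them out for a hypothetical 5 Mbp genome and then applies a windowed GC content calculation to a toy sequence. The function name, left-aligned windows, integer division, and percentage scale are all assumptions, not the library's confirmed behaviour.

```python
def gc_content_windows(seq, window_size, step_size):
    """Return (positions, GC percentages) for left-aligned sliding windows."""
    upper = seq.upper()
    positions, contents = [], []
    for start in range(0, len(upper), step_size):
        window = upper[start:start + window_size]
        gc = window.count("G") + window.count("C")
        contents.append(100.0 * gc / len(window))
        positions.append(start)
    return positions, contents


# Documented defaults applied to a hypothetical 5,000,000 bp genome.
genome_size = 5_000_000
window_size = genome_size // 500   # assumption: integer division
step_size = genome_size // 1000
print(window_size, step_size)                   # 10000 5000
print(len(range(0, genome_size, step_size)))    # 1000

print(gc_content_windows("GGGGATAT", window_size=4, step_size=4))
# ([0, 4], [100.0, 0.0])
```

With these defaults each window is twice the step, so under this left-aligned scheme consecutive windows overlap by half.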

get_seqid2seq

get_seqid2seq() -> dict[str, str]

Get a dict mapping each seqid to its genome sequence (complete genome, contig, or scaffold)

RETURNS DESCRIPTION
seqid2seq

seqid & genome sequence dict

TYPE: dict[str, str]

get_seqid2size

get_seqid2size() -> dict[str, int]

Get a dict mapping each seqid to its genome size (complete genome, contig, or scaffold)

RETURNS DESCRIPTION
seqid2size

seqid & genome size dict

TYPE: dict[str, int]
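
The shape of the two dicts can be sketched with hypothetical `(seqid, sequence)` pairs standing in for parsed records. The ids and sequences are made up for illustration.

```python
# Hypothetical stand-in for parsed records: (seqid, sequence) pairs.
records = [("contig_1", "ATGCGC"), ("contig_2", "TTAA")]

# One entry per record, keyed by seqid.
seqid2seq = {seqid: seq for seqid, seq in records}
seqid2size = {seqid: len(seq) for seqid, seq in seqid2seq.items()}

print(seqid2seq)   # {'contig_1': 'ATGCGC', 'contig_2': 'TTAA'}
print(seqid2size)  # {'contig_1': 6, 'contig_2': 4}
```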

get_seqid2features

get_seqid2features(
    feature_type: str | list[str] | None = "CDS", target_strand: int | None = None
) -> dict[str, list[SeqFeature]]

Get a dict mapping each seqid to the features found on that sequence

PARAMETER DESCRIPTION
feature_type

Feature type (CDS, gene, mRNA, etc.). If None, extract regardless of feature type.

TYPE: str | list[str] | None DEFAULT: 'CDS'

target_strand

Target strand to extract. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

RETURNS DESCRIPTION
seqid2features

seqid & features dict

TYPE: dict[str, list[SeqFeature]]

extract_features

extract_features(
    feature_type: str | list[str] | None = "CDS",
    *,
    target_strand: int | None = None,
    target_range: tuple[int, int] | None = None
) -> list[SeqFeature]

Extract features (only first record)

PARAMETER DESCRIPTION
feature_type

Feature type (CDS, gene, mRNA, etc.). If None, extract regardless of feature type.

TYPE: str | list[str] | None DEFAULT: 'CDS'

target_strand

Target strand to extract. If None, extract regardless of strand.

TYPE: int | None DEFAULT: None

target_range

Target range to extract. If None, extract regardless of range.

TYPE: tuple[int, int] | None DEFAULT: None

RETURNS DESCRIPTION
features

Extracted features

TYPE: list[SeqFeature]
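
The three filters combine as successive conditions, each skipped when its argument is None. The sketch below illustrates that logic with a hypothetical `Feature` dataclass in place of Biopython's `SeqFeature`. Requiring a feature to lie entirely inside `target_range` is an assumption; the library may treat partial overlaps differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature."""
    type: str
    start: int
    end: int
    strand: int


def filter_features(features, feature_type="CDS", target_strand=None, target_range=None):
    """Keep features matching the type, strand, and range filters."""
    if isinstance(feature_type, str):
        feature_type = [feature_type]
    selected = []
    for f in features:
        if feature_type is not None and f.type not in feature_type:
            continue
        if target_strand is not None and f.strand != target_strand:
            continue
        if target_range is not None:
            lo, hi = min(target_range), max(target_range)
            # Assumption: the feature must lie entirely inside the range.
            if not (lo <= f.start <= f.end <= hi):
                continue
        selected.append(f)
    return selected


features = [
    Feature("CDS", 0, 300, 1),
    Feature("CDS", 500, 900, -1),
    Feature("tRNA", 1000, 1075, 1),
]
print([f.start for f in filter_features(features)])                    # [0, 500]
print([f.start for f in filter_features(features, target_strand=-1)])  # [500]
print([f.start for f in filter_features(features, None, target_range=(400, 1100))])
# [500, 1000]
```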

write_cds_fasta

write_cds_fasta(outfile: str | Path) -> None

Write CDS fasta file

PARAMETER DESCRIPTION
outfile

Output CDS fasta file

TYPE: str | Path

write_genome_fasta

write_genome_fasta(outfile: str | Path) -> None

Write genome fasta file

PARAMETER DESCRIPTION
outfile

Output genome fasta file

TYPE: str | Path
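
For reference, a FASTA file is a series of `>`-prefixed header lines, each followed by wrapped sequence lines. The sketch below formats hypothetical `(seqid, sequence)` pairs that way. It is not the library's writer: `format_fasta`, the header content, and the line width are assumptions.

```python
def format_fasta(records, line_width=60):
    """Format (seqid, sequence) pairs as FASTA text."""
    lines = []
    for seqid, seq in records:
        lines.append(f">{seqid}")
        # Wrap the sequence into fixed-width lines.
        lines.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    return "\n".join(lines) + "\n"


records = [("contig_1", "ATGCGCTTAA"), ("contig_2", "GGCC")]
print(format_fasta(records, line_width=4), end="")
# >contig_1
# ATGC
# GCTT
# AA
# >contig_2
# GGCC
```

Writing the returned string with `Path(outfile).write_text(...)` would produce the file on disk.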