Inovirus genomes and associated data from "Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes"

Schematic representation of inovirus genome diversity. The six proposed families are highlighted in color and associated with their main host group(s). Groups that include isolates are indicated with a bold outline.


Filamentous single-stranded DNA viruses from the Inoviridae family (inoviruses) are known to infect a limited range of bacterial hosts. Their unique “chronic infection” strategy enables them to propagate with minimal negative impacts on their hosts, while still modulating host cell physiology and pathogenicity. While only 56 distinct genomes are currently available for members of the Inoviridae family, here we provide a database of 10,295 inovirus-like genomes identified from microbial (meta-)genomes using a machine learning approach (

To identify these putative inovirus genomes, a set of reference protein clusters was built from the known Inoviridae genomes. These protein clusters are available as a tar.gz archive: Ref_PCs_inoviruses.tar.gz. For each protein cluster, the archive includes a FASTA file of all sequences, a FASTA file of the multiple alignment, and the corresponding HMM profile. Annotated genomes are available as GenBank files in the Gb_files_inoviruses.tar.gz archive, organized in folders by proposed family and subfamily. Finally, the larger set of protein families derived from the extended genome catalog is available as a tar.gz archive: iPFs_inoviruses.tar.gz. As with the reference protein clusters, this archive includes, for each protein family, a FASTA file of all sequences, a FASTA file of the multiple alignment, and the corresponding HMM profile.
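
As a quick way to check what a downloaded archive contains before unpacking it, the following minimal Python sketch tallies the files in a tar.gz archive by folder and by file suffix. It uses only the standard library. The function name `summarize_archive` is hypothetical, and nothing is assumed about the internal folder names or file extensions, which are not specified here; the sketch simply reports whatever it finds.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(archive_path):
    """Count regular files in a tar.gz archive by parent folder and by suffix.

    Hypothetical helper: it makes no assumption about the archive layout
    and only reports the folders and suffixes actually present.
    """
    by_folder = Counter()
    by_suffix = Counter()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            path = PurePosixPath(member.name)
            by_folder[str(path.parent)] += 1
            by_suffix[path.suffix or "(none)"] += 1
    return by_folder, by_suffix
```

For example, `folders, suffixes = summarize_archive("Gb_files_inoviruses.tar.gz")` followed by `folders.most_common()` would list how many files sit in each folder, which for the GenBank archive should reflect the family and subfamily organization described above. The same call can be used on Ref_PCs_inoviruses.tar.gz and iPFs_inoviruses.tar.gz, with the path adjusted to wherever the files were downloaded.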


Collectively, these represent six distinct proposed inovirus families infecting both bacteria and archaea across virtually every ecosystem. We propose a classification of inovirus diversity into six families based on gene content; these families show coherent host ranges and specific genome features, which strongly suggests that they represent ecologically and evolutionarily meaningful units. We also identified an expansive diversity of toxin-antitoxin systems for maintenance of the viral genome in the host population, alongside evidence of both synergistic (CRISPR evasion) and antagonistic (superinfection exclusion) interactions with co-infecting viruses. Capturing this previously obscured component of the global virosphere opens new avenues for microbial manipulation approaches and innovative biotechnological applications.