Supported file formats

Category | Formats
Documents | doc, docx, pdf, pptx, txt, md
Spreadsheets | csv, xls, xlsx
Images | JPEG, PNG, GIF, BMP, TIFF, WebP
Audio | MP3, WAV, OGG, FLAC, AAC, M4A, WMA
Video | MP4, MOV, AVI, WMV, FLV, MPEG, MKV, WebM
Data | JSON
Archives | ZIP (automatically extracted and processed)
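
As a rough illustration of the table above, a pre-upload check could map a file name to its category. This is a minimal sketch, not part of VectorShift's API: `SUPPORTED_FORMATS` and `category_for` are hypothetical names, and the extension spellings for some formats (for example `jpg` for JPEG or `tif` for TIFF) are assumptions.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup built from the table above. Extension spellings such as
# "jpg" and "tif" are assumptions; the docs list format names, not extensions.
SUPPORTED_FORMATS = {
    "Documents": {"doc", "docx", "pdf", "pptx", "txt", "md"},
    "Spreadsheets": {"csv", "xls", "xlsx"},
    "Images": {"jpeg", "jpg", "png", "gif", "bmp", "tiff", "tif", "webp"},
    "Audio": {"mp3", "wav", "ogg", "flac", "aac", "m4a", "wma"},
    "Video": {"mp4", "mov", "avi", "wmv", "flv", "mpeg", "mkv", "webm"},
    "Data": {"json"},
    "Archives": {"zip"},
}


def category_for(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if it is not listed."""
    # Only the final suffix is considered, compared case-insensitively.
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```

A caller could use `category_for("report.PDF")` to decide whether a file is worth uploading before sending it.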

Embedding models

Models are available from multiple providers: OpenAI, VoyageAI, Cohere, and Google. The default is openai/text-embedding-3-small, which works well for most use cases. The full list of available models is shown in the dropdown during knowledge base creation.

Processing models

Choose the model that best handles your content type:
Model | Best for
Default | General purpose text extraction
Llama Parse | Structured documents with complex layouts
Textract | Forms and tables (AWS-powered)
Docling | Layout-aware document understanding
Mistral OCR | Scanned documents and images with text
Contextual AI | Context-aware document processing
Reducto | High-fidelity document parsing with layout understanding
Unstructured | Flexible extraction for a wide range of unstructured document types

Splitter methods

Choose the method that matches your content structure:
Method | How it works
Sentence | Splits at sentence boundaries; best for unstructured text like emails and transcripts
Markdown | Splits based on Markdown structure (headings, paragraphs, lists); best for well-structured docs
Dynamic | Adapts its splitting strategy to the content; best for mixed or varied formats

For code files (Python, JavaScript, TypeScript, Go, Rust, SQL, YAML, Dockerfiles, and 100+ other code formats), VectorShift automatically applies a dedicated Code splitter regardless of the default splitter setting. This ensures code is split along meaningful boundaries like functions and classes.
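
The override for code files can be pictured as a simple precedence rule: the file type is checked first, and the configured splitter applies only when the file is not code. The sketch below is an assumption about the behavior described above, not VectorShift's implementation; `CODE_EXTENSIONS` and `pick_splitter` are hypothetical names, and the extension set is a small illustrative subset.

```python
from pathlib import Path

# Illustrative subset only; the docs mention 100+ code formats.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".sql", ".yaml", ".yml"}


def pick_splitter(filename: str, configured_splitter: str) -> str:
    """Return "Code" for code files, otherwise the configured splitter."""
    path = Path(filename)
    # Dockerfiles usually have no extension, so match on the file name.
    if path.suffix.lower() in CODE_EXTENSIONS or path.name == "Dockerfile":
        return "Code"
    return configured_splitter
```

With this rule, `pick_splitter("main.py", "Markdown")` yields `"Code"`, while `pick_splitter("README.md", "Markdown")` keeps the configured `"Markdown"` splitter.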

Document statuses

Status | What it means
Success | Ready to search; fully processed and indexed
Processing | In progress; being chunked, embedded, and indexed
Failed | Something went wrong; retry from the document list
Warning | Partial issues; shown for folders when one or more child items failed to index
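
One way to read the Warning status is as a rollup of a folder's child statuses. The sketch below is hypothetical: only the rule that a failed child produces a Warning comes from the table above, while the handling of Processing children and of empty folders is an assumption made to keep the example complete.

```python
from typing import Iterable


def folder_status(child_statuses: Iterable[str]) -> str:
    """Derive a folder's status from the statuses of its child items."""
    statuses = list(child_statuses)
    # From the table: any failed child surfaces as a Warning on the folder.
    if any(status == "Failed" for status in statuses):
        return "Warning"
    # Assumption: a folder with unfinished children is still Processing.
    if any(status == "Processing" for status in statuses):
        return "Processing"
    # Assumption: all children succeeded (or the folder is empty).
    return "Success"
```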

Available integrations

Suggested apps: OneDrive, SharePoint, Google Drive, Box

All available integrations:
Airtable | Copper
Discord | Gmail
Google Calendar | Google Drive
Google Docs | Google Sheets
Google BigQuery | HubSpot
Linear | OneDrive
Notion | Salesforce
Slack | SugarCRM
Typeform | Dropbox
Dropbox Teams | AWS S3
Confluence Cloud | Confluence Data Center
Zendesk | SharePoint
Supabase S3 | Outlook
Azure Blob Storage | Teams
GHL | ClickUp
Box | Trello
monday.com | Shopify