Skip to main content
The Table class is the SDK surface for VectorShift’s managed structured store. Define columns with typed formats, fill cells by hand or via an AI generator (a Pipeline or an Agent), then query, aggregate, import, and export — all with the same fluent filter dataclasses.
Prerequisites: Installed SDK · API key set · Python 3.10+.

Mental model

  • A Table is a named, schema’d collection of rows. Each column has a typed ColumnFormat — string, number, timestamp, select, file, image, knowledge-base, and a few list variants.
  • A column can carry a ColumnGenerator — either a PipelineGenerator or an AgentGenerator — and table.run(columns=[...]) fills those cells in bulk. This is how a Table becomes a workflow surface, not just a data store.
  • Queries take a CompoundFilter: groups of FilterCondition joined with AND/OR, with order_by / paginate chained on. The same filter object drives read_rows, update_rows, delete_rows, aggregate, run, and export.
  • Long-running operations (run, import_file) return a task handle (TableRunTask / TableImportTask) you poll, or use the _and_wait variant which blocks until terminal status.
  • Every method has an async variant (anew, ainsert_rows, aread_rows, arun_and_wait, ascroll, …).
AI-generated columns are the headline feature. A column with a PipelineGenerator or AgentGenerator attached gets its cells filled by running the bound Pipeline/Agent on the other column values in that row. See PipelineGenerator and AgentGenerator in the reference.

Quick start

from vectorshift.table import (
    ColumnSpec, NumberFormat, NumberKind, StringFormat, Table,
)

# Create with a typed schema.
t = Table.new(
    name="Vendor scorecard",
    columns=[
        ColumnSpec(name="vendor", format=StringFormat()),
        ColumnSpec(name="region", format=StringFormat()),
        ColumnSpec(name="amount", format=NumberFormat(number_kind=NumberKind.FLOAT)),
    ],
)

# Insert rows.
t.insert_rows([
    {"vendor": "Acme",  "region": "EU", "amount": 1200.5},
    {"vendor": "Beta",  "region": "EU", "amount": 49.9},
    {"vendor": "Gamma", "region": "US", "amount": 980.0},
])

# Read all rows.
page = t.read_rows()
print(f"{page['total']} rows")

Reading rows at scale

read_rows(...) returns one page of results (about 1,000 rows by default). For tables larger than that, iterate with scroll (or ascroll for async) — it pages through every matching row using the same filters and columns arguments:
for chunk in t.scroll(page_size=500, filters=my_filter):
    process(chunk["rows"])
Reach for read_rows when you know the result set is small and you want a RowsPage in one call. Reach for scroll when you need every matching row, or when you don’t know how large the table is.

Pipeline and Agent generated columns

The generation field on a ColumnSpec binds a column to a Pipeline or an Agent. table.run(columns=[...]) then fills those cells row-by-row using the other column values in each row as inputs. In input_mapping, keys are the Pipeline/Agent input names and values are the table-column names the engine reads for each row. So {"vendor_name": "vendor"} means “send the value of the vendor column into the vendor_name input.”
from vectorshift.pipeline import Pipeline
from vectorshift.table import (
    ColumnSpec, PipelineGenerator, StringFormat, Table,
)

summarizer = Pipeline.fetch(name="vendor-summarizer")

t = Table.new(
    name="Vendor summaries",
    columns=[
        ColumnSpec(name="vendor", format=StringFormat()),
        ColumnSpec(name="notes",  format=StringFormat()),
        ColumnSpec(
            name="summary",
            format=StringFormat(),
            generation=PipelineGenerator(
                pipeline=summarizer,
                input_mapping={"vendor_name": "vendor", "raw_notes": "notes"},
                output_name="summary",
            ),
        ),
    ],
)

t.insert_rows([
    {"vendor": "Acme",  "notes": "fast shipping, premium pricing"},
    {"vendor": "Beta",  "notes": "slow but cheap"},
])

# Fire the generator for the 'summary' column on every row.
task = t.run_and_wait(columns=["summary"])
print(task.status, task.rows_processed)
See PipelineGenerator and AgentGenerator in the reference for every field.

Column formats

Each column carries a ColumnFormat — pick the variant that matches the data:
VariantUse for
StringFormatFree-text
BoolFormatTrue / false flags
NumberFormatint / float / currency / percent via NumberKind
TimestampFormatDates / datetimes with a display format
SingleSelectFormat / MultiSelectFormatEnum-style choices with SelectOption
FileFormat / AudioFormat / ImageFormatSingle file uploads
KnowledgeBaseFormatCell points at a KnowledgeBase
ListOfFilesFormat / ListOfStringsFormatLists of files / strings
add_columns also accepts a shorthand {name: kind_string} dict — handy for quick smoke tests; the typed ColumnSpec path is recommended for real schemas.

Filtering, ordering, paginating

Every query verb takes the same CompoundFilter. Build one with CompoundFilter.of(...) for a single-group AND/OR, or construct nested groups directly:
from vectorshift.table import (
    CompoundFilter, FilterCondition, FilterGroup,
    RelationalOperator, TableFilterOperator,
)

# (region == "EU") AND (priority > 2 OR status == "closed")
filt = CompoundFilter(
    groups=[
        FilterGroup(conditions=[
            FilterCondition("region", TableFilterOperator.EQ, "EU"),
        ]),
        FilterGroup(
            conditions=[
                FilterCondition("priority", TableFilterOperator.GT, 2),
                FilterCondition("status", TableFilterOperator.EQ, "closed"),
            ],
            logical_operator=RelationalOperator.OR,
        ),
    ],
    group_logical_operator=RelationalOperator.AND,
).order_by("priority", desc=True).paginate(limit=10)

page = t.read_rows(filters=filt)
See the TableFilterOperator reference for every operator and what it matches, or the Common filter recipes example for copy-paste snippets per use case.

Long-running operations

run and import_file are async on the server. Two surfaces:
VerbReturns immediatelyBlocks until done
run (fill generator columns)TableRunTaskrun_and_wait
import_file (CSV / XLSX)TableImportTaskimport_file_and_wait
Poll the bare variant with run_status(task_id) / import_file_status(task_id), or just call _and_wait with a timeout and poll_interval.

What’s next

Reference

Every public method, grouped by topic.

End-to-end walkthrough

Build a vendor scorecard from schema to export.

Common filter recipes

One snippet per operator, grouped by use case.