Walkthrough

Get a bird's-eye view of the SDK's functionality through an annotated example.

Let's say we want to build and run a simple pipeline using the Python SDK. This walkthrough will give you a way to construct and view a pipeline, while also introducing the major building blocks of the SDK along the way.

The ultimate pipeline we'll build is a copy of the Personalized Email Generator introduced as an example pipeline under the no-code documentation. As a brief overview, the pipeline takes in a text input of a company's website URL, queries a VectorDB to get information about the company, and passes the information into an LLM to generate a personalized email for outreach to the company (as a consulting firm looking to improve their operations).

Defining Pipeline Nodes

VectorShift pipelines are built from nodes, which represent units of computation, and edges, which define how the outputs of one node are fed as inputs to others. At a high level, the SDK works the same way: we create a Pipeline object by composing nodes of different classes and parameters. The node classes correspond closely to the nodes available in the no-code editor. Once initialized, each node object has one or more outputs, which we can pass into later nodes' constructors.

In essence, most nodes will expect to take in the outputs of other nodes when initialized, which adds an edge in the computation graph between the nodes. So it makes the most sense to build the pipeline in order, starting with inputs and sequentially feeding node outputs as inputs into later nodes.
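
To make the idea of "constructor arguments become edges" concrete, here is a minimal toy model. It is a sketch only: ToyNode, ToyOutput, and edges are hypothetical names invented for illustration and are not part of the VectorShift SDK.

```python
# Toy model only: these classes are NOT the VectorShift SDK.
from dataclasses import dataclass, field


@dataclass
class ToyOutput:
    """A handle to one named output of a node."""
    source: "ToyNode"
    field_name: str = "output"


@dataclass
class ToyNode:
    """A node that records which upstream outputs it consumes."""
    name: str
    inputs: dict = field(default_factory=dict)  # input name -> ToyOutput

    def output(self) -> ToyOutput:
        return ToyOutput(source=self)


def edges(nodes):
    """List (source, target, input name) triples implied by the constructors."""
    return [
        (out.source.name, node.name, input_name)
        for node in nodes
        for input_name, out in node.inputs.items()
    ]


url_in = ToyNode("input_1")
loader = ToyNode("url_loader", inputs={"url_input": url_in.output()})
print(edges([url_in, loader]))
```

Passing url_in.output() into the second constructor is what records the edge; nothing else has to be declared separately.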

Input

Our pipeline takes in one input, which is of type text (the URL). Correspondingly, there's an InputNode class that we can use to represent this input, which requires a name and data type.

The data type is more than a constructor argument here. Behind the scenes, node outputs are tagged with different types (e.g. LLMs produce textual output), which can help catch issues with pipelines before they're saved to the VectorShift platform. We list the expected types of different nodes' inputs and outputs in node-specific documentation.

import vectorshift
# TextNode is imported here because later steps of the walkthrough use it.
from vectorshift.node import InputNode, TextNode, URLLoaderNode, VectorQueryNode

input_node = InputNode(name="input_1", input_type="text")
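
As a rough illustration of why tagging outputs with types is useful, the sketch below checks a connection before anything is saved. TypedOutput and check_input are hypothetical names for this example, and the type labels are assumptions; the SDK's actual checks are described in the node-specific documentation.

```python
# Toy model only: hypothetical names, not the SDK's real type system.
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedOutput:
    source: str      # name of the producing node
    data_type: str   # an assumed label such as "text" or "file"


def check_input(node_name: str, expected: str, value: TypedOutput) -> None:
    """Raise early if the producer's type doesn't match what the consumer expects."""
    if value.data_type != expected:
        raise TypeError(
            f"{node_name} expects {expected!r} but {value.source} "
            f"produces {value.data_type!r}"
        )


url_text = TypedOutput(source="input_1", data_type="text")
check_input("url_loader", "text", url_text)  # matching types: passes silently

try:
    check_input("some_file_node", "file", url_text)  # mismatch: raises
except TypeError as err:
    print(err)
```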

Each node has one or more outputs with different names that can be fed as inputs into later nodes. This is represented for all node objects by a dictionary of names to output objects (NodeOutputs), given by the method outputs(). The majority of nodes, however, only have one output, in which case we can call the output() method directly to get the output object.

There's a node class that can load the contents of a URL and return the data retrieved, which seems useful here. We'll take the output of input_node and feed it into the constructor, which expects a url_input argument:

url_loader = URLLoaderNode(url_input=input_node.output())

The semantic meaning of the line above is that the output of the overall pipeline input gets fed as the input to the URL loader node.

Querying the VectorDB

The output of url_loader can be used to query a VectorDB. As in the no-code walkthrough, we have a corresponding VectorQueryNode that takes one or more queries and one or more document inputs. This node will work by first loading the content given by the URL into a temporary VectorDB and then using the string query to perform a semantic search over the VectorDB to retrieve the relevant contents of what we loaded from the URL.

question_text = TextNode(text="How can this company grow?")

vector_query = VectorQueryNode(
    query_input=[question_text.output()], 
    documents_input=[url_loader.output()]
)

Let's say we want to combine the query output with the question we used in the query. To do this, we can introduce another TextNode that includes the outputs of these two nodes as variables inside the node text. The functionality is the same as on the no-code platform: each variable is wrapped in double curly braces, {{}}, and we expect an input to be passed in for each variable name.

prompt_text = TextNode(
    text="Company Context: {{Context}}\n Question: {{Question}}",
    text_inputs={
        "Context": vector_query.output(),
        "Question": question_text.output()
    }
)

We introduced two variables, Context and Question, so we correspondingly pass in a text_inputs argument of previous nodes' outputs keyed by those variable names.
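
The relationship between the variables in the text and the keys of text_inputs can be sketched in plain Python. This is not how the SDK is implemented; template_variables and render are hypothetical helpers, and the sketch assumes variable names consist of letters, digits, and underscores.

```python
import re

# Assumption: a variable is {{name}} with a word-character name.
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def template_variables(text: str) -> list[str]:
    """Return variable names in order of first appearance."""
    seen = []
    for name in VARIABLE.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def render(text: str, values: dict[str, str]) -> str:
    """Fill in each {{name}}; reject missing or unused keys."""
    names = set(template_variables(text))
    if names != set(values):
        raise ValueError(f"variables {sorted(names)} != inputs {sorted(values)}")
    return VARIABLE.sub(lambda m: values[m.group(1)], text)


template = "Company Context: {{Context}}\n Question: {{Question}}"
print(template_variables(template))
print(render(template, {
    "Context": "(retrieved text)",
    "Question": "How can this company grow?",
}))
```

Checking that the two sets of names match is a cheap way to catch a misspelled variable before a pipeline is built.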

Generating Text with the LLM

Let's use GPT-4 to generate a customized sentence for the email. In this case, we can use an OpenAILLMNode. These nodes also support a system input prompt, which we can seed with some contextual text. (Note: the system text is shortened from the original tutorial.)

from vectorshift.node import OpenAILLMNode

system_text_raw = """You are a sentence generator for a consulting
firm. You take in data from a website and generate a sentence
that explains how the firm can help this company."""

system_text = TextNode(text=system_text_raw)

llm = OpenAILLMNode(
    model="gpt-4", 
    system_input=system_text.output(), 
    prompt_input=prompt_text.output()
)

Composing the Email

The prompt for the LLM node was to generate a sentence, not an entire email. We can write up some custom text ourselves to fill in the rest of the email and insert the generated sentence as a variable as above.

output_text_raw = """Hello,
We are XYZ consulting, specializing in crafting growth strategies.

{{Personalized_Message}}

Are you available anytime later this week to chat?

Best,
XYZ"""

output_text = TextNode(
    text=output_text_raw,
    text_inputs={
        "Personalized_Message": llm.output()
    }
)

Output

The output of the entire pipeline should be the text of the email, which is created by the output_text node. We can just take that node's output() and package it in an OutputNode, which determines the overall returned value of the pipeline.

Note the distinction between the two similarly named classes. An OutputNode is a kind of node: it represents, in the pipeline's computation graph, the final value produced. A NodeOutput is what any node returns. Here we pass the output() of output_text, which is a NodeOutput, as the input to the OutputNode.

from vectorshift.node import OutputNode

output = OutputNode(
    name="output_1", 
    output_type="text", 
    input=output_text.output()
)

These are all the nodes we need! The overall structure of the nodes closely follows that of the no-code example. Each node block in the no-code editor became its own object in Python, and each edge between nodes has been represented by the output of one node being passed into the constructor of another.

Creating and Deploying the Pipeline

Once nodes have been defined, creating a pipeline object is fairly simple, since the node objects themselves already encode the edges between them.

A Pipeline object can be initialized by passing in a list of all nodes, a name, and a description. The nodes can be listed in any order.

from vectorshift.pipeline import Pipeline

email_gen_pipeline_nodes = [
    input_node, url_loader, question_text, vector_query,
    prompt_text, system_text, llm, output_text, output
]

email_gen_pipeline = Pipeline(
    name="Email Generator",
    description="Generate personalized emails for outreach",
    nodes=email_gen_pipeline_nodes
)
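
The order of the list does not matter because each node already records which outputs it consumes. As a sketch of that idea (not the SDK's implementation), the standard library's graphlib can recover a valid execution order from a shuffled dependency map. The dependencies dict below is a hand-written stand-in that mirrors the wiring in this walkthrough.

```python
from graphlib import TopologicalSorter

# node name -> names of the nodes whose outputs it consumes.
# Deliberately listed out of order; hand-written stand-in for the real objects.
dependencies = {
    "output": {"output_text"},
    "llm": {"system_text", "prompt_text"},
    "output_text": {"llm"},
    "prompt_text": {"vector_query", "question_text"},
    "vector_query": {"question_text", "url_loader"},
    "url_loader": {"input_node"},
    "input_node": set(),
    "question_text": set(),
    "system_text": set(),
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several orderings are valid, so the exact list printed may vary; the guarantee is only that each node appears after everything it depends on.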

A Pipeline object has a few nifty methods. Printing it gives a representation of its constituent nodes. If you want to generate code showing how the object could be constructed, there's a method for that too; it assigns generated IDs as the variable names for each node.

print(email_gen_pipeline)
print(email_gen_pipeline.construction_str())

To save the pipeline to the VectorShift platform, we can either pass our API key to the object's save() method:

response = email_gen_pipeline.save(
    api_key=YOUR_API_KEY, 
)

Or we could create a Config object and then pass the pipeline object in:

config = vectorshift.deploy.Config(
    api_key=YOUR_API_KEY,
)

response = config.save_new_pipeline(email_gen_pipeline)
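
In either case, YOUR_API_KEY has to come from somewhere. One common approach, sketched here, is to read it from an environment variable rather than hardcoding it in the script. The load_api_key helper and the VECTORSHIFT_API_KEY variable name are our own choices for this example, not something the SDK defines.

```python
import os


def load_api_key(var_name: str = "VECTORSHIFT_API_KEY") -> str:
    """Read the key from the environment; the variable name is an assumption."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before saving the pipeline")
    return key
```

With the variable set in your shell, YOUR_API_KEY = load_api_key() can then be passed to either of the save calls above.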

Finally, you can run a Pipeline as well, given a dict that maps the input names the pipeline expects to string representations of the inputs. For instance, since we defined email_gen_pipeline above as having a text input called input_1, we could fetch the saved pipeline by name and run it as follows:

# Fetch the saved pipeline by name, then run the fetched object.
pipeline = Pipeline.fetch(pipeline_name='Email Generator')

response = pipeline.run(
    {"input_1": "https://www.vectorshift.ai/"}
)

Next Steps

This walkthrough should give you the basic tools needed to use the SDK. From here, you'll likely want to learn how to use the other available nodes and which typing constraints apply to node inputs and outputs; the node-specific documentation covers both.

We also support ways to work with other objects using the SDK. These objects correspond to other features of the VectorShift platform, such as Agents and VectorStores, and follow many of the same design patterns as above. Take a look at their documentation pages for details on how to construct and interact with them.
