Node Classes

The SDK maintains a close correspondence between no-code pipeline nodes and Python classes.

This page contains documentation for all node classes available in the SDK, which match up closely with the nodes available in the no-code pipeline builder. The organization of sections below loosely follows how different nodes are organized into tabs in the no-code editor.

All nodes are classes initialized with parameters depending on the node. We document constructor arguments and outputs. Arguments are listed under input and parameter sections. Inputs denote arguments that are NodeOutputs from earlier nodes passed in, while parameters denote arguments that modify the other properties of the node. Outputs are listed by output name. For instance, if a node n has an output called output_name, then the documentation provides details for output_name, and the way to access this output in Python would be via n.outputs()["output_name"].

While we provide setters to modify specific parameters and inputs of nodes, we do not currently have individual getter methods. However, each node class comes with built-in methods to display attributes, which are displayed when the node is printed. Each node also has a construction_strs() method that may be called to return a list of the arguments that can replicate the node using its constructor.

Inputs and Outputs

InputNode

vectorshift.node.InputNode(
    name: str, 
    input_type: str,
    process_files: bool = True
)

Represents the inputs (start points) to a pipeline. Your pipelines should always start with these.

Inputs:

None. This node represents what is passed into the pipeline when it is run.

Parameters:

  • name: A string representing the input name, e.g. "text_input". Should only contain alphanumeric characters and underscores.

  • input_type: A string representing the input type. Each input type corresponds with a specific data type for the outputs of the node. The string must be one of the following, and an error is thrown otherwise:

    • "text": The input to the pipeline should be (one or more pieces of) text. Corresponds to the Text data type (List[Text] for multiple inputs).

    • "file": The input to the pipeline should be one or more files. Corresponds to the File data type (List[File] for multiple inputs).

  • process_files: If input_type is "file", sets whether or not to automatically process the files into text. (If set to True, this node essentially also includes the functionality of FileLoaderNode.) Ignored if input_type is not "file".

Outputs:

  • value: The NodeOutput representing the pipeline's input. The output data type is specified by the input_type parameter above.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_name(name: str)
set_input_type(input_type: str)

Setters for the node parameters.

OutputNode

vectorshift.node.OutputNode(
    name: str, output_type: str, 
    input: NodeOutput
)

Represents the outputs (end points) to a pipeline. Your pipelines should always end with these.

Inputs:

  • input: The NodeOutput to be used as the pipeline output, whose data type should match input_type.

Parameters:

  • name: A string representing the name of the pipeline's overall output, e.g. "text_output". Should only contain alphanumeric characters and underscores.

  • input_type: A string representing the input type. Each input type corresponds with a specific output data type for the outputs of the node. The string must be one of the following, and an error is thrown otherwise:

    • "text": The input to the pipeline should be (one or more pieces of) text. Corresponds to the Text data type (List[Text] for multiple inputs).

    • "formatted_text": The input to the pipeline should be (one or more pieces of) text. Same as text, but formatted as Markdown.

    • "file": The input to the pipeline should be one or more files. Corresponds to the File data type (List[File] for multiple inputs).

    • "image":

Outputs:

None. This node represents what the pipeline produces when it is run.

set_name(name: str)
set_output_type(input_type: str)
set_input(input: NodeOutput)

Setters for the node parameters and inputs.

Text and File Data

TextNode

vectorshift.node.TextNode(
    text: str,
    text_inputs: dict[str, NodeOutput] = {},
    format_text: bool = True
)

Represents a block of text. The text may include text variables, which are placeholders for text produced earlier on in the pipeline expected to be supplied as additional inputs, and notated within a text block using double curly brackets {{}}. For instance, the text block

Here is our response: {{response}}

would expect one text variable, response. When the pipeline is run, the earlier output is substituted into the place of {{response}} to create the actual text.

Inputs:

  • text_inputs: A map of text variable names to NodeOutputs expected to produce the text for the variables. Each NodeOutput should have data type Text. text_inputs may contain a superset of the variables in text. However, each text variable in text should be included as a key in text_inputs. When the pipeline is run, each NodeOutput's contents are interpreted as text and substituted into the variable's places. If text contains no text variables, this can be empty.

Parameters:

  • text: The string representing the text block, wrapping text variables with double brackets. The same variable can be used in more than one place.

  • format_text: A flag for whether or not to auto-format text.

Outputs:

  • output: The NodeOutput representing the text, of data type Text.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_text(text: str)
set_format_text(format_text: bool)

Setters for the text and format_text flag. If the new text in set_text contains text variables, they must already be in the text inputs of the node.

set_text_input(text_var: str, input: NodeOutput)
remove_text_input(text_var: str)
set_text_inputs(text_inputs: dict[str, NodeOutput])

Methods to set and remove text inputs. Variables added via set_text_input do not necessarily have to be in the current text. However, variables removed via remove_text_input cannot be in the text.

FileNode

vectorshift.node.FileNode(
    file_names: list[str] = [],
    process_files: bool = True,
    chunk_size: int = 400, 
    chunk_overlap: int = 0,
    api_key: str = None,
)

Represents one or more files in a pipeline. Files should already be stored within the VectorShift platform. An API call is made upon initialization to retrieve relevant file data, so an API key is required.

Inputs:

None. This node expects to retrieve files via an API call to the VectorShift platform.

Parameters:

  • file_names: A list of file names stored on the VectorShift platform to be loaded by this node.

  • process_files: Whether or not to automatically process the files into text. (If set to True, this node essentially also includes the functionality of FileLoaderNode.)

  • chunk_size, chunk_overlap: How files should be loaded if process_files is True. Resulting strings will be of length at most chunk_size and overlap with chunk_overlap.

  • api_key: The VectorShift API key to make calls to retrieve the file data.

Outputs:

  • files: The NodeOutput representing the files, of data type List[File] if process_files is set to False, and List[Document] otherwise.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_file_names(file_names: list[str])
set_process_files(process_files: bool)
set_chunk_size(chunk_size: int)
set_chunk_overlap(chunk_overlap: int)
set_api_key(api_key: str)

Setters for the node parameters.

StickyNoteNode

vectorshift.node.StickyNoteNode(
    text: str
)

A sticky note with no functionality.

Inputs:

None.

Parameters:

  • text: The text in the sticky note.

Outputs:

None.

set_text(text: str)

Setter for the sticky note text.

FileSaveNode

vectorshift.node.FileSaveNode(
    name_input: NodeOutput,
    files_input: list[NodeOutput]
)

Represent the saving of one or more files to the VectorShift platform.

Inputs:

  • name_input: A NodeOutput representing the name under which the file should be saved. The output of a TextNode can be used if the desired file name is known and fixed. Should have data type String.

  • files_input : One or more NodeOutputs representing files to be saved. They should have output data type File.

Parameters:

None.

Outputs:

None. This node represents saving files.

set_name_input(name_input: NodeOutput)
set_files_input(files_input: list[NodeOutput])

Setters for the node inputs.

User-Created VectorShift Objects

PipelineNode

vectorshift.node.PipelineNode(
    pipeline_id: str = None,
    pipeline_name: str = None,
    inputs: dict[str, NodeOutput] = {},
    username: str = None,
    org_name: str = None,
    batch_mode: bool = False,
    api_key: str = None, 
)

Represent a nested Pipeline, which will be run as a part of the overall Pipeline. When the node is executed, the pipeline it represents is executed with the supplied inputs, and the overall pipeline's output becomes the node's output. The Pipeline must already exist on the VectorShift platform, so that it can be referenced by its ID or name. If the ID or name are not provided, the node represents a generic nested Pipeline whose details must be provided before it is run. If an ID or name are provided, an API call is made upon initialization to retrieve relevant Pipeline data, meaning an API key is required.

It is also possible to construct PipelineNodes from pipeline objects. See the method from_pipeline_obj below.

Inputs:

  • inputs: A map of input names to NodeOutputs, which depends on the specific Pipeline. In essence, the NodeOutputs passed in are interpreted as inputs to the Pipeline represented by the PipelineNode. They should match up with the expected input names of the pipeline. For instance, if the Pipeline has input names input_1 and input_2, then the dictionary should contain those strings as keys.

Parameters:

  • pipeline_id: The ID of the Pipeline being represented.

  • pipeline_name: The name of the Pipeline being represented. At least one of pipeline_id and pipeline_name should be provided. If both are provided, pipeline_id is used to search for the Pipeline. If both are omitted, a generic Pipeline node will be saved, and details must be provided before the pipeline including the node is run.

  • username: The username of the user owning the Pipeline.

  • org_name: The organization name of the user owning the Pipeline, if applicable.

  • batch_mode: A flag to set whether or not the pipeline can run batched inputs.

  • api_key: The VectorShift API key to make calls to retrieve the Pipeline data.

Outputs:

Outputs are determined from the pipeline represented. Since each pipeline returns one or more named outputs that are either of File or Text data type, the keys of the outputs dictionary are the named outputs of the pipeline, with the values given the appropriate data type.

vectorshift.node.PipelineNode.from_pipeline_obj(
    pipeline_obj: vectorshift.pipeline.Pipeline,
    inputs: dict[str, NodeOutput],
    api_key: str = None,
)

A static method to construct a pipeline node from a pipeline object, to avoid the manual action on part of the programmer of saving the pipeline object. The pipeline will automatically be saved to the VectorShift platform when the method is run.

Arguments:

  • pipeline_obj: The pipeline object to be represented by the node.

  • inputs: A map of expected pipeline input names to NodeOutputs. As above, the map keys should match the expected input names of the pipeline object.

  • api_key: The API key to be used when saving the pipeline to the VectorShift platform.

set_pipeline(
    pipeline_id: str = None,
    pipeline_name: str = None,
    inputs: dict[str, NodeOutput] = {},
    username: str = None,
    org_name: str = None
)
set_batch_mode(batch_mode: bool)
set_input(input_name: str, input: NodeOutput)
set_inputs(self, inputs:dict[str, NodeOutput])
set_api_key(api_key: str)

Setters for the node's parameters and inputs.

AgentNode

vectorshift.node.AgentNode(
    agent_id: str = None,
    agent_name: str = None,
    inputs: dict[str, NodeOutput],
    username: str = None,
    org_name: str = None
    api_key: str = None,
)

Represent an agent. The agent must already exist on the VectorShift platform, so that it can be referenced by its ID or name. An API call is made upon initialization to retrieve relevant agent data, meaning an API key is required.

It is also possible to construct AgentNodes from agent objects. See the method from_agent_obj below.

Inputs:

  • inputs: A map of input names to NodeOutputs, which depends on the specific agent. In essence, the NodeOutputs passed in are interpreted as inputs to the pipeline represented by the AgentNode. They should match up with the expected input names of the agent. For instance, if the agent has input names input_1 and input_2, then the dictionary should contain those strings as keys.

Parameters:

  • agent_id: The ID of the agent being represented.

  • agent_name: The name of the agent being represented. At least one of agent_id and agent_name should be provided. If both are provided, agent_id is used to search for the agent object.

  • username: The username of the user owning the agent.

  • org_name: The organization name of the user owning the agent, if applicable.

  • api_key: The VectorShift API key to make calls to retrieve the agent data.

Outputs:

Outputs are determined from the agent represented. Since each agent returns one or more named outputs that are either of File or Text data type, the keys of the outputs dictionary are the named outputs of the agent, with the values given the appropriate data type.

vectorshift.node.AgentNode.from_agent_obj(
    agent_obj: vectorshift.pipeline.Agent,
    inputs: dict[str, NodeOutput],
    api_key: str = None,
)

A static method to construct an agent node from an agent object, to avoid the manual action on part of the programmer of saving the agent object. The agent will automatically be saved to the VectorShift platform when the method is run.

Arguments:

  • agent_obj: The agent object to be represented by the node.

  • inputs: A map of expected agent input names to NodeOutputs. As above, the map keys should match the expected input names of the agent object.

  • api_key: The API key to be used when saving the agent to the VectorShift platform.

set_input(input_name: str, input: NodeOutput)
set_inputs(inputs: dict[str, NodeOutput])
set_api_key(api_key: str)

Setters for the node's parameters and inputs. The node currently does not support changing the agent itself; to do this, a new replacement node should be created.

IntegrationNode

vectorshift.node.IntegrationNode(
    integration_type: str,
    integration_id: str = None,
    action: str = None,
    inputs: dict[str, list[NodeOutput]] = {},
    api_key: str = None,
    **kwargs
)

Represents a particular action taken from a VectorShift integration (e.g. the "save files" action from a Google Drive integration). The integration should already exist on the VectorShift platform, so that it can be referenced by its name. If the integration ID or action are not specified, the node represents a generic, incomplete integration whose details must be provided before it is run. The particular actions available depend on the integration. Some actions may require additional arguments passed into the constructor.

If this node contains information about the specific integration to use, an API call is made when a pipeline containing this node is saved to retrieve relevant integration data, meaning an API key is required.

See below for a list of integrations, actions and their corresponding expected inputs/outputs.

Inputs:

  • inputs: A map of input names to lists of NodeOutputs, which depends on the specific integration. (If there is only one NodeOutput, a singleton list should be used as the value.) The inputs should match the expected names and data types of the specific integration and function.

Parameters:

  • integration_type: A string denoting the type of integration.

  • integration_id: The name of the integration ID being represented. If not provided, then the node represents a generic integration that needs to be set up before the pipeline is run.

  • action: The name of the specific action to be used with the integration. If not provided, the node represents a generic integration whose action should be later specified.

  • api_key: The API key to be used when retrieving integration data from the VectorShift platform.

Outputs:

Outputs are determined from the specific integration action. They are currently given data type Any.

Supported Integration Actions and Parameters:

set_integration_id(integration_id: str)
set_integration(
    integration_type: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
)
set_input(input_name: str, input: list[NodeOutput])
set_inputs(inputs: dict[str, list[NodeOutput]]):
set_api_key(api_key: str)

Setters for the node's parameters and inputs.

Specific Integration Nodes

vectorshift.node.SalesforceIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Salesforce'.

vectorshift.node.GoogleDriveIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Google Drive'.

vectorshift.node.GmailIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Gmail'.

vectorshift.node.GoogleSheetsIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Google Sheets'.

vectorshift.node.GoogleDocsIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Google Docs'.

vectorshift.node.GoogleCalendarIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Google Calendar'.

vectorshift.node.NotionIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    database_id: str = None,
    database_fields: list[str] = None,
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Notion'.

vectorshift.node.AirtableIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
    **kwargs
)

Akin to an IntegrationNode with integration_type = 'Airtable'.

vectorshift.node.HubSpotIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'HubSpot'.

vectorshift.node.SugarCRMIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'SugarCRM'.

vectorshift.node.LinearIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Linear'.

vectorshift.node.SlackIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Slack'.

vectorshift.node.DiscordIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Discord'.

vectorshift.node.CopperIntegrationNode(
    integration_id: str,
    action: str,
    inputs: dict[str, list[NodeOutput]],
    api_key: str = None,
)

Akin to an IntegrationNode with integration_type = 'Copper'.

TransformationNode

vectorshift.node.TransformationNode(
    transformation_name: str,
    inputs: dict[str, NodeOutput],
    api_key: str = None,
)

Represent a user-created transformation. The transformation must already exist on the VectorShift platform, so that it can be referenced by its name. An API call is made upon initialization to retrieve relevant transformation data, meaning an API key is required.

Inputs:

  • inputs: A map of input names to strings of NodeOutputs, which depends on the specific transformation. The inputs should match the expected names and data types of the specific integration and function. There are currently no checks on the input, so it is up to your discretion to ensure that the NodeOutputs you provide to the transformation node are compatible with the transformation.

Parameters:

  • transformation_name: The name of the user-created transformation being represented. Must be provided.

  • api_key: The API key to be used when retrieving information about the transformation from the VectorShift platform.

Outputs:

Outputs are determined from the specific transformation. They are currently given data type Any.

set_input(input_name: str, input: NodeOutput)
set_inputs(inputs: dict[str, NodeOutput])
set_api_key(api_key: str)

Setters for the node's parameters and inputs. The node currently does not support changing the transformation itself; to do this, a new replacement node should be created.

Models

OpenAILLMNode

vectorshift.node.OpenAILLMNode(
    model: str,
    system_input: str|NodeOutput,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0,
    stream_response: bool = False,
    json_response: bool = False,
    personal_api_key: str = None
)

Represents an OpenAI LLM. These models take in two main inputs: one "system" input that describes the background or context for generating the text, and one input for the prompt itself. For instance, the system input could be used for telling the model that it is an assistant for a specific task, and the prompt input could be a task-related question.

Optionally, text variables can be inserted into the system and prompt in an analogous manner to TextNode.

Inputs:

  • system_input: The output corresponding to the system prompt. Should have data type Text. Can also be a string.

  • prompt_input: The output corresponding to the prompt. Should have data type Text. Can also be a string.

  • text_inputs: A map of text variable names to NodeOutputs expected to produce the text for the system and prompt, if they are strings containing text variables. Each NodeOutput should have data type Text. Each text variable in system_input and prompt_input, if they are strings, should be included as a key in text_inputs. When the pipeline is run, each NodeOutput's contents are interpreted as text and substituted into the variable's places.

Parameters:

  • model: The specific OpenAI model to use. We currently support the following models:

    • gpt-3.5-turbo supporting up to 4096 tokens

    • gpt-3.5-turbo-instruct supporting up to 4096 tokens

    • gpt-3.5-turbo-16k supporting up to 16384 tokens

    • gpt-4 supporting up to 8192 tokens

    • gpt-4-32k supporting up to 32768 tokens

    • gpt-4-turbo supporting up to 128000 tokens

    • gpt-4-turbo-preview supporting up to 128000 tokens

  • max_tokens: How many tokens the model should generate at most. Note that the number of tokens in the provided system and prompt are included in this number. They should be below the model-specific constraints listed above.

  • temperature: The temperature used by the model for text generation. Higher temperatures generate more diverse but possibly irregular text.

  • top_p: If top-p sampling is used, controls the threshold probability. Under standard text generation, only the most probable next token is used to generate text; under top-p sampling, the choice is made randomly among all tokens (if they exist) with predicted probability greater than the provided parameter p. Should be between 0 and 1.

  • stream_response: A flag setting whether or not to return the model output as a stream or one response.

  • json_response: A flag setting whether or not to return the model output in JSON format.

  • personal_api_key: An optional parameter to provide if you have a personal OpenAI account and wish to use your API key.

Outputs:

  • response: The generated text, with data type Text.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_system(system_input: str|NodeOutput)
set_prompt(prompt_input: str|NodeOutput)
set_text_input(text_var: str, input: NodeOutput)
remove_text_input(text_var: str)
set_text_inputs(text_inputs: dict[str, NodeOutput])
set_model(model: str)
set_max_tokens(max_tokens: int)
set_temperature(temperature: float)
set_top_p(top_p: float)

Setters (and removers) for model parameters and inputs. The function of setters for text inputs is analogous to those for TextNode.

PromptLLMNode

vectorshift.node.PromptLLMNode(
    llm_family: str,
    model: str,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0
)

A general class for LLMs which take in a single text prompt input (unlike OpenAILLMNode, which expects two inputs). Optionally, text variables can be inserted into the prompt input in an analogous manner to TextNode.

We categorize LLMs into families (e.g. by Anthropic, Meta, etc.), denoted by the llm_family parameter. Each family comes with different models. Each specific model has its own max_tokens limit. See below for a list of model families, offered models, and their corresponding token limits.

Inputs:

  • prompt_input: The output corresponding to the prompt. Should have data type Text. Can also be a string.

  • text_inputs: A map of text variable names to NodeOutputs expected to produce the text for the variables. Each NodeOutput should have data type Text. Each text variables in text should be included as a key in text_inputs. When the pipeline is run, the NodeOutput's contents are interpreted as text and substituted into the variable's places. If text contains no text variables, this can be empty.

Parameters:

  • llm_family: The overall family of LLMs to use.

  • model: The specific model within the family of models to use.

  • max_tokens: How many tokens the model should generate at most. Note that the number of tokens in the provided system and prompt are included in this number. They should be below the model-specific limits listed below.

  • temperature: The temperature used by the model for text generation. Higher temperatures generate more diverse but possibly irregular text.

  • top_p: If top-p sampling is used, controls the threshold probability. Under standard text generation, only the most probable next token is used to generate text; under top-p sampling, the choice is made randomly among all tokens (if they exist) with predicted probability greater than the provided parameter p. Should be between 0 and 1.

Outputs:

  • response: The generated text, with data type Text.

Note: This node returns a single output, so it can be accessed directly via the output() method.

Supported LLMs

set_prompt(prompt_input: str|NodeOutput)
set_text_input(text_var: str, input: NodeOutput)
remove_text_input(text_var: str)
set_text_inputs(text_inputs: dict[str, NodeOutput])
set_model(model: str)
set_max_tokens(max_tokens: int)
set_temperature(temperature: float)
set_top_p(top_p: float)

Setters (and removers) for model parameters and inputs. The function of setters for text inputs is analogous to those for TextNode.

Specific Prompt LLM Nodes

vectorshift.node.AnthropicLLMNode(
    model: str,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0
)

Represents an Anthropic LLM. Akin to a PromptLLMNode with llm_family = 'anthropic'.

vectorshift.node.CohereLLMNode(
    model: str,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0
)

Represents a Cohere LLM. Akin to a PromptLLMNode with llm_family = 'cohere'.

vectorshift.node.AWSLLMNode(
    model: str,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0
)

Represents an AWS (Amazon) LLM. Akin to a PromptLLMNode with llm_family = 'aws'.

vectorshift.node.MetaLLMNode(
    model: str,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0
)

Represents a Meta LLM. Akin to a PromptLLMNode with llm_family = 'meta'.

vectorshift.node.OpenSourceLLMNode(
    model: str,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0
)

Represents an open-source LLM. Akin to a PromptLLMNode with llm_family = 'open_source'.

vectorshift.node.GoogleLLMNode(
    model: str,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024,
    temperature: float = 1.0,
    top_p: float = 1.0
)

Represents a Google LLM. Akin to a PromptLLMNode with llm_family = 'google'.

ImageGenNode

vectorshift.node.ImageGenNode(
    model: str,
    image_size: int|tuple[int, int],
    num_images: int,
    prompt_input: str|NodeOutput,
    text_inputs: dict[str, NodeOutput] = {}
)

Represents a text-to-image generative model.

Inputs:

  • prompt_input: The text prompt for generating the image(s). Should have data type Text.

Parameters:

  • model: The specific text-to-image model used. We currently support the following models:

    • DALL-E 2: supported image sizes 256, 512, and 1024, can generate 1-5 images

    • Stable Diffusion XL: supported image size 512, can generate 1 image

    • DALL-E 3: supported image sizes 1024, (1024, 1792) and (1792, 1024), can generate 1 image

  • image_size: The size of the image (e.g. if this is set to 512, then 512 x 512 images will be generated; if set to a tuple (a, b), then a x b images will be generated). Must be one of the valid sizes for the model as listed above.

  • num_images: The number of images to generate. Must be one of the valid numbers for the model as listed above.

Outputs:

  • images: The generated image(s), with data type List[ImageFile].

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_prompt(prompt_input: str|NodeOutput)
set_text_input(text_var: str, input: NodeOutput)
remove_text_input(text_var: str)
set_text_inputs(text_inputs: dict[str, NodeOutput])
set_model_params(
    model: str, 
    image_size: int|tuple[int, int], 
    num_images: int
)

Setters (and removers) for model parameters and inputs. The function of setters for text inputs is analogous to those for TextNode.

SpeechToTextNode

vectorshift.node.SpeechToTextNode(
    model: str, 
    audio_input: NodeOutput
)

Represents a speech-to-text generative model.

Inputs:

  • audio_input: The audio file to be converted to text. Should have data type AudioFile.

Parameters:

  • model: The specific speech-to-text model to use. We currently only support the model OpenAI Whisper.

Outputs:

  • output: The transcribed text, with data type Text.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_model(model: str)
set_audio_input(audio_input: NodeOutput)

Setters for model parameters and inputs.

ImageToTextNode

vectorshift.node.ImageToTextNode(
    model: str, 
    system_input: str|NodeOutput, 
    prompt_input: str|NodeOutput,
    image_input: NodeOutput,
    text_inputs: dict[str, NodeOutput] = {},
    max_tokens: int = 1024, 
    temperature: float = 1.0,
    top_p: float = 1.0,
    stream_response: bool = False, 
    json_response: bool = False, 
    personal_api_key: str = None
)

Represents an (OpenAI) image-to-text LLM. These models take in three main inputs: a system and prompt input analogous to OpenAILLMNode, and an image input. Text variables can be optionally inserted.

Inputs:

  • system_input: The output corresponding to the system prompt. Should have data type Text. Can also be a string.

  • prompt_input: The output corresponding to the prompt. Should have data type Text. Can also be a string.

  • text_inputs: A map of text variable names to NodeOutputs expected to produce the text for the system and prompt, if they are strings containing text variables. Each NodeOutput should have data type Text. Each text variable in system_input and prompt_input, if they are strings, should be included as a key in text_inputs. When the pipeline is run, each NodeOutput's contents are interpreted as text and substituted into the variable's places.

  • image_input: The output corresponding to the image. Should have data type ImageFile.

Parameters:

  • model: The specific OpenAI model to use. We currently only support the model gpt-4-vision-preview.

  • max_tokens: How many tokens the model should generate at most. Note that the number of tokens in the provided system and prompt are included in this number. Should be no larger than 4096.

  • temperature: The temperature used by the model for text generation. Higher temperatures generate more diverse but possibly irregular text.

  • top_p: If top-p sampling is used, controls the threshold probability. Under standard text generation, only the most probable next token is used to generate text; under top-p sampling, the choice is made randomly among all tokens (if they exist) with predicted probability greater than the provided parameter p. Should be between 0 and 1.

  • stream_response: A flag setting whether or not to return the model output as a stream or one response.

  • json_response: A flag setting whether or not to return the model output in JSON format.

  • personal_api_key: An optional parameter to provide if you have a personal OpenAI account and wish to use your API key.

Outputs:

  • response: The generated text, with data type Text.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_system(system_input: str|NodeOutput)
set_prompt(prompt_input: str|NodeOutput)
set_image(image_input: NodeOutput)
set_text_input(text_var: str, input: NodeOutput)
remove_text_input(text_var: str)
set_text_inputs(text_inputs: dict[str, NodeOutput])
set_model(model: str)
set_max_tokens(max_tokens: int)
set_temperature(temperature: float)
set_top_p(top_p: float)

Setters (and removers) for model parameters and inputs. The function of setters for text inputs is analogous to those for TextNode.

Data Loaders

DataLoaderNode

vectorshift.node.DataLoaderNode(
    loader_type: str, 
    inputs: dict[str, list[str | NodeOutput]],
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

A general-purpose node representing the retrieval of data from a third-party source. The names and data types of inputs and outputs are dependent on the specific loader (the loader type). Inputs can either be string parameters or NodeOutputs from earlier nodes. See below for a list of loader types and their corresponding inputs/outputs.

For most data loaders, the output is a list of documents. The optional parameters chunk_size and chunk_overlap then determine how those documents are formed, specifying the size and stride in tokens of each document. For instance, if the total data size is 1000 tokens, a size and overlap of 500 and 0 will give 2 documents (tokens 1-500, 501-1000), while an overlap of 250 gives 3 (tokens 1-500, 250-750, 501-1000).

Inputs:

  • inputs: A map of input names to lists of either strings or NodeOutputs, which depends on the specific loader. (If the input field is known, a string can directly be supplied.)

Parameters:

  • loader_type: The specific data loader. Should be one of the valid data loader types listed below.

  • chunk_size: The maximum size of each document in tokens, if the node returns a List[Document].

  • chunk_overlap: The amount of overlap between documents in tokens, if the node returns a [List[Document].

Outputs:

  • output: The data loaded by the node. The data type depends on the specific loader.

Note: This node returns a single output, so it can be accessed directly via the output() method.

Supported Data Loader Types

set_chunk_size(chunk_size: int)
set_chunk_overlap(chunk_overlap: int)
set_input(input_name: str, input: str|list[NodeOutput])
set_inputs(inputs: dict[str, list[str|NodeOutput]])

Setters for node parameters and inputs.

Specific Data Loader Nodes

vectorshift.node.FileLoaderNode(
    files_input: list[NodeOutput],
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to a DataLoaderNode with loader_type = 'File' and inputs being {'file': files_input}.

vectorshift.node.CSVQueryLoaderNode(
    query_input: str | NodeOutput,
    csv_input: NodeOutput,
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to a DataLoaderNode with loader_type = 'CSV Query' and inputs being {'query': [query_input], 'csv': [csv_input]}.

vectorshift.node.URLLoaderNode(
    url_input: str | NodeOutput,
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to instantiating a DataLoaderNode with loader_type = 'URL' and inputs being {'url': [url_input]}.

vectorshift.node.WikipediaLoaderNode(
    query_input: str | NodeOutput,
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to instantiating a WikipediaLoaderNode with loader_type = 'Wikipedia' and inputs being {'query': [query_input]}.

vectorshift.node.YouTubeLoaderNode(
    url_input: str | NodeOutput,
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to instantiating a DataLoaderNode with loader_type = 'YouTube' and inputs being {'url': [url_input]}.

vectorshift.node.ArXivLoaderNode(
    query_input: str | NodeOutput,
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to a DataLoaderNode with loader_type = 'Arxiv' and inputs being {'query': [query_input]}.

vectorshift.node.SerpAPILoaderNode(
    api_key_input: str | NodeOutput,
    query_input: NodeOutput,
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to a DataLoaderNode with loader_type = 'SerpAPI' and inputs being {'apiKey': [api_key_input], 'query': [query_input]}.

vectorshift.node.GitLoaderNode(
    repo_input: str | NodeOutput,
    chunk_size: int = 400,
    chunk_overlap: int = 0
)

Akin to a DataLoaderNode with loader_type = 'Git' and inputs being {'repo': [repo_input]}.

ApiLoaderNode

vectorshift.node.ApiLoaderNode(
    method: str, 
    url: str, 
    headers: list[tuple[str, str]], 
    param_type: str, 
    params:list[tuple[str, str]]
)

A node which executes an API call and returns its results. Constructor inputs essentially define the parameters of the API call and should all be strings.

Inputs:

None.

Parameters:

  • method: The API method. Should be one of 'GET', 'POST', 'PUT', 'DELETE', or 'PATCH'.

  • url: The API endpoint to call.

  • headers: A list of tuples of strings, representing the headers as key-value pairs.

  • param_type: The types of API parameters, either 'Body' or 'Query'.

  • params: A list of tuples of strings, representing the parameters as key-value pairs.

Outputs:

  • output: The data returned from the API call, of data type Text.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_url(url: str)
set_method(method: str)
set_param_type(param_type: str)
set_headers(headers: list[tuple[str, str]])
set_params(self, params: list[tuple[str, str]])

Setters for node attributes.

Search and Knowledge Bases

KnowledgeBaseNode

vectorshift.node.KnowledgeBaseNode(
    query_input: NodeOutput,
    base_id: str = None,
    base_name: str = None,
    username: str = None,
    org_name: str = None,
    max_docs_per_query: int = 2,
    enable_filter: bool = False,
    filter_input: str|NodeOutput = None,
    rerank_documents: bool = False, 
    alpha: float = 0.5,
    api_key: str = None,
)

vectorshift.node.VectorStoreNode(
    ...
)

References a particular permanent Knowledge Base, queries it, and returns the results. The Knowledge Base should already exist on the VectorShift platform, so that it can be referenced by its ID or name. If the ID or name are not provided, the node represents a generic Knowledge Base whose details need to be provided before it is run. If the ID is provided, an API call is made when a pipeline containing this node is saved to retrieve relevant data, meaning an API key is required.

Knowledge Bases are representations of Vector Stores within pipelines; this class is synonymous with a VectorStoreNode. However, the VectorStoreNode name is deprecated.

It is also possible to construct KnowledgeBaseNodes from Vector Store objects. See the method from_obj below.

Inputs:

  • query_input: The query to the Knowledge Base, which should have data type Text.

Parameters:

  • base_id: The ID of the Knowledge Base being represented.

  • base_name: The name of the Knowledge Base being represented. At least one of base_id and base_name should be provided. If both are provided, base_id is used to search for the Knowledge Base object. If both are omitted, the node represents a generic Knowledge Base object which must be set up before being run.

  • username: The username of the user owning the Knowledge Base.

  • org_name: The organization name of the user owning the Knowledge Base, if applicable.

  • max_docs_per_query: The maximum number of documents from the Knowledge Base to return from the query.

  • enable_filter: Flag for whether or not to add an additional filter to query results.

  • filter_input: The additional filter for results if enable_filter is True. Should be a NodeOutput of type Text or a string.

  • rerank_documents: Flag for whether or not to rerank documents.

  • alpha: The value of alpha for performing searches (weighting between dense and sparse indices). Ignored if the Knowledge Base is not hybrid.

  • api_key: The API key to be used when retrieving information about the Knowledge Base from the VectorShift platform.

Outputs:

  • results: The documents returned from the Knowledge Base, with data type List[Document].

vectorshift.node.KnowledgeBaseNode.from_obj(
    obj: vectorshift.vectorstore.VectorStore,
    query_input: NodeOutput,
    api_key: str = None,
)

A static method to construct a Knowledge Base node from a Vector Store object. The Knowledge Base will automatically be saved to the VectorShift platform when the method is run.

Arguments:

  • obj: The Knowledge Base to be represented by the node.

  • query_input: The query to the Knowledge Base, which should have data type Text.

  • api_key: The API key to be used when saving the Knowledge Base to the VectorShift platform.

set_knowledge_base(
    base_id: str, 
    base_name:str, 
    username: str, 
    org_name: str
)
set_enable_filter(enable_filter: bool)
set_rerank_documents(rerank_documents: bool)
set_max_docs_per_query(max_docs_per_query: int)
set_query_input(self, query_input: NodeOutput)
set_filter_input(filter_input: str|NodeOutput)
set_alpha(alpha: float)
set_api_key(api_key: str)

Setters for node parameters and inputs.

VectorDBLoaderNode

vectorshift.node.VectorDBLoaderNode(
    documents_input: list[NodeOutput]
)

Load text documents into a new temporary vector database that can be later queried. Once the pipeline finishes running, the database is deleted.

Note: Deprecated in favor of SemanticSearchNode.

Inputs:

  • documents_input: A list of one or more NodeOutputs to be loaded into the vector database. Each NodeOutput should have data type Text.

Parameters:

None.

Outputs:

  • database: The resulting vector database, with data type VectorDB.

Note: This node returns a single output, so it can be accessed directly via the output() method.

VectorDBReaderNode

vectorshift.node.VectorDBLoaderNode(
    query_input: NodeOutput,
    database_input: NodeOutput,
    max_docs_per_query: int = 1
)

Query a temporary vector database and return its results, similar to a data loader node.

Note: Deprecated in favor of VectorQueryNode.

Inputs:

  • query_input: The query to the vector database, which should have data type Text.

  • database_input: The vector database to query, which should have data type VectorDB.

  • max_docs_per_query: The maximum number of documents from the vector database to return from the query.

Parameters:

None.

Outputs:

  • results: The documents returned from the vector database, with data type List[Document].

Note: This node returns a single output, so it can be accessed directly via the output() method.

SemanticSearchNode

vectorshift.node.SemanticSearchNode(
    query_input: list[NodeOutput],
    documents_input: list[NodeOutput],
    max_docs_per_query: int = 2,
    enable_filter: bool = False, 
    filter_input: str|NodeOutput = None, 
    rerank_documents: bool = False
)

vectorshift.node.VectorQueryNode(
    ...
)

Create a new temporary vector database, load documents into it, run one or more queries, and return the results. Once the pipeline finishes running, the database is deleted. Akin to chaining together a VectorDBLoaderNode and VectorDBReaderNode.

This class is synonymous with a VectorQueryNode. However, the VectorQueryNode name is deprecated.

Inputs:

  • query_input: The query/queries to the vector database. Each NodeOutput should have data type Text.

  • documents_input: A list of one or more NodeOutputs to be loaded into the vector database. Each NodeOutput should have data type Text.

  • max_docs_per_query: The maximum number of documents from the vector database to return from the query.

  • enable_filter: Flag for whether or not to add an additional filter to query results.

  • filter_input: The additional filter for results if enable_filter is True. Should be a NodeOutput of type Text or a string.

  • rerank_documents: Flag for whether or not to rerank documents.

Outputs:

  • result: The documents returned from the vector database, with data type List[Document].

set_enable_filter(enable_filter: bool)
set_rerank_documents(rerank_documents: bool)
set_max_docs_per_query(max_docs_per_query: int)
set_query_input(self, query_input: list[NodeOutput])
set_documents_input(self, documents_input: list[NodeOutput])
set_filter_input(filter_input: str|NodeOutput)

Setters for node parameters and inputs.

Logic

LogicConditionNode

vectorshift.node.LogicConditionNode(
    inputs: dict[str, NodeOutput],
    conditions: list[tuple[str, str]],
    else_value: str
)

This node allows for simple control flow. It takes in one or more inputs, which are given labels (akin to variable names). It also takes in a list of conditions. Each condition is a tuple of two strings, a predicate that can reference the labels and a resulting label to be outputted by the node if the predicate is True. The predicate must be a string representing a boolean statement in Python; more information on predicates is located here.

The node has multiple outputs: one output corresponding to each of the conditions, along with an else output. If a predicate evaluates to True then that condition's output will emit the NodeOutput whose label is given by the predicate's corresponding label. If a predicate has evaluated to True, further predicates are not evaluated (i.e. the node only activates the first path that evaluates to True.) Otherwise, the output is not produced and downstream nodes from that output will not be executed. The outputs are labeled output-0, output-1, etc. for each of the conditions, and output_else.

For example, a simple example of using this condition node would be through composing the following nodes:

input_node = InputNode(name='input', input_type='text')
text1 = TextNode(text='text 1') 
text2 = TextNode(text='text 2') 
text3 = TextNode(text='text 3') 
cond_node = LogicConditionNode(
    inputs={ 
        'i': input_node.output(), 
        't1': text1.output(), 
        't2': text2.output(),
        't3': text3.output()
    }, 
    conditions=[
        ('i=="hello"', 't1'), 
        ('i=="goodbye"', 't2')
    ], 
    else_value='t3'
) 

where cond_node has outputs output-0, output-1, and output_else, which will forward the outputs of text1, text2, and text3 respectively. output-0 is only emitted if the input is "hello", and output-1 is only emitted if the input is "goodbye".

Inputs:

  • inputs: A map of output labels to NodeOutputs. Identifies each NodeOutput with a label. Can have any data type.

Parameters:

  • conditions: A list of conditions. As explained above, each condition is comprised of a predicate, which should be a string expressing a Python boolean statement, and output label. The predicates are evaluated in order of the list. The first predicate that evaluates to True will return the NodeOutput identified by the associated label. If no predicates evaluate to True, the NodeOutput identified by else_value is returned.

  • else_value: The label of the NodeOutput to emit in the else case.

Outputs:

  • Outputs named output-0, output-1, ..., output-n where n is one less than the total number of conditions. output-i equals the NodeOutput identified by the label in the ith (0-indexed) condition in the list, and is only produced if the ith predicate evaluates to True. The data type is the same as the original NodeOutput's data type.

  • An output named output-else, which emits the NodeOutput whose label is given by else_value. The data type is the same as the original NodeOutput's data type.

set_input(input_name: str, input: NodeOutput)
set_conditions(conditions: list[tuple[str, str]])
set_else_value(else_value: str)

Setters for node parameters and inputs.

output_index(i: int)

A method to get the NodeOutput corresponding to the ith (0-indexed) condition, i.e. outputs()['output-i'].

output_else()

A method to get the NodeOutput corresponding to the else condition, i.e. outputs()['output-else'].

LogicMergeNode

vectorshift.node.LogicMergeNode(
    inputs: list[NodeOutput]
)

This node merges together conditional branches that may have been produced by a LogicConditionNode, returning the output that is the first in the list to have been computed. As above, the documentation on conditional logic may provide helpful context.

Inputs:

  • inputs: Different outputs from conditional branches to combine.

Parameters:

None.

Outputs:

  • output: The merged output, of data type Union[ts], where ts represent the data types of all input NodeOutputs.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_inputs(inputs: list[NodeOutput])

Setter for the node inputs.

SplitTextNode

vectorshift.node.SplitTextNode(
    delimiter: str, 
    text_input: NodeOutput
)

Splits text into multiple strings based on a delimiter.

Inputs:

  • text_input: An output containing text to split, which should have data type Text.

Parameters:

  • delimiter: The string on which to split the text. If the text is foo, bar, baz and delimiter is ',', then the result corresponds to the strings 'foo', ' bar', and ' baz'.

Outputs:

  • output: All the split strings, of data type List[Text].

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_delimiter(delimiter: str)
set_text_input(input: NodeOutput)

Setters for the node parameters and inputs.

TimeNode

vectorshift.node.TimeNode(
    timezone: str, 
    delta: float, 
    delta_unit: str, 
    output_format: str
)

Outputs a time in text form given a time zone and optional offset.

Inputs:

None.

Parameters:

  • timezone: The timezone, which should be in pytz.

  • delta: The value of a time offset.

  • delta_unit: The units of a time offset. Should be one of 'seconds', 'minutes', 'hours', 'days', or 'weeks'.

  • output_format: The string format in which to output the time. Should be one of 'Timestamp', 'DD/MM/YYYY', or'DD-MM-YYYY / HH:MM:SS'.

Outputs:

  • output: The string representing the time.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_timezone(timezone: str)
set_delta(delta: float)
set_delta_unit(delta_unit: str)
set_output_format(output_format: str)

Setters for the node attributes.

Chat

ChatMemoryNode

vectorshift.node.ChatMemoryNode(
    memory_type: str,
    memory_window: int
)

Represents the chat memory for chatbots, i.e. the chat history that the chatbot can reference when generating messages.

Inputs:

None.

Parameters:

  • memory_type: The particular type of chat memory to use. Options:

    • 'Full - Formatted': The full chat history with formatting to indicate different messages. If this is selected, memory_window should not be provided.

    • 'Full - Raw': The full chat history without formatting. If this is selected, memory_window should not be provided.

    • 'Message Buffer': The last memory_window messages. If memory_window is not specified, defaults to 10.

    • 'Token Buffer': The last memory_window tokens. If memory_window is not specified, defaults to 2048.

Outputs:

  • value: The chat history, of data type Text if memory_type is 'Full - Formatted' and List[Dict] otherwise.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_memory_type(memory_type: str)
set_memory_window(memory_window: int)

Setters for node parameters.

DataCollectorNode

vectorshift.node.DataCollectorNode(
    input: NodeOutput,
    prompt: str = '', 
    fields: list[dict[str, str]] = []
)

Prompts a LLM to search the chat history based on a prompt and one or more example fields, returning a summary of the relevant information.

Inputs:

  • input: A NodeOutput which should represent the chat memory. Should be of data type Text or List[Dict] (coming from a ChatMemoryNode).

Parameters:

  • prompt: The string prompt to guide the kind of data from the chat history to collect.

  • fields: A list of dictionaries, each indicating a data field to collect. Each dictionary should contain the following fields: field, containing the name of the field; description, describing the field; and example, giving a text example of what to search for.

Outputs:

  • output: A selection of relevant information from the chat history, of data type Text.

Note: This node returns a single output, so it can be accessed directly via the output() method.

set_prompt(prompt: str)
add_field(field: dict[str, str])
set_fields(fields: list[dict[str, str]])
set_input(input: NodeOutput)

Setters for node parameters. Arguments for fields should follow the structure as described above.

Last updated