Data Types

To help minimize errors when running pipelines, we tag node inputs and outputs with types.

To ensure that nodes operate with each other in well-formed ways, each node's inputs and outputs are tagged to expect specific types. We refer to these types as datatypes, as they describe the kind of data that moves between pipeline nodes. Typechecking can help catch errors before pipelines are saved and run. For instance, it isn't well-formed to have a language model node take in image data.

The following discussion is a bit more technical. The TL;DR is that you may receive an error when connecting nodes whose data types don't match; if so, a descriptive error message will explain where the mismatch is.

Type Hierarchy

Some datatypes are subtypes of others: any integer value passed around in a pipeline, for instance, can be treated as a float, and other datatypes can be cast into string representations. We say t2 is a subtype of t1 if we can represent any value of type t2 as a t1 within the context of our pipeline operations. A warning is printed if a supertype of the expected datatype is passed in (for example, Text where an Int is expected), because "casting" the value down to the subtype may not be well-formed.

The VectorShift platform currently supports the following basic datatypes, with levels of indentation indicating subtypes.

  • Text

    • Float

      • Int

    • Dict

      • Document

    • URL

  • VectorDB

  • File

    • TextFile

      • CSVFile

    • ImageFile

    • AudioFile

(Floats and ints are included as subtypes of Text because they can simply be cast to their string representations.)
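As a rough illustration of how the hierarchy above can be checked, the following Python sketch encodes each basic datatype with its direct supertype and walks up the tree. The names `PARENT`, `is_subtype`, and `check_connection` are hypothetical and are not part of the VectorShift API; the "ok" / "warning" / "error" outcomes simply mirror the behavior described in this section (a match passes, a supertype of the expected type warns, anything else is an error).

```python
# Hypothetical encoding of the basic datatype hierarchy listed above.
# Each type maps to its direct supertype; None marks a top-level type.
PARENT = {
    "Text": None,
    "Float": "Text",
    "Int": "Float",
    "Dict": "Text",
    "Document": "Dict",
    "URL": "Text",
    "VectorDB": None,
    "File": None,
    "TextFile": "File",
    "CSVFile": "TextFile",
    "ImageFile": "File",
    "AudioFile": "File",
}


def is_subtype(sub: str, sup: str) -> bool:
    """Return True if `sub` is `sup` itself or one of its descendants."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT[current]  # raises KeyError for unknown type names
    return False


def check_connection(actual: str, expected: str) -> str:
    """Classify passing a value of type `actual` where `expected` is required."""
    if is_subtype(actual, expected):
        return "ok"
    if is_subtype(expected, actual):
        # A supertype was passed; casting down may not be well-formed.
        return "warning"
    return "error"


print(check_connection("Int", "Float"))       # ok
print(check_connection("Text", "Int"))        # warning
print(check_connection("ImageFile", "Text"))  # error
```

Storing only the direct supertype keeps the table identical in shape to the indented list, so adding a new datatype is a one-line change.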

In addition, VectorShift also has compound data types:

  • List[t] represents a list of values of datatype t. If t' is a subtype of t, then List[t'] is a subtype of List[t]. (We consider a type to be a subtype of itself.) If t is a subtype of Text, then List[t] is a subtype of Text (by converting list elements into their string representations).

  • Union[t1,...,tn] represents a union type. Its subtypes are t1 through tn.

  • Any represents any datatype. It has as subtypes every other datatype.
