Data Types
To help minimize errors when running pipelines, we tag node inputs and outputs with types.
To ensure that nodes operate with each other in well-formed ways, each node's inputs and outputs are tagged with the specific types they expect. We refer to these types as datatypes, since they describe the kind of data that moves between pipeline nodes. Typechecking can help catch errors before pipelines are saved and run: for instance, it isn't well-formed for a language model node to take in image data.
The following discussion is a bit more technical. The TL;DR is that you may see errors when connecting nodes whose data types don't match; if so, a descriptive error message will explain where the mismatch is.
Type Hierarchy
Some datatypes are subtypes of others: any integer value passed around in a pipeline, for instance, can be treated as a float, and other datatypes can be cast into string representations. We say t2 is a subtype of t1 if we can represent any value of type t2 as a t1 within the context of our pipeline operations. Warnings are printed if a supertype of an expected datatype is passed in, because "casting" down to the expected subtype may not be well-formed.
The VectorShift platform currently supports the following basic datatypes, with levels of indentation indicating subtypes.
Text
    Float
        Int
Dict
Document
URL
VectorDB
File
    TextFile
    CSVFile
    ImageFile
    AudioFile
(Floats and ints are included as subtypes of text because they can simply be cast to string representations.)
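To make the hierarchy and the subtype rule concrete, here is a minimal sketch (in Python) of how the relation could be checked by walking a parent map. The names PARENT and is_subtype, and the encoding of types as strings, are assumptions for illustration only, not the VectorShift implementation.

```python
# Illustrative sketch only -- not the VectorShift implementation.
# The basic hierarchy above is encoded as a parent map; a type is a
# subtype of another if walking up its parent chain reaches it.

PARENT = {
    "Int": "Float",        # any integer can be treated as a float
    "Float": "Text",       # floats and ints can be cast to string representations
    "TextFile": "File",
    "CSVFile": "File",
    "ImageFile": "File",
    "AudioFile": "File",
    # Text, Dict, Document, URL, VectorDB, and File have no parent here.
}

def is_subtype(t2: str, t1: str) -> bool:
    """Return True if t2 is a subtype of t1 (every type is a subtype of itself)."""
    current = t2
    while current is not None:
        if current == t1:
            return True
        current = PARENT.get(current)
    return False

assert is_subtype("Int", "Text")      # Int -> Float -> Text
assert is_subtype("CSVFile", "File")
assert not is_subtype("Text", "Int")  # passing a supertype would trigger a warning
```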
In addition, VectorShift also has compound datatypes:
List[t] represents a list of another datatype. If t' is a subtype of t, then List[t'] is a subtype of List[t]. (We consider a type to be a subtype of itself.) If t is a subtype of Text, then List[t] is a subtype of Text (converting list elements into their string representations).
Union[t1, ..., tn] represents a union type. It has as subtypes t1 through tn.
Any represents any datatype. It has as subtypes every other datatype.
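Continuing the sketch above (and reusing the hypothetical PARENT map and is_subtype function defined there), the compound rules could be layered on top of the basic check roughly as follows. Again, this is an illustrative assumption, not the platform's actual typechecker.

```python
# Illustrative sketch only, extending the is_subtype() helper above to the
# compound datatypes. Compound types are modeled here as tuples, e.g.
# ("List", "Int") or ("Union", "File", "Text"); this modeling is an assumption.

def is_subtype_compound(t2, t1) -> bool:
    # Any has every other datatype as a subtype.
    if t1 == "Any":
        return True
    # t2 is a subtype of Union[t1, ..., tn] if it is a subtype of some member.
    if isinstance(t1, tuple) and t1[0] == "Union":
        return any(is_subtype_compound(t2, member) for member in t1[1:])
    if isinstance(t2, tuple) and t2[0] == "List":
        # List[t'] is a subtype of List[t] when t' is a subtype of t.
        if isinstance(t1, tuple) and t1[0] == "List":
            return is_subtype_compound(t2[1], t1[1])
        # List[t] is a subtype of Text when t is a subtype of Text.
        if t1 == "Text":
            return is_subtype_compound(t2[1], "Text")
        return False
    # Otherwise both sides must be basic datatype names.
    if isinstance(t1, tuple) or isinstance(t2, tuple):
        return False
    return is_subtype(t2, t1)

assert is_subtype_compound(("List", "Int"), ("List", "Float"))
assert is_subtype_compound(("List", "Int"), "Text")
assert is_subtype_compound("CSVFile", ("Union", "File", "Text"))
assert is_subtype_compound(("List", "ImageFile"), "Any")
```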