Welcome to PixelPipes framework documentation
PixelPipes provides a framework for creating infinite streams of data samples. As its name suggests, PixelPipes is primarily intended to be used on visual data, with an emphasis on deep learning techniques.
Most deep models require a large amount of samples to be processed during the training phase. These samples have to be sampled from a dataset and bundled into batches that can be processed by the model. Besides sampling, another important concept in deep learning for computer vision is data augmentation, where each real sample can be expanded into a potentially infinite collection of modified samples.
All these steps involve a lot of work that is usually accomplished with many different libraries and is tied to a specific machine learning framework (e.g. PyTorch, TensorFlow). This limits the portability of the generation algorithm. The sequence of training data is usually also not repeatable due to the different pseudorandom generators involved, making debugging and experiment reproducibility more challenging.
PixelPipes takes a different approach: it combines both sampling and augmentation into a single data processing pipeline of more or less complex operations. This way, each sample in a virtually infinite sequence of samples is referenced only by its index (i.e. position) in the stream. The pipeline is constructed from a directed acyclic graph (DAG) description and is optimized to reduce redundancy. This concept has the following benefits (some of which have not yet been implemented at this stage of development, but are possible):
Because the pipeline is conditioned on a single index, the data stream is easily repeatable, making the training or testing procedure reproducible and easy to debug.
The pipeline is written in C++, making it fast and keeping the memory footprint reasonable. It also exploits the fact that individual samples are generated independently, which makes synchronization easier.
Each pipeline can be saved and loaded from a file. When loaded, the pipeline can be run from C++ directly, making the stream accessible from other languages besides Python.
External file dependencies can also be tracked clearly, making the data stream easily transferable.
Despite the C++ core, the framework is extendable with new operations.
The source code for the PixelPipes framework is available on GitHub. Contributions to the project are welcome; please read the development <development.html> document on how to get started.
Overview
Installation and quick-start
The source code for the PixelPipes framework can be downloaded from GitHub, but the most convenient way of testing it is by installing a prebuilt version. The package can be installed as a Python wheel, currently from a testing PyPI-compatible repository located here.
`
> pip install pixelpipes -i https://data.vicos.si/lukacu/pypi/
`
Simple example
To demonstrate a very simple example of using a PixelPipes pipeline (without any image operations), let's sample random numbers from a pre-defined list and display them.
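A minimal sketch of such a script, assuming the Graph context manager and the node classes documented later in this manual (Constant, GetRandom, Output); exact import paths may differ from the tutorial code:
`
from pixelpipes.graph import Graph, Constant, Output
from pixelpipes.list import GetRandom
from pixelpipes.compiler import Compiler

with Graph() as graph:
    numbers = Constant([3, 1, 7, 15, 42])   # the pre-defined list, injected as a constant
    Output(GetRandom(numbers))              # sample a random element for every stream index

pipeline = Compiler().build(graph)

# Samples are addressed only by their index, so this sequence repeats between runs.
for index in range(1, 11):
    print(index, pipeline.run(index))
`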
Note that this sequence will be the same every time you run the script with the same sample indices. This example is very simple, but it shows how a pipeline is built. More complex examples with image operations are presented here.
Architecture and concepts
This document describes the concepts of the PixelPipes framework in extensive technical details. If you are more interested in practical examples, check out the collection of tutorials.
Similar to some machine learning frameworks, most notably TensorFlow, a PixelPipes stream is formalized as a computational directed acyclic graph (DAG). This graph is constructed in Python and is then transformed into a sequence of operations that are executed in C++. These operations can be very primitive, e.g. summing up two numbers, or very specialized, e.g. a specific image operation. Most users will only work with the Python frontend to describe a stream, but it can be beneficial to know what lies beneath.
The framework is therefore divided into two parts with a binding API bridge:
a C++ core containing all low-level operations together with some binding code,
a Python frontend that provides a high-level way of describing a computational graph.
TODO: image overview
Python
As already mentioned, the Python part of the framework is just a frontend that makes assembling a pipeline easy. It is not involved in the execution in any way; once a pipeline is assembled, it can be executed from C++ directly (or from any other language). The frontend makes graph description easier by introducing organizational concepts like macros and resources and by allowing the user to leverage the abundance of Python tools to import data and transform it into pipeline primitives.
Below is a list of Python-level concepts that are important to know when you start using the framework:
Node: represents a high-level operation with zero or more inputs that produces a single output.
Graph: a collection of connected nodes that describes dependencies between operations. Each node accepts zero or more inputs and produces a single output. There are two types of nodes: operations and macros.
Operation: special nodes that map directly to individual operations in the final pipeline.
Macro: a combination of operations that are frequently used together, represented as a single node. During compilation, macros are expanded to their inner subgraph until only basic operation nodes remain. Macros are written in Python and combine normal Python code with DAG generation; they can therefore base the generated subgraph on input types and use loops and conditional statements.
Constant: a special type of operation node used to introduce constant values into the pipeline. Constants can be scalars, but also tensors and lists.
Compilation: the process of iteratively reducing macro nodes in a graph to basic operations, removing duplicates where possible and ordering operations according to dependencies.
Resource: a special type of macro that handles multi-field type abstractions. Resources can be used to make the graph more organized, but get dissolved during compilation.
C++
The native part of the framework is written in C++. This part is less accessible when only using the framework, but it is important for understanding how pipelines are executed and for extending the framework with new operations. The most important concepts are:
Operation: A stateless algorithm that accepts zero or more inputs and produces a single output. Direct mapping of the operation node concept.
Token: Tokens are data units passed from operation to operation.
Type: Describes the type traits of a token.
Pipeline: A sequence of operations.
Module: Operations are organized into modules. Some modules are part of the core framework, where the separation is less noticeable. Other modules can be written as extensions to the framework, providing additional operations.
Tutorials
Below is a collection of tutorials covering different concepts and using different well-known datasets. The tutorials cover topics linearly, so each of them builds on the terminology introduced in the previous ones.
Note
Note that many examples use the OpenCV Python package for sample visualization purposes. The package is not a requirement for the PixelPipes package and has to be installed manually.
Building your first graph
A graph can be constructed by leveraging Python concepts like contexts and operator overloading (it can also be created manually by adding every node to a graph, but this is a much less readable approach). Let's start by extending the initial example of sampling numbers from a list: we can insert a list of lists, sample one of them, then sample two elements from it and sum them together, as shown in the sketch below.
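A hedged sketch of this graph, assuming that Constant accepts nested lists and that node operator overloading maps + to the Add operation:
`
from pixelpipes.graph import Graph, Constant, Output
from pixelpipes.list import GetRandom

with Graph() as graph:
    groups = Constant([[1, 2, 3], [10, 20, 30], [100, 200, 300]])  # a list of lists
    group = GetRandom(groups)     # sample one of the sublists
    a = GetRandom(group)          # sample two elements from it ...
    b = GetRandom(group)
    Output(a + b)                 # ... and sum them using operator overloading
`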
Note
Be careful when using graph contexts; use only one context in normal situations. The context mechanism also supports hierarchical contexts, however nodes will only be added to the most recently opened context.
Utility decorators wrap even more boilerplate for frequently used cases; the example above can be slightly modified so that the entire function is wrapped in a graph context.
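One plausible form, assuming the pixelpipes.utilities.graph decorator runs the decorated function inside a graph context and returns the assembled graph (a sketch, not the exact tutorial code):
`
from pixelpipes.utilities import graph
from pixelpipes.graph import Constant, Output
from pixelpipes.list import GetRandom

@graph
def sum_of_random_pair():
    groups = Constant([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
    group = GetRandom(groups)
    Output(GetRandom(group) + GetRandom(group))
`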
To compile the assembled graph into a pipeline, use the Compiler class.
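A minimal sketch, assuming the graph variable from the example above:
`
from pixelpipes.compiler import Compiler

pipeline = Compiler().build(graph)   # expand macros, optimize and emit native operations
print(pipeline.run(1))               # run the pipeline for the first sample index
`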
Sampling image patches
In this tutorial we will first sample random images from a directory, then we will cut rectangular patches from them at random positions. As you will see, all this can be achieved using a few operations, since suitable image manipulation macros are already available as standalone nodes.
For this example we will be using example images located in the examples/images directory. The entire code for the example is available here; let's look at the pipeline.
The pipeline first generates a list of images from a given directory. Then a random image entry is selected from the list. Images are resources; to actually load an image from a file, you have to access the image field. The sampling of a patch can be achieved using a combination of the pixelpipes.image.geometry.RandomPatchView and pixelpipes.image.geometry.ViewImage macros.
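A hedged sketch of the described pipeline; ImageDirectory is a hypothetical stand-in for the node that lists images in a directory, and the RandomPatchView/ViewImage argument names are assumptions, not the documented signatures:
`
from pixelpipes.graph import Graph, outputs
from pixelpipes.resource import GetField
from pixelpipes.resource.list import RandomResource
from pixelpipes.image.geometry import RandomPatchView, ViewImage

# Hypothetical node that builds a resource list from the files in a directory;
# the actual node name and module may differ.
from pixelpipes.image import ImageDirectory

with Graph() as graph:
    images = ImageDirectory("examples/images")             # list of image resources
    resource = RandomResource(images)                       # select a random image entry
    image = GetField(resource, "image")                     # access the image field to load the pixels
    view = RandomPatchView(image, width=64, height=64)      # random patch placement (assumed arguments)
    patch = ViewImage(image, view, width=64, height=64)     # cut the patch out of the image (assumed arguments)
    outputs(patch)
`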
Getting started with MNIST
MNIST is something like the hello-world dataset of the deep learning community. In this example we will look at how to get real data into the pipeline and how to sample it. The full code for the example is available <mnist.py>; we will just comment on the core parts.
Sampling MNIST data
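A hedged sketch of one way to inject MNIST arrays and sample them, assuming Constant accepts NumPy arrays as list-like tokens; the real tutorial code may use a different loading strategy:
`
import numpy as np

from pixelpipes.graph import Graph, Constant, outputs
from pixelpipes.list import Length, GetElement
from pixelpipes.numbers import SampleUnform, Floor  # class name as listed in the node reference

# Placeholder arrays standing in for the real MNIST data loaded from disk.
images = np.zeros((60000, 28, 28), dtype=np.uint8)
labels = np.zeros((60000,), dtype=np.uint8)

with Graph() as graph:
    image_list = Constant(images)
    label_list = Constant(labels)
    # A shared random index keeps image and label aligned (operator overloading assumed).
    index = Floor(SampleUnform(0, Length(image_list) - 1))
    outputs(GetElement(image_list, index), GetElement(label_list, index))
`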
CIFAR and resource lists
CIFAR is one of the most well-known datasets for computer vision, especially for representation learning. This tutorial will show how to generate a stream of random samples from the CIFAR-10 collection. In contrast to the previous tutorial on MNIST, we will introduce the concept of resources and resource lists as a high-level way of structuring data into datasets. The full code for the example is available here; we will just comment on the core parts.
Resources
First, a few words about resources. Resources are an organizational tool that helps with complex data. They allow us to group several data connections between nodes into structure-like connections. Individual fields can then be queried and manipulated.
It is important to know that resources are built upon macros and are therefore not really used in the final pipeline; they are dissolved during compilation, and all of their fields that are not required to produce the stream output are stripped away.
Define a resource list
Let's define a macro that will produce a dataset resource. This can be done manually by subclassing the Macro class, but it is recommended to use ResourceListSource as a base, since it makes this easier.
Note
This example assumes that you have downloaded the Python version of the CIFAR dataset from the dataset website and that you have extracted the files to the example directory.
ResourceListSource expects subclasses to implement the load method that generates the fields. Each field is expected to be a list or a NumPy array, and all fields are expected to be of equal length (for NumPy arrays this means the number of rows). We can also have virtual fields that generate a snippet once their content is requested, but this is a topic for another tutorial where such fields are needed.
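A hedged sketch of such a subclass; the import location of ResourceListSource and the convention that load returns a mapping of field names to arrays are assumptions based on the description above:
`
import pickle

import numpy as np

from pixelpipes.resource.list import ResourceListSource  # assumed import path

class CIFAR10Dataset(ResourceListSource):

    def load(self):
        images, labels = [], []
        for i in range(1, 6):
            # Batches from the extracted Python version of CIFAR-10.
            with open("cifar-10-batches-py/data_batch_%d" % i, "rb") as handle:
                batch = pickle.load(handle, encoding="bytes")
            # Rows are stored channel-first; reshape and transpose to height x width x channels.
            images.append(np.asarray(batch[b"data"], dtype=np.uint8)
                          .reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1))
            labels.extend(batch[b"labels"])
        # Every field is a list or a NumPy array of the same length (number of rows).
        return {"image": np.concatenate(images, axis=0),
                "label": np.asarray(labels, dtype=np.uint8)}
`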
Since we have defined the dataset as a resource list, the final graph is now quite simple:
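A hedged sketch, using the CIFAR10Dataset macro defined above together with the resource-list macros documented later in this manual:
`
from pixelpipes.graph import Graph, outputs
from pixelpipes.resource import GetField
from pixelpipes.resource.list import RandomResource

with Graph() as graph:
    dataset = CIFAR10Dataset()            # the resource list defined above
    sample = RandomResource(dataset)      # sample a random resource
    outputs(GetField(sample, "image"), GetField(sample, "label"))
`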
Notice that special macros are available to process resource lists efficiently. In this case a random resource is sampled from a list and its two fields are returned as output.
Augmenting images
Augmenting images can be viewed as generating multiple different samples from the same image by transforming it with image processing operations that change its pixel-level content, but not its semantic meaning.
VOT and segmented resource lists
Visual Object Tracking (VOT) datasets are sequence datasets for visual tracking performance evaluation. Usually, these kinds of datasets would not be used for training; however, other similar datasets exist and the example can be adapted for them. In this example we will introduce the segmented resource sequence, a data abstraction for storing many sequences of resources in a single list. We will also use the official toolkit of the VOT challenge to retrieve data. Note that the toolkit is not required anymore once a pipeline is constructed (and serialized).
Batching data for training
Simply generating data sequentially is fine for testing pipeline output, but for training deep models using SGD or related optimization methods, we would like to efficiently generate batches of samples utilizing multiple cores. Since the execution of the pipeline is done in C++, this can be done from Python using a thread pool, but the framework also provides helper classes called sinks that do this. Special sinks are provided for the most popular deep learning frameworks that enable easy integration.
One important thing that sinks assume is that all outputs are scalars or have a fixed size for every sample. This allows stacking into tensors, which is necessary for efficient deep learning.
For brevity, the examples below use the MNIST pipeline that we created in the MNIST tutorial.
NumPy sink
The default sink generates a tuple of batched NumPy arrays, one per pipeline output, with samples stacked along the first dimension.
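A hedged sketch of batching with the PipelineDataLoader sink, assuming the loader is iterable and that pipeline is the compiled MNIST pipeline from the previous tutorial:
`
from pixelpipes.sink import PipelineDataLoader

loader = PipelineDataLoader(pipeline, batch=32, workers=4)

for images, labels in loader:
    # Each output is a batch: samples stacked along the first dimension.
    print(images.shape, labels.shape)
    break
`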
PyTorch sink
A simple example of how to download, prepare and convert the PyTorch MNIST dataset into an acceptable type for injecting it into a PixelPipes graph.
TensorFlow sink
A simple example of how to download, prepare and convert the TensorFlow MNIST dataset into an acceptable type for injecting it into a PixelPipes graph.
List of nodes
Despite being Python classes, nodes follow a consistent logic and are documented separately from the rest of the API. Below is a list of the nodes included in the PixelPipes core.
Core nodes
- operation pixelpipes.graph.Constant(value)
Generates a constant in the pipeline
value [any]:
- operation pixelpipes.graph.Debug(source, prefix)
Debug operation enables low-level terminal output of the content that is provided to it. The token content will usually not be printed entirely, only its shape; the value will only be displayed for simple scalar types as well as strings.
Note that these nodes will be passed to the pipeline only if the compiler is configured with the debug flag, otherwise they will be stripped from the graph.
source [Token [None]]: Result of which node to print
prefix [str = ]: String that is prepended to the output
- operation pixelpipes.graph.Output(output, label)
Output node that accepts a single input and enables outputting tokens from the final pipeline. Tokens are returned as a tuple; their order is determined by the order in which output nodes are added to the graph. Additionally, you may label outputs with non-unique labels that can be used to resolve outputs.
output [Token [None]]: Output token
label [str = default]: Non-unique label of the output
- operation pixelpipes.graph.RandomSeed
Returns a pseudo-random number, useful for initializing pseudo-random operations. The seed itself is sampled from a pseudo-random generator that produces the same sequence of seeds for a specific position in the data sequence. This is the cornerstone of the repeatability of the pipeline.
- operation pixelpipes.graph.ReadFile(filename)
Read file from disk to memory buffer. File is read in binary mode.
filename [Token [char, None]]: Path to the image file
- operation pixelpipes.graph.SampleIndex
Returns the current sample index. This information can be used instead of a random seed to initialize random generators where sequential consistency is required.
- node pixelpipes.compiler.Variable(name, default)
Variable placeholder that can be overridden later
name [str]:
default []:
Scalar operations
- operation pixelpipes.numbers.Add(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Ceil(source)
Ceil number and convert to integer.
source [Token [float]]: Number on which ceil operation is performed
- operation pixelpipes.numbers.Divide(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Equal(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Floor(source)
Floor number and convert to integer.
source [Token [float]]: Number to be rounded
- operation pixelpipes.numbers.Greater(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.GreaterEqual(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Lower(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.LowerEqual(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Maximum(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Minimum(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Modulo(a, b)
a [Token [int]]:
b [Token [int]]:
- operation pixelpipes.numbers.Multiply(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.NotEqual(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.Power(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- macro pixelpipes.numbers.RandomBoolean(seed)
Samples a boolean value with equal probability
seed [Token [int] = @[random]]:
- operation pixelpipes.numbers.Round(source)
Round number to closest integer and convert to integer type.
source [Token [float]]: Number to be rounded
- operation pixelpipes.numbers.SampleNormal(mean, sigma, seed)
Samples values from a normal distribution.
mean [Token [float] = 0]: Mean value of normal distribution
sigma [Token [float] = 1]: Standard deviation
seed [Token [int] = @[random]]:
- operation pixelpipes.numbers.SampleUnform(min, max, seed)
Samples a random value between the min and max values.
min [Token [float]]: Minimum value
max [Token [float]]: Maximum value
seed [Token [int] = @[random]]:
- operation pixelpipes.numbers.Subtract(a, b)
a [Token [float]]: First operand
b [Token [float]]: Second operand
- operation pixelpipes.numbers.TensorAdd(a, b, saturate)
a [Token [None]]: First operand
b [Token [None]]: Second operand
saturate [bool = False]: Saturate cast
- operation pixelpipes.numbers.TensorDivide(a, b, saturate)
Divides an image by another image or a scalar (per-element division).
a [Token [None]]: First operand
b [Token [None]]: Second operand
saturate [bool = False]: Saturate cast
- operation pixelpipes.numbers.TensorMultiply(a, b, saturate)
Multiplies image with another image or scalar (per-element multiplication).
a [Token [None]]: First operand
b [Token [None]]: Second operand
saturate [bool = False]: Saturate cast
- operation pixelpipes.numbers.TensorSubtract(a, b, saturate)
Subtracts two images with same size and number of channels or an image and a number.
a [Token [None]]: First operand
b [Token [None]]: Second operand
saturate [bool = False]: Saturate cast
- macro pixelpipes.expression.Expression(source, variables)
Numeric expression with variables
Macro that expands into an arithmetic expression parsed from an input string.
- Inputs:
source: text representation of arithmetic expression
variables: a map of inputs that are inserted into the expression
Category: arithmetic, macro
source [str]:
variables []:
Flow control
- operation pixelpipes.flow.Conditional(true, false, condition)
Node that performs conditional selection: the output of branch “true” will be selected if the “condition” is not zero, otherwise the output of branch “false” will be selected. Note that the inferred types of the two branches should match as closely as possible, otherwise the inferred type of this node may cause problems with dependent nodes.
true [Token [None]]: Use this data if condition is true
false [Token [None]]: Use this data if condition is false
condition [Token [int]]: Condition to test
- macro pixelpipes.flow.Switch(inputs, weights, seed)
Random switch between multiple branches; a macro that generates a tree of binary choices based on a random variable. The probability of choosing a branch is proportional to its weight.
inputs []: Two or more input branches
weights []: Corresponding branch weights
seed [Token [int] = @[random]]:
Lists
- operation pixelpipes.list.CompareEqual(a, b)
a [Token [float, None]]:
b [Token [float, None]]:
- operation pixelpipes.list.CompareGreater(a, b)
a [Token [float, None]]:
b [Token [float, None]]:
- operation pixelpipes.list.CompareGreaterEqual(a, b)
a [Token [float, None]]:
b [Token [float, None]]:
- operation pixelpipes.list.CompareLower(a, b)
a [Token [float, None]]:
b [Token [float, None]]:
- operation pixelpipes.list.CompareLowerEqual(a, b)
a [Token [float, None]]:
b [Token [float, None]]:
- operation pixelpipes.list.CompareNotEqual(a, b)
a [Token [float, None]]:
b [Token [float, None]]:
- operation pixelpipes.list.Concatenate(inputs)
inputs []: Two or more input lists
- operation pixelpipes.list.FileList(list)
String list of file paths. Use this operation to inject file dependencies into the pipeline.
list []:
- operation pixelpipes.list.FilterSelect(parent, filter)
Generate a sublist based on values from a filter list
parent [Token [None]]:
filter [Token [int, None]]:
- operation pixelpipes.list.GetElement(parent, index)
Returns an element from a list for a given index
parent [Token [None]]:
index [Token [int]]:
- macro pixelpipes.list.GetRandom(source, seed)
source [Token [None]]:
seed [Token [int] = @[random]]:
- operation pixelpipes.list.Length(parent)
Returns a list length
parent [Token [None]]:
- operation pixelpipes.list.ListAsTable(parent, row)
Transform list to table
parent [Token [None, None]]: Source list
row [Token [int]]: Row size, total length of list must be its multiple
- operation pixelpipes.list.LogicalAnd(a, b)
a [Token [bool, None]]:
b [Token [bool, None]]:
- operation pixelpipes.list.LogicalNot(a)
a [Token [bool, None]]:
- operation pixelpipes.list.LogicalOr(a, b)
a [Token [bool, None]]:
b [Token [bool, None]]:
- operation pixelpipes.list.MakeList(inputs)
Builds list from inputs. All inputs should be of the same type as the first input, it determines the type of a list.
inputs []:
- operation pixelpipes.list.Modulo(a, b)
a [Token [int, None]]:
b [Token [int, None]]:
- operation pixelpipes.list.Permutation(length, seed)
Generates a list of numbers from 0 to length in random order.
length [Token [int]]:
seed [Token [int] = @[random]]:
- operation pixelpipes.list.Permute(source, seed)
Randomly permutes an input list
source [Token [None]]: Input list
seed [Token [int] = @[random]]:
- operation pixelpipes.list.Range(start, end, length, round)
Generates a list of numbers from start to end of a given length
start [Token [float]]:
end [Token [float]]:
length [Token [int]]:
round [Token [bool] = False]:
- operation pixelpipes.list.Remap(source, indices)
Maps elements from source list to a result list using indices from indices list.
source [Token [None]]:
indices [Token [int, None]]:
- operation pixelpipes.list.Repeat(source, length)
Repeat list element a number of times
source [Token [None]]: Element to repeat
length [Token [int]]: Number of repetitions
- operation pixelpipes.list.SublistSelect(parent, begin, end)
Selects a range from the source list as a new list.
parent [Token [None]]: Source list
begin [Token [int]]: Start index
end [Token [int]]: End index
- macro pixelpipes.list.Table(source)
Constant Table
- Inputs:
source: Table type
Category: list
source []:
Geometry
Images
Resources
- macro pixelpipes.resource.AppendField(source, name, value)
Produce a resource from an input resource and another field. Essentially just node renaming.
source [Resource ()]: Original resource
name [str]: Name of new field
value [Token [None]]: Value for new field
- macro pixelpipes.resource.ConditionalResource(true, false, condition)
Node that performs conditional selection: the output of branch “true” will be selected if the “condition” is not zero, otherwise the output of branch “false” will be selected.
true [Resource ()]: Use this data if condition is true
false [Resource ()]: Use this data if condition is false
condition [Token [int]]: Condition to test
- macro pixelpipes.resource.GetField(source, element)
This macro exposes only a selected field of an input structure as an output, enabling processing of that data.
source [Resource ()]: Input resource
element [str]: Name of the structure field
- macro pixelpipes.resource.MakeResource(inputs)
Macro that generates a resource from given inputs
inputs [ = {}]: A map of inputs that are inserted into the expression
- macro pixelpipes.resource.list.GetLastResource(resources)
resources [Resource (__list_length)]:
- macro pixelpipes.resource.list.GetResource(resources, index)
resources [Resource (__list_length)]:
index [Token [int]]:
- macro pixelpipes.resource.list.GetResourceListLength(resources)
resources [Resource (__list_length)]:
- macro pixelpipes.resource.list.ListInterval(resources, begin, end)
resources [Resource (__list_length)]:
begin [Token [int]]:
end [Token [int]]:
- macro pixelpipes.resource.list.PermuteResourceSegments(resources, seed)
resources [Resource (__list_length, __list_seg_begin, __list_seg_end)]:
seed [Token [int] = @[random]]:
- macro pixelpipes.resource.list.PermuteResources(resources)
Randomly permutes the resource list
resources [Resource (__list_length)]:
- macro pixelpipes.resource.list.RandomResource(resources, seed)
Select a random resource from an input list of resources
resources [Resource (__list_length)]:
seed [Token [int] = @[random]]:
- macro pixelpipes.resource.list.RandomResourceSegment(resources, seed)
resources [Resource (__list_length, __list_seg_begin, __list_seg_end)]:
seed [Token [int] = @[random]]:
- macro pixelpipes.resource.list.RepeatResource(resource, length)
Returns a list of resources where an input resource is repeated a number of times
resource [Resource ()]: Resource to repeat
length [Token [int]]: Number of repetitions
- macro pixelpipes.resource.list.ResourceSegment(resources, index)
resources [Resource (__list_length, __list_seg_begin, __list_seg_end)]:
index [Token [int]]:
- macro pixelpipes.resource.list.SegmentCount(resources)
resources [Resource (__list_length, __list_seg_begin, __list_seg_end)]:
API documentation
The reference API documentation for the Python wrapper as well as the C++ exported symbols. Note that the wrapper documentation does not contain node classes; those are documented separately in the list of available nodes.
Python API
Pipeline
- class pixelpipes.LazyLoadEnum(name)
Bases:
Mapping
Special enum class used to load mappings from the core library when they are needed for the first time.
- class pixelpipes.Pipeline(data: Iterable[PipelineOperation], optimize=True)
Bases:
object
Wrapper for the C++ pipeline object, includes additional metadata. This wrapper should be used instead of interacting with the C++ object directly.
- property metadata: mappingproxy
Accesses the pipeline metadata storage.
- Returns:
MappingProxyType: A string to string key-value storage.
- property outputs: List[str]
Returns labels for individual elements of the output tuple.
- run(index: int) Tuple[np.ndarray]
Executes the pipeline for a given index and returns the result
- Args:
index (int): Index of sample to generate. Starts with 1.
- Returns:
Tuple[np.ndarray]: Generated sample, a sequence of NumPy objects.
- class pixelpipes.PipelineOperation(id, name, arguments, inputs)
Bases:
tuple
- property arguments
Alias for field number 2
- property id
Alias for field number 0
- property inputs
Alias for field number 3
- property name
Alias for field number 1
- pixelpipes.include_dirs() List[str]
Returns a list of directories with C++ header files for pixelpipes core library. Useful when building pixelpipes modules.
- Returns:
List[str]: List of directories
- pixelpipes.link_dirs() List[str]
Returns a list of directories where the pixelpipes library can be found. Useful when building pixelpipes modules.
- Returns:
List[str]: List of directories
- pixelpipes.load_module(name) bool
- pixelpipes.read_pipeline(filename: str)
- pixelpipes.write_pipeline(filename: str, pipeline: Pipeline, compress: Optional[bool] = True) None
Serializes pipeline to a file with optional compression
- Args:
filename (str): Filename to use.
pipeline (Pipeline): Pipeline to serialize.
compress (Optional[bool], optional): Use GZIP compression or not. Defaults to True.
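A short usage sketch; the file name is illustrative and pipeline is assumed to be a compiled Pipeline object:
`
from pixelpipes import write_pipeline, read_pipeline

write_pipeline("stream.pxp", pipeline, compress=True)   # serialize a compiled pipeline
restored = read_pipeline("stream.pxp")                  # load it back, no graph description needed
print(restored.run(1))
`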
Graph
- operation pixelpipes.graph.Constant(value)
Generates a constant in the pipeline
value [any]:
- node pixelpipes.graph.Copy(source)
source [Token [None]]:
- operation pixelpipes.graph.Debug(source, prefix)
Debug operation enables low-level terminal output of the content that is provided to it. The token content will usually not be printed entirely, only its shape; the value will only be displayed for simple scalar types as well as strings.
Note that these nodes will be passed to the pipeline only if the compiler is configured with the debug flag, otherwise they will be stripped from the graph.
source [Token [None]]: Result of which node to print
prefix [str = ]: String that is prepended to the output
- class pixelpipes.graph.EnumerationInput(options, default=None, description='')
Bases:
Input
- coerce(value, _)
- dump(value)
- class pixelpipes.graph.Graph(prefix: Optional[Union[str, Reference]] = '')
Bases:
object
- static add_default(node: Node, name: str) bool
- commit()
- copy()
- static has_default()
- nodes()
- pipeline(fixedout=False, variables=None, output=None)
- reference(node: Node)
- class pixelpipes.graph.InferredReference(ref: str, typ: Data)
Bases:
Reference
,OperationProxy
A node reference with its type already inferred. Used during compilation in macro expansion.
- property type
- class pixelpipes.graph.Input(reftype: Data, default: Optional[Union[str, float, int]] = None, description: Optional[str] = '')
Bases:
Attribute
- coerce(value, _)
- dump(value)
- reftype()
- macro pixelpipes.graph.Macro
- node pixelpipes.graph.Node
Base class for all nodes in a computation graph.
- exception pixelpipes.graph.NodeException(*args, node: Optional[Node] = None)
Bases:
Exception
- property node
- nodestack()
- print_nodestack()
- class pixelpipes.graph.NodeOperation(value)
Bases:
Enum
An enumeration.
- ADD = 4
- DIVIDE = 7
- EQUAL = 10
- GREATER = 14
- GREATER_EQUAL = 15
- INDEX = 2
- LENGTH = 3
- LOWER = 12
- LOWER_EQUAL = 13
- MODULO = 9
- MULIPLY = 6
- NEGATE = 1
- NOT_EQUAL = 11
- POWER = 8
- SUBTRACT = 5
- operation pixelpipes.graph.Operation
Base class of all atomic nodes that generate pipeline operations
- class pixelpipes.graph.OperationProxy
Bases:
object
- static query_operation(operation: NodeOperation, *qargs: Data)
- static register_operation(operation: NodeOperation, generator: Callable, *args: Data)
- operation pixelpipes.graph.Output(output, label)
Output node that accepts a single input and enables outputting tokens from the final pipeline. Tokens are returned as a tuple; their order is determined by the order in which output nodes are added to the graph. Additionally, you may label outputs with non-unique labels that can be used to resolve outputs.
output [Token [None]]: Output token
label [str = default]: Non-unique label of the output
- operation pixelpipes.graph.RandomSeed
Returns a pseudo-random number, useful for initializing pseudo-random operations. The seed itself is sampled from a pseudo-random generator that produces the same sequence of seeds for a specific position in the data sequence. This is the cornerstone of the repeatability of the pipeline.
- operation pixelpipes.graph.ReadFile(filename)
Read file from disk to memory buffer. File is read in binary mode.
filename [Token [char, None]]: Path to the image file
- class pixelpipes.graph.Reference(ref: Union[str, Reference])
Bases:
object
- property name
- static parse(value)
- operation pixelpipes.graph.SampleIndex
Returns the current sample index. This information can be used instead of a random seed to initialize random generators where sequential consistency is required.
- exception pixelpipes.graph.ValidationException(*args, node: Optional[Node] = None)
Bases:
NodeException
- pixelpipes.graph.outputs(*inputs, label='default')
- pixelpipes.graph.wrap_pybind_enum(bindenum)
Compiler
A compiler converts a graph to a sequence of operations.
- class pixelpipes.compiler.Compiler(fixedout=False, debug=False)
Bases:
object
Compiler object contains utilities to validate a graph and compile it to a pipeline (a sequence of operations, written in native code) that can be executed to obtain output variables.
- build(graph: Graph, variables: Optional[Mapping[str, Number]] = None, output: Optional[Union[Container, Callable]] = None, optimize=True) Pipeline
Compiles the graph and builds a pipeline from it in one function.
- Args:
graph (Graph): _description_
variables (typing.Optional[typing.Mapping[str, numbers.Number]], optional): _description_. Defaults to None.
output (typing.Optional[typing.Union[Container, typing.Callable]], optional): _description_. Defaults to None.
optimize (bool, optional): Optimize conditional operations by inserting jumps into the pipeline.
- Returns:
Pipeline: Pipeline object
- static build_graph(graph: Union[Graph, Mapping[str, Node]], variables: Optional[Mapping[str, Number]] = None, output: Optional[str] = None, fixedout: bool = False) Pipeline
- compile(graph: Graph, variables: Optional[Mapping[str, Number]] = None, output: Optional[Union[Container, Callable]] = None) Iterable[PipelineOperation]
Compile a graph into a pipeline of native operations.
- Args:
graph (Graph): Graph representation
- Raises:
CompilerException: raised if graph is not valid
- Returns:
engine.Pipeline: resulting pipeline
- validate(graph: Union[Graph, Mapping[str, Node]])
Validates the graph by inferring input and output types for all nodes. An exception will be thrown if dependencies cannot be resolved or if the output of a node is not compatible with the input specification of a dependent node.
- Args:
graph (typing.Mapping or Graph): Graph representation
- Raises:
ValidationException: Different validation errors share this exception type
- Returns:
dict: resolved types of all nodes
- exception pixelpipes.compiler.CompilerException
Bases:
Exception
- node pixelpipes.compiler.Variable(name, default)
Variable placeholder that can be overridden later
name [str]:
default []:
- pixelpipes.compiler.infer_type(node: Union[Reference, str], graph: Optional[Graph] = None, type_cache: Optional[Mapping[str, Data]] = None) Data
Computes the output type for a given node by recursively computing the types of its dependencies and calling the validate method of the node with information about their computed output types.
- Args:
node (typing.Union[Reference, typing.Type[Node]]): Reference of the node or a raw value
graph (Graph): Mapping of all nodes in the graph
type_cache (typing.Mapping[str, types.Type], optional): Optional cache for already computed types. Makes repetitive calls much faster. Defaults to None.
- Raises:
ValidationException: Contains information about the error during node validation process.
- Returns:
types.Type: Computed type for the given node.
- pixelpipes.compiler.toposort(data)
Dependencies are expressed as a dictionary whose keys are items and whose values are sets of dependent items. The output is a list of sets in topological order. The first set consists of items with no dependencies; each subsequent set consists of items that depend upon items in the preceding sets.
Sinks
A sink is a utility class that executes a pipeline in multiple threads and stacks sample outputs into batches.
- class pixelpipes.sink.AbstractDataLoader(batch: int, workers: Optional[Union[int, WorkerPool]] = None, offset: int = 0)
Bases:
object
- class _BatchIterator(commit, size: int, offset: int = 0)
Bases:
BatchIterator
- benchmark(n=100)
- class pixelpipes.sink.BatchIterator(commit, size: int, offset: int = 0)
Bases:
object
Abstract batch iterator base with most functionality for consumer agnostic multithreaded batching of samples.
- class pixelpipes.sink.PipelineDataLoader(pipeline: Pipeline, batch: int, workers: Optional[Union[int, WorkerPool]] = None, offset: Optional[int] = 0)
Bases:
AbstractDataLoader
- class _BatchIterator(commit, size: int, offset: int = 0)
Bases:
BatchIterator
- property pipeline
- class pixelpipes.sink.WorkerPool(max_workers: int = 1)
Bases:
ThreadPoolExecutor
Utilities
Utilities for more efficient common use cases.
- class pixelpipes.utilities.Counter
Bases:
object
Object-based counter; each time it is called it returns a value greater by 1
- class pixelpipes.utilities.PersistentDict(root: str)
Bases:
object
A dictionary interface to a folder, with memory caching.
- pixelpipes.utilities.collage(pipeline: Pipeline, index: int, rows: int, columns: int, offset: Optional[int] = 0) ndarray
- pixelpipes.utilities.find_nodes(module=None)
- pixelpipes.utilities.graph(constructor)
- pixelpipes.utilities.pipeline(variables=None, fixedout=False, debug=False)
Types
Token type representation wrapper.
- pixelpipes.types.Boolean()
- pixelpipes.types.BooleanList(length=None)
- pixelpipes.types.Buffer(length=None)
- pixelpipes.types.Char()
- class pixelpipes.types.Data
Bases:
object
Abstract type base, represents description of token types accepted or returned by nodes.
- pixelpipes.types.Float()
- pixelpipes.types.FloatList(length=None)
- pixelpipes.types.Image(width: Optional[int] = None, height: Optional[int] = None, channels: Optional[int] = None, depth: Optional[str] = None)
Represents an image type. This type can be specialized with image width, height, number of channels as well as bit-depth.
- pixelpipes.types.Integer()
- pixelpipes.types.IntegerList(length=None)
- pixelpipes.types.List(element=None, length=None)
Type that represents a list of elements.
- pixelpipes.types.Point()
- pixelpipes.types.Points(length=None)
- pixelpipes.types.Rectangle()
- pixelpipes.types.Short()
- pixelpipes.types.String(length=None)
- class pixelpipes.types.Token(element=None, *shape)
Bases:
Data
- common(typ: Data) Data
Merge two types by finding their common type. By default this just checks whether one type is castable into the other.
- property element
- pop()
- push(length=None)
- property rank
- squeeze()
- exception pixelpipes.types.TypeException
Bases:
Exception
- class pixelpipes.types.Union(*args: Data)
Bases:
Data
Denotes type that accepts any of the given inputs. Do not nest unions.
- pixelpipes.types.UnsignedChar()
- pixelpipes.types.UnsignedShort()
- pixelpipes.types.View()
- pixelpipes.types.cast_element(source: str, destination: str)
- pixelpipes.types.convert_element(element)
C++ API
Extending
PixelPipes contains a lot of operations used in data loading and augmentation in computer vision. Still, sometimes additional functionality is needed. Simple cases can be easily implemented by writing new macros; more complex cases require writing new C++ operations wrapped in a custom module.
Writing macros
A macro is a combination of operations that are frequently used together. It is written in Python and combines normal Python code with DAG generation. Macros can change the generated subgraph based on input type inference, and they can also use other macros within them. During compilation all macros are reduced down to primitive operations. As an example, let's write a simple macro that wraps a small numeric subgraph into a single node; a hedged sketch is shown below.
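In this sketch the attribute-based input declaration follows the Input class documented in the API reference, while the name of the expansion hook is an assumption; consult the Macro base class for the actual interface:
`
from pixelpipes.graph import Macro, Input
from pixelpipes import types
from pixelpipes.numbers import SampleUnform, Round  # class names as listed in the node reference

class RandomInteger(Macro):
    """Samples a uniform value between two bounds and rounds it to an integer."""

    min = Input(types.Integer(), default=0)
    max = Input(types.Integer(), default=10)

    def expand(self, min, max):
        # The returned subgraph replaces the macro node during compilation.
        return Round(SampleUnform(min, max))
`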
Creating custom operations
Frequently used or complex operations can be included in the pipeline by creating and building a PixelPipes module. A module is a dynamic library written in C++ that contains operations. Operations are functions that are exposed in a special manner and can be integrated into the operation pipeline.
Compiling and development
PixelPipes is a hybrid source-code project; it contains both C++ and Python code. Its main build framework is CMake, which is wrapped in distutils.
The C++ library does not require any external dependencies at runtime; internally, dependencies (like OpenCV) are pinned to a fixed version, compiled as static libraries and linked into the binary. The C++ code requires a fairly recent compiler supporting C++17. The compilation process was tested with GCC 10, Clang ?? and MSVC ??.
The Python wrapper requires pybind11 and NumPy; pybind11 is a header-only library used to generate Python bindings for the C++ core and is installed as a pip dependency. Some other Python packages used by the frontend are also installed via pip.
For development and testing purposes, the libraries can be compiled inplace using the following commands:
pip install cmake pybind11
pip install -r requirements.txt
python setup.py build_lib --inplace
python setup.py build_ext --inplace
Submitting issues and patches
Note
At the moment there are no specific rules on submitting issues and patches; just use the GitHub issue tracker.
Credits
PixelPipes is an open-source project, but the
Funding
The development of this package was supported by the Slovenian Research Agency (ARRS) through projects Z2-1866, J2-316 and J7-2596.