Cook Recipes

You can use the code editor to build projects using datasets that you have stored in your Graphext workspace.

Working in the code editor means writing the steps that build your project directly, which gives you more control over the configuration of your network and the transformations applied to your dataset.

"Type a few lines of code, you create an organism."

- Richard Powers


A project is built from nothing more than a sequence of steps, which together form a recipe. Each step is a function that accepts some data and outputs new, transformed or enriched data. A recipe can have an arbitrary number of steps and can generate an arbitrary number of intermediate datasets. However, its final output must always be a single dataset that serves as the basis for your project's network visualization.

Opening the Code Editor

You can open the code editor at any point after selecting a dataset and before executing the project. As you choose options from the project sidebar, functions are added to your code editor.

To access the code editor, select a dataset from your Graphext workspace, then click Open text editor at the bottom of the sidebar that appears on the right.

How to Open the Code Editor

  1. Start from the 'Datasets' panel of your Graphext workspace.
  2. Select a dataset to start working with. A sidebar will appear on the right of your screen.
  3. Optionally, start building your analysis using the visual editor in the sidebar.
  4. Select Open text editor at the bottom of the sidebar. You can do this at any point before executing the project.
  5. Edit the code inside the code editor to customize your project setup.
  6. Consult the documentation on steps for more information on the steps that you can add or customize.
  7. Return to the visual editor by selecting Open visual editor at the bottom of the right sidebar.
  8. When you have finished editing, continue and execute your project.

Connecting Data

When you open the code editor, you will already have chosen a dataset that serves as the main input for the recipe. This dataset is available by default under the name ds, so the simplest possible recipe is:

create_project(ds)

This recipe consists of a single step, create_project, which accepts a dataset as input and has no output. It is a special case. In practice, you'll almost always want to transform or enrich your dataset first, so other steps will usually be necessary.


What are Steps?

Steps are functions that are applied to your data or your project, each changing it in a specific way.

Syntax

Every step is added using the same simple syntax:

step_name(inputs, ..., {params}) -> (outputs, ...)

The inputs may be specific columns of a dataset, a dataset itself, or a model. The types of input a step expects depend on the step in question and are described in the documentation for each step.

As the step's last argument inside the parentheses, you can provide parameters to configure how the step will transform the input data. Finally, in a second set of parentheses (separated by ->), you provide names for the outputs that the step will generate. Like the inputs, the outputs may be one or more columns or datasets.

Read the documentation on Graphext steps to see which steps you can use to build projects.


To differentiate between datasets and columns as inputs, a column name must be prefixed with the name of the dataset it belongs to, while a dataset is referred to by its name only. In other words, ds refers to the dataset named ds, and to pick out a specific column you'd use either ds.my_column or ds["my_column"]. The two forms are generally interchangeable, but the latter is required if a column name contains spaces.
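
As a small illustration, the line below reuses the infer_language step that appears later on this page and points it at a column whose name contains a space. The column name "review text" is hypothetical; the dataset name ds and the output column follow the conventions used elsewhere in this guide.

```
# "review text" is a hypothetical column name; the space requires the bracket form
infer_language(ds["review text"]) -> (ds.language)
```

If the column were named review_text instead, ds.review_text and ds["review_text"] should be interchangeable.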


An Example Step

A simple step that splits the text in a given column at the first comma might be written as follows.

split_string(ds.text, {"pattern": ","}) -> (ds.left_part, ds.right_part)

The result of the split will be two new columns named left_part and right_part in the dataset ds. The columns resulting from the split will now be included in the project that is created.
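
Combined with create_project, this is already enough for a small but complete recipe. The sketch below uses only these two steps and assumes, as in the example, that ds has a column named text.

```
# Split the text column at the comma, adding left_part and right_part to ds
split_string(ds.text, {"pattern": ","}) -> (ds.left_part, ds.right_part)

# Build the project from the enriched dataset
create_project(ds)
```

Note that create_project builds the project without links, so this sketch does not produce a network.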


Autocompleting Step Names

When you start typing a step's name in the code editor, the rest of the step's signature will usually autocomplete, including the default names of any outputs it creates. You only need to change these names if you don't like the defaults or if they clash with outputs you have already generated.


Parameters

Parameters let you configure how a step will process its inputs. For those familiar with JSON or JavaScript, the parameters are written as a valid JSON object. For those who are not, this is simply a set of quoted parameter names and their corresponding values, placed between curly braces.

{"pattern": ","}

In the example above, "pattern" is the parameter's name and "," is its value. All parameter names must be quoted strings, while values may be quoted strings, numbers, booleans (true or false), lists of numbers or strings, or another nested object in curly braces that follows the same rules.

Each step's individual documentation describes its valid parameters. Invalid parameters will be highlighted by the Code Editor.


Stages of a Project Setup

When building a project, steps are combined in sequence to construct the different stages of the project setup. Steps can be grouped according to the stage in which they should be used.

For instance, steps for filtering and steps for enrichment both belong to the stage in which your dataset is modified. The stages of setting up a project are as follows:

Modify Data

During this stage, you modify your original dataset by adding columns, training models or enriching your data. We classify these steps as: Transform; Enrichment; Aggregate, Join & Combine; Filtering & Sampling; Embedding; Models & Inference.

# Reduce the dataset to an N-dimensional numeric vector embedding.
embed_dataset(ds) -> (ds.embedding)

Create Graph

Some steps create a network from your dataset. These steps output a new dataset, links, which has three columns: Source, Target and Weight. Each row in this dataset represents a link in your Graph, that is, in your network visualization.

# Create network links by calculating the similarity of embeddings (vectors)
link_embeddings(ds.embedding, {
  "metric": "euclidean",
  "n_nearest": 15
}) -> (links)

Cluster

Next comes a choice of steps used to identify clusters in your dataset.

# Identify clusters in the network
cluster_network(links) -> (ds["umap_cluster"])

Layout

In this stage, steps create the coordinates that position your data points on the Graph. Each row in your data is given an x and a y coordinate. Importantly, these variables must be named x and y.

# Reduce the dataset to 2 dimensions that can be mapped to x/y node positions
layout_dataset(ds) -> (ds.x, ds.y)

Create Project

This is the final stage of setting up your project. Two steps are available for this stage:

  • create_project(ds) - Builds a project without links. This step is currently experimental, so you may run into issues.
  • data_export(links, ds) - Exports your dataset to the project, including the links used to plot your network.

# Prepare project using the final dataset
data_export(links, ds[!["embedding"]])

Configure

This stage is executed after Graphext has built your project. It configures aspects of the project such as the size, labels and color of nodes, along with any other customizations you made whilst building the project, including the order of variables. Steps in this stage have no output.

# Configure the column that is used for coloring the nodes by default
configure_node_color(ds["column_name"])

Create Insights

This stage adds automatic insights to your project. As with the previous stage, steps here have no output.

# Create a new insight from the Graph section.
create_graph_insight({
  "title": "Graph colored by column_name",
  "colorColumn": "column_name",
  "label": "",
  "relative": true
})

Export to SQL

This final stage is not always required. Its steps write your output dataset to a database, which is very useful for projects that represent large amounts of data and can help to speed up loading times.

# Export a given dataset to a specified SQL database.
# ("param" and value are placeholders; see the step's documentation for its actual parameters.)
export_to_sql(ds: dataset, {"param": value})

Ordering Steps

The order of your steps should follow the structure of the stages set out above.

Whilst it is not always essential to follow this structure, working with datasets or dataset variables as inputs and outputs of your steps can quickly cause things to get messy if you aren't careful about the order of your steps.

For instance, the first snippet below embeds the original dataset and then adds a new column containing the language of each text.

embed_dataset(ds) -> (ds.embedding)
infer_language(ds.text) -> (ds.language)

The second version, however, adds the language column first, so the embeddings are calculated from the original dataset together with the new language column.

infer_language(ds.text) -> (ds.language)
embed_dataset(ds) -> (ds.embedding)
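
Putting the stages together in order, a complete recipe might look like the sketch below. It only chains the example steps shown on this page, assumes that ds has a text column, and has not been validated in the editor, so check each step's documentation for the inputs and parameters it actually accepts.

```
# Modify Data: enrich the dataset, then embed it
infer_language(ds.text) -> (ds.language)
embed_dataset(ds) -> (ds.embedding)

# Create Graph: create network links from the similarity of embeddings
link_embeddings(ds.embedding, {
  "metric": "euclidean",
  "n_nearest": 15
}) -> (links)

# Cluster: identify clusters in the network
cluster_network(links) -> (ds["umap_cluster"])

# Layout: compute x/y node positions
layout_dataset(ds) -> (ds.x, ds.y)

# Create Project: export the links and the dataset, as in the Create Project example
data_export(links, ds[!["embedding"]])

# Configure: color nodes by the cluster column created above
configure_node_color(ds["umap_cluster"])
```

Each step that takes ds as input sees the columns that exist at that point in the recipe, so moving a step up or down can change what it operates on.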

Errors

The Code Editor will highlight any errors in your code. It does this by continuously validating the steps that you write.


Code validation checks:

  • The existence of columns, datasets or steps you use.
  • That steps have the correct inputs - number of inputs and type of inputs.
  • That your steps have the correct outputs.
  • That a step's required parameters are used.
  • That you have the required API integrations.

If the transpiler finds errors, the recipe will not be valid and you will not be able to execute the project. In order to find out what the problem is, open the Code Editor, look for the mistake - highlighted in red - and hover over it. You will see a popup giving you more information about the error.
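
As a hypothetical illustration of the first check, the line below refers to a column that does not exist because its name is misspelled (langauge instead of language). Assuming ds has no column with that name, the editor should flag the reference.

```
# "langauge" is a deliberate typo, so this column reference cannot be resolved
configure_node_color(ds.langauge)
```

Correcting the name to ds.language, after a step that creates that column, should clear the error.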

