Meet Ultipa Manager: Toolkits for Data Scientists

In this series on Ultipa Manager, our Graph Database Management System (GDBMS, or Graph DBMS), we discuss some of the core value it can bring to your business. We started with graph visualization last week. This week, let’s look at how Ultipa Manager can serve as a toolkit for data scientists. A third and final post, on easy data migration, will follow next week.

Data scientists are usually equipped with many analytical and statistical skills, and they rely on a wide range of tools: data visualization and analytics platforms (e.g., Tableau, Power BI), machine learning tools (e.g., TensorFlow, h2o.ai), programming languages (e.g., Python, R), as well as SQL and Excel, to name a few. This long list may seem daunting, but it need not be, because no one is expected to master all of them. Data science work in one company can be very different from that in another, and so can the set of tools. In practice, the daily tools can usually be narrowed down to just a few, and Ultipa Manager is designed to be the core toolkit that data scientists use on a daily basis for graph analytics work.

Query Intuitively

It’s widely accepted that mastery of SQL sits at the top of the skills expected of data scientists. As more businesses shift from relational databases to graph databases, knowledge of a graph query language (GQL) becomes essential. Ultipa’s graph query language, called UQL, is fully integrated into Ultipa Manager. Many commonly used UQL commands are available as clickable buttons and links in the Ultipa Manager interface for quick use; in addition, a powerful yet easy-to-use UQL editor offers maximum flexibility.

Figure: Quickly View Edges of an Edge Schema

As this example shows, clicking the view (eye) icon next to the name of an edge schema automatically fills the corresponding UQL command into the UQL editor:

find().edges({@Transaction}) as edges return edges{*} limit 10

Press the Enter key to run this command immediately; 10 results are presented below the UQL editor. To retrieve more edges, simply change the number after limit in the command.

UQL can do many different things. Similar to SQL, its commands can be divided into several sub-categories:

  • DQL (Data Query Language): Commands to query data (nodes, edges and paths) from the database, such as find().nodes(), find().edges(), khop(), ab(), autonet().
  • DDL (Data Definition Language): Commands to define data structure (schema and property) in the database, such as create().node_schema(), create().edge_schema(), alter().node_property(), alter().edge_property().
  • DML (Data Manipulation Language): Commands to create, edit or delete data in the database, such as insert(), update(), delete().
  • DCL (Data Control Language): Commands to administer the database itself, such as create().policy(), create().user(), grant().user(), revoke().user().

A typical UQL query is the path template query, in which nodes and edges are assembled into a path template. Let’s dive in with several examples.

First, we prepare a movie community graph. The graph has node schemas @user and @movie, and edge schemas @like and @follow. Their relationships are depicted in the following model: a @like edge points from a @user node to a @movie node, and a @follow edge connects two @user nodes. For each schema, we created some properties to describe it, such as name and age for @user, and name for @movie.

Figure: A Simple Data Model

Example 1: Compare the average age of users who like the movie To Live with that of users who like Capernaum.

In UQL, a path template is formed with n() representing a node and e() representing an edge; filtering conditions go inside the parentheses. More precisely, re() means a right (outward) edge and le() a left (inward) edge. With this knowledge, we write the UQL for this query in Ultipa Manager and get the result:

Figure: UQL and Result of Example 1

In the query, aliases u1 and u2 are defined for the corresponding @user nodes. In the return clause, the avg() function calculates the average user age as an aggregate, and the table() function organizes the results in a table.
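To make the computation behind Example 1 concrete, here is a minimal plain-Python sketch of what the query works out. It is not UQL and does not reflect how Ultipa executes the query; the user names, ages and @like edges are made-up sample data, and average_age_of_fans is a hypothetical helper.

```python
# Toy in-memory version of the movie community graph.
# All users, ages and @like edges below are invented for illustration.
users = {
    "Alice": {"age": 30},
    "Bob": {"age": 40},
    "Carol": {"age": 20},
    "Dave": {"age": 30},
}

# Each tuple is one @like edge: (@user name, @movie name).
likes = [
    ("Alice", "To Live"),
    ("Bob", "To Live"),
    ("Bob", "Capernaum"),
    ("Carol", "Capernaum"),
    ("Dave", "Capernaum"),
]


def average_age_of_fans(movie_name, users, likes):
    """Average age of the users with a @like edge to the given movie."""
    ages = [users[user]["age"] for user, movie in likes if movie == movie_name]
    return sum(ages) / len(ages) if ages else None


# One row per movie, similar in spirit to organizing the result in a table.
for movie in ("To Live", "Capernaum"):
    print(movie, average_age_of_fans(movie, users, likes))
```

A user who likes both movies (Bob in this sample) counts toward both averages, which matches treating the two sets of @user nodes as independent.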

Example 2: Find paths that show which users like both To Live and Capernaum.

This question can be translated into the following path template, in which the two @movie nodes are known:

Figure: Path Template of Example 2

Write UQL in Ultipa Manager for this query and get the result:

Figure: UQL and Result of Example 2
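In plain Python terms, the path template of Example 2 amounts to intersecting the fans of the two known movies and emitting one movie-user-movie path per common fan. The sketch below only illustrates that idea; the @like edges are invented sample data and paths_through_common_fans is a hypothetical helper, not part of UQL.

```python
# Each tuple is one @like edge: (@user name, @movie name). Sample data only.
likes = [
    ("Alice", "To Live"),
    ("Bob", "To Live"),
    ("Bob", "Capernaum"),
    ("Carol", "Capernaum"),
]


def paths_through_common_fans(movie_a, movie_b, likes):
    """Return (movie_a, user, movie_b) paths for users who like both movies."""
    fans_a = {user for user, movie in likes if movie == movie_a}
    fans_b = {user for user, movie in likes if movie == movie_b}
    # A user sits in the middle of a path only if they like both movies.
    return [(movie_a, user, movie_b) for user in sorted(fans_a & fans_b)]


print(paths_through_common_fans("To Live", "Capernaum", likes))
```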

Example 3: To recommend movies to the user with UUID = 27, find whom this user is following and what movies they like, and count the number of users who like each movie.

Abstract this question into a path template, in which the first @user node is known:

Figure: Path Template of Example 3

Write UQL in Ultipa Manager for this query and get the result:

Figure: UQL and Result of Example 3

It’s reasonable to assume that the movies liked by the users whom user 27 follows overlap, so we group these movies by name (m1.name). In the end, we return the groups and the count for each group, that is, the number of followed users who like each movie.
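The follow-then-like traversal with grouping can be sketched in plain Python as follows. This is an illustration under assumptions, not the UQL from the article: the @follow and @like edges are invented, the user IDs other than 27 are arbitrary, and count_movies_liked_by_followed is a hypothetical helper.

```python
from collections import Counter

# (@user id, @user id) pairs: the first user follows the second. Sample data only.
follows = [(27, 3), (27, 8), (27, 15), (3, 8)]

# (@user id, @movie name) pairs: one @like edge each. Sample data only.
likes = [
    (3, "To Live"),
    (8, "To Live"),
    (8, "Capernaum"),
    (15, "To Live"),
    (27, "Capernaum"),
]


def count_movies_liked_by_followed(user_id, follows, likes):
    """Group the movies liked by the users that user_id follows, with counts."""
    followed = {dst for src, dst in follows if src == user_id}
    # Grouping by movie name: each count is the number of followed users
    # who like that movie.
    counts = Counter(movie for user, movie in likes if user in followed)
    return counts.most_common()


print(count_movies_liked_by_followed(27, follows, likes))
```

Movies with higher counts are natural candidates to recommend first. Note that this sketch, like the question as stated, does not filter out movies the user already likes.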

These examples show that UQL’s path template gives data scientists the flexibility to express almost any query they have in mind. In the previous post about graph visualization, we also demonstrated its ability to conduct deep penetration. UQL contains other query commands for various purposes, such as khop() to find the neighbor nodes whose shortest distance from a given node is K hops, and autonet() to form the network among multiple specified nodes. Read the UQL documentation for more details.
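The idea behind a K-hop neighbor query, namely finding the nodes whose shortest distance from a start node is exactly K, can be illustrated with a breadth-first search. The sketch below is generic Python, not Ultipa's khop() implementation; the adjacency data is invented, the graph is treated as undirected by assumption, and k_hop_neighbors is a hypothetical helper.

```python
from collections import deque

# Undirected sample graph as an adjacency list. Invented data.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def k_hop_neighbors(adjacency, start, k):
    """Return the nodes whose shortest distance from start is exactly k hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # no need to expand beyond depth k
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                # First visit in BFS order gives the shortest distance.
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node, d in distance.items() if d == k}


print(k_hop_neighbors(adjacency, "A", 2))
```

Because breadth-first search reaches each node first along a shortest path, a node such as D, which is reachable through both B and C, is reported once at depth 2.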

SQL is deemed easy to learn, yet some of its operations, such as nested statements and table joins, run contrary to how people naturally reason. UQL, as many learners would agree (even those with very limited programming background), is fairly easy to pick up thanks to its simple and intuitive syntax. Most importantly, a graph is a high-dimensional way to model real-world businesses and problems.

Rich Collection of Graph Algorithms

Algorithms are no strangers to data scientists. An algorithm can be viewed as a packaged step-by-step procedure to solve a problem. Data scientists don’t necessarily need to know the underlying implementations and optimizations of the algorithms, though a knowledge of mathematics helps them choose the right algorithm for a given question.

Ultipa has publicly released nearly 50 graph algorithms, from classic centrality, similarity and connectivity algorithms to modern community detection, label propagation and link prediction, and even the recent rising stars: graph embedding and other machine-learning-style algorithms. All of the algorithms are written in C++ with a focus on accuracy, robustness and, most importantly, efficiency. They can be applied to tasks such as ranking, recommendation, classification, clustering, fraud detection, prediction and optimization.

Figure: Graph Algorithm Library in Ultipa Manager

As an example, let’s run the K-means algorithm in Ultipa Manager. The K-means algorithm divides the nodes in the graph into K groups, so that a node is more similar to nodes in its own group than to those in other groups. The grouping criterion is the properties of the nodes: each node is treated as a vector formed by several numeric properties, and the algorithm iteratively puts nodes that are close to each other into the same group. One of two methods can be selected to measure the distance between nodes: Euclidean distance or cosine similarity.
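The iterative procedure described above can be sketched in a few lines of plain Python. This is a generic textbook-style K-means with Euclidean distance, not Ultipa's C++ implementation; the coordinates are invented, and taking the first K points as the initial centroids is an assumption made only to keep the example deterministic (random initialization is more common in practice).

```python
import math

# Each point is a node viewed as a vector of two numeric properties,
# e.g. (location_x, location_y). The coordinates are invented sample data.
points = [
    (1, 1), (9, 1), (5, 8),
    (1, 2), (2, 1),
    (9, 2), (10, 1),
    (5, 9), (6, 8),
]


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (assignment, centroids)."""
    # Assumption: use the first k points as the starting centroids.
    centroids = [tuple(points[i]) for i in range(k)]
    assignment = None
    for _ in range(max_iterations):
        # Step 1: assign every point to its nearest centroid (Euclidean).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed group
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return assignment, centroids


assignment, centroids = k_means(points, k=3)
print(assignment)
```

Switching to cosine similarity would mean replacing the math.dist call with a cosine-based measure; the assign-then-update loop stays the same.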

We will apply the K-means algorithm to a delivery graph, illustrated below. Destinations are abstracted as nodes, and each node has two properties (location_x, location_y) that describe its geographic location. We import this graph into Ultipa Manager.

Figure: Illustration of a Delivery Graph

The goal is to use the K-means algorithm to divide all destinations into 3 groups, each allocated to one of 3 delivery men, so that each delivery man’s routes are kept short by ensuring that his assigned destinations are close to one another.

Below is the algorithm execution interface of Ultipa Manager. Fill in the parameters in the top section and run the algorithm; the result is then shown in the table below. If you’re interested in the meaning of each parameter, please read the documentation of the K-means algorithm.

Figure: K-means Algorithm Execution Interface of Ultipa Manager

Drawing the 3 groups on the illustration shows that the division is consistent with intuition.

Figure: Illustration of a Delivery Graph – K-means Grouping Result

For those who want to learn more about graph algorithms, we recommend reading the Ultipa Graph Analytics & Algorithms documentation. If you prefer videos, we have a YouTube channel that is constantly being updated.

Closing Remarks

The next time you look for a tool to augment your daily work, from data processing to data visualization to data analysis, remember to try out Ultipa Manager. The easiest way to get to it is through Ultipa Cloud, the DBaaS (Database as a Service) platform by Ultipa.
