graphxplore.Dashboard package

With this subpackage you can generate plotly distribution plots based on MetaData or a BaseGraph stored in a Neo4J database. The visualizations used are pie charts, stacked bar charts, histograms, scatter plots and box plots. The suitable visualization type is selected automatically based on the type of data to plot. These plots are used in the GraphXplore application, but you could also use them e.g. for a publication or a custom dashboard.

The MetadataDistributionPlotter class can be used to plot data type and value distributions contained in a MetaData object. Data type distributions are plotted as pie charts, value distributions as box plots for metric variables and as pie charts for categorical variables. Note that the plotted data is already represented in the MetaData and does not need to be queried from the dataset.

The DashboardBuilder class retrieves univariate (one variable) and bivariate (two variables that are analyzed together) distributions from a Neo4J database and plots them. Here, the combined strength of the detailed MetaData objects and efficient (potentially multi-table) joins in Neo4J Cypher is leveraged to create plots on the fly. Additionally, you can define subgroups within your dataset using the GroupSelector class to combine and compare distributions in different groups in one plot. Based on the distribution and data type, the following plots are generated:

Univariate distributions:
- Categorical variables: One or multiple pie charts (one per group)
- Metric variables: One or multiple overlaid histograms (one per group)
Bivariate distributions:
- Two categorical variables: One or multiple stacked bar charts (one per group)
- One categorical and one metric variable: One or multiple subplots (one per group) each with multiple box plots (one per category)
- Two metric variables: One or multiple overlaid scatter plots (one per group)

Code might look like this:

>>> from graphxplore.Dashboard import DashboardBuilder, MetadataDistributionPlotter, HistogramYScaleType
>>> from graphxplore.MetaDataHandling import MetaData
>>> from graphxplore.GraphDataScience import GroupSelector
>>> from graphxplore.DataMapping.Conditionals import StringOperator, StringOperatorType
>>>
>>> meta = MetaData.load_from_json(filepath='/path/meta.json')
>>> metric = meta.get_variable(table='table', variable='metric_variable')
>>> categorical = meta.get_variable(table='other_table', variable='categorical_variable')
>>> # data type distributions are always plotted with pie charts
>>> data_type_plot = MetadataDistributionPlotter.plot_data_type_distribution(metric)
>>> # plot value distributions
>>> value_dist_box_plot = MetadataDistributionPlotter.plot_value_distribution(metric)
>>> value_dist_pie_chart = MetadataDistributionPlotter.plot_value_distribution(categorical)
>>> # define subgroups for plots
>>> apple_condition = StringOperator(table='table', variable='food', value='apple', compare=StringOperatorType.Equals)
>>> pear_condition = StringOperator(table='table', variable='food', value='pear', compare=StringOperatorType.Equals)
>>> subgroups = {'apples': GroupSelector(group_table='table', meta=meta, group_filter=apple_condition),
...              'pears': GroupSelector(group_table='table', meta=meta, group_filter=pear_condition)}
>>> # define the builder, add the subgroups and exclude the full table 'table' as a group
>>> builder = DashboardBuilder(meta=meta, main_table='table', base_graph_database='mydb', full_table_group=False,
...                            groups=subgroups, address='bolt://localhost:7687', auth=('my_user', 'my_password'))
>>> # query metric variable and plot two overlaid histograms. Use fraction (instead of count) for y-scale
>>> hist_plot = builder.get_variable_dist_plot(table='table', variable='metric_variable',
...                                            y_scale_type=HistogramYScaleType.Fraction)
>>> # query categorical variable and get two pie charts
>>> pie_plot = builder.get_variable_dist_plot(table='other_table', variable='categorical_variable')
>>> # query both variables and plot as two subplots (per group) with box plots (one per category of 'categorical_variable')
>>> # notice how the two variables can originate from different tables (must both be reachable via foreign table relations from 'table')
>>> bivariate_box_plot = builder.get_correlation_plot(first_table='table', first_var='metric_variable',
...                                                   second_table='other_table', second_var='categorical_variable')
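
The mapping from variable types to plot types listed under "Univariate distributions" and "Bivariate distributions" can be sketched as a small lookup. This is only an illustration of the documented rules, not the package's implementation: VariableKind, univariate_plot_type and bivariate_plot_type are hypothetical names that do not exist in graphxplore.

```python
from enum import Enum


class VariableKind(Enum):
    """Hypothetical stand-in for the variable type recorded in the metadata."""
    CATEGORICAL = "categorical"
    METRIC = "metric"


def univariate_plot_type(kind: VariableKind) -> str:
    """Return the documented plot type for a single variable."""
    return "pie chart" if kind is VariableKind.CATEGORICAL else "histogram"


def bivariate_plot_type(first: VariableKind, second: VariableKind) -> str:
    """Return the documented plot type for a pair of variables.

    The order of the two variables does not matter, so the kinds are
    compared as a set.
    """
    kinds = {first, second}
    if kinds == {VariableKind.CATEGORICAL}:
        return "stacked bar chart"
    if kinds == {VariableKind.METRIC}:
        return "scatter plot"
    # One categorical and one metric variable
    return "box plots"


print(bivariate_plot_type(VariableKind.METRIC, VariableKind.CATEGORICAL))
```

In each case the documented behaviour is to produce one such plot (or one overlaid trace) per group.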

Module contents

class graphxplore.Dashboard.DashboardBuilder(meta: MetaData, main_table: str, base_graph_database: str, full_table_group: bool = True, groups: Dict[str, GroupSelector] | None = None, address: str = 'bolt://localhost:7687', auth: Tuple[str, str] = ('neo4j', ''))

Bases: object

This class plots univariate and bivariate distributions by querying a BaseGraph stored in a Neo4J database. The plots are generated using the plotly package. Additionally, subgroups of main_table primary keys can be defined to jointly plot and compare distributions of groups.

Parameters:

meta (MetaData) – The metadata of the BaseGraph
main_table (str) – The origin table of primary keys used for the plot
base_graph_database (str) – The name of the BaseGraph Neo4J database
full_table_group (bool) – If True, all primary keys of main_table are used as a group. Defaults to True
groups (Dict[str, GroupSelector] | None) – Dictionary of name and GroupSelector for the defined subgroups. Must have main_table as their group table. Defaults to None
address (str) – The address of the Neo4J DBMS
auth (Tuple[str, str]) – User and password of the Neo4J DBMS
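
To make the interplay of full_table_group and groups more concrete, the following in-memory sketch partitions primary keys into named groups. It is an illustration under assumptions only: the real class resolves groups through GroupSelector objects and Cypher queries, whereas select_groups, the plain-dict rows, the predicate functions and the group name 'all' are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def select_groups(rows: List[Row],
                  filters: Dict[str, Callable[[Row], bool]],
                  full_table_group: bool = True,
                  primary_key: str = "id") -> Dict[str, List[str]]:
    """Map each group name to the primary keys of the rows it contains."""
    groups: Dict[str, List[str]] = {}
    if full_table_group:
        # Hypothetical name for the group holding every primary key
        groups["all"] = [row[primary_key] for row in rows]
    for name, keep in filters.items():
        groups[name] = [row[primary_key] for row in rows if keep(row)]
    return groups


rows = [
    {"id": "1", "food": "apple"},
    {"id": "2", "food": "pear"},
    {"id": "3", "food": "apple"},
    {"id": "4", "food": "plum"},
]
filters = {
    "apples": lambda row: row["food"] == "apple",
    "pears": lambda row: row["food"] == "pear",
}

# Exclude the full table, keeping only the two subgroups
groups = select_groups(rows, filters, full_table_group=False)
print(groups)
```

Each resulting group would then contribute one pie chart, histogram, subplot or scatter trace to the combined figure.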

get_correlation_plot(first_table: str, first_var: str, second_table: str, second_var: str) → Figure

Generates a plotly.graph_objects.Figure for the bivariate distribution of first_var and second_var. For two metric variables a scatter plot is generated, for a pair of metric and categorical variables multiple box plots are generated, and for two categorical variables stacked bar plots are used. All necessary data is queried from the Neo4J database.

Parameters:

first_table (str) – The table of first_var
first_var (str) – The first variable for the distribution
second_table (str) – The table of second_var
second_var (str) – The second variable for the distribution

Returns:

Returns the plotted figure which can e.g. be used in streamlit or notebooks

Return type:

Figure

get_variable_dist_plot(table: str, variable: str, y_scale_type: HistogramYScaleType | None = None) → Figure

Generates a plotly.graph_objects.Figure for the univariate distribution of variable. If variable is metric, a plot of multiple histograms (one for each group) is generated. If variable is categorical, multiple pie charts are generated and combined into one plot. All necessary data is queried from the Neo4J database.

Parameters:

table (str) – The table of the variable
variable (str) – The variable for the distribution plot
y_scale_type (HistogramYScaleType | None) – The y-scale type. If group sizes are very imbalanced, HistogramYScaleType.Fraction should be preferred

Returns:

Returns the plotted figure which can e.g. be used in streamlit or notebooks

Return type:

Figure
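
Why a fraction scale helps with imbalanced groups can be seen in a small stdlib-only sketch. It is not taken from the package: bin_counts and to_fractions are hypothetical helpers, and the binning is deliberately simplistic.

```python
from collections import Counter
from typing import Dict, List


def bin_counts(values: List[float], bin_width: float) -> Dict[float, int]:
    """Count values per bin, keyed by the lower bin edge."""
    return dict(Counter((value // bin_width) * bin_width for value in values))


def to_fractions(counts: Dict[float, int]) -> Dict[float, float]:
    """Normalize bin counts so that they sum to 1 within the group."""
    total = sum(counts.values())
    return {edge: count / total for edge, count in counts.items()}


# Two groups with the same shape but a tenfold difference in size
large_group = [5.0] * 80 + [15.0] * 20
small_group = [5.0] * 8 + [15.0] * 2

large_counts = bin_counts(large_group, bin_width=10.0)
small_counts = bin_counts(small_group, bin_width=10.0)

print(large_counts, small_counts)
print(to_fractions(large_counts), to_fractions(small_counts))
```

On a count scale the smaller group's bars are a tenth of the height and hard to compare in an overlaid histogram, while on a fraction scale both groups show the same distribution.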

class graphxplore.Dashboard.HistogramYScaleType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Count = 'Count'

Fraction = 'Fraction'
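
Because the enum derives from both str and Enum, its members can be looked up by value and compared with plain strings, which may be convenient when the scale comes from a user interface selection. The sketch below redefines a local stand-in with the two documented members so that it runs without graphxplore installed; in real code you would import the class from graphxplore.Dashboard instead.

```python
from enum import Enum


class HistogramYScaleType(str, Enum):
    """Local stand-in mirroring the two documented members."""
    Count = 'Count'
    Fraction = 'Fraction'


# Look up a member by its string value, e.g. from a dropdown selection
selected = HistogramYScaleType('Fraction')

# Members of a str-based enum compare equal to their string value
is_fraction = selected == 'Fraction'

# All available options, e.g. to populate the dropdown
options = [member.value for member in HistogramYScaleType]

print(selected is HistogramYScaleType.Fraction, is_fraction, options)
```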

class graphxplore.Dashboard.MetadataDistributionPlotter

Bases: object

static plot_data_type_distribution(var_info: VariableInfo) → Figure

Plots the data type distribution of a variable as a pie chart.

Parameters:

var_info (VariableInfo) – The variable info for the distribution to plot

Returns:

Returns the plotted figure which can e.g. be used in streamlit or notebooks

Return type:

Figure

static plot_value_distribution(var_info: VariableInfo) → Figure

Plots the value distribution of a variable. For metric variables, a box plot is generated. For categorical variables, a pie chart is generated.

Parameters:

var_info (VariableInfo) – The variable info for the distribution to plot

Returns:

Returns the plotted figure which can e.g. be used in streamlit or notebooks

Return type:

Figure