graphxplore.Dashboard package
With this subpackage you can generate plotly distribution plots based on
MetaData or a BaseGraph stored in a Neo4J
database. As visualizations, pie and stacked bar charts, as well as histogram, scatter and box plots are used. The
suitable type of visualization is automatically detected based on the type of data to plot. These plots are used in the
GraphXplore application, but you could use them e.g. for a publication or custom dashboard.
The MetadataDistributionPlotter class can be used to plot data type and value
distributions contained in a MetaData object. Data type distributions are
plotted as pie charts, value distributions as box plots for metric variables and as pie charts for categorical
variables. Note that the plotted data is already represented in the MetaData and
does not need to be queried from the dataset.
The DashboardBuilder class retrieves univariate (one variable) and bivariate (two
variables that are analyzed together) distributions from a Neo4J database and plots them. Here, the combined strengths
of the detailed MetaData objects and efficient (potentially multi-table) joins
in Neo4J Cypher are leveraged to create plots on the fly. Additionally, you can define subgroups within your dataset
using the GroupSelector class to combine and compare distributions in different
groups in one plot. Based on the distribution and data type the following plots are generated:
Univariate distributions:
Categorical variables: One or multiple pie charts (one per group)
Metric variables: One or multiple overlaid histograms (one per group)
Bivariate distributions:
Two categorical variables: One or multiple stacked bar charts (one per group)
One categorical and one metric variable: One or multiple subplots (one per group) each with multiple box plots (one per category)
Two metric variables: One or multiple overlaid scatter plots (one per group)
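The selection rules listed above can be summarized as a small lookup. The following is a minimal sketch for illustration only: VarKind and choose_plot are hypothetical names that are not part of graphxplore, and the actual detection inside DashboardBuilder may work differently.

```python
from enum import Enum
from typing import Optional


class VarKind(Enum):
    """Hypothetical stand-in for a variable's type (not a graphxplore class)."""
    CATEGORICAL = "categorical"
    METRIC = "metric"


def choose_plot(first: VarKind, second: Optional[VarKind] = None) -> str:
    """Return the plot type the list above assigns to one or two variables."""
    if second is None:
        # Univariate case
        return "pie chart" if first is VarKind.CATEGORICAL else "histogram"
    kinds = {first, second}
    if kinds == {VarKind.CATEGORICAL}:
        return "stacked bar chart"
    if kinds == {VarKind.METRIC}:
        return "scatter plot"
    # One categorical and one metric variable, in either order
    return "box plot"
```

The order of the two variables does not matter in this sketch, so a metric/categorical pair maps to box plots regardless of which variable is passed first.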
Code using this subpackage might look like this:
>>> from graphxplore.Dashboard import DashboardBuilder, MetadataDistributionPlotter, HistogramYScaleType
>>> from graphxplore.MetaDataHandling import MetaData
>>> from graphxplore.GraphDataScience import GroupSelector
>>> from graphxplore.DataMapping.Conditionals import StringOperator, StringOperatorType
>>>
>>> meta = MetaData.load_from_json(filepath='/path/meta.json')
>>> metric = meta.get_variable(table='table', variable='metric_variable')
>>> categorical = meta.get_variable(table='other_table', variable='categorical_variable')
# data type distributions are always plotted with pie charts
>>> data_type_plot = MetadataDistributionPlotter.plot_data_type_distribution(metric)
# plot value distributions
>>> value_dist_box_plot = MetadataDistributionPlotter.plot_value_distribution(metric)
>>> value_dist_pie_chart = MetadataDistributionPlotter.plot_value_distribution(categorical)
# define subgroups for plots
>>> apple_condition = StringOperator(table='table', variable='food', value='apple', compare=StringOperatorType.Equals)
>>> pear_condition = StringOperator(table='table', variable='food', value='pear', compare=StringOperatorType.Equals)
>>> subgroups = {'apples' : GroupSelector(group_table='table', meta=meta, group_filter=apple_condition),
...              'pears' : GroupSelector(group_table='table', meta=meta, group_filter=pear_condition)}
# define the builder, add the subgroups and exclude the full table 'table' as a group
>>> builder = DashboardBuilder(meta=meta, main_table='table', base_graph_database='mydb', full_table_group=False,
...                            groups=subgroups, address='bolt://localhost:7687', auth=('my_user', 'my_password'))
# query metric variable and plot two overlaid histograms. Use fraction (instead of count) for y-scale
>>> hist_plot = builder.get_variable_dist_plot(table='table', variable='metric_variable',
...                                            y_scale_type=HistogramYScaleType.Fraction)
# query categorical variable and get two pie charts
>>> pie_plot = builder.get_variable_dist_plot(table='other_table', variable='categorical_variable')
# query both variables and plot as two subplots (per group) with box plots (one per category of 'categorical_variable')
# notice how the two variables can originate from different tables (must both be reachable via foreign table relations from 'table')
>>> bivariate_box_plot = builder.get_correlation_plot(first_table='table', first_var='metric_variable',
...                                                   second_table='other_table', second_var='categorical_variable')
Module contents
- class graphxplore.Dashboard.DashboardBuilder(meta: MetaData, main_table: str, base_graph_database: str, full_table_group: bool = True, groups: Dict[str, GroupSelector] | None = None, address: str = 'bolt://localhost:7687', auth: Tuple[str, str] = ('neo4j', ''))
Bases: object
This class generates plots of univariate and bivariate distributions by querying a BaseGraph stored in a Neo4J database. The plots are generated using the plotly package. Additionally, subgroups of main_table primary keys can be defined to jointly plot and compare distributions of groups.
- Parameters:
main_table (str) – The origin table of primary keys used for the plot
base_graph_database (str) – The name of the BaseGraph Neo4J database
full_table_group (bool) – If True, all primary keys of main_table are used as a group. Defaults to True
groups (Dict[str, GroupSelector] | None) – Dictionary of name and GroupSelector for the defined subgroups. Must have main_table as their group table. Defaults to None
address (str) – The address of the Neo4J DBMS
auth (Tuple[str, str]) – User and password of the Neo4J DBMS
- get_correlation_plot(first_table: str, first_var: str, second_table: str, second_var: str) → Figure
Generates a plotly.graph_objects.Figure for the bivariate distribution of first_var and second_var. For two metric variables a scatter plot is generated, for a pair of metric and categorical variables multiple box plots are generated, and for two categorical variables stacked bar plots are used. All necessary data is queried from the Neo4J database.
- Parameters:
first_table (str) – The table of first_var
first_var (str) – The first variable for the distribution
second_table (str) – The table of second_var
second_var (str) – The second variable for the distribution
- Returns:
Returns the plotted figure which can e.g. be used in streamlit or notebooks
- Return type:
Figure
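To make the two-categorical case of get_correlation_plot more concrete, the sketch below shows the kind of cross-tabulation a stacked bar chart displays. It is an illustration with made-up data, not the library's implementation: the value pairs, the variable names, and the choice of which variable forms the bars are all assumptions.

```python
from collections import Counter

# Made-up (first_var, second_var) value pairs for one group
pairs = [
    ("apple", "red"), ("apple", "green"), ("apple", "red"),
    ("pear", "green"), ("pear", "green"),
]

# Count each combination: one stacked bar per first value,
# one bar segment per second value
counts = Counter(pairs)
bars = {}
for (first_value, second_value), n in counts.items():
    bars.setdefault(first_value, {})[second_value] = n
```

Here bars maps each value of the first variable to the segment heights of its bar. With several groups, one such table per group would be needed.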
- get_variable_dist_plot(table: str, variable: str, y_scale_type: HistogramYScaleType | None = None) → Figure
Generates a plotly.graph_objects.Figure for the univariate distribution of variable. If variable is metric, a plot of multiple histograms (one for each group) is generated. If variable is categorical, multiple pie charts are generated and combined into one plot. All necessary data is queried from the Neo4J database.
- Parameters:
table (str) – The table of the variable
variable (str) – The variable for the distribution plot
y_scale_type (HistogramYScaleType | None) – The y-scale type. If group sizes are very imbalanced, HistogramYScaleType.Fraction should be preferred
- Returns:
Returns the plotted figure which can e.g. be used in streamlit or notebooks
- Return type:
Figure
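The advice on y_scale_type for get_variable_dist_plot can be illustrated with a small calculation. This is a sketch with made-up data, and it assumes that the fraction scale means the share of values within each group; the helper bin_heights is hypothetical and not part of graphxplore.

```python
# Made-up metric values for two groups of very different size
groups = {
    "apples": [1, 1, 2, 2, 2, 3, 3, 3, 3, 3] * 10,  # 100 values
    "pears": [1, 2, 2, 3, 3],                        # 5 values
}


def bin_heights(values, as_fraction):
    """Height per distinct value, as raw count or as share of the group."""
    heights = {}
    for v in values:
        heights[v] = heights.get(v, 0) + 1
    if as_fraction:
        total = len(values)
        heights = {v: n / total for v, n in heights.items()}
    return heights


count_heights = {name: bin_heights(vals, False) for name, vals in groups.items()}
fraction_heights = {name: bin_heights(vals, True) for name, vals in groups.items()}
```

With counts, the tallest bar of the large group is 50 while the tallest bar of the small group is 2, so the small group nearly vanishes in an overlaid plot. With fractions, both groups fall between 0 and 1 and their shapes can be compared directly.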
- class graphxplore.Dashboard.HistogramYScaleType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: str, Enum
- Count = 'Count'
- Fraction = 'Fraction'
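Because HistogramYScaleType derives from both str and Enum, its members can be looked up from plain strings and compared with them. The sketch below uses a local replica of the enum, defined here only so the example runs without graphxplore installed; it relies on standard Python enum behavior rather than anything specific to the package.

```python
from enum import Enum


class HistogramYScaleType(str, Enum):
    """Local replica of the documented enum, for illustration only."""
    Count = 'Count'
    Fraction = 'Fraction'


# Look up a member from its string value, e.g. one read from a config file
scale = HistogramYScaleType('Fraction')

# Because of the str base class, members compare equal to their value
is_fraction = scale == 'Fraction'
```

An unknown string such as 'Percent' raises a ValueError on lookup, which can serve as input validation before the value is passed on.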
- class graphxplore.Dashboard.MetadataDistributionPlotter
Bases: object
- static plot_data_type_distribution(var_info: VariableInfo) → Figure
Plots the data type distribution of a variable as a pie chart
- Parameters:
var_info (VariableInfo) – The variable info for the distribution to plot
- Returns:
Returns the plotted figure which can e.g. be used in streamlit or notebooks
- Return type:
Figure
- static plot_value_distribution(var_info: VariableInfo) → Figure
Plots the value distribution of a variable. For metric variables, a box plot is generated. For categorical variables, a pie chart is generated
- Parameters:
var_info (VariableInfo) – The variable info for the distribution to plot
- Returns:
Returns the plotted figure which can e.g. be used in streamlit or notebooks
- Return type:
Figure