Analyzing the Data

Now the fun starts! Let’s create our first analysis script. Create a new file in the analysis folder called noise_stats.py, then copy and paste the following code into it.

Listing 11 noise_stats.py
 1# import packages
 2import pandas as pd
 3from thot import ThotProject
 4
 5# initialize thot project
 6db = ThotProject()
 7
 8# get noise data from asset
 9noise_data = db.find_asset( { 'type': 'noise-data' } )
10
11# import noise data into a pandas data frame
12df = pd.read_csv( noise_data.file, header = 0, index_col = 0, names = [ 'trial', 'volume' ] )
13
14# compute statistics
15stats = df.describe()
16
17# create a new asset for the statistics
18stats_properties = {
19	'name': 'Noise Statistics',
20	'type': 'noise-stats',
21	'file': 'noise-stats.csv'
22}
23
24stats_path = db.add_asset( stats_properties, 'noise_stats' )
25
26# export the statistics to the new asset
27stats.to_csv( stats_path ) 

Let’s go through and break down what each chunk of code is doing.

  • lines 2-3: Import the packages we’re going to use, namely Pandas and Thot. From Thot we only need a small part: ThotProject.

  • line 6: Initialize the Thot project, giving us access to all the data stored within it.

  • line 9: Find the Noise Data Asset we made for each batch by searching for Assets that have a type of ‘noise-data’.

  • line 12: Load the noise data into a Pandas DataFrame.

  • line 15: Compute statistics on the noise data.

  • lines 18-24: Create a new Asset to store the noise statistics in. Notice that the stats_properties dictionary we pass in mimics exactly the structure of the _asset.json files we created earlier. db.add_asset() accepts as its second argument an _id for the new Asset.

  • line 27: Save the statistics to the new Asset.
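
To see what the Pandas calls on lines 12, 15, and 27 produce without a Thot project, the following minimal sketch runs them on made-up data. The CSV contents and values are assumptions for illustration only; in the tutorial the file path comes from noise_data.file and the output path from db.add_asset().

```python
import io
import pandas as pd

# hypothetical noise data standing in for the file behind noise_data.file
csv_text = "Trial,Volume\n1,60\n2,62\n3,64\n"

# same arguments as in the listing: the header row is replaced by our own
# names, and the first column ( 'trial' ) becomes the index
df = pd.read_csv(
    io.StringIO( csv_text ),
    header = 0,
    index_col = 0,
    names = [ 'trial', 'volume' ]
)

# describe() returns a DataFrame indexed by statistic name
stats = df.describe()
print( stats )

# write to an in-memory buffer instead of an Asset path
buffer = io.StringIO()
stats.to_csv( buffer )
```

The row labels of stats (count, mean, std, and so on) are what later scripts use to look up individual statistics with df.loc.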

Now we need to tell Thot which Containers to run this script from. This is done by creating Script Associations.

Again, we’ll see how to add Script Associations from both the Project and File Tree views.

From the Project view, right-click on Recipe A > Batch 1 and select Edit Scripts. This opens the Script Associations dialog. Click the Add Script button at the top of the dialog and select the noise_stats.py Script we just created. We can ensure the Script Association was created successfully by changing the preview from Assets to Scripts.

Fig. 27 Script Associations dialog.

For Recipe A > Batch 2 let’s assign the same Script, but instead of right-clicking on the Container to open the Script Associations dialog, double-click on the Scripts preview of the Container. At the moment this is (none) for the Container.

Now switch to the File Tree view and select Recipe B > Batch 1. Click the Add Scripts button, and perform the same steps.

Add Scripts will add Scripts to the selected Container, while Set Scripts will remove any previously associated Scripts, and set them to match what is submitted.
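
The difference between the two buttons can be pictured with a small model. This is only a sketch of the behavior described above, not Thot’s implementation: the functions add_scripts and set_scripts are hypothetical names, and a Container’s associations are assumed to be a plain list of Script names.

```python
def add_scripts( associated, submitted ):
    """Hypothetical model of Add Scripts: keep existing Scripts and append the submitted ones."""
    return list( associated ) + list( submitted )


def set_scripts( associated, submitted ):
    """Hypothetical model of Set Scripts: discard existing Scripts and keep only the submitted ones."""
    return list( submitted )


current = [ 'noise_stats.py' ]

added = add_scripts( current, [ 'recipe_stats.py' ] )
replaced = set_scripts( current, [ 'recipe_stats.py' ] )

print( added )     # both Scripts are associated
print( replaced )  # only the submitted Script remains
```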

Let’s run our first analysis! From the Project view, switch to the Assets preview so we can see our new Assets being created. Then, when you’re ready, click the Analyze button in the upper right of the workspace.

Warning

Running the analysis by pressing the Analyze button may give you an error. If this occurs please attempt to run the analysis from the command line.

To do this, open a terminal (Anaconda Prompt on Windows), navigate to the project root (the data folder), and run thot run.

More information is available in the cli tab of this section.

Fig. 28 Analyze button.

When the analysis is running we can continue to work on our project, and when the analysis is complete we will get a notification pop up and the new Assets will appear in our preview.

Click here to download this project step.

Moving On Up

Now that we have the statistics for each of our batches, we can move up one level in our project tree to compile the statistics for each recipe. Let’s first create the analysis script, calling it recipe_stats.py.

Listing 13 recipe_stats.py
 1# include packages
 2import pandas as pd
 3from thot import ThotProject
 4
 5# initialize thot
 6db = ThotProject()
 7
 8# get recipe container
 9recipe = db.find_container( { '_id': db.root } )
10
11# get noise statistics data
12noise_stats = db.find_assets( { 'type': 'noise-stats' } )
13
14# create combined dataframe 
15df = []
16for stat in noise_stats:
17	# read data for each batch
18	tdf = pd.read_csv( 
19		stat.file, 
20		names = [ stat.metadata[ 'batch' ] ], 
21		index_col = 0, 
22		header = 0 
23	)
24	
25	df.append( tdf )
26
27df = pd.concat( df, axis = 1 )
28
29# compute recipe statistics
30mean = df.loc[ 'mean' ].mean() 
31std = ( df.loc[ 'std' ].pow( 2 ).sum() / 4 ) ** 0.5
32
33stats = pd.DataFrame( [ mean, std ], index = ( 'mean', 'std' ) )
34
35# export recipe statistics
36stat_properties = {
37	'name': '{} Statistics'.format( recipe.name ),
38	'type': 'recipe-stats',
39	'file': 'recipe-stats.pkl'
40}
41
42stats_path = db.add_asset( stat_properties, 'recipe_stats' )
43stats.to_pickle( stats_path )

Let’s look at some of the new things we did here:

  • line 9: Get the Container the script is running in. In this case it will be either Recipe A or Recipe B, depending on which Container the script is being run from.

  • line 12: When analyzing the batches we only had one Asset we wanted to use, so we used the find_asset() method. Now we want to pull in both noise_stats Assets, so we use the find_assets() method, which returns a list of Assets that match the criteria. Also notice that the noise_stats aren’t in the Recipe Containers directly, but are in the batch children Containers. This highlights a very important point: Containers have access to their Assets as well as all their children’s Assets.

  • lines 16-25: Iterate over each noise_stats Asset, creating a Pandas DataFrame from it and adding it to the list that is combined in line 27.

  • line 37: Use the name of the Container as part of the name for the new Asset.

  • line 43: Export the new Asset, this time as a pickle (.pkl) file. This is a binary Python serialization format that Pandas can use to store DataFrames, making them easier to import later on.
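
The combining step and the recipe statistics can also be tried on their own. The sketch below uses made-up per-batch statistics (the batch names and values are assumptions) and assumes the batches are independent. Under that assumption, one common way to propagate the uncertainty is to take the square root of the summed variances and divide by the number of batches.

```python
import math
import pandas as pd

# hypothetical per-batch statistics, shaped like the noise-stats Assets
batch_1 = pd.DataFrame( { 'Batch 1': [ 62.0, 3.0 ] }, index = [ 'mean', 'std' ] )
batch_2 = pd.DataFrame( { 'Batch 2': [ 66.0, 4.0 ] }, index = [ 'mean', 'std' ] )

# axis = 1 places the batches side by side, aligned on the statistic names
df = pd.concat( [ batch_1, batch_2 ], axis = 1 )

# mean of the batch means
mean = df.loc[ 'mean' ].mean()

# propagated uncertainty on that mean, assuming independent batches:
# sqrt( sum of variances ) / number of batches
n = df.shape[ 1 ]
std = math.sqrt( df.loc[ 'std' ].pow( 2 ).sum() ) / n

print( mean, std )
```

Using the number of columns rather than a fixed constant keeps the calculation valid if a recipe has more than two batches.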

Let’s add this new script to run on our recipe Containers, and then run it.

Use any of the methods we learned before to associate the recipe_stats.py Script to the Recipe A and Recipe B Containers, then analyze the project again.

Let’s build our final analysis script now so we can see which recipe is better. In the analysis folder create the recipe_comparison.py script.

Listing 14 recipe_comparison.py
 1# import packages
 2import pandas as pd
 3from thot import ThotProject
 4
 5# initialize thot
 6db = ThotProject()
 7
 8# prepare data
 9recipe_stats = db.find_assets( { 'type': 'recipe-stats' } )
10
11df = []
12for stat in recipe_stats:
13    # read data for each recipe
14    tdf = pd.read_pickle( stat.file )
15    tdf.rename( { 0: stat.metadata[ 'recipe' ] }, axis = 1, inplace = True  )
16    
17    df.append( tdf )
18
19# combine into one dataframe
20df = pd.concat( df, axis = 1 )
21
22# export data as csv for reading
23comparison_properties = {
24	'name': 'Recipe Comparison',
25	'type': 'recipe-comparison',
26	'file': 'recipe_comparison.csv' 
27}
28
29comparison_path = db.add_asset( comparison_properties, 'recipe_comparison' )
30df.to_csv( comparison_path )
31
32# create bar chart and export
33means = df.loc[ 'mean' ]
34errs = df.loc[ 'std' ]
35
36ax = means.plot( kind = 'bar', yerr = errs )
37
38bar_properties = {
39	'name': 'Recipe Comparison',
40	'type': 'recipe-bar-chart',
41	'tags': [ 'chart', 'image' ],
42	'file': 'recipe_comparison.png'
43}
44
45bar_path = db.add_asset( bar_properties, 'recipe_bar' )
46ax.get_figure().savefig( bar_path, format = 'png' )

Let’s break down the new concepts:

  • line 15: Use the recipe metadata from the Asset to name the data.

  • lines 29, 45: We’ve already seen that we can pull multiple Assets into our scripts. Here we also see that we can create multiple Assets in a single script. Also notice that one Asset is a CSV text file, while the other is a PNG image file. Assets can be any sort of file.
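
The pickle round trip and the renaming on line 15 can be tried without a Thot project. In this minimal sketch the recipe names, values, and file names are assumptions; the recipe name stands in for stat.metadata[ 'recipe' ], and a temporary folder stands in for the Asset paths.

```python
import os
import tempfile
import pandas as pd

# hypothetical recipe statistics as [ mean, std ] pairs
recipes = { 'A': [ 64.0, 2.5 ], 'B': [ 58.0, 1.5 ] }

frames = []
with tempfile.TemporaryDirectory() as tmp:
    for name, values in recipes.items():
        path = os.path.join( tmp, 'recipe-stats-{}.pkl'.format( name ) )

        # a DataFrame built from a plain list gets the column label 0;
        # write it to a pickle and read it back
        pd.DataFrame( values, index = ( 'mean', 'std' ) ).to_pickle( path )
        tdf = pd.read_pickle( path )

        # relabel column 0 with the recipe name
        tdf.rename( { 0: name }, axis = 1, inplace = True )
        frames.append( tdf )

# one column per recipe, one row per statistic
df = pd.concat( frames, axis = 1 )
print( df )
```

Without the rename, every frame would carry the same column label 0 and the combined DataFrame would have duplicate column names.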

Add the proper Script Association to the Silent Fireworks Container, and analyze the project.

And we’re done! Take a look at the Assets we created so we know which recipe is quieter and can report to the boss which we should use to keep those fish as happy as possible.

Click here to download the final project.