Analyzing the Data#
Now for the fun part! Let’s create our first analysis script. Create a new file in the analysis folder called noise_stats.py, then copy and paste the following code into it.
 1  # import packages
 2  import pandas as pd
 3  from thot import ThotProject
 4
 5  # initialize thot project
 6  db = ThotProject()
 7
 8  # get noise data from asset
 9  noise_data = db.find_asset( { 'type': 'noise-data' } )
10
11  # import noise data into a pandas data frame
12  df = pd.read_csv( noise_data.file, header = 0, index_col = 0, names = [ 'trial', 'volume' ] )
13
14  # compute statistics
15  stats = df.describe()
16
17  # create a new asset for the statistics
18  stats_properties = {
19      'name': 'Noise Statistics',
20      'type': 'noise-stats',
21      'file': 'noise-stats.csv'
22  }
23
24  stats_path = db.add_asset( stats_properties, 'noise_stats' )
25
26  # export the statistics to the new asset
27  stats.to_csv( stats_path )
Let’s go through and break down what each chunk of code is doing.
lines 2-3: Import the packages we’re going to use, namely Pandas and Thot. From Thot we only need one part: ThotProject.
line 6: Initialize the Thot project, giving us access to all the data stored within it.
line 9: Find the Noise Data Asset we made for each batch by searching for Assets that have a type of ‘noise-data’.
line 12: Load the noise data into a Pandas DataFrame.
line 15: Compute statistics on the noise data.
lines 18-24: Create a new Asset to store the noise statistics in. Notice that the stats_properties dictionary we pass in mimics exactly the structure of the _asset.json files we created earlier. db.add_asset() accepts as its second argument an _id for the new Asset.
line 27: Save the statistics to the new Asset.
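To see what lines 12-15 produce without a Thot project at hand, here is a minimal sketch that runs the same Pandas calls on made-up data. The CSV contents and volume values are invented for illustration; only the read_csv arguments and the describe() call mirror the script above.

```python
import io

import pandas as pd

# Made-up noise data standing in for an Asset's file (values are illustrative only).
csv_text = "trial,volume\n1,10.0\n2,12.0\n3,14.0\n"

# Same arguments as the script: the first row is a header that we replace with
# our own column names, and the first column (trial) becomes the index.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, names=["trial", "volume"])

# describe() returns count, mean, std, min, quartiles, and max for each numeric column.
stats = df.describe()
print(stats)
```

The resulting stats DataFrame is indexed by statistic name, which is why the later scripts can select rows with df.loc[ 'mean' ] and df.loc[ 'std' ].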
Now we need to tell Thot which Containers to run this script from. This is done by creating Script Associations.
Again, we’ll see how to add Script Associations from both the Project and File Tree views.
From the Project view, right click on Recipe A > Batch 1 and select Edit Scripts. This opens the Script Associations dialog. Click the Add Script button at the top of the dialog and select the noise_stats.py Script we just created. We can ensure the Script Association was created successfully by changing the preview from Assets to Scripts.
For Recipe A > Batch 2 let’s assign the same Script, but instead of right clicking on the Container to open the Script Associations dialog, double click on the Scripts preview of the Container. At the moment this shows (none) for the Container.
Now switch to the File Tree view and select Recipe B > Batch 1. Click the Add Scripts button, and perform the same steps.
Add Scripts will add Scripts to the selected Container, while Set Scripts will remove any previously associated Scripts, and set them to match what is submitted.
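The difference between the two buttons can be pictured with plain Python lists. This is only an illustration of the behavior described above, not Thot’s implementation: the functions add_scripts and set_scripts below are hypothetical helpers, and how Thot treats duplicate Scripts is not covered here.

```python
def add_scripts(existing, submitted):
    # Keep what is already associated and append the submitted Scripts.
    return existing + submitted


def set_scripts(existing, submitted):
    # Ignore what was associated before; the result matches the submission exactly.
    return list(submitted)


current = ["noise_stats.py"]
added = add_scripts(current, ["recipe_stats.py"])
replaced = set_scripts(current, ["recipe_stats.py"])
print(added)
print(replaced)
```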
Let’s run our first analysis! From the Project view, switch to the Assets preview so we can see our new Assets being created. Then, when you’re ready, click the Analyze button in the upper right of the workspace.
Warning
Running the analysis by pressing the Analyze button may give you an error. If this occurs, please try running the analysis from the command line.
To do this, open a terminal (Anaconda Prompt on Windows), navigate to the project root (data folder), and run thot run.
More information is available in the cli tab of this section.
When the analysis is running we can continue to work on our project, and when the analysis is complete we will get a notification pop up and the new Assets will appear in our preview.
We’ll start off again creating a Script Association by hand, then see how to automate the process using the Utilities.
Navigate to the Recipe A > Batch 1 Container. We create Script Associations by placing a _scripts.json file in a Container. Go ahead and create this file and paste the contents below inside it.
[
    {
        "script": "root:/../analysis/noise_stats.py"
    }
]
This tells Thot to run the noise_stats.py script from this Container. The script field is the path to the script to run, which can be a relative or absolute path. The special root: directive points to the project root.
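If you would rather generate the file than type it by hand, here is a minimal sketch using only the Python standard library. The temporary folder and the recipe_a/batch_1 path are hypothetical stand-ins for your own Container folder; only the file name and the JSON contents come from the step above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the Recipe A > Batch 1 Container folder.
container = Path(tempfile.mkdtemp()) / "recipe_a" / "batch_1"
container.mkdir(parents=True)

# A list with one Script Association, matching the JSON shown above.
associations = [{"script": "root:/../analysis/noise_stats.py"}]

scripts_file = container / "_scripts.json"
scripts_file.write_text(json.dumps(associations, indent=4))

# Read the file back to confirm it holds valid JSON.
loaded = json.loads(scripts_file.read_text())
print(loaded)
```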
Before adding the script to the rest of the batches, let’s try it out. In your terminal run
thot run
You should see the noise_stats Asset added to the folder. Great! Now let’s make it so we analyze all the batches. Navigate to the project root (data folder) and run
thot utils set_scripts -s '{ "type": "batch" }' --scripts '[ { "script": "root:/../analysis/noise_stats.py" } ]'
You’ll notice that this command uses the -s flag where the previous utility command used --search. The two are synonyms; -s is simply a shorthand for --search.
Now let’s analyze the entire project by running thot run again. This will create a new Noise Statistics Asset for each of the batches.
Moving On Up#
Now that we have the statistics for each of our batches, we can move up one level in our project tree to compile the statistics for each recipe. Let’s first create the analysis script, calling it recipe_stats.py.
 1  # include packages
 2  import pandas as pd
 3  from thot import ThotProject
 4
 5  # initialize thot
 6  db = ThotProject()
 7
 8  # get recipe container
 9  recipe = db.find_container( { '_id': db.root } )
10
11  # get noise statistics data
12  noise_stats = db.find_assets( { 'type': 'noise-stats' } )
13
14  # create combined dataframe
15  df = []
16  for stat in noise_stats:
17      # read data for each batch
18      tdf = pd.read_csv(
19          stat.file,
20          names = [ stat.metadata[ 'batch' ] ],
21          index_col = 0,
22          header = 0
23      )
24
25      df.append( tdf )
26
27  df = pd.concat( df, axis = 1 )
28
29  # compute recipe statistics
30  mean = df.loc[ 'mean' ].mean()
31  std = df.loc[ 'std' ].pow( 2 ).sum() / 4
32
33  stats = pd.DataFrame( [ mean, std ], index = ( 'mean', 'std' ) )
34
35  # export recipe statistics
36  stat_properties = {
37      'name': '{} Statistics'.format( recipe.name ),
38      'type': 'recipe-stats',
39      'file': 'recipe-stats.pkl'
40  }
41
42  stats_path = db.add_asset( stat_properties, 'recipe_stats' )
43  stats.to_pickle( stats_path )
Let’s look at some of the new things we did here:
line 9: Get the Container the script is running in. In this case it will be Recipe A and Recipe B.
line 12: When analyzing the batches we only had one Asset we wanted to use, so we used the find_asset() method. Now we want to pull in both noise_stats Assets, so we use the find_assets() method, which returns a list of Assets that match the criteria. Also notice that the noise_stats Assets aren’t in the Recipe Containers directly, but are in the batch child Containers. This highlights a very important point: Containers have access to their own Assets as well as all of their children’s Assets.
lines 16-25: Iterate over each noise_stats Asset, creating a Pandas DataFrame from it and adding it to the data list to be combined in line 27.
line 37: Use the name of the Container as part of the name for the new Asset.
line 43: Export the new Asset, this time as a pickle (.pkl) file. This is a binary format that Pandas can use to store DataFrames, making it easier to import them later on.
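The combining step in lines 15-30 can be tried on its own with plain Pandas. The sketch below uses two made-up batch statistics tables in place of the noise_stats Assets; the batch labels and numbers are invented for illustration.

```python
import pandas as pd

# Made-up per-batch statistics, shaped like one column of describe() output.
batch_1 = pd.DataFrame({"batch 1": [10.0, 2.0]}, index=["mean", "std"])
batch_2 = pd.DataFrame({"batch 2": [14.0, 4.0]}, index=["mean", "std"])

# axis=1 places the batches side by side, aligned on the shared index labels.
df = pd.concat([batch_1, batch_2], axis=1)

# Selecting a row by label gives one value per batch; average them for the recipe.
recipe_mean = df.loc["mean"].mean()
print(df)
print(recipe_mean)
```

Because each batch becomes its own column, row selections such as df.loc[ 'mean' ] return a Series with one entry per batch.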
Let’s add this new script to run on our recipe Containers, and then run it.
Use any of the methods we learned before to associate the recipe_stats.py Script with the Recipe A and Recipe B Containers, then analyze the project again.
From the project root run
thot utils add_scripts -s '{ "type": "recipe" }' --scripts '{ "script": "root:/../analysis/recipe_stats.py" }'
thot run
Let’s build our final analysis script now so we can see which recipe is better. In the analysis folder create the recipe_comparison.py script.
 1  # import packages
 2  import pandas as pd
 3  from thot import ThotProject
 4
 5  # initialize thot
 6  db = ThotProject()
 7
 8  # prepare data
 9  recipe_stats = db.find_assets( { 'type': 'recipe-stats' } )
10
11  df = []
12  for stat in recipe_stats:
13      # read data for each recipe
14      tdf = pd.read_pickle( stat.file )
15      tdf.rename( { 0: stat.metadata[ 'recipe' ] }, axis = 1, inplace = True )
16
17      df.append( tdf )
18
19  # combine into one dataframe
20  df = pd.concat( df, axis = 1 )
21
22  # export data as csv for reading
23  comparison_properties = {
24      'name': 'Recipe Comparison',
25      'type': 'recipe-comparison',
26      'file': 'recipe_comparison.csv'
27  }
28
29  comparison_path = db.add_asset( comparison_properties, 'recipe_comparison' )
30  df.to_csv( comparison_path )
31
32  # create bar chart and export
33  means = df.loc[ 'mean' ]
34  errs = df.loc[ 'std' ]
35
36  ax = means.plot( kind = 'bar', yerr = errs )
37
38  bar_properties = {
39      'name': 'Recipe Comparison',
40      'type': 'recipe-bar-chart',
41      'tags': [ 'chart', 'image' ],
42      'file': 'recipe_comparison.png'
43  }
44
45  bar_path = db.add_asset( bar_properties, 'recipe_bar' )
46  ax.get_figure().savefig( bar_path, format = 'png' )
Let’s break down the new concepts:
line 15: Use the recipe metadata from the Asset to name the data.
lines 29, 45: We’ve already seen that we can pull multiple Assets into our scripts. Here we also see that we can create multiple Assets in a single script. Also notice that one Asset is a CSV text file, while the other is a PNG image file. Assets can be any sort of file.
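To try the read, rename, and combine steps from lines 11-20 outside of Thot, here is a minimal sketch with made-up recipe statistics. The temporary files, the recipe labels A and B, and the numbers are all invented for illustration; in the real script the label comes from each Asset’s recipe metadata.

```python
import tempfile
from pathlib import Path

import pandas as pd

tmp = Path(tempfile.mkdtemp())

# Write two made-up recipe statistics files. Each has a single column labelled 0,
# which is what pd.DataFrame produces when given a plain list of values.
for recipe, values in {"A": [12.0, 5.0], "B": [9.0, 3.0]}.items():
    pd.DataFrame(values, index=["mean", "std"]).to_pickle(tmp / f"{recipe}.pkl")

frames = []
for recipe in ("A", "B"):
    tdf = pd.read_pickle(tmp / f"{recipe}.pkl")
    # Rename the default column 0 to the recipe label.
    tdf = tdf.rename({0: recipe}, axis=1)
    frames.append(tdf)

# Combine into one DataFrame with one column per recipe.
df = pd.concat(frames, axis=1)
means = df.loc["mean"]
errs = df.loc["std"]
print(df)
```

The means and errs Series are what the script passes to the bar chart, one bar and one error bar per recipe.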
Add the proper Script Association to the Silent Fireworks Container, and analyze the project.
Add the proper Script Association (_scripts.json) to the root folder, and analyze the project.
And we’re done! Take a look at the Assets we created so we know which recipe is quieter and can report to the boss which we should use to keep those fish as happy as possible.