How to write a data extractor (legacy version)

[fc - 18.4.2007, restored 8.12.2010, still valid]

[A new version of this doc is under construction, but should be improved before it can be used by the modellers: How to write a data extractor ]

What is a Data Extractor

Data Extractors are among the most used Capsis extensions. These tools extract data from Capsis projects in order to draw various diagrams, tables or other representations to check what happens during the simulations. An extractor is synchronized on a project target Step, carrying a situation at a given date. It generally shows either the state at this date, e.g. a histogram, or the state evolution in the scenario from the root step to this target step, e.g. a curve.

Several extractors can be added to a single Data Block to be rendered in the same graphic. This way, several curves for various scenarios can be compared easily (fig. 1). The resulting graphic is rendered by another extension, of type Data Renderer. Several compatible data renderers may be swapped, if available, to see different representations of the extracted data, e.g. curves, histograms, scatterplots, tables.

Fig. 1. A Data Block with two Data Extractors “Basal area over Time” synchronized on two steps “45a” and “50b” in project named “fi”, represented by the “DRCurves” Data Renderer

Structure of the extractors / renderers system

Data extractors are all subclasses of the superclass capsis.extension.DataExtractor, itself implementing capsis.extension.Extension. Each data extractor class must implement a format interface (subinterface of capsis.extension.DataFormat) to tell what it provides. These format interfaces are known by the renderer classes which can check if they recognize the extractor. If yes, the renderer is compatible with the extractor.

Fig. 2. Data Extractors and Data Renderers. UML class diagram (abstract)

The example in fig. 2 shows that the extractor DETimeG - Basal area over Time - implements the format interface DFCurves. This means that DETimeG will provide getCurves (), getLabels (), getAxesNames () and getNY (). The renderer DRCurves needs these four methods to draw its graphic containing axes, graduations and curves. Its compatibility method matchWith () checks that the given extractor implements the interface DFCurves and if so returns true (“I match with this extractor”). In this case, DRCurves can draw the DETimeG extractors.

  • Note: the methods matchWith () are called by the Capsis extension manager when needed, i.e. when the Capsis user interface needs to provide a list of extensions of a given type compatible with some object.
  • Note: the format interfaces are in the package capsis.extension.dataextractor.format.
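
The compatibility mechanism described above can be sketched outside Capsis as follows. This is only an illustration: all type names ending in Sketch are hypothetical stand-ins, the real interfaces declare more methods (getCurves (), getLabels (), getAxesNames ()), and the histogram format is invented here just to show a non-matching case.

```java
// Hypothetical stand-in for capsis.extension.DataFormat.
interface DataFormatSketch {}

// Hypothetical stand-in for DFCurves, reduced to one method.
interface CurvesFormatSketch extends DataFormatSketch {
    int getNY();
}

// Invented second format, only here to show an incompatible extractor.
interface HistogramFormatSketch extends DataFormatSketch {
    int getNBars();
}

// An extractor announces what it provides by implementing a format interface.
class TimeGExtractorSketch implements CurvesFormatSketch {
    public int getNY() { return 1; }
}

class HistoExtractorSketch implements HistogramFormatSketch {
    public int getNBars() { return 3; }
}

// A renderer is compatible with any extractor offering the format it can draw.
class CurvesRendererSketch {
    boolean matchWith(Object referent) {
        return referent instanceof CurvesFormatSketch;
    }
}
```

With these stand-ins, the curves renderer accepts TimeGExtractorSketch and rejects HistoExtractorSketch, without knowing either class by name.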

You can easily write a new extractor by copying another one with a structure you know close to your needs. The following sections show some common cases you can encounter and how to adapt them to your particular situation.

Before writing a new extractor

Before writing a new data extractor, you should have a look at the existing ones to check whether one of them could match your needs, possibly with a slight modification by its author, or by you with the author's permission.

To help, you can look in Capsis at Tools > Extension Manager > DataExtractor for an exhaustive list and short descriptions of the available extractors, and at Tools > Calculation Methods for the list of all known “public method providers” (method provider, french) and the related modules and extensions using them (the source code is directly accessible from the document through hyperlinks).

  • Note: the name of the author of an extension is given in the @author tag of the extractor class comment (the main comment just before the class or interface line in the code).

How to write an evolution extractor

An evolution data extractor shows the evolution of some variable over time. Time can be a date or an age. The result is often a curve or a table showing the different values of the variable (fig. 3).

Fig. 3. Same extractor as in fig. 1, but seen in the data renderer DRTables. It is possible to select the table contents and copy / paste them into other applications like editors or spreadsheets

A template explained in detail

For an evolution extractor, the extractor DETimeG (Basal area over Time) can be used as a template. We are going to consider this example and examine it in detail. At the end of this section, you will be able to create a new evolution extractor by copying and renaming DETimeG.

All Data extractors are Capsis extensions and are located in the package capsis.extension.dataextractor. A naming convention is proposed: DE followed by a short English description. For extractors giving y as a function of x, the proposed name is DExy. For our example, Basal area as a function of time: DETimeG (G is a common denomination for Basal area; it could have been DETimeBasalArea). Some other convention may be used if needed.

All new extensions must be declared in the etc/capsis.extensions file. To add an entry, edit the file and duplicate the paragraph of the copied extension, changing only the extension names; adapt it if needed according to the comments at the top of the file.

The DETimeG extractor begins with the free software comment header. All components except modules in Capsis have this header in accordance with the Capsis charter.

/* 
 * Capsis 4 - Computer-Aided Projections of Strategies in Silviculture
 * 
 * Copyright (C) 2000-2003  Francois de Coligny
 * 
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 * 
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied 
 * warranty of MERCHANTABILITY or FITNESS FOR A 
 * PARTICULAR PURPOSE. See the GNU Lesser General Public 
 * License for more details.
 * 
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

The package definition follows, meaning that - if the Capsis installation directory is somewhere/capsis4/ - this file must be in the directory capsis4/bin/capsis/extension/dataextractor/.

package capsis.extension.dataextractor;

The imports tell the compiler in which packages the other classes used here can be found.

import capsis.kernel.*;
import capsis.extension.*;
import capsis.gui.*;
import capsis.util.*;
import capsis.util.methodprovider.*;
import capsis.extension.dataextractor.format.*;
import java.awt.*;
import java.awt.event.*;
import java.util.*;
import javax.swing.*;

The main class comment (/** … */ notation, which helps automatic documentation generation by javadoc) tells what this tool is about and contains the @author tag (also recognized by javadoc) giving the name of the owner of the extension and the date of creation. More comments are possible here, but there should not be fewer.

The class is public: its name must be the same as the file name, here DETimeG.java (in capsis.extension.dataextractor). All data extractors extend the class DataExtractor (in capsis.extension); DETimeG also implements DFCurves, meaning we are going to build curves (see the four methods of DFCurves below).

/**     Basal area ("Grundflache" (G), "surface terrière") over Time.
*       @author F. de Coligny - november 2000
*/
public class DETimeG extends DataExtractor implements DFCurves {

The class has two instance variables. They are protected, meaning that they are visible to possible subclasses. The first one, named curves, will contain the series of data to draw the graphic, and the second, named methodProvider, is a reference to the model's method provider (method provider, french).

        protected Vector curves;
        protected MethodProvider methodProvider;

The following static initializer tells the Capsis Translator to consider a new bundle file containing translations related to the current DETimeG extension. That is how the subsequent calls to Translator.swap (key) will be replaced by texts in the right language.

        static {
                Translator.addBundle("capsis.extension.dataextractor.DETimeG");
        } 
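
To make the bundle idea concrete, here is a minimal standalone sketch of a key-to-text lookup. It is not the Capsis Translator: TranslatorSketch is a hypothetical class, it takes the properties content as a string instead of a bundle name, and returning the key itself when no translation is found is an assumption made for this example. The keys are the ones used by DETimeG; the translated texts are invented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical stand-in for a translator, not the Capsis implementation.
class TranslatorSketch {
    private static final Properties texts = new Properties();

    // Loads translations written in the usual "key = text" properties syntax.
    static void addBundle(String propertiesContent) {
        try {
            texts.load(new StringReader(propertiesContent));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the bundle", e);
        }
    }

    // Returns the text for the key, or the key itself if it is unknown (assumption).
    static String swap(String key) {
        return texts.getProperty(key, key);
    }
}
```

For example, after TranslatorSketch.addBundle ("DETimeG.xLabel = Time"), the call TranslatorSketch.swap ("DETimeG.xLabel") returns the text Time, while an unknown key comes back unchanged.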

All extensions have two constructors. The Phantom constructor takes no parameters and does nothing. Instances built with this constructor are unusable: it is only used by the Extension manager for management purposes. The official constructor is the one which matters. It has only one parameter, an instance of ExtensionStarter or of a subclass. This parameter contains everything the extension needs to run.

You can notice that the starter is immediately passed to the superclass DataExtractor. The latter memorizes two parameters of the starter, step and dataBlock, in protected variables (accessible in the DETimeG subclass). step is the reference to the target step this extractor is synchronized on and dataBlock is a reference to the data block this extractor is part of (see above, two curves in the same graphic mean two extractors in a data block).

The try … catch block is a good practice in the official constructors of extensions: it ensures that if any trouble occurs at construction time, there will be a trace in the Capsis Log. In this extension, we only instantiate the Vector curves that will be fed in the doExtraction () method later.

        /**     Phantom constructor. 
        *       Only to ask for extension properties (authorName, version...).
        */
        public DETimeG () {}
 
        /**     Official constructor. It uses the standard Extension starter.
        */
        public DETimeG (ExtensionStarter s) {
                super (s);
                try {
                        curves = new Vector ();
                } catch (Exception e) {
                        Log.println (Log.ERROR, "DETimeG.c ()", "Exception occured while object construction : ", e);
                }
        }

All extensions contain a method matchWith (Object referent). This method is called when needed by the Capsis extension manager. Data extractors are passed a referent of type subclass of GModel. They must return true (“I match with this referent”) if they can extract their data from a project linked to a model class of this type. Consequently, the method contains tests to make sure the extraction will be possible later if the extension is selected by the user.

Here, matchWith () checks that referent is an instance of GModel, gets its method provider and makes sure it is instance of GProvider (in capsis.util.methodprovider). This means that the method provider will contain a method named getG (GStand stand, Collection trees). This method will be used a great number of times when getting the values of G at each date in the project (see below).

If not compatible, the method returns false. In case of trouble, a message is written in the Log and the method returns false.

        /**     Extension dynamic compatibility mechanism.
        *       This matchwith method checks if the extension can deal (i.e. is compatible) with the referent.
        */
        public boolean matchWith (Object referent) {
                try {
                        if (!(referent instanceof GModel)) {return false;}
                        GModel m = (GModel) referent;
                        MethodProvider mp = m.getMethodProvider ();
                        if (!(mp instanceof GProvider)) {return false;}
 
                } catch (Exception e) {
                        Log.println (Log.ERROR, "DETimeG.matchWith ()", "Error in matchWith () (returned false)", e);
                        return false;
                }
 
                return true;
        }

Next method is a redefinition of setConfigProperties () in the superclass. Here we can add properties that will be presented in the configuration panel when opened and that can be tested directly in the method doExtraction ().

[Pending: more details needed here]

        /**     This method is called by superclass DataExtractor.
        */
        public void setConfigProperties () {
                // Choose configuration properties
                addConfigProperty (DataExtractor.HECTARE);
                addConfigProperty (DataExtractor.TREE_GROUP);           // group multiconfiguration
                addConfigProperty (DataExtractor.I_TREE_GROUP);         // group individual configuration
        }

doExtraction () is the main method in the data extractor. When called, it updates the data to be drawn on the final graphic. It may be called when the extractor is synchronized on a new step (from the project manager) or when some configuration changed (e.g. calculate values per hectare).

The first two lines check whether an update is needed and possible. If the data is already up to date, true is returned immediately; if there is no step, false is returned (meaning not done).

Then the method provider of the model is retrieved. From the step reference (see above), we get the reference to the including project by getScenario (), from there the reference to the model linked to the project by getModel (), and we ask it for its method provider with getMethodProvider ().

  • Note: this provider will be used later in the method to reach the getG () method.
  • Note: the project class in Capsis is named Scenario.

From now on, we work in a try … catch block to control possible exceptions during the building of the data series.

First of all, we calculate a hectare coefficient by which we will multiply all values. If the hectare property is not set, this coefficient is set to 1 to have no effect. The variable settings.perHa is related to the DataExtractor.HECTARE property in the setConfigProperties () method above.

We ask the Project class step.getScenario () for the collection of steps from the root step to the referent step of this extractor. We will soon iterate on them to calculate the pair (date, basal area) for each step. Then we prepare the collections which will contain the collected values: c1 for dates and c2 for basal area values.

        /**     From DataExtractor SuperClass.
        * 
        *       Computes the data series. This is the real output building.
        *       It needs a particular Step.
        *       This output computes the basal area of the stand versus date
        *       from the root Step to this one.
        * 
        *       Return false if trouble while extracting.
        */
        public boolean doExtraction () {
                if (upToDate) {return true;}
                if (step == null) {return false;}
 
                // Retrieve method provider
                methodProvider = step.getScenario ().getModel ().getMethodProvider ();
 
                try {
                        // per Ha computation
                        double coefHa = 1;
                        if (settings.perHa) {
                                coefHa = 10000 / step.getStand ().getArea ();
                        }
 
                        // Retrieve Steps from root to this step
                        Vector steps = step.getScenario ().getStepsFromRoot (step);
 
                        Vector c1 = new Vector ();              // x coordinates
                        Vector c2 = new Vector ();              // y coordinates

We now loop on the steps from root. For each step, we get the reference of the linked stand and call the doFilter () method in the superclass to get the trees to consider. If a group was chosen by the user (see setConfigProperties ()), it will be applied, otherwise all the trees will be returned.

  • Note: this extractor works for models with individual trees or diameter class distributions (where each class is considered as a tree). Other models, for example stand level models, will not be able to provide a getG (GStand stand, Collection trees) method and the extractor will not be found compatible.

The date is a property of the stand; it is an integer. To calculate the basal area, we use the getG () method provided by the method provider of the model. We multiply the result by the hectare coefficient and add the date and basal area value to c1 and c2 respectively.

  • Note: c1 contains integers (dates) and c2 contains double values (basal area values).

                        // Data extraction : points with (Integer, Double) coordinates
                        for (Iterator i = steps.iterator (); i.hasNext ();) {
                                Step s = (Step) i.next ();
 
                                // Consider restriction to one particular group if needed
                                GStand stand = s.getStand ();
                                Collection trees = doFilter (stand);            // fc - 5.4.2004
 
                                int date = stand.getDate ();
                                double G = ((GProvider) methodProvider).getG (stand, trees) * coefHa;   // fc - 24.3.2004
 
                                c1.add (new Integer (date));    
                                c2.add (new Double (G));
                        }

After the loop, we add c1 and c2 to the main vector curves. Possible exceptions are processed in the catch block by writing a message in the Log and returning false (some trouble happened). Finally, we set upToDate to true for next time and return true (the vector curves was updated).

                        curves.clear ();
                        curves.add (c1);
                        curves.add (c2);
 
                } catch (Exception exc) {
                        Log.println (Log.ERROR, "DETimeG.doExtraction ()", "Exception caught : ",exc);
                        return false;
                }
 
                upToDate = true;
                return true;            
        }
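
The logic of doExtraction () can be tried outside Capsis with the sketch below. StandSketch and EvolutionSketch are hypothetical stand-ins, and storing the basal area directly in the stand replaces the call to the method provider. Areas are assumed to be in square metres, as implied by the 10000 / area coefficient. As in DETimeG, the coefficient is computed once from the stand of the target step, here the last one in the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the stand carried by a step (not a Capsis class).
class StandSketch {
    final int date;            // date of the step
    final double areaM2;       // plot area, assumed in square metres
    final double basalAreaM2;  // basal area of the plot, replaces the getG () call

    StandSketch(int date, double areaM2, double basalAreaM2) {
        this.date = date;
        this.areaM2 = areaM2;
        this.basalAreaM2 = basalAreaM2;
    }
}

class EvolutionSketch {
    // 1 when values are not requested per hectare, so the multiplication has no effect.
    static double hectareCoefficient(boolean perHa, double areaM2) {
        return perHa ? 10000.0 / areaM2 : 1.0;
    }

    // Builds the two series: dates first (x), then basal areas (y).
    static List<List<Number>> extract(List<StandSketch> standsFromRoot, boolean perHa) {
        StandSketch target = standsFromRoot.get(standsFromRoot.size() - 1);
        double coefHa = hectareCoefficient(perHa, target.areaM2);

        List<Number> c1 = new ArrayList<>();   // x coordinates
        List<Number> c2 = new ArrayList<>();   // y coordinates
        for (StandSketch s : standsFromRoot) {
            c1.add(s.date);
            c2.add(s.basalAreaM2 * coefHa);
        }

        List<List<Number>> curves = new ArrayList<>();
        curves.add(c1);
        curves.add(c2);
        return curves;
    }
}
```

As a worked example, for a plot of 2500 square metres the coefficient is 10000 / 2500 = 4, so a basal area of 5 on the plot becomes 20 per hectare.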

From this point, there are only accessors required by the interfaces implemented by DETimeG either directly or by its superclasses. First is getName () which returns a translated text to appear in the user interface. Required by DataFormat and Extension interfaces.

        /**     From DataFormat interface.
        *       From Extension interface.
        */
        public String getName () {
                return getNamePrefix ()+Translator.swap ("DETimeG");
        }

The four methods required by DFCurves: getCurves () returns the vector we built in doExtraction (), getLabels () returns nothing (see comments in the DFCurves class for more details about label options), getAxesNames () returns the translated texts to be written in the graphic near the axes and getNY () returns the number of curves in the vector curves, here 1.

        /**     From DFCurves interface.
        */
        public Vector getCurves () {
                return curves;
        }
 
        /**     From DFCurves interface.
        */
        public Vector getLabels () {
                return null;    // optional : unused
        }
 
        /**     From DFCurves interface.
        */
        public Vector getAxesNames () {
                Vector v = new Vector ();
                v.add (Translator.swap ("DETimeG.xLabel"));
                if (settings.perHa) {
                        v.add (Translator.swap ("DETimeG.yLabel")+" (ha)");
                } else {
                        v.add (Translator.swap ("DETimeG.yLabel"));
                }
                return v;
        }
 
        /**     From DFCurves interface.
        */
        public int getNY () {
                return 1;
        }
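
To see what a renderer can do with these accessors, the following sketch turns the curves structure into tab-separated text, in the spirit of a table renderer. CurvesTableSketch is a hypothetical helper, not the code of DRTables. It relies only on the layout built in doExtraction (): the x series first, followed by the y series.

```java
import java.util.List;

// Hypothetical helper, not a Capsis class.
class CurvesTableSketch {
    // One header line with the axes names, then one line per point.
    static String toTable(List<? extends List<?>> curves, List<String> axesNames) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join("\t", axesNames)).append('\n');

        int nRows = curves.get(0).size();   // number of x values
        for (int row = 0; row < nRows; row++) {
            for (int col = 0; col < curves.size(); col++) {
                if (col > 0) {
                    sb.append('\t');
                }
                sb.append(curves.get(col).get(row));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Each line of the result holds a date and the matching basal area, which is the kind of content that can be pasted into a spreadsheet.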

The last methods are required by the Extension interface. They return the version of the tool (here 1.1), the name of its author and a translated description of what it does.

        /**     From Extension interface.
        */
        public String getVersion () {return VERSION;}
        public static final String VERSION = "1.1";
 
        /**     From Extension interface.
        */
        public String getAuthor () {return "F. de Coligny";}
 
        /**     From Extension interface.
        */
        public String getDescription () {return Translator.swap ("DETimeG.description");}
}

Possible customisation

To customize this extractor to calculate the evolution of (e.g.) mean number of branches over time, you must:

  1. check that something does not exist already that does this task directly or with few changes (see §3.), if needed, contact the author to discuss with him,
  2. create a new interface called for example MeanNBranchesProvider in capsis.util.methodprovider with a method declaration inside like public int getMeanNBranches (GStand stand, Collection trees),
  3. make sure the model(s) you want to be compatible have a method provider implementing this interface and answering correctly the mean number of branches of the whole stand or the given trees collection,
  4. copy DETimeG into the same directory capsis.extension.dataextractor and rename it into DETimeMeanNBranches.java, rename inside all DETimeG into DETimeMeanNBranches (the change all command of your editor would be a good idea to forget nothing, with case sensitive: ON and whole word only: OFF),
  5. update comments carefully: names, explanations, references to author and modification date, error and warning messages. The class is not long, you can take the time to read and correct it completely from the top to the bottom,
  6. update the matchWith () method to comply with your compatibility condition: if (!(mp instanceof MeanNBranchesProvider)) {return false;},
  7. in doExtraction (), change the calculation line into for example
int n = (int) (((MeanNBranchesProvider) methodProvider).getMeanNBranches (stand, trees) * coefHa);

and below,

c2.add (new Integer (n));
  8. copy and rename the DETimeG translations files in french AND english (both required) for the new extractor DETimeMeanNBranches: DETimeMeanNBranches_fr.properties and DETimeMeanNBranches_en.properties. Check carefully all the translations inside.
  9. Declare your new extension by adding a paragraph in capsis4/etc/capsis.extensions.
  • Note: be careful to always extend DataExtractor and not another extractor, it is generally easier if the extensions are not coupled together
  • Note: for configuration properties, you can have a look at other extractors, either by using them in Capsis or in viewing the code: a lot has been made in the DataExtractor superclass to facilitate the use in the subclasses.
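
The provider side of the steps above (new interface, method provider implementing it, compatibility test) could look like the following standalone sketch. All names ending in Sketch are hypothetical. Object replaces GStand so that the example compiles without Capsis, and the way the mean is computed (arithmetic mean rounded to the nearest integer, 0 for an empty collection) is only an assumption for illustration.

```java
import java.util.Collection;

// Hypothetical stand-in for the Capsis MethodProvider type.
interface MethodProviderSketch {}

// Stand-in for the interface proposed in step 2.
interface MeanNBranchesProviderSketch {
    int getMeanNBranches(Object stand, Collection<TreeSketch> trees);
}

// Hypothetical tree carrying a number of branches.
class TreeSketch {
    final int nBranches;
    TreeSketch(int nBranches) { this.nBranches = nBranches; }
}

// Step 3: the method provider of the model implements the new interface.
class MyModelMethodProviderSketch implements MethodProviderSketch, MeanNBranchesProviderSketch {
    public int getMeanNBranches(Object stand, Collection<TreeSketch> trees) {
        if (trees == null || trees.isEmpty()) {
            return 0;   // assumption: no trees means a mean of 0
        }
        int sum = 0;
        for (TreeSketch t : trees) {
            sum += t.nBranches;
        }
        return Math.round((float) sum / trees.size());
    }
}

// A provider that does not know how to compute the mean number of branches.
class OtherMethodProviderSketch implements MethodProviderSketch {}

class CompatibilitySketch {
    // Step 6: the extractor matches only if the provider implements the interface.
    static boolean matchWith(MethodProviderSketch mp) {
        return mp instanceof MeanNBranchesProviderSketch;
    }
}
```

For three trees with 4, 6 and 9 branches, the sum is 19 and the sketch returns 6. A model whose provider does not implement the interface is simply found not compatible.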

If you encounter troubles in the use of this howto (things not clear, errors, missing information…), please contact me to help correct the document: mailto:coligny@cirad.fr

documentation/dataextractors.txt · Last modified: 2021/12/13 09:28 by 127.0.0.1