EurostatEurostat
Eurostat EDIT 2012
Functional Presentation
EurostatEurostat
EDIT Introduction
EDIT allows users to import data, perform a set of predefined operations on the imported datasets and export data resulted from these processing operations.
Validations / Computations Record Vertical Hierarchical
Dataset Operations Copy, Merge, Alter, Aggregate, etc see Scripting manual.
EurostatEurostat
EDIT Introduction (2)
Data Validation tool, allowing users to import data, run validation programs and export results
The validation process relies on a custom Scripting Language
Web-based User Interface
Data and Metadata isolated into independent Domains
EurostatEurostat
Specialised functions
Time series outliers (Terror)
Berthelot-Hidiroglu
Sigma Gap
Programmable functions
EurostatEurostat
Technology Overview
Web Based InterfaceUnified interface for both the local version and the
server deploymentEUROSTAT Look & FeelLight interface, simplified workflows
RDBMS (Oracle and PostgreSQL)
ECAS or local authentication ( end year: SMS)
EurostatEurostat
EDIT Integration Capabilities
Exposes full API as Web Services
Integrated with EDAMIS detect incoming files and process them in unattended mode publish validation results to the Feedback Channel
Integrated with the SDMX Registry fetch DSDs into EDIT structures load codelists from the Registry
EurostatEurostat
EDAMIS Integration
EDAMIS can send data to EDIT by placing the files in a configurable location
EDIT detects metadata based on the EDAMIS naming convention
EDIT performs the processing in unattended mode
EDIT acts as a client for the EDAMIS Feedback Channel Web Service in order to publish the results of a job execution
EurostatEurostat
SDMX Registry Integration
EDIT can import DSDs or codelists
EDIT acts as a client for the SDMX Registry Web Services in order to fetch DSD files and codelists data
The DSD file is broken down into EDIT components Key families are translated to EDIT formats Codelists are translated to EDIT Lookups An EDIT Program is created performing lookup validations
and basic checks on the dimension fields
A specific importer has been implemented to process codelist data
EurostatEurostat
EVE Integration
EVE Rules can be imported into EDIT or executed from an external file during the EDIT Job Execution
An EDIT component can translate EVE rules defined in XML files to EDIT Scripting Language
EurostatEurostat
Important principles
1) From Microdata to macrodata
2) Scripting principle (symbols/placeholders)
3) Editing seen as a case of complex computations
4) Multidataset approach
5) Cube approach in computations
EurostatEurostat
Rule layout
Rule name Rule type Rule body Error part (msg,selected vars) Then compute part Else compute part
EurostatEurostat
EDIT Scripting Language Capabilities
Custom Scripting Language designed specifically for data validation
Tries to be as simple as possible and still flexible enough to fit the requirements of any existing domain
Allows the definition of Formats and validation Programs Formats (Dataset Definitions) describe the structure of the
data (Format Definition Language) Validation Programs describe the validation rules and are
composed from a set of steps with inputs and outputs (Program Definition Language)
EurostatEurostat
EDIT Standalone Installation
Standalone Installation supported for Windows XP and Windows 7
Simple installation wizard Shortcuts are created in the Start Menu
EurostatEurostat
EDIT User Types
User Executes jobs on datasets
Programmer Manages the Metadata needed by the User to execute jobs Sets up the unattended mode configuration
Administrator Manages users and permissions
EurostatEurostat
User Module Functionality
Change Password Change the password of the user(when not logged through
ECAS) Dataset Import/Export
Import and export data to and from the System Monitor any ongoing import/export processes
Job Execution Execute validation programs on imported datasets View the results of a Job Execution
EurostatEurostat
User Workflow
EurostatEurostat
Programming ModuleMain Functional Capabilities
Formats and Programs Definition Define Metadata using the editors in the User Interface Import/Export Metadata from/to external TXT files Import/Export to Oracle
Data Import/Export Import Auxiliary Data (lookup datasets)
Job Execution Execute validation programs on imported datasets
EurostatEurostat
Programming Module Workflows
EurostatEurostat
Programming Module - Format Definition
EurostatEurostat
Programming Module – Program Definition
EurostatEurostat
Programming Module – Import Auxiliary Data
EurostatEurostat
Programming Module – Unattended Mode Configuration
EurostatEurostat
Administration ModuleMain Functional Capabilities
User Management Manage the Users and their permissions
User Group Management Manage the User Groups and their members
Domain Management Manage the Domains
EurostatEurostat
Dataset format CVTS
FORMAT cvts_4 { DESCRIPTION "CVTS Format"; FIELDS { COUNTRY { DESCRIPTION "None"; CAPTION "None"; TYPE STRING; LENGTH 2; } ENTERPR { DESCRIPTION "None"; CAPTION "None"; TYPE NUMBER; LENGTH 6; } REFYEAR { DESCRIPTION "None"; CAPTION "None"; TYPE NUMBER; LENGTH 4; } WEIGHT { DESCRIPTION "None"; CAPTION "None"; TYPE DOUBLE; LENGTH 20.10; } NACE_SP { DESCRIPTION "None"; CAPTION "None"; TYPE STRING; LENGTH 5;
EurostatEurostat
Program CVTS annex8
PROGRAM cvts_4 {INPUT cvts_4 inputDataSet; <= all input datasets we use for the validationINPUT LANGUAGES_LIST LANGUAGES;INPUT COUNTRIES_LIST COUNTRIES;INPUT NUTS_LIST NUTS;INPUT NACE_LIST NACE;STEPS { <= can be multi-step program (for example separately ERRORS .. WARNINGS )VALIDATION annex8_error { INPUT inputDataSet; <= main dataset being validated LOOKUP LANGUAGES; LOOKUP COUNTRIES; <= lookup tables LOOKUP NUTS; LOOKUP NACE; ERROR err_annex8_error; <= output log - error dataset RULES {
RECORD FL001 {CONDITION inLookup (COUNTRY, COUNTRIES, "CODE"); <= check validity of the COUNTRY code using lookup tableERRMSG "Rule FL1 failed for field [COUNTRY]: See EU Manual for valid list of codes (annex 12)" SEVERITY "Error"
(COUNTRY) ; }RECORD FL002 {
CONDITION (ENTERPR>=0 AND ENTERPR<=999996); ERRMSG "Rule FL2 failed for field [ENTERPR]: In the range 0 to 999996" SEVERITY "Error" (ENTERPR) ;
}RECORD FL004 {
CONDITION (strToDouble (NACE_SP)>=2001 AND strToDouble (NACE_SP)<=2020);ERRMSG "Rule FL4 failed for field [NACE_SP]:In the range 2001 to 2020 - See Manual (annex 1)" SEVERITY "Error"
(NACE_SP) ; }RECORD FL005 {
CONDITION in (SIZE_SP, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9);ERRMSG "Rule FL5 failed for field [SIZE_SP]: In the range 0 to 9" SEVERITY "Error" (SIZE_SP) ;
}RECORD FL171 {
CONDITION (NOT isNull (A1bis)) -> inLookup (A1bis, NACE, "CODE");ERRMSG "Rule FL171 failed for field [A1bis]: NACE rev 1.1" SEVERITY "Warning" (A1bis) ;
}RECORD FL172 {
CONDITION (NOT isNull (A2bis)) -> ( (A2bis >=0 and A2bis <= 999996) or A2bis = 999999); ERRMSG "Rule FL172 failed for field [A2bis]: In the range of 0-999996 or 999999" SEVERITY "Warning" (A2bis) ; }
EurostatEurostat
Complex program example (1)PROGRAM ComputationExamples { INPUT countryDsd inputData; STEPS { VALIDATION checkValues { INPUT inputData; ERROR errorData1; RULES { RECORD pureRecord { PRICE := 20; } RECORD conditionalRecord { CONDITION isNull(VALUE); THEN { VALUE := PRICE * QUANTITY; } ELSE { PRICE := VALUE / 5; QUANTITY := VALUE / PRICE; } } VERTICAL pureVertical { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY; VALUE['TOTAL'] := nvl(VALUE['TOTAL'],0); } }
EurostatEurostat
Complex program example (2)VERTICAL conditionalVertical { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY; CONDITION 2 * VALUE['TOTAL'] = sum(VALUE[*]); THEN { VALUE['FR'] := VALUE['TOTAL'] / 3; VALUE['GB'] := VALUE['TOTAL'] / 2; VALUE['TOTAL'] := sum(VALUE[*]) - VALUE['TOTAL']; } ELSE { VALUE['TOTAL'] := sum(VALUE[*]) - VALUE['TOTAL']; } } }
VERTICAL multipleTranspositionsComputation { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY, PRODUCT; VALUE['TOTAL']['GAS'] := nvl(VALUE['TOTAL']['GAS'],0); } }
EurostatEurostat
Complex program example (3)VERTICAL multipleTranspositionsCondition { EXPRESSION { KEYS COUNTRY, CTYPE, MONTH, PRODUCT; TRKEYS COUNTRY, PRODUCT; CONDITION VALUE['TOTAL']['GAS'] > 5000; THEN { VALUE['TOTAL']['GAS'] := VALUE['TOTAL']['OIL'] * 2; VALUE['TOTAL']['GLD'] := VALUE['TOTAL']['OIL'] * 5; } ELSE { VALUE['TOTAL']['GLD'] := VALUE['TOTAL']['GAS'] + 3000; } } } } }
DATAOPERATION sortData { SORT { INPUT inputData; ORDER MONTH ASC, CTYPE ASC, COUNTRY ASC, PRODUCT ASC; } } }}
EurostatEurostat
Accepted data formatsGesmes / BOP ITS, BOP FDIUNA:+.? 'UNB+UNOC:3+FR2+4D0+100929:1637+IREF000243++GESMES/TS'UNH+MREF000001+GESMES:2:1:E6'BGM+74'NAD+Z02+ECB'NAD+MR+4D0'NAD+MS+FR2'IDE+10+EUROSTAT_BOP_01 reporting'DSI+BOP_FDI_A'STS+3+7'DTM+242:201009291637:203'DTM+Z02:20072009:702'IDE+5+EUROSTAT_BOP_01'GIS+AR3'GIS+1:::-'ARR++A:FR:N:2:330:N:4A:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F‘ <= multi-year 2007, 2008, 2009 observationsARR++A:FR:N:2:330:N:4F:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'ARR++A:FR:N:2:330:N:7Z:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'ARR++A:FR:N:2:330:N:A1:E:1100:9999:20072009:702:5824:A:F+5930:A:F+4204:A:F'ARR++A:FR:N:2:330:N:A1:E:1495:9999:20072009:702:5828:A:F+5932:A:F+4206:A:F'
CSV (with or without header) /SBS, CVTS,TOURISM
9H; 2008; LT; 2; B-N_X_K642; 11930; 16236; ; ; ; ; UNIT; ; ; ; ; ; TT0; ; ; ; ; D08
9H; 2008; LT; 3; B-N_X_K642; 11930; 1001; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 4; B-N_X_K642; 11930; 529; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 30; B-N_X_K642; 11930; 17766; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 2; B-E; 11930; 1138; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 3; B-E; 11930; 104; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 4; B-E; 11930; 61; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
FLR example
001E20100121814 00 804.822
001E20100121816 93 5295.549
001E20100121814 99 6166.24
001E20100125290334 581.371
FLR example 2
2010010011 010252000405595911005909580E 01ZZZZZ 2691.966 2734482.0 0.0
2010010011 010252000405595911004009600E 01ZZZZZ 237.543 341202.0 0.0
EurostatEurostat
Program with parameter(s)
PROGRAM SBS_ANNEX1_SingleSeries { INPUT SBS_DATA input1; PARAMETER P_T NUMBER; PARAMETERSET PARAMETERS { P_T = 2009; } STEPS { VALIDATION Validation { INPUT input1;
ERROR ErrorLog; RULES { VERTICAL Rule001 { EXPRESSION { KEYS SERIES, YEAR, TER_UNIT, SIZECLASS,
ECO_ACTIVITY, VARIABLE; TRKEYS VARIABLE;
CONDITION SERIES = '1A' AND YEAR = p_T AND countMissing(aux_val['12150'],aux_val['12110'])=0 -> aux_val['12150'] <= aux_val['12110'];
ERRMSG '12150 > 12110' SEVERITY 'Warning' (aux_val['12150'],aux_val['12110']) ; } }
EurostatEurostat
Functions, data types, operators (1)Data Types There are four types of data: Boolean Double Number String
Operators There are three types of operators in the SL Expressions:
Boolean operators, used to evaluate expressions into a true/false result:
o ORo ANDo NOTo -> (implication)o = (equals)o <> (not equals)o > (greater than)o < (lower than)o >= (greater than or equal to)o <= (lower than or equal to)
Computation operators, used to produce a value result following evaluation:
o + (plus)o – (minus)o * (multiply)o / (divide)
Assignment operator, used to assign a value to an operando := (supports assignment to NULL, value := NULL)
Functions
These following function calls are supported:A abs(Double) – absolute value ascii(Char) – returns the ASCII code for a characterB between(Double, Double, Double) – verifies if a number is
inside a closed interval between(Double, Double, Double, Boolean, Boolean) – verifies
if a number is inside an interval allowing the user to specify if the interval is closed or open at each end
between(String, String, String) – same as above between(String, String, String, Boolean, Boolean) – same as
aboveC ceiling(Double) – ceiling for number (Ex: ceiling(3.2) => 4) chr(Integer) – returns the ASCII character for the ASCII code concatenate(String…) – concatenate Strings countMissing(List) – returns the number of null values in
the list count(List) – returns the number of elements in the listE exp(Double)F firstIndexOf(String toSearch, String searchIn, Double
startingFrom) – first occurance of the toSearch String in the searchIn String
floor(Double) – floor for number (Ex: ceiling(3.2) => 3)G getRowCount(datasetReference) – returns the number of rows
for the specified dataset reference
EurostatEurostat
Functions, data types, operators (2)I in(Boolean, Boolean List) – check to see if a value is inside a list in(Double, Double List) in(String, String List) identicalInList(List) – checks that all elements in a list are identical isIdentical(List, List, …) – returns a boolean indicating if each list
contains identical elements (elements are identical inside a single list) – Ex: isIdentical(price[*],quantity[*],value[*])
isUnique(List, List, …) – returns a boolean indicating if the combination of elements from each list for all the index positions are unique– Ex: isUnique(price[*],quantity[*],value[*])
isTrue(Boolean) – checks if a boolean is true isNull(Boolean/String/Double) – checks if a value is NULL inLookup(value1, value2, …, datasetReference, “fieldName1”,
“fieldName2”,… ) – returns a boolean indicating whether or not the value or combination of values is defined for the fieldname or combination of fieldnames in the specified dataset
L lastIndexOf(String toSearch, String searchIn, Double startingFrom) – last
occurance of the toSearch String in the searchIn String length(String) – returns the length of the String like(String, String) – compares two strings in a SQL manner log(Double) ln(Double) ltrim(String) – trim left side of String lower(String) – switch to lower case left(String S, Double N) – returns the first N characters from SM max(Double List) – maximum value from a list min(Double List) – minimum value from a list mean(Double List) – computes the average value not counting null values missingMean(Double List) – computes the average value counting null
values as zero mod(Double N, Double n) – N%n
N nvl(Boolean, Boolean) – if first argument value is null return second
argument nvl(Double, Double) nvl(String, String)O occurs(String S, String s) – returns the number of occurences of s in SP printf(String, String/Double…) – offers the capabilities of the printf
method pow(Double N, Double n) – N**nR right(String S, Double N) – returns the last N characters from S round(Double N, Double n) – round N till n decimals (Ex: round(4.46, 1)
=> 4.5) rtrim(String) – trim right side of StringS str(Double N1, Double N2, Double N3) – Ex: str(30.25, 7, 3)=
“_BLANK_30.250“ strToDouble(String) – convert a String into a Double – return null if
String cannot be converted substring(String, Double N, Double n) – substring starting from N,
counting n characters sum(Double List) – sum of elements from list sqrt(Double) – returns the square root of the valueT trim(String) – trim String transcode (“targetField”, lookupReference, “lookupField”,
lookupValue) – performs a lookup based on the specified field and value and returns the value of the target field on the matching row
U upper(String) – switch to upper case uniqueInList(List) – checks if a list contains unique values
EurostatEurostat
2013 functionalities
Scalability improvements
Gesmes full integration
Internationalisation and interface improvements
EurostatEurostat
THANK YOU FOR YOUR ATTENTION
EDIT? Hmm