
Jan Hřivňák - What the Numbers Won't Tell You, but Their Analysis Will Suggest

Date post: 18-Jul-2015

Extend your BI solution with Predictive Analytics for better decisions

Jan Hrivnak

Consultant & Analyst

[email protected]

Customers Stories

“These guys know what they’re doing. Thanks to their targeted approach, our open and conversion rates increased by over 100%, which led to 45% campaign revenue growth.” Dalia Lasaite, CG Trader

“They helped us with our Facebook acquisition campaigns, identifying VIP customers and using them for lookalike audience creation. The result was impressive – we generated over four times more campaign revenue and our ROI was almost 50% higher.” Martim Chamrad, Craneballs

“A predictive model driven recommendation engine was super-easy to implement, and we found that the recommendations are 85% more likely to lead to sales.” Chirag Nirmal, Bow & Drape

“Behavioral segmentation gave us a fresh look at our current clients, from another perspective and in a different context. It has been helping us identify clients who changed their shopping behavior towards our company, so we could react to this change adequately.” David Kroupa, Seznam

Data Mining Concept

Data mining is the process of automatically discovering useful information in large data repositories (Tan, Steinbach, Kumar, 2006).

The non-trivial extraction of implicit, previously unknown and potentially useful information from data (Fayyad et al., 1996).

A process of revealing hidden relationships in data.

Exploratory analysis of observational data.

Data -> Information -> Decision.

Traditional techniques may be unsuitable due to:

Large amount of data

High dimensionality of data

Heterogeneous, distributed nature of data

[Diagram: Data Mining shown together with Statistics, AI, Machine Learning, and Pattern Recognition]

Data Mining Tasks

In general: predictive vs. descriptive

Classification (credit risk calculation)

Estimation (long-term customer value)

Segmentation (groups of subjects with similar behavior)

Shopping cart analysis (products being bought together)

Fraud detection (suspicious credit card transactions, claim validation)

Anomaly detection (aircraft systems monitoring during flight, medical systems)

Prediction (“Churn” – which customers will leave next year?)

Social networks mining, spatial data mining

Data quality mining (data quality measurement and improvement)

Descriptive: find human-interpretable patterns that describe the data.

Predictive: use some variables to predict unknown or future values of other variables.

Data Mining Methods

Decision trees

Association analysis

Clustering

Graphical probabilistic models

Neural networks

Kohonen self-organizing maps

Support vector machine

Nearest neighbor

Linear and nonlinear regression

Logistic regression

Time series analysis

Genetic algorithms

Fuzzy modeling

GUHA, …

Areas of Data Mining Applications

Banking & insurance (fraud detection, predicting customer lifetime value, …)

Telecommunications (same as above)

Direct marketing

Supply chain management

eCommerce

Trading (technical analysis)

Scientific research

Medicine & healthcare (medical expert systems)

Technical fault diagnosis

Data Quality: a Critical Issue

“Garbage in, garbage out”

90% of time: data preparation (ETL)

10% of time: the DM itself

Data transformation issues

Data ambiguity (e.g. Gender = ‘F’, ‘Female’, ‘woman’, ‘male’, ‘man’, etc.)

Missing values

Duplicate values

Naming conventions of terms and objects

Different currencies

Different formats of numbers and text strings

Referential integrity

Missing dates
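The transformation issues above can be illustrated with a minimal Python sketch. It assumes a hypothetical list of customer records with `id` and `gender` fields (these names and the mapping table are illustrative, not taken from any real project) and shows one possible way to unify ambiguous labels, drop duplicate records, and count missing values before any modeling starts.

```python
# Hypothetical mapping of free-text gender labels to canonical codes.
GENDER_MAP = {
    "f": "F", "female": "F", "woman": "F",
    "m": "M", "male": "M", "man": "M",
}


def normalize_gender(value):
    """Return 'F', 'M', or None for missing or unrecognized input."""
    if value is None:
        return None
    return GENDER_MAP.get(value.strip().lower())


def clean_records(records):
    """Normalize the gender field and keep only the first record per id."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # duplicate record, skip it
        seen_ids.add(rec["id"])
        cleaned.append({**rec, "gender": normalize_gender(rec.get("gender"))})
    return cleaned


# Illustrative input only; not real customer data.
records = [
    {"id": 1, "gender": "Female"},
    {"id": 2, "gender": " man "},
    {"id": 2, "gender": "M"},    # duplicate id
    {"id": 3, "gender": None},   # missing value
    {"id": 4, "gender": "woman"},
]

cleaned = clean_records(records)
missing = sum(1 for rec in cleaned if rec["gender"] is None)
print(cleaned)
print("records with missing gender:", missing)
```

In practice the mapping table is usually the expensive part: it has to be built by inspecting the distinct values that actually occur in the source systems.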

Software for Data Mining

Commercial

SPSS PASW Modeler / Clementine (http://www-01.ibm.com/software/analytics/spss/)

SAS (http://www.sas.com/)

Microsoft SQL server (http://www.microsoft.com/sqlserver/2008/en/us/default.aspx)

Microsoft Excel (DM Add-In; http://www.microsoft.com/sqlserver/2008/en/us/data-mining-addins.aspx)

Oracle DM (http://www.oracle.com/technology/products/bi/odm/index.html)

Kxen (http://www.kxen.com/)

MS Azure ML (Cloud)

Open source or freeware

Weka (http://www.cs.waikato.ac.nz/ml/weka/)

R (http://www.r-project.org/)

Orange (http://www.ailab.si/Orange/)

LISP Miner (http://lispminer.vse.cz/)

Ferda (http://ferda.wiki.sourceforge.net/)

Benefits for Customers

Better understanding of their business

Increasing efficiency

Increasing safety, reliability

Possibility of restructuring business processes.

Possibility of changing people’s mindset.

Possibility of increasing profit, decreasing risks, better financial stability.

Competitive advantage

Risks

Uncertain results

Data mining can reveal already known or obvious facts

The result depends on data quality (errors) and the distribution of values (skewness, kurtosis, ...)

Overfitting (the model is trained too closely to the specific data and does not generalize well enough) can occur, but there are ways to minimize it.
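One common way to detect overfitting is to hold back part of the data and compare accuracy on the training part with accuracy on the held-back part. The sketch below is only an illustration on synthetic data (the data generator, the two toy models, and all names are assumptions, not from the slides): a model that memorizes its training points scores perfectly on them but worse on unseen data, while a simple threshold rule scores about the same on both.

```python
def make_data():
    """Toy data: label is 1 when x >= 50; every x divisible by 7 is flipped (noise)."""
    data = []
    for x in range(100):
        label = 1 if x >= 50 else 0
        if x % 7 == 0:
            label = 1 - label
        data.append((x, label))
    return data


def accuracy(predict, data):
    """Share of points whose predicted label equals the recorded label."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_memorizer(train):
    """1-nearest-neighbour: copies the label of the closest training point."""
    def predict(x):
        return min(train, key=lambda point: abs(point[0] - x))[1]
    return predict


def fit_threshold(train):
    """Picks the single cut-off t (from training x values) with best training accuracy."""
    best_t = max(
        (x for x, _ in train),
        key=lambda t: accuracy(lambda x: int(x >= t), train),
    )
    return lambda x: int(x >= best_t)


data = make_data()
train, test = data[::2], data[1::2]  # even x for training, odd x held out

memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

results = {
    "memorizer": (accuracy(memorizer, train), accuracy(memorizer, test)),
    "threshold": (accuracy(threshold, train), accuracy(threshold, test)),
}
for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
```

A large gap between training and hold-out accuracy, as the memorizing model shows here, is the typical symptom; a simpler model or more data are the usual remedies.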

Use Case: Progressive rewards

Each customer has a different worth

Focus on customers individually

=> Happy Customer

=> $$$

[Charts: customers ranked from Best to Worst on 0%-100% axes, with "Your cut optimum" marked; Repeating vs. One time customers, January to December]

Use Case: Up-sell, X-sell

Focus on Loyal customers => +4%

Convert them into VIP customers

=> Increase Revenue $$$


R 1 2 3 4 5

M F F F F F

1 1 1 1 1 2

2 1 1 1 2 5

3 1 2 3 3 7

4 1 3 3 5 10

5 2 6 7 12 21

R 1 2 3 4 5

F $ $ $ $ $

1 45k 197k 118k 114k

2 162k 89k 55k

3 369k 238 507k 246k 342k

4 260k 575k 1407k 1061k 397k

5 918k 1476k 2863k 6677k

R 1 2 3 4 5

F $ $ $ $ $

1 1M 15M 8M 3M

2 24M 3M 0M

3 13M 23M 14M 9M 11M

4 5M 25M 97M 57M 14M

5 17M 50M 157M 935M

R 1 2 3 4 5

F # # # # #

1 27 78 69 29

2 140 37 4

3 34 95 27 36 32

4 18 43 69 54 34

5 19 34 55 140

# of Customers - Heat Map RF

Heat Map RF

Avg Frequency - Heat Map RM

Avg Transactions / product - Heat Map RF
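Heat maps like these are typically built by scoring each customer on recency (R) and frequency (F) and then counting customers per cell. The following is a minimal sketch of that idea, assuming rank-based quintile scores from 1 to 5 where 5 is best; the scoring rule, the sample customers, and all names are illustrative assumptions rather than the method used for the tables above.

```python
from collections import Counter


def quintile_scores(values, reverse=False):
    """Score each value 1..5 by its rank; with reverse=True small values score high."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=reverse)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // n
    return scores


# Hypothetical customers: (days since last purchase, number of orders).
customers = {
    "A": (3, 12), "B": (40, 2), "C": (10, 7), "D": (200, 1), "E": (75, 4),
    "F": (5, 9), "G": (120, 3), "H": (30, 6), "I": (300, 5), "J": (15, 8),
}

ids = list(customers)
# Recency: fewer days since the last purchase is better, so reverse the ranking.
r_scores = quintile_scores([customers[c][0] for c in ids], reverse=True)
# Frequency: more orders is better.
f_scores = quintile_scores([customers[c][1] for c in ids])

rf = {c: (r, f) for c, r, f in zip(ids, r_scores, f_scores)}
heat_map = Counter(rf.values())  # (R, F) cell -> number of customers

for r in range(5, 0, -1):
    print(r, [heat_map.get((r, f), 0) for f in range(1, 6)])
```

The same grouping can carry any other aggregate per cell, for example revenue or average basket size, instead of the customer count.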

Use Case: Customer Churn

Identify customers who are likely to leave for a competitor in a given period

Historical data (previous months)

Regular predictions (current month)

Marketing campaign (next month)

Potential churn (next 2 months)
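To train a churn model on historical data, each past customer first needs a label saying whether they churned after a given snapshot month. The sketch below shows one possible labeling rule, assuming months are plain integers and that churn means "no purchase in the two months after the snapshot"; that definition, the sample history, and the function name are assumptions made for illustration.

```python
def churn_labels(purchase_months, snapshot, horizon=2):
    """Label customers known at `snapshot`: 1 = no purchase in the next `horizon` months.

    purchase_months maps a customer id to the set of month numbers with a purchase.
    """
    labels = {}
    future = range(snapshot + 1, snapshot + horizon + 1)
    for customer, months in purchase_months.items():
        if not any(m <= snapshot for m in months):
            continue  # not yet a customer at snapshot time
        labels[customer] = int(not any(m in months for m in future))
    return labels


# Hypothetical purchase history: customer -> months in which they bought.
history = {
    "A": {1, 2, 3, 5},  # buys again in month 5 -> stays
    "B": {1, 2},        # silent after month 2 -> churn
    "C": {4, 5},        # first purchase after the snapshot -> skipped
    "D": {2, 6},        # returns only after the horizon -> churn
}

labels = churn_labels(history, snapshot=3)
print(labels)
```

Features for the model would then be computed only from months up to the snapshot, so that the model never sees data from the period it is asked to predict.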

Use Case: Market basket analysis, shelf content optimization

Offering products that customers are most likely to buy

Defining the contents of a shelf according to what people most often buy together
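"Bought together" is usually quantified with support, confidence, and lift for each product pair. The following minimal sketch computes these three measures for all pairs in a handful of made-up baskets; the basket contents and function name are hypothetical, and a real project would normally use a dedicated association-rule algorithm such as Apriori rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Return support, confidence (a -> b) and lift for every item pair (a, b)."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        rules[(a, b)] = {
            "support": count / n,                   # share of baskets with both items
            "confidence": count / item_counts[a],   # share of a-baskets that also hold b
            "lift": (count * n) / (item_counts[a] * item_counts[b]),
        }
    return rules


# Illustrative baskets only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["beer", "chips"],
    ["beer", "chips", "bread"],
]

rules = pair_rules(baskets)
for pair, measures in sorted(rules.items(), key=lambda kv: -kv[1]["lift"]):
    print(pair, measures)
```

A lift well above 1 means the two products appear together more often than their individual popularity would suggest, which makes the pair a candidate for a shared shelf or a recommendation.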

Use Case: Claim handling

Automating the claim handling process and thereby saving money

Speeding up the process

Reducing complexity without impacting the result

Better understanding of the real key factors in the decision process

Identifying suspicious exceptions in the decision process (fraud detection)

Optimizing the process to decide more accurately whether a claim should be accepted or rejected
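One way to find the key factors behind accept/reject decisions is to rank attributes by information gain, the criterion many decision tree algorithms use to choose splits. The sketch below is a simplified illustration on invented claim records; the attribute names `in_warranty` and `country` and the data are hypothetical and do not come from the project described here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy achieved by splitting rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Invented claims: (in_warranty, country, decision).
raw = [
    ("Y", "CZ", "accept"), ("Y", "DE", "accept"), ("Y", "CZ", "accept"),
    ("N", "DE", "reject"), ("N", "CZ", "reject"), ("N", "DE", "reject"),
    ("Y", "DE", "accept"), ("N", "CZ", "reject"),
]
claims = [
    {"in_warranty": w, "country": c, "decision": d} for w, c, d in raw
]

gains = {
    attr: information_gain(claims, attr, "decision")
    for attr in ("in_warranty", "country")
}
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains)
print("most informative attribute:", ranking[0])
```

Attributes with a gain near zero contribute little to the decision and are candidates for removal, which is one route to reducing complexity without changing the outcome.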

Projects - References

Mondi (paper mill) – production process optimization, cost optimization

Nordic mobile device producer – claim handling

Overkill, CGTrader, Seznam.cz – customer segmentation, campaigns and marketing

Enterasys – analysis of won opportunities

NBA – customer segmentation, market basket analysis


Thank you

Don’t hesitate to contact us.

Jan Hrivnak

Consultant & Analyst

[email protected]

CRISP-DM: Project Methodology

Predictive DM Models with Highest Prediction Accuracy

Up to 95%

Only a few attributes are really needed

[Figure: attributes used by the model: WHO START, RECVG TO SHIPD DAYS, SRVC CODE, SENDER CNTRY, SRVC COSTS CRNCY, IS INFORMATION ONLY, RETN TYPE, IN WRTY IND]

Decision Tree Detail

Benefits for Customer

Automating the claim handling process and thereby saving money

Speeding up the process

Reducing complexity without impacting the result

Better understanding of the real key factors in the decision process

Identifying suspicious exceptions in the decision process (fraud detection)

Optimizing the process to decide more accurately whether a claim should be accepted or rejected

