
BI Forum 2009 - Principy architektury MPP datového skladu

Transcript
Page 1: BI Forum 2009 - Principy architektury MPP datového skladu

© 2009 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice.

BI Forum 2009

Principy architektury MPP datového skladu (Principles of MPP Data Warehouse Architecture)

26 November 2009

Václav Hubka - Hewlett-Packard

Page 2: BI Forum 2009 - Principy architektury MPP datového skladu

27 November 2009

Agenda

• Appliances for enterprise data warehouses (EDWH)

• Principles of data warehouse design on an “EDWH appliance” platform

• Introduction to the Operational DWH

• MPP data warehouse architecture for the Operational DWH

Page 3: BI Forum 2009 - Principy architektury MPP datového skladu

Data Warehouse Appliances

Page 4: BI Forum 2009 - Principy architektury MPP datového skladu

DWH appliances

• Provide:

• Systems that are packaged (tightly integrated stack, bundled, balanced, pre-tuned, pre-installed; single support contact) and optimized for BI workloads

• Low maintenance and support through a single source

• Fast installation

• Integrated management and automated system administration. Increased functionality must not mean increased administrative complexity.

• Easy incremental expansion

• Guaranteed performance for specific purposes, use of new technologies to drive high performance

• Lower TCO and faster ROI

• Have established market acceptance

• 10TB or more in first phase is common today

• Have begun to host EDWs

• Satisfy real-time, on-demand, and/or Operational BI requirements outside of EDW

• 38% of TDWI research survey respondents have deployed, are currently evaluating, or plan to evaluate an appliance soon

Page 5: BI Forum 2009 - Principy architektury MPP datového skladu

Data Warehouse Appliance Pros

What do you think is the leading benefit of a data warehouse appliance?

• 39% = Pre-tuned for data warehousing

• 18% = Fast query performance

• 12% = Reduced system integration

• 11% = Fast installation

• 8% = Easy incremental expansion

• 6% = Low cost

• 6% = Other

Sources: TDWI Tech Survey, August 2005, 119 responses

TDWI Tech Survey, February 2007, 112 responses

Page 6: BI Forum 2009 - Principy architektury MPP datového skladu

Traditional data warehouse approach

Analyze data → Physical database design → Load data → Index and aggregate → Query data → Ongoing tuning

• Performance is dependent on getting a good physical design

• Time to market with new data is limited by skills and resources

• Query performance is poor when queries don’t take advantage of the design (no index scans)

Page 7: BI Forum 2009 - Principy architektury MPP datového skladu

Neoview platform performance

Load data → Query data

• High power-to-data ratio can operate without database tuning

− Load a TB in 1 hour

− Scan a TB in 30 seconds

• Neoview features take performance beyond scanning

− Next-generation optimizer can resolve complex queries efficiently

− pMesh dual switch fabrics provide massive bandwidth between nodes

− “Skewbusting” technology resolves the traditional skew issues of MPP systems
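
As a rough check on what the load and scan figures on this slide imply, the sketch below converts them into sustained throughput. It assumes decimal units (1 TB = 10^12 bytes), which the slide does not state, and the function name is illustrative only.

```python
TB = 10**12  # assumption: decimal terabyte; the slide does not say which unit is meant
GB = 10**9
MB = 10**6


def throughput_bytes_per_sec(volume_bytes: float, seconds: float) -> float:
    """Average throughput needed to move volume_bytes in the given time."""
    return volume_bytes / seconds


load_rate = throughput_bytes_per_sec(1 * TB, 3600)  # "Load a TB in 1 hour"
scan_rate = throughput_bytes_per_sec(1 * TB, 30)    # "Scan a TB in 30 seconds"

print(f"load: {load_rate / MB:.0f} MB/s")  # load: 278 MB/s
print(f"scan: {scan_rate / GB:.1f} GB/s")  # scan: 33.3 GB/s
```

Under these assumptions the system as a whole would need to sustain roughly 278 MB/s while loading and about 33 GB/s while scanning.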

Page 8: BI Forum 2009 - Principy architektury MPP datového skladu

• Overpowering a workload with inexpensive power has many benefits

− The ability to perform queries that no one anticipated

− “Load-and-go” simplicity of design

− Reduced indices, materialized views, etc., to manage and tune

− Enough power to quickly drop, reload, and restructure tables

Simply put…

Page 9: BI Forum 2009 - Principy architektury MPP datového skladu

What the appliance model is not good at…

• Random I/O

− Scan a 1 TB table to find 1,000 rows using 256 inexpensive processors in 30 seconds

− Same access with an index takes a fraction of a second using only one of the 256 inexpensive processors

• Common aggregations

− Brute force does wonders for aggregating data on the fly so that you don’t need to prebuild materialized views

− The same brute force can build a MV quickly, reducing CPU consumption 1,000x at runtime

• Both limit concurrency

− How many times can one table be scanned in a day?
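
The concurrency question on this slide can be put into numbers. The sketch below is back-of-the-envelope arithmetic: it assumes full scans run strictly one after another and that 1 TB means 10^12 bytes. Neither assumption comes from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # assumption: decimal terabyte
MB = 10**6

scan_seconds = 30   # from the slide: a 1 TB table is scanned in 30 seconds
processors = 256    # from the slide

# Assumption: every query needs a full scan and scans run one at a time,
# so this is an upper bound on the number of such queries per day.
max_scans_per_day = SECONDS_PER_DAY // scan_seconds

# Share of the scan that each processor has to sustain.
per_processor_rate = TB / processors / scan_seconds

print(max_scans_per_day)                      # 2880
print(f"{per_processor_rate / MB:.0f} MB/s")  # 130 MB/s
```

A ceiling of 2,880 brute-force scans per day is far below what thousands of concurrent users would generate, which is the slide's case for indexes and materialized views on frequently accessed tables.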

Page 10: BI Forum 2009 - Principy architektury MPP datového skladu

Beyond the database appliance with the Neoview platform

• Outperforms a pure appliance with little or no design decisions

• Permits you to “graduate” tables to a more enhanced design that supports extremely high concurrency

• Mixed workload support allows both optimized and nonoptimized workloads to coexist

[Figure: Load data → Query data → Improve database design; Performance repository; 10–1,000x faster; Much higher concurrency]

Page 11: BI Forum 2009 - Principy architektury MPP datového skladu

Operational DWH

Page 12: BI Forum 2009 - Principy architektury MPP datového skladu

Operational BI

Next generation EDW

Responding to business events as they occur

Operating priorities, Market signals, Real-time streams

Timely, operational decisions

• Real-time analytics

• 24 x 7 availability

Rich interactions across all touch points

Self-service, Call center, Suppliers

Know your customer and partners

• 1000s of concurrent users

• Single version of the truth

Leveraging information as a strategic asset

Web, Operational System, Unstructured data

Richer data inputs

• Large data volumes

• Complex, mixed workloads

Page 13: BI Forum 2009 - Principy architektury MPP datového skladu

Performing multiple decisions by a large number of users characterizes operational BI

[Figure: chart of Number of decisions (Low to High) against Collective impact of decisions (Low to High), positioning Strategic Business Intelligence, Tactical Business Intelligence, and Operational Business Intelligence]

Page 14: BI Forum 2009 - Principy architektury MPP datového skladu

The evolution of data warehouses in operational BI environments

Traditional BI → Operational BI

• Back room analysis → Built into business processes

• Reporting → Automating actions

• Offline batch updates → Continuous online updates

• Departmental data marts → Enterprise-wide resource

• Few users doing strategic analysis → Thousands of users performing many types of tasks

• Availability not critical → Mission-critical

Page 15: BI Forum 2009 - Principy architektury MPP datového skladu

HP Neoview & Operational Business Intelligence – Real-time insight for your business

• Enterprise data warehouse for large enterprise operational needs

• Scales to thousands of users, terabytes of data

• Rapidly deployed, easily managed, compatible with existing BI tools

• Integrated solution: HP innovation and flexible, standards-based components

[Figure: HP Neoview Enterprise Data Warehouse, with labels Integrated Hardware / OS / DBMS, Real time Updates, Mixed Workloads, Concurrent Users, Query Tools, Data Integration]

Page 16: BI Forum 2009 - Principy architektury MPP datového skladu

Architected for availability, scalability, and performance

• Shared-nothing MPP

− Each processor a unit of parallel work

• Database virtualization

− Data transparently hashed across all disks

• Parallel query execution

− Queries divided into subtasks and executed in parallel with results streamed through memory

• Real-time data warehousing

− Mixed workload & transactional heritage

• Unrivaled availability

− Continuously available in spite of any single point failure; online database operations

• Extreme processing power

− 1 Intel® Itanium® processor to 2 RAID 1 volumes

[Figure labels: BI client, ETL clients]
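
The "parallel query execution" bullet can be illustrated with a small scatter-gather sketch: one subtask per partition computes a partial aggregate, and the partial results are then combined. This is a generic illustration, not Neoview's executor. The data and function names are hypothetical, and Python threads are used only to show the structure, not to demonstrate real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows held by one processor.
partitions = [
    [120, 80, 45],
    [300, 15],
    [60, 60, 60, 20],
    [10],
]


def partial_sum_and_count(rows):
    """Subtask run against one partition: a local partial aggregate."""
    return sum(rows), len(rows)


def parallel_average(parts):
    """Run one subtask per partition, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(partial_sum_and_count, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(parallel_average(partitions))  # 77.0
```

Only the small (sum, count) pairs travel between subtasks and the coordinator, which is why an average has to be combined from partial sums and counts rather than from per-partition averages.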

Page 17: BI Forum 2009 - Principy architektury MPP datového skladu

Neoview is designed for changing customer requirements

91 billion rows of data (20TB)

All workloads run concurrently

• Analytical queries: 120 concurrent, 220 queries

• Report queries: 300 concurrent, 1.85 million queries

• Adhoc surprise: 4 concurrent, 8 queries

• Online ingest: 7.5m rows of data every 10 min

• Tactical-3: 400 concurrent, 13.4 million queries

• Tactical-2: 350 concurrent, 6.4 million queries

• Tactical-1: 320 concurrent, 3.6 million queries

SLA: 20 min/2 hours; 5 Seconds; 2 Seconds; 2 Seconds; 200 ms; 10 min

Neoview: 2min to 46min; 7min; 1.6sec; 0.5sec; 200ms; 167ms

Page 18: BI Forum 2009 - Principy architektury MPP datového skladu

Architecture of an MPP DWH – HP Neoview

Page 19: BI Forum 2009 - Principy architektury MPP datového skladu

Neoview segment architecture

• 16 nodes per segment

• Dual active (X,Y) interconnect fabrics

• Multiple fat I/O pipes

• Dual cluster switches for inter-segment I/O

• RAID1 (mirrored) disk protection

• Active reading from both RAID1 copies

• Separate controller writes for integrity

• End-to-end disk checksum integrity

[Figure: Node 1 to Node 16 (blades) attached to the X Fabric and Y Fabric, with processor labels P01 to P32 and P, M, and B component labels]

Page 20: BI Forum 2009 - Principy architektury MPP datového skladu

Neoview multi-segment architecture

• Active dual fault tolerant fabrics

• Multi-layered clustering (>128p)

• 500 MB/sec dedicated links

• Each segment adds bandwidth

• Cross sectional bandwidth up to 128 GB/sec

[Figure: FT Clustered Mesh Fabric connecting 1 to 16 Neoview Segments]
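
The bandwidth figures on this slide can be related to each other with simple arithmetic. The sketch below assumes one 500 MB/sec link per node and 1 GB = 1,000 MB. Both are assumptions chosen because they reproduce the quoted 128 GB/sec at 16 segments of 16 nodes; the slides do not state how the figure was derived.

```python
LINK_MB_PER_SEC = 500    # from the slide: dedicated link speed
NODES_PER_SEGMENT = 16   # from the segment architecture slide
MAX_SEGMENTS = 16        # from the slide: 1 to 16 segments


def aggregate_bandwidth_gb(segments: int, links_per_node: int = 1) -> float:
    """Aggregate link bandwidth in GB/sec.

    links_per_node=1 is an assumption, not a documented Neoview parameter.
    """
    nodes = segments * NODES_PER_SEGMENT
    return nodes * links_per_node * LINK_MB_PER_SEC / 1000


for segments in (1, 4, MAX_SEGMENTS):
    print(segments, aggregate_bandwidth_gb(segments))
# 1 8.0
# 4 32.0
# 16 128.0
```

Read this way, "each segment adds bandwidth" means the aggregate grows linearly with the number of segments.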

Page 21: BI Forum 2009 - Principy architektury MPP datového skladu

Highly Parallelized Database

• The key is transparently hashed to identify data placement

• Balanced data distribution across all disks

• Balanced SQL execution across all processors

• Table, index, and materialized view support

[Figure labels: Table A, Table B, Table C; Partitioning key; Hash of partitioning key]
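
A minimal sketch of hash-based placement follows. It is a generic illustration rather than Neoview's actual hash function: SHA-256 is used only because it is a convenient deterministic hash in the standard library, and the partition count and key range are hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 16  # hypothetical: one partition per disk


def partition_for(key, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partitioning key to a partition number with a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


order_numbers = range(1, 100_001)  # hypothetical partitioning key values

# Count how many rows land on each partition.
counts = Counter(partition_for(o) for o in order_numbers)
expected = len(order_numbers) / NUM_PARTITIONS
worst_skew = max(abs(c - expected) / expected for c in counts.values())

print(f"rows per partition: min={min(counts.values())}, max={max(counts.values())}")
print(f"worst deviation from an even split: {worst_skew:.1%}")
```

Because placement depends only on the key value, any processor can compute where a row lives without consulting a directory, and a key with many distinct values spreads rows close to evenly.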

Page 22: BI Forum 2009 - Principy architektury MPP datového skladu

Neoview Shared Nothing Architecture


Page 23: BI Forum 2009 - Principy architektury MPP datového skladu

Indexed Clustering for Performance

− Data indexed for fast access by the clustering key

− Clustered data for fast sequential access

Partitioning determined by hash of partitioning key

Data indexed and clustered by clustering key

Order: hash by order #; cluster by order date, order number

Line item: hash by order #; cluster by order date, order number, item number
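
The benefit of clustering can be sketched for a single partition: when rows are stored in clustering-key order, a range predicate on the leading key column maps to one contiguous slice. The rows and function name below are hypothetical, and the binary search stands in for whatever index structure the database actually uses.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows of one Order partition: (order_date, order_number, customer).
# They are kept sorted by the clustering key (order_date, order_number).
partition_rows = sorted([
    ("2009-11-03", 1007, "C"),
    ("2009-11-01", 1001, "A"),
    ("2009-11-02", 1004, "B"),
    ("2009-11-02", 1005, "A"),
    ("2009-11-04", 1010, "D"),
])


def rows_for_date_range(rows, first_date, last_date):
    """Return rows whose order_date lies in [first_date, last_date].

    Because rows are sorted by the clustering key, the matching rows are
    contiguous: two binary searches find the slice, then it is read in order.
    """
    # Leading key column, copied out here for brevity; a real engine would
    # search its index instead of building this list.
    dates = [r[0] for r in rows]
    lo = bisect_left(dates, first_date)
    hi = bisect_right(dates, last_date)
    return rows[lo:hi]


for row in rows_for_date_range(partition_rows, "2009-11-02", "2009-11-03"):
    print(row)
# ('2009-11-02', 1004, 'B')
# ('2009-11-02', 1005, 'A')
# ('2009-11-03', 1007, 'C')
```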

Page 24: BI Forum 2009 - Principy architektury MPP datového skladu

Questions?

Thank you

Page 25: BI Forum 2009 - Principy architektury MPP datového skladu

Co-locating Index and Base Table Data

− Eliminates cross-processor messaging overhead

− Fast and efficient indexing for query speed-up

Line item: hash by order #; cluster by order date, order number, item number

Index on line item: hash by order #; cluster by item number
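
Co-location follows from hashing the index by the same key as the base table. The sketch below builds both structures with one shared placement function and checks that every index entry points to a row in its own partition. The hash function, partition count, and sample rows are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 16  # hypothetical partition count


def partition_for(order_number: int) -> int:
    """Stable placement function shared by the base table and its index."""
    digest = hashlib.sha256(str(order_number).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Hypothetical line items: (order_date, order_number, item_number)
line_items = [
    ("2009-11-01", 1001, 7),
    ("2009-11-01", 1001, 3),
    ("2009-11-02", 1004, 7),
    ("2009-11-02", 1005, 9),
]

base_table = {p: [] for p in range(NUM_PARTITIONS)}
index = {p: [] for p in range(NUM_PARTITIONS)}

for order_date, order_number, item_number in line_items:
    p = partition_for(order_number)  # both structures hash by order #
    base_table[p].append((order_date, order_number, item_number))
    index[p].append((item_number, order_number))

for p in range(NUM_PARTITIONS):
    base_table[p].sort()  # clustered by order date, order number, item number
    index[p].sort()       # clustered by item number


def index_is_colocated() -> bool:
    """True if each index entry points to a base row in the same partition."""
    for p, entries in index.items():
        local = {(o, i) for _, o, i in base_table[p]}
        if any((o, i) not in local for i, o in entries):
            return False
    return True


print(index_is_colocated())  # True by construction
```

Since an index entry and its row always share a partition, following the index never requires a message to another processor.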

Page 26: BI Forum 2009 - Principy architektury MPP datového skladu

Parallel UOW drives MPP performance

• Measured performance

− Scan: 286 MB/sec/CPU, 2.34 GB/sec/segment

− Ingest: 1 MB/sec/CPU to 256 CPUs using 3 loaders

− Extract: 2.5 MB/sec/CPU to 64 CPUs

− Insert: 1 MB/sec/connection at 128 connections

− Fetch: 1.5 MB/sec/connection at 128 connections

[Figure labels: SCAN; Itanium 2 processor (two shown); Data management; RAID 1 volumes LDV1-P, LDV1-B, LDV3-P, LDV3-B]
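
The per-CPU and per-connection rates can be scaled up to aggregate figures. The sketch below assumes perfectly linear scaling up to the stated number of CPUs or connections and uses decimal units. Both are idealizations rather than claims from the slide, and the scan figure is left out because the slide does not say how its per-CPU and per-segment rates relate.

```python
MB = 10**6
TB = 10**12  # assumption: decimal units

# (per-unit rate in MB/sec, number of units), taken from the slide
measurements = {
    "ingest": (1.0, 256),   # per CPU, up to 256 CPUs
    "extract": (2.5, 64),   # per CPU, up to 64 CPUs
    "insert": (1.0, 128),   # per connection, 128 connections
    "fetch": (1.5, 128),    # per connection, 128 connections
}


def aggregate_mb_per_sec(rate: float, units: int) -> float:
    """Aggregate rate assuming perfectly linear scaling (an idealization)."""
    return rate * units


for name, (rate, units) in measurements.items():
    print(f"{name}: {aggregate_mb_per_sec(rate, units):.0f} MB/sec")
# ingest: 256 MB/sec
# extract: 160 MB/sec
# insert: 128 MB/sec
# fetch: 192 MB/sec

ingest_tb_per_hour = aggregate_mb_per_sec(*measurements["ingest"]) * MB * 3600 / TB
print(f"ingest: {ingest_tb_per_hour:.2f} TB/hour")  # ingest: 0.92 TB/hour
```

Under these assumptions, ingest at 256 CPUs comes to about 0.92 TB per hour, in line with the "Load a TB in 1 hour" figure quoted earlier in the deck.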

