+ All Categories

g

Date post: 24-May-2021
Category:
Upload: phuochiep123
View: 0 times
Download: 0 times
Share this document with a friend
Description:
f
Popular Tags:
42
Foreseer A Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword Searches Developed by Hailong Cai & Jun Wang Presented By: Alex Thompson
Transcript
Page 1: g

ForeseerA Novel, Locality-Aware Peer-to-Peer System Architecture for Keyword SearchesDeveloped by Hailong Cai & Jun Wang

Presented By: Alex Thompson

Page 2: g

Overview1. Peer-to-Peer Network Architecture2. The standard (Gnutella)3. Improving the design4. Foreseer5. The Search Algorithm6. Experiment7. Results8. Further Study9. Conclusions

Page 3: g

Peer to Peer Architecture• Motivations

• Rapidly distribute data• Create vast repository of data spread over many nodes• Increase data availability• Leverage collective computing power to ease stress on

individual nodes and connections• Conserve bandwidth by distributing load throughout network

• Obstacles• Scalability• Bottlenecking• Wasted Bandwidth• File Corruption

Page 4: g

Peer to Peer Architecture

Simple

Page 5: g

Peer to Peer Architecture

Simple

Complex

Page 6: g

Gnutella• ‘True’ Peer-to-Peer

• All nodes equally participate• All nodes expected to share workload and bandwidth

similarly• Key ideal: As long as 2 nodes exist, network will persist

• Searching in Gnutella• Send query to all connected nodes (typically 5 to 10)• These nodes search their file base

• If the data exists, they set up a connection and transfer the data.• If not, they forward the query the nodes to which they are connected.

• This process continues until query is found or dropped• Query will attempt to reach outer limits of network.

Page 7: g

Gnutella• Issues:

• Not very scalable• Message flooding occurs as network size increases.

• Inefficient use of bandwidth• Searches may cross same path multiple times.• Searches take “walks to nowhere” by querying nodes with nothing shared.• Searches completely directionless and bounce around network.

• Node tracking and inventory nonexistent• Nodes do not inform peers when they disconnect and searches may be sent

to dead ends.

Page 8: g

Improving the Design• Use a limited “blind” searching algorithm

• Random Walk – forward query to random subset of neighbors.

• Pro: Reduces message flooding• Con: Still have repeated walks and walks to nowhere

• Directed BSF – select neighbors that have produced quality results before.

• Pro: Reuses paths by maintaining ‘Shortcuts’ list to previously useful nodes.

• Con: If a peer in the path departs, the path is lost.• Con: may delay queries for objects not on ‘shortcut’ neighbors as that

list must be exhausted before other neighbors are contacted.

Page 9: g

Improving the Design• Use an “informed” searching algorithm

• Intelligent BFS – maintain query tables of data classes for neighbors at each node.

• Pro: Reuses paths for queries of similar classes.• Con: The space required for this is too large, with update overhead

offsetting potential benefits.• Con: Node data deletion and node departure are not addressed.

• Local Indices – Maintain list of files for neighbors within a certain radius (r)

• Pro: A single node may respond for many others• Con: If r is small, many queries cannot be satisfied• Con: If r is large, updating indices becomes too expensive

Page 10: g

Foreseer• Exploit inherent 2-dimensional locality in P2P systems

• By using geographical and temporal localities, search efficiency and scalability can be achieved.

• Geographical Locality: Nodes which are geographically close to each other.

• Temporal Locality: Nodes which have been useful previously.

• 3 Components• Neighbors overlay (geographical component).• Friends overlay (temporal component).• Bloom Filters (indices distributed according to

relationships between friends and neighbors).

Page 11: g

Foreseer

Neighbor Link

Friend Link

Page 12: g

Foreseer• Rationale: Daily Life

• Everyone has friends and neighbors• Neighbors live close by and you meet when you first settle in• Friends are met through business dealings and may live distantly

• Together, these constitute social network• If you need help on a project, you will go to this network• If no one in your circle of friends and neighbors can help, they will

check with their own circle.

Page 13: g

Foreseer• Bloom Filters - Hash-based data structure

representing a set.• Why use this approach

• # of shared files on peers is limited (75% share 100 or less).• Documents on peer tend to share common topics.• Most files shared are multimedia, which has few keywords.

• Some stats• Returns no false negatives• May return false positives at a rate of where m = # of bits in the filter, n = # of elements in the set• Assuming K-max keywords = 10,000 and k hash functions = 8, the

equation to compute m is

nm

)6185.0(

bitskKMax 416,114

2ln8000,10

2ln

Page 14: g

Foreseer• Your Neighbors

• Bidirectional – neighbors are aware of e/other.• Latency used as metric for distance

• The longer the delay, the further away the potential neighbor must be

• Moving into the neighborhood• Established at setup

• ‘Ping’ sent out at startup• If a potential neighbor has fewer than ‘max’ neighbors, it will reply.• New neighbors also send filters along with positive replies.

• Process continues until new peer has minimum # of neighbors.

Page 15: g

Foreseer• Your Friends

• Unidirectional• Each peer has a certain group that it considers friends.• Each peer has a certain group that considers it a friend (backfriends)

• Nodes keep content info of friends, but not backfriends.

• How to make friends• At setup, new node asks its neighbors who their friends are.• Any friends of the neighbors that can accept a friend will

reply positively.• This is only an ‘initial’ list.

• After successful downloads, peer may update its friends list.• The idea: if a node was useful in finding data before, it might be

again – make it a friend.

Page 16: g

Foreseer• Node departure

• Expected/Intentional - Nodes inform friends and neighbors when they leave.

• Drop – friends and neighbors are not informed• Neighbors may learn of departure when they try and contact• No provision given for node to release backfriends

• Re-establishment• Attempt to contact neighbors and friends• If neighbors and friends can still accept you, they will

• May be necessary to repeat initialization if sufficient time has passed.• No one knows you anymore or you’re in a new neighborhood.

Page 17: g

Search Algorithm• Basic Steps

1. Initiating node attempts to resolve query locally2. If query cannot be resolved locally, it is forwarded along

the friends and neighbors links.• Note: ‘Free-riders’ will not be searched until neighbors are touched.

3. If query is resolved, send success confirmation message.

Page 18: g

Search Algorithm• Local Matching

• Keywords from stored documents are mapped to a filter:

Local Filter Keywords“Geographical Locality”“Temporal Locality”“Foreseer”“Bloom Filter”

Friend Filter Keywords“search”“distributed has table”“caching”“Bloom Filter”

Local

Filter

Friend

Filter

Page 19: g

Search Algorithm• Expanding to Friends and Neighbors

1hh 121 hhhh

21 hhh

Search Friends

Search Neighbors

Terminate Search

h1 = the # of hops along the friends overlayh2 = the # of hops along the neighbors overlay

Page 20: g

Search Algorithm• Friends go 1st

1.Friends files interest the source nodes, so friends of friends files are more likely to interest the source node.

2.Friends links don’t refer to free-riders.

3.Friends link are likely more spread out over the network.

HopNodes

TouchedNodes

ForseenMsgs

Produced0 1 f + n 01 f (f + n) f f… … … …i f-i (f + n) f-i f-i

… … … …h1 f-h1 (f + n) f-h1 f-h1

Page 21: g

Search Algorithm• Then neighbors

• The friends search is limited to h1 hops.

• After this, the neighbors are searched for h2 hops.

HopNodes

Touched Nodes ForseenMsgs

Produced0 1 (f + n) f-h1 01 n (f + n) n f-h1 n… … … …i n-j (f + n) n-j f-h1 n-j

… … … …h2 n-h2 (f + n) n-h2 f-h1 n-h2

Page 22: g

Search Algorithm• Other resource-considerations

• Free Riders – When there’s nothing to look for…• (Initially) skipping free-riders saves a great deal of pointless walks.• Free-riders may issue queries to the system, but will have no friends.• Eliminates many meaningless messages and recovers bandwidth.

• False Positives – “But it’s supposed to be here?!”• Identification – multiple friends/neighbors seem to have the matching

data.• Resolution – selectively forward query to 2 or more friends/neighbors

• Chance of double error extremely slim.

Page 23: g

Search Algorithm• System Maintenance

• Object Publishing & Removal• Extract keywords from new file• Recreate context filter• Send ‘changes’ message to all backfriends and neighbors

• Alters bits on remote content filters.

• Node join & Departure• When a node leaves, it tells its backfriends and neighbors

• Backfriends & neighbors remove connection, delete content filter.• Departing node caches F & N’s list for contact upon reconnection.

• When a node drops, BF’s & N’s wont know until they attempt contact• A more aggressive ‘ping-pong’ system could be implemented to deal

with this.

Page 24: g

Experiment• Foreseer settings

Neighbors (N)

Friends (F)

Backfriends (F-1)

102 N

84 F

201 F

Page 25: g

Experiment• The Challengers

• Gnutella• TTL = 7

• IBS (Interest Based Shortcuts)• TTL = 7• Max Shortcuts = 10

Page 26: g

Experiment• LI (Local Indices)

• LI-1• r = 1• Note: less indices but can’t

satisfy enough queries

• LI-2• r = 2• Note: more indices but

expensive to maintain

Page 27: g

Experiment• Network Topology

• Transit-Stub model• Euclidean coordinate space• 51,984 physical nodes• Randomly create nodes

• 9 transit domains• 16 transit nodes per domain (on average)• 9 stub domains per transit node• 40 stub nodes (on average)

• Randomly create links• Transit nodes connect at a probability of 60%• Stub nodes connect at probability of 40%

Page 28: g

Experiment• Latency

Page 29: g

Experiment• Hardware

• SUNFire 880• 16 CPUs• Java 2 SDK environment

• Software• Java

• Test Data• Trace of an eDonkey system

• 923,000 filenames• 37,000 peers• Conducted during 1st Week, Nov. 2003

• Reconstruct download trace• Consists of query, response, transfer• Assume only 1 copy of file is shared before simulation begins

Page 30: g

Experiment• Test Metrics

• Success Rate – % successful queries.• Response Time – average time to fine 1st match.• Relative Distance – ‘Virtual’ distance traveled to reach destination.• Messaged Produced – average # of messages produced during search.• Nodes touched – average # of nodes visited during search.• Free-riders touched – average # of non-participating nodes visited during

search.

Page 31: g

Results• Foreseer simulations

• Optimal policy chosen is P1: {F5 N1}• Fewer free riders than ‘best’ policy

PolicySuccess

rateAvg. hops

Avg. Response Time

Relative Disance

# of messages

# of nodes

touched# of free-riders

touchedP1: {F3 N3} 93.64% 3.01 426 3.91 194 164 56P1: {F4 N2} 99.70% 2.95 453 4.22 234 175 18P1: {F5 N1} 99.90% 2.97 459 4.29 262 177 2P1: {F6 N0} 99.34% 2.95 457 4.24 250 169 0P2: {N4 F4} 25.30% 5.64 482 4.36 181 163 5P3: {F4 // N2} 94.87% 2.77 402 3.7 191 150 7P3: {F4 // N3} 98.25% 2.75 382 3.42 210 170 28P3: {F4 // N4} 99.29% 2.74 374 3.36 228 187 42P4: {N2 F2 N2} 96.46% 4.14 442 4.01 228 181 18P5: {F2 N2 F2} 98.10% 3.35 403 3.65 208 174 47

Page 32: g

Results• Queries Served

• Majority served by friends• Why use Neighbors?

• h > 3 = trouble

Page 33: g

Results• Search Efficiency

• Response Time• Foreseer exploits both temporal

and geographical locality.• IBS pays greater penalty prior to

contacting neighbors.• Gnutella and LI-x since neither

exploit both localities.

Page 34: g

Results• Search Efficiency

• Relative Distance• Only Foreseer defines neighbor as

‘temporally’ close.• Other systems may be sending greater

distances to neighbors.

Page 35: g

Results• Search Cost

• # of Messages• Gnutella floods all queries• Others do if not met by

shortcuts or local indices• Foreseer benefits

• Friends satisfy most queries

Page 36: g

Results• Search Cost

• # of nodes touched• Gnutella – no optimization• IBS – temporal locality• LI-2 – abundant indexing• Foreseer – combines both

intelligently

Page 37: g

Results• Search Cost

• Free-riders touched• Challengers mostly the same %• But Foreseer still touches fewer

total nodes

Page 38: g

• Search Cost• Update Messages

• IBS & Gnutella inapplicable• Foreseer almost as good as

LI-2 with less bandwidth used• Bloom filters decrease size of

remote filters

Page 39: g

Results• Foreseer testing

• Response time• 4 groups, e/ roughly double

size of smaller group.• Note: very little improvement

from adding neighbors.

• Message Count• Same idea as above• Note: The paper concedes that

for this test there was an anomaly with Max |F| = 4

• Query could not reach enough nodes due to the upper bound on friends.

Page 40: g

Results• Foreseer testing

• Update costs• Linear function

• y = 1.5x + b

• # of free-riders touched• Varying N has little to do with

result once Max |F| >= 8• At Max |F| >= 12, query will not

touch free riders• Infers resolution within friends

overlay.• Indicates, again, reliance on

friends overlay for success.

Page 41: g

Further Study

• Non-participant flags for free-riders.• Actual message costs of ping-pong peer detection.• Friends neighbors intersection resolution.

Page 42: g

Conclusion• Created 2 overlays for an unstructured P2P network

• Geographical (neighbors)• Temporal (friends)

• Achieved high search efficiency• Content filters

• Trace-driven simulation• 70% improvement

• response time • relative distance

• 90% improvement• # of query messages• # of nodes touched


Recommended