Finding Fraud Part Two

In the last blog post, we saw how we can use Neo4j to find the merchants where credit card fraud originated or where stolen data was tested, in order to prevent further fraudulent charges. It stemmed from a webinar on our YouTube channel, which has hundreds of videos about graphs and Neo4j. We will continue diving into the technical details by looking at how Neo4j can help you find fraud rings. The way this fraud works is that a large set of synthetic accounts is created, and the accounts act like normal customers. Over time, they request higher and higher levels of credit, which they pay back on time. Then, they all request the maximum credit they can get, take out the money, and disappear! Let’s find them before this happens.

We’ll start by having a bunch of customers. We will reuse John, Sheila, and Karen from the last blog post:

CREATE (john:User {name:"John"})
CREATE (sheila:User {name:"Sheila"})
CREATE (karen:User {name:"Karen"})

Our graph looks like this so far:

Neo4j initial graph

Unfortunately, a few unconnected nodes do not make for much of a graph; let’s add a few more components. John has a credit card, a bank account, and an unsecured loan.

CREATE (cc1:Card {number:"4012888888881881", balance: 493.23})
CREATE (ba1:Account {number:"85474584", balance:1322.30, type:"Checking"})
CREATE (us1:Loan {number:"63493639", balance:5000.00, rate: 5.8, type:"Unsecured Loan"})
CREATE (john)-[:HAS_ACCOUNT]->(cc1)
CREATE (john)-[:HAS_ACCOUNT]->(ba1)
CREATE (john)-[:HAS_ACCOUNT]->(us1)

Sheila has a bank account and a credit card, as well as an SSN tied to her account.

CREATE (ba2:Account {number:"25384738", balance:2983.99, type:"Checking"})
CREATE (cc2:Card {number:"5105105105105100", balance: 893.11})
CREATE (ssn2:Identification {number:"000-42-4329", type:"SSN"})
CREATE (sheila)-[:HAS_ACCOUNT]->(ba2)
CREATE (sheila)-[:HAS_ACCOUNT]->(cc2)
CREATE (sheila)-[:HAS_ID]->(ssn2)

Karen has a bank account, an unsecured loan and a phone number tied to her account:

CREATE (ba3:Account {number:"63493639", balance:3204.83, type:"Checking"})
CREATE (us2:Loan {number:"28372342", balance:5000.00, rate: 6.0, type:"Unsecured Loan"})
CREATE (phone2:Phone {number:"312-606-0842"})
CREATE (karen)-[:HAS_ACCOUNT]->(ba3)
CREATE (karen)-[:HAS_ACCOUNT]->(us2)
CREATE (karen)-[:HAS_PHONE]->(phone2)

Our graph now looks like this, and so far so good. Nothing to worry about here.

Graph with accounts tied to individuals

Let’s add a few more things we know about our three customers. John and Sheila are sharing a phone number; they could be a couple or maybe roommates. The number could also have been John’s in the past and now belong to Sheila. So far, things look ok:

CREATE (phone1:Phone {number:"312-876-5309"})
CREATE (john)-[:HAS_PHONE]->(phone1)
CREATE (sheila)-[:HAS_PHONE]->(phone1)

John and Karen are sharing an identification number… how did that happen?

CREATE (ssn1:Identification {number:"000-91-7434", type:"SSN"})
CREATE (john)-[:HAS_ID]->(ssn1)
CREATE (karen)-[:HAS_ID]->(ssn1)

It turns out that all three of these users happen to have the same address. That explains the shared telephone number, but the shared identification number doesn’t add up.

CREATE (ad:Address {line1:"175 N. Harbor Drive", city:"Chicago", state:"IL", zip:"60601"})
CREATE (john)-[:HAS_ADDRESS]->(ad)
CREATE (karen)-[:HAS_ADDRESS]->(ad)
CREATE (sheila)-[:HAS_ADDRESS]->(ad)

Graph with phone lines and authentication number

With our graph, we can easily see that these accounts are connected, but how can we tell Neo4j they are connected? We are looking for Connected Components, which are subgraphs of nodes connected together.

To find these, we can make use of the Neo4j Graph Algorithms Plugin, specifically the Union Find algorithm.
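To get a feel for what Union Find does before handing the job to the plugin, here is a minimal Python sketch of the general technique. It is not the plugin's implementation; the `DisjointSet` class and its method names are made up for illustration. Every user starts in a set of their own, and each shared attribute merges two sets.

```python
class DisjointSet:
    """Illustrative union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, item):
        # A new item starts out as the root of its own one-member set.
        if item not in self.parent:
            self.parent[item] = item
            self.size[item] = 1

    def find(self, item):
        # Walk up to the root, pointing each visited node at its grandparent.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return  # already in the same set
        # Attach the smaller set under the larger one.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]

    def components(self):
        # Group every item under the root of its set.
        groups = {}
        for item in self.parent:
            groups.setdefault(self.find(item), set()).add(item)
        return list(groups.values())


ds = DisjointSet()
for name in ("John", "Sheila", "Karen"):
    ds.add(name)
before = len(ds.components())   # three separate sets

ds.union("John", "Sheila")      # shared phone number
ds.union("John", "Karen")       # shared identification number
after = ds.components()         # a single set with all three users
```

The set each user ends up in plays the same role as the “setId” the plugin reports.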

Make sure you have the Neo4j Graph Algorithms plugin installed. It’s easy if you use Neo4j Desktop, but if not, you can download it from the Neo4j Download Center (scroll down about halfway). You will need to unzip it and place the .jar file inside the Neo4j “plugins” folder. You will also need to configure the neo4j.conf file in the “conf” directory to allow the use of the graph algorithm stored procedures by adding this line and restarting Neo4j:

 dbms.security.procedures.unrestricted=algo.*

Before we continue, we’ll add “Robert” as a user, with a checking account and credit card but not directly connected to anything else in our graph.

CREATE (robert:User {name:"Robert"})
CREATE (ba4:Account {number:"8374927", balance:1273.39, type:"Checking"})
CREATE (cc3:Card {number:"378282246310005", balance: 134.72})
CREATE (robert)-[:HAS_ACCOUNT]->(ba4)
CREATE (robert)-[:HAS_ACCOUNT]->(cc3)

Alright, we should have two connected components in our graph. Let’s find them. We’ll use the streaming version of the “unionFind” algorithm and convert the nodeIds it returns to the actual nodes in order to retrieve the name property alongside the “setId” assigned by the algorithm. The Neo4j Graph Algorithms have the option to create an in-memory graph using Cypher Projection. This option allows us to define the nodes and relationships of a virtual graph and run our algorithms on that instead of the actual graph. We are looking for connected users, but there are only indirect relationships between them; with Cypher projection, we can turn their indirect relationships into virtual direct relationships like so:

CALL algo.unionFind.stream(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)-->()<--(p2:User) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
) YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS user, setId

As expected, John, Sheila, and Karen are in one set, while Robert is by himself in another.

John, Sheila, and Karen connected to one another
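The projection step on its own can be pictured with a small Python sketch. This is only an illustration of the idea, not how the plugin builds its in-memory graph. The `edges` list mirrors the phone, identification, and address relationships created earlier, and `project_user_pairs` is a hypothetical helper. Unlike the Cypher pattern, which returns each pair in both directions, the sketch emits each unordered pair once.

```python
from collections import defaultdict
from itertools import combinations

# (user, shared node) pairs mirroring the HAS_PHONE, HAS_ID and HAS_ADDRESS
# relationships above. Account nodes are left out because no two users share one.
edges = [
    ("John", "phone:312-876-5309"),
    ("Sheila", "phone:312-876-5309"),
    ("Karen", "phone:312-606-0842"),
    ("John", "ssn:000-91-7434"),
    ("Karen", "ssn:000-91-7434"),
    ("Sheila", "ssn:000-42-4329"),
    ("John", "address:175 N. Harbor Drive"),
    ("Karen", "address:175 N. Harbor Drive"),
    ("Sheila", "address:175 N. Harbor Drive"),
]


def project_user_pairs(edges):
    """Turn user -> node links into direct user-user pairs."""
    users_by_node = defaultdict(set)
    for user, node in edges:
        users_by_node[node].add(user)

    pairs = set()
    for users in users_by_node.values():
        # Every two users attached to the same node become a virtual relationship.
        pairs.update(combinations(sorted(users), 2))
    return pairs


pairs = project_user_pairs(edges)
```

Robert never shows up in `pairs` because he shares nothing with the others, which is why he lands in a set of his own.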

This is pretty good, but what if the fraud rings made sure not to use any of the same identifying information for any of the accounts they opened? Let’s go back to what the graph looked like here:

Modeling a fraud ring with different IDs

Now what do we do? Well, we can look at how they accessed these accounts. Let’s say the accounts for John and Sheila were accessed via the Interactive Voice Response (IVR) system from the same ANI (Automatic Number Identification) phone number.

MATCH (john:User {name:"John"}), (sheila:User {name:"Sheila"})
CREATE (ani:ANI {number:"312-666-1234"})
CREATE (ani)-[:CALLED]->(john)
CREATE (ani)-[:CALLED]->(sheila)

Additionally, let’s say the accounts for John and Robert were accessed by the same browser fingerprint.

MATCH (john:User {name:"John"}), (robert:User {name:"Robert"})
CREATE (fg:Browser {fingerprint:"asdf7373jsdf3rw"})
CREATE (fg)-[:ACCESSED]->(john)
CREATE (fg)-[:ACCESSED]->(robert)

Now our graph looks like this (we left out Karen’s accounts for brevity):

Potential fraud ring graph (minus Karen)

Let’s run the union find algorithm again; this time, we will use just the “ACCESSED” and “CALLED” relationships in our Cypher projection query, since the ANI is linked to users by “CALLED” and the browser fingerprint by “ACCESSED”.

CALL algo.unionFind.stream(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)<-[:ACCESSED|CALLED]-()-[:ACCESSED|CALLED]->(p2:User) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
) YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS user, setId

Now John, Sheila, and Robert are all in the same set, while Karen is by herself:

Relationship found between John, Sheila, and Robert

Next, we can combine the browser/IVR access and the user attributes to see if they are all really part of one big fraud ring by looking for any indirect relationships in any direction between users in our Cypher projection:

CALL algo.unionFind.stream(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)--()--(p2:User) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
) YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS user, setId

There it is. All four of our users are connected in some shape or form into the same set.

Connections found between all four users

It’s pretty normal to find connected components of small sizes, for example, a couple with separate accounts using the same computer at home. Large connected components, however, should probably be investigated. If we have a large number of accounts, it may be best to use the non-streaming union find procedure and write the results into the node properties like this:

CALL algo.unionFind(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)--()--(p2:User) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', write:true, partitionProperty:'partition'}
) YIELD setCount
RETURN setCount

That stored procedure call will add a “partition” property to our User nodes. Then, we can return the top partitions with this query:

MATCH (n:User)
RETURN n.partition, COUNT(*) AS members, COLLECT(n.name) AS names
ORDER BY members DESC
LIMIT 10

Returning top partitions
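If you pull the partition values into application code, the same ranking plus a size threshold for "worth investigating" might look like the following Python sketch. The partition ids in the sample data, the `top_partitions` helper, and the `min_members` threshold are all assumptions for illustration; a sensible threshold depends on your own data.

```python
from collections import defaultdict


def top_partitions(partition_by_user, min_members=1, limit=10):
    """Rank partitions by member count, keeping those with at least min_members."""
    members = defaultdict(list)
    for user, partition in partition_by_user.items():
        members[partition].append(user)

    # Largest partitions first; ties broken by partition id for a stable order.
    ranked = sorted(members.items(), key=lambda item: (-len(item[1]), item[0]))
    return [
        (partition, sorted(names))
        for partition, names in ranked
        if len(names) >= min_members
    ][:limit]


# Hypothetical partition ids, as if read back from the User nodes.
partition_by_user = {"John": 0, "Sheila": 0, "Karen": 0, "Robert": 3}

ranked = top_partitions(partition_by_user)
flagged = top_partitions(partition_by_user, min_members=3)
```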

You would run the union find algorithm and this query regularly to find connected components. When new accounts are created or new information arrives for existing accounts (change of address, IVR logs, web logs, etc.), you can simply check whether any of the new nodes connect to existing partitions or bridge multiple partitions, which may indicate a fraud ring in the making.
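As a rough Python sketch of that incremental check (not something the plugin provides): given the partition already stored on each user, a new node bridges partitions when the users it links to carry more than one distinct partition value. The partition ids and the `partitions_touched` helper are hypothetical.

```python
def partitions_touched(linked_users, partition_by_user):
    """Return the existing partitions reached by a new node's links."""
    return {
        partition_by_user[user]
        for user in linked_users
        if user in partition_by_user  # brand-new users have no partition yet
    }


# Hypothetical partition ids, as if read back from the User nodes.
partition_by_user = {"John": 0, "Sheila": 0, "Karen": 0, "Robert": 3}

# A new browser fingerprint seen on John's and Robert's accounts.
fingerprint_partitions = partitions_touched(["John", "Robert"], partition_by_user)
fingerprint_bridges = len(fingerprint_partitions) > 1

# A new phone number shared by Sheila and Karen stays inside one partition.
phone_partitions = partitions_touched(["Sheila", "Karen"], partition_by_user)
phone_bridges = len(phone_partitions) > 1
```

A node that bridges partitions merges them into one larger component, so it is a natural trigger for a closer look without rerunning the full algorithm.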

You can find the code for this post on this gist. Now you know how to find Fraud Rings using Neo4j, so go on and fight fraud!

This UrIoTNews article is syndicated from DZone