In the innovation division at Bleckwen, we continuously improve and train our models and pursue initiatives that complement their effectiveness. We leverage artificial intelligence to fight credit fraud and provide our customers with a fast, easy-to-use system. But what exactly are artificial intelligence and machine learning? What specific challenges arise when using them for fraud detection, and how can we address them? This article goes behind the scenes to show how we train machines for fraud detection and the complexities of building an AI model.
Let’s introduce the basic principles of machine learning. Artificial intelligence covers both rule-based systems, built on heuristics, and systems based on machine learning algorithms. A rule-based system relies on expert knowledge, whereas machine learning relies on learning patterns from data. ML comprises many families of algorithms, including the much-hyped deep learning, natural language processing, and others sometimes known only to data scientists.
However, these machine learning algorithms can be classified into two families: supervised and unsupervised learning.
Supervised learning aims to develop predictive models from labeled data; supervised models detect patterns similar to those they were trained on. Unsupervised learning finds patterns in unlabeled data: points can be grouped (genuine records versus outliers, similar points into categories, events within a time window), or transformations can be applied to the variables to facilitate supervised training. For fraud detection, both techniques can be combined: a supervised model detects known fraudulent patterns, while anomaly detection is integrated as a model in its own right. Training and prediction are distinct stages and involve different steps. Here is a simplified diagram of the training and prediction pipelines.
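To make the combination of the two families concrete, here is a minimal sketch, not Bleckwen's actual pipeline: an unsupervised anomaly detector (scikit-learn's IsolationForest) produces a score on synthetic data, and that score is fed as an extra feature to a supervised classifier. All data and model choices below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic toy data: 1000 genuine records, 20 fraudulent ones.
genuine = rng.normal(0, 1, size=(1000, 4))
fraud = rng.normal(4, 1, size=(20, 4))
X = np.vstack([genuine, fraud])
y = np.array([0] * 1000 + [1] * 20)

# Unsupervised stage: an anomaly score, learned without labels,
# is appended as an extra feature for the supervised stage.
iso = IsolationForest(random_state=0).fit(X)
anomaly_score = iso.score_samples(X).reshape(-1, 1)
X_enriched = np.hstack([X, anomaly_score])

# Supervised stage: a classifier trained on the labeled, enriched data.
clf = RandomForestClassifier(random_state=0).fit(X_enriched, y)
print(clf.score(X_enriched, y))
```

The anomaly score lets the supervised model benefit from patterns that exist in the data even where labels are missing or unreliable.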
Detecting fraud with machine learning raises many challenges. First, the definition of fraud is not always straightforward: the line between fraud and mere insolvency can be thin, and the process from payment default to fraud qualification varies among organizations.
Yet mislabeling can dramatically degrade model performance, as the model may confuse fraudulent and genuine patterns. Mislabeling can also be caused by the maturity period: the time between the reception of a record and its labeling as fraud.
Depending on the use case, this period may range from a few days to several months. During the maturity period, fraudulent records may still be tagged as genuine and mislead our model! Consequently, the most recent records cannot be used as training data.
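Excluding immature records can be as simple as a date filter. The snippet below is a hypothetical sketch (the records, dates, and 30-day maturity period are all assumptions for illustration):

```python
from datetime import date, timedelta

# Hypothetical records: (received_on, label); labels for recent records
# may still flip from "genuine" to "fraud" during the maturity period.
records = [
    (date(2023, 1, 5), "genuine"),
    (date(2023, 2, 1), "genuine"),
    (date(2023, 3, 20), "genuine"),  # too recent: label not yet reliable
]

MATURITY = timedelta(days=30)  # assumed maturity period
today = date(2023, 3, 25)

# Keep only records old enough for their labels to be trusted.
training_set = [r for r in records if today - r[0] >= MATURITY]
print(len(training_set))  # 2
```

The record received on March 20 is dropped: only five days old, its "genuine" label could still turn out to be fraud.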
Another challenge of using machine learning for fraud detection is the imbalanced nature of the dataset. Fortunately, fraudulent cases are far rarer than genuine ones, but this under-representation of fraud makes fraudulent patterns, our main objective, harder to learn. Addressing this imbalance requires multiple measures throughout the training pipeline, from the features used to the model-building methodology and the evaluation metrics.
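Two common measures against imbalance are re-weighting the rare class during training and choosing an evaluation metric that is not fooled by it. A minimal sketch on synthetic data (roughly 1% fraud), assuming scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(1)
# Imbalanced toy set: about 1% fraud.
X = rng.normal(0, 1, size=(2000, 3))
y = (rng.random(2000) < 0.01).astype(int)
X[y == 1] += 2.0  # shift fraudulent records so a learnable pattern exists

# class_weight="balanced" re-weights the rare class during training;
# average precision summarizes the precision-recall trade-off, which is
# far more informative than accuracy on imbalanced data (a model that
# predicts "genuine" everywhere would already reach ~99% accuracy).
clf = LogisticRegression(class_weight="balanced").fit(X, y)
scores = clf.predict_proba(X)[:, 1]
print(round(average_precision_score(y, scores), 3))
```

On data this imbalanced, accuracy alone would hide a model that never flags any fraud; average precision exposes it.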
In supervised learning on tabular data, and especially for fraud detection, feature engineering can be seen as the foundation of the training pipeline. Models are trained on the features created during this step, which determine both the patterns they can detect and how well they can separate fraudulent records from genuine ones. If a model is fed only raw variables, it will see only basic patterns, i.e., combinations of those variables.
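As an illustration of going beyond raw variables, consider per-customer aggregates. The column names and values below are purely hypothetical, assuming pandas:

```python
import pandas as pd

# Hypothetical raw transactions (names and values are illustrative only).
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "amount": [50.0, 60.0, 900.0, 20.0, 25.0],
})

# Raw variables alone expose only basic patterns; aggregates add context.
feats = tx.groupby("customer_id")["amount"].agg(
    mean_amount="mean",
    max_amount="max",
    n_transactions="count",
)

# Ratio of each transaction to the customer's usual amount: a spike
# relative to habit is a more telling signal than the raw amount itself.
tx = tx.join(feats, on="customer_id")
tx["amount_ratio"] = tx["amount"] / tx["mean_amount"]
print(tx[["customer_id", "amount", "amount_ratio"]])
```

The 900 transaction looks unremarkable as a raw amount, but its ratio to the customer's mean makes it stand out.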
Now that some promising features are available, we can look at the possible algorithms for our fraud detection model. Gradient boosting machines are tree-based algorithms that are particularly efficient on imbalanced datasets. But what is a tree, and how are trees combined in this algorithm? A decision tree is a set of rules in which the variables and thresholds are chosen automatically. In a simple decision tree, each node represents a variable and each branch a choice for that variable.
After several levels (for example: the amount is below $500 and the number of transactions over the last day is below 3), we finally reach a leaf (here: no fraud). Decision trees are simple to understand and to represent. However, with only a few levels, they cannot capture the complexity of fraud patterns precisely and can generate too many false alerts. On the other hand, if there are too many levels and the tree is too deep, the model learns the frauds in its training dataset "by heart" and cannot generalize the pattern to new data. This is called the overfitting problem. That is where combining trees comes in. Several ensembling techniques exist, but let's focus on the boosting used in gradient boosting machines.
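The trade-off above can be sketched with scikit-learn's gradient boosting: many shallow trees, each correcting the errors of the previous ones. The synthetic "fraud" rule below mimics the amount/transaction-count example and is an assumption for illustration only:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(0, 1, size=(1500, 4))
# Synthetic fraud rule combining conditions, like the tree example:
# "high amount" AND "high transaction count".
y = ((X[:, 0] > 1.0) & (X[:, 1] > 0.5)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Boosting: 100 shallow trees (max_depth=3), each fitted to the residual
# errors of the ensemble so far -- deep enough to capture the combined
# condition, shallow enough to avoid learning the data "by heart".
gbm = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=0
).fit(X_tr, y_tr)
print(round(gbm.score(X_te, y_te), 3))
```

Keeping each tree shallow while letting the ensemble grow is precisely how boosting sidesteps the single deep tree's overfitting problem.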
In our next article, we will continue to unravel what goes on behind building machine learning models. At Bleckwen, we combine state-of-the-art technology with our experience in financial fraud. Our models are designed and driven by industry experts and tailored to your needs and business, with easy and frictionless integration into your existing systems.
Proven results & guaranteed fraud savings
Tailored for your business
Easy to integrate with rapid time to value