Regulations around AI and machine learning
Sandeep Maira, former Head of Strategic Systems, The OCC
Below is an insight into what can be expected from Sandeep’s session at Non-Financial & Operational Risk USA 2023.
The views and opinions expressed in this article are those of the thought leader as an individual, and are not attributed to CeFPro or any particular organization.
Why is innovation important in machine learning? Can you give any examples of outstanding innovations in machine learning in recent times?
Innovation is crucial in machine learning (ML) for several reasons:
- Improving efficiency: As data sizes grow, efficient algorithms become essential. Innovations can lead to faster training and inference, making ML applications more feasible in real-time scenarios or on devices with limited computational resources. One example is real-time risk monitoring that draws on structured and unstructured data to deliver faster and better risk insights, such as an initial analysis of how an operational error affects a firm’s reputational and other risks.
- Better predictive accuracy: Improved algorithms can lead to more accurate models with fewer hallucinations. This is especially important when the stakes are high, as in trading, investment, and risk management decisions.
- Ethical and fair decision-making: As machine learning models become more integrated into decision-making processes, innovations are required to ensure they are transparent, unbiased, and ethical.
- Driving interdisciplinary research: Machine learning, being a versatile field, often intersects with other disciplines. Innovations in ML can lead to advancements in biology, physics, economics, and more.
- Hardware innovations: Bringing down the cost and increasing the speed of ML chipsets is important for expanding usage.
Some examples of outstanding innovation in machine learning in recent times include:
- Deep learning: This involves neural networks with many layers (known as deep neural networks). It has revolutionized fields like computer vision and natural language processing.
- Transformers and attention mechanisms: The transformer architecture, especially the BERT (Bidirectional Encoder Representations from Transformers) model and its variants, has drastically improved performance in a range of NLP tasks.
- Generative Adversarial Networks (GANs): GANs are used for generating data. They have been particularly influential in image generation, style transfer, and data augmentation.
- Transfer learning: Instead of training models from scratch, transfer learning allows models to leverage knowledge from previous tasks, saving computational resources and time.
- Reinforcement and supervised learning innovations: Techniques such as RLHF (Reinforcement learning from human feedback) can provide more accuracy, especially in areas that benefit from domain expertise and have high thresholds for accuracy, such as finance and healthcare.
- Federated learning: Allows for model training across multiple devices or servers while keeping data localized. This approach is beneficial for privacy concerns.
- Neural Architecture Search (NAS): Automates the process of finding the best neural network architecture for a given problem, potentially saving researchers significant time and resources.
- Explainable AI (XAI): As ML models, especially deep neural networks, are often seen as “black boxes,” there’s a growing emphasis on making their decisions interpretable and transparent. XAI is an emerging field focused on this goal.
- Self-supervised learning: A training paradigm where the learning algorithm generates its own supervisory signal, usually from unlabeled data. This can be particularly useful in scenarios where labeled data is scarce.
- Capsule networks: Proposed as an alternative to traditional convolutional neural networks, capsule networks aim to preserve the spatial relationships between the parts of an object and the whole, so that patterns are still recognized when viewpoint or pose changes, potentially improving the model’s robustness and interpretability.
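To make the attention mechanism mentioned in the list above more concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside transformer models. It uses only the Python standard library and a toy input; the function names and the example vectors are illustrative assumptions, not taken from any particular library or from the session itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy self-attention: three hypothetical token vectors attend to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)
```

Each output vector is a weighted blend of all the input vectors, with the weights reflecting how similar each pair of tokens is. Real transformer layers add learned projection matrices and multiple attention heads on top of this operation.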
What is the process for dataset validation in machine learning? Why is this important?
Dataset validation in machine learning ensures that the data you are using to train and evaluate your model is of high quality, accurate, and relevant. Proper dataset validation can help ensure that your machine learning models generalize well to new, unseen data and are not merely overfitting to the peculiarities or noise in your dataset.
Here’s an outline of the dataset validation process:
Splitting the data: Splitting the dataset into training, validation, and test sets is crucial. Typically, you’ll train your model on the training set, tune the hyperparameters using the validation set, and then finally evaluate its performance on the test set.
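As a minimal sketch of the split described above, the following standard-library Python shuffles a dataset and divides it into training, validation, and test sets. The function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are assumptions chosen for illustration.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into (train, validation, test) lists."""
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out
    return train, val, test


# Hypothetical dataset of 100 row identifiers.
rows = list(range(100))
train, val, test = split_dataset(rows)
```

A random shuffle suits data whose rows are independent. For time-ordered data, such as transaction histories, a chronological split is usually more appropriate so that the model is never evaluated on data older than what it was trained on.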
The process also includes the following:
- Checking for data imbalances: If one class is underrepresented in your dataset, the model might not learn to recognize it well. Techniques such as resampling, synthetic data generation (like SMOTE), or using different evaluation metrics can be useful.
- Identifying and handling missing values: Determine whether data is missing at random or if there’s a pattern. You can impute missing values, remove rows/columns with missing data, or use algorithms that can handle missing values directly.
- Outlier detection: Outliers can skew the model’s training. Use techniques like IQR, Z-score, or visual tools like box plots to identify outliers.
- Ensuring data consistency: Check for duplicate rows, inconsistent data types, or contradictions in the data.
- Evaluating the model on validation data: After training your model, evaluate its performance on the validation dataset. Use this feedback to adjust model hyperparameters and architecture, or to revisit feature engineering steps.
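The data-quality checks in the list above can be sketched with a few small helpers. This is an illustrative outline using only the Python standard library; the helper names, the record layout (`amount`, `label`), and the sample values are hypothetical. In practice these checks are usually run with a dataframe library on much larger data.

```python
import statistics
from collections import Counter


def class_balance(labels):
    """Return each class's share of the dataset."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def missing_rate(rows, field):
    """Fraction of rows where `field` is absent or None."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def duplicate_rows(rows):
    """Rows that appear more than once, compared field by field."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


# Hypothetical transaction records for illustration only.
records = [
    {"amount": 120.0, "label": "ok"},
    {"amount": 95.0, "label": "ok"},
    {"amount": 110.0, "label": "ok"},
    {"amount": None, "label": "ok"},
    {"amount": 105.0, "label": "ok"},
    {"amount": 9800.0, "label": "fraud"},
    {"amount": 120.0, "label": "ok"},
]
labels = [r["label"] for r in records]
amounts = [r["amount"] for r in records if r["amount"] is not None]

balance = class_balance(labels)             # share of each class
missing = missing_rate(records, "amount")   # share of rows lacking an amount
outliers = iqr_outliers(amounts)            # amounts far outside the IQR
duplicates = duplicate_rows(records)        # repeated records
```

Whether a flagged value is an error or a genuine rare event is a judgment call. In fraud data, for instance, the outlier may be exactly the case the model needs to learn, so flagged rows should be reviewed rather than dropped automatically.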
Why is it important?
- Avoid overfitting: Without proper dataset validation, a model can perform exceptionally well on the training data but fail on new, unseen data.
- Model generalization: Properly validated datasets ensure that models can generalize to a wide variety of similar inputs, not just the data they were trained on.
- Trustworthiness: Ensuring data quality builds trust in the model’s predictions.
- Efficient resource use: Validating datasets early can save time and computational resources by preventing training on low-quality or irrelevant data.
- Informed decision-making: Understanding the characteristics and quality of your dataset enables better decision-making throughout the machine learning pipeline, from feature engineering to model selection.
In summary, dataset validation is a fundamental step in the machine learning process. Ensuring data quality and relevance helps in building robust models that perform well in real-world scenarios.
How can machine learning models be leveraged for risk analysis? What implications will this have for the wider institution?
Machine learning (ML) has begun to play an increasingly important role in the financial sector, especially in the domain of risk analysis. Here’s a breakdown of its applications and the potential implications:
Applications of machine learning in financial risk analysis:
- Credit scoring: ML models can analyze a vast array of data, both traditional (like credit history) and non-traditional (like social media activity or online behavior), to predict the likelihood of a borrower defaulting on a loan more accurately than traditional models.
- Fraud detection: ML algorithms can identify patterns in transactional data that are indicative of fraudulent activity. By constantly learning from new transactions, these algorithms can evolve to recognize new fraud techniques.
- Market risk analysis: ML can be used to predict price movements and volatility by analyzing vast datasets, including market sentiment from news articles, social media, and other sources.
- Operational risk management: ML can help in predicting potential operational failures, whether they be from human error, system failures, or external events.
- Portfolio management: ML models analyze market conditions and adjust investment portfolios in real time, aiming to maximize returns and minimize risks.
- Algorithmic trading: High-frequency trading platforms use ML to analyze market data and execute trades at speeds impossible for humans.
- Liquidity risk: ML can forecast liquidity gaps under various scenarios by analyzing trends and patterns in cash flows and market conditions.
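As a simple illustration of the fraud detection idea in the list above, the sketch below flags transaction amounts that sit far from an account's typical spending. Note that this is a statistical baseline (the modified z-score based on the median and the median absolute deviation), not a trained ML model; it is swapped in here because it fits in a few lines. The function name, the sample history, and the 3.5 threshold are assumptions for the example.

```python
import statistics


def flag_anomalies(amounts, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread measure that resists outliers.
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        return []  # no spread in the data, nothing to compare against
    flagged = []
    for a in amounts:
        # 0.6745 rescales the MAD to be comparable with a standard deviation.
        score = 0.6745 * (a - median) / mad
        if abs(score) > threshold:
            flagged.append(a)
    return flagged


# Hypothetical transaction history for a single account.
history = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 40.1, 2500.0]
suspicious = flag_anomalies(history)
```

A production system would typically combine many features (merchant, location, timing) and learn from labeled fraud cases, but a baseline like this is useful as a sanity check against which more complex models can be compared.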
Implications for financial institutions:
- Efficiency and accuracy: ML can often process and analyze vast amounts of data more quickly and accurately than traditional models. This can lead to better risk assessments.
- Competitive advantage: Institutions that leverage ML effectively can gain an edge over competitors in terms of offering better loan rates, detecting fraud more accurately, or optimizing investment strategies.
- Adaptive learning: Unlike static traditional models, ML models can be retrained as new data arrives, so risk assessments can improve over time.
- Regulatory scrutiny: The use of ML can lead to regulatory concerns. Regulators may question the transparency and fairness of ML models, especially if they rely on non-traditional data sources or are difficult to interpret.
- Ethical concerns: Relying on algorithms to make decisions about creditworthiness or investments can raise ethical issues, especially if these models inadvertently discriminate against certain groups of people or if they prioritize short-term gains over long-term stability.
- Data privacy: As ML often requires large amounts of data, this can raise concerns about data privacy, especially if sensitive or personal information is used.
- Skills and infrastructure: Financial institutions need to invest in new skills and infrastructure to develop, maintain, and interpret ML models. This can require significant upfront investment.
- Over-reliance on technology: While ML can be powerful, over-relying on it without human oversight can be risky. ML models, like all models, can be wrong, and human judgment can play a crucial role in mitigating these errors.
In conclusion, while ML offers numerous benefits for financial risk analysis, it’s essential for institutions to consider the broader implications carefully. They should ensure proper oversight, transparency, and ethics while harnessing the power of ML.
In your opinion, what role do you think machine learning will play in the future of financial services?
- Automation of routine tasks: Many routine tasks in financial services, such as data entry, report generation, and basic customer service, are increasingly being automated using ML, leading to operational efficiencies.
- Advanced fraud detection: ML models can identify complex patterns of behavior in transaction data that may indicate fraudulent activity. As fraudsters use more sophisticated techniques, ML can adapt quickly to counter these strategies.
- Personalized banking: ML can analyze customer data to provide highly personalized services, such as bespoke investment advice, tailored product recommendations, and individualized customer experiences.
- Risk management: ML will play a pivotal role in assessing various forms of risk, from credit risk (determining loan eligibility) to market risk (assessing investment portfolios’ vulnerability to market changes).
- Algorithmic trading: ML algorithms can forecast market changes in real time, supporting algorithmic trading strategies that aim to maximize profit and minimize loss.
- Credit assessment: Beyond traditional credit scoring, ML can utilize various unconventional data points, from social media activity to news, earnings calls and guidance, to provide a more holistic assessment of an individual’s or company’s creditworthiness.
- Robo-advisors: While they are already prevalent, ML-powered robo-advisors will become more sophisticated, providing customers with investment advice that rivals or even surpasses human financial advisors in accuracy.
- Regulatory compliance: Regulatory technology (RegTech) will leverage ML to automatically ensure that financial transactions, operations, and processes comply with ever-evolving regulatory standards, reducing the chance of breaches and penalties.
- Sentiment analysis: ML can parse vast amounts of data from news articles, social media, and other platforms to gauge market sentiment, which can then be used to predict market movements.
- Chatbots and virtual assistants: While already in use, the future will see more advanced versions that can understand and process complex customer queries, provide financial advice, and even execute transactions on behalf of the customer.
Challenges and considerations:
- Transparency and interpretability: As financial decisions significantly impact people’s lives, it’s crucial to ensure that ML models’ decisions can be understood and explained.
- Data privacy: With ML requiring vast amounts of data, ensuring the privacy and security of personal information will be paramount.
- Regulation: As ML becomes more integrated into financial services, regulatory bodies will likely impose stricter standards and guidelines.
- Over-reliance: Over-relying on ML without human intervention can be risky, so a balanced approach will be essential.
In conclusion, ML promises to revolutionize the financial services industry, making operations more efficient, personalized, and secure. However, these advancements come with their challenges, which institutions will need to address to ensure trustworthiness and reliability.