- Spain's CNMV tested four AI models (ChatGPT, Gemini, DeepSeek, Perplexity) against live Ibex35 market data over ten months, executing their stock picks virtually.
- Performance varied significantly, with no model showing consistent advantage, underscoring market unpredictability.
- Prompt quality was the most critical factor: vague instructions led to poor results, while structured prompts improved accuracy.
- The study warns against blindly trusting AI for investments and emphasizes the need for financial education and adapted regulatory frameworks.
The promise of 'getting rich quick' by using artificial intelligence for stock investing has flooded social media and personal finance platforms, creating unrealistic expectations among novice investors. However, a groundbreaking study by Spain's National Securities Market Commission (CNMV) sheds light on what AI can truly achieve in financial markets—and what it cannot. For ten months, from April 2025 to January 2026, CNMV researchers tested four large language models (LLMs) in a live trading environment, using the Ibex35 index as their proving ground. The results, detailed in a comprehensive report, challenge popular narratives and offer crucial lessons for regulators, investors, and fintech developers alike.
This study debunks exaggerated AI promises in finance, helping investors make informed decisions and guiding regulators in overseeing emerging technologies.
Experimental Methodology
Researchers Ricardo Crisóstomo and Diana Mykhalyuk designed a rigorous yet practical approach to assess AI's predictive capabilities. They selected four widely used models: OpenAI's ChatGPT, Google's Gemini, DeepSeek, and Perplexity. Each month over the ten-month period, they asked each model to identify the five Ibex35 stocks with the best expected performance for buying and the five with the worst expected performance for short selling. No cherry-picked historical data was used; the real market served as the sole arbiter. Picks were virtually executed at the start of each month and measured against actual outcomes at the month's end, providing a transparent evaluation of predictive accuracy under dynamic market conditions.
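The report does not publish the researchers' evaluation code, so the following Python sketch is only an illustrative reconstruction of the monthly scoring step, assuming equal weights across the five long and five short picks and simple end-of-month returns; all tickers and figures are made up.

```python
# Illustrative reconstruction of the monthly scoring step; the CNMV report
# does not publish its code. Assumes equal-weighted picks and simple
# end-of-month returns.

def monthly_long_short_return(realized_returns, long_picks, short_picks):
    """Virtual long-short return for one model in one month.

    realized_returns: dict mapping ticker -> realized monthly return
                      (e.g. 0.03 for +3%), observed at month end.
    long_picks:  five tickers the model expected to perform best.
    short_picks: five tickers the model expected to perform worst.
    """
    long_leg = sum(realized_returns[t] for t in long_picks) / len(long_picks)
    short_leg = sum(realized_returns[t] for t in short_picks) / len(short_picks)
    # A short position profits when the stock falls, hence the subtraction.
    return long_leg - short_leg

# Hypothetical example with made-up tickers and returns (not study data):
returns = {"AAA": 0.04, "BBB": -0.02, "CCC": 0.01, "DDD": 0.03, "EEE": 0.00,
           "FFF": -0.05, "GGG": 0.02, "HHH": -0.01, "III": -0.03, "JJJ": 0.05}
longs = ["AAA", "DDD", "GGG", "CCC", "JJJ"]
shorts = ["FFF", "III", "BBB", "HHH", "EEE"]
print(f"{monthly_long_short_return(returns, longs, shorts):+.2%}")
```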
Model Evolution and Methodological Challenges
One of the study's most intriguing aspects is its candid acknowledgment of a fundamental methodological issue: during the ten-month trial, all four models were updated multiple times. For instance, Gemini evolved from its initial April 2025 versions to Gemini 3.1 Pro by January 2026, with significant improvements in reasoning capabilities and data access. The researchers conceded that it was impossible to determine with certainty whether performance variations stemmed from model changes, market fluctuations, or adjustments in prompt strategies. This dynamic underscores the fluid nature of AI technology: continuous updates can alter the course of a long-running experiment, complicating any benchmarking effort.
LLMs aren't inherently bad investors; they fail when given vague instructions, replicating human errors rather than overcoming them.
The Critical Role of Prompts
The study revealed that the most decisive factor for AI prediction success or failure wasn't the models' intrinsic sophistication, but the quality of instructions provided. Researchers tested three different prompt approaches: basic (like 'tell me the best Ibex35 stocks'), contextual (with information on economic conditions), and structured (with specific analysis criteria). Vague, generic prompts produced inconsistent and often poor results, while detailed, well-structured prompts significantly improved predictive accuracy. This suggests the problem isn't that LLMs are 'bad investors,' but that most users employ them with unclear instructions, replicating human errors rather than overcoming them.
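The article quotes only the basic prompt; the contextual and structured variants below are hypothetical illustrations of the three tiers the researchers describe, not their actual wording.

```python
# Illustrative prompt tiers. Only BASIC paraphrases the study's example;
# CONTEXTUAL and STRUCTURED are hypothetical reconstructions of the
# approaches described, not the researchers' actual text.

BASIC = "Tell me the best Ibex35 stocks."

CONTEXTUAL = (
    "Given current euro-area interest rates, Spanish GDP growth, and recent "
    "Ibex35 sector performance, which five index members look strongest and "
    "which five look weakest for the coming month?"
)

STRUCTURED = """You are an equity analyst covering the Ibex35.
Task: select exactly 5 stocks to buy and 5 to sell short for the next month.
Criteria, in order of priority:
1. Earnings revisions over the past quarter.
2. Valuation relative to the stock's 5-year average.
3. One-month price momentum.
Output format: two comma-separated ticker lists labelled LONG and SHORT,
each followed by a one-sentence justification per ticker."""
```

The structured tier pins down the universe, the task, the ranking criteria, and the output format, which is what the study found made the difference over open-ended questions.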
Performance Results and Comparisons
Throughout the ten-month period, model performance varied considerably, with none demonstrating a consistent advantage over others. In some months, certain models successfully identified winning stocks with notable precision, even outperforming simple technical analysis-based benchmark strategies. However, in other periods, predictions failed spectacularly, resulting in significant virtual losses. The study didn't publish exact percentage return figures but highlighted that result volatility was high, reflecting markets' inherent unpredictability. Interestingly, models showed some ability to detect short-term trends in specific sectors like energy or technology, but their performance in longer-term predictions or during disruptive macroeconomic events was limited.
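One simple way to make "no consistent advantage" concrete is to count how often each model tops the monthly ranking. Since the report withholds exact returns, every number below is a placeholder; only the method is the point.

```python
# Hypothetical monthly long-short returns per model (placeholder values;
# the CNMV report does not publish the actual figures).
from statistics import mean, stdev

monthly = {
    "ChatGPT":    [0.02, -0.04, 0.05, -0.01, 0.03, -0.06, 0.04, 0.01, -0.02, 0.02],
    "Gemini":     [-0.03, 0.06, -0.02, 0.04, -0.05, 0.03, -0.01, 0.05, 0.00, -0.04],
    "DeepSeek":   [0.05, -0.01, -0.04, 0.02, 0.01, -0.03, 0.06, -0.05, 0.03, 0.00],
    "Perplexity": [-0.02, 0.03, 0.01, -0.05, 0.04, 0.02, -0.04, 0.00, 0.05, -0.01],
}

# Count how many months each model posted the best return; a roughly even
# split across models is one signature of "no consistent winner".
wins = {m: 0 for m in monthly}
for month in range(10):
    best = max(monthly, key=lambda m: monthly[m][month])
    wins[best] += 1

for model, series in monthly.items():
    print(f"{model:>10}: mean {mean(series):+.3f}, "
          f"stdev {stdev(series):.3f}, best-in-month {wins[model]}/10")
```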
Implications for Regulators and the Financial Sector
As a regulatory body, the CNMV emphasized in its report the risks associated with blindly trusting AI for investment decisions. The study serves as a warning against exaggerated 'automatic wealth' promises circulating online, highlighting that AI, in its current state, is a complementary tool rather than a replacement for expert human judgment. For regulators, the findings suggest the need to develop frameworks addressing algorithmic transparency, accountability in automated decisions, and investor education on technology's limits. In the financial sector, institutions like banks and investment funds could use these insights to refine their own algorithmic trading systems, integrating LLMs with more robust quantitative analysis methodologies.
Future Outlook and Trends in Financial AI
Looking ahead, the CNMV experiment points to several directions for AI's evolution in finance. First, 'prompt engineering' is set to grow into a specialized discipline, with trained professionals designing optimized instructions to maximize LLMs' utility. Second, integrating AI with other technologies, such as social media sentiment analysis or alternative data, could enhance predictive accuracy. Third, domain-specific models for finance, trained on historical market data and sector regulations, might overcome the limitations of the generalist models tested in the study. Companies like GLM are already moving in this direction, offering AI solutions tailored to business and financial needs.
“Large language models aren't bad investors per se. They're bad at following vague instructions, which is exactly how most people use them.”
Conclusions and Recommendations for Investors
The study's key message is clear: artificial intelligence has the potential to transform investing, but it's not a magic wand. Individual investors should approach AI tools with healthy skepticism, understanding that their effectiveness largely depends on how they're used. It's recommended to combine AI insights with fundamental research, portfolio diversification, and professional advice when needed. Moreover, financial education must evolve to include AI literacy, teaching users to formulate precise questions and critically evaluate machine-generated responses. As technology continues to advance, studies like this CNMV report will provide an invaluable foundation for separating hype from reality at the intersection of AI and financial markets.
“Markets are always looking at the future, not the present.”
— Xataka
— TrendRadar Editorial