Polish AI on Steroids: How Pseudo-Experts Are Building a Bubble on Empty Promises
As someone who works with artificial intelligence daily, I look at the current crop of AI experts with a mix of amusement and terror. We are witnessing a classic Dunning-Kruger effect, driven by the market's hunger for simple solutions. Access to powerful large language models (LLMs) has given a false sense of competence to people who only yesterday were cryptocurrency gurus or life coaches. The problem is that when the dust settles, we will be left with non-functional systems, data leaks, and leaky infrastructure built on 'magic prompts' rather than real knowledge of systems implementation. This is not just about individual careers: it's a threat to the entire economy, as investments flow to those who shout the loudest, not to those who build solid foundations.
Criticism of the Bielik Model
Take the Bielik model, for example, promoted as a great breakthrough in Polish AI. Created by the SpeakLeash Foundation, which boasts an open project based on volunteering and collaboration with ACK Cyfronet AGH, Bielik was supposed to be the answer to the dominance of English-language giants. The foundation, also known as Spichlerz, collected an impressive set of Polish data—over 1 terabyte of texts—and trained models from 1.5 to 11 billion parameters based on this.
Sounds promising, right? But a closer look reveals several shortcomings. Bielik's creators boast about results on benchmarks they created themselves, such as the Open PL LLM Leaderboard or Polish MT-Bench. The methodology of these tests is questionable at best: they compare their model against outdated versions of Llama while ignoring newer multilingual models that handle Polish much better in general knowledge, understanding of cultural context, and even generating coherent text.
For example, in legal tests, such as a simulated exam for membership in the National Chamber of Appeals, Bielik scores below 10/100, making cardinal errors in interpreting regulations, hallucinating non-existent articles, and constructing inconsistent arguments. Another example: in mathematical tasks the model confuses basic formulas, such as the derivative of a quadratic function, producing results with wrong signs or dropped constants. This is not 'sovereign AI' for Poland; it is a local empty promise that masks a lack of deeper understanding of how to train models on low-quality data while avoiding biases and data contamination. Moreover, the foundation's benchmarks often rely on hand-picked examples that sidestep the difficult topics where Bielik performs poorly, such as Polish history or specialized medical terminology, where it has confused disease symptoms with folk myths.
Let's face the truth: Bielik actually outperforms models such as Mixtral-8x7B (0.63) and Mistral-Nemo-12.2B (0.60) in benchmarks. But when we overlay these results with a timeline, the 'success' turns into evidence of technological stagnation.
Bielik-11B-v2.3-Instruct debuted in May 2025. Its 'great achievement' is surpassing Mixtral, which premiered in December 2023, and Mistral-Nemo from July 2024. We are talking about a model that took almost a year and a half to catch up with the architecture from the end of 2023 and almost a year to slightly (by 0.06 points) surpass the French Mistral. In an industry where progress is measured in weeks, chasing 'yesterday's technology' is not innovation—it's archaeology. Even the victory over the European EuroLLM-9B (0.48) from December 2024 is not impressive when we consider that Bielik is larger and trained half a year later!
This puts things in brutal perspective: the 'national AI model' promoted in the media is, in reality, a technological latecomer. While Bielik barely surpasses year-old models, the global leaders are already light-years ahead. Look at the top of the table: Gemma-2-27b-Instruct (0.71) and Meta-Llama-3.1-70B (0.70) outclass the 'Polish' product. A five-percentage-point gap in the average is, in practice, a chasm in reasoning and ease of use.
The criticism is not that the model doesn't work—it works fine by 2024 standards. The criticism is about the false narrative of a 'breakthrough.' A product that was already outdated compared to global leaders on its release day is being promoted. It's like celebrating the production of a Polish smartphone in 2026 that is slightly faster than the iPhone 13.
In knowledge tests (MMLU), a score of 0.63 is decent, but does it justify building an 'AI gigafactory' and investing hundreds of millions of zlotys to reinvent the wheel? A business user is not guided by national sentiment but by efficiency—and here the choice is simple: either a free, powerful Llama or our 'national average,' which was already behind at its debut.
Investments in Bielik and Implementation in InPost
I mentioned the investment of hundreds of millions of zlotys. Rafał Brzoska, the founder of InPost, not only took the position of chairman of the Bielik.AI Business Council but also invested significant funds in it: we are talking about a potential 100 million euros for an 'AI gigafactory.' In December 2025, InPost rolled out Bielik on a trial basis in its mobile application, which serves over 15 million users, as part of the 'Feed Bielik' campaign. Users were to 'feed' the model with data by asking questions and correcting its answers, which sounds like a stroke of crowdsourcing genius.
In reality, however, problems surfaced quickly: the model generated inconsistent, incorrect answers, confused historical facts in the Polish context (for example, mixing up the dates of the Warsaw Uprising with other events), and users complained about slow performance and a lack of precision. Comments on forums and social media pointed out that Bielik performed poorly compared to free tools like ChatGPT, which handled Polish better despite having no dedicated training. One user described how the model provided a recipe for bigos with pineapple, claiming it was a 'traditional Polish variation,' which sparked a wave of memes and criticism. Another example: in logistics queries, Bielik suggested delivery routes that ignored real roads in Poland, which could cause real problems inside the InPost application.
This is a classic example of how hype drives investment while ignoring real usability testing. Instead of building solid infrastructure through gradual validation of a solution, what gets promoted is something that in practice serves marketing more than problem-solving. Brzoska, known for his successes in e-commerce, seems here to be a victim of his own enthusiasm, investing in a project without independent audits, which only inflates the bubble around Polish AI.
Pseudo-Experts in Prompting
This problem of pseudo-expert culture spreads wider to Polish 'AI experts.' Take the prompting specialists—those who consider themselves masters because they can formulate a query to ChatGPT to spit out ready-made text or code. Poland is teeming with 'prompt engineering' courses where people learn how to 'charm' AI as if it were magic, not statistics and algorithms. These experts often don't understand what's under the hood: they don't know about overfitting, biases in training data, or context window limitations.
For example, a popular trend is using prompts to generate marketing content. It sounds simple, but it ends in plagiarism, hallucinated facts, or even copyright violations, because the model 'borrows' from the internet without attribution. I've seen cases where companies, such as advertising agencies in Warsaw, ran AI-based campaigns and later had to withdraw the materials because they contained fragments copied from competitors. Another example: in education, prompts used to create school tests led to factual errors, such as confused historical dates or chemical formulas, which teachers reported on industry forums. The result? Companies deploy AI without security audits, leading to data leaks: prompts contained sensitive customer information, and because no anonymization layer was in place, the model memorized it and reproduced it in other answers.
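The missing 'anonymization layer' is not exotic technology: even a thin pre-processing step that strips obvious PII before a prompt leaves the company would have blunted that kind of leak. A minimal sketch, assuming simple regex-based detection (the patterns and placeholder labels below are illustrative assumptions, not any production system):

```python
import re

# Illustrative patterns for common Polish PII; a real deployment would need a
# far more thorough detector (e.g. NER-based). Patterns are checked in order,
# so the specific 11-digit PESEL rule runs before the generic phone rule.
PII_PATTERNS = {
    "PESEL": re.compile(r"\b\d{11}\b"),            # Polish national ID number
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),     # loose phone-number match
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def anonymize(prompt: str) -> str:
    """Replace detected PII with placeholders before the prompt leaves the company."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

print(anonymize("Klient jan.kowalski@example.com, tel. +48 601 234 567"))
# Klient [EMAIL], tel. [PHONE]
```

Regex matching alone will miss names and addresses, so this is only a first line of defense; the point is that even this much was evidently absent in the cases described above.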
Misuse of Coding Tools
It's even worse with the misuse of tools like Cursor, Claude Code, or ChatGPT for coding. In the Polish developer community, especially among juniors and 'AI evangelists,' there is a trend of copying code without understanding it. Cursor, which auto-completes lines based on context, is great for prototypes, but when someone uses it to build production systems while ignoring security holes like SQL injection or missing input validation, disaster is imminent. I've seen cases where 'experts' generated entire applications with Claude Code, boasting on LinkedIn about 'fast development,' while the code was full of errors: unoptimized, with duplicated functions, or even insecure suggestions when the prompt was poorly formulated.
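To make the SQL injection point concrete, here is the classic pattern that assistants happily auto-complete, next to the parameterized version any code review should insist on (a toy sqlite3 example, not code from any of the projects mentioned):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name: str):
    # Vulnerable: user input is concatenated straight into the SQL string.
    # Input like "' OR '1'='1" turns the WHERE clause into a tautology.
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver passes the value separately,
    # so the injection payload is treated as a literal string.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2 -- injection dumps the whole table
print(len(find_user_safe(payload)))    # 0 -- payload matches no real name
```

The fix is one character of API discipline, which is exactly why shipping the generated version unreviewed is inexcusable.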
For example, in one Polish AI startup, AI generated code for payment processing that didn't handle network errors, leading to lost transactions and customer complaints. Another example: in hackathons, teams using ChatGPT to write Python scripts ended up with code that only worked on their machines because it ignored system dependencies like library versions. This is not expertise—it's laziness masked by an illusion of competence. The Dunning-Kruger effect in its purest form: the less you know, the more confident you are that AI will do the rest. And when the system crashes under load or a hacker exploits a hole, the 'AI error' is to blame, not the lack of human knowledge.
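The payment-processing failure described above usually comes down to one missing pattern: retrying transient network errors with backoff and surfacing permanent failures instead of silently dropping the transaction. A sketch under assumed names (`charge_gateway` is a hypothetical stand-in for a real payment API; a production integration would also pass an idempotency key so retries cannot double-charge):

```python
import random
import time

class NetworkError(Exception):
    pass

def charge_gateway(order_id: str) -> str:
    """Stand-in for a payment-gateway call; fails randomly to
    simulate the flaky network the generated code ignored."""
    if random.random() < 0.5:
        raise NetworkError("connection reset")
    return f"tx-{order_id}"

def charge_with_retry(order_id: str, attempts: int = 5,
                      base_delay: float = 0.01) -> str:
    """Retry transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return charge_gateway(order_id)
        except NetworkError:
            if attempt == attempts - 1:
                raise  # surface the failure so the order isn't lost silently
            time.sleep(base_delay * 2 ** attempt)

random.seed(0)
print(charge_with_retry("1234"))
```

A dozen lines of error handling is the difference between a lost transaction and a completed one; nothing here requires AI, only a developer who understands what the generated code omits.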
The Case of Remigiusz Kinias
A good example of such a pseudo-expert is Remigiusz Kinias, known on Polish Twitter (now X) as @remekkinias. His activity consists mainly of posting about AI, often sponsored by HP: he advertises laptops and receives free machines for model inference, such as GPU servers. It sounds like influencer marketing, but where's the deeper analysis? Kinias rarely shares concrete technical insights: he has no publications on model architectures, doesn't analyze source code, doesn't critique implementation errors.
Instead, his feed is a mix of hype around AI novelties and product promotions. For example, in a series of posts from 2025, he praised 'revolutionary' AI GPUs but didn't explain how they affect model training or energy consumption—just a link to the sponsor's product. Another post: he recommended prompting courses without mentioning the risks, like model jailbreaking. This is a classic case of how the industry rewards loudness over substance—you get sponsorship because you have followers, but you don't bring real value to the discussion about AI in Poland.
The Case of Aleksandra Przegalińska
Aleksandra Przegalińska, often cited as an authority in the field of AI and ethics, doesn't fare much better. As a philosopher and futurologist, Przegalińska writes books like 'Strategizing AI in Business and Education' and 'Converging Minds,' where she discusses the impact of AI on society. But when we delve into her statements, a lack of technical depth emerges. For example, in interviews, she claims that AI is a 'marketing trick' because models are just pattern-matchers without real understanding—which is true but an oversimplification that ignores advances in multimodal models.
She criticizes the hype around AGI but simultaneously promotes AI as a 'great equalizer' in education, without specific proposals to solve problems like racial biases in data or the risk of disinformation. In one article, she doubts whether generative AI will lead to true AGI but doesn't provide substantive arguments based on metrics like scaling laws or emergent abilities. Moreover, Przegalińska often takes positions focusing solely on LLMs because they are currently the most popular—ignoring broader fields like computer vision, reinforcement learning, or edge AI. Last year, in 2025, she publicly stated that 'the year 2025 is the year of AI agents,' which turned out to be complete nonsense: AI agents, like those based on LangChain or Auto-GPT, did not become mainstream because they still struggle with stability, security, and efficiency issues.
Instead of a revolution, we had a series of failed implementations where agents 'escaped' beyond control or simply failed at simple tasks. This is an example of how pseudo-experts predict trends without a basis, relying on hype rather than data.
The Paradox of AI Ethics in Poland
This paradox deepens when these same experts call for a focus on AI ethics—'ethical AI' and 'AI governance.' It sounds noble, but how can we discuss ethics when Poland has no real achievements in the field of AI? We don't have our own competitive models on a global scale, there's a lack of investment in basic research, and our universities produce more AI marketing graduates than ML engineers.
Przegalińska and her ilk talk about 'governance' at conferences, citing EU regulations like the AI Act, but they ignore the fact that without a strong technological base, these discussions are empty words. For example, in discussion panels in 2025, experts debated biases in AI, but none provided a specific case from the Polish context—such as biases in Bielik's data, where the model favors content from large cities, ignoring regional dialects.
It's an interesting paradox: we focus on ethics before building something worth regulating. The result? Resources go to conferences and reports, not to laboratories or grants for young scientists. In Poland, where GDP spending on AI is a fraction of that in the USA or China, this is a recipe for stagnation—ethics without technology is philosophy, not progress.
Systemic Crisis in Polish AI
This problem is not just anecdotal—it's a systemic crisis. The Polish AI market rewards those who shout loudly about 'revolution' but marginalizes security specialists, such as those dealing with adversarial attacks or privacy-preserving ML.
The Dunning-Kruger effect thrives because low competence goes hand in hand with excessive confidence: someone who has used ChatGPT once feels like an expert, ignoring that these models are vulnerable to jailbreaking or poisoning. Examples? In 2025, several Polish startups fell victim to attacks because they implemented AI without audits—one of them, based on Bielik, exposed user data through poorly secured prompts.
Another example: a company in the HR sector used AI to analyze CVs, but the model discriminated against candidates based on their names, which came to light after complaints and lawsuits. This is not a coincidence: when pseudo-experts dominate, we risk a speculative bubble similar to the crypto hype, where promises exceed reality.
Conclusion and Proposed Solutions
To draw attention to this problem, we need to promote education: courses not only on prompting but also on the fundamentals of ML, data ethics, and model validation. Companies like InPost should invest not only in hype but also in rigorous testing, for example comparing Bielik against global benchmarks like MixEval or the Berkeley Function-Calling Leaderboard, where its performance is middling at best.
Experts like Kinias or Przegalińska could contribute more by collaborating with engineers instead of single-handedly promoting narratives with no technical depth. After all, AI is a tool, not magic, and ignorance rewarded with loudness is a recipe for disaster. If we don't change course, Polish AI will remain behind, built on the sand of illusory competence, while China, the USA, and others such as France race ahead with real innovations.
ABOUT THE AUTHOR

Piotr Bednarski
Editor-in-Chief
Professionally, he works in R&D with a focus on artificial intelligence and systems security. His AI analyses were recognized by Dr. Andriy Burkov, author of global AI/ML bestsellers. Through Bug Bounty programs, he disclosed critical security vulnerabilities in Intel and AMD systems. He is cited by Zaufana Trzecia Strona and international industry media. He completed the Hebrew University of Jerusalem program in computer architecture and operating systems design and has participated in numerous hackathons. As editor-in-chief of Agitka, he translates technical jargon into public debate, analyzing how digital capital shapes contemporary society.