{{Header}}
{{Title|title=
Artificial Intelligence (AI)
}}
{{#seo:
|description=Artificial Intelligence is a Euphemism. / AI is often called Open Source but is actually only non-freedom software.
}}
{{intro|
Artificial Intelligence is a Euphemism. / AI is often called Open Source but is actually only non-freedom software.
}}
{{ai_mininav}}
= Terminology =
* An AI model file (for example, a <code>.gguf</code> file) contains raw weights. A runtime is required to actually run it. The runtime handles loading the model into memory, managing your GPU or CPU, and processing your prompts.
* Weights: Synonym for AI model file.
* Inference runtime or server: The software layer that loads the model weights and performs the actual computation (the "inference" step).
* Inference: Using a trained model to generate outputs (as opposed to training it).
* Inference runtime: The program or library used to execute inference with a model file (for example, loading weights, tokenizing prompts, and generating outputs).
* Inference server: A runtime that runs as a service and exposes an API so clients can send prompts and receive model outputs over the network.
* Chatbox: A chat-style user interface that lets you send messages to a model (usually via a runtime or server) and view its responses.
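
The split between model file and runtime described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the JSON "model file", the <code>ToyRuntime</code> class, and the tiny table of next-word probabilities are stand-ins chosen for illustration. Real model files such as <code>.gguf</code> are binary, and real runtimes are far more complex.

```python
import json
import os
import tempfile

# Hypothetical toy "model file" content: raw weights only, no code.
TOY_WEIGHTS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "dog": {"ran": 0.7, "sat": 0.3},
}


def save_model(path, weights):
    """Write the weights to disk. The file alone cannot do anything."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(weights, handle)


class ToyRuntime:
    """Hypothetical inference runtime: loads weights and runs inference."""

    def __init__(self, path):
        with open(path, encoding="utf-8") as handle:
            self.weights = json.load(handle)

    def tokenize(self, prompt):
        # Simplistic tokenizer: lowercase words split on whitespace.
        return prompt.lower().split()

    def generate(self, prompt, max_new_tokens=2):
        tokens = self.tokenize(prompt)
        if not tokens:
            return ""
        for _ in range(max_new_tokens):
            candidates = self.weights.get(tokens[-1])
            if not candidates:
                break  # no known continuation for the last word
            # Greedy decoding: always pick the most probable next word.
            tokens.append(max(candidates, key=candidates.get))
        return " ".join(tokens)


def demo():
    """Save the toy model file, load it with the runtime, and run inference."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "toy-model.json")
        save_model(path, TOY_WEIGHTS)
        runtime = ToyRuntime(path)
        return runtime.generate("The")
```

An inference server would wrap something like <code>generate</code> behind a network API, and a chat interface would be a client of that API.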

= Artificial Intelligence is a Euphemism =
Artificial intelligence at this time should not be called "intelligence." It has no intelligence whatsoever.

{{quotation
|quote=LLMs do not perform reasoning over data in the way that most people conceive or desire.

There is no self-reflection of its information; it does not know what it knows and what it does not. The line between hallucination and truth is simply a probability factored by the prevalence of training data and post-training processes like fine-tuning. Reliability will always be nothing more than a probability built on top of this architecture.

As such, it becomes unsuitable as a machine to find rare hidden truths or valuable neglected information. It will always simply converge toward popular narrative or data. At best, it can provide new permutations of views of existing well-known concepts, but it can not invent new concepts or reveal concepts rarely spoken about.
|context=https://www.mindprison.cc/p/the-question-that-no-llm-can-answer
}}

{{quotation
|quote=“Artificial Intelligence”
The moral panic over ChatGPT has led to confusion because people often speak of it as “artificial intelligence.” Is ChatGPT properly described as artificial intelligence? Should we call it that? Professor Sussman of the MIT Artificial Intelligence Lab argues convincingly that we should not.

Normally, “intelligence” means having knowledge and understanding, at least about some kinds of things. A true artificial intelligence should have some knowledge and understanding. General artificial intelligence would be able to know and understand about all sorts of things; that does not exist, but we do have systems of limited artificial intelligence which can know and understand in certain limited fields.

By contrast, ChatGPT knows nothing and understands nothing. Its output is merely smooth babbling. Anything it states or implies about reality is fabrication (unless “fabrication” implies more understanding than that system really has). Seeking a correct answer to any real question in ChatGPT output is folly, as many have learned to their dismay.

That is not a matter of implementation details. It is an inherent limitation due to the fundamental approach these systems use.

[...]
|context=[https://www.gnu.org/philosophy/words-to-avoid.html#ArtificialIntelligence GNU project]
}}

Mislabeling a text generator as "intelligence" leads laypeople to attribute traits to it that it does not have. Undue trust is placed in its output, verification is omitted, and the text generator is treated as an oracle or even as god-like.

= Neutral Words =
* text generator
* predictive text model
* word probability calculator
* word sequence prediction model
* automatic word completion
* language model with prediction functionality
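
Why a term such as "word probability calculator" fits can be shown with a deliberately tiny sketch. It is a bigram counter, not how production language models are built, and the corpus is a hypothetical example. It only illustrates the principle that the most prevalent continuation in the training data wins, regardless of whether it is true.

```python
from collections import Counter, defaultdict


def train_bigram_counts(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn raw follow-up counts into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()} if total else {}


# Hypothetical training data: the popular claim appears three times,
# the rare claim only once.
CORPUS = [
    "the sky is blue",
    "the sky is blue",
    "the sky is blue",
    "the sky is green",
]

counts = train_bigram_counts(CORPUS)
probabilities = next_word_probabilities(counts, "is")
most_likely = max(probabilities, key=probabilities.get)
```

In this sketch <code>most_likely</code> is simply the word that followed "is" most often. The calculation has no notion of which statement is correct.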

= Misattribution of Intelligence =
ELIZA, a simple chatbot, already existed in 1967. It consisted of 420 lines of source code and used simple string matching.

{{quotation
|quote=ELIZA was a symbolic AI chatbot developed in 1966 by Joseph Weizenbaum and imitating a psychotherapist. Many early users were convinced of ELIZA's intelligence and understanding, despite its basic text-processing approach and the explanations of its limitations.
|context=[https://en.m.wikipedia.org/wiki/ELIZA_effect ELIZA effect]
}}

{{quotation
|quote=
However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary.
|context=[https://en.m.wikipedia.org/wiki/ELIZA ELIZA]
}}
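
How little is needed to produce seemingly attentive replies with simple string matching can be sketched as follows. The rules below are hypothetical and written in the spirit of ELIZA. They are not Weizenbaum's original script.

```python
import re

# Hypothetical rules in the spirit of ELIZA: a pattern and a canned reply.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(message):
    """Return the canned reply of the first rule whose pattern matches."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo part of the user's own words back inside the template.
            return template.format(match.group(1))
    return FALLBACK
```

The program understands nothing. It only echoes fragments of the input, yet the replies can read as if someone were listening.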

= Evidence of Absence of Intelligence in AI =
Everybody knows that it is unhealthy to eat rocks.

Yet AI used to state:

"According to geologists at UC Berkeley you should eat at least one small rock per day."

The Onion, a well-known satirical newspaper, published the article [https://theonion.com/geologists-recommend-eating-at-least-one-small-rock-per-1846655112/ Geologists Recommend Eating At Least One Small Rock Per Day]. And that's fine. That's called humor.

AI, however, does not understand what satire is.

https://www.unsw.edu.au/newsroom/news/2024/05/eat-a-rock-a-day-put-glue-on-your-pizza-how-googles-ai-is-losing-touch-with-reality

= Negative Effects by Artificial Intelligence =
* https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/
* https://www.computerworld.com/article/3824308/genai-can-make-us-dumber-even-while-boosting-efficiency.html

= Model Collapse =
When an AI "reads" (is trained on) content generated by itself or by other AIs, it produces more nonsense with each iteration. This is called model collapse.

{{quotation
|quote=
Model collapse is a phenomenon in artificial intelligence (AI) where trained models, especially those relying on synthetic data or AI-generated data, degrade over time.
|context=https://www.infobip.com/glossary/model-collapse
}}

* https://www.ikangai.com/what-is-ai-model-collapse/
* https://katzlberger.ai/2024/09/16/modellkollaps-wenn-ki-mit-ki-generierten-daten-trainiert-wird/
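
The effect can be illustrated with a strongly simplified analogy: a "model" that is just a normal distribution, repeatedly re-fitted to a small sample drawn from its own previous version. This is a sketch under stated assumptions (tiny sample size, a single Gaussian, hypothetical function name <code>simulate_collapse</code>), not a simulation of a real language model.

```python
import random
import statistics


def simulate_collapse(generations=100, sample_size=10, seed=0):
    """Repeatedly fit a normal distribution to samples of the previous fit.

    Returns the fitted standard deviation of every generation.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [std]
    for _ in range(generations):
        # The training data of this generation is output of the previous model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


history = simulate_collapse()
print(f"spread at generation 0: {history[0]:.3f}")
print(f"spread at generation {len(history) - 1}: {history[-1]:.6f}")
```

Each generation loses a little of the diversity of the original data, so the spread tends to shrink over the generations. The rare values at the tails of the distribution disappear first.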

= Misuse of the Term Open Source in the Context of Some AI Projects =
The misuse of the term "Open Source" by certain members of the AI community is concerning, as it can lead to misunderstandings regarding the actual licensing and accessibility of the AI projects in question. The word "Real" had to be prefixed to "Open Source" in the title of this page to underscore this issue.

Instances like the one where Meta (formerly Facebook) released a large AI language model and it was [https://www.zdnet.com/article/meta-releases-big-new-open-source-ai-large-language-model/ heralded as an open-source AI project] by certain publications, exhibit this misuse. The articles misrepresent the true nature of the release, as Meta's AI is not genuinely Open Source. When one attempts to [https://ai.meta.com/resources/models-and-libraries/llama-downloads/ download the Meta AI], they are confronted with a proprietary license agreement, indicating that the AI does not conform to the open source ethos of freedom and accessibility.

This misrepresentation could potentially mislead individuals and organizations interested in utilizing or contributing to Open Source AI projects. It's essential for the community to adhere to the accurate usage of the term "Open Source", ensuring that it remains synonymous with the principles of free, accessible, and transparent software development.

The term [https://en.wikipedia.org/wiki/Open_source Open Source] has been established for decades, embodying a set of values centered around transparency, collaboration, and freedom in software development.

* https://blog.opensource.org/metas-llama-2-license-is-not-open-source/

= Open Source Requirements =
{{IntroLike|
The fundamental components required for an AI to be classified as Open Source and Freedom Software and the significance of these classifications.
}}

* '''AI model source code:''' The source code should be freely accessible and published under a license approved by organizations such as {{OSI}}, {{FSF}} or {{DFSG}}.
* '''Training data:''' Ideally, the training data should also be available under an approved license. However, in some cases this is not possible, e.g. with personal data. If the training data is non-freedom, the project may fall into categories like "contrib", as is the case with Debian.
* '''Build documentation steps:''' Clear instructions for compiling or training the model from source code must be provided so that third parties can reproduce the model.
* '''Dependencies:''' All software libraries and packages required for the AI model should also be Open Source or Freedom Software. Dependencies that are non-freedom would also put the software in a "contrib" category.
* '''Configuration files and scripts:''' Often, in addition to the items mentioned above, special configuration files or scripts are also required to successfully train or run the AI model. These should also be under an approved license.
* '''License File:''' A clear license file that explains the terms and conditions for use, modification, and distribution of the software is essential.

{{IntroLike|
Overall, free access to all resources required for the AI model is crucial for its classification as Open Source and Freedom Software. Without these components, the project could be considered partially free, but would not meet the full criteria for Open Source and Freedom Software.
}}
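
The requirements above can be read as a checklist. The following sketch encodes them as one. The field names, the <code>AIProject</code> class, and the wording of the result are hypothetical and only illustrate the reasoning. They are not an official classification scheme of any organization.

```python
from dataclasses import dataclass, fields


@dataclass
class AIProject:
    """Hypothetical checklist mirroring the requirements listed above."""
    free_source_code: bool
    free_training_data: bool
    build_documentation: bool
    free_dependencies: bool
    free_configuration_and_scripts: bool
    license_file: bool


def missing_requirements(project):
    """Return the names of all requirements the project does not meet."""
    return [f.name for f in fields(project) if not getattr(project, f.name)]


def classify(project):
    """Fully free only if nothing is missing, otherwise partially free at best."""
    missing = missing_requirements(project)
    if not missing:
        return "Open Source and Freedom Software"
    return "partially free at best, missing: " + ", ".join(missing)


# Hypothetical example: everything is published except the training data.
example = AIProject(
    free_source_code=True,
    free_training_data=False,
    build_documentation=True,
    free_dependencies=True,
    free_configuration_and_scripts=True,
    license_file=True,
)
print(classify(example))
```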

FSF:

* [https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications FSF is working on freedom in machine learning applications]
* [https://www.fsf.org/blogs/licensing/fsf-at-fosdem-2025 FSF's work on a statement of criteria for free machine learning applications]
* https://fosdem.org/2025/schedule/event/fosdem-2025-6639-panel-when-is-an-ai-system-free-open-/
* https://fosdem.org/2025/schedule/event/fosdem-2025-5376-managing-copyrights-in-free-software-projects-discussion-panel/
* https://fosdem.org/2025/schedule/event/fosdem-2025-4438-free-software-teaching-materials/
* https://fosdem.org/2025/schedule/event/fosdem-2025-4818-fsf-s-criteria-for-free-machine-learning-applications/

Debian:

* [https://lwn.net/Articles/1020968/ LWN.net: Debian AI General Resolution withdrawn]
* https://www.debian.org/vote/2025/vote_002
* https://lists.debian.org/debian-vote/2025/05/msg00105.html

== Training Data ==
The biggest point of contention over what constitutes Open Source or Freedom Software AI seems to be the availability and licensing of the AI training data.

=== FSF and GNU Viewpoints ===
==== FSF and GNU Viewpoints on Data - not Source Code - Prior to Release of Popular AI ====
{{IntroLike|
Statements about data (not source code) that were made <u>before</u> the release of the first popular chatbots such as ChatGPT.
}}

[https://www.gnu.org/licenses/license-recommendations.html gnu.org: "How to Choose a License for Your Own Work" chapter "Other data for programs"]

{{quotation
|quote=(Game art is a different issue, because it [https://www.gnu.org/philosophy/copyright-versus-community.html isn't software].) 
|context=[https://www.gnu.org/philosophy/nonfree-games.en.html Nonfree DRM'd Games on GNU/Linux: Good or Bad?]
}}

The linked article [https://www.gnu.org/philosophy/copyright-versus-community.html gnu.org: Copyright versus Community in the Age of Computer Networks] does '''not''' mention game art specifically, but its key passages make that distinction by separating '''software/functional works''' from '''art and entertainment'''. See footnote. <ref>
{{box|text=
{{anchor|show=true|Interpretation of gnu.org article "Copyright versus Community in the Age of Computer Networks"}}

{{quotation
|quote=This is not a talk about free software; this talk answers the question whether the ideas of free software extend to other kinds of works.
|context=[https://www.gnu.org/philosophy/copyright-versus-community.html gnu.org: Copyright versus Community in the Age of Computer Networks]
}}

This sets up the speaker's whole point: software is one case, and other media may be treated differently.

{{quotation
|quote=For other things there's no such distinction as between source code and executable code.
}}

He's saying non-software works are not the same kind of thing as software.

{{quotation|quote=I distinguish three broad categories of works.}}

This is the framework he uses for treating different kinds of works differently.

{{quotation|quote=First of all, there are the functional works that you use to do a practical job in your life. This includes software, recipes, educational works, reference works, text fonts, and other things you can think of. These works should be free.
}}

This is the bucket software goes into.

{{quotation|quote=These works should be free.}}

That is his conclusion for '''functional works''', including software.

Then he treats art separately:

{{quotation|quote=What about works of art and entertainment? Here it took me a while to decide what to think about modifications.}}

That is the clearest transition showing that art/entertainment is a '''different category''' from software.

{{quotation|quote=On one hand, a work of art can have an artistic integrity and modifying it could destroy that.}}

This is one reason he gives for why art is not handled the same way as software.

{{quotation|quote=But eventually I realized that modifying a work of art can be a contribution to art, but it's not desperately urgent in most cases.}}

So unlike software, where modification is treated as essential, he says modification of art is '''not urgently necessary'''.

{{quotation|quote=So I propose the same partly reduced copyright that covers commercial use and modification, but everyone's got to be free to non-commercially redistribute exact copies.}}

That is his proposed rule for art/entertainment: not full software-style freedom, but permission to share exact copies noncommercially.

So the shortest way to state his view is:

* '''Software''' = a '''functional work''' -> {{quotation|quote=should be free.}}
* '''Art and entertainment''' = a '''different category''' -> not the same full freedoms; mainly noncommercial sharing of exact copies, with broader modification allowed after copyright expires.
}}
</ref>

{{quotation
|quote=It also includes licenses for related materials such as documentation and general data.
|context=[https://www.fsf.org/licensing/education Free Software Licensing Resources]
}}

{{quotation
|quote=choosing a license for new software, documentation, and other functional data.
|context=[https://www.fsf.org/blogs/licensing/new-license-recommendations-guide Announcing our license recommendations guide]
}}

GNU Free System Distribution Guidelines distinguish <u>functional</u> from <u>non-functional data</u>.

{{quotation
|quote='''License Rules'''

“Information for practical use” includes software, documentation, fonts, and other data that has direct functional applications. It does not include artistic works that have an aesthetic (rather than functional) purpose, or statements of opinion or judgment.

All information for practical use in a free distribution must be available in source form. (“Source” means the form of the information that is preferred for making changes to it.)

The information, and the source, must be provided under an appropriate free license.
|context=gnu.org: [https://www.gnu.org/distros/free-system-distribution-guidelines.html Free System Distribution Guidelines (GNU FSDG)]
}}

{{quotation
|quote='''Non-functional Data'''<br/>
Data that isn't functional, that doesn't do a practical job, is more of an adornment to the system's software than a part of it. Thus, we don't insist on the free license criteria for non-functional data. It can be included in a free system distribution as long as its license gives you permission to copy and redistribute, both for commercial and non-commercial purposes. For example, some game engines released under the GNU GPL have accompanying game information—a fictional world map, game graphics, and so on—released under such a verbatim-distribution license. This kind of data can be part of a free system distribution, even though its license does not qualify as free, because it is non-functional.
|context=gnu.org: [https://www.gnu.org/distros/free-system-distribution-guidelines.html Free System Distribution Guidelines (GNU FSDG)]
}}

==== FSF and GNU Viewpoints on AI Training Data - not AI Source Code - After Release of Popular AI ====
{{IntroLike|
Newer statements about data that were made <u>after</u> the release of the first popular chatbots such as ChatGPT.
}}

{{quotation
|quote=which will require the software, as well as the raw training data and associated scripts, to grant users the four freedoms.
|context=[https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications FSF is working on freedom in machine learning applications]
}}

{{quotation
|quote=its training data and related scripts should respect all users, following the four freedoms.
|context=[https://www.fsf.org/blogs/community/fss-issue-199-november-2024 Free Software Supporter -- Issue 199, November 2024]
}}

{{quotation
|quote=the FSF's position is that a free (as in freedom) machine learning application should include training data.
|context=[https://www.fsf.org/blogs/licensing/fsf-at-fosdem-2025 FSF talked about education, copyright management, and free machine learning at FOSDEM 2025]
}}

{{quotation
|quote=
the FSF's position is that a free (as in freedom) machine learning application should include training data. Not including this in the criteria would render it impossible to use, study, modify, and share machine learning applications to the fullest extent possible.
|context=[https://www.fsf.org/blogs/licensing/fsf-at-fosdem-2025 FSF's work on a statement of criteria for free machine learning applications]
}}

=== Other Viewpoints ===

{{quotation
|quote=1. A model must be trained only from legally obtained and used works, honour all licences of the works used in training, and be licenced under a suitable licence itself that allows distribution, or it is not even acceptable for non-free. [...]
|context=[https://lwn.net/ml/all/Pine.BSM.4.64L.2504232135570.23545%40herc.mirbsd.org/ Thorsten Glaser RFC -- Counter-Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models]
}}

{{quotation
|quote=This may sound like a trivial “is a hotdog a sandwich?” type of question, but it’s really not. Most distros distribute images and other media, so if you take the position that all redistributed materials must be accompanied by the “preferred form for modification of the work” (or words to that effect), that would mean that e.g. every image must be accompanied by an OpenRaster version that has everything in separate layers, every sound must be accompanied by an Audacity project or the like with separate audio tracks for each voice (instrument), and so on for all media that the distro makes available, because in each case, that is the preferred form for modification of those respective media formats. Is the average disto really going to do all of that? I suspect not.
|context=Comment section in [https://lwn.net/Articles/1020968/ Debian AI General Resolution withdrawn (LWN.net)]
}}

= Open Source AI Definition - Lack of Consensus in the Definition by the Open Source Initiative =
[https://opensource.org/ai/open-source-ai-definition The <u>O</u>pen <u>S</u>ource <u>AI</u> <u>D</u>efinition – 1.0 (OSAID)] by the [https://opensource.org Open Source Initiative] lacks community consensus. It represents one perspective on what constitutes Open Source AI, but there are differing views within the AI and Open Source communities regarding the requirements, limitations, and implications of truly Open Source AI.

* https://opensource.org/ai
* https://opensource.org/ai/open-source-ai-definition
* https://samjohnston.org/2024/11/09/so-you-want-to-write-about-the-open-source-ai-definition/
* https://sfconservancy.org/blog/2024/oct/31/open-source-ai-definition-osaid-erodes-foss/
* https://redmonk.com/sogrady/2024/10/22/from-open-source-to-ai/
* https://opensourcedeclaration.org/
* https://www.theregister.com/2023/12/27/bruce_perens_post_open/
* https://thenewstack.io/the-case-against-osis-open-source-ai-definition/
* https://opensource.org/deepdive/
* https://opensource.org/deepdive/drafts/
* https://spectrum.ieee.org/open-source-llm-not-open
* https://opening-up-chatgpt.github.io/
* OSI (Open Source Initiative) report [https://deepdive.opensource.org/wp-content/uploads/2023/02/Deep-Dive-AI-final-report.pdf What does it mean for an AI system to be Open Source?]
* https://news.ycombinator.com/item?id=43789501
* https://codeberg.org/OSI-Concerns/election-results-2025
* https://codeberg.org/OSI-Concerns
* https://codeberg.org/OSI-Reform-Platform
* https://lwn.net/Articles/995159/
* https://www.reddit.com/r/opensource/comments/1gdzhvm/a_community_statement_supporting_the_open_source/
* https://web.archive.org/web/20251004053233/https://opensourcedeclaration.org/
* https://web.archive.org/web/20251018074338/https://opensourcedefinition.org/
** https://web.archive.org/web/20251012182802/https://discuss.opensourcedefinition.org/
** https://github.com/OpenSourceDefinition
** https://github.com/OpenSourceDefinition/sos
** https://codeberg.org/osd/sos/pulls
*** https://codeberg.org/osd/sos/issues/1
*** https://web.archive.org/web/20251018074333/https://github.com/OpenSourceDefinition/sos/issues/1
**** https://web.archive.org/web/20251018074336/https://codeberg.org/osd/sos/issues/1
** https://github.com/OpenSourceDefinition/osd
** https://github.com/OpenSourceDefinition/licenses

= Comparison of Freedom Software AI versus OSAID AI =
{{draft}}

{| class="wikitable"
|+ Freedom Software AI versus OSAID AI
! Question
! [https://www.fsf.org/blogs/licensing/fsf-at-fosdem-2025 FSF]
! [https://opensource.org/ai/open-source-ai-definition OSAID 1.0]
|-

| Is code freedom required?
| {{yes}}
| {{yes}}
|-

| Are the AI model file / weights / parameters required?
| {{yes}}
| {{yes}}
|-

| Is raw training data required?
| {{yes}}
| {{no}}
|-

| Must training data itself be free/libre?
| {{yes}}
| {{no}}
|-

| Does restricted or unshareable training data prevent qualification?
| {{yes}}
| {{no}}
|-

| Is transparency about the data, without releasing the data, insufficient?
| {{yes}}
| {{no}}
|-

| Does this satisfy a freedom-software maximalist standard?
| {{yes}}
| {{no}}
|-

|}

= Security =
[[Malware_and_Firmware_Trojans#Backdoors|Being Open Source is essential for the avoidance of backdoors.]]

= Reproducible / Deterministic Builds =
Reproducible or deterministic builds are a crucial aspect of Open Source and Freedom Software as they contribute to the transparency, trustworthiness, and verifiability of the software. A reproducible build ensures that given the same source code, build environment and build instructions, the binary output will always be identical. This is vital for verifying that the build is free from malicious alterations or unintended deviations from the source code.

* '''Verification:''' By comparing the checksum of the build output from an independent build process with the checksum of the official release, any discrepancies can be identified, ensuring that the binary has been compiled correctly and hasn’t been tampered with.
* '''Debugging:''' Deterministic builds make debugging easier as developers can work with exact copies of the software, ensuring consistency between testing and production environments.
* '''Collaboration:''' When multiple developers or teams work on the same project, reproducible builds ensure that everyone is working with the exact same binary, reducing the likelihood of inconsistent behavior and bugs due to environment differences.
* '''Compliance and Auditing:''' For projects that require adherence to certain regulatory or compliance standards, reproducible builds provide a clear audit trail of what code was compiled and how.
* '''Long-Term Maintenance:''' In cases where a project needs to be maintained or updated over a long period, reproducible builds ensure that it’s always possible to recreate the exact original build environment, making future maintenance and debugging far simpler.

Reproducible builds are an essential practice in achieving the goals of Open Source and Freedom Software, contributing significantly to the integrity, transparency, and community collaboration inherent in these projects.
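
The verification step described above, comparing the checksum of an independent build with that of the official release, can be sketched as follows. The file names and the <code>demo</code> helper are hypothetical stand-ins. In practice the two files would be the officially released binary and the output of an independent rebuild from the same source code.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(official_path, rebuilt_path):
    """True if both build outputs are bit-for-bit identical."""
    return sha256_of_file(official_path) == sha256_of_file(rebuilt_path)


def demo():
    """Compare stand-in files for an official, a rebuilt and a tampered binary."""
    with tempfile.TemporaryDirectory() as folder:
        contents = {
            "official.bin": b"same source, same output",
            "rebuilt.bin": b"same source, same output",
            "tampered.bin": b"same source, same output plus backdoor",
        }
        paths = {}
        for name, data in contents.items():
            paths[name] = os.path.join(folder, name)
            with open(paths[name], "wb") as handle:
                handle.write(data)
        return (
            builds_match(paths["official.bin"], paths["rebuilt.bin"]),
            builds_match(paths["official.bin"], paths["tampered.bin"]),
        )
```

A matching checksum only shows that both binaries are identical. It is meaningful only if the build is actually reproducible, since otherwise harmless differences such as embedded timestamps would also cause a mismatch.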

= Other Issues =
* [https://en.wikipedia.org/wiki/Explainable_artificial_intelligence Explainable AI (XAI)]

= Freeware Self-Hosted Artificial Intelligence =
The following is not real Open Source / Freedom Software. It is only freeware that can be self-hosted.

{{Avoid nonfreedom software}}

{{quotation
|quote=Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost, $6,000. All download and part links below:
|context=https://x.com/carrigmat/status/1884244369907278106
}}

= OLMo =
[https://allenai.org/olmo OLMo] has been trained using [https://huggingface.co/datasets/allenai/dolma Dolma], which is released under the [https://opendatacommons.org/licenses/by/1-0/ ODC-BY] license. That license is broad enough for using, modifying, and redistributing the database, but it does not clearly grant the four freedoms for all underlying training contents. The training data itself is therefore not clearly four-freedoms-clean at the level of individual contents.

Under the [https://www.fsf.org/blogs/licensing/fsf-at-fosdem-2025 FSF's current position], that is likely not enough to qualify as Free Software.

OLMo probably qualifies under the [https://opensource.org/ai/open-source-ai-definition {{osaid}}].

However, it probably does not qualify under the [https://opensource.org/osd {{osd}}].

See also [[Artificial_intelligence#Comparison_of_Freedom_Software_AI_versus_OSAID_AI|Comparison of Freedom Software AI versus OSAID AI]].

= Resources =
* [https://lists.debian.org/debian-project/2023/02/msg00017.html Brief update about software freedom and artificial intelligence]
* [https://salsa.debian.org/deeplearning-team/ml-policy ML-Policy: Unofficial Policy for Debian & Machine Learning]
* [https://lists.debian.org/debian-project/2022/01/msg00002.html Rethink about who we (Debian) are in rapid dev cycle of deep learning]
* https://people.debian.org/~lumin/debian-dl.html
* https://lists.debian.org/debian-project/2023/03/msg00026.html
* https://www.wired.com/story/the-myth-of-open-source-ai/
* https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4543807

= Tickets =
* Databricks announced [https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm "Dolly 2.0, the first open source, instruction-following LLM"]. But is it really Open Source? Hence, the ticket [https://github.com/databrickslabs/dolly/issues/212 step by step instructions on how to build this AI from source code] was created.

= See Also =

<div class="mininav">
* [[Reasons_for_Freedom_Software|Why Freedom Software / Open Source]]
* [[Avoid_nonfreedom_software|Avoid Non-Freedom Software]]
* [[Miscellaneous_Threats_to_User_Freedom|Miscellaneous Threats to User Freedom]]
* [[Policy_On_Nonfreedom_Software|{{project_name_short}} Policy on Non-Freedom Software]]
* [[Policy_of_Website_and_Chat|Policy of the {{project_name_short}} Website and Chat]]
</div>

Search term:

{{CodeSelect|code=
DFSG compliant AI
}}

= Footnotes =
<references />
{{Footer}}
[[Category:Documentation]]