The description mentioned that the Open Source AI Definition doesn't require the data to be open. But doesn't https://opensource.org/ai/open-source-ai-definition explicitly require the data to be open?
No, it doesn’t. It only says the weights and information about the training data must be open, not the training data itself, which is honestly useless.
Data information == metadata?
(2) a listing of all publicly available training data and where to obtain it; and (3) a listing of all training data obtainable from third parties and where to obtain it, including for fee.
IMO this also amounts to “the data must be open”: the data used for training must be obtainable.
That’s your interpretation. What’s missing for me is “must be freely (as in, not only by specific entities) obtainable”. With this wording I could just say “this data is not obtainable” and be done with it.
When you mix being an ideologue with software, you will forever lose.