Data labelers are essential in enhancing the value of data sets and allowing advanced algorithms to learn. Once seen as less important, they are now recognized as co-creators of data sets. Data labelers organized themselves after their annotations were mass rejected, exposing biases in guidelines and protesting their low pay and exclusion from the research process. After their protest, data labelers gained more recognition and better pay, leading to improved job prospects and better data sets and algorithms. Listen to the interview with Crystal Kaufman for more details.

In a world where data is like gold, the data labeler can be compared to the gold refiner, the jeweler, or the designer. Data labelers are the people who confer their knowledge and understanding of the world onto the data set, enhancing its value in ways that allow our most advanced algorithms to learn what to consider and how to operate. Just like gold, the data set in its pure, raw form, lying deep beneath the earth's surface, can't be sold without the added value of the experts.

Once relegated to the sidelines, data labelers have come to be seen as co-creators of data sets and models, codifying their experiences, values, and beliefs through the algorithm's mimicry of the labelers' annotations. Despite this prestige, data labelers continued to wear a copper necklace, an act of solidarity with one another and a reminder of the days when annotation work was paid in cents. This was life after the Orange Run, a bold act of civil disobedience performed back in 2024.

During the Orange Run, annotators organized themselves using a subreddit called RejectU. The subreddit was open to all annotators but was built specifically by and for those who had had their annotations mass rejected on Mechanical Turk, and there were many.
As the story goes, mass rejections meant annotation work that annotators had spent hours performing was not compensated because the people who hired them had decided, for whatever reason, that it didn't deserve to be. In RejectU, the annotators assembled, coordinating how they would annotate across all of their data sets between September and November 2024. The annotators annotated to the letter of their instructions, exposing poor guidelines that they recognized as biasing the data set. Annotators were also encouraged to add tags that revealed the plethora of insights that the scientists commissioning their work had missed: tags saying that the data set itself was skewed, that the categories they were provided were inadequate, and that the scales they were given to annotate with led to confusion and inconsistency. In addition to the mass rejections, labelers were protesting the subjugation of their roles in terms of salary, benefits, and their exclusion from the research and development process.

Ever since the Orange Run, those commissioning data annotation work have recognized labelers as playing an indispensable role in the quest to refine their gold. This recognition translated into substantially more meaningful engagement with data labelers throughout the AI development lifecycle, enhancing not only their pay, which would be commensurate with that of other experts on the project, but also the transition of data labelers into other roles within the AI pipeline, increasing job prospects, human rights, and the quality of our data sets and algorithms.

Does the story sound unlikely? Listen to our interview with Crystal Kaufman to hear more about how this might just be the world we're building.