Data + Goals

yujia zhang


Transcription

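Summary: Imagine you're a film investor and a tech enthusiast. You decide to create an app that predicts whether a movie will be a box office hit, and you only invest in movies it predicts will succeed. You define a box office hit as a movie whose box office revenue is more than double its production cost and whose net income is over $40 million. You collect data from platforms like IMDb and from social media, including attributes such as title, release date, cast, budget, and director popularity. You then clean the data to remove or fill in missing and erroneous values and to correct extreme box office records, and you adjust it for inflation using 2024 as the benchmark year. Finally, the adjusted box office revenue is compared against the criterion to mark each movie as a hit or not. This last step is called data annotation.

Imagine you are a film investor and, at the same time, a tech geek. (Honestly, what an infuriating assumption.) For work, you decide to design an app that predicts whether a movie will be a box office hit. You invest only in the movies the app predicts will be hits, and you ignore the rest. This is a practical application built on machine learning.

First comes a very interesting question: how do you define a "hit"? "Hit" is a fairly vague word that depends on subjective judgment, and where data does the talking it is not precise enough. You need to redefine it: a movie whose box office revenue is more than twice its cost, and whose net income is more than $40 million, is a hit. I kept this definition simple just to illustrate the example; in real life it may not be appropriate. This act of redefinition is called goal setting.

With a goal in place, you start collecting data. You gather records of past movies from databases such as Douban and IMDb and from social media. The attributes might include title, release date, lead actors, budget, and director fame, and of course the one you care about most: box office results. This step is data collection.

Some of the data may not be directly usable, because it can contain missing or erroneous values. You need to delete or fill in the bad entries, and you may also need to correct abnormally high or abnormally low box office records. This improves data quality and keeps the model from being affected by invalid or incorrect data.

Even if the data you obtained were perfect, it would still need adjusting. Think about it: the goal is a net income of more than $40 million. If a movie was made in 1950, that goal would be very hard to reach. So the clever audience has probably already figured it out: you have to take inflation into account. Concretely, you use 2024 as the base year and scale each movie's box office results up or down according to its release year. The two steps just described are data cleaning and data adjustment.

Next, you label each movie as a hit or not, according to whether its adjusted box office revenue is more than twice its cost and its net income is more than $40 million. This step is data annotation.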
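The cleaning, adjustment, and annotation steps described in the transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Movie` record, the function names, and the `PRICE_INDEX` values are hypothetical (the index numbers are placeholders, not real inflation data), cleaning is reduced to dropping unusable records, and the sketch assumes that cost is converted to 2024 dollars along with revenue, which the transcript does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

BASE_YEAR = 2024                   # benchmark year named in the transcript
NET_INCOME_THRESHOLD = 40_000_000  # USD, measured in base-year dollars

# Hypothetical price index per year (placeholder values, NOT real CPI data).
PRICE_INDEX = {1950: 10.0, 2000: 55.0, 2024: 100.0}


@dataclass
class Movie:
    title: str
    year: int
    budget: Optional[float]   # production cost in release-year USD
    revenue: Optional[float]  # box office revenue in release-year USD


def clean(movies):
    """Data cleaning: drop records with missing or clearly invalid values."""
    return [
        m for m in movies
        if m.budget is not None and m.revenue is not None
        and m.budget > 0 and m.revenue >= 0
        and m.year in PRICE_INDEX
    ]


def to_base_year(amount, year):
    """Data adjustment: rescale an amount from `year` dollars to base-year dollars."""
    return amount * PRICE_INDEX[BASE_YEAR] / PRICE_INDEX[year]


def is_hit(movie):
    """Goal: revenue > 2 x cost AND net income > $40M, both in base-year dollars."""
    budget = to_base_year(movie.budget, movie.year)    # assumption: cost is adjusted too
    revenue = to_base_year(movie.revenue, movie.year)
    return revenue > 2 * budget and (revenue - budget) > NET_INCOME_THRESHOLD


def label(movies):
    """Data annotation: map each cleaned movie title to a hit / not-hit label."""
    return {m.title: is_hit(m) for m in clean(movies)}
```

With these placeholder index values, a 1950 movie that cost $1 million and earned $6 million would count as a hit only after adjustment, which is the point the transcript makes about inflation.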
