Review the brief and ask any questions on this project #1

Open
barrytarter opened this issue Oct 3, 2022 · 32 comments
Labels: Current Sprint, question (Further information is requested)

Comments

@barrytarter
Contributor

@hardcommitoneself if you have any technical questions, feel free to post them in this issue, as that should allow us to document the development process better.

@hardcommitoneself

hardcommitoneself commented Oct 4, 2022

I'd like to know more about the spreadsheet and the roster table. Please send me a sample spreadsheet file.

Schema::create('rosters', function (Blueprint $table) {
    $table->id();
    $table->string('university');
    $table->string('url');
    $table->string('sport');
    $table->timestamps();
});

@hardcommitoneself

Should we use the TALL stack in our app?

@barrytarter
Contributor Author

@hardcommitoneself

Here is what @edgrosvenor shared with me -- this will mainly be all back-end functionality so feel free to use whatever you prefer, e.g. browsershot, curl, even python is ok, etc. The early output might be CSVs of the profile data just to check it (e.g. name, position, year in school, etc).

If you are planning to do a front-end piece, TALL would be useful.

Does that make sense?

@hardcommitoneself

Thanks for letting me know, @barrytarter.
It makes sense. So first I will scrape the basic profile data (name, position, year, etc.) from the URLs provided in the Excel file.
I am not sure if you saw my Slack message.
I mentioned there that I will use Roach PHP to scrape data from the other sites.
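
For the early CSV check mentioned above, a minimal sketch in Python (which was named as acceptable for this back-end work). The column names and the sample row are assumptions for illustration only, not the project's actual schema:

```python
import csv
import io

# Hypothetical column names, based on the fields mentioned in this thread.
FIELDS = ["name", "position", "year"]


def profiles_to_csv(profiles):
    """Serialize a list of profile dicts to CSV text for a quick manual check."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for profile in profiles:
        # Missing fields are written as empty cells instead of raising.
        writer.writerow({field: profile.get(field, "") for field in FIELDS})
    return buffer.getvalue()


sample = [{"name": "Jane Doe", "position": "MF", "year": "Jr."}]  # made-up row
csv_text = profiles_to_csv(sample)
```

Writing to a string first keeps the sketch easy to test; the same writer could target a file opened with `newline=""`.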

@barrytarter
Contributor Author

@hardcommitoneself great, yes, this is the best place to reach both me and Ed!

@hardcommitoneself

@barrytarter

I just finished the Excel import feature and now I am going to build the scraper.
After an Excel file is imported, should the scraper run automatically, or do we need to trigger it manually (with a "start scraping" button or something like that)?

@barrytarter
Contributor Author

@hardcommitoneself For now, whatever is easiest to get a 'test' version live that successfully pulls and stores data. If @edgrosvenor has any tips, he'll share them here as well.

You'll need to create unique decision rules for pulling the roster data, as some rosters are very similar and others are different. For example, these two sites use "Sidearm Sports" templates:
https://acusports.com/sports/womens-volleyball/roster
https://asugrizzlies.com/sports/mens-soccer/roster

These also use Sidearm Sports, but a different template, I think:
https://aupanthers.com/sports/mens-soccer/roster
https://bamastatesports.com/sports/womens-volleyball/roster
https://auwolves.com/sports/mens-soccer/roster

These are both from Presto Sports templates, but the templates look different:
https://goamcats.com/sports/msoc/2017-18/roster
https://www.sunyadktimberwolves.com/sports/msoc/2017-18/roster

@hardcommitoneself

hardcommitoneself commented Oct 6, 2022

@barrytarter

How about checking the number of tr elements in every table on each page?
From what I have seen so far, there is only one table on each page with a large number of rows (I think that is the one we want).
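
A rough sketch of that row-counting heuristic, written in Python with only the standard library rather than Roach PHP. The class and function names are hypothetical, and it assumes the roster is the table with the most `tr` elements:

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <tr> elements per <table>, in document order."""

    def __init__(self):
        super().__init__()
        self.row_counts = []    # one entry per table seen so far
        self._open_tables = []  # stack of indexes into row_counts (handles nesting)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.row_counts.append(0)
            self._open_tables.append(len(self.row_counts) - 1)
        elif tag == "tr" and self._open_tables:
            # Attribute the row to the innermost table that is still open.
            self.row_counts[self._open_tables[-1]] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self._open_tables:
            self._open_tables.pop()


def largest_table_index(html):
    """Return the index of the table with the most rows, or None if no table exists."""
    counter = TableRowCounter()
    counter.feed(html)
    if not counter.row_counts:
        return None
    return max(range(len(counter.row_counts)), key=counter.row_counts.__getitem__)
```

A minimum row threshold would probably be needed in practice so that small layout tables are not mistaken for a roster.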

@barrytarter
Contributor Author

@hardcommitoneself I like that approach. We might need a way to decipher the type of content listed.

For example, grade level (aka "graduation year") values could be categorized by word: 'freshman', 'sophomore', 'junior', 'senior'. I look forward to seeing how you figure it out!

@hardcommitoneself

hardcommitoneself commented Oct 7, 2022

@barrytarter

I just noticed that some rosters have no tables and use a list instead: https://www.artuathletics.com/sports/womens-volleyball/roster
I think we need to build separate logic for ul lists.
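
One possible way to decide between the two cases, sketched in Python with the standard library. The names are hypothetical, and the rule (whichever of table rows or list items is more numerous wins) is only an assumption; navigation menus are also `ul` lists, so a real version would need extra filtering:

```python
from html.parser import HTMLParser


class ChildCounter(HTMLParser):
    """Count `item` tags inside each `container` tag, e.g. li inside ul."""

    def __init__(self, container, item):
        super().__init__()
        self.container = container
        self.item = item
        self.counts = []  # one entry per container seen
        self._stack = []  # open containers, innermost last

    def handle_starttag(self, tag, attrs):
        if tag == self.container:
            self.counts.append(0)
            self._stack.append(len(self.counts) - 1)
        elif tag == self.item and self._stack:
            self.counts[self._stack[-1]] += 1

    def handle_endtag(self, tag):
        if tag == self.container and self._stack:
            self._stack.pop()


def max_children(html, container, item):
    """Largest number of `item` tags found in any single `container`."""
    counter = ChildCounter(container, item)
    counter.feed(html)
    return max(counter.counts, default=0)


def detect_layout(html):
    """Return "table", "list", or None when the page has neither."""
    rows = max_children(html, "table", "tr")
    items = max_children(html, "ul", "li")
    if rows == 0 and items == 0:
        return None
    return "table" if rows >= items else "list"
```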

@hardcommitoneself

hardcommitoneself commented Oct 7, 2022

image
This is what I have so far. I think it will be the base of our scraper. Please check it out and let me know your feedback.

@hardcommitoneself

image

Please take a look at this screenshot.
Notice the Year field: its values are different from the others.
How can I convert the numbers (1, 3, etc.) to real year values (Fr., Sr., etc.)?

@barrytarter
Contributor Author

@hardcommitoneself here is one possible guide on how to map the data: https://docs.google.com/spreadsheets/d/1QBCGpvXjoDAH50wQTTnYLj5cWzb3TlXWWUPn-g3kk78/edit?usp=sharing.

Specifically for the numbers, it could map as 1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior, 5 = Senior, 6 = Senior.
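
A small sketch of that mapping in Python. The numeric codes follow the mapping above; the abbreviation and word aliases are assumptions based on values mentioned earlier in this thread, and the function name is hypothetical:

```python
# Numeric codes as proposed in this thread.
NUMERIC_YEARS = {
    1: "Freshman",
    2: "Sophomore",
    3: "Junior",
    4: "Senior",
    5: "Senior",
    6: "Senior",
}

# Assumed aliases; real rosters may use other spellings.
ALIASES = {
    "fr": "Freshman", "freshman": "Freshman",
    "so": "Sophomore", "sophomore": "Sophomore",
    "jr": "Junior", "junior": "Junior",
    "sr": "Senior", "senior": "Senior",
}


def normalize_year(raw):
    """Map a raw roster value to a class year, or None when it is not recognized."""
    text = str(raw).strip().rstrip(".").lower()
    if text.isdigit():
        return NUMERIC_YEARS.get(int(text))
    return ALIASES.get(text)
```

Returning `None` for unknown values makes it easy to log them and extend the alias table later.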

@hardcommitoneself

@barrytarter

https://www.loom.com/share/262f7d29525f45eba0caa4e8455a965d
Please check this video and give me your feedback.

@hardcommitoneself

@barrytarter @edgrosvenor

Regarding the extra field of the athlete table, should we add the following fields to it?
image

@barrytarter
Contributor Author

@hardcommitoneself ,

Thanks for sharing. Can we store both as text for now? The first is a height field and the second is where they played in high school. These are pretty common, so good to collect.

@barrytarter
Contributor Author

@hardcommitoneself will you be able to begin developing the crawler that will find the missing Twitter and Instagram IDs?

Step 3 in https://docs.google.com/document/d/1YmfAFYu4Cyl99ninB4KAeML4y-nmRW0gzI6Xeydg_2g/edit?usp=drivesdk

Can you get a v1 of that part ready by Wednesday?

@edgrosvenor
Contributor

@hardcommitoneself Go ahead and add any data that you think might be valuable as key / value pairs in the extra column. While you're at it, enable this package for that column: https://github.com/spatie/laravel-schemaless-attributes That will allow you to do things like $athlete->extra->set('height', '5\'9"');. I think maybe I've included the package in composer (maybe not), but I haven't added the trait to the model.
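
To illustrate the key/value idea independently of the package, here is a Python sketch that keeps the extra column as JSON text. This is not the spatie API; the helper name and the `high_school` key and value are made up for the example:

```python
import json


def set_extra(extra_json, key, value):
    """Return the JSON text of the extra column with `key` set to `value`."""
    extra = json.loads(extra_json) if extra_json else {}
    extra[key] = value
    # sort_keys keeps the stored text stable across writes.
    return json.dumps(extra, sort_keys=True)


extra = set_extra(None, "height", "5'9\"")
extra = set_extra(extra, "high_school", "Example High School")  # made-up value
```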

@hardcommitoneself

hardcommitoneself commented Oct 19, 2022

@barrytarter @edgrosvenor

Regarding the second crawler, I think we can use opendorse.com to scrape our athletes' contact info.
The following is just my opinion.

  1. First, we need to search for the university by name:
    https://opendorse.com/search?showAthletesNotOptedInToDeals=true&showUnclaimedAccounts=true&term=Abilene+Christian+University
  2. Then we need to go to the relevant university page:
    https://opendorse.com/abilenechristian-wildcats
  3. Then we need to filter by sport:
    https://opendorse.com/abilenechristian-wildcats?sports=Soccer
  4. Finally, we should find our athletes on that page:
    https://opendorse.com/profile/ellen-joss?from=abilenechristian-wildcats

That's it. I am not sure this approach works for all rosters, so I want to test it with real links.
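
The URL-building part of those steps could be sketched like this in Python. The patterns are copied from the example links above (assuming the search path is `/search?...`); whether they hold for every university is untested, and the team slug has to come from the search results since it cannot be derived from the name:

```python
from urllib.parse import urlencode

BASE = "https://opendorse.com"


def search_url(university):
    """Step 1: search page for a university name."""
    params = {
        "showAthletesNotOptedInToDeals": "true",
        "showUnclaimedAccounts": "true",
        "term": university,
    }
    return f"{BASE}/search?{urlencode(params)}"


def team_sport_url(team_slug, sport):
    """Step 3: team page filtered by sport; team_slug is taken from the search results."""
    return f"{BASE}/{team_slug}?{urlencode({'sports': sport})}"
```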

@hardcommitoneself

@barrytarter @edgrosvenor

My suggestion is below.
I think it would be better to use Google search with the athlete's name, sport, and college for our contact crawler.
I checked manually with many athletes and the results looked good.

Example search queries:
https://www.google.com/search?q=twitter+Nicole+Barham+ACU+soccer
https://www.google.com/search?q=instagram+Nicole+Barham+ACU+soccer

Please take a look and let me know what you think.
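
Building those query URLs could look like this in Python. The function name is hypothetical, and the sketch only constructs the URL; fetching and parsing the results is a separate problem:

```python
from urllib.parse import urlencode


def social_search_url(network, name, college, sport):
    """Build a Google search URL such as ...?q=twitter+Nicole+Barham+ACU+soccer."""
    query = " ".join([network, name, college, sport])
    # urlencode turns spaces into "+" and escapes other special characters.
    return "https://www.google.com/search?" + urlencode({"q": query})
```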

@barrytarter
Contributor Author

Sure, we can test that out and see how the data looks.

@hardcommitoneself

@barrytarter @edgrosvenor

Hi, I hope you are having a nice weekend!

Please take a look at this video.
https://www.loom.com/share/96661444867a4df98f6fdef1756662e3
You can see that the scraper is working well.
Please give me your feedback.

Sorry to bother you. :)

@hardcommitoneself

@barrytarter @edgrosvenor

Please take a look at the following.
https://www.loom.com/share/0708dffa27714d9eb3f0ac3072bb77c7
I implemented fully automated scraping of Twitter IDs as a test.
I think the scraper found almost all of the Twitter IDs, so please check them manually and then give me your feedback.
I already implemented the Opendorse logic last week, so I need to implement the Instagram logic now.

@barrytarter
Contributor Author

@hardcommitoneself could we add a method that would allow us to get this person's Instagram and Twitter?
It's Caleb Kendra; you'll see his name at 0:57 in https://www.loom.com/share/0708dffa27714d9eb3f0ac3072bb77c7.
e.g. https://www.instagram.com/c_kendra2/

@hardcommitoneself

@barrytarter

So, do you want to get the full Twitter link of athletes, like https://www.instagram.com/c_kendra2/?

@barrytarter
Contributor Author

@hardcommitoneself yes, we want the Twitter, Instagram, and Opendorse links for all athletes in the crawler.
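
One way to recognize a profile link among scraped URLs and pull out the handle, sketched in Python. The function name is hypothetical, and the rule (a known host plus exactly one path segment) is an assumption that will reject some valid links and should be checked against real results:

```python
from urllib.parse import urlparse

# Hosts treated as profile sites in this sketch.
PROFILE_HOSTS = {"twitter.com": "twitter", "instagram.com": "instagram"}


def parse_profile_url(url):
    """Return (network, handle) for a Twitter or Instagram profile URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    network = PROFILE_HOSTS.get(host)
    # A profile URL has exactly one path segment, e.g. /c_kendra2/
    segments = [segment for segment in parsed.path.split("/") if segment]
    if network is None or len(segments) != 1:
        return None
    return network, segments[0].lstrip("@")
```

Storing the handle and the network separately would let the full link be rebuilt later if the URL format changes.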

@hardcommitoneself

@barrytarter

OK. As we discussed before, we cannot get social links for many athletes, since most of them don't have any.
Anyway, please take a look at the following.
image

@barrytarter
Contributor Author

@hardcommitoneself yes, if it doesn't exist, we definitely can't store one.

Caleb Kendra does have one but we didn't store it -- how do we fix that?

@hardcommitoneself

@barrytarter

I think we can store it. What's the problem?
image
This is the structure of the athlete table.

@barrytarter
Contributor Author

Great! Why didn't it store previously?

@hardcommitoneself

@barrytarter

Please take a look at this. I implemented the Opendorse scraping method, so we can get not only the Opendorse link but also the Twitter or Instagram link from there.
https://www.loom.com/share/2b18bd1d24a04f5bbdcd018221ab7a4a


3 participants