webdevRefinery Forum: Help please! - webdevRefinery Forum


User is offline Kirity 

  • Group: Members
  • Posts: 320
  • Joined: 12-April 10
  • LocationCookie Pot
  • Expertise:HTML,CSS

Posted 12 June 2012 - 01:37 PM (#1)

Help please!


unknown error
This is what my code does (or should do):

You input any YouTube comment page (the page you get after clicking "see all" comments on a YouTube video), e.g.
http://www.youtube.com/all_comments?v=pGRPWJ5Ni1U

Then you input the number of comment pages, i.e. the number of pages shown at the bottom of the screen, e.g.
3

It then grabs the HTML of each page in turn, starting with the first, picks out the lines that contain usernames and cuts off everything except the username. Then it checks for duplicates and outputs all the unique usernames.
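The extract-and-deduplicate step described above can be tried on its own, without any network access. This is only a minimal sketch: the class name `UsernameExtractor`, its methods and the sample HTML lines are made up for illustration, and the real YouTube markup may well look different. Only the `yt-user-name` marker and the "in reply to" check come from the code in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the original program.
class UsernameExtractor {
    // Marker used by the code in this post; the markup around it is assumed.
    private static final String MARKER = "yt-user-name";

    /** Returns the text between the first '>' after the marker and the next '<', or null. */
    static String extractName(String line) {
        int marker = line.indexOf(MARKER);
        if (marker < 0 || line.contains("in reply to")) {
            return null; // no username on this line, or it is a reply header
        }
        int start = line.indexOf('>', marker);
        if (start < 0) {
            return null;
        }
        int end = line.indexOf('<', start);
        if (end < 0) {
            return null;
        }
        String name = line.substring(start + 1, end).trim();
        return name.isEmpty() ? null : name;
    }

    /** Collects usernames in first-seen order; the set drops duplicates. */
    static List<String> uniqueNames(List<String> lines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : lines) {
            String name = extractName(line);
            if (name != null) {
                names.add(name);
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for one downloaded page.
        List<String> lines = Arrays.asList(
            "<a class=\"yt-user-name \" dir=\"ltr\">alice</a>",
            "<p>some comment text</p>",
            "<a class=\"yt-user-name \" dir=\"ltr\">bob</a>",
            "<a class=\"yt-user-name \" dir=\"ltr\">alice</a>");
        System.out.println(uniqueNames(lines));
    }
}
```

Searching for the tag boundaries avoids a hard-coded character offset, and a `LinkedHashSet` does the duplicate check without scanning a list for every name.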

I ran the code with number of pages = 1, so only the names on the first page were read out, and it worked fine. With number of pages = 2 it also worked.

But with number of pages = 3 it throws an error, then outputs some names.
What is this error?
Also, it checks pages 1 and 2 quite quickly, about 6-8 seconds per page, but when checking up to page 3 it takes ages, like 2-3 minutes.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Youtube {
    // Distance from the start of "yt-user-name " to the first character of
    // the username on the same line (worked out by hand from the page HTML).
    private static final int NAME_OFFSET = 25;

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        Scanner input = new Scanner(System.in);

        System.out.println("Please enter the 1st Youtube comment page URL:");
        String video = input.nextLine();
        System.out.println("Please enter the total number of comments:");
        double comments = input.nextInt();

        // Each comment page holds up to 500 comments.
        int pages = (int) Math.ceil(comments / 500d);

        try {
            for (int number = 1; number <= pages; number++) {
                URL page = new URL(video + "&page=" + number);
                // try-with-resources closes the stream before the next page is requested.
                try (BufferedReader d = new BufferedReader(new InputStreamReader(page.openStream()))) {
                    boolean check = false; // true when the previous line contained "author "
                    String line;
                    while ((line = d.readLine()) != null) {
                        if (check) {
                            int front = line.indexOf("yt-user-name ");
                            int back = line.lastIndexOf('<');
                            boolean isReply = line.indexOf("in reply to") >= 0;
                            // Also make sure the substring bounds are valid.
                            if (front > 0 && back >= front + NAME_OFFSET && !isReply) {
                                String name = line.substring(front + NAME_OFFSET, back);
                                if (!names.contains(name)) {
                                    names.add(name);
                                    System.out.println("Found Unique Username");
                                }
                            }
                            check = false;
                        }
                        if (line.indexOf("author ") != -1) check = true;
                    }
                }
            }

            System.out.println("List Of Unique UserNames:");
            for (String name : names) {
                System.out.println(name);
            }
        } catch (MalformedURLException mue) {
            mue.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}



Error
java.io.IOException: Premature EOF
	at sun.net.www.http.ChunkedInputStream.readAheadBlocking(Unknown Source)
	at sun.net.www.http.ChunkedInputStream.readAhead(Unknown Source)
	at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
	at java.io.FilterInputStream.read(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
	at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
	at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
	at sun.nio.cs.StreamDecoder.read(Unknown Source)
	at java.io.InputStreamReader.read(Unknown Source)
	at java.io.BufferedReader.fill(Unknown Source)
	at java.io.BufferedReader.readLine(Unknown Source)
	at java.io.BufferedReader.readLine(Unknown Source)
	at Youtube.main(Youtube.java:40)


My coding may be bad, I'm new to Java xD.
Accomplishments:
Counting to infinity, twice.


GTA SA: SAMP | Minecraft | Team Fortress 2 | PS3
0


User is offline TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5154
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 12 June 2012 - 03:17 PM (#2)

You're getting a chunked response and not handling it correctly. I would look into how to handle chunked HTTP responses in Java if you continue down this road, but I think it's foolish not to use the YouTube API.
Reserved.
0


User is offline Kirity 

  • Group: Members
  • Posts: 320
  • Joined: 12-April 10
  • LocationCookie Pot
  • Expertise:HTML,CSS

Posted 12 June 2012 - 03:58 PM (#3)

Thanks for the info. I'm kinda new to Java so I was using this for educational purposes; I'd say I'm naive, not foolish ;) .
Also this may seem kinda stupid but I don't know what an API is or how to use one. *googling...*
Okay, so it's a set of pre-written classes?
So how do I install the YouTube API?
And how do I use the stuff?

I'm so confused.

My code works, if only I could fix this error. I think it's not having enough time to process a line before the next one is read in? I read somewhere that could be a problem.


User is offline NeilHanlon 

  • Group: Members
  • Posts: 884
  • Joined: 08-July 10
  • LocationRowley, Massachusetts
  • Expertise:HTML,CSS,PHP,Java,Graphics

Posted 12 June 2012 - 04:16 PM (#4)

As ThatRailsGuy said, read up on handling chunked data in Java.

http://en.wikipedia....ansfer_encoding
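To make the wire format concrete, here is a minimal sketch of decoding a chunked body by hand: each chunk is a hexadecimal size line, that many bytes of data, then a CRLF, and a size of 0 ends the body. The class name `ChunkedDecoder` is made up for illustration. As far as I know, Java's `HttpURLConnection` already does this decoding internally (the `ChunkedInputStream` in the stack trace above), so this is for understanding the format, not something to bolt onto the program as-is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical teaching example, not a replacement for the JDK's own decoder.
class ChunkedDecoder {
    /** Reads one CRLF-terminated ASCII line and returns it without the CRLF. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("Expected LF after CR");
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new EOFException("Premature EOF while reading a chunk header");
    }

    /** Decodes a complete chunked body into its plain bytes. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String header = readLine(in);
            int semi = header.indexOf(';'); // ignore optional chunk extensions
            String hex = (semi >= 0 ? header.substring(0, semi) : header).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                // Last chunk: skip optional trailer lines up to the empty line.
                while (!readLine(in).isEmpty()) {
                    // nothing to do with trailers in this sketch
                }
                return body.toByteArray();
            }
            byte[] chunk = new byte[size];
            int read = 0;
            while (read < size) {
                int n = in.read(chunk, read, size - read);
                if (n < 0) {
                    // The stream ended before the announced size arrived.
                    throw new EOFException("Premature EOF inside a chunk");
                }
                read += n;
            }
            body.write(chunk, 0, size);
            if (!readLine(in).isEmpty()) {
                throw new IOException("Missing CRLF after chunk data");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        byte[] out = decode(new ByteArrayInputStream(wire.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```

The second `EOFException` is the situation a "Premature EOF" message describes: the sender announced a chunk of a given size, but the connection ended before all of it arrived.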
Thanks,
兄ニール

Website | Blog | @NeilHanlon | About.Me | Facebook | LinkedIn
0


User is offline Kirity 

  • Group: Members
  • Posts: 320
  • Joined: 12-April 10
  • LocationCookie Pot
  • Expertise:HTML,CSS

Posted 12 June 2012 - 04:25 PM (#5)

Sorry, but that didn't help one bit, Neil :/ I read all of it but nothing pointed to how I can fix my problem; it just explained how chunked transfer works.


User is offline TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5154
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 12 June 2012 - 04:26 PM (#6)

Kirity, on 12 June 2012 - 04:25 PM, said:

Sorry, but that didn't help one bit, Neil :/ I read all of it but nothing pointed to how I can fix my problem; it just explained how chunked transfer works.

Yes, you need to understand the problem in order to fix it. We don't want to just give you a copy-and-paste solution, but an understanding of the problem and the steps to come up with the solution.


User is offline ianonavy 

  • Group: Members
  • Posts: 685
  • Joined: 14-April 10
  • Expertise:HTML,CSS,Java,Javascript,Python

Posted 12 June 2012 - 04:27 PM (#7)

Rather than parse each page, access the YouTube comments API at this URL:
https://gdata.youtub...DEO_ID/comments


where VIDEO_ID is the parameter in the URL after watch?v=

You should get an XML reply from Google telling you all of the comments, which you can then parse into whatever format you need.

See this page for more info: https://developers.g...trieve_comments
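Once you have the XML reply, pulling the commenter names out could look roughly like the sketch below. It uses the DOM parser that ships with Java. The sample feed is made up: I'm assuming an Atom-style layout where each entry has an `author` element containing a `name`, so check the real reply before relying on those element names. `CommentFeedParser` is just an illustrative class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; element names are assumptions about the feed layout.
class CommentFeedParser {
    /** Returns the distinct author names found in the feed, in first-seen order. */
    static List<String> authorNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Set<String> names = new LinkedHashSet<>();
        NodeList authors = doc.getElementsByTagName("author");
        for (int i = 0; i < authors.getLength(); i++) {
            Element author = (Element) authors.item(i);
            NodeList nameNodes = author.getElementsByTagName("name");
            if (nameNodes.getLength() > 0) {
                names.add(nameNodes.item(0).getTextContent().trim());
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample standing in for the reply from the comments URL.
        String xml =
            "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
            + "<entry><author><name>alice</name></author><content>First!</content></entry>"
            + "<entry><author><name>bob</name></author><content>Nice video</content></entry>"
            + "<entry><author><name>alice</name></author><content>Thanks</content></entry>"
            + "</feed>";
        System.out.println(authorNames(xml));
    }
}
```

Compared with cutting strings out of HTML lines, this does not break when the page layout changes, only if the feed's element names do.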
reputation += 1 if post.is_helpful else 0
0


User is offline TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5154
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 12 June 2012 - 04:33 PM (#8)

Kirity, on 12 June 2012 - 03:58 PM, said:

Thanks for the info. I'm kinda new to Java so I was using this for educational purposes; I'd say I'm naive, not foolish ;) .
Also this may seem kinda stupid but I don't know what an API is or how to use one. *googling...*
Okay, so it's a set of pre-written classes?
So how do I install the YouTube API?
And how do I use the stuff?

I'm so confused.

My code works, if only I could fix this error. I think it's not having enough time to process a line before the next one is read in? I read somewhere that could be a problem.

An API is a solution/way for programs to talk to each other. If you can understand different languages and frameworks, this is an app a friend made that uses the YouTube API to play similar artists, actually quite a popular application.


User is offline Kirity 

  • Group: Members
  • Posts: 320
  • Joined: 12-April 10
  • LocationCookie Pot
  • Expertise:HTML,CSS

Posted 12 June 2012 - 04:36 PM (#9)

ThatRailsGuy, on 12 June 2012 - 04:26 PM, said:

Yes, you need to understand the problem in order to fix it. We don't want to just give you a copy-and-paste solution, but an understanding of the problem and the steps to come up with the solution.


Yeah I know, I don't want to be spoon-fed either. It's just that my Java isn't up to par to even understand the error thrown, so how am I meant to understand what I need to know before I know what's wrong?

So am I not meant to be getting chunked responses? Or is it okay as long as I handle them properly?

Anyway, I'll try this YouTube API like ian has posted and see what happens.

Thanks guys


User is offline TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5154
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 12 June 2012 - 04:41 PM (#10)

All my websites send out chunked responses; it allows web browsers to render the site quicker. You need to be able to handle them if you want to create a pseudo web browser. But I suggest that instead of learning how to do that, you learn how to use an API ;)
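If you do stay with fetching pages directly, one pragmatic way to cope with a response that gets cut off is to retry the download a few times. This is a sketch under assumptions, not a guaranteed fix: `RetryingFetch`, `PageFetcher` and `fetchWithRetry` are made-up names, and the fake fetcher in `main` only simulates a connection that fails twice with "Premature EOF". In the real program the fetcher would wrap the `page.openStream()` reading loop.

```java
import java.io.IOException;

// Hypothetical retry wrapper; names are illustrative only.
class RetryingFetch {
    /** Anything that downloads the body of a URL and may fail with an IOException. */
    interface PageFetcher {
        String fetch(String url) throws IOException;
    }

    /** Calls the fetcher up to maxAttempts times and rethrows the last failure. */
    static String fetchWithRetry(PageFetcher fetcher, String url, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.fetch(url);
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        final int[] calls = {0};
        // Fake fetcher: fails on the first two calls, succeeds on the third.
        PageFetcher flaky = url -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("Premature EOF");
            }
            return "<html>page body</html>";
        };
        String body = fetchWithRetry(flaky, "http://example.invalid/all_comments?v=VIDEO_ID&page=3", 5);
        System.out.println("Got " + body.length() + " characters after " + calls[0] + " attempts");
    }
}
```

Retrying only papers over a dropped connection; each attempt should parse into a fresh list so that a half-read page does not leave partial results behind.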


User is offline Kirity 

  • Group: Members
  • Posts: 320
  • Joined: 12-April 10
  • LocationCookie Pot
  • Expertise:HTML,CSS

Posted 12 June 2012 - 05:02 PM (#11)

This API isn't going to work unless there is a way around this that I don't know about.

The problem is:

The API is for videos, not really for the comments on the videos.

Sure, you can get comments off the video page via its normal URL, but there are many more comments not shown on that page. These other comments are not included in the XML, from what I can see.

So to fix this I click "see all comments", copy the video ID from the end of that URL and put it into the comments API URL. It works!
But when I go to the second page of "all" comments, the ID in the URL doesn't change; it just adds &page=2, which the API doesn't understand, so I can't get the XML for the second page of comments.

Any way around this?


User is offline TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5154
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 12 June 2012 - 05:12 PM (#12)

Kirity, on 12 June 2012 - 05:02 PM, said:

This API isn't going to work unless there is a way around this that I don't know about.

The problem is:

The API is for videos, not really for the comments on the videos.

Sure, you can get comments off the video page via its normal URL, but there are many more comments not shown on that page. These other comments are not included in the XML, from what I can see.

So to fix this I click "see all comments", copy the video ID from the end of that URL and put it into the comments API URL. It works!
But when I go to the second page of "all" comments, the ID in the URL doesn't change; it just adds &page=2, which the API doesn't understand, so I can't get the XML for the second page of comments.

Any way around this?

The API returns a certain number of comments per request; there is then a field with a link to the next page, or you can change the request parameters. The documentation page should have something about pagination.
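The "change the parameters" route can be sketched like this. Treat it as an assumption-laden example: the parameter names `start-index` and `max-results` are how I understand that API to paginate, so confirm them against the documentation, and the base URL below is a placeholder rather than the real endpoint. `CommentPages` and `pageUrls` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; parameter names and base URL are assumptions.
class CommentPages {
    /** Builds one URL per page so that the pages together cover totalComments entries. */
    static List<String> pageUrls(String baseUrl, int totalComments, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        List<String> urls = new ArrayList<>();
        // start-index is assumed to be 1-based: 1, 1 + pageSize, 1 + 2 * pageSize, ...
        for (int start = 1; start <= totalComments; start += pageSize) {
            urls.add(baseUrl + "?start-index=" + start + "&max-results=" + pageSize);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder base URL; substitute the comments URL for your video.
        String base = "https://example.invalid/feeds/VIDEO_ID/comments";
        for (String url : pageUrls(base, 120, 50)) {
            System.out.println(url);
        }
    }
}
```

If the reply also carries a link to the next page, following that link until it is absent is the safer option, because it does not depend on knowing the total number of comments up front.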


User is offline NeilHanlon 

  • Group: Members
  • Posts: 884
  • Joined: 08-July 10
  • LocationRowley, Massachusetts
  • Expertise:HTML,CSS,PHP,Java,Graphics

Posted 12 June 2012 - 07:38 PM (#13)

What you want to do is feed the API call a link to the comments, and then you can modify that using GET requests. I give you this only because it's a little hidden in the API Doc.


User is offline Kirity 

  • Group: Members
  • Posts: 320
  • Joined: 12-April 10
  • LocationCookie Pot
  • Expertise:HTML,CSS

Posted 14 June 2012 - 06:22 AM (#14)

Hey guys, I had some other people run my code and they tried up to 20000+ comment youtube videos and it worked...

Maybe this is to do with RAM issues?

Can people try the code and see if they get any errors and up to what limit of comments your computer can handle?
It would be nice to find out if this is just to do with RAM...


User is offline TheEmpty 

  • I say words in sequences.
  • Group: Members
  • Posts: 5154
  • Joined: 02-October 10
  • Expertise:HTML,CSS,PHP,Java,Javascript,Python,Ruby on Rails,SQL

Posted 15 June 2012 - 10:25 AM (#15)

Kirity, on 14 June 2012 - 06:22 AM, said:

Hey guys, I had some other people run my code and they tried up to 20000+ comment youtube videos and it worked...

Maybe this is to do with RAM issues?

Can people try the code and see if they get any errors and up to what limit of comments your computer can handle?
It would be nice to find out if this is just to do with RAM...

Once again, if you understood what chunked responses were you would understand what is wrong and understand it has nothing to do with RAM. The only reason it worked for 20k comments is because your program must be slower than your internet connection.

