by Pride Chamisa
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import re
import requests
import tweepy
from tweepy import OAuthHandler
import json
from timeit import default_timer as timer
%matplotlib inline
df_twitter = pd.read_csv('twitter-archive-enhanced.csv')
# Making the directory if it doesn't already exist
folder_name = 'image_predictions'
if not os.path.exists(folder_name):
    os.makedirs(folder_name)
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
with open(os.path.join(folder_name, url.split('/')[-1]), mode='wb') as file:
    file.write(response.content)
df_predictions = pd.read_csv('image_predictions/image-predictions.tsv', sep='\t')
# Query Twitter API for each tweet in the Twitter archive and save JSON in a text file
# These are hidden to comply with Twitter's API terms and conditions
consumer_key = 'HIDDEN'
consumer_secret = 'HIDDEN'
access_token = 'HIDDEN'
access_secret = 'HIDDEN'
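As a side note, one way to keep the real values out of the notebook entirely is to read them from environment variables; this is only a sketch, and the variable names below are assumptions, not part of the original project.
# Sketch: load the credentials from environment variables (names are hypothetical)
consumer_key = os.environ.get('TWITTER_CONSUMER_KEY', 'HIDDEN')
consumer_secret = os.environ.get('TWITTER_CONSUMER_SECRET', 'HIDDEN')
access_token = os.environ.get('TWITTER_ACCESS_TOKEN', 'HIDDEN')
access_secret = os.environ.get('TWITTER_ACCESS_SECRET', 'HIDDEN')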
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
tweet_ids = df_twitter.tweet_id.values
len(tweet_ids)
# Query Twitter's API for JSON data for each tweet ID in the Twitter archive
count = 0
fails_dict = {}
start = timer()
# Save each tweet's returned JSON as a new line in a .txt file
with open('tweet_json.txt', 'w') as outfile:
    # This loop will likely take 20-30 minutes to run because of Twitter's rate limit
    for tweet_id in tweet_ids:
        count += 1
        print(str(count) + ": " + str(tweet_id))
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            print("Success")
            json.dump(tweet._json, outfile)
            outfile.write('\n')
        except tweepy.TweepError as e:
            print("Fail")
            fails_dict[tweet_id] = e
end = timer()
print(end - start)
print(fails_dict)
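Some failures are permanent (deleted or protected tweets), but transient errors could be retried. A possible retry pass over fails_dict, appending to the same file, might look like this (a sketch, not part of the original run):
# Sketch: retry the IDs that failed, appending any successes to the same file
with open('tweet_json.txt', 'a') as outfile:
    for tweet_id in list(fails_dict):
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            json.dump(tweet._json, outfile)
            outfile.write('\n')
            del fails_dict[tweet_id]   # only permanent failures remain afterwards
        except tweepy.TweepError:
            pass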
# Creating a list to hold the json data for all tweets
data = []
with open("tweet_json.txt", "r") as f:
for line in f:
data.append(json.loads(line))
# Creating additional lists to hold the followers' and friends' counts
followers_count = []
friends_count = []
for i in range(len(data)):
    followers_count.append(data[i]['user']['followers_count'])
    friends_count.append(data[i]['user']['friends_count'])
# Keeping only the id and the retweet/favorite counts (a list keeps the column order deterministic)
df_json = pd.DataFrame(data, columns=['id_str', 'retweet_count', 'favorite_count'])
# Adding friends' and followers' counts to the dataframe
df_json['followers_count'] = followers_count
df_json['friends_count'] = friends_count
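The same frame could also be built in a single pass over the parsed JSON; the sketch below uses only fields already referenced above and is equivalent to the steps just shown.
# Sketch: one-pass construction of the same columns from the parsed tweets
df_json_alt = pd.DataFrame([{
    'id_str': t['id_str'],
    'retweet_count': t['retweet_count'],
    'favorite_count': t['favorite_count'],
    'followers_count': t['user']['followers_count'],
    'friends_count': t['user']['friends_count'],
} for t in data])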
In this section, I will detect and document quality and tidiness issues using both visual and programmatic assessment.
df_twitter
df_predictions
df_json
Dataset 1: Enhanced Twitter Archive
df_twitter.head()
df_twitter.name.value_counts()
df_twitter.info()
df_twitter.nunique()
sum(df_twitter.duplicated())
df_twitter.dtypes
Dataset 2: Image Predictions
df_predictions.head()
df_predictions.info()
df_predictions.dtypes
sum(df_predictions.jpg_url.duplicated())
Dataset 3: Twitter JSON
df_json.head()
df_json.info()
df_json.describe()
df_json.dtypes
df_json.nunique()
There are retweets, and the columns in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id and retweeted_status_timestamp have plenty of empty values
timestamp values have a +0000 attached
Erroneous datatypes in tweet_id and timestamp
Missing column for the fraction rating_numerator / rating_denominator
Incorrect dog names listed as "a", "the" and "an"
Source data has unwanted link tags
Dog names listed as None instead of NaN
The dog breed names are separated by '_' instead of a space
Duplicated URLs in jpg_url
Naming convention of id_str is not consistent with tweet_id
The columns doggo, puppo, pupper and floofer should be combined into a single column
All three tables should be combined into one dataset
A few of these issues are spot-checked programmatically below.
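A minimal sketch of those spot checks, using the dataframes already loaded (the columns referenced are the ones assessed above):
# Spot checks for some of the documented quality issues
print(df_twitter.retweeted_status_id.notnull().sum())      # number of retweets present
print(df_twitter.timestamp.str.endswith('+0000').all())    # every timestamp carries a +0000 offset
print((df_twitter.name == 'None').sum())                    # dog names stored as the string 'None'
print(df_twitter.tweet_id.dtype, df_json.id_str.dtype)      # inconsistent id column names and dtypes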
In this section, I will clean the issues documented during assessment.
# Make copies of original pieces of data
df_twitter_clean = df_twitter.copy()
df_predictions_clean = df_predictions.copy()
df_json_clean = df_json.copy()
df_twitter_clean.sample(4)
# Removing retweets by keeping only rows where retweeted_status_id is NaN
df_twitter_clean = df_twitter_clean[np.isnan(df_twitter_clean.retweeted_status_id)]
# Confirming null entries in retweeted status user id
df_twitter_clean.info()
Drop the multiple columns with plenty of empty values
df_twitter_clean = df_twitter_clean.drop(['in_reply_to_status_id', 'in_reply_to_user_id', 'retweeted_status_id',
                                          'retweeted_status_user_id', 'retweeted_status_timestamp'], axis=1)
# Confirming the dropped columns are no longer part of the dataframe
df_twitter_clean.info()
timestamp values have a +0000 attached
# Slicing off the last 5 characters of timestamp
df_twitter_clean.timestamp = df_twitter_clean.timestamp.str[:-5]
# Confirming the trailing +0000 is removed
df_twitter_clean.sample(2)
Erroneous datatypes in tweet_id and timestamp
df_twitter_clean.tweet_id = df_twitter_clean.tweet_id.astype(str)
df_predictions_clean.tweet_id = df_predictions_clean.tweet_id.astype(str)
df_twitter_clean['timestamp'] = pd.to_datetime(df_twitter_clean['timestamp'])
# Confirming the datatype changes
df_twitter_clean.dtypes
# Creating a new column holding the rating as a fraction: rating_numerator / rating_denominator
df_twitter_clean['fraction'] = df_twitter_clean['rating_numerator'].astype(float) / df_twitter_clean['rating_denominator'].astype(float)
df_twitter_clean['fraction'].sample(3)
Incorrect dog names listed as "a", "the" and "an"
# Reference: https://github.com/tharcyla/wrangle-and-analyze-data/blob/main/wrangle_act.ipynb
# Getting the index of all wrong (lowercase) names
err = df_twitter_clean[df_twitter_clean['name'].str.contains(r'^[a-z]')]['name'].index
# Changing them to NaN
df_twitter_clean.loc[err, 'name'] = np.nan
# Confirming changes
df_twitter_clean['name'].value_counts()
df_twitter_clean.source = df_twitter_clean.source.str.extract(r'>([\w\W\s]*)<', expand=True)
# Confirming the link tags are removed
df_twitter_clean.source.sample(3)
# Getting the index of names stored as the string "None"
err = df_twitter_clean.query('name == "None"').index
# Changing them to NaN
df_twitter_clean.loc[err, 'name'] = np.nan
df_twitter_clean['name'].sample(8)
df_predictions_clean['p1'] = df_predictions_clean['p1'].str.replace('_', ' ').str.title()
df_predictions_clean['p2'] = df_predictions_clean['p2'].str.replace('_', ' ').str.title()
df_predictions_clean['p3'] = df_predictions_clean['p3'].str.replace('_', ' ').str.title()
df_predictions_clean.sample(4)
df_predictions_clean = df_predictions_clean.drop_duplicates(subset=['jpg_url'], keep='last')
sum(df_predictions_clean['jpg_url'].duplicated())
# Renaming id_str to tweet_id for consistency across the three tables
df_json_clean.rename({'id_str': 'tweet_id'}, axis=1, inplace=True)
df_json_clean.columns
# Creating a new 'dog_stage' column
df_twitter_clean['dog_stage'] = df_twitter_clean['text'].str.extract('(floofer|doggo|pupper|puppo)', expand=True)
#Changing its datatype
df_twitter_clean['dog_stage'] = df_twitter_clean['dog_stage'].astype('category')
# Deleting the previous columns
df_twitter_clean = df_twitter_clean.drop(['doggo', 'floofer', 'pupper', 'puppo'], axis=1)
df_twitter_clean['dog_stage'].value_counts()
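One caveat worth checking: str.extract keeps only the first stage mentioned, so tweets that name more than one stage are collapsed to a single value. A quick count of how often that happens (a sketch, not in the original notebook):
# Counting tweets whose text mentions more than one dog stage
multi_stage = df_twitter_clean['text'].str.count('floofer|doggo|pupper|puppo') > 1
print(multi_stage.sum())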
df_json_clean.info()
df_all = pd.merge(df_twitter_clean, df_predictions_clean, how='inner', on='tweet_id')
# Merging with the cleaned Twitter JSON data
df_all = pd.merge(df_all, df_json_clean, how='inner', on='tweet_id')
# Confirming the fully merged dataset
df_all.info()
Saving the gathered, assessed, and cleaned master dataset to a CSV file named "twitter_archive_master.csv".
df_all.to_csv('twitter_archive_master.csv', index=False)
In this section, I will be analyzing and visualizing the wrangled data.
# Importing the new stored master dataset
twitter_archive_master = pd.read_csv('twitter_archive_master.csv')
# Changing back datatypes
twitter_archive_master['timestamp'] = pd.to_datetime(twitter_archive_master['timestamp'])
twitter_archive_master['source'] = twitter_archive_master['source'].astype('category')
twitter_archive_master['tweet_id'] = twitter_archive_master['tweet_id'].astype(str)
1. Most popular dog names
most_popular_name = twitter_archive_master.name.value_counts()[1:10].sort_values(ascending=False).plot(kind='barh')
plt.xlabel('Number of Dogs');
plt.ylabel('Dog Names');
plt.title('Most popular dog names', size=15);
plt.savefig('popular-dog-names');
From the bar chart, it is observed that Charlie and Lucy were the most popular dog names.
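As a quick numeric check of the chart (value_counts excludes the NaN names introduced during cleaning):
# Top ten dog names by frequency
print(twitter_archive_master.name.value_counts().head(10))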
2. The most liked dog stages
twitter_archive_master.groupby('dog_stage').favorite_count.mean().plot(kind='bar', title='Most liked dog stages');
plt.ylabel('Likes');
3. The most retweeted dog stages
twitter_archive_master.groupby('dog_stage').retweet_count.mean().plot(kind='bar', title='Most retweeted dog stages');
plt.ylabel('Retweets');
While puppo was the most liked dog stage, floofer was the most retweeted dog stage.
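The two bar charts can also be read off a single table; the sketch below computes both means side by side.
# Mean likes and retweets per dog stage, sorted by likes
stage_stats = twitter_archive_master.groupby('dog_stage')[['favorite_count', 'retweet_count']].mean()
print(stage_stats.sort_values('favorite_count', ascending=False))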
4. The relationship between favorite_count and retweet_count
twitter_archive_master.plot(x='favorite_count', y='retweet_count', kind='scatter', title='Relationship between favorite_count and retweet_count');
From the above graph, it is observed that favorite count and retweet count are strongly positively correlated.
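A correlation coefficient makes that relationship concrete (a quick check, not part of the original analysis):
# Pearson correlation between likes and retweets
print(twitter_archive_master['favorite_count'].corr(twitter_archive_master['retweet_count']))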
5. The most common source of tweets
most_common_source = twitter_archive_master['source'].value_counts()[0:5]
most_common_source.plot(kind='pie', title='Most common sources', autopct='%.1f%%', figsize=(10,10));
The above pie-chart shows that 'Twitter for iPhone' was the most common source of tweets
Visualizing the distribution of tweets over time
twitter_archive_master['tweet_id'].groupby([twitter_archive_master['timestamp'].dt.year, twitter_archive_master['timestamp'].dt.month]).count().plot(kind='line');
plt.title('Distribution of tweets over time', size=13);
plt.xlabel('Time (Monthly Distribution)');
plt.ylabel('Tweets');
twitter_archive_master['timestamp'].dt.month.value_counts()
This shows that most tweets were posted in December, followed by November and January.