tweeta¶
tweeta.text¶
This module contains utilities related to processing tweets
Extract hashtags from the text
-
tweeta.text.
extract_mentions
(text)¶ Extract mentions from the text
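A minimal sketch of how mention extraction could work, using only the standard library. The name `extract_mentions_sketch`, the regular expression, and the choice to return screen names without the leading `@` are assumptions for illustration, not the behaviour of `tweeta.text.extract_mentions` itself.

```python
import re

# Assumption: a mention is "@" followed by word characters, and the "@"
# must not directly follow another word character (this skips e-mail addresses).
MENTION_RE = re.compile(r"(?<![\w@])@(\w+)")


def extract_mentions_sketch(text):
    """Return the screen names mentioned in text, without the leading '@'."""
    return MENTION_RE.findall(text)


print(extract_mentions_sketch("hi @alice and @bob_99, mail me at x@y.com"))
```

The lookbehind is a design choice: without it, the `@y` in an e-mail address would be reported as a mention.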
-
tweeta.text.
fix_text
(text)¶ Apply ftfy.fix_text and remove linebreaks
-
tweeta.text.
has_url
(text)¶
-
tweeta.text.
lang
(text)¶ Detect the language of the text
-
tweeta.text.
remove_lb
(text)¶ Remove linebreaks
-
tweeta.text.
remove_mentions
(text)¶ Remove mentions from the text
-
tweeta.text.
remove_url
(text)¶ Remove URLs from the text
-
tweeta.text.
replace_slash
(text, sub=' ')¶
-
tweeta.text.
sanitize_nofunccall
(text)¶ This is equivalent to replace_slash(remove_url(fix_text(text)))
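The composition above can be sketched with stand-in helpers. Everything here is hypothetical: the `*_sketch` functions are simplified stand-ins rather than the library's code, the ftfy step is omitted so the example runs on the standard library alone, and it is assumed that linebreaks become spaces and that "slash" means the forward slash.

```python
import re

# Assumption: a URL is "http://" or "https://" followed by non-space characters.
URL_RE = re.compile(r"https?://\S+")


def fix_text_sketch(text):
    # Stand-in for fix_text: the documented function also applies ftfy.fix_text;
    # this sketch only replaces linebreaks with a single space.
    return re.sub(r"[\r\n]+", " ", text)


def remove_url_sketch(text):
    # Delete every URL match, leaving the surrounding whitespace untouched.
    return URL_RE.sub("", text)


def replace_slash_sketch(text, sub=" "):
    # Assumption: "slash" means the forward slash character.
    return text.replace("/", sub)


def sanitize_sketch(text):
    # Same nesting order as the documented equivalence.
    return replace_slash_sketch(remove_url_sketch(fix_text_sketch(text)))


print(sanitize_sketch("cats/dogs\nsee https://t.co/abc now"))
```

The order matters in this sketch: URLs are removed before slashes are replaced, so the slashes inside a URL never get turned into separators that would leave URL fragments behind.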
tweeta.tweet¶
This module contains functionality for extracting various data elements from a Tweet object
-
class
tweeta.tweet.
TweetaTweet
(in_data)¶ Bases:
object
-
created_at
(output_time_format='YMD')¶ The raw created_at value is in constants.PARSE_TIME_FORMAT. This converts the datetime to another format, e.g., YMD (predefined) or %Y-%m-%d (user defined)
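A rough sketch of the conversion, under stated assumptions: `PARSE_TIME_FORMAT` below stands in for `constants.PARSE_TIME_FORMAT` and uses the layout Twitter emits for `created_at`; the `PREDEFINED_FORMATS` table and the mapping of `YMD` to `%Y-%m-%d` are hypothetical, as is the function name.

```python
from datetime import datetime

# Assumption: stand-in for constants.PARSE_TIME_FORMAT, matching raw values
# such as "Wed Oct 10 20:19:24 +0000 2018".
PARSE_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Assumption: hypothetical table of predefined format names.
PREDEFINED_FORMATS = {"YMD": "%Y-%m-%d"}


def format_created_at(raw, output_time_format="YMD"):
    """Parse a raw created_at string and render it in the requested format."""
    parsed = datetime.strptime(raw, PARSE_TIME_FORMAT)
    # A predefined name is looked up first; anything else is treated as a
    # user-defined strftime pattern.
    fmt = PREDEFINED_FORMATS.get(output_time_format, output_time_format)
    return parsed.strftime(fmt)


print(format_created_at("Wed Oct 10 20:19:24 +0000 2018"))
print(format_created_at("Wed Oct 10 20:19:24 +0000 2018", "%H:%M"))
```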
-
fixed_text
()¶ Fix some Unicode issues and remove linebreaks [ftfy.fix_text(remove_lb(text))]
-
get
(field_name)¶ Get arbitrary field values
-
has_quoted_status
()¶ Whether the quoted_status is not null. Note that a quote is different from a retweet; is_quote == has_quoted_status
-
has_retweeted_status
()¶ Whether the retweeted_status is not null
-
has_url
()¶ Whether the tweet contains URLs. Checks entities first, otherwise uses text
-
is_deleted
()¶ Whether the tweet has been deleted. If the tweet is deleted, none of the other attributes will be populated
-
is_en
()¶ Whether the tweet is written in English. Uses the lang attribute first if it exists, otherwise uses langid
-
is_geotagged
()¶ Whether the tweet has been geo-tagged (either has geo or coordinates or place)
-
is_quote
()¶ Whether the tweet is a quote of another tweet (quoted_status is not null)
-
is_retweet
()¶ Whether the tweet is a retweet: either the text starts with ‘RT @’ in any letter case (‘RT|Rt|rT|rt @’) or retweeted_status is not null. Note that a tweet can be a retweet (it starts with ‘RT @’) while retweeted_status is not filled. Note also that only ‘RT @’ marks a true retweet; some tweets start with ‘RT’ but are not retweets, e.g., “RT IF U NOT FRIENDLY..”
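The rule described here can be sketched on a plain dictionary. The function name and the use of a bare `dict` in place of a `TweetaTweet` are assumptions for illustration; the two checks follow the description above.

```python
import re

# Matches "RT @" at the start of the text in any letter case (RT, Rt, rT, rt).
RT_PREFIX_RE = re.compile(r"rt @", re.IGNORECASE)


def is_retweet_sketch(tweet):
    """Return True if retweeted_status is set or the text starts with 'RT @'."""
    if tweet.get("retweeted_status") is not None:
        return True
    # re.match anchors at the beginning of the string.
    return RT_PREFIX_RE.match(tweet.get("text", "")) is not None


print(is_retweet_sketch({"text": "RT @alice: hello"}))
print(is_retweet_sketch({"text": "RT IF U NOT FRIENDLY.."}))
```

Requiring the `@` after `RT` is what keeps texts such as “RT IF U NOT FRIENDLY..” from being counted as retweets.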
-
is_user_en
()¶ Whether the user is English speaking
-
is_valid
()¶ Whether the tweet contains all the root elements (‘text’ in tweet and ‘id’ in tweet and ‘created_at’ in tweet and ‘user’ in tweet)
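The validity condition amounts to a membership test over four keys. A minimal sketch on a plain dictionary, where the function name and the `REQUIRED_KEYS` constant are hypothetical:

```python
# The four root elements named in the description.
REQUIRED_KEYS = ("text", "id", "created_at", "user")


def is_valid_sketch(tweet):
    """Return True only if every required root key is present in the tweet."""
    return all(key in tweet for key in REQUIRED_KEYS)


complete = {"text": "hi", "id": 1, "created_at": "Wed Oct 10 20:19:24 +0000 2018", "user": {}}
print(is_valid_sketch(complete))
print(is_valid_sketch({"text": "hi", "id": 1}))
```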
-
json
()¶ Get the raw json
-
mentions
()¶ Use mentions from entities first, otherwise use text
-
text
()¶ Take full_text from the extended tweet (the default compatibility mode for the streaming API), or ‘full_text’ in the tweet, which replaces ‘text’ when extended mode is used. https://developer.twitter.com/en/docs/tweets/tweet-updates
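One way to picture the lookup order is the sketch below, which works on a plain dictionary. The function name is hypothetical, and the final fallback to ‘text’ (with an empty string when nothing is present) is an assumption rather than documented behaviour.

```python
def tweet_text_sketch(tweet):
    """Pick the fullest available text from a tweet dictionary."""
    # Compatibility mode: the full text sits under extended_tweet.
    extended = tweet.get("extended_tweet") or {}
    if "full_text" in extended:
        return extended["full_text"]
    # Extended mode: full_text replaces text at the root of the tweet.
    if "full_text" in tweet:
        return tweet["full_text"]
    # Assumption: fall back to the classic text field, or an empty string.
    return tweet.get("text", "")


print(tweet_text_sketch({"text": "short...", "extended_tweet": {"full_text": "the long version"}}))
```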
-
tweet
()¶ Return the raw tweet object
-
tweet_id
()¶ Get the tweet id (from ‘id_str’ first if available, otherwise from ‘id’)
-
user_description
()¶ Get user description, return empty string if it doesn’t exist
-
user_id
()¶ Get user id (from tweet[‘user’][‘id_str’] first)
-
user_location
()¶ Get user location, return empty string if it doesn’t exist
-
user_name
()¶ Get user name, return empty string if it doesn’t exist
-
user_screen_name
()¶ Get user screen name, return empty string if it doesn’t exist
-
tweeta.constants¶
Various constants related to Tweet